
[ontologizing] Re: [OntologTaxoThesaurus] Status of the inventory

To: Ontologizing-Ontolog <ontologizing@xxxxxxxxxxxxxxxx>, Denise Bedford <dbedford@xxxxxxxxxxxxx>, Bob Smith <Bob@xxxxxxxxxxxxxx>
From: "Peter P. Yim" <peter.yim@xxxxxxxx>
Date: Fri, 05 May 2006 11:41:40 -0700
Message-id: <445B9C64.4080004@xxxxxxxx>
Hi Denise & Bob,    (01)

This is exciting! Some thoughts ...    (02)

1. I agree with Denise: if you mine 6 levels deep, but restrict 
the mining to within our domain*, then there's probably not 
much more to gain out of the crawl.    (03)

The CWE infrastructure design is inspired by Engelbart's notion 
of an OHS (Open Hyperdocument System), where one could treat all 
knowledge objects as being interlinked into a 'huge' 
hyperdocument, with the hyperlink being the integration point(s). 
Since its purpose is to augment the human, the most 'interesting' 
things should always be one or two (say three) clicks away. 
Therefore, keeping the crawl shallow is both efficient and 
effective (much less noisy).    (04)

I wonder if you can set it to inventory ONE (or, at most, Two) 
level(s) of content that is outside of our (cim3.net) domain? 
[Disclaimer: one probably has to look into IPR issues too, even 
if we do this.]    (05)
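
A minimal sketch of the crawl policy suggested above: follow links within the home domain down to a fixed depth, but inventory at most one level of content outside it. Everything here is illustrative. The `links` dictionary is a hypothetical stand-in for the links found on each fetched page, the function names are made up, and none of this reflects how Coast Webmaster is actually configured.

```python
from collections import deque
from urllib.parse import urlparse


def in_domain(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def inventory(start, links, domain="cim3.net", max_depth=6, max_offsite=1):
    """Breadth-first inventory over a link graph.

    links       -- hypothetical dict mapping a URL to the URLs it links to
    max_depth   -- how many clicks from the start page to follow
    max_offsite -- how many consecutive levels outside `domain` to include
    """
    seen = {start}
    queue = deque([(start, 0, 0)])  # (url, depth, consecutive off-site hops)
    found = []
    while queue:
        url, depth, offsite = queue.popleft()
        found.append(url)
        if depth == max_depth:
            continue  # inventoried, but its links are not followed
        for nxt in links.get(url, []):
            if nxt in seen:
                continue
            hops = 0 if in_domain(nxt, domain) else offsite + 1
            if hops > max_offsite:
                continue  # too far outside the home domain
            seen.add(nxt)
            queue.append((nxt, depth + 1, hops))
    return found
```

Passing `domain="ontolog.cim3.net"` instead of the default would narrow the same sketch to the smaller of the two scopes under discussion.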

2. *Depending on the scope of this project (someone has to make a 
call here), we should decide whether we want to restrict the 
inventory to the domain of (a) <ontolog.cim3.net>, or (b) 
<cim3.net>.    (06)

I have good things to say about each of these two options. This 
is quite similar to how we needed to decide on the search corpus 
for our Google search - see http://ontolog.cim3.net - where we 
ended up offering both.    (07)

3. One thing that is special about the wiki is that it keeps an 
entire version history of any page (see, for example, the 
"UpperOntologySummit" page 
http://ontolog.cim3.net/cgi-bin/wiki.pl?UpperOntologySummit has 
all these historic pages available: 
http://ontolog.cim3.net/cgi-bin/wiki.pl?action=history&id=UpperOntologySummit 
and version #180 of this particular page can be displayed as: 
http://ontolog.cim3.net/cgi-bin/wiki.pl?action=browse&id=UpperOntologySummit&revision=180    (08)

).    (09)

In this example, where there are more than 180 versions of the same 
page, probably only the "current" version (the snapshot of that) 
is of interest, as far as the exercise is concerned. Mind you, 
one actually does not need to "dig deeper" to reach those older 
versions. Therefore, when we fine-tune our Google search 
appliance (I believe I heard Denise mention that you use the 
Google search appliance too, right?) we specifically filter out 
crawls to "http://ontolog.cim3.net/cgi-bin/wiki.pl?action=" (we 
didn't index the historic versions) so that we keep the search 
corpus to just the "current" pages. ... Denise, you might want to 
consider if this is applicable to your exercise now.    (010)
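
As a rough illustration of that filter (a sketch only, not the actual search appliance configuration), the same prefix rule could be applied to an inventory list after the crawl. The function name is hypothetical; the prefix is the one quoted above.

```python
# Prefix shared by the wiki's history and revision views, as quoted above.
ACTION_PREFIX = "http://ontolog.cim3.net/cgi-bin/wiki.pl?action="


def current_pages_only(urls):
    """Drop wiki 'action=' URLs so that only current pages remain."""
    return [u for u in urls if not u.startswith(ACTION_PREFIX)]
```

Applied to the three example URLs above, this would keep only the plain "UpperOntologySummit" page and drop both the history listing and revision 180.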

4. Yes ... I totally agree that our work should best be shared 
with other (sub)projects ... I am still very hopeful that we will 
be integrating all the work so that in the end, "Ontologizing 
Ontolog" will just be one big exercise.    (011)

5. Lastly, this conversation is useful and interesting, and 
should be archived. I am taking the liberty (once again) to 
direct it to the [ontologizing] list, and urge that we all 
continue the discussion via that forum.    (012)

Thanks & regards.  =ppy
--    (013)


Bob Smith wrote Fri, 5 May 2006 08:15:23 -0700:
> Hello,
> 
> Good idea to make the inventory available to other teams.
> 
> Keeping the "Context" explicit as much as possible is enabled by the full
> path url complemented by dates.
> 
> Do any community issues surface from including CIM3 websites such as the
> Upper Ontology Preparations or the Ontology Implementation preparation
> files?
> 
> For example, here is one item:
> 
>  http://colab.cim3.net/forum/
> 
> (Blush... But I have not updated our own URL mission statement because of a
> local problem...which should be included in the next round, if possible)
> 
> Peter, what are your thoughts?
> 
> Cheers,
> 
> Bob    (014)


> -----Original Message-----
> From: dbedford@xxxxxxxxxxxxx [mailto:dbedford@xxxxxxxxxxxxx] 
> Sent: Friday, May 05, 2006 7:26 AM
> To: Bob Smith
> Cc: Lisa Colvin; 'Peter P. Yim'
> Subject: RE: Status of the inventory
> 
> Bob,
> 
> Yes, because I get the full path url, as well as the specific file name and
> characteristics.  We could create a single Excel spreadsheet of all relevant
> ontolog content, integrating from multiple site crawls.   It occurs to me
> that we can also hand this inventory over to the other teams working on
> different approaches.   The inventory might provide us with a basic test
> corpus for an ontology ontology.
> 
> Best regards,
> Denise    (015)


> -----Original Message-----
> From: "Bob Smith" <Bob@xxxxxxxxxxxxxx>
> Sent: 05/05/2006 10:11 AM
> To: <dbedford@xxxxxxxxxxxxx>, "'Peter P. Yim'" <peter.yim@xxxxxxxx>, "Lisa Colvin" <lisadawncolvin@xxxxxxxxx>
> Subject: RE: Status of the inventory
> 
> Hi Denise, Peter;
> 
> I can almost hear that chugging on the East Coast...
> 
> If you run an inventory on other related sites, such as the Health Care site
> which has only a few (perhaps a dozen) Documents, is the fact that an item
> is from a different site noted in your inventory?
> 
> Peter, do you have an expectation that more related sites are needed at this
> stage?
> 
> Warm regards,
> 
> Bob    (016)


> -----Original Message-----
> From: dbedford@xxxxxxxxxxxxx [mailto:dbedford@xxxxxxxxxxxxx]
> Sent: Friday, May 05, 2006 6:35 AM
> To: Peter P. Yim; Bob Smith
> Subject: Status of the inventory
> 
> Peter and Bob,
> 
> Just to let you know the COAST inventory is still chugging along - it is at
> level 6 and has inventoried about 51,000 items.   I'm not sure that all of
> these are 'content rich' items.   Once the inventory finishes, we can take a
> quick look.   We may want to rerun it and set a constraint not to go below a
> certain level in the tree, depending on what we find.
> 
> This is probably the slowest part of the process -- it is fast against a
> shallow site or database.   I am using the Coast Webmaster software to do
> this work.
> 
> If you want me to run inventories of other related sites, please send me the
> URLs.   Any content that you want us to include in the ontologizing
> exercises.
> 
> Best regards,
> Denise    (017)

_________________________________________________________________
Community Portal: http://ontolog.cim3.net/
Msg Archives: http://ontolog.cim3.net/forum/ontologizing/
Community Files: http://ontolog.cim3.net/file/work/OntologizingOntolog/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologizingOntolog    (018)