
[ontologizing] RE: [OntologTaxoThesaurus] Status of the inventory

To: "'Peter P. Yim'" <peter.yim@xxxxxxxx>, "'Ontologizing-Ontolog'" <ontologizing@xxxxxxxxxxxxxxxx>, "'Denise Bedford'" <dbedford@xxxxxxxxxxxxx>
From: "Bob Smith" <Bob@xxxxxxxxxxxxxx>
Date: Fri, 5 May 2006 12:05:27 -0700
Message-id: <200605051905.k45J5S3Y006475@xxxxxxxxxxxxxxxxxxxxxxx>
Hi,    (01)

So we have the classic AI tradeoff between depth and breadth of initial
search, with some hope of a T compromise pattern.    (02)

I have done file inventory projects before, but usually in manual mode, using
no tools but the DOS command line: DIR >Filename or Tree /f /a >Filename. The
results are large and ugly, and the process is very time-consuming.    (03)
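
For comparison with the DIR / Tree approach, here is a minimal Python sketch
of a scripted file inventory. It is not what anyone on this thread actually
ran; the function name `inventory` and the CSV column names are assumptions
made up for illustration. It walks a directory tree and writes one row per
file with its full path, size, and modification date.

```python
import csv
import os
from datetime import datetime, timezone


def inventory(root, out_csv):
    """Walk `root` and write one CSV row per file; return the file count.

    Hypothetical helper: the column names below are illustrative only.
    """
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "size_bytes", "modified_utc"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                info = os.stat(full)
                modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
                writer.writerow([full, info.st_size, modified.isoformat()])
                count += 1
    return count
```

Write the CSV somewhere outside `root`; otherwise the output file is itself
picked up by the walk.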

Given Denise's capabilities, why don't we go with the balance she feels
comfortable with?    (04)

Peter, do you see any IPR issues that constrain the next step?    (05)

Cheers,    (06)

Bob     (07)

-----Original Message-----
From: Peter P. Yim [mailto:peter.yim@xxxxxxxx] 
Sent: Friday, May 05, 2006 11:42 AM
To: Ontologizing-Ontolog; Denise Bedford; Bob Smith
Cc: lisa colvin
Subject: Re: [OntologTaxoThesaurus] Status of the inventory    (08)

Hi Denise & Bob,    (09)

This is exciting! Some thoughts ...    (010)

1. I agree with Denise: if you mine 6 levels deep, but restrict the mining
to within our domain*, then there's probably not much more to gain out
of the crawl.    (011)

The CWE infrastructure design is inspired by Engelbart's notion of an OHS
(Open Hyperdocument System), where one could treat all knowledge objects as
being interlinked into a 'huge' hyperdocument, with the hyperlink being the
integration point(s). Since its purpose is to augment the human, the most
'interesting' things should always be one or two (say three) clicks away.
Therefore, keeping the crawl shallow is both efficient and effective (much
less noisy).    (012)

I wonder if you can set it to inventory ONE (or, at most, Two)
level(s) of content that is outside of our (cim3.net) domain? 
[Disclaimer: one probably has to look into IPR issues too, even if we do
this.]    (013)
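
The depth and domain limits described in point 1 could be expressed as a small
policy function. This is only a sketch in Python, not a Coast Webmaster
setting; the names `in_domain`, `should_inventory`, and the three constants
are assumptions chosen for illustration, with the limits taken from the
numbers mentioned in this thread.

```python
from urllib.parse import urlparse

HOME_DOMAIN = "cim3.net"   # the domain named in this thread
MAX_DEPTH_INSIDE = 6       # assumption: the depth the current crawl reached
MAX_HOPS_OUTSIDE = 1       # "ONE (or, at most, Two)" levels off-domain


def in_domain(url, domain=HOME_DOMAIN):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def should_inventory(url, depth, hops_outside):
    """Decide whether a link should be inventoried.

    depth: clicks from the start page to this URL.
    hops_outside: consecutive off-domain links followed to reach this URL,
    counting this one (ignored when the URL is inside the home domain).
    """
    if depth > MAX_DEPTH_INSIDE:
        return False
    if in_domain(url):
        return True
    return hops_outside <= MAX_HOPS_OUTSIDE
```

Changing HOME_DOMAIN to "ontolog.cim3.net" would give the narrower of the two
scopes discussed in point 2.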

2. *Depending on the scope of this project (someone has to make a call
here), we should decide whether we want to restrict the inventory to the
domain of (a) <ontolog.cim3.net>, or (b) <cim3.net>.    (014)

I have good things to say about each of these two options. This is quite
similar to how we needed to decide on the search corpus for our google
search - see http://ontolog.cim3.net - where we ended up offering both.    (015)

3. one thing that is special about the wiki is that it keeps an entire
version history of any page. (see, for example, the "UpperOntologySummit"
page http://ontolog.cim3.net/cgi-bin/wiki.pl?UpperOntologySummit has all
these historic pages available:
http://ontolog.cim3.net/cgi-bin/wiki.pl?action=history&id=UpperOntologySummit
and version #180 of this particular page can be displayed as:
http://ontolog.cim3.net/cgi-bin/wiki.pl?action=browse&id=UpperOntologySummit&revision=180
).    (016)

In this example, where there are more than 180 versions of the same page,
probably only the "current" version (the snapshot of that) is of interest,
as far as the exercise is concerned. Mind you, one actually does not need to
"dig deeper" to reach those older versions. Therefore, when we fine-tune our
google search appliance (I believe I heard Denise mention that you use the
google search appliance too, right?) we specifically filter out crawls to
"http://ontolog.cim3.net/cgi-bin/wiki.pl?action=" (we didn't index the
historic versions) so that we keep the search corpus to just the "current"
pages. ... Denise, you might want to consider if this is applicable to your
exercise now.    (017)
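
Applied to an inventory rather than to the search appliance, the same filter
might look like the sketch below. It is written in Python for illustration
only; `HISTORY_PREFIX`, `is_current_page`, and `current_pages` are made-up
names, and the prefix is simply the one quoted above.

```python
# Prefix quoted in the message above; wiki URLs that start with it are
# action pages (history, old revisions), not "current" pages.
HISTORY_PREFIX = "http://ontolog.cim3.net/cgi-bin/wiki.pl?action="


def is_current_page(url):
    """True unless the URL is a wiki action page (history, revision, ...)."""
    return not url.startswith(HISTORY_PREFIX)


def current_pages(urls):
    """Keep only the URLs that point at current pages, preserving order."""
    return [u for u in urls if is_current_page(u)]
```

With the three UpperOntologySummit URLs from point 3 as input, only the first
one (the plain page URL) would be kept.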

4. Yes ... I totally agree that our work should best be shared with other
(sub)projects ... I am still very hopeful that we will be integrating all
the work so that in the end, "Ontologizing Ontolog" will just be one big
exercise.    (018)

5. Lastly, this conversation is useful and interesting, and should be
archived. I am taking the liberty (once again) to direct it to the
[ontologizing] list, and urge that we all continue the discussion via that
forum.    (019)

Thanks & regards.  =ppy
--    (020)


Bob Smith wrote Fri, 5 May 2006 08:15:23 -0700:
> Hello,
> 
> Good idea to make the inventory available to other teams.
> 
> Keeping the "Context" explicit as much as possible is enabled by the 
> full path url complemented by dates.
> 
> Do any community issues surface from including CIM3 websites such as 
> the Upper Ontology Preparations or the Ontology Implementation 
> preparation files?
> 
> For example, here is one item:
> 
>  http://colab.cim3.net/forum/
> 
> (Blush... But I have not updated our own URL mission statement because 
> of a local problem...which should be included in the next round, if 
> possible)
> 
> Peter, what are your thoughts?
> 
> Cheers,
> 
> Bob    (021)


> -----Original Message-----
> From: dbedford@xxxxxxxxxxxxx [mailto:dbedford@xxxxxxxxxxxxx]
> Sent: Friday, May 05, 2006 7:26 AM
> To: Bob Smith
> Cc: Lisa Colvin; 'Peter P. Yim'
> Subject: RE: Status of the inventory
> 
> Bob,
> 
> Yes, because I get the full path url, as well as the specific file
> name and characteristics.  We could create a single excel spreadsheet of
> all relevant ontolog content, integrating from multiple site crawls.  It
> occurs to me that we can also hand this inventory over to the other teams
> working on different approaches.  The inventory might provide us with a
> basic test corpus for an ontology ontology.
> 
> Best regards,
> Denise    (022)


> "Bob Smith" <Bob@1talltrees.com>
> 05/05/2006 10:11 AM
>
> To: <dbedford@xxxxxxxxxxxxx>, "'Peter P. Yim'" <peter.yim@xxxxxxxx>,
>     "Lisa Colvin" <lisadawncolvin@xxxxxxxxx>
> cc:
> Subject: RE: Status of the inventory
> 
> Hi Denise, Peter;
> 
> I can almost hear that chugging on the East Coast...
> 
> If you run an inventory on other related sites, such as the Health 
> Care site which has only a few (perhaps a dozen) Documents, is the 
> fact that an item is from a different site noted in your inventory?
> 
> Peter, do you have an expectation that more related sites are needed 
> at this stage?
> 
> Warm regards,
> 
> Bob    (023)


> -----Original Message-----
> From: dbedford@xxxxxxxxxxxxx [mailto:dbedford@xxxxxxxxxxxxx]
> Sent: Friday, May 05, 2006 6:35 AM
> To: Peter P. Yim; Bob Smith
> Subject: Status of the inventory
> 
> Peter and Bob,
> 
> Just to let you know the COAST inventory is still chugging along - it is
> at level 6 and has inventoried about 51,000 items.  I'm not sure that all
> of these are 'content rich' items.  Once the inventory finishes, we can
> take a quick look.  We may want to rerun it and set a constraint not to go
> below a certain level in the tree, depending on what we find.
>
> This is probably the slowest part of the process -- it is fast against
> a shallow site or database.  I am using the Coast Webmaster software to do
> this work.
>
> If you want me to run inventories of other related sites, please send me
> the url's.  Any content that you want us to include in the ontologizing
> exercises.
> 
> Best regards,
> Denise    (024)



_________________________________________________________________
Community Portal: http://ontolog.cim3.net/
Msg Archives: http://ontolog.cim3.net/forum/ontologizing/
Community Files: http://ontolog.cim3.net/file/work/OntologizingOntolog/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologizingOntolog    (025)