See inline for my comments.
On Fri, 22 Jun 2007, Michael K. Bergman wrote:
> <<Re-posted per Peter:>>
>
> Hi Bob, Denise and Ken,
>
> I very much enjoyed the discussion on Ontolog's Taxo-Thesaurus project
> yesterday
> (http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2007_06_21).
>
> What has been done to date to index all of Ontolog's holdings into XML
> with the support from Denise at the WorldBank and Teragram to extract
> important metadata and entities strikes me as pretty close to
> state-of-the-art. I'm also impressed with the 1 M PDF throughput per day
> extraction and tagging process. I very much look forward to the release
> of the best practices guides from the World Bank.
>
> Since I'm new to these efforts, I don't know who else should be copied
> on these notes; please feel free to distribute as you see fit. [now
> posted to full ontologizing list]
>
> Also, since I'm presently working with OpenLink and their Virtuoso
> software that is very relevant to possible next steps, I have taken the
> liberty of cc'ing their CEO, Kingsley Idehen, on these notes. (By way
> of disclosure, I have no financial or other relationship with OpenLink;
> I use them because they are, IMHO, best-of-breed in what they do.) I
> also copied Alistair Miles because I quote him below and he is a key
> individual with SKOS.
>
> So, with your leave, I offer some of my quick first thoughts on where
> this project may go in the near future.
>
> Indexing and Data Characterization
> ----------------------------------
> I very much encourage another pass on the Ontolog data *if* any of the
> existing metadata has been missed. I also very much agree with the
> one-link removed external crawl; that will bring in the good, related stuff.
>
> I encourage you to look at Solr as a faceted full-text indexer. There
> is a project by a humanities consortium managed by UVa called Collex
> (http://www.patacriticism.org/collex) (also, my own write-up at
> http://www.mkbergman.com/?p=331) that is a cool example of faceted
> browsing with Solr.
>
> OpenLink also has full-text indexing, but I have not yet used it and
> cannot speak to its performance or capabilities.
>
> In any event, if Denise would be willing to send some sample XML from
> the Ontolog database (or, better still, anything related to its
> controlled vocabulary), I may be able to offer some better thoughts.
>
> Why RDF?
> --------
> As I've argued elsewhere, I believe RDF to be the right "middle ground"
> of expressiveness for both more formal ontologies (OWL, etc.) and a
> means of representing "less formal" 'ontologies' such as tags,
> microformats, etc. (see http://www.mkbergman.com/?p=374). I think
> Alistair Miles had a really good piece that came out yesterday
> (http://isegserv.itd.rl.ac.uk/blogs/alistair/archives/56); my guess is
> that he would support the idea of RDF and SKOS as well.
>
> Actually, as an "exposed" form, being read by humans, I really don't
> care for RDF much. I think there are much easier ways to express data
> to humans. But, as a canonical representation and storage form,
> available for manipulation by computer, it appears "just right." It
> also has a wealth of options for format conversion (next).
>
> A commitment to RDF as the canonical data model need not imply massive
> RDF exposure to most users, nor "lock in" to RDF. It appears most any
> data form or serialization will lend itself nicely to RDF conversion.
>
> RDF Conversion
> --------------
> As I mentioned on the call, RDF conversion brings a number of "free"
> benefits in terms of linking data (see next), interoperability and
> toolsets. RDF has literally taken off in the last six months, with a
> bunch of conversion tools emerging.
>
> There are, for example, many "RDFizers" for converting existing data
> formats into RDF:
> http://esw.w3.org/topic/ConverterToRdf
> http://simile.mit.edu/RDFizers/
>
> Also, the GRDDL effort is another means for data conversion:
> http://www.w3.org/TR/grddl/
> http://esw.w3.org/topic/GrddlResources?highlight=%28grddl%29
>
> I suspect that Denise's XML would lend itself pretty readily to one of
> these conversion options.
SKOS could be relevant in two ways:
1. In the short term, SKOS tools can be used to help RDFize the
Taxo-Thesaurus metadata.
2. In the long term, the OntologySummit2007 framework dimensions would be
a nice addition to the SKOS ontology if the Ontolog community can reach
agreement on the evaluation criteria.
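As a rough illustration of point 1, here is a minimal Python sketch of
what "RDFizing" thesaurus metadata into SKOS might look like. It is not
based on the actual Taxo-Thesaurus XML, which has not been posted here:
the <terms>/<term>/<label>/<broader> layout, the example.org namespace
and the function names are all assumptions made up for the example. Only
the SKOS and RDF property URIs are real. Output is plain N-Triples.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/term/"  # hypothetical namespace for the terms

# Hypothetical input; the real Ontolog XML may look quite different.
SAMPLE_XML = """
<terms>
  <term id="controlled-vocabulary">
    <label>Controlled vocabulary</label>
  </term>
  <term id="thesaurus">
    <label>Thesaurus</label>
    <broader ref="controlled-vocabulary"/>
  </term>
</terms>
"""


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def xml_to_skos_ntriples(xml_text):
    """Return a list of N-Triples lines, one SKOS concept per <term>."""
    root = ET.fromstring(xml_text)
    lines = []
    for term in root.iter("term"):
        subject = f"<{BASE}{term.get('id')}>"
        # Every term becomes a skos:Concept.
        lines.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        label = term.findtext("label")
        if label:
            literal = escape_literal(label.strip())
            lines.append(f'{subject} <{SKOS}prefLabel> "{literal}"@en .')
        # Each <broader ref="..."/> becomes a skos:broader link.
        for broader in term.findall("broader"):
            target = f"<{BASE}{broader.get('ref')}>"
            lines.append(f"{subject} <{SKOS}broader> {target} .")
    return lines


if __name__ == "__main__":
    print("\n".join(xml_to_skos_ntriples(SAMPLE_XML)))
```

An existing RDFizer or a GRDDL transform would likely be the better
choice in practice; the sketch only shows how small the mapping from a
term list to SKOS triples can be.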
> Linked Data
> -----------
> The linked data efforts of the W3C are rapidly showing the benefits of
> "meshing" up multiple sources of RDF into more meaningful, "emergent"
> data objects (mapping, timelines, structured data displays, browsing, etc.)
>
> For example, via a FOAF profile, it is now possible to get much more
> related information on a given individual (assuming they have released
> it publicly!).
>
> I think we will quickly see geo coordinates, timeline relationships
> and other "contextual" stuff emerge, including subject matter
> such as Wikipedia/DBpedia, WordNet, CIA Fact Book, PubMed, PTO, legal
> precedents, biblio and citation, library holdings, etc., etc.
>
> The benefit of linked data, of course, is that each publisher need only
> understand their own piece of the puzzle; other puzzle pieces can be
> rapidly linked and displayed.
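A small Python sketch of why that works, assuming two publishers who
each describe the same person under a shared URI. The example.org URIs,
the data and the function names are invented for illustration; the FOAF
property names are real. Triples are modelled as plain tuples, so no RDF
library is needed.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"
PERSON = "http://example.org/people/alice"  # hypothetical shared URI

# Publisher A: a FOAF profile (hypothetical data).
profile_triples = {
    (PERSON, FOAF + "name", "Alice Example"),
    (PERSON, FOAF + "homepage", "http://example.org/~alice"),
}

# Publisher B: a publications list that knows nothing about publisher A.
publication_triples = {
    (PERSON, FOAF + "name", "Alice Example"),
    (PERSON, FOAF + "made", "http://example.org/papers/42"),
}


def merge_graphs(*graphs):
    """Merging RDF graphs is a set union; duplicate triples collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every property/value pair known about one subject."""
    description = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            description[p].add(o)
    return dict(description)


if __name__ == "__main__":
    merged = merge_graphs(profile_triples, publication_triples)
    for prop, values in sorted(describe(merged, PERSON).items()):
        print(prop, sorted(values))
```

Neither publisher had to coordinate with the other beyond using the same
URI for the person; the combined description falls out of the union.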
>
> I'm sure that the Ontolog database would be *heartily* welcomed as an
> important foundation piece to Linked Data's growing roster of
> datasets. (The other advantage of early involvement is the volunteer
> assistance of many individuals committed to this approach.)
>
> The Linked Data project of the W3C is explained here further:
> http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
>
> With a listing of some of the current Linked Data data sets shown here:
> http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets.
>
> RDF Browsing and Limits
> -----------------------
> We're starting to see a variety of faceted RDF browsers emerge, and most
> of the underlying RDF stores are accessed via SPARQL query endpoints.
> IMHO this is currently one of the weak links in the chain.
>
> While powerful frameworks exist, they are not *yet* simple enough IMHO
> for most standard users. SPARQL is not natural for most; standard
> listings of RDF triples appear overwhelming, etc., etc.
>
> However, that being said, we are really only talking about UI and
> usability refinements, which are (and I believe will continue to be)
> improving rapidly. With the amount of RDF data growing into the
> billions of triples (and growing exponentially), many are working hard
> to overcome these ease-of-use limits.
>
> One of the reasons I started my own Sweet Tools listing
> (http://www.mkbergman.com/?page_id=325) was to document the rapid
> evolution and development in this area.
>
> Knowing Kingsley, I suspect he may weigh in with some thoughts, URLs and
> demo pointers himself.
>
> SKOS as the Framework
> ---------------------
> For those of you not aware of it, I think the SKOS effort is designed
> especially to address the consolidation interests of Ontolog:
> http://www.w3.org/2004/02/skos/
>
> (I would love to see Ken's great listing at
> http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2007_06_21/SurveySummary
> characterized by SKOS!)
Actually, it could be an extension to SKOS which permits one to
evaluate/assess ontological artifacts in ways that are not now available
with SKOS.
-- Ken
> I would recommend in particular the various use cases and communities
> presently moving this initiative forward:
> http://www.w3.org/2006/07/SWD/wiki/UCRMaterial.
>
>
>
> While I mention W3C and its efforts numerous times in this email, please
> also know I do not think everything needs to be seen solely through the
> W3C lens. There are many communities in this space.
>
> One of the real powers of Ontolog is its breadth of reach and
> membership, including enterprise-scale perspectives. If some of these
> W3C efforts (and from elsewhere, as well!) can be brought to bear on the
> *integration* across all communities of related interests, I think that
> would be most powerful. I think that is one of the unique roles that
> the Ontolog Forum may be able to play.
>
> Another reason I am interested is that almost all of the linked data
> efforts I have seen to date populate their RDF via existing structure or
> semi-structure. The Ontolog data with information extraction (IE) (if I
> understand correctly!) for some of its metadata (which is, after all,
> itself structure) is an absolutely critical complement to current efforts.
>
> At any rate, I think you are sitting on a gold mine of ontological and
> semWeb related data. If I can help (within my limits and reason!) in
> this effort, I'd love to! :)
>
> Thanks, Mike
>
>
> _________________________________________________________________
> Msg Archives: http://ontolog.cim3.net/forum/ontologizing/
> Subscribe/Unsubscribe/Config:
> http://ontolog.cim3.net/mailman/listinfo/ontologizing/
> Community Portal: http://ontolog.cim3.net/
> Community Files: http://ontolog.cim3.net/file/work/OntologizingOntolog/
> Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologizingOntolog