
[ontologizing] RDF and Linked Data for the Ontolog Database

To: ontologizing@xxxxxxxxxxxxxxxx
From: "Michael K. Bergman" <mike@xxxxxxxxxxxxx>
Date: Fri, 22 Jun 2007 15:13:52 -0500
Message-id: <467C2D80.3080807@xxxxxxxxxxxxx>
<<Re-posted per Peter:>>    (01)

Hi Bob, Denise and Ken,    (02)

I very much enjoyed the discussion on Ontolog's Taxo-Thesaurus project 
yesterday 
(http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2007_06_21).    (03)

What has been done to date to index all of Ontolog's holdings into XML, 
with support from Denise at the World Bank and Teragram to extract 
important metadata and entities, strikes me as pretty close to the 
state of the art.  I'm also impressed with the extraction and tagging 
throughput of 1 M PDFs per day.  I very much look forward to the release 
of the best practices guides from the World Bank.    (04)

Since I'm new to these efforts, I don't know who else should be copied 
on these notes; please feel free to distribute as you see fit. [now 
posted to full ontologizing list]    (05)

Also, since I'm presently working with OpenLink and their Virtuoso 
software that is very relevant to possible next steps, I have taken the 
liberty of cc'ing their CEO, Kingsley Idehen, on these notes.  (By way 
of disclosure, I have no financial or other relationship with OpenLink; 
I use them because they are, IMHO, best-of-breed in what they do.)  I 
also copied Alistair Miles because I quote him below and he is a key 
individual with SKOS.    (06)

So, with your leave, I offer some of my quick first thoughts on where 
this project may go in the near future.    (07)

Indexing and Data Characterization
----------------------------------
I very much encourage another pass on the Ontolog data *if* any of the 
existing metadata has been missed.  I also very much agree with the 
one-link-removed external crawl; that will bring in the good, related stuff.    (08)

I encourage you to look at Solr as a faceted full-text indexer.  There 
is a project by a humanities consortium managed by UVa called Collex 
(http://www.patacriticism.org/collex) (also, my own write-up at 
http://www.mkbergman.com/?p=331) that is a cool example of faceted 
browsing with Solr.    (09)
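
As a rough sketch of what a faceted request to Solr looks like, the 
snippet below builds a query URL using Solr's standard `facet` and 
`facet.field` parameters.  The host, core name ("ontolog") and field 
names ("author", "year") are my assumptions for illustration only; they 
are not taken from the actual Ontolog index.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; the core name "ontolog" is an assumption.
SOLR_SELECT = "http://localhost:8983/solr/ontolog/select"


def facet_query_url(text, facet_fields, rows=10):
    """Build a Solr select URL that asks for hits plus facet counts."""
    params = [("q", text), ("rows", rows), ("facet", "true")]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT + "?" + urlencode(params)


# "author" and "year" are hypothetical field names.
url = facet_query_url("taxonomy", ["author", "year"])
print(url)
```

The response to such a request carries, alongside the matching 
documents, a count per distinct value of each requested field, which is 
what a faceted browser renders as its clickable filters.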

OpenLink also has full-text indexing, but I have not yet used it and 
cannot speak to its performance or capabilities.    (010)

In any event, if Denise would be willing to send some sample XML from 
the Ontolog database (or, better still, anything related to its 
controlled vocabulary), I may be able to offer some better thoughts.    (011)

Why RDF?
--------
As I've argued elsewhere, I believe RDF to be the right "middle ground" 
of expressiveness for both more formal ontologies (OWL, etc.) and a 
means of representing "less formal" 'ontologies' such as tags, 
microformats, etc. (see http://www.mkbergman.com/?p=374).  I think 
Alistair Miles had a really good piece that came out yesterday 
(http://isegserv.itd.rl.ac.uk/blogs/alistair/archives/56); my guess is 
that he would support the idea of RDF and SKOS as well.    (012)

Actually, as an "exposed" form, being read by humans, I really don't 
care for RDF much.  I think there are much easier ways to express data 
to humans.  But, as a canonical representation and storage form, 
available for manipulation by computer, it appears "just right."  It 
also has a wealth of options for format conversion (next).    (013)

A commitment to RDF as the canonical data model need not imply massive 
RDF exposure to most users, nor "lock in" to RDF.  It appears most any 
data form or serialization will lend itself nicely to RDF conversion.    (014)

RDF Conversion
--------------
As I mentioned on the call, RDF conversion brings a number of "free" 
benefits in terms of linking data (see next), interoperability and 
toolsets.  RDF adoption has taken off in the last six months, with a 
number of conversion tools emerging.    (015)

There are, for example, many "RDFizers" for converting existing data 
formats into RDF:
http://esw.w3.org/topic/ConverterToRdf
http://simile.mit.edu/RDFizers/    (016)

Also, the GRDDL effort is another means for data conversion:
http://www.w3.org/TR/grddl/
http://esw.w3.org/topic/GrddlResources?highlight=%28grddl%29    (017)

I suspect that Denise's XML would lend itself pretty readily to one of 
these conversion options.    (018)
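
To give a feel for how small such a conversion can be, here is a 
minimal hand-rolled sketch that turns XML records into N-Triples using 
Dublin Core element names as predicates.  It is not one of the RDFizers 
listed above, and the record layout, the `doc` element, and the 
example.org base URI are all assumptions on my part, since I have not 
yet seen the real Ontolog XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the real Ontolog XML may look quite different.
SAMPLE = """<records>
  <doc id="d1">
    <title>Taxo-Thesaurus notes</title>
    <creator>Example Author</creator>
  </doc>
</records>"""

DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://example.org/ontolog/doc/"  # placeholder namespace


def escape(text):
    """Escape a string for use as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def xml_to_ntriples(xml_text):
    """Emit one triple per non-empty child element of each <doc>."""
    lines = []
    for doc in ET.fromstring(xml_text).iter("doc"):
        subject = f"<{BASE}{doc.get('id')}>"
        for child in doc:
            value = (child.text or "").strip()
            if value:
                lines.append(f'{subject} <{DC}{child.tag}> "{escape(value)}" .')
    return lines


for triple in xml_to_ntriples(SAMPLE):
    print(triple)
```

The sketch assumes the child element names happen to match Dublin Core 
terms; a real conversion would need an explicit mapping from the source 
schema to whatever vocabulary is chosen.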

Linked Data
-----------
The linked data efforts of the W3C are rapidly showing the benefits of 
"meshing" up multiple sources of RDF into more meaningful, "emergent" 
data objects (mapping, timelines, structured data displays, browsing, etc.).    (019)

For example, via a FOAF profile, it is now possible to get much more 
related information on a given individual (assuming they have released 
it publicly!).    (020)

I think we will quickly see geo coordinates, timeline relationships 
and other "contextual" stuff emerge, including subject matter 
such as Wikipedia/DBpedia, WordNet, CIA Fact Book, PubMed, PTO, legal 
precedents, biblio and citation, library holdings, etc., etc.    (021)

The benefit of linked data, of course, is that each publisher need only 
understand and describe its own piece of the puzzle; other 
puzzle pieces can be rapidly linked and displayed.    (022)
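
The puzzle-piece idea can be sketched in a few lines: two publishers 
each describe the same thing independently, and because they use the 
same URI for it, their statements merge with no coordination.  The URI, 
the names and the coordinate values below are made up for illustration, 
and plain tuples stand in for a real RDF store.

```python
# Hypothetical shared identifier for one person.
PERSON = "http://example.org/people/alice"

# Publisher 1 knows only the FOAF profile (made-up data).
foaf_data = {(PERSON, "foaf:name", "Alice Example")}

# Publisher 2 knows only a location (made-up data).
geo_data = {
    (PERSON, "geo:lat", "38.03"),
    (PERSON, "geo:long", "-78.48"),
}


def describe(uri, *graphs):
    """Merge any number of triple sets and list what is said about uri."""
    merged = set().union(*graphs)
    return sorted((p, o) for s, p, o in merged if s == uri)


for predicate, obj in describe(PERSON, foaf_data, geo_data):
    print(predicate, obj)
```

Merging here is nothing more than set union; the linking work is done 
entirely by the shared identifier.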

I'm sure that the Ontolog database would be *heartily* welcomed as an 
important foundation piece in the growing roster of Linked Data 
datasets.  (The other advantage of early involvement is the volunteer 
assistance of many individuals committed to this approach.)    (023)

The Linked Data project of the W3C is explained here further:
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData    (024)

With a listing of some of the current Linked Data data sets shown here:
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets.    (025)

RDF Browsing and Limits
-----------------------
We're starting to see a variety of faceted RDF browsers emerge, and most 
RDF stores are accessed via SPARQL query endpoints.  IMHO this 
is currently one of the weak links in the chain.    (026)

While powerful frameworks exist, they are not *yet* simple enough IMHO 
for most standard users.  SPARQL is not natural for most; standard 
listings of RDF triples appear overwhelming, etc., etc.    (027)
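
To illustrate the usability point, here is roughly what even a simple 
question ("list ten people and their names") looks like as a SPARQL 
query sent to an endpoint over HTTP GET.  The endpoint URL is a 
placeholder of mine; the `query` parameter name comes from the SPARQL 
protocol.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE { ?person foaf:name ?name }
LIMIT 10"""


def sparql_get_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Prefixes, variables and graph patterns are second nature to developers, 
but they are a lot to ask of a casual user, which is why the browsing 
layer on top matters so much.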

However, that being said, we are really only talking about UI and 
usability refinements, which are (and I believe will continue to be) 
improving rapidly.  With the amount of RDF data growing into the 
billions of triples (and growing exponentially), many are working hard 
to overcome these ease-of-use limits.    (028)

One of the reasons I started my own Sweet Tools listing 
(http://www.mkbergman.com/?page_id=325) was to document the rapid 
evolution and development in this area.    (029)

Knowing Kingsley, I suspect he may weigh in with some thoughts, URLs and 
demo pointers himself.    (030)

SKOS as the Framework
---------------------
For those of you not aware of it, I think the SKOS effort is designed 
especially to address the consolidation interests of Ontolog:
http://www.w3.org/2004/02/skos/    (031)

(I would love to see Ken's great listing at 
http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2007_06_21/SurveySummary 
characterized by SKOS!)    (032)

I would recommend in particular the various use cases and communities 
presently moving this initiative forward:
http://www.w3.org/2006/07/SWD/wiki/UCRMaterial.    (033)
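
As a small sketch of what characterizing a vocabulary in SKOS might 
look like, the snippet below writes a broader/narrower term hierarchy 
out as Turtle using `skos:Concept`, `skos:prefLabel` and 
`skos:broader`.  The two terms and the example.org namespace are 
placeholders of mine, not entries from Ken's listing.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/ontolog/concept/"  # placeholder namespace

# Hypothetical terms: label -> broader label (None for a top concept).
TERMS = {"Ontology": None, "Upper ontology": "Ontology"}


def slug(label):
    """Turn a label into a simple local name, e.g. 'upper-ontology'."""
    return label.lower().replace(" ", "-")


def to_turtle(terms):
    """Serialize the term hierarchy as SKOS concepts in Turtle."""
    out = [f"@prefix skos: <{SKOS}> .", f"@prefix ex: <{BASE}> .", ""]
    for label, broader in terms.items():
        out.append(f"ex:{slug(label)} a skos:Concept ;")
        pref = f'    skos:prefLabel "{label}"@en'
        if broader:
            out.append(pref + " ;")
            out.append(f"    skos:broader ex:{slug(broader)} .")
        else:
            out.append(pref + " .")
    return "\n".join(out)


print(to_turtle(TERMS))
```

The `slug` helper only handles plain labels with spaces; labels with 
quotes or other punctuation would need proper escaping before being 
used this way.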



While I mention W3C and its efforts numerous times in this email, please 
also know I do not think everything needs to be seen solely through the 
W3C lens.  There are many communities in this space.    (034)

One of the real powers of Ontolog is its breadth of reach and 
membership, including enterprise-scale perspectives.  If some of these 
W3C efforts (and from elsewhere, as well!) can be brought to bear on the 
*integration* across all communities of related interests, I think that 
would be most powerful.  I think that is one of the unique roles that 
the Ontolog Forum may be able to play.    (035)

Another reason I am interested is that almost all of the linked data 
efforts I have seen to date populate their RDF via existing structure or 
semi-structure.  The Ontolog data with information extraction (IE) (if I 
understand correctly!) for some of its metadata (which is, after all, 
itself structure) is an absolutely critical complement to current efforts.    (036)

At any rate, I think you are sitting on a gold mine of ontological and 
semWeb related data.  If I can help (within my limits and reason!) in 
this effort, I'd love to! :)    (037)

Thanks, Mike    (038)


_________________________________________________________________
Msg Archives: http://ontolog.cim3.net/forum/ontologizing/ 
Subscribe/Unsubscribe/Config: 
http://ontolog.cim3.net/mailman/listinfo/ontologizing/
Community Portal: http://ontolog.cim3.net/
Community Files: http://ontolog.cim3.net/file/work/OntologizingOntolog/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologizingOntolog    (039)