Re: [ontologizing] RDF and Linked Data for Emergency

To:	Ontologizing-Ontolog <ontologizing@xxxxxxxxxxxxxxxx>, "Tim Berners-Lee" <timbl@xxxxxx>, "Renato Iannella" <renato@xxxxxxxxxxxx>
From:	paola.dimaio@xxxxxxxxx
Date:	Sat, 23 Jun 2007 10:40:11 +0700
Message-id:	<c09b00eb0706222040y3245c7bdr97833ebca992fdc8@xxxxxxxxxxxxxx>

Mike
Thanks a lot for the extensive info on the Linked Data project.

It is important, IMHO, to start working with the W3C and parallel orgs;
that's the point of doing all this standards work.

(RDF is good to start with. I too always try to read RDF files in my browser, and always end up seeing a bunch of tags. But the answer is simple: develop a browser feature to strip RDF files of their tagging elements and render them as plain text. Piece of cake. I think we should ask Mozilla to start developing it.)
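
As a rough illustration of that idea, here is a minimal Python sketch that reads a small RDF/XML document and prints it as plain "property: value" lines. It assumes the simplest flat shape of RDF/XML (rdf:Description elements holding literal properties); the sample document and the function names are made up for illustration, and a real browser feature would have to handle far more of the syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def rdf_to_text(rdf_xml):
    """Render flat RDF/XML as an identifier line followed by 'property: value' lines."""
    root = ET.fromstring(rdf_xml)
    lines = []
    for description in root:
        lines.append(description.get(RDF_NS + "about", "(no identifier)"))
        for prop in description:
            # Use the literal text, or fall back to a linked resource URI.
            value = (prop.text or "").strip() or prop.get(RDF_NS + "resource", "")
            lines.append(f"  {local_name(prop.tag)}: {value}")
    return "\n".join(lines)

# Hypothetical sample document, not taken from any real dataset.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:title>Flood response glossary</dc:title>
    <dc:creator>Example Author</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

print(rdf_to_text(SAMPLE))
```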

I am slightly modifying the thread name (and cc'ing TimBL and NICTA). Since this effort seems well in hand within this group, I hope to model a similar process for a new community that is just being started, the Emergency Management Incubator Group, where we aim to start working on a glossary, a controlled vocabulary and a catalogue of standard formats: http://esw.w3.org/topic/DisasterManagement
(Apologies for the name 'disaster management'; it is a temporary placeholder.)

The long-term goal is to develop a complete ontology, taking it step by step.

So I ask this community's permission to cross-post some of the emails on this thread to that group, including Denise's guidelines from the World Bank and related mails that describe the body of this effort. They would serve as background documentation and as the shared process of a parallel effort, which I think will benefit immensely from what is being done here.

Hope that's okay; let me know otherwise.

Linking data sounds good

Thanks

Paola Di Maio

On 6/23/07, Michael K. Bergman <mike@xxxxxxxxxxxxx> wrote:

<<Re-posted per Peter:>>

Hi Bob, Denise and Ken,

I very much enjoyed the discussion on Ontolog's Taxo-Thesaurus project
yesterday
( http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2007_06_21).

What has been done to date to index all of Ontolog's holdings into XML
with the support from Denise at the World Bank and Teragram to extract
important metadata and entities strikes me as pretty close to
state of the art.  I'm also impressed with the 1 M PDF throughput per day
extraction and tagging process.  I very much look forward to the release
of the best practices guides from the World Bank.

Since I'm new to these efforts, I don't know who else should be copied
on these notes; please feel free to distribute as you see fit. [now
posted to full ontologizing list]

Also, since I'm presently working with OpenLink and their Virtuoso
software that is very relevant to possible next steps, I have taken the
liberty of cc'ing their CEO, Kingsley Idehen, on these notes.  (By way
of disclosure, I have no financial or other relationship with OpenLink;
I use them because they are, IMHO, best-of-breed in what they do.)  I
also copied Alistair Miles because I quote him below and he is a key
individual with SKOS.

So, with your leave, I offer some of my quick first thoughts on where
this project may go in the near future.

Indexing and Data Characterization
----------------------------------
I very much encourage another pass on the Ontolog data *if* any of the
existing metadata has been missed.  I also very much agree with the
one-link-removed external crawl; that will bring in the good, related stuff.

I encourage you to look at Solr as a faceted full-text indexer.  There
is a project by a humanities consortium managed by UVa called Collex
(http://www.patacriticism.org/collex ) (also, my own write-up at
http://www.mkbergman.com/?p=331) that is a cool example of faceted
browsing with Solr.

OpenLink also has full-text indexing, but I have not yet used it and
cannot speak to its performance or capabilities.

In any event, if Denise would be willing to send some sample XML from
the Ontolog database (or, better still, anything related to its
controlled vocabulary), I may be able to offer some better thoughts.

Why RDF?
--------
As I've argued elsewhere, I believe RDF to be the right "middle ground"
of expressiveness for both more formal ontologies (OWL, etc.) and a
means of representing "less formal" 'ontologies' such as tags,
microformats, etc. (see http://www.mkbergman.com/?p=374).  I think
Alistair Miles had a really good piece that came out yesterday
( http://isegserv.itd.rl.ac.uk/blogs/alistair/archives/56); my guess is
that he would support the idea of RDF and SKOS as well.

Actually, as an "exposed" form, being read by humans, I really don't
care for RDF much.  I think there are much easier ways to express data
to humans.  But, as a canonical representation and storage form,
available for manipulation by computer, it appears "just right."  It
also has a wealth of options for format conversion (next).

A commitment to RDF as the canonical data model need not imply massive
RDF exposure to most users, nor "lock in" to RDF.  It appears most any
data form or serialization will lend itself nicely to RDF conversion.

RDF Conversion
--------------
As I mentioned on the call, RDF conversion brings a number of "free"
benefits in terms of linking data (see next), interoperability and
toolsets.  RDF has literally taken off in the last six months, with a
bunch of conversion tools emerging.

There are, for example, many "RDFizers" for converting existing data
formats into RDF:
http://esw.w3.org/topic/ConverterToRdf
http://simile.mit.edu/RDFizers/

Also, the GRDDL effort is another means for data conversion:
http://www.w3.org/TR/grddl/
http://esw.w3.org/topic/GrddlResources?highlight=%28grddl%29

I suspect that Denise's XML would lend itself pretty readily to one of
these conversion options.
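
To make the conversion idea concrete, here is a minimal Python sketch of a hand-rolled "RDFizer" that turns simple XML records into N-Triples using Dublin Core property URIs.  I have not seen Denise's XML, so the record layout, the base URI and the function names below are all assumptions for illustration only; the real tools linked above are far more complete.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"      # Dublin Core element namespace
BASE = "http://example.org/ontolog/doc/"     # hypothetical base URI for records

def escape(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def xml_to_ntriples(xml_text):
    """Emit one N-Triples line per non-empty field of each <record>."""
    triples = []
    for record in ET.fromstring(xml_text).iter("record"):
        subject = f"<{BASE}{record.get('id')}>"
        for field in record:
            value = (field.text or "").strip()
            if value:  # skip empty fields rather than emit empty literals
                triples.append(f'{subject} <{DC}{field.tag}> "{escape(value)}" .')
    return triples

# Hypothetical input; the element names are assumed to match Dublin Core terms.
SAMPLE = """<records>
  <record id="42">
    <title>Taxo-Thesaurus call notes</title>
    <creator>Example Author</creator>
    <subject></subject>
  </record>
</records>"""

for line in xml_to_ntriples(SAMPLE):
    print(line)
```

The point of the sketch is only that each XML field maps to one subject-predicate-object statement; choosing good URIs and vocabularies is where the real work lies.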

Linked Data
-----------
The linked data efforts of the W3C are rapidly showing the benefits of
"meshing" up multiple sources of RDF into more meaningful, "emergent"
data objects (mapping, timelines, structured data displays, browsing, etc.)

For example, via a FOAF profile, it is now possible to get much more
related information on a given individual (assuming they have released
it publicly!).

I think we will quickly see geo coordinates, timeline relationships
and other "contextual" stuff emerge, including subject matter
such as Wikipedia/DBpedia, WordNet, CIA Fact Book, PubMed, PTO, legal
precedents, biblio and citation, library holdings, etc., etc.

The benefit of linked data, of course, is that each publisher needs to
understand only their own piece of the puzzle; other
puzzle pieces can be rapidly linked and displayed.
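
The puzzle-piece point can be sketched in a few lines of Python.  Two publishers each release their own triples without knowing about each other; because they use the same URI for the same place, the data joins up on merge.  All URIs, property names and values below are made up for illustration, and plain tuples stand in for a real RDF store.

```python
from collections import defaultdict

# Each source is a list of (subject, predicate, object) tuples.
PEOPLE_SOURCE = [
    ("http://example.org/people/alice", "foaf:name", "Alice Example"),
    ("http://example.org/people/alice", "foaf:based_near",
     "http://example.org/places/somewhere"),
]
PLACES_SOURCE = [
    ("http://example.org/places/somewhere", "geo:lat", "12.34"),
    ("http://example.org/places/somewhere", "geo:long", "56.78"),
]

def merge(*sources):
    """Index the triples of all sources by subject URI."""
    graph = defaultdict(list)
    for source in sources:
        for subject, predicate, obj in source:
            graph[subject].append((predicate, obj))
    return graph

def describe(graph, uri):
    """Return the properties of uri, expanding linked resources one hop."""
    result = {}
    for predicate, obj in graph.get(uri, []):
        # If the object is itself a described resource, pull in its properties.
        result[predicate] = dict(graph[obj]) if obj in graph else obj
    return result

merged = merge(PEOPLE_SOURCE, PLACES_SOURCE)
print(describe(merged, "http://example.org/people/alice"))
```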

I'm sure that the Ontolog database would be *heartily* welcomed as an
important foundation piece in Linked Data's growing roster of
datasets.  (The other advantage of early involvement is the volunteer
assistance of many individuals committed to this approach.)

The Linked Data project of the W3C is explained here further:
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

With a listing of some of the current Linked Data data sets shown here:
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets

RDF Browsing and Limits
-----------------------
We're starting to see a variety of faceted RDF browsers emerge and most
of these RDF stores are accessed via SPARQL query endpoints.  IMHO this
is currently one of the weak links in the chain.

While powerful frameworks exist, they are not *yet* simple enough IMHO
for most standard users.  SPARQL is not natural for most, standard
listings of RDF triples appear overwhelming, etc., etc.

However, that being said, we are really only talking about UI and
usability refinements, which are (and I believe will continue to be)
improving rapidly.  With the amount of RDF data growing into the
billions of triples (and growing exponentially), many are working hard
to overcome these ease-of-use limits.

One of the reasons I started my own Sweet Tools listing
( http://www.mkbergman.com/?page_id=325) was to document the rapid
evolution and development in this area.

Knowing Kingsley, I suspect he may weigh in with some thoughts, URLs and
demo pointers himself.

SKOS as the Framework
---------------------
For those of you not aware of it, I think the SKOS effort is designed
especially to address the consolidation interests of Ontolog:
http://www.w3.org/2004/02/skos/

(I would love to see Ken's great listing at
http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2007_06_21/SurveySummary
characterized by SKOS!)
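
As a small illustration of what "characterized by SKOS" could look like, here is a minimal Python sketch that writes a two-term vocabulary out as SKOS concepts in Turtle.  The terms, the ex: namespace and the function name are hypothetical; only skos:Concept, skos:prefLabel and skos:broader come from the SKOS vocabulary itself.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
EX_PREFIX = "@prefix ex: <http://example.org/vocab/> ."   # hypothetical namespace

# Hypothetical controlled vocabulary: term key -> label and optional broader term.
VOCAB = {
    "emergency": {"label": "Emergency", "broader": None},
    "flood": {"label": "Flood", "broader": "emergency"},
}

def to_skos_turtle(vocab):
    """Serialize the vocabulary as SKOS concepts in Turtle syntax."""
    lines = [SKOS_PREFIX, EX_PREFIX, ""]
    for key, entry in vocab.items():
        label = entry["label"]
        broader = entry["broader"]
        lines.append(f"ex:{key} a skos:Concept ;")
        if broader:
            lines.append(f'    skos:prefLabel "{label}"@en ;')
            lines.append(f"    skos:broader ex:{broader} .")
        else:
            lines.append(f'    skos:prefLabel "{label}"@en .')
    return "\n".join(lines)

print(to_skos_turtle(VOCAB))
```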

I would recommend in particular the various use cases and communities
presently moving this initiative forward:
http://www.w3.org/2006/07/SWD/wiki/UCRMaterial .

While I mention W3C and its efforts numerous times in this email, please
also know I do not think everything needs to be seen solely through the
W3C lens.  There are many communities in this space.

One of the real powers of Ontolog is its breadth of reach and
membership, including enterprise-scale perspectives.  If some of these
W3C efforts (and from elsewhere, as well!) can be brought to bear on the
*integration* across all communities of related interests, I think that
would be most powerful.  I think that is one of the unique roles that
the Ontolog Forum may be able to play.

Another reason I am interested is that almost all of the linked data
efforts I have seen to date populate their RDF via existing structure or
semi-structure.  The Ontolog data with information extraction (IE) (if I
understand correctly!) for some of its metadata (which is, after all,
itself structure) is an absolutely critical complement to current efforts.

At any rate, I think you are sitting on a gold mine of ontological and
semWeb related data.  If I can help (within my limits and reason!) in
this effort, I'd love to! :)

Thanks, Mike

_________________________________________________________________
Msg Archives: http://ontolog.cim3.net/forum/ontologizing/
Subscribe/Unsubscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontologizing/
Community Portal: http://ontolog.cim3.net/
Community Files: http://ontolog.cim3.net/file/work/OntologizingOntolog/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologizingOntolog

--

Paola Di Maio *****
School of Information Technology
Mae Fah Luang University
Chiang Rai - Thailand
*********************************************

