Mike,
thanks a lot for the extensive info on the Linked Data project.
It is important, IMHO, to start working with the W3C and parallel organizations;
that's the point of doing all this standards work, etc.
(RDF is good to start with. I, too, always try to read RDF files in my
browser and always end up seeing a bunch of tags, but the answer is
simple: develop a browser feature that strips RDF files of their tagging
elements and renders them as plain text. Piece of cake. I think we
should ask Mozilla to start developing it.)
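Here is a quick sketch of the "strip the tags, show the text" idea. It is a small Python script, not a browser feature: it reads RDF/XML with the standard library's xml.etree.ElementTree and prints one plain-text line per property. It only handles the simple flat pattern, where rdf:Description elements carry rdf:about and each property holds either a literal or an rdf:resource. Nested descriptions, rdf:parseType, rdf:nodeID and other RDF/XML features are not handled, and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"        # ElementTree's "{namespace}local" attribute form
RESOURCE = f"{{{RDF_NS}}}resource"


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified names."""
    return tag.rsplit("}", 1)[-1]


def rdf_to_text(rdf_xml: str) -> list:
    """Render flat RDF/XML as 'subject: property = value' lines.

    Handles only top-level resource elements carrying rdf:about whose
    children hold a text literal or an rdf:resource reference.
    """
    root = ET.fromstring(rdf_xml)
    lines = []
    for resource in root:
        subject = resource.get(ABOUT, "(blank node)")
        for prop in resource:
            value = prop.get(RESOURCE) or (prop.text or "").strip()
            lines.append(f"{subject}: {local_name(prop.tag)} = {value}")
    return lines


# Hypothetical sample document.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:title>Glossary draft</dc:title>
    <dc:creator rdf:resource="http://example.org/people/paola"/>
  </rdf:Description>
</rdf:RDF>"""

for line in rdf_to_text(sample):
    print(line)
```

A real browser feature would more likely sit on top of a full RDF parser, since the same triples can be written in RDF/XML in several different ways.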
I am slightly modifying the thread name (cc TBL, NICTA).
Since this group seems to have this effort well in hand, I am
hoping to model a similar process for a new community
that is being started, the Emergency Management Incubator Group,
where we aim to start working on a glossary, a controlled vocabulary and a
catalogue of standard formats: http://esw.w3.org/topic/DisasterManagement
(apologies for the name 'disaster management'; it is a temporary placeholder).
The long-term goal is to develop a complete ontology, taking it step by step.
So I ask this community's permission to cross-post some of the emails
on this thread to that group, including Denise's guidelines
from the World Bank and related mails that describe the body of
this effort, as background documentation and as a shared process for a
parallel effort that I think will benefit immensely from what is being
done here.
Hope that's okay; let me know otherwise.
Linking data sounds good.
Thanks
Paola Di Maio
On 6/23/07, Michael K. Bergman <mike@xxxxxxxxxxxxx> wrote:
<<Re-posted per Peter:>>
Hi Bob, Denise and Ken,
I very much enjoyed the discussion on Ontolog's Taxo-Thesaurus project yesterday (http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2007_06_21).
What has been done to date to index all of Ontolog's holdings into XML, with support from Denise at the World Bank and from Teragram to extract important metadata and entities, strikes me as pretty close to state of the art. I'm also impressed with the extraction and tagging process's throughput of 1 M PDFs per day. I very much look forward to the release of the best-practices guides from the World Bank.
Since I'm new to these efforts, I don't know who else should be copied on these notes; please feel free to distribute as you see fit. [now posted to full ontologizing list]
Also, since I'm presently working with OpenLink and their Virtuoso software, which is very relevant to possible next steps, I have taken the liberty of cc'ing their CEO, Kingsley Idehen, on these notes. (By way of disclosure, I have no financial or other relationship with OpenLink; I use them because they are, IMHO, best-of-breed in what they do.) I also copied Alistair Miles because I quote him below and he is a key figure in SKOS.
So, with your leave, I offer some of my quick first thoughts on where this project may go in the near future.
Indexing and Data Characterization
----------------------------------
I very much encourage another pass on the Ontolog data *if* any of the existing metadata has been missed. I also very much agree with the one-link-removed external crawl; that will bring in the good, related stuff.
I encourage you to look at Solr as a faceted full-text indexer. Collex (http://www.patacriticism.org/collex), a project by a humanities consortium managed by UVa, is a cool example of faceted browsing with Solr (see also my own write-up at http://www.mkbergman.com/?p=331).
OpenLink also has full-text indexing, but I have not yet used it and cannot speak to its performance or capabilities.
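To make "faceted" concrete, a facet is a count of field values over the documents that match a query. The user can then click a value to narrow the results. The sketch below computes those counts in plain Python over a few hypothetical metadata records. It illustrates the idea only and is not Solr's API; Solr does this server-side, for example through its facet.field request parameter.

```python
from collections import Counter

# Hypothetical metadata records of the kind an IE/tagging pass might produce.
RECORDS = [
    {"title": "Ontolog taxo-thesaurus call notes", "subject": ["thesaurus", "taxonomy"], "year": 2007},
    {"title": "SKOS and thesaurus interchange", "subject": ["SKOS", "thesaurus"], "year": 2007},
    {"title": "Upper ontology debate", "subject": ["ontology"], "year": 2006},
]


def search(records, term):
    """Naive substring match on the title (a real indexer tokenizes and ranks)."""
    term = term.lower()
    return [r for r in records if term in r["title"].lower()]


def facet_counts(records, field):
    """Count each value of `field` across the records; list fields count per item."""
    counts = Counter()
    for r in records:
        value = r[field]
        counts.update(value if isinstance(value, list) else [value])
    return counts.most_common()


hits = search(RECORDS, "thesaurus")
print(facet_counts(hits, "subject"))
print(facet_counts(hits, "year"))
```

Clicking a facet value such as "SKOS" would simply add a filter on that field and recompute the counts over the smaller result set.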
In any event, if Denise would be willing to send some sample XML from the Ontolog database (or, better still, anything related to its controlled vocabulary), I may be able to offer some better thoughts.
Why RDF?
--------
As I've argued elsewhere, I believe RDF to be the right "middle ground" of expressiveness, both for more formal ontologies (OWL, etc.) and as a means of representing "less formal" ontologies such as tags, microformats, etc. (see http://www.mkbergman.com/?p=374). I think Alistair Miles had a really good piece that came out yesterday (http://isegserv.itd.rl.ac.uk/blogs/alistair/archives/56); my guess is that he would support the idea of RDF and SKOS as well.
Actually, as an "exposed" form read by humans, I really don't care for RDF much; there are much easier ways to express data to humans. But as a canonical representation and storage form, available for manipulation by computer, it appears "just right." It also has a wealth of options for format conversion (next).
A commitment to RDF as the canonical data model need not imply massive RDF exposure to most users, nor "lock-in" to RDF. It appears that almost any data form or serialization lends itself nicely to RDF conversion.
RDF Conversion
--------------
As I mentioned on the call, RDF conversion brings a number of "free" benefits in terms of linking data (see next), interoperability and toolsets. RDF has really taken off in the last six months, with a bunch of conversion tools emerging.
There are, for example, many "RDFizers" for converting existing data formats into RDF:
http://esw.w3.org/topic/ConverterToRdf
http://simile.mit.edu/RDFizers/
The GRDDL effort is another means for data conversion:
http://www.w3.org/TR/grddl/
http://esw.w3.org/topic/GrddlResources?highlight=%28grddl%29
I suspect that Denise's XML would lend itself pretty readily to one of these conversion options.
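As a rough illustration of what such a conversion amounts to, the sketch below maps an XML record layout onto N-Triples using only Python's standard library. The real Ontolog/World Bank XML was not shared, so the element names, the example.org URI base and the field-to-Dublin-Core mapping are all assumptions. It is a hand-written mapping. It is neither GRDDL, which typically associates an XSLT transformation with the source document, nor any particular RDFizer.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the actual Ontolog/World Bank XML was not shared.
SAMPLE = """<records>
  <record id="msg-0001">
    <title>Re: Taxo-Thesaurus project</title>
    <author>Michael K. Bergman</author>
    <subject>SKOS</subject>
    <subject>Linked Data</subject>
  </record>
</records>"""

BASE = "http://example.org/ontolog/"          # hypothetical URI base
DC = "http://purl.org/dc/elements/1.1/"       # Dublin Core elements namespace
FIELD_MAP = {"title": "title", "author": "creator", "subject": "subject"}  # assumed mapping


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal; backslash is escaped first."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, esc)
    return f'"{text}"'


def xml_to_ntriples(xml_text: str) -> list:
    """Emit one N-Triples line per mapped child element of each <record>."""
    root = ET.fromstring(xml_text)
    triples = []
    for rec in root.iter("record"):
        subject = f"<{BASE}{rec.get('id')}>"
        for child in rec:
            if child.tag not in FIELD_MAP:
                continue  # unmapped elements are skipped rather than guessed at
            predicate = f"<{DC}{FIELD_MAP[child.tag]}>"
            triples.append(f"{subject} {predicate} {literal((child.text or '').strip())} .")
    return triples


print("\n".join(xml_to_ntriples(SAMPLE)))
```

Once the data is in N-Triples, any RDF store can load it. The mapping table is the only part that has to know anything about the source schema.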
Linked Data
-----------
The linked data efforts of the W3C are rapidly showing the benefits of "meshing" up multiple sources of RDF into more meaningful, "emergent" data objects (maps, timelines, structured data displays, browsing, etc.).
For example, via a FOAF profile, it is now possible to get much more related information on a given individual (assuming they have released it publicly!).
I think we will soon see geo coordinates, timeline relationships and other "contextual" stuff emerge, including subject matter such as Wikipedia/DBpedia, WordNet, the CIA World Factbook, PubMed, PTO, legal precedents, bibliographic and citation data, library holdings, etc.
The benefit of linked data, of course, is that each publisher need only contribute its own understanding of its own piece of the puzzle; the other puzzle pieces can be rapidly linked and displayed.
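To make the "puzzle pieces" point concrete: if two publishers use the same URI for the same thing, their triples can simply be pooled and followed across. The sketch treats each graph as a Python set of (subject, predicate, object) tuples, with the second graph playing the role of a FOAF profile. The URIs, the facts and the use of prefixed names as plain strings are hypothetical simplifications. A real store distinguishes URIs from literals and has to rename blank nodes when merging.

```python
# Hypothetical URIs and facts, abbreviated as prefixed names for readability.
ontolog = {
    ("ex:msg-0001", "dc:creator", "ex:mkbergman"),
    ("ex:msg-0001", "dc:subject", "SKOS"),
}
foaf_profile = {
    ("ex:mkbergman", "foaf:name", "Michael K. Bergman"),
    ("ex:mkbergman", "foaf:weblog", "http://www.mkbergman.com/"),
}


def merge(*graphs):
    """With no blank nodes involved, merging RDF graphs is set union of triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """All objects matching (subject, predicate, ?) in the graph, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


web = merge(ontolog, foaf_profile)
# Follow the link across publishers: who wrote msg-0001, and what is their name?
for author in objects(web, "ex:msg-0001", "dc:creator"):
    print(author, objects(web, author, "foaf:name"))
```

Neither publisher had to know about the other. The shared URI ex:mkbergman is the only thing that makes the join possible.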
I'm sure that the Ontolog database would be *heartily* welcomed as an important foundation piece in the growing roster of Linked Data datasets. (The other advantage of early involvement is the volunteer assistance of many individuals committed to this approach.)
The Linked Data project of the W3C is explained further here:
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
A listing of some of the current Linked Data datasets is shown here:
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
RDF Browsing and Limits
-----------------------
We're starting to see a variety of faceted RDF browsers emerge, and most of these RDF stores are accessed via SPARQL query endpoints. IMHO, this is currently one of the weak links in the chain.
While powerful frameworks exist, they are not *yet* simple enough for most standard users: SPARQL is not natural for most, standard listings of RDF triples appear overwhelming, etc.
That said, we are really only talking about UI and usability refinements, which are (and I believe will continue to be) improving rapidly. With the amount of RDF data growing into the billions of triples (and growing exponentially), many are working hard to overcome these ease-of-use limits.
One of the reasons I started my own Sweet Tools listing (http://www.mkbergman.com/?page_id=325) was to document the rapid evolution and development in this area.
Knowing Kingsley, I suspect he may weigh in with some thoughts, URLs and demo pointers himself.
SKOS as the Framework
---------------------
For those of you not aware of it, I think the SKOS effort is designed especially to address the consolidation interests of Ontolog:
http://www.w3.org/2004/02/skos/
(I would love to see Ken's great listing at http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2007_06_21/SurveySummary characterized in SKOS!)
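For a flavor of what a SKOS characterization looks like, the sketch below writes a few vocabulary terms out as skos:Concept resources in Turtle. It uses the real SKOS core namespace and properties: skos:prefLabel, skos:altLabel, skos:broader and skos:inScheme. The terms, slugs and example.org URIs are hypothetical placeholders, not taken from Ken's listing.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SKOS = "http://www.w3.org/2004/02/skos/core#"


@dataclass
class Concept:
    slug: str                      # hypothetical local name, e.g. "thesaurus"
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)
    broader: Optional[str] = None  # slug of the broader concept, if any


def quote(label: str) -> str:
    """Turtle string literal with backslashes and double quotes escaped, tagged @en."""
    return '"' + label.replace("\\", "\\\\").replace('"', '\\"') + '"@en'


def to_turtle(scheme_uri: str, base: str, concepts: List[Concept]) -> str:
    """Serialize the concepts as one skos:ConceptScheme in Turtle."""
    out = [f"@prefix skos: <{SKOS}> .", f"@prefix ex: <{base}> .", "",
           f"<{scheme_uri}> a skos:ConceptScheme ."]
    for c in concepts:
        props = ["a skos:Concept", f"skos:inScheme <{scheme_uri}>",
                 f"skos:prefLabel {quote(c.pref_label)}"]
        props += [f"skos:altLabel {quote(alt)}" for alt in c.alt_labels]
        if c.broader:
            props.append(f"skos:broader ex:{c.broader}")
        out += ["", f"ex:{c.slug} " + " ;\n    ".join(props) + " ."]
    return "\n".join(out)


# Hypothetical terms, loosely echoing the Taxo-Thesaurus discussion.
terms = [
    Concept("kos", "knowledge organization system", ["KOS"]),
    Concept("thesaurus", "thesaurus", broader="kos"),
    Concept("taxonomy", "taxonomy", broader="kos"),
]
print(to_turtle("http://example.org/vocab", "http://example.org/vocab#", terms))
```

Preferred and alternate labels carry the synonym handling a thesaurus needs. skos:broader carries the hierarchy, which is roughly the consolidation a taxo-thesaurus project is after.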
I would recommend in particular the various use cases and communities presently moving this initiative forward:
http://www.w3.org/2006/07/SWD/wiki/UCRMaterial
While I mention W3C and its efforts numerous times in this email, please also know I do not think everything needs to be seen solely through the W3C lens. There are many communities in this space.
One of the real powers of Ontolog is its breadth of reach and membership, including enterprise-scale perspectives. If some of these W3C efforts (and from elsewhere, as well!) can be brought to bear on the
*integration* across all communities of related interests, I think that would be most powerful. I think that is one of the unique roles that the Ontolog Forum may be able to play.
Another reason I am interested is that almost all of the linked data efforts I have seen to date populate their RDF from existing structure or semi-structure. The Ontolog data, which (if I understand correctly!) uses information extraction (IE) for some of its metadata (which is, after all, itself structure), is an absolutely critical complement to current efforts.
At any rate, I think you are sitting on a gold mine of ontological and semWeb related data. If I can help (within my limits and reason!) in
this effort, I'd love to! :)
Thanks, Mike
_________________________________________________________________
Msg Archives: http://ontolog.cim3.net/forum/ontologizing/
Subscribe/Unsubscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontologizing/
Community Portal: http://ontolog.cim3.net/
Community Files: http://ontolog.cim3.net/file/work/OntologizingOntolog/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologizingOntolog
--
Paola Di Maio ***** School of Information Technology Mae Fah Luang University Chiang Rai - Thailand
*********************************************