
[ontolog-forum] Fw: Semantic Enterprise Architecture

To: <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "sean barker" <sean.barker@xxxxxxxxxxxxx>
Date: Thu, 2 Sep 2010 19:59:44 +0100
Message-id: <028D8D8D0C834480A07F53F8CFBD81A9@SMB>


With regard to Wikipedia, one might also point to DBpedia
(http://dbpedia.org/About), in which the infoboxes from Wikipedia are
translated to RDF and can then be queried via a SPARQL endpoint. This
allows one to, say, find people born in a particular place in a
particular year, rather as if it were a database. However, such a search
will not find many ontologists or semantic webbers, either because
their entries do not contain an infobox (John Sowa) or because the
information does not make it into DBpedia for some reason (Pat Hayes,
Tim Berners-Lee), e.g. because the infobox tags do not match ones known
to the DBpedia ontology. The SPARQL endpoint on the site works from
Firefox, but not from IE.
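
As a rough illustration of the kind of query described above, here is a
minimal Python sketch that builds a SPARQL query for people born in a given
place in a given year, together with the URL one would send to an endpoint.
The endpoint URL, the dbo:birthPlace and dbo:birthDate property names, and
the helper names are assumptions made for the example; the sketch only
builds strings and makes no network call.

```python
from urllib.parse import urlencode

# Assumed public endpoint URL; adjust for whichever SPARQL service you use.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_birth_query(place: str, year: int, limit: int = 20) -> str:
    """Return a SPARQL query for people born in `place` during `year`.

    `place` is used as a resource local name (e.g. "Bristol"); names with
    spaces or punctuation would need escaping, which this sketch omits.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?person ?birth WHERE {{
  ?person dbo:birthPlace dbr:{place} ;
          dbo:birthDate ?birth .
  FILTER (?birth >= "{year}-01-01"^^xsd:date &&
          ?birth < "{year + 1}-01-01"^^xsd:date)
}} LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_request_url(build_birth_query("Bristol", 1955)))
```

A person whose article has no infobox, or whose birth data was not mapped
to these properties, simply produces no row, which is the gap noted above.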


Sean Barker, Bristol, UK
> -----Original Message-----
> From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx
> [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of John F.
> Sowa
> Sent: 02 September 2010 13:55
> To: ontolog-forum@xxxxxxxxxxxxxxxx
> Subject: Re: [ontolog-forum] Semantic Enterprise Architecture
>
> Rick,
>
> > Note that step three prescribes the use of standards, then
> > identifies two standards (RDF*, SPARQL).
>
> I agree that those are de facto standards for many applications, but you
> have to ask a question about how Linked Open Data will be used in the
> future.  Will RDF and SPARQL survive in their present forms?  What is
> the growth path for the future?  What technology has proved to be
> successful in the past?  What made it successful?
>
> Since the 1960s, it has been obvious that you cannot process large
> volumes of data without indexes that provide logarithmic-time access to
> the data.  SPARQL is designed for polynomial-time searches that do not
> scale to even small-scale enterprises.  Its "sweet spot" is for short
> searches that can be performed in a browser.
>
> Note what Franz, Oracle, and other vendors do:  they develop tools such
> as AllegroGraph or extensions to Oracle that suck the triples out of the
> web pages and index them.  Then they translate SPARQL to an optimized
> internal form for high-speed processing over indexed data.
>
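
The difference between scanning triples and looking them up through an
index can be sketched in a few lines of Python. This is only an
illustration of the general idea of logarithmic-time access via a sorted
index; it is not a description of how AllegroGraph, Oracle, or any other
product works internally, and the names (TRIPLES, scan, PosIndex) and the
sample data are invented for the example.

```python
from bisect import bisect_left

# Hypothetical sample triples: (subject, predicate, object).
TRIPLES = [
    ("ex:alice", "ex:bornIn", "ex:Bristol"),
    ("ex:bob", "ex:bornIn", "ex:London"),
    ("ex:carol", "ex:bornIn", "ex:Bristol"),
    ("ex:alice", "ex:knows", "ex:bob"),
]


def scan(triples, predicate, obj):
    """Unindexed lookup: examine every triple (linear time)."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


class PosIndex:
    """Sorted (predicate, object, subject) index searched with bisect."""

    def __init__(self, triples):
        self._keys = sorted((p, o, s) for s, p, o in triples)

    def subjects(self, predicate, obj):
        """Binary-search to the first match, then read the matching run."""
        prefix = (predicate, obj)
        # A 2-tuple sorts before any 3-tuple that starts with it, so this
        # lands on the first (predicate, obj, *) entry if one exists.
        i = bisect_left(self._keys, prefix)
        found = []
        while i < len(self._keys) and self._keys[i][:2] == prefix:
            found.append(self._keys[i][2])
            i += 1
        return found


if __name__ == "__main__":
    index = PosIndex(TRIPLES)
    print(index.subjects("ex:bornIn", "ex:Bristol"))
    print(scan(TRIPLES, "ex:bornIn", "ex:Bristol"))
```

Both calls return the same subjects; the indexed version does a binary
search plus a walk over the matches instead of touching every triple. A
real store would keep several such orderings and persist them on disk.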
> Another kind of data that is not yet in LOD form is the tables in RDBs.
> Some people have proposed that those highly optimized search and
> retrieval engines be hamstrung by extracting the data from tables and
> mapping them to triples so that they can be queried by SPARQL.
> Anybody who proposes that is a prime candidate for the loony bin.
>
> Oracle has a better idea.  They have high-speed RDBs and high-speed
> triple stores.  They let anyone access data from either source by any
> query form.  They can execute SQL queries against triple stores or
> SPARQL queries against RDBs.  In either case, they optimize the queries
> for the data structures.  The users never need to know which kind of
> data is being accessed or how the data is organized.
>
> That is an idea that Franz, Oracle, and Google understand very well.
> Google became the biggest web company on earth because they developed
> better indexing, searching, and retrieval methods -- and they do *not*
> use RDF or OWL for their processing.  (They accept it when they find it
> in web pages, but that's also true of every notation under the sun.)
>
> Unfortunately, the Google indexes and metadata are not open.
> For LOD, the O means that some open method is necessary to integrate
> high-speed servers with open, standard message formats that can be
> accessed by both browsers and servers.
>
> As an example of where LOD is heading, I'd like to mention a couple of
> open-source projects, which are harbingers of things to come.
> Following is an excerpt from a note I sent to another forum.
> Some people complained that those systems are *not* typical LOD.
> I agree that they are not what LOD advocates are talking about today,
> but they are early versions of the many kinds of systems that will be
> designed to process LOD.
>
> The fact that those systems do *not* use SemWeb software is a wake-up
> call that the current SemWeb software and the current strategy for LOD
> are out of touch with future requirements.
>
> John
> ____________________________________________________________________
>
> The first is The Wikipedia Miner, which illustrates some interesting and
> powerful methods for processing Linked Open Data, but it doesn't use any
> of the tools designed for the Semantic Web:
>
>    http://wikipedia-miner.sourceforge.net/
>
>> Wikipedia Miner is a toolkit for navigating and making use of the
>> structure and content of Wikipedia. It aims to make it easy for you to
>> integrate Wikipedia's knowledge into your own applications, by:
>>
>> * providing simplified, object-oriented access to Wikipedia's
>>   structure and content.
>> * measuring how terms and concepts in Wikipedia are connected
>>   to each other.
>> * detecting and disambiguating Wikipedia topics when they are
>>   mentioned in documents.
>
> The Wikipedia was designed to be navigated by a reader who uses an
> ordinary browser to follow a few links to related pages.
> For that kind of application, links buried in the text are useful.
> But for more complex applications, very few readers would want to wait
> for the browser on their laptop to wade through SPARQL queries that
> navigate through many megabytes or gigabytes of markup.
>
> For more detail about how Wikipedia Miner works, see
>
> http://www.cs.waikato.ac.nz/~dnk2/publications/AnOpenSourceToolkitForMiningWikipedia.pdf
>
> The author notes that the semantic features and URLs in Wikipedia "are
> buried under 20 GB of cryptic markup."  Just "the link-graph summary
> alone is still almost 1 GB. Instead the toolkit communicates with a
> MySQL database, so that the data can be indexed persistently and
> accessed immediately, without waiting for anything to load."
>
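
To make the "put the link graph in an indexed relational table" idea
concrete, here is a small Python sketch. It uses the standard-library
sqlite3 module as a stand-in for MySQL, and the table name, column names,
helper names, and sample links are all invented for the example; it does
not reproduce the actual schema of Wikipedia Miner or any other toolkit.

```python
import sqlite3

# Hypothetical (source page, target page) link pairs.
LINKS = [
    ("Ontology", "Semantic_Web"),
    ("Semantic_Web", "RDF"),
    ("Semantic_Web", "SPARQL"),
    ("SPARQL", "RDF"),
]


def build_link_db(links):
    """Load link pairs into an in-memory table indexed in both directions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_link (source TEXT NOT NULL, target TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO page_link VALUES (?, ?)", links)
    # One index per lookup direction, so neither query scans the table.
    conn.execute("CREATE INDEX idx_link_source ON page_link (source)")
    conn.execute("CREATE INDEX idx_link_target ON page_link (target)")
    conn.commit()
    return conn


def outgoing(conn, title):
    """Pages that `title` links to."""
    rows = conn.execute(
        "SELECT target FROM page_link WHERE source = ? ORDER BY target",
        (title,),
    )
    return [row[0] for row in rows]


def incoming(conn, title):
    """Pages that link to `title`."""
    rows = conn.execute(
        "SELECT source FROM page_link WHERE target = ? ORDER BY source",
        (title,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_link_db(LINKS)
    print(outgoing(db, "Semantic_Web"))
    print(incoming(db, "RDF"))
```

With a file-backed database instead of ":memory:", the table and its
indexes persist between runs, so nothing has to be reloaded or re-parsed
before the first query.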
> Wikipedia Miner uses Perl (a language developed before the WWW) to
> gather the URLs and markup and put them in the more efficiently indexed
> tables of a relational DB.  But there are other systems that do even
> more sophisticated language processing. For example,
>
>    http://www.ukp.tu-darmstadt.de/software/jwpl
>    JWPL (Java Wikipedia Library)
>
> This is "a free, Java-based application programming interface that
> allows access to all information contained in Wikipedia."  Following is
> an article that provides more detail:
>
> http://elara.tk.informatik.tu-darmstadt.de/Publications/2008/lrec08_camera_ready.pdf
>
> Note that JWPL also uses MySQL to index and store all the semantic
> information and URLs extracted from Wikipedia.
>
> In summary, these systems suggest future directions for LOD:
>
>  1. Tools like RDF and SPARQL, which can be processed by a browser,
>     are useful for lightweight, local navigation in a few web pages.
>
>  2. But for complex processing, it is essential to extract the
>     information from the source documents and organize it in
>     a high-speed indexed database (which could be an RDB or
>     something like AllegroGraph).
>
>  3. Future developments in LOD will follow the examples of JWPL
>     and Wikipedia Miner:  extract the markup and links from the
>     web pages and index them in databases.
>
>  4. There is no reason why semantic annotations must be stored
>     with the documents.  In fact, storing annotations in separate
>     files or databases is more flexible, because it can support
>     multiple ways of viewing and interpreting the same document
>     for different purposes.
>
> As an example, note the many ways of interpreting the Bible or the US
> Constitution.  For important documents, the semantic annotations can
> grow much faster and become orders of magnitude more voluminous than the
> source texts.  For many documents, the different interpretations may be
> proprietary or conflicting.
> Just imagine the different interpretations of the Bible by different
> religions or the Constitution by different judges and political parties.
>
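
Point 4 above, annotations kept apart from the documents they describe,
can be sketched as a small "stand-off" annotation store in Python. The
class and function names, the sample text, and the annotator labels are
all hypothetical; the only point is that the source text is never
modified and each annotator's view can be retrieved on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    doc_id: str   # which document the annotation refers to
    start: int    # character offset where the annotated span begins
    end: int      # character offset just past the end of the span
    source: str   # who supplied this interpretation
    label: str    # the interpretation itself


class AnnotationStore:
    """Keeps annotations outside the documents, grouped by document id."""

    def __init__(self):
        self._by_doc = {}

    def add(self, annotation):
        self._by_doc.setdefault(annotation.doc_id, []).append(annotation)

    def view(self, doc_id, source):
        """Return only the annotations one source made on one document."""
        return [a for a in self._by_doc.get(doc_id, []) if a.source == source]


def annotate(doc_id, text, phrase, source, label):
    """Build an annotation for the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return Annotation(doc_id, start, start + len(phrase), source, label)


if __name__ == "__main__":
    text = "We the People of the United States"
    store = AnnotationStore()
    store.add(annotate("preamble", text, "We the People", "reader_a", "reading A"))
    store.add(annotate("preamble", text, "We the People", "reader_b", "reading B"))
    for ann in store.view("preamble", "reader_a"):
        print(text[ann.start:ann.end], "->", ann.label)
```

Two sources can annotate the same span with conflicting readings, and
either set can be kept private, because neither is written into the text.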
> Summary:  For complex queries and high-performance processing, indexed
> databases are essential.  At present, conventional SQL databases run
> circles around SPARQL.  For the future, many novel kinds of databases
> could be designed.  But indexing is required, and annotations stored
> inside the documents should be optional.


