[ontolog-forum] Contd: End of the line for triple stores

To: sowa@xxxxxxxxxxx, "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
Cc: Mills Davis <mdavis@xxxxxxxxxxxxxx>
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Thu, 23 Jul 2009 13:27:43 +0100
Message-id: <4A68573F.8090108@xxxxxxxxxxxxxx>

John F. Sowa wrote:
> Mills,
>
> Thanks for the note. I have sent notes to this forum and others
> saying that Google does not use RDF and triple stores.  This is
> the reason why:  they found a better way.
>
> There is a fundamental principle that Don Knuth and others
> emphasized over 40 years ago:
>
>    Premature optimization is the root of all evil.
>
> The strategy of using triples for RDF and OWL was based on
> the mistaken idea that a triple store is more efficient than
> a relational database or other notations. 
The assumption is that triple stores are better at handling "Open 
World" data access and integration scenarios, where an infinite number of 
claims can be made about anything.
>  But there are
> many publications on the WWW that show other representations
> that are more efficient. 
You mean data models? I don't believe representation formats like 
RDF/XML are the issue here.

We are still talking about database technology and data models, be they 
relational, entity-attribute-value style graph models, etc.
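
To make the entity-attribute-value idea concrete, here is a minimal sketch of an in-memory store of (entity, attribute, value) triples. It is only an illustration of the data model, not how Virtuoso or any other product works; the `TripleStore` class and the `ex:` names are hypothetical.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, value)


class TripleStore:
    """Toy in-memory entity-attribute-value store (illustrative only)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, entity: str, attribute: str, value: str) -> None:
        # Any attribute may be asserted about any entity; there is no
        # fixed schema that has to be altered first.
        self._triples.add((entity, attribute, value))

    def match(
        self,
        entity: Optional[str] = None,
        attribute: Optional[str] = None,
        value: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield stored triples matching the pattern; None is a wildcard."""
        pattern = (entity, attribute, value)
        for triple in sorted(self._triples):
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                yield triple


store = TripleStore()
store.add("ex:Virtuoso", "rdf:type", "ex:DBMS")
store.add("ex:Virtuoso", "ex:vendor", "ex:OpenLink")
store.add("ex:Virtuoso", "ex:supports", "SPARQL")  # new kind of claim, no schema change

for triple in store.match(entity="ex:Virtuoso"):
    print(triple)
```

In a relational design the third claim would typically need a new column or table; here it is just one more row, which is the property the "Open World" argument leans on.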

The problem is that there is so much darn reinvention in the RDF realm 
that too many basics get lost as terminology is reinvented.


>  But people didn't believe them.
> Perhaps articles like this will convince them.
>
> MD>This could be significant:
>   
> http://www.computerworld.com/s/article/9135726/Yale_researchers_create_database_Hadoop_hybrid?taxonomyId=9
>   
> Yes, indeed.  And that article points to others that discuss
> the issue further:
>
> http://www.computerworld.com/s/article/9121278/Google_claims_MapReduce_sets_data_sorting_record_topping_Yahoo_conventional_databases
>

What is a conventional database?

All database management systems manage data based on a particular data 
model [1].
> This article points out that the same technology was used to
> sort one terabyte of data in 68 seconds.
>

How many machines were required? That's the important question, since 
ultimately claims like this fall apart once scalability is analyzed 
per cluster node.
"Google used 1,000 servers running MapReduce in parallel to sort the 
data, versus 910 for Yahoo, according to Czajkowski."

Clustering is about pooling resources and parallelization; the products 
that handle both best, measured by cost per cluster node, will win 
out.
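
As a rough illustration of the per-node view, the figures quoted above (1 TB, 68 seconds, 1,000 servers) can be turned into aggregate and per-server throughput. This is only a back-of-the-envelope sketch: it assumes 1 TB = 10^12 bytes and that the work was spread evenly across servers, neither of which is stated in the article, and the function name is illustrative.

```python
# Back-of-the-envelope throughput for the sort quoted above.
# Assumptions (not from the article): 1 TB = 10**12 bytes, and the
# work was spread evenly across all servers.

TERABYTE = 10**12  # bytes, decimal definition (assumption)


def throughput(total_bytes, seconds, servers):
    """Return (aggregate, per-server) throughput in MB/s."""
    aggregate_mb_s = total_bytes / seconds / 10**6
    return aggregate_mb_s, aggregate_mb_s / servers


aggregate, per_server = throughput(TERABYTE, 68, 1000)
print(f"aggregate:  {aggregate:,.0f} MB/s")   # aggregate:  14,706 MB/s
print(f"per server: {per_server:.1f} MB/s")   # per server: 14.7 MB/s
```

Under these assumptions each server sustains roughly 14.7 MB/s, which is why the node count matters as much as the headline time.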

I can tell you categorically that 68 seconds would be in no way exciting 
if we had a 1,000-blade cluster for Virtuoso. The key to all of this lies 
in the DBMS (which doesn't mean RDBMS) technology realm.
> And this example is just today's latest and greatest.  There is
> always somebody with a bright idea just around the corner who
> will find an even better algorithm.  The fundamental principle
> is that you should never distort your logical representation to
> fit a specific physical representation.  That was the basis for
> the ANSI/SPARC three-schema approach to databases in 1978, and
> it's just as sound today as it was then.
>   
In this day and age, if anyone wants to make claims about DBMS technology 
and Web Scale, I would prefer they simply put an instance of the 
DBMS online and let the world have a play.

We would like to see SPARQL endpoints and Data Explorers working with 
Linked Data Space caches.

That's what we do re:

1. http://lod.openlinksw.com -- more than 5 billion triples (and counting)
2. http://dbpedia.org/fct -- a much smaller 2.5 billion triple corpus, 
hosted on Virtuoso
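
For anyone who wants to "have a play" with a public SPARQL endpoint, a query can be sent over plain HTTP using the SPARQL Protocol. The sketch below uses only the Python standard library. The endpoint address `http://dbpedia.org/sparql` is an assumption for illustration (the message names the sites above but not their endpoint URLs), and `build_request` and `run_query` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint address, for illustration only; it is not given in
# the message above.
ENDPOINT = "http://dbpedia.org/sparql"

# A deliberately small query: fetch any five triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query, timeout=30.0):
    """Send the query and decode the JSON result set (needs network access)."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network; printing the URL
# shows exactly what would be sent.
request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

Calling `run_query(ENDPOINT, QUERY)` would actually execute the query; in the standard SPARQL JSON results format the rows are found under `["results"]["bindings"]`.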

Links:

1. http://www.unixspace.com/context/databases.html


-- 

Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com




