John F. Sowa wrote:
> Thanks for the note. I have sent notes to this forum and others
> saying that Google does not use RDF and triple stores. This is
> the reason why: they found a better way.
> There is a fundamental principle that Don Knuth and others
> emphasized over 40 years ago:
> Premature optimization is the root of all evil.
> The strategy of using triples for RDF and OWL was based on
> the mistaken idea that a triple store is more efficient than
> a relational database or other notations.
The assumption is that triple stores are better at handling "Open
World" data access and integration scenarios, where an unbounded number
of claims can be made about anything.
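A minimal sketch of that assumption, not any particular product's implementation: a triple store modelled as a set of (subject, predicate, object) tuples, where a new claim about any subject is added without a schema change. The `TripleStore` class and the `ex:` identifiers are hypothetical, made up for illustration.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) claims."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # Any claim about any subject is accepted; there is no fixed schema.
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        # None acts as a wildcard for that position.
        return {
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }


store = TripleStore()
store.add("ex:Virtuoso", "ex:type", "ex:DBMS")
# A later, unanticipated claim needs no schema migration.
store.add("ex:Virtuoso", "ex:vendor", "ex:OpenLink")
claims = store.match(s="ex:Virtuoso")
```

In a relational design the second claim would normally require a new column or table; here it is just one more row in the same three-column shape.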
> But there are
> many publications on the WWW that show other representations
> that are more efficient.
You mean data models? I don't believe representation formats like
RDF/XML are the issue here.
We are still talking about database technology and data models, be they
relational, entity-attribute-value style graph models, etc.
The problem is that there is so much darn reinvention in the RDF realm
that too many basics get lost in the renaming of terms.
> But people didn't believe them.
> Perhaps articles like this will convince them.
> MD>This could be significant:
> Yes, indeed. And that article points to others that discuss
> the issue further:
What is a conventional database?
All database management systems manage data based on a particular data
model.
> This article points out that the same technology was used to
> sort one terabyte of data in 68 seconds.
How many machines were required? That's the important question, since
ultimately the way this will fall apart is when we analyze scalability
relative to cluster node count.
"Google used 1,000 servers running MapReduce in parallel to sort the
data, versus 910 for Yahoo, according to Czajkowski."
Clustering is about pooling resources and parallelization; the products
that handle both best, on a cost-per-cluster-node basis, will win.
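To put the quoted figures on a per-node footing, here is a rough back-of-the-envelope sketch. It assumes a decimal terabyte (10**12 bytes) and a perfectly even split of work across nodes; the function name is made up for illustration.

```python
def per_node_throughput_mb_s(total_bytes: float, seconds: float, nodes: int) -> float:
    """Average sort throughput per cluster node, in MB/s (1 MB = 10**6 bytes)."""
    return total_bytes / seconds / nodes / 1e6


ONE_TB = 1e12  # assumption: decimal terabyte

# Figures quoted above: 1 TB sorted in 68 seconds on 1,000 servers.
google = per_node_throughput_mb_s(ONE_TB, seconds=68, nodes=1000)
print(f"{google:.1f} MB/s per node")  # prints "14.7 MB/s per node"
```

Dividing a per-node cost by this number would give the cost-per-throughput figure that matters when comparing clustered products.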
I can tell you categorically that 68 seconds is in no way exciting if
we had a 1,000 blade cluster running Virtuoso. The key to all of this
lies in the DBMS (which doesn't mean RDBMS) technology realm.
> And this example is just today's latest and greatest. There is
> always somebody with a bright idea just around the corner who
> will find an even better algorithm. The fundamental principle
> is that you should never distort your logical representation to
> fit a specific physical representation. That was the basis for
> the ANSI/SPARC three-schema approach to databases in 1978, and
> it's just as sound today as it was then.
In this day and age, if anyone wants to make claims about DBMS technology
and Web Scale, I would prefer they simply put an instance of the
DBMS online and let the world have a play.
We would like to see SPARQL endpoints and Data Explorers working with
Linked Data Space caches.
That's what we do with:
1. http://lod.openlinksw.com -- more than 5 Billion Triples (and counting)
2. http://dbpedia.org/fct -- much smaller 2.5 Billion Triples corpus
hosting Virtuoso
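For anyone wanting to have a play programmatically, a SPARQL Protocol query is just an HTTP GET with the query text in a `query` parameter. The sketch below only builds the request URL and does not contact any server. The `/sparql` path on the host above is an assumption for illustration, not something stated in this post, and the helper name is made up.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL; the query travels in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Assumption: the /sparql path is illustrative, not taken from the post.
ENDPOINT = "http://lod.openlinksw.com/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

url = sparql_get_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The resulting URL can be pasted into a browser or fetched with any HTTP client; the LIMIT keeps the result small regardless of corpus size.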
1. http://www.unixspace.com/context/databases.html
Kingsley Idehen
President & CEO, OpenLink Software
Weblog: http://www.openlinksw.com/blog/~kidehen
Web: http://www.openlinksw.com