
Re: [ontology-summit] Ontology Summit 2014 Hackathon - Optimized SPARQL

To: ontology-summit@xxxxxxxxxxxxxxxx
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Mon, 31 Mar 2014 20:25:06 -0400
Message-id: <533A0762.5050900@xxxxxxxxxxxxxx>
On 3/30/14 5:10 PM, Victor Chernov wrote:
Hi Colleagues,

Here is the status report from Hackathon #3 (Optimized SPARQL performance management via native API).

STATUS as of 8pm CET 30 March

We have installed all necessary software on the machines of 3 experiment participants. The software includes Virtuoso Universal Server (trial version), Stardog (free eval), NitrosBase RDF Storage, and a C# application to run the test queries.

The C# application was developed in advance, so we just needed to explain to the participants the main ideas, the program structure, and the ways the program communicates with each RDF tool.

Each participant generated a 25-million-triple N3 file using a utility from the SP2Bench site, then loaded that file into each database.

Every participant ran the application against all 3 databases.

As the process of testing takes a considerable amount of time, we took a break.

At the moment we are fixing two problems: a) the Virtuoso license accidentally expired on one of our computers, and b) the test query set ran incorrectly on another computer. Then we'll run the test application again and hopefully can start analyzing the results.



I've been reading the material and general narrative that you are pushing with regard to performance comparisons against Virtuoso [1].

For starters, you do know that we have two current versions of Virtuoso, i.e., 6.x and 7.x, where 7.x is based on column-wise storage, key compression, vectorized query execution, etc., right? In addition, if you are going to tout a benchmark, one which you will not win against Virtuoso, why have you gone specifically for SP2?

If you are going to attempt to make a showcase of your product against Virtuoso, at the very least undertake the adventure in an objective fashion.

At this juncture, are you claiming to be the fastest DBMS for a 25 million triples RDF dataset, running the SP2 benchmark where you have Virtuoso running on Linux and your DBMS running on Windows? Without even getting into the numbers, what has stopped you from running Virtuoso 6.x and 7.x on Windows as part of this effort?

To make matters worse, you actually make this audacious claim in your collateral, based on this most subjective benchmark:

"... Benchmark shows that NitrosBase RDF Storage is tens or hundreds times faster than Openlink Virtuoso on most queries. In the worst case (query 3b), it is at least 8 times faster. In the best case (query 11) it is 300 000 times faster."

Here's how benchmarks are run, professionally:

1. You publish the DBMS configuration, i.e., don't take the default settings of one DBMS and compare them against the optimized settings of another, across totally different operating systems. FWIW, Virtuoso is actually faster on Windows than it is on Linux. Open Source might be cool, but Linux isn't Solaris, AIX, or HP-UX.

2. Take the time to find out if your competitor has published optimization guides [2] and benchmark results too [3][4][5].
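
To make point 1 concrete, here is a minimal Python sketch of a harness that stores the environment and the DBMS settings next to every timing, so a reader can see exactly what was compared. Everything named in it is an assumption for illustration only: the function names, the "ExampleDB" product, and the "example_buffer_setting" key are hypothetical and do not come from any real DBMS or from the report in [1].

```python
import json
import platform
import statistics
import time
from typing import Callable, Dict


def describe_environment(dbms_name: str, dbms_version: str,
                         dbms_settings: Dict[str, str]) -> Dict[str, object]:
    """Collect the facts a reader needs to reproduce a run."""
    return {
        "dbms": dbms_name,
        "dbms_version": dbms_version,
        # Settings are copied verbatim so they can be published as-is.
        "dbms_settings": dict(dbms_settings),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }


def time_query(run_query: Callable[[], object],
               warmups: int = 1, runs: int = 5) -> Dict[str, float]:
    """Time a query callable: untimed warm-up runs first, then timed runs."""
    for _ in range(warmups):
        run_query()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def build_report(environment: Dict[str, object],
                 timings: Dict[str, Dict[str, float]]) -> str:
    """Serialize environment and timings together as one JSON document."""
    return json.dumps({"environment": environment, "timings": timings},
                      indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical product and setting names; replace with the real ones.
    env = describe_environment("ExampleDB", "1.0",
                               {"example_buffer_setting": "1000"})
    # Stand-in workload; a real harness would send a SPARQL query here.
    timings = {"q1": time_query(lambda: sum(range(10000)))}
    print(build_report(env, timings))
```

Running the same harness for every DBMS, on the same operating system and hardware, and publishing the resulting JSON alongside the headline numbers is one way to keep a comparison like this checkable.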

We are always game for a DBMS performance shootout, but we do prefer to have these conducted in the right manner, based on existing industry best practices.

If you find my tone harsh, well, I just can't let innuendo of this kind fly about the place. Even worse, it can't become the thrust of a hackathon that would actually be much better served demonstrating the utility inherent in being able to create, share, and remix Linked Open Data etc., using a variety of tools.

The big issue right now isn't which DBMS is fastest; it is data-de-silo-fication, or data desiloization, i.e., the separation of data, databases, and database management systems. This is where ontologies provide immense help, especially bearing in mind the existence of a massive Linked Open Data cloud that's available to any tool that can de-reference an HTTP URI and make sense of the entity relations it unveils.


[1] http://nitrosbase.com/wp-content/uploads/2014/02/NitrosBase_sp2bench_Eng_V17.pdf -- Your benchmark report
[2] http://bit.ly/1bGhpSK -- Blog post about Virtuoso and the TPC-H benchmark, which includes configuration information
[3] http://bit.ly/ZOCmaD -- Virtuoso Star Schema Benchmark Results (here we demonstrate SQL and SPARQL performance)
[4] http://bit.ly/12rpwio -- Berlin SPARQL Benchmark Report at scales up to 150 Billion Triples
[5] http://bit.ly/Yf5etP -- Berlin SPARQL Benchmark covering Virtuoso 6.x and Virtuoso 7.x amongst others.



Kingsley Idehen	      
Founder & CEO 
OpenLink Software     
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
