On 3/30/14 5:10 PM, Victor Chernov wrote:
Hi Colleagues,
Here is the status report from Hackathon #3 (Optimized SPARQL
performance management via native API).
STATUS as at 8pm CET 30 March
We have installed all the necessary software on the machines of 3
experiment participants. The software includes Virtuoso
Universal Server (trial version), Stardog (free eval),
NitrosBase RDF Storage, and a C# application to run the test queries.
The C# application was developed in advance, so we only needed
to explain to the participants the main ideas, the program
structure, and the ways the program communicates with each RDF tool.
Each participant generated a 25-million-triple N3 file using a
utility from the SP2Bench site, then loaded that file into each
database.
Every participant ran the application against all 3 databases.
As testing takes a considerable amount of time, we took a
break.
At the moment we are fixing two problems: a) the Virtuoso license
accidentally expired on one of our computers; b) the test query
set ran incorrectly on another computer. Then we'll run the test
application again and hopefully start analyzing the results.
Regards,
Victor
Victor,
I've been reading the material and general narrative that you are
pushing with regard to performance comparisons against Virtuoso [1].
For starters, you do know that we have two current versions of
Virtuoso, i.e., 6.x and 7.x, where 7.x is based on column-wise
storage, key compression, vectorized query execution, etc., right?
In addition, if you are going to tout a benchmark, one which you
will not win against Virtuoso, why have you gone specifically for
SP2?
If you are going to attempt to make a showcase of your product
against Virtuoso, at the very least undertake the adventure in an
objective fashion.
At this juncture, are you claiming to be the fastest DBMS for a
25-million-triple RDF dataset, running the SP2 benchmark, when you
have Virtuoso running on Linux and your DBMS running on Windows?
Without even getting into the numbers, what stopped you from running
Virtuoso 6.x and 7.x on Windows as part of this effort?
To make matters worse, you actually make this audacious claim in your
collateral, based on this most subjective benchmark:
"...Benchmark shows that NitrosBase RDF Storage is tens or hundreds
times faster than Openlink Virtuoso on most queries. In the worst
case (query 3b), it is at least 8 times faster. In the best case
(query 11) it is 300 000 times faster."
Here's how benchmarks are run, professionally:
1. You publish the DBMS configuration, i.e., don't take the default
settings of one DBMS and compare them against the optimized settings
of another, across totally different operating systems. FWIW, Virtuoso
is actually faster on Windows than it is on Linux. Open Source might
be cool, but Linux isn't Solaris, AIX, or HP-UX.
2. Take the time to find out if your competitor has published
optimization guides [2] and benchmark results too [3][4][5].
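A minimal Python sketch of point 1, assuming a hypothetical `run_query` callable that wraps whichever client API the store under test exposes, and a hypothetical `dbms_config` dict holding the settings actually used. This is not OpenLink's or NitrosBase's tooling. It runs each query once or more to warm caches, takes the median of several timed runs, and returns the timings bundled with the OS string and the DBMS configuration, so the numbers are never published without the setup that produced them.

```python
import platform
import statistics
import time
from typing import Callable, Dict, List


def benchmark(
    run_query: Callable[[str], object],
    queries: Dict[str, str],
    dbms_config: Dict[str, str],
    warmup: int = 1,
    runs: int = 5,
) -> Dict[str, object]:
    """Time each query and return the results together with the environment.

    run_query and dbms_config are hypothetical placeholders: plug in the
    client call for the DBMS under test and the settings it was run with.
    """
    timings: Dict[str, float] = {}
    for name, text in queries.items():
        # Warm-up runs are executed but not measured.
        for _ in range(warmup):
            run_query(text)
        samples: List[float] = []
        for _ in range(runs):
            start = time.perf_counter()
            run_query(text)
            samples.append(time.perf_counter() - start)
        # The median is less sensitive to a single slow outlier than the mean.
        timings[name] = statistics.median(samples)
    return {
        "os": platform.platform(),        # operating system the run used
        "dbms_config": dict(dbms_config),  # settings published with the numbers
        "median_seconds": timings,
    }
```

Running the same harness, unchanged, against each DBMS on the same operating system is what makes any resulting speed ratios comparable.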
We are always game for a DBMS performance shootout, but we do prefer
to have these conducted in the right manner, based on existing
industry best practices.
Others:
If you find my tone harsh, well, I just can't let innuendo of this
kind fly about the place. Even worse, it can't become the thrust of
a hackathon that would actually be much better served by demonstrating
the utility inherent in being able to create, share, and remix
Linked Open Data, etc., using a variety of tools.
The big issue right now isn't the fastest DBMS; the biggest issue
is data-de-silo-fication or data desiloization, i.e., the
separation of data, databases, and database management systems.
That issue is where ontologies provide immense help, especially
bearing in mind the existence of a massive Linked Open Data cloud
that's available to any tool that can de-reference an HTTP URI and
make sense of the entity relations it unveils.
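To make "de-reference an HTTP URI" concrete, here is a minimal Python sketch using only the standard library. It assumes the server performs content negotiation on the `Accept` header; the media-type preference list, the function names `build_request` and `dereference`, and the example URI are illustrative assumptions, not anything taken from this thread. `urlopen` follows redirects such as 303 See Other by default, which is how many Linked Data servers point from an entity's URI to a document describing it.

```python
import urllib.request
from typing import Tuple

# Illustrative preference list (an assumption): Turtle first, then RDF/XML,
# then JSON-LD.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, application/ld+json;q=0.8"


def build_request(uri: str) -> urllib.request.Request:
    """Build an HTTP GET that asks the server for an RDF description of uri."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0) -> Tuple[str, str, str]:
    """Fetch uri, following any redirects, and return
    (final URL, content type, body text)."""
    with urllib.request.urlopen(build_request(uri), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset)
        return resp.geturl(), resp.headers.get_content_type(), body
```

For example, `dereference("http://example.org/resource/Berlin")` would return whatever RDF the (hypothetical) server negotiates. Turning that body into entity relations would still need an RDF parser such as rdflib, which this sketch leaves out.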
Links:
[1] http://nitrosbase.com/wp-content/uploads/2014/02/NitrosBase_sp2bench_Eng_V17.pdf
-- Your benchmark report
[2] http://bit.ly/1bGhpSK -- Blog post about Virtuoso and the TPC-H
benchmark, which includes configuration information
[3] http://bit.ly/ZOCmaD -- Virtuoso Star Schema Benchmark Results
(here we demonstrate SQL and SPARQL performance)
[4] http://bit.ly/12rpwio -- Berlin SPARQL Benchmark Report at
scales up to 150 Billion Triples
[5] http://bit.ly/Yf5etP -- Berlin SPARQL Benchmark covering
Virtuoso 6.x and Virtuoso 7.x amongst others.
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
_________________________________________________________________
Msg Archives: http://ontolog.cim3.net/forum/ontology-summit/
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontology-summit/
Unsubscribe: mailto:ontology-summit-leave@xxxxxxxxxxxxxxxx
Community Files: http://ontolog.cim3.net/file/work/OntologySummit2014/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014
Community Portal: http://ontolog.cim3.net/wiki/