
Re: [ontology-summit] Ontology Summit 2014 Hackathon - Optimized SPARQL

To: ontology-summit@xxxxxxxxxxxxxxxx, Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
From: Victor Chernov <vchernov@xxxxxxxxxxxxxx>
Date: Tue, 1 Apr 2014 18:50:57 +0400
Message-id: <1659741369.20140401185057@xxxxxxxxxxxxxx>
Dear Colleagues,

We are close to the end of our Hackathon project; the preliminary report is attached.

We have received a letter from Kingsley Idehen, Founder & CEO of OpenLink Software. He expressed interest in our event and made some remarks. Many of them were very useful; unfortunately, some were based on misunderstandings or obsolete information. Here are our comments on his letter:

Dear Kingsley,

Thank you for your attention to our project.

First of all, sorry for the misinformation: the paper on which you base your arguments is now obsolete. We ran our previous tests when Virtuoso 6 was the current version.

At the Hackathon we used the current version of every product: Virtuoso Universal Server Release 7.1, Stardog 2.1.2, and NitrosBase RDF Storage 1.0 Release Candidate. All the triplestores at the Hackathon were run under Windows (64-bit).

Thanks to the Hackathon we discovered some architectural features of the products, in particular Virtuoso's column-wise storage.

It's a pity that nobody from Virtuoso or Stardog attended our Hackathon. It would have been a real opportunity to make the necessary tuning steps before running the queries. The day before the event I received a message from Ivan Mikhailov of the Virtuoso team, who expressed interest in our Hackathon project, but we did not see him at the Hackathon. That is why we had to run all the products, NitrosBase included, with default settings.

Thank you for providing the links to the optimization guides. Unfortunately, we were unable to find those guides earlier: they are hosted on an external site, not on the Virtuoso site. It would be much easier for developers if they could find them on the OpenLink site.

We can't agree with your claim that this project doesn't suit the hackathon and that performance is not an issue.

For example, the "Reference Data for Anime and Manga" Hackathon project ran into query performance issues a couple of days ago. The team created RDF data but could not work with it. They spent a lot of time optimizing store performance to make ontology-based queries a workable solution.

Another argument: we all know the 3Vs (volume, variety and velocity) in the definition of big data. Velocity is already there. We all believe that the moment all performance issues of RDF tools are resolved, the world will immediately switch to RDF and will not remain stuck with the rigid relational model.

We are a small company, but we have been developing and perfecting our NitrosBase technology for more than 20 years. Our products demonstrate our algorithms and technology features. We are open to cooperation. We are currently talking with the dot15926 team about integrating our tools. We can cooperate with all interested teams to improve tools for the whole community.

We are ready to discuss other questions via personal e-mail.

Regards,
Victor Chernov
vchernov@xxxxxxxxxxxxxx

==============================================

> Ontology Summit 2014 Hackathon - Optimized SPARQL performance
> management via native API Status report on
>
> Hi Colleagues,
>
> Here is the status report from Hackathon #3 (Optimized SPARQL
> performance management via native API).
>
> STATUS as at 8pm CET 30 March
>
> We have installed all necessary software on the machines of the 3
> experiment participants. The software includes Virtuoso Universal
> Server (trial version), Stardog (free eval), NitrosBase RDF Storage,
> and a C# application to run the test queries.
>
> The C# application was developed in advance, so we just needed to
> explain to the participants the main ideas, the program structure and
> the way the program communicates with each RDF tool.
>
> Each participant generated an N3 file of 25 million triples using a
> utility from the SP2Bench site, then loaded that file into each database.
>
> Every participant ran the application against all 3 databases.
>
> As the testing process takes a considerable amount of time, we took a
> break.
>
> At the moment we are fixing two problems: a) the Virtuoso license
> accidentally expired on one of our computers; b) the test query set
> ran incorrectly on another computer. Then we'll run the test
> application again and hopefully can start analyzing the results.
>
> Regards,
> Victor
>

Victor,

I've been reading the material and general narrative that you are
pushing in regards to performance comparisons against Virtuoso [1].

For starters, you do know that we have two current versions of Virtuoso,
i.e., version 6.x and 7.x, where 7.x is based on column-wise storage,
key compression, vectorized query execution, etc., right? In
addition, if you are going to tout a benchmark, one which you will not
win against Virtuoso, why have you gone specifically for SP2?

If you are going to attempt to make a showcase of your product against
Virtuoso, at the very least undertake the adventure in an objective fashion.

At this juncture, are you claiming to be the fastest DBMS for a 25
million triples RDF dataset, running the SP2 benchmark where you have
Virtuoso running on Linux and your DBMS running on Windows? Without even
getting into the numbers, what's stopped you running Virtuoso 6.x and
7.x on Windows as part of this effort?

To make matters worse, you actually make this audacious claim in your
collateral based on this most subjective benchmark:

" ..Benchmark shows that NitrosBase RDF Storage is tens or hundreds
times faster than Openlink Virtuoso on most queries. In the worst case
(query 3b), it is at least 8 times faster. In the best case (query 11)
it is 300 000 times faster." .

Here's how benchmarks are run, professionally:

1. You publish the DBMS configuration, i.e., don't take the default
settings of one DBMS and compare them against optimized settings of
another, across totally different operating systems. FWIW, Virtuoso is
actually faster on Windows than it is on Linux. Open Source might be
cool, but Linux isn't Solaris, AIX, or HP-UX.

2. Take the time to find out if your competitor has published
optimization guides [2] and benchmark results too [3][4][5].
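
As a rough illustration of point 1, a benchmark harness can store each timing together with the configuration it was measured under, so that default and tuned runs are never compared unknowingly. The following Python sketch is not taken from the hackathon's C# application; the function names, the report fields, the product name "ExampleStore", and the placeholder workload are all hypothetical.

```python
import json
import platform
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object],
               warmups: int = 1, repeats: int = 5) -> List[float]:
    """Return wall-clock timings (seconds) for `repeats` runs.

    `warmups` untimed runs are executed first so caches are in a
    comparable state for every product under test.
    """
    for _ in range(warmups):
        run_query()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def build_report(product: str, version: str, settings: Dict[str, str],
                 query_id: str, timings: List[float]) -> dict:
    """Bundle timings with the configuration they were measured under."""
    return {
        "product": product,
        "version": version,
        "os": platform.platform(),
        "settings": settings,  # e.g. "default", or the tuned parameters used
        "query": query_id,
        "runs": len(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    # Placeholder workload standing in for a real SPARQL call (assumption).
    def fake_query():
        return sum(range(10_000))

    report = build_report("ExampleStore", "0.0", {"profile": "default"},
                          "q1", time_query(fake_query))
    print(json.dumps(report, indent=2))
```

Publishing such a record per product and per query lets readers see at a glance which operating system and which settings stand behind every number.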

We are always game for a DBMS performance shootout, but we do prefer to
have these conducted in the right manner, based on existing industry
best practices.

Others:
If you find my tone harsh, well, I just can't let innuendo of this kind
fly about the place. Even worse, it can't become the thrust of a
hackathon that would actually be much better served demonstrating the
utility inherent in being able to create, share, and remix Linked Open
Data etc., using a variety of tools.

The big issue right now isn't the fastest DBMS; the biggest issue right
now is data-de-silo-fication or data desiloization, i.e., the separation
of data, databases, and database management systems. The aforementioned
issue is where ontologies provide immense help, especially bearing in
mind the existence of a massive Linked Open Data cloud that's available
to any tool that can de-reference an HTTP URI and make sense of the
entity relations they unveil.

Links:

[1]
http://nitrosbase.com/wp-content/uploads/2014/02/NitrosBase_sp2bench_Eng_V17.pdf
-- Your benchmark report
[2] http://bit.ly/1bGhpSK -- Blog post about Virtuoso and the TPC-H
benchmark, which includes configuration information
[3] http://bit.ly/ZOCmaD -- Virtuoso Star Schema Benchmark Results (here
we demonstrate SQL and SPARQL performance)
[4] http://bit.ly/12rpwio -- Berlin SPARQL Benchmark Report at scales up
to 150 Billion Triples
[5] http://bit.ly/Yf5etP -- Berlin SPARQL Benchmark covering Virtuoso
6.x and Virtuoso 7.x amongst others.

--

Regards,

Kingsley Idehen        
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen



Attachment: Optimized SPARQL performance management via native API (preliminary report V01).pdf
Description: Adobe PDF document

