OntologySummit2014_Hackathon - Project:    (4A3W)

Optimized SPARQL performance management via native API    (4A3X)

Project roster page: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014_Hackathon_OptimizedSPARQLviaNativeAPI (this page).    (4A3Y)

Team: Victor Chernov (MSK, UTC+4) vchernov at nitrosbase.com (lead), Vladislav Golovkov (MSK, UTC+4) vgolovkov at nitrosbase.com    (4AYV)

Post-event updates    (4CUX)

The "Optimized SPARQL performance management via native API" hackathon took place on March 29, 2014, 14:00 - 18:00 MSK (virtual session), with follow-up activities the next day.    (4CUY)

Four people participated in the event:    (4CUZ)

  1. Victor Chernov (team lead), General Manager at NitrosData Rus, Russia, vchernov@nitrosbase.com;    (4CV0)
  2. Vladislav Golovkov, System Architect at NitrosData Rus, Russia, vgolovkov@nitrosbase.com;    (4CV1)
  3. Andrej Andrejev, Ph.D. student at Uppsala University, Sweden, andrej.andrejev@it.uu.se;    (4CV2)
  4. Vladimir Salnikov, Head of QA Department at Compile Group, Russia, vladimir.salnikov@compilesoft.ru.    (4CV3)

During the event, triplestores were installed, benchmark queries were prepared, and experiments were set up and run. The results were discussed and published (see the link below).    (4CV4)

The main conclusions are:    (4CV5)

  1. Performance is a bottleneck for ontological technologies.    (4CV6)
  2. It is desirable to have direct access to the database rather than access through a TCP protocol.    (4CV7)
  3. Sometimes it is worth simplifying the queries as much as possible and doing some of the processing on the client.    (4CV8)
  4. Very often, what is difficult to do with a single large query is easy to implement with a set of small ones. In those cases the triplestore should be able to perform small queries quickly. A further performance gain could be reached by giving users direct access to the database, bypassing SPARQL processing.    (4CV9)
  5. The myth that RDF databases are slower than SQL databases no longer holds. RDF stores perform fast and can compete with SQL databases.    (4CVA)
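
Conclusions 3 and 4 can be illustrated with a minimal sketch. It is not taken from the hackathon report: the in-memory `TRIPLES` list, the `small_query` helper and the `people_in_city` function are hypothetical stand-ins for a real triplestore and its native API. The sketch replaces one two-pattern query with two small lookups that are joined on the client.

```python
# Toy in-memory triple store standing in for a real triplestore (assumption).
TRIPLES = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("carol", "worksFor", "initech"),
    ("acme", "locatedIn", "Moscow"),
    ("initech", "locatedIn", "Uppsala"),
]


def small_query(s=None, p=None, o=None):
    """Answer a single triple pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def people_in_city(city):
    """Client-side join: two small lookups instead of one two-pattern query."""
    companies = [s for s, _, _ in small_query(p="locatedIn", o=city)]
    people = []
    for company in companies:
        people.extend(s for s, _, _ in small_query(p="worksFor", o=company))
    return sorted(people)


print(people_in_city("Moscow"))  # ['alice', 'bob']
```

Whether this pattern pays off in practice depends on how cheaply the store answers each small lookup, which is the point of conclusion 4.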

The report can be downloaded from    (4CVB)

http://nitrosbase.com/wp-content/uploads/OptimizedSPARQLreportV13.zip    (4CVC)


Event announcement    (4CVD)

Participation: You are welcome to participate; please send an e-mail to vchernov at nitrosbase.com.    (4AYW)

The event starts on March 29, 2014 at 14:00 MSK / 10:00 UTC / 03:00 PDT, at the same moment all over the world.    (4A59)

Communication:    (4B6X)

  1. Google Hangout - we will connect you using the e-mail address you provide;    (4B6Y)
  2. Skype - will be used as an additional tool if the Google Hangout connection limit is exceeded. Please add vladislav.golovkov to your Skype contact list.    (4B6Z)

A Caveat Concerning the IPR Policy Conformance:    (4D8B)


The goals of the project are    (4A40)

Studying the kinds of queries that reveal the advantages of one RDF database over another. The goals imply:    (4A41)

The following triplestores will be compared: Virtuoso, Stardog and NitrosBase.    (4A47)

The triplestores have the following important advantages:    (4A4B)

Using a native API is important for fast query execution. All three tools provide one:    (4A4F)

Virtuoso    (4A4G)
  Jena, Sesame and Virtuoso ODBC RDF Extensions for SPASQL    (4A4H)
Stardog    (4A4I)
  the core SNARL (Stardog Native API for the RDF Language) classes and interfaces    (4A4J)
NitrosBase    (4A4K)
  C++ and .NET native API    (4A4L)

We expect to write the additional code needed for accurate testing:    (4A4M)

The following steps are needed to load the test dataset:    (4A4S)

Note: Data are considered loaded as soon as the system is ready to perform the simplest search query. This is done to exclude the effect of background processes (e.g. indexing).    (4A4V)
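
The loading criterion in the note above could be measured roughly as follows. This is only a sketch under stated assumptions: `start_load` and `probe_query` are hypothetical callables that the tester would implement on top of each store's native API, and the polling interval and timeout are arbitrary defaults.

```python
import time


def timed_load(start_load, probe_query, poll_interval=0.1, timeout=3600.0):
    """Return seconds from starting the load until probe_query() first succeeds.

    start_load and probe_query are hypothetical callables supplied by the
    caller; probe_query must return True once the store can answer a
    simple search query.
    """
    t0 = time.perf_counter()
    start_load()
    while not probe_query():
        if time.perf_counter() - t0 > timeout:
            raise TimeoutError("store did not become ready in time")
        time.sleep(poll_interval)
    return time.perf_counter() - t0
```

Polling with a simple probe query keeps the measurement independent of how each store reports the end of its own bulk load.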

We are going to explore the query execution performance of the databases under consideration (Virtuoso, Stardog, NitrosBase).    (4A4W)
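
A minimal timing harness for such measurements might look like the sketch below. It is an illustration, not the code used in the hackathon: `run_query` is a hypothetical callable that executes one benchmark query through a store's native API, and the choice of reporting the best and median times is an assumption.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Time run_query() several times; return (best, median) in seconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - t0)
    return min(samples), statistics.median(samples)
```

Repeating each query and reporting more than one statistic helps separate cold-cache from warm-cache behaviour.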

The queries should be fairly simple and cover the different techniques, for example:    (4A4X)

Note: During testing, each database may allocate a lot of resources, which can affect the performance of the other databases. That's why each test should be started after a system reboot.    (4A56)