[ontology-summit] Hackathon - Optimized SPARQL performance management vi

To: ontology-summit@xxxxxxxxxxxxxxxx
From: Victor Chernov <vchernov@xxxxxxxxxxxxxx>
Date: Mon, 24 Mar 2014 14:26:40 +0400
Message-id: <1713506174.20140324142440@xxxxxxxxxxxxxx>
Project summary:

Project roster page: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014_Hackathon_OptimizedSPARQLviaNativeAPI
Team lead: VictorChernov (MSK, UTC+4) vchernov at nitrosbase.com
The event starts on 29 March 2014 at 14:00 MSK / 10:00 UTC / 03:00 PDT, worldwide.

The goal of the project is to study which kinds of queries reveal the advantages of one RDF database over another. This implies:
- Selecting a SPARQL subset from SP2Bench.
- Forming a dataset and loading it into all triplestores.
- Implementing measurement aids and testing them.
- Measuring time accurately and obtaining min, max, average and median times.
- Reflecting on the results: the advantages and disadvantages of each triplestore on each selected query.

The following triplestores will be compared:

- Virtuoso
- Stardog
- NitrosBase

The triplestores have the following important advantages:

- Very high performance, as demonstrated on the SP2Bench benchmark
- Linux and Windows versions
- Native API for fast query processing

It is important to use a native API for fast query execution. All three tools provide one:

Virtuoso
       Jena, Sesame and Virtuoso ODBC RDF Extensions for SPASQL
Stardog
       the core SNARL (Stardog Native API for the RDF Language) classes and interfaces
NitrosBase
       C++ and .NET native API

We plan to write the additional code needed for accurate testing:

- Accurate time measurement;
- Functions for getting min, max, average and median times;
- Functions for getting time of scanning through the whole query result;
- Functions for getting time of retrieving first several records (for example, the first page of web grid);
- Etc.
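
The measurement aids listed above could be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual harness: `run_query` is a hypothetical zero-argument callable that executes a query through a store's native API and returns an iterator over result rows, and `first_n` and `repeats` are assumed parameters.

```python
import statistics
import time
from itertools import islice


def summarize(samples):
    """Return min, max, average and median of a list of timings (seconds)."""
    return {
        "min": min(samples),
        "max": max(samples),
        "avg": statistics.mean(samples),
        "median": statistics.median(samples),
    }


def time_query(run_query, first_n=20, repeats=5):
    """Time first-page retrieval and a full scan of the query result.

    `run_query` is a hypothetical callable returning an iterator of rows.
    """
    first_page, full_scan = [], []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = iter(run_query())
        list(islice(rows, first_n))      # fetch only the first page
        first_page.append(time.perf_counter() - start)
        for _row in rows:                # drain the remaining rows
            pass
        full_scan.append(time.perf_counter() - start)
    return {
        "first_page": summarize(first_page),
        "full_scan": summarize(full_scan),
    }
```

The full-scan time includes the first-page time, so the two figures share one start point and can be compared directly. `time.perf_counter()` is used because it is a monotonic, high-resolution clock.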

The following steps are needed for loading the test dataset:

- Selecting a data subset from sp2bench benchmark
- Measuring data loading time

Note: The data are considered loaded as soon as the system is ready to perform the simplest search query. This is done to eliminate the influence of background processes (e.g. indexing).
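
One possible way to apply this rule is to poll the store with the simplest query until it answers. The sketch below assumes two hypothetical callables that are not part of any of the three products: `start_load` starts the bulk load, and `is_ready` runs the simplest search query and returns True once it succeeds. The polling interval and timeout are also assumptions.

```python
import time


def measure_load_time(start_load, is_ready, poll_interval=0.5, timeout=3600.0):
    """Return seconds from starting the load until `is_ready()` is true.

    Both callables are hypothetical wrappers around a store's own tools.
    """
    start = time.perf_counter()
    start_load()
    while not is_ready():
        if time.perf_counter() - start > timeout:
            raise TimeoutError("store did not become ready in time")
        time.sleep(poll_interval)        # wait before probing again
    return time.perf_counter() - start
```

The measured time is accurate only to within one polling interval, so the interval should be small relative to the expected load time.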

We are going to explore the query execution performance of the databases under consideration (Virtuoso, Stardog, NitrosBase).

The queries should be fairly simple and cover different techniques, for example:

- Searching a small range of values
- Searching a big range of values
- Sorting
- Aggregation
- Several different join queries
- Retrieving part of the result
- Retrieving the whole result
- Etc.
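
To make the categories above concrete, here is an illustrative catalogue of query texts, one per category. These are not the queries selected for the hackathon. The prefixes and properties are assumed to follow the vocabulary used by SP2Bench data, the year values are arbitrary, and the aggregation example needs SPARQL 1.1. The helper `build` is hypothetical; it covers "part of the result" by appending a LIMIT clause.

```python
PREFIXES = """\
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bench:   <http://localhost/vocabulary/bench/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
"""

# Illustrative queries only; names and filter values are assumptions.
QUERIES = {
    "small_range": "SELECT ?doc WHERE { ?doc dcterms:issued ?yr . "
                   "FILTER (?yr = 1950) }",
    "big_range": "SELECT ?doc WHERE { ?doc dcterms:issued ?yr . "
                 "FILTER (?yr >= 1950 && ?yr <= 2000) }",
    "sorting": "SELECT ?doc ?title WHERE { ?doc dc:title ?title } "
               "ORDER BY ?title",
    "aggregation": "SELECT ?yr (COUNT(?doc) AS ?n) "
                   "WHERE { ?doc dcterms:issued ?yr } GROUP BY ?yr",
    "join": "SELECT ?name ?title WHERE { ?doc rdf:type bench:Article ; "
            "dc:title ?title ; dc:creator ?p . ?p foaf:name ?name }",
}


def build(name, limit=None):
    """Return the full query text; a LIMIT retrieves only part of the result."""
    query = PREFIXES + QUERIES[name]
    if limit is not None:
        query += f"\nLIMIT {limit}"
    return query
```

For example, `build("sorting", limit=20)` would give the first page of a sorted result, while `build("sorting")` retrieves the whole result.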

Note: During testing, each database may allocate a lot of resources, which can affect the performance of the other databases. That is why each test should be started from a system reboot.
