Re: [ontology-summit] Ontology Summit 2014 Hackathon - Optimized SPARQL

To: "'Ontology Summit 2014 discussion'" <ontology-summit@xxxxxxxxxxxxxxxx>, "'Victor Chernov'" <victor.chernov@xxxxxxxxxxxxxx>
From: "Anatoly Levenchuk" <ailev@xxxxxxxxxxx>
Date: Tue, 1 Apr 2014 22:01:00 +0400
Message-id: <047901cf4dd4$57ae9b20$070bd160$@asmp.msk.su>

I guess that I should answer at least one issue: whether «Optimized SPARQL performance management via native API» is a valid hackathon project that fits into the overall Ontology Summit 2014 Hackathon topic or not.

 

The main emphasis here is on the “native API”, which should address the gap between different types of data formats, query languages, etc. A good development of this project would be to repeat the benchmarks on a non-native API and tell people the difference: what the tax of standardization and openness of the current semantic web stack is. This project studies Velocity and an aspect of Variety with regard to the “native API”, not particularly Volume, Openness, Data Quality, etc. By the way, if the performance breakthrough of NitrosBase is acknowledged, they will be thinking about a shift from only “presenting semantic data to the public” to reasoning tasks too (maybe also with the help of their “native API”), and to benchmarking reasoning, but this is also too much for this particular hackathon.
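
As an illustration of how such a native versus non-native comparison could be measured, here is a minimal Python sketch. It is not taken from the NitrosBase report: the function names `time_runs` and `compare` and the two callables passed to them are hypothetical stand-ins for "run this query through the native API" and "run the same query through a standard SPARQL endpoint".

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object],
              warmup: int = 3, repeats: int = 20) -> List[float]:
    """Return wall-clock durations (seconds) of `repeats` calls, after `warmup` calls."""
    for _ in range(warmup):
        run_query()  # warm caches so a cold first call does not skew the numbers
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def compare(native: Callable[[], object], standard: Callable[[], object],
            warmup: int = 3, repeats: int = 20) -> Dict[str, float]:
    """Compare median latency of two hypothetical query paths for the same query."""
    native_med = statistics.median(time_runs(native, warmup, repeats))
    standard_med = statistics.median(time_runs(standard, warmup, repeats))
    # The ratio is one way to express the "tax" of the standard path.
    ratio = standard_med / native_med if native_med > 0 else float("inf")
    return {
        "native_median_s": native_med,
        "standard_median_s": standard_med,
        "overhead_ratio": ratio,
    }
```

The median is used rather than the mean so that a few slow outlier runs do not dominate; a full report would also state hardware, dataset, query text and store configuration for both paths.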

 

By the way, .15926 Editor has a triple store inside and a “native API” only. It can query SPARQL endpoints but cannot be queried as a standard SPARQL endpoint. Still, it has its own performance problems with specific ontological queries (like searching for tricky classification patterns). Having publicly available results for a “native API” benchmark can be useful for the .15926 Editor team to compare the work of their embedded triple store (with “native API only”) with the current state of the art in the stand-alone triple-store industry.

 

Kingsley, it seems this hackathon project is not finished yet; the report of Victor Chernov's team is only a preliminary one. You are welcome to join Victor Chernov's team; it is not too late. Your advice and help can be valuable in achieving the Ontology Summit 2014 Hackathon goals.

VictorC, it seems to me that you should explain more in your final Hackathon report about the use of the “native API” (that API regards semantic data as a kind of “structured data” and therefore somehow fills the data structure gap) in benchmarking experiments with triple stores. I wonder why the questions were more like “what operating system did you use” than “what about your native API”.

 

Best regards,

Anatoly

 

From: ontology-summit-bounces@xxxxxxxxxxxxxxxx [mailto:ontology-summit-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Kingsley Idehen
Sent: Tuesday, April 01, 2014 9:15 PM
To: Victor Chernov; ontology-summit@xxxxxxxxxxxxxxxx
Subject: Re: [ontology-summit] Ontology Summit 2014 Hackathon - Optimized SPARQL performance management via native API Status report

 

On 4/1/14 10:46 AM, Victor Chernov wrote:

We can't agree with your claim that this project doesn't suit the hackathon and that performance is not an issue.


I am not saying that performance isn't an issue. I am saying that, in regard to an ontology-themed hackathon, there are more important issues in the realms of:

1. data access
2. data integration and remixing
3. data quality assessment and adjustments
4. inference and reasoning as drivers re. 1-3 above.



For example, the "Reference Data for Anime and Manga" Hackathon project ran into query performance issues a couple of days ago. They created RDF data but couldn't work with it. They spent a bunch of time optimizing the store performance to make ontology-based queries a workable solution.


We host a Linked Open Data cache [1] comprising 50 billion+ RDF statements. On a whim, I can load a useful dataset into that instance for the entire planet to query (on an ad-hoc basis) as each user sees fit. This has been the case for years.


Another argument. We all know the 3Vs (volume, variety and velocity) in the definition of big data. Velocity is already there.


Again, why do you think this is news to an organization that's obsessed with every aspect of DBMS performance and scalability? Our live instances [2][3][4] have always been available to hackathons dating back to the inception of DBpedia and the Linked Open Data cloud it helped bootstrap.


We all believe that the moment all performance issues for RDF tools are resolved, the world would immediately switch to RDF and not remain stuck with the rigid relational model.


Again, performance is a non-issue in our world. There are many issues you are yet to encounter and tackle in your product. To understand what I mean, I would encourage you to emulate DBpedia, URIBurner, or the Linked Open Data cache. Each of these is a Virtuoso instance, so if I am to take your claims seriously, in the slightest, you can at least put out such an instance.

Big Data is an intersection of problems, not a single problem. I would never pitch Virtuoso's high performance and massive scalability as the solution to the aforementioned intersection of problems. The real solution lies in the ability to loosely couple the following:

1. Identifiers
2. Entity Relations (Data)
3. Entity Relations Expression Syntax & Notations
4. Entity Relations Serialization Formats
5. Databases (Datasets)
6. Database Management Systems.

Get 1-6 sorted and we don't have a Big Data problem at all.


We are a small company, but we have been developing and perfecting our NitrosBase technology for more than 20 years. Our products are a demonstration of our algorithms and technology features. We are open to cooperation. We are now talking with the dot15926 team about integration of our tools. We can cooperate with all interested teams to improve tools for the whole community.


Fine, but don't make careless presentations of benchmark claims that involve 3rd party products. There are best practices for performing and presenting benchmark results.

BTW -- There is an LDBC [5] effort in place for any organization interested in the construction and use of benchmarks aimed at graph model oriented databases. You will have a lot of cooperation on the table if you get involved.


Links:

[1] http://lod.openlinksw.com -- /sparql for the SPARQL endpoint
[2] http://dbpedia.org/sparql -- DBpedia
[3] http://dbpedia-live.openlinksw.com -- DBpedia Live Edition that we host
[4] http://linkeddata.uriburner.com -- an OLTP instance that allows read-write operations in addition to ad-hoc querying etc.
[5] http://ldbc.eu -- Linked Data Benchmark Council
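
For readers who want to try ad-hoc queries against one of the endpoints listed above, the following is a minimal Python sketch of a SPARQL Protocol query sent as an HTTP GET. The helper name `build_sparql_request` and the sample query are illustrative assumptions, not something taken from the thread; only the endpoint URL comes from link [2].

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build an HTTP GET request carrying a SPARQL query (hypothetical helper)."""
    # The SPARQL Protocol passes the query text in the `query` URL parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Sample query chosen only for illustration.
req = build_sparql_request(
    "http://dbpedia.org/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
)
# To actually run it, pass `req` to urllib.request.urlopen and decode the
# response body with json.loads; that network call is left out of this sketch.
```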


We are ready to discuss other questions via personal e-mail.

Regards,
Victor Chernov
vchernov@xxxxxxxxxxxxxx




-- 
 
Regards,
 
Kingsley Idehen             
Founder & CEO 
OpenLink Software     
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
 
 
 
 

_________________________________________________________________
Msg Archives: http://ontolog.cim3.net/forum/ontology-summit/   
Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontology-summit/  
Unsubscribe: mailto:ontology-summit-leave@xxxxxxxxxxxxxxxx
Community Files: http://ontolog.cim3.net/file/work/OntologySummit2014/
Community Wiki: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2014  
Community Portal: http://ontolog.cim3.net/wiki/