Re: [ontology-summit] Ontology Summit 2014 Hackathon - Optimized SPARQL

To: Victor Chernov <victor.chernov@xxxxxxxxxxxxxx>, ontology-summit@xxxxxxxxxxxxxxxx
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Tue, 01 Apr 2014 13:15:05 -0400
Message-id: <533AF419.6010905@xxxxxxxxxxxxxx>
On 4/1/14 10:46 AM, Victor Chernov wrote:
We can't agree with your claim that this project doesn't suit the hackathon and that performance is not an issue.

I am not saying that performance isn't an issue. I am saying that, with regard to an ontology-themed hackathon, there are more important issues in the realms of:

1. data access
2. data integration and remixing
3. data quality assessment and adjustments
4. inference and reasoning as drivers re. 1-3 above.


For example, the "Reference Data for Anime and Manga" hackathon project ran into query performance issues a couple of days ago. They had created RDF data but couldn't work with it. They spent a lot of time optimizing store performance to make ontology-based queries a workable solution.

We host a Linked Open Data cache [1] comprising 50 billion+ RDF statements. On a whim, I can load a useful dataset into that instance for the entire planet to query (on an ad-hoc basis) as each user sees fit. This has been the case for years.

Another argument: we all know the 3Vs (volume, variety, and velocity) in the definition of big data. Velocity is already there.

Again, why do you think this is news to an organization that's obsessed with every aspect of DBMS performance and scalability? Our live instances [2][3][4] have always been available to hackathons dating back to the inception of DBpedia and the Linked Open Data cloud it helped bootstrap.

We all believe that the moment all performance issues for RDF tools are resolved, the world would immediately switch to RDF and not remain stuck with the rigid relational model.

Again, performance is a non-issue in our world. There are many issues you are yet to encounter and tackle in your product. To understand what I mean, I would encourage you to emulate DBpedia, URIBurner, or the Linked Open Data cache. Each of these is a Virtuoso instance, so if I am to take your claims seriously, in the slightest, you can at least put out such an instance.

Big Data is an intersection of problems, not a single problem. I would never pitch Virtuoso's high performance and massive scalability as the solution to the aforementioned intersection of problems. The real solution lies in the ability to loosely couple the following:

1. Identifiers
2. Entity Relations (Data)
3. Entity Relations Expression Syntax & Notations
4. Entity Relations Serialization Formats
5. Databases (Datasets)
6. Database Management Systems.

Get 1-6 sorted and we don't have a Big Data problem at all.

We are a small company, but we have been developing and perfecting our NitrosBase technology for more than 20 years. Our products demonstrate our algorithms and technology features. We are open to cooperation. We are now talking with the dot15926 team about integrating our tools. We can cooperate with all interested teams to improve tools for the whole community.

Fine, but don't make careless presentations of benchmark claims that involve 3rd party products. There are best practices for performing and presenting benchmark results.

BTW -- There is an LDBC [5] effort in place for any organization interested in the construction and use of benchmarks aimed at graph model oriented databases. You will have a lot of cooperation on the table if you get involved.


Links:

[1] http://lod.openlinksw.com -- /sparql for the SPARQL endpoint
[2] http://dbpedia.org/sparql -- DBpedia
[3] http://dbpedia-live.openlinksw.com -- DBpedia Live Edition that we host
[4] http://linkeddata.uriburner.com -- an OLTP instance that allows read-write operations in addition to ad-hoc querying etc.
[5] http://ldbc.eu -- Linked Data Benchmark Council
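
As a rough illustration of the ad-hoc querying these endpoints allow, here is a minimal Python sketch that builds a SPARQL-protocol GET request against the DBpedia endpoint from [2]. The sample query and the helper names `build_sparql_url` and `run_query` are assumptions made up for illustration; they do not come from this thread. The only endpoint features relied on are the standard `query` parameter and the standard SPARQL JSON results media type.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request, urlopen

# Endpoint taken from link [2] in this message.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"

# Hypothetical sample query (an assumption, not from the thread):
# list a handful of distinct classes used in the dataset.
SAMPLE_QUERY = "SELECT DISTINCT ?type WHERE { ?s a ?type } LIMIT 10"


def build_sparql_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and parse a SPARQL JSON results document.

    Requires network access; whether the endpoint is reachable and how
    quickly it answers are not guaranteed by this sketch.
    """
    request = Request(
        build_sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only print the URL here; call run_query() to actually hit the endpoint.
    print(build_sparql_url(DBPEDIA_ENDPOINT, SAMPLE_QUERY))
```

The same helper should work with the other endpoints listed above by swapping the endpoint URL, for example appending /sparql to [1] as noted in that entry.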

We are ready to discuss other questions via personal e-mail.

Regards,
Victor Chernov
vchernov@xxxxxxxxxxxxxx


-- 

Regards,

Kingsley Idehen	      
Founder & CEO 
OpenLink Software     
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen





