I like Kingsley's "data-de-silo-fication" theme. (In fact, I'm soon to
give an internal tech talk called "Down With Silos! How linked data is
beautifying the information landscape"). (01)
I want to contribute a different narrative, orthogonal to the
engineering discussion in this thread, but leading I think to the same
place Kingsley is heading. For brevity I'll keep it to bullet points. (02)
1. Enterprises depend for their success on people in the enterprise
doing the right thing at the right time.
2. People only know what the right thing is and how to do it by getting
good information, in the form most useful to them, at the time they need
it.
3. They get the information they need primarily from "documents", taken
in the very general sense as some bounded, structured, purposeful
package of distinctions (glyphs, lines, colors, shapes, textures,
sounds, etc.).
4. Documents, being packages of differences, can be decomposed into
particles of significance related in particular ways.
5. RDF is a good way to represent, record, and exchange particles of
significance that are related in particular ways. Along with XML, HTML,
HTTP, and related W3C standards, we have a complete suite of tools for
delivering documents containing the information needed to the people who
need it to act for the success of the enterprise. (03)
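Point 5 can be made concrete with a small sketch. Here is a hypothetical
"document" decomposed into particles of significance and expressed in
Turtle (the IRIs and the ex: vocabulary below are illustrative
assumptions, not any real schema):

```turtle
@prefix ex:  <http://example.org/vocab#> .
@prefix dct: <http://purl.org/dc/terms/> .

<http://example.org/doc/maintenance-memo-42>
    a            ex:Document ;
    dct:title    "Valve Maintenance Procedure" ;
    ex:audience  <http://example.org/role/FieldTechnician> ;
    ex:particle  [ a ex:Step ;
                   ex:order 1 ;
                   ex:text  "Close the upstream valve before removal." ] .
```

Each triple is one particle of significance; the document is just the
bounded package that relates them.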
There should be no dispute about the RDBMS as efficient storage and
retrieval machinery for relational data. I appreciate hearing about the
engineering and theoretical issues surrounding such systems. However, those
issues are related to the problems of getting information to people at
the point of need only to the extent that system designers choose to
couple data persistence components to data delivery mechanisms. One of
the hallmarks of "legacy" systems is the unfortunate choice to closely
couple these components. (04)
I expect the discussion in this forum to focus on how to deliver
information to a human in the way that best meets his or her constantly
changing and not entirely predictable needs. Whether the data is
persisted on disk or papyrus, in Elbonian, SQL, NoSQL, or Linear B, may
be of great concern to the designers and engineers tasked with
supporting the information needs of an enterprise. But it should be
immaterial to discussions about what happens directly on each side of
the computer screen: that is, how documents are composed for display,
and how they are interpreted by the human on the other side of the screen.
Those of us who focus on the 2-sides-of-the-screen problem domain have
found the W3C basic and semantic web technology stacks of inestimable
value.
On Fri, 2013-09-13 at 10:07 -0400, Kingsley Idehen wrote:
> On 9/13/13 9:33 AM, Michael Brunnbauer wrote:
> > Hello Kingsley,
> > On Fri, Sep 13, 2013 at 08:37:01AM -0400, Kingsley Idehen wrote:
> > > > I agree wholeheartedly. RDF and SPARQL make data integration easier
> > > > (without
> > > > solving the fundamental issues of course).
> > > What is the fundamental issue, as you see it?
> ## In Turtle, for sake of clarity re. my world-view ##
> <#thisProblem> <#myLabel> "Data-de-silo-fication" ;
>     <#sameAs> <#HeterogeneousDataFederation>, <#DataVirtualization>,
>         <#DataSpaces>, <#MasterDataManagement> ;
>     <#comment> """This problem covers data disparity issues that include:
> shape, location, and relation semantics (or lack thereof)""" .
> ## Turtle End ##
> So I assume we are in agreement re. the problem?
> > > I see the fundamental issue (or pain point) being data-de-silo-fication.
> > http://lists.w3.org/Archives/Public/public-lod/2013Jun/0458.html
> > RDF is nice for Extract Transform Load. The problems start if you want to
> > change data.
> Change sensitivity is handled via the use of Linked Data Views over
> disparate data sources. This is what R2RML facilitates, though that is
> rarely mentioned, sadly.
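> To make that concrete, here is a minimal R2RML mapping sketch -- the
> table name, columns, and IRIs are hypothetical, and Virtuoso-specific
> details are omitted:
>
> ```turtle
> @prefix rr: <http://www.w3.org/ns/r2rml#> .
>
> <#UserMap>
>     rr:logicalTable [ rr:tableName "USERS" ] ;
>     rr:subjectMap [ rr:template "http://example.com/user/{ID}" ;
>                     rr:class <http://example.com/ns#User> ] ;
>     rr:predicateObjectMap [
>         rr:predicate <http://example.com/ns#email> ;
>         rr:objectMap [ rr:column "EMAIL" ] ] .
> ```
>
> A SPARQL engine exposing such a view reflects row changes in the USERS
> table without any re-extraction step.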
> Views can be transient, materialized, or a configurable mix of both.
> That's certainly the case re. Virtuoso, i.e., make a change in its SQL
> DBMS (or a remote ODBC- or JDBC-accessible DBMS) and it is reflected
> in all your SPARQL queries and Linked Data URI lookups. The same even
> applies to RESTful or SOA services that are attached to Virtuoso (we
> cover 100+ protocols and formats).
> We have Replication (Snapshot and Transactional) and HTTP (including
> cache invalidation) baked into Virtuoso.
> > > > But they are a bad option for data
> > > > storage because maintaining consistency is so difficult (think about
> > > > deleting
> > > > a row or transactions).
> > > I don't know what that really means.
> > Suppose you have an App with user registration. If you store the user data
> > in a triple store, deleting a user with SPARQL becomes difficult.
> That doesn't apply to every triple store. That doesn't apply to
> Virtuoso. We even have a large customer running OLTP-like workflows with
> something like 40 million named graphs. BTW -- as part of the
> workflow, Virtuoso has to factor in deltas so that it doesn't
> perform wholesale named graph deletions, etc.
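> For instance, with one named graph per user (the graph and user IRIs
> below are hypothetical), deleting a registered user can be two SPARQL
> 1.1 Update operations:
>
> ```sparql
> # Drop everything asserted inside the user's dedicated graph ...
> DROP SILENT GRAPH <http://example.com/graph/user/alice> ;
> # ... then remove remaining references to the user in other graphs.
> DELETE WHERE { GRAPH ?g { ?s ?p <http://example.com/user/alice> } }
> ```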
> > Removing
> > a single triple is not enough. Storing the user in a named graph may help
> > but probably creates other problems and definitely makes querying a lot more
> > complicated.
> > What about SPARQL transactions? Starting a transaction, reading, and
> > committing the transaction.
> We are a full-blown ACID DBMS. See our benchmark reports. These simply
> aren't new issues for us, since we have a hybrid DBMS.
> > Is there a triple store that supports this with
> > all the fidelity of modern RDB systems ?
> Yes. It's called Virtuoso!
> > > I say that because we simply don't have that problem in our hybrid DBMS.
> > I don't know what that really means. Can I modify data with SPARQL *and* SQL
> > in your DBMS ? If yes, how does that work ?
> Of course you can. We support SPARQL 1.1 Update. We are SQL-99
> compliant. We do ACID. We have serious customers doing OLTP-like stuff
> using the RDF or SQL aspects of Virtuoso.
> 1. http://bit.ly/ZOCmaD -- Star Schema Benchmark report, showing we
> even have the performance difference between SPARQL and SQL down to
> insignificant levels
> 2. http://bit.ly/10pvAbF -- blog post about this effort
> 3. http://bit.ly/Yf5etP -- Berlin SPARQL Benchmark Report (note: this
> particular benchmark is SQL relational DBMS oriented)
> 4. http://bit.ly/14ULX2F -- 150 Billion triples scale report
> 5. http://bit.ly/RtdGjA -- CoRelational DBMS Concepts post that
> includes live links to R2RML Views built atop SQL data
> 6. http://bit.ly/13fnIbr -- example of R2RML views atop an Oracle DBMS
> hooked into Virtuoso via ODBC.
> > Regards,
> > Michael Brunnbauer
> > _________________________________________________________________
> > Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> > Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> > Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> > Shared Files: http://ontolog.cim3.net/file/
> > Community Wiki: http://ontolog.cim3.net/wiki/
> > To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
> Kingsley Idehen
> Founder & CEO
> OpenLink Software
> Company Web: http://www.openlinksw.com
> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
> Twitter/Identi.ca handle: @kidehen
> Google+ Profile: https://plus.google.com/112399767740508618350/about
> LinkedIn Profile: http://www.linkedin.com/in/kidehen