
Re: [ontolog-forum] ONTOLOG community event planning and scheduling sess

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Fri, 13 Sep 2013 08:37:01 -0400
Message-id: <523306ED.2010805@xxxxxxxxxxxxxx>

On 9/13/13 4:23 AM, Michael Brunnbauer wrote:
Hello Ed,

I agree wholeheartedly. RDF and SPARQL make data integration easier (without
solving the fundamental issues of course).

What is the fundamental issue, as you see it?

I see the fundamental issue (or pain point) being data-de-silo-fication.

DBMS products weren't historically built to co-exist. They were built to compete against each other on the basis of core DBMS features (for which interop is an afterthought).

 But they are a bad option for data 
storage because maintaining consistency is so difficult (think about deleting
a row or transactions). 

I don't know what that really means. I say that because we simply don't have that problem in our hybrid DBMS.

There should be a big warning sign above the SPARQL 
UPDATE standard for those who think relational databases are legacy. But I 
think nobody in the Semantic Web video or on this list actually said that?

I don't think SQL RDBMS technology is legacy. At the same time, I don't think SQL RDBMS products are at the apex of the value pyramid circa 2013, where our main challenge is access, integration, and management of disparate data.

A fast silo is still a silo. We need to deal with data-de-silo-fication.



Michael Brunnbauer

On Thu, Sep 12, 2013 at 04:27:49PM +0000, Barkmeyer, Edward J wrote:
First of all, I agree strongly with John Sowa (for a change).  "Extended RDBMS" are what the world runs on, and they have survived two fad replacement technologies.  RDF will simply be the third.  As John says, RDF can be a data representation at the interface to a data repository, but it provides no fundamentally different value.  SPARQL is just another query language, and a good "extended RDBMS" can run SPARQL queries.

The children who think they are reinventing data access with RDF triple stores need to understand that they are comparing a 5th normal form relational database with a 3rd normal form relational database, and 5th normal form is a more restricted specialization of 3rd normal form.  (Restating the formal mathematical definitions) In 3NF, the table usually names a class, some set of columns identifies a subject that is an instance of the class, and other sets of one or more columns state individual facts about that subject.  In 5th normal form, the table usually names a property, one column designates the subject, and the other column, if any, designates a value of the property for that subject.  RDF is 5th normal form.  The advantage of 5th normal form is that it facilitates joins in multidatabases, which is precisely the intent of its use in the erstwhile "Semantic Web" and "Linked Open Data".  The disadvantage of 5th normal form is that you have to do a lot of joins to answer simple queries, and joins are expensive in large databases.  As Stonebraker and others in the mid-1980s pointed out, it is useful to convert selected 3rd normal form tables to 5th normal form for query-specific multidatabase joins ("distributed queries"), in order to deal with the problem of "partitioning" in multidatabases (an individual database can have some of the facts about a given set of things, or all of the facts about a subset of the things, and multiple databases can overlap in both ways).  In latest-and-greatest terms, that is to say, it is useful to convert selected database rows to RDF triples in order to answer certain queries.
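
To make the row-versus-triple comparison above concrete, here is a minimal sketch in Python.  It is only an illustration under stated assumptions: the table name "Product", its columns, and the helper names row_to_triples and triples_to_row are all hypothetical and appear nowhere in this thread, and plain tuples stand in for RDF triples (no real RDF library is used).

```python
from typing import Dict, List, Tuple

# A triple is (subject, property, value); plain strings stand in for IRIs.
Triple = Tuple[str, str, str]


def row_to_triples(table: str, key_column: str, row: Dict[str, str]) -> List[Triple]:
    """Split one wide row (class-per-table style) into one fact per triple."""
    subject = f"{table}/{row[key_column]}"  # hypothetical identifier scheme
    triples: List[Triple] = [(subject, "type", table)]
    for column, value in row.items():
        if column != key_column:
            triples.append((subject, column, value))
    return triples


def triples_to_row(triples: List[Triple], subject: str) -> Dict[str, str]:
    """Reassemble the facts about one subject.

    In a triple store, each property recovered here corresponds to a
    separate lookup or join on the subject.
    """
    return {
        prop: value
        for s, prop, value in triples
        if s == subject and prop != "type"
    }


# Hypothetical sample data, for illustration only.
product_row = {"id": "p42", "name": "solvent", "package": "barrel"}
product_triples = row_to_triples("Product", "id", product_row)
rebuilt_row = triples_to_row(product_triples, "Product/p42")
```

One row with two non-key columns becomes three triples, and getting the original row back means collecting one property at a time, which is the "lot of joins" cost described above.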

On the other hand, RDF is particularly clumsy for dealing with data that is best represented in 4th normal form, such as a statement of a quantified property.  (In 4NF, a row states one fact, but the identifiers for the subject and the object can be multiple columns.)  The 'quantity' object is represented as two 'columns': number and unit.  In RDF/5NF the quantity becomes a database key (ooh IRI, but ad hoc) with two more 'assertions' that relate it to a number and a unit.  (Since engineering is what my division does, this is important to us.)
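
The quantity case above can be sketched the same way.  Again this is an illustration under assumptions, not anyone's actual schema: the property names "numericValue" and "unit", the node identifier, and the sample spool are hypothetical, and tuples stand in for RDF triples.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def quantity_as_row(subject: str, prop: str, number: float, unit: str) -> Tuple[str, str, float, str]:
    """One row states the whole fact; the object spans two columns (number, unit)."""
    return (subject, prop, number, unit)


def quantity_as_triples(subject: str, prop: str, number: float, unit: str,
                        node_id: str) -> List[Triple]:
    """The same fact in triple form.

    The quantity becomes its own ad hoc node, plus two more assertions
    that relate that node to a number and a unit.
    """
    return [
        (subject, prop, node_id),
        (node_id, "numericValue", str(number)),  # hypothetical property name
        (node_id, "unit", unit),                 # hypothetical property name
    ]


# Hypothetical sample: a spool of fibre sold by length.
spool_row = quantity_as_row("spool/7", "length", 250.0, "metre")
spool_triples = quantity_as_triples("spool/7", "length", 250.0, "metre", "quantity/1")
```

A single row carries the quantified property, while the triple form needs three assertions and an extra identifier to say the same thing.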

As John says, the right way forward is to see RDF as a standard representation for 5th normal form relational rows, and SPARQL as a query language that augments the capabilities of SQL (not as a replacement for it).  The real problem that neither solves is to get agreement on vocabulary and on the interpretation of individual data.

Now, as to persons in industry who think their relational database systems are "legacies", some of them are right.  Others are why the IT industry is permitted to waste billions of dollars/euros/yen on fad technologies and repainting of old ideas.

What makes a system a "legacy" is not the technology used, unless that technology is no longer supported, but rather its relevance to the way you currently do business.  There were a lot more database designers in the 1980s and 1990s than there were competent modelers who could build properly extensible conceptual schemas.  So, a lot of the purpose-built databases were brittle designs at the outset and have become as much a part of the problem as the business practice moved on.  It is poor fault analysis to say this is a consequence of the technology, without determining that the fault was not in the design.  ("A poor workman blames his tools.")   (If your database and your processing software assume that all of your products will be sold in barrels, by volume, and you subsequently get into the business of synthetic fibre, which is sold in spools, by length, is your legacy problem the fact that you used an RDBMS?)

Doing a new poor design with new technology just creates more expense and a new legacy system with a shorter lifecycle.  This is particularly the case when the would-be designers have little prior experience in what makes a good and flexible model, in the mistaken belief that their new technology will make up for their personal incompetence.  (I have seen a lot of weak or downright bad OWL models, and the problem is not in the technology.)  Database design, whatever the target technology, is a SKILL.  You have to learn the skill, and it involves understanding the technology, becoming familiar with the business problem space and the intended usages, and learning the art of "abstraction", the "art of design".  In most failed database projects, the devil is not in the details, but rather in the overall conceptualization, or the lack of one.

The grave danger here is that we must teach the emerging workforce to use the solid technologies that are in use in industry in good designs, so that those technologies will continue to be supported.  We can allow the technologies that have fallen out of use, in favor of clearly better ones, to die.  But we should not allow pursuit of fads to destroy the future support for viable technologies that are in use.  The software industry has 60 years of experience.  If the new workforce only knows about the last 10, we have a serious education problem.  This industry desperately needs to examine new technologies in the light of older technologies and ask what is really different and how that is better; otherwise it spends a lot of time and money relearning the same lessons.

In so many words, RDF and RDBMS are closely related technologies, and neither SQL nor RDF is the solution to any problem.  They are tools, and a good workman will figure out which to use and how, when dealing with a specific problem.


(My blog of the week...)

Edward J. Barkmeyer                     Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Systems Integration Division
100 Bureau Drive, Stop 8263             Work:   +1 301-975-3528
Gaithersburg, MD 20899-8263             Mobile: +1 240-672-5800

"The opinions expressed above do not reflect consensus of NIST,
 and have not been reviewed by any Government authority."


Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J



Kingsley Idehen	      
Founder & CEO 
OpenLink Software     
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

