
Re: [ontolog-forum] ONTOLOG community event planning and scheduling sess

To: "[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "Barkmeyer, Edward J" <edward.barkmeyer@xxxxxxxx>
Date: Thu, 12 Sep 2013 16:27:49 +0000
Message-id: <a5e51926677e41e6ac8a6c7bdb01c1c6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>

First of all, I agree strongly with John Sowa (for a change).  “Extended RDBMS” are what the world runs on, and they have survived two fad replacement technologies.  RDF will simply be the third.  As John says, RDF can be a data representation at the interface to a data repository, but it provides no fundamentally different value.  SPARQL is just another query language, and a good “extended RDBMS” can run SPARQL queries. 

 

The children who think they are reinventing data access with RDF triple stores need to understand that they are comparing a 5th normal form relational database with a 3rd normal form relational database, and 5th normal form is a more restricted specialization of 3rd normal form.  (Restating the formal mathematical definitions:) In 3NF, the table usually names a class, some set of columns identifies a subject that is an instance of the class, and other sets of one or more columns state individual facts about that subject.  In 5th normal form, the table usually names a property, one column designates the subject, and the other column, if any, designates a value of the property for that subject.  RDF is 5th normal form.  The advantage of 5th normal form is that it facilitates joins in multidatabases, which is precisely the intent of its use in the erstwhile “Semantic Web” and “Linked Open Data”.  The disadvantage of 5th normal form is that you have to do a lot of joins to answer simple queries, and joins are expensive in large databases.  As Stonebraker and others in the mid-1980s pointed out, it is useful to convert selected 3rd normal form tables to 5th normal form for query-specific multidatabase joins (“distributed queries”), in order to deal with the problem of “partitioning” in multidatabases (an individual database can have some of the facts about a given set of things, or all of the facts about a subset of the things, and multiple databases can overlap in both ways).  In latest-and-greatest terms, that is to say: it is useful to convert selected database rows to RDF triples in order to answer certain queries. 
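
To make the row-to-triple conversion concrete, here is a minimal sketch in Python.  The table, the column names and the data are all invented for illustration, and each property is assumed to be single-valued; it is not how any particular triple store does it.  It shows one class-style row becoming one triple per non-key column, and a simple question then needing a join on the subject to put the facts back together.

```python
# Hypothetical class-style table: one row per product, keyed by "id".
products = [
    {"id": "p1", "name": "solvent", "package": "barrel", "volume_l": 200},
    {"id": "p2", "name": "resin", "package": "barrel", "volume_l": 150},
]


def rows_to_triples(rows, key):
    """Turn each row into (subject, property, value) triples, one per non-key column."""
    return [
        (row[key], column, value)
        for row in rows
        for column, value in row.items()
        if column != key
    ]


def property_table(triples, prop):
    """Return {subject: value} for one property (assumes single-valued properties)."""
    return {s: v for s, p, v in triples if p == prop}


def names_with_volume_over(triples, limit):
    """Answer a one-table question by joining two property tables on the subject."""
    names = property_table(triples, "name")
    volumes = property_table(triples, "volume_l")
    return sorted(names[s] for s, v in volumes.items() if v > limit and s in names)


triples = rows_to_triples(products, "id")
print(names_with_volume_over(triples, 160))
```

Against the original table the same question is a single scan with no join; against the triples, every additional property the query touches adds another join on the subject.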

 

On the other hand, RDF is particularly clumsy for dealing with data that is best represented in 4th normal form, such as a statement of a quantified property.  (In 4NF, a row states one fact, but the identifiers for the subject and the object can be multiple columns.)  The ‘quantity’ object is represented as two ‘columns’: number and unit.  In RDF/5NF the quantity becomes a database key (ooh IRI, but ad hoc) with two more ‘assertions’ that relate it to a number and a unit.  (Since engineering is what my division does, this is important to us.)
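
A small sketch of the quantity problem, again in Python with invented names (the property names `hasNumber` and `hasUnit` and the way the node identifier is built are assumptions for illustration, not any standard vocabulary).  One row with a number column and a unit column becomes three triples, and reading the quantity back means following the ad hoc key.

```python
def quantity_row_to_triples(subject, prop, number, unit):
    """Expand one (subject, property, number, unit) row into three triples.

    The quantity gets an ad hoc identifier of its own, because a triple
    has only one slot for the value.
    """
    node = f"_:{subject}-{prop}"  # hypothetical identifier scheme
    return [
        (subject, prop, node),
        (node, "hasNumber", number),
        (node, "hasUnit", unit),
    ]


def read_quantity(triples, subject, prop):
    """Reassemble (number, unit): find the quantity node, then its two facts."""
    node = next(o for s, p, o in triples if s == subject and p == prop)
    facts = {p: o for s, p, o in triples if s == node}
    return facts["hasNumber"], facts["hasUnit"]


triples = quantity_row_to_triples("spool42", "length", 1500, "m")
print(read_quantity(triples, "spool42", "length"))
```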

 

As John says, the right way forward is to see RDF as a standard representation for 5th normal form relational rows, and SPARQL as a query language that augments the capabilities of SQL (not as a replacement for it).  The real problem that neither solves is to get agreement on vocabulary and on the interpretation of individual data.

 

Now, as to persons in industry who think their relational database systems are “legacies”, some of them are right.  Others are why the IT industry is permitted to waste billions of dollars/euros/yen on fad technologies and repainting of old ideas. 

 

What makes a system a “legacy” is not the technology used, unless that technology is no longer supported, but rather its relevance to the way you currently do business.  There were a lot more database designers in the 1980s and 1990s than there were competent modelers who could build properly extensible conceptual schemas.  So, a lot of the purpose-built databases were brittle designs at the outset and became part of the problem as the business practice moved on.  It is poor fault analysis to say this is a consequence of the technology, without determining that the fault was not in the design.  (“A poor workman blames his tools.”)   (If your database and your processing software assume that all of your products will be sold in barrels, by volume, and you subsequently get into the business of synthetic fibre, which is sold in spools, by length, is your legacy problem the fact that you used an RDBMS?)

 

Doing a new poor design with new technology just creates more expense and a new legacy system with a shorter lifecycle.  This is particularly the case when the would-be designers have little prior experience in what makes a good and flexible model, and mistakenly believe that their new technology will make up for their personal incompetence.  (I have seen a lot of weak or downright bad OWL models, and the problem is not in the technology.)  Database design, whatever the target technology, is a SKILL.  You have to learn the skill, and it involves understanding the technology, becoming familiar with the business problem space and the intended usages, and learning the art of “abstraction”, the “art of design”.  In most failed database projects, the devil is not in the details, but rather in the overall conceptualization, or the lack of one.

 

The grave danger here is that we must teach the emerging workforce to use the solid technologies that are in use in industry in good designs, so that those technologies will continue to be supported.  We can allow the technologies that have fallen out of use, in favor of clearly better ones, to die.  But we should not allow pursuit of fads to destroy the future support for viable technologies that are in use.  The software industry has 60 years of experience.  If the new workforce only knows about the last 10, we have a serious education problem.  This industry desperately needs to examine new technologies in the light of older technologies and ask what is really different and how that is better; otherwise it spends a lot of time and money relearning the same lessons. 

 

In so many words, RDF and RDBMS are closely related technologies, and neither SQL nor RDF is the solution to any problem.  They are tools, and a good workman will figure out which to use and how, when dealing with a specific problem. 

 

-Ed

 

(My blog of the week...)

 

 

--

Edward J. Barkmeyer                     Email: edbark@xxxxxxxx

National Institute of Standards & Technology

Systems Integration Division

100 Bureau Drive, Stop 8263             Work:   +1 301-975-3528

Gaithersburg, MD 20899-8263             Mobile: +1 240-672-5800

 

"The opinions expressed above do not reflect consensus of NIST,

 and have not been reviewed by any Government authority."

 

 

 

 

From: ontolog-forum-bounces@xxxxxxxxxxxxxxxx [mailto:ontolog-forum-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Juan Sequeda
Sent: Thursday, September 12, 2013 10:31 AM
To: [ontolog-forum]
Subject: Re: [ontolog-forum] ONTOLOG community event planning and scheduling session - Thu 2013.09.12 & Thu 2013.09.19

 

 

 

On Thu, Sep 12, 2013 at 8:33 AM, John F Sowa <sowa@xxxxxxxxxxx> wrote:

On 9/11/2013 10:48 PM, Juan Sequeda wrote:
> If you consider RDBMSs legacy systems, then we have the W3C RDB2RDF
> standards: Direct Mapping and R2RML, to bridge RDBMSs with the Semantic Web

Points to remember:

  1. RDBMSes are definitely *not* legacy systems.  They run the world
     economy today, and they'll continue running it for another 40 years.

 

Fair enough.

 

So what is the definition of legacy system? 

 

In my (short) experience, talking to real businesses and customers, many people consider RDBMSs as legacy systems. 

 

All in all, people want to bridge a previous technology (RDBMS) with new technology (Semweb). 

 

 

 


  2. RDB2RDF and Direct Mapping are throwbacks to ancient times of batch
     processing, and they only go in one direction.  They do *nothing*
     for interoperability.  In fact, they are *worse* than nothing,
     because they create an *illusion* of usefulness.

 

 

There are two modes of operation. ETL or Wrappers. 

 

With ETL, you have (if I understand correctly) what you are calling "batch processing". With wrappers, the data doesn't move. It continues to be in the RDBMS, and can be queried by SQL and SPARQL.
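
As a rough illustration of the wrapper mode only (a sketch using Python's sqlite3 module with an invented `product` table and an invented `triples` view; it is not a description of R2RML, Direct Mapping, or any product's implementation), a view can present the rows as subject/property/value triples while the data stays in the base table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT, package TEXT);
    INSERT INTO product VALUES ('p1', 'solvent', 'barrel'), ('p2', 'fibre', 'spool');

    -- The 'wrapper': each non-key column is exposed as (s, p, o) rows.
    -- Nothing is copied; the view is evaluated against the base table.
    CREATE VIEW triples AS
        SELECT id AS s, 'name' AS p, name AS o FROM product
        UNION ALL
        SELECT id AS s, 'package' AS p, package AS o FROM product;
    """
)


def names_packaged_as(package):
    """Triple-style query against the view: a self-join on the subject."""
    rows = conn.execute(
        """
        SELECT n.o
        FROM triples AS n
        JOIN triples AS k ON n.s = k.s
        WHERE n.p = 'name' AND k.p = 'package' AND k.o = ?
        ORDER BY n.o
        """,
        (package,),
    )
    return [row[0] for row in rows]


def names_packaged_as_sql(package):
    """The same question asked directly of the base table in plain SQL."""
    rows = conn.execute(
        "SELECT name FROM product WHERE package = ? ORDER BY name", (package,)
    )
    return [row[0] for row in rows]


print(names_packaged_as("spool"), names_packaged_as_sql("spool"))
```

A row inserted into `product` afterwards is visible through the view immediately, which is the point of contrast with an ETL copy.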

 


  3. The term NoSQL originally meant "no" as in "not exists".  But it
     quickly became an acronym for "Not only SQL".  Some of the most
     efficient NoSQL systems use SQL as their primary query language,
     but they implement SQL with new data structures and algorithms.

 

Our system, Ultrawrap, takes full advantage of the SQL infrastructure. We push down SPARQL optimization to the SQL optimizer. The result: the RDBMS successfully optimizes SPARQL, and query execution for SPARQL is comparable, and sometimes equal, to that for SQL.

 

 


  4. The major vendors of *commercial* SQL-based and RDF-based tools have
     a far deeper and more successful understanding of interoperability:
     They support *both* SQL and SPARQL, and they enable them to run
     *concurrently* on the same data.

 

I agree. See above. 

 


Fundamental requirement:  Equal support for SQL and SPARQL.

 

I agree. See above.   


Recommendation:  Adopt Datalog (or a typed version of Datalog) as
the fundamental DB language, and specify the core of *both* SQL and
SPARQL in terms of Datalog.  But each of them has idiosyncratic
additions; they must be supported, but they should be deprecated.

 

I completely agree! And actually, it has been proven that SPARQL and non-recursive safe Datalog with negation have equivalent expressive power. Therefore, by classical results, SPARQL is equivalent from an expressive point of view to Relational Algebra (SQL). 
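
A toy example of that correspondence, sketched in Python with invented triples (the SPARQL and Datalog forms appear only as comments, and the property names are assumptions).  The same question, "subjects that have a name but no package", is written as a query with negation, as a non-recursive safe Datalog rule, and as a relational-algebra difference of two projections:

```python
# Hypothetical triples (subject, property, value).
TRIPLES = {
    ("p1", "name", "solvent"),
    ("p1", "package", "barrel"),
    ("p2", "name", "fibre"),
}

# SPARQL 1.1 form (shown for comparison, not executed here):
#   SELECT ?s WHERE { ?s :name ?n FILTER NOT EXISTS { ?s :package ?p } }
#
# Non-recursive safe Datalog with negation:
#   has_package(S) :- triple(S, package, _).
#   answer(S)      :- triple(S, name, _), not has_package(S).


def project_subjects(triples, prop):
    """Selection on the property, then projection onto the subject column."""
    return {s for s, p, _ in triples if p == prop}


def named_but_unpackaged(triples):
    """Relational algebra: subjects with a name MINUS subjects with a package."""
    return project_subjects(triples, "name") - project_subjects(triples, "package")


print(named_but_unpackaged(TRIPLES))
```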

 

 


For further discussion of the issues, see the following article by
Michael Stonebraker in the Communications of the ACM:

http://www.labouseur.com/courses/db/Stonebraker-on-NoSQL-2011.pdf

The db directory of this website also contains other downloads that
address related topics:

http://www.labouseur.com/courses/db/

 

