Re: [ontolog-forum] Neo4J for Ontology

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Mon, 23 Apr 2012 08:35:38 -0400
Message-id: <4F954C9A.80508@xxxxxxxxxxx>
Duane,    (01)

I'd just like to comment on the following point:    (02)

>> I'm sure that things like Neo4J can be useful for many applications,
>> but if you really have large graphs and large numbers of graphs,
>> you need to index the data.  And there are some very good methods
>> for indexing and finding exact and approximate matches.    (03)

> DN: Yes.  This is why they are used for big data sets like FaceBook and
> Google.  Each user starts their social graph with an index of themselves.
> The traversal then finds other nodes related to that user.  The advantage
> here is also the flexibility offered by graph databases IMO.  RDBMS is
> much more inflexible as the schema has to be changed to add a single
> property for an object.  Graph DB's allow one property to be added then,    (04)

That is the kind of application for which graph traversal is good:    (05)

  1. You have a pointer to a clearly defined starting point, such as
     a web page for a specific individual.  For info about a specific
     individual, FaceBook is a *structured* DB for which they have
     predefined specific categories that they use for well-worn
     branches in the graph.  Most paths are short, and people seldom
     ask complex queries.    (06)

  2. RDBMS is a different kind of structured data, and I certainly do not
     intend to support all the encrustations and limitations that have
     evolved over the years in RDBMS.  Nor do I endorse all the quirks
     and peculiarities of SQL, which I used to call "the worst notation
     for logic that has ever been inflicted on innocent users" -- but
     that was before I saw RDF and OWL.    (07)

  3. As for physical layout, graphs and tables are two logically
     equivalent choices -- anything you can store in one can be mapped
     to the other.  That is an implementation choice.  In terms of
     matrices, a densely populated matrix is best stored in table form,
     and a sparse matrix is best stored in a graph form.    (08)

  4. Ideally, the users (both programmers and end users) should not need
     to know or care about the implementation.  For all its flaws, SQL is
     far better than graph traversal for complex queries.  Most object
     oriented DBs offer programmers a choice of SQL vs native path-based
     methods.  And most of them choose SQL.    (09)

  5. I also agree that flexibility is extremely important.  A major
     complaint about RDBMS is the need for a DB administrator to define
     a schema in advance. But note that casual users love *spreadsheets*
     for dense data.  Their table headings are a rudimentary, easy-to-
     change schema, and users love the simplicity of a rectangular grid.    (010)

  6. None of the issues listed above are new.  They were very thoroughly
     discussed and analyzed during the "DB wars" of the 1970s, and there
     have been 40 more years of R & D on all those issues.  My major
     complaint about the Semantic Web (SW) is that its designers ignored
     all that R & D and forced a one-size-fits-all format on everybody.    (011)

  7. The SW notion of interoperability is to provide a mapping from
     RDB to RDF.  But that is the *worst conceivable* approach.  It is
     unbelievably inefficient for dense data, and it is vastly worse
     than SQL for complex queries.  If anybody had suggested that method
     at a VLDB conference in the 1980s or '90s they would have been
     laughed out of the room in disgrace.    (012)
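
As a rough illustration of points 3 and 7 above, the following Python
sketch maps a small table to subject-predicate-object triples and back,
then counts the stored values in each form.  The data, column names, and
function names are all invented for this illustration; they are not taken
from any system mentioned in this thread, and the count is only a crude
proxy for storage cost.

```python
# Hypothetical example data: a dense table (every cell filled).
COLUMNS = ["name", "city", "age"]
DENSE_ROWS = {
    "p1": {"name": "Ann", "city": "Oslo", "age": 31},
    "p2": {"name": "Bob", "city": "Rome", "age": 45},
}

# The same shape with mostly empty cells: a sparse table.
SPARSE_ROWS = {
    "p1": {"name": "Ann", "city": None, "age": None},
    "p2": {"name": None, "city": None, "age": 45},
}


def table_to_triples(rows):
    """Map each non-empty cell to a (subject, predicate, object) triple."""
    return [
        (key, col, val)
        for key, row in rows.items()
        for col, val in row.items()
        if val is not None
    ]


def triples_to_table(triples):
    """Rebuild the row-oriented form from the triples."""
    rows = {}
    for subj, pred, obj in triples:
        rows.setdefault(subj, {})[pred] = obj
    return rows


def table_cells(rows, columns):
    """Values a rectangular table stores: a key plus one slot per column, per row."""
    return len(rows) * (1 + len(columns))


def triple_cells(triples):
    """Values a triple list stores: three per triple."""
    return 3 * len(triples)


dense_triples = table_to_triples(DENSE_ROWS)
sparse_triples = table_to_triples(SPARSE_ROWS)

# For the dense table the mapping is lossless in both directions.
assert triples_to_table(dense_triples) == DENSE_ROWS

print("dense:  table", table_cells(DENSE_ROWS, COLUMNS),
      "vs triples", triple_cells(dense_triples))
print("sparse: table", table_cells(SPARSE_ROWS, COLUMNS),
      "vs triples", triple_cells(sparse_triples))
```

With this toy data the dense table stores 8 values against 18 for the
triples, while the sparse table stores 8 against 6.  The subject and
predicate are repeated for every cell, which is cheap when most cells are
empty and wasteful when most are filled.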

> DN: I think a lot of the issues you note are also solved by better
> modelling and indexing.  Nevertheless, it will be very interesting to
> watch the 2012 growth of these Graph DB's.    (013)

Graph DBs and RDBMS are both designed for professional programmers
who are forced to dig into the implementation details.  The 40+ years
of R & D on databases focused on implementation-independent methods.
You don't even need any research studies to see why application
programmers prefer JSON to RDF -- it's equally good for representing
graphs, tables, or trees.    (014)
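
A minimal sketch of that last point, using only Python's standard json
module.  The field names ("nodes", "edges", "children") and the data are
hypothetical conventions chosen for this example; JSON itself does not
prescribe them.

```python
import json

# Hypothetical data; the field names are invented for this illustration.
table = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
]
tree = {"name": "root", "children": [{"name": "leaf", "children": []}]}
graph = {
    "nodes": ["a", "b", "c"],
    "edges": [["a", "b"], ["b", "c"], ["c", "a"]],  # a cycle, so not a tree
}

# One document carries all three shapes and survives a round trip.
document = {"table": table, "tree": tree, "graph": graph}
text = json.dumps(document, indent=2)
restored = json.loads(text)
assert restored == document
print(text)
```

The table is a list of records, the tree is nested objects, and the graph
is a node list plus an edge list.  None of the three needs a schema
declared in advance.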

I agree that for optimum performance on very large applications,
such as FaceBook or Amazon, professional systems programmers need
to get down into the bowels of the implementation.  Those systems
provide good interfaces for casual users who go to their sites.    (015)

But application programmers should *never* need to get into the
details of the implementation.  And interoperability across
independently developed systems should *always* be at a level
that is independent of the implementation.  That is the point
of the following paper and slides:    (016)

    Future directions in semantic systems    (017)

    Integrating Semantic Systems    (018)

John    (019)

