Duane,
I'd just like to comment on the following point:
JFS
>> I'm sure that things like Neo4J can be useful for many applications,
>> but if you really have large graphs and large numbers of graphs,
>> you need to index the data. And there are some very good methods
>> for indexing and finding exact and approximate matches.
> DN: Yes. This is why they are used for big data sets like FaceBook and
> Google. Each user starts their social graph with an index of themselves.
> The traversal then finds other nodes related to that user. The advantage
> here is also the flexibility offered by graph databases IMO. RDBMS is
> much more inflexible as the schema has to be changed to add a single
> property for an object. Graph DB's allow one property to be added then,
That is the kind of application for which graph traversal is good:
1. You have a pointer to a clearly defined starting point, such as
a web page for a specific individual. For info about a specific
individual, FaceBook is a *structured* DB for which they have
predefined specific categories that they use for well-worn
branches in the graph. Most paths are short, and people seldom
ask complex queries.
2. RDBMS is a different kind of structured data, and I certainly do not
intend to support all the encrustations and limitations that have
evolved over the years in RDBMS. Nor do I endorse all the quirks
and peculiarities of SQL, which I used to call "the worst notation
for logic that has ever been inflicted on innocent users" -- but
that was before I saw RDF and OWL.
3. As for physical layout, graphs and tables are two logically
equivalent choices -- anything you can store in one can be mapped
to the other. That is an implementation choice. In terms of
matrices, a densely populated matrix is best stored in table form,
and a sparse matrix is best stored in a graph form.
4. Ideally, the users (both programmers and end users) should not need
to know or care about the implementation. For all its flaws, SQL is
far better than graph traversal for complex queries. Most
object-oriented DBs offer programmers a choice of SQL vs native
path-based methods. And most programmers choose SQL.
5. I also agree that flexibility is extremely important. A major
complaint about RDBMS is the need for a DB administrator to define
a schema in advance. But note that casual users love *spreadsheets*
for dense data. Their table headings are a rudimentary, easy-to-
change schema, and users love the simplicity of a rectangular grid.
6. None of the issues listed above are new. They were very thoroughly
discussed and analyzed during the "DB wars" of the 1970s, and there
have been 40 more years of R & D on all those issues. My major
complaint about the Semantic Web (SW) is that its designers ignored
all that R & D and forced a one-size-fits-all format on everybody.
7. The SW notion of interoperability is to provide a mapping from
RDB to RDF. But that is the *worst conceivable* approach. It is
unbelievably inefficient for dense data, and it is vastly worse
than SQL for complex queries. If anybody had suggested that method
at a VLDB conference in the 1980s or '90s they would have been
laughed out of the room in disgrace.
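A minimal sketch of points 3 and 7 above, in Python. The table, the
column names, and the matrix size are made up for illustration, and the
"field count" is only a crude stand-in for storage cost, not a
measurement of any real RDB or RDF system. It shows that a table can be
mapped to (subject, property, value) triples and back without loss,
and that the triple form costs more for dense data and less for
sparse data.

```python
# Hypothetical dense table: every cell is filled.
columns = ["name", "dept", "salary"]
rows = [
    ("alice", "sales", 50000),
    ("bob", "sales", 45000),
    ("carol", "hr", 52000),
]


def table_to_triples(columns, rows):
    """Map each cell to a (row_id, column, value) triple, i.e. a labeled edge."""
    return [
        (row_id, col, val)
        for row_id, row in enumerate(rows)
        for col, val in zip(columns, row)
    ]


def triples_to_table(columns, triples):
    """Rebuild the rows from the triples; missing cells come back as None."""
    cells = {}
    for row_id, col, val in triples:
        cells.setdefault(row_id, {})[col] = val
    return [tuple(cells[i].get(c) for c in columns) for i in sorted(cells)]


triples = table_to_triples(columns, rows)

# The two forms are logically equivalent: the round trip loses nothing.
assert triples_to_table(columns, triples) == rows

# Dense data: the table stores 9 values, the triples store 27 fields,
# because every triple repeats the row id and the column name.
table_fields = len(rows) * len(columns)
triple_fields = 3 * len(triples)

# Sparse data: a hypothetical 1000 x 1000 matrix with 3 nonzero entries.
sparse_entries = {(0, 7): 1.5, (42, 42): 2.0, (999, 3): -1.0}
dense_cells = 1000 * 1000                # cells a full table would hold
graph_fields = 3 * len(sparse_entries)   # (row, col, value) per entry
```

In this toy case the dense table needs 9 fields against 27 for the
triples, while the sparse matrix needs 1,000,000 cells as a table
against 9 fields as a graph.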
> DN: I think a lot of the issues you note are also solved by better
> modelling and indexing. Nevertheless, it will be very interesting to
> watch the 2012 growth of these Graph DB's.
Graph DBs and RDBMS are both designed for professional programmers
who are forced to dig into the implementation details. The 40+ years
of R & D on databases focused on implementation-independent methods.
You don't even need any research studies to see why application
programmers prefer JSON to RDF -- it's equally good for representing
graphs, tables, or trees.
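As a small illustration of that last point, here is a sketch in Python
using the standard json module. The sample table, tree, and graph are
hypothetical, and the key names ("columns", "rows", "children",
"nodes", "edges") are just one convenient convention, not a standard.

```python
import json

# Hypothetical sample data, for illustration only.
table = {
    "columns": ["name", "dept"],
    "rows": [["alice", "sales"], ["bob", "hr"]],
}
tree = {
    "name": "root",
    "children": [
        {"name": "left", "children": []},
        {"name": "right", "children": []},
    ],
}
graph = {
    "nodes": ["a", "b", "c"],
    "edges": [["a", "b"], ["b", "c"], ["c", "a"]],  # a cycle, so not a tree
}

# One JSON document carries all three structures with the same notation.
document = {"table": table, "tree": tree, "graph": graph}
text = json.dumps(document, indent=2)

# Parsing the text gives back exactly the same structures.
restored = json.loads(text)
assert restored == document
```

The same nested lists and objects cover all three cases. The cyclic
graph is expressed by naming nodes and listing edges rather than by
nesting.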
I agree that for optimum performance on very large applications,
such as FaceBook or Amazon, professional systems programmers need
to get down into the bowels of the implementation. Those systems
provide good interfaces for casual users who go to their sites.
But application programmers should *never* need to get into the
details of the implementation. And interoperability across
independently developed systems should *always* be at a level
that is independent of the implementation. That is the point
of the following paper and slides:
http://www.jfsowa.com/pubs/futures.pdf
Future directions in semantic systems
http://www.jfsowa.com/talks/iss.pdf
Integrating Semantic Systems
John