ontolog-forum

Re: [ontolog-forum] Neo4J for Ontology

To: edbark@xxxxxxxx, ontolog-forum@xxxxxxxxxxxxxxxx
From: Peter Neubauer <peter.neubauer@xxxxxxxxxxxxxxxxx>
Date: Tue, 24 Apr 2012 17:30:01 +0200
Message-id: <CAF59RW6rN0r-4LVZ4x0dqYFCOp9-yHusc5hBHp9p1uTZg6HWuA@xxxxxxxxxxxxxx>

+1,
Choosing the right tool for the job is the tenor of the whole NoSQL movement.

On Apr 24, 2012 5:16 PM, "Ed Barkmeyer" <edbark@xxxxxxxx> wrote:


Duane Nickull wrote:
Personally, tightly binding the function to the persistence form is not a good idea architecturally, for ontology or otherwise.  An abstraction can be healthy.  Spring is awesome for this.

Abstractions are indeed healthy.  That is why we have modeling languages of various kinds.  Binding the function to a persistent form, however, is a necessary part of the concept "implementation".  From the abstraction alone, you can't get results.  A "graph DB" is an implementation form.  It is just a different implementation form from that of an RDB, and its models use a somewhat different abstraction.

In a similar way, the OODB folk felt that having a persistent structure that essentially matched their abstract object-oriented analysis model was clearly the optimal implementation form, since it "naturally" fit the problem space model they made.  Unfortunately, that illusion was shattered when somebody asked a question from a somewhat different viewpoint on the same space.

What I like about graph DBs is that they seem to be somewhat aligned with the type of queries used for languages like KIF or JSON expressions of it.  I have been experimenting and found I like them a lot more than a conventional RDBMS, as there is often no ETL layer in the middle.

Having said that, this is IMO and YMMV.

I would be interested to know if there are any ontology-specific projects that use graph DBs, specifically Neo4J, for the back end.  I see this as a potentially huge win for the intelligence community, but companies like Boeing, who don't sell airplanes (they sell X million individual parts conveniently assembled as an airplane), could also gain from the functionality.

I would have said that every RDF triple store is a "graph DB", especially when the query language is SPARQL.
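
A minimal sketch of that view, assuming a toy in-memory store rather than any real triple store or SPARQL engine.  The triples, the predicate names, and the helpers `match`, `as_graph`, and `reachable` are all hypothetical and chosen only for illustration.  It shows the same triples answering a pattern query (None plays the role of a query variable) and a graph traversal.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); illustrative only.
TRIPLES = [
    ("Wing", "partOf", "Airplane"),
    ("Flap", "partOf", "Wing"),
    ("Rivet", "partOf", "Flap"),
    ("Airplane", "madeBy", "Boeing"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def as_graph(triples):
    """View the triples as a labeled directed graph: node -> [(label, node)]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)


def reachable(graph, start, label):
    """Follow edges carrying the given label transitively from start."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        for lbl, target in graph.get(node, []):
            if lbl == label and target not in seen:
                seen.append(target)
                stack.append(target)
    return seen


if __name__ == "__main__":
    print(match(TRIPLES, p="partOf", o="Wing"))
    print(reachable(as_graph(TRIPLES), "Rivet", "partOf"))
```

Nothing is converted between the two calls: the pattern query and the traversal read the same list of triples, which is the sense in which the store already is a graph.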

There are ways to do things in both worlds.  I am not knocking traditional RDBMS systems but rather pointing to the fact that Neo4J has really impressed me.

Which means that, for the problem Duane posed, Neo4J is probably a good tool.

All Len and I are saying is that this is about choosing the right tool for the job at hand, not about the intrinsic superiority of one kind of tool over another. 

-Ed


Duane
******************
COO and Director 
Uberity Technology Corporation 
"LiveCycle ES and Mobile Specialists"
@uberity @duanechaos

From: lenyabloko <lenyabloko@xxxxxxxxx>
Reply-To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
Date: Monday, 23 April, 2012 6:25 PM
To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
Subject: Re: [ontolog-forum] Neo4J for Ontology



----- Reply message -----
From: "Ed Barkmeyer" <edbark@xxxxxxxx>
To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
Subject: [ontolog-forum] Neo4J for Ontology
Date: Mon, Apr 23, 2012 12:43 pm


Len Yabloko wrote:

> I am sure you are aware of a deeper issue that makes SQL and Graph DB fundamentally 
> different, that is, complete axiomatization of SQL as well as referential integrity.
> Reasoning is the ultimate application for both, with query being only one form of it. 

In the last 20 years, it seems to have become a requirement for somebody 
to "completely axiomatize" every computational modeling language, so as 
not to be left behind in the claims of "formal grounding".  Java is also 
formally grounded. So what?
LY:
The "formal grounding" is not an objective in itself. Completeness becomes a critical issue when the goal of computation is answering a question, as opposed to pre-fetching all related information. 
What graph DBs and relational DBs have in common is their ability to 
assist in retrieving information for decision purposes.  As Len 
suggests, the great advantage of RDBs (and SQL compliance) is that they 
regularize the process of information creation and information update, 
thus guaranteeing consistency and reliability of most retrievals.  The 
great advantage of graph DBs is flexibility -- the information in them 
can grow rapidly and somewhat unpredictably -- but the consequence is 
that it is very difficult to prune outdated information and to prevent 
the addition of incomplete and inconsistent information that may confuse 
critical applications.  (I think this is John's point, expressed from a 
different viewpoint.)

Both of these technologies are useful for big databases.  What should 
decide the choice of technology is what you want to do with the data.  
There is also a lot of practical business reasoning that now uses 
hybrid database support.  The original operational data is maintained in 
the RDBs, and supports business operations effectively and reliably.  
Parts of that data are converted regularly or ad hoc to graph DB forms 
to support other associative queries and queries that require inferences. 
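
A rough sketch of that hybrid arrangement, using Python's standard sqlite3 module as a stand-in for the operational RDB.  The `part` and `contains` tables, the sample rows, and the helpers `build_rdb`, `export_graph`, and `descendants` are hypothetical.  A real deployment would export to an actual graph store rather than a dictionary.

```python
import sqlite3
from collections import defaultdict, deque


def build_rdb():
    """Operational store: the schema regularizes creation and update."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity
    conn.execute("CREATE TABLE part (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE contains ("
        " parent TEXT NOT NULL REFERENCES part(id),"
        " child TEXT NOT NULL REFERENCES part(id))"
    )
    conn.executemany(
        "INSERT INTO part VALUES (?)",
        [("airplane",), ("wing",), ("flap",), ("engine",)],
    )
    conn.executemany(
        "INSERT INTO contains VALUES (?, ?)",
        [("airplane", "wing"), ("wing", "flap"), ("airplane", "engine")],
    )
    return conn


def export_graph(conn):
    """Convert the relational rows into an adjacency-list graph form."""
    graph = defaultdict(list)
    rows = conn.execute("SELECT parent, child FROM contains ORDER BY rowid")
    for parent, child in rows:
        graph[parent].append(child)
    return dict(graph)


def descendants(graph, start):
    """Associative query: everything reachable from start, breadth first."""
    seen, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    conn = build_rdb()
    print(descendants(export_graph(conn), "airplane"))
```

In this sketch the relational side rejects a `contains` row that names an unknown part, while the exported graph answers the "everything under X" question without a recursive join.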
LY:
This is the crucial point. Inferences are not always faster in a Graph DB; only some very limited types of questions can be answered faster. If you compare performance using the same set of generic queries, the SQL DB will perform better. This is why there are attempts to combine SQL with NoSQL into so-called co-SQL. 
  
It takes a certain amount of enlightenment to see the value in the new 
screwdriver alongside your tried-and-true hammer, and to realize at the 
same time that the guy who tells you his new screwdriver will completely 
replace hammers is a nitwit. The IT industry as a whole seems to have a 
really hard time seeing technologies as incremental and complementary 
rather than revolutionary.

  
LY:
I agree. The main challenge is to combine different logics and different engines efficiently. In that regard ontologies may be a perfect place to do that.
-Ed

-- 
Edward J. Barkmeyer                        Email: edbark@xxxxxxxx
National Institute of Standards & Technology
Manufacturing Systems Integration Division
100 Bureau Drive, Stop 8263                Tel: +1 301-975-3528
Gaithersburg, MD 20899-8263                Cel: +1 240-672-5800

"The opinions expressed above do not reflect consensus of NIST, 
 and have not been reviewed by any Government authority."




>   
> Duane,
>
> I'd just like to comment on the following point:
>
> JFS
> >> I'm sure that things like Neo4J can be useful for many applications,
> >> but if you really have large graphs and large numbers of graphs,
> >> you need to index the data.  And there are some very good methods
> >> for indexing and finding exact and approximate matches.
>
> > DN: Yes.  This is why they are used for big data sets like FaceBook and
> > Google.  Each user starts their social graph with an index of themselves.
> > The traversal then finds other nodes related to that user.  The advantage
> > here is also the flexibility offered by graph databases IMO.  RDBMS is
> > much more inflexible as the schema has to be changed to add a single
> > property for an object.  Graph DB's allow one property to be added then,
>
> That is the kind of application for which graph traversal is good:
>
>   1. You have a pointer to a clearly defined starting point, such as
>      a web page for a specific individual.  For info about a specific
>      individual, FaceBook is a *structured* DB for which they have
>      predefined specific categories that they use for well-worn
>      branches in the graph.  Most paths are short, and people seldom
>      ask complex queries.
>
>   2. RDBMS is a different kind of structured data, and I certainly do not
>      intend to support all the encrustations and limitations that have
>      evolved over the years in RDBMS.  Nor do I endorse all the quirks
>      and peculiarities of SQL, which I used to call "the worst notation
>      for logic that has ever been inflicted on innocent users" -- but
>      that was before I saw RDF and OWL.
>
>   3. As for physical layout, graphs and tables are two logically
>      equivalent choices -- anything you can store in one can be mapped
>      to the other.  That is an implementation choice.  In terms of
>      matrices, a densely populated matrix is best stored in table form,
>      and a sparse matrix is best stored in a graph form.
>
>   4. Ideally, the users (both programmers and end users) should not need
>      to know or care about the implementation.  For all its flaws, SQL is
>      far better than graph traversal for complex queries.  Most
>      object-oriented DBs offer programmers a choice of SQL vs native
>      path-based methods.  And most of them choose SQL.
>
>   5. I also agree that flexibility is extremely important.  A major
>      complaint about RDBMS is the need for a DB administrator to define
>      a schema in advance. But note that casual users love *spreadsheets*
>      for dense data.  Their table headings are a rudimentary,
>      easy-to-change schema, and users love the simplicity of a
>      rectangular grid.
>
>   6. None of the issues listed above are new.  They were very thoroughly
>      discussed and analyzed during the "DB wars" of the 1970s, and there
>      have been 40 more years of R & D on all those issues.  My major
>      complaint about the SW is that they ignored all that R & D and
>      forced a one-size-fits-all format on everybody.
>
>   7. The SW notion of interoperability is to provide a mapping from
>      RDB to RDF.  But that is the *worst conceivable* approach.  It is
>      unbelievably inefficient for dense data, and it is vastly worse
>      than SQL for complex queries.  If anybody had suggested that method
>      at a VLDB conference in the 1980s or '90s they would have been
>      laughed out of the room in disgrace.
>
> > DN: I think a lot of the issues you note are also solved by better
> > modelling and indexing.  Nevertheless, it will be very interesting to
> > watch the 2012 growth of these Graph DB's.
>
> Graph DBs and RDBMS are both designed for professional programmers
> who are forced to dig into the implementation details.  The 40+ years
> of R & D on databases focused on implementation-independent methods.
> You don't even need any research studies to see why application
> programmers prefer JSON to RDF -- it's equally good for representing
> graphs, tables, or trees.
>
> I agree that for optimum performance on very large applications,
> such as FaceBook or Amazon, professional systems programmers need
> to get down into the bowels of the implementation.  Those systems
> provide good interfaces for casual users who go to their sites.
>
> But application programmers should *never* need to get into the
> details of the implementation.  And interoperability across
> independently developed systems should *always* be at a level
> that is independent of the implementation.  That is the point
> of the following paper and slides:
>
>     http://www.jfsowa.com/pubs/futures.pdf
>     Future directions in semantic systems
>
>     http://www.jfsowa.com/talks/iss.pdf
>     Integrating Semantic Systems
>
> John
>  
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/ 
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>  
>   

 