
Re: [ontolog-forum] Ontology-based database integration

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Paola Di Maio <paola.dimaio@xxxxxxxxx>
Date: Sat, 10 Oct 2009 13:31:32 +0100
Message-id: <4a4804720910100531q69d54dbp7fda800125527e83@xxxxxxxxxxxxxx>

I have sensed, in person, the trend toward the glorification of RDF over and beyond what it is (or what I think it is, anyway).
That's because a lot of people bank on 'hype', scientists and researchers included.

You may have seen me refer to this trend as 'indoctrination', by which I mean the propagation of a limited set of facts as absolute, without their historical context and perspectives
(thus disregarding everything else). On that, I am with you.

What worries me the most is that scores of otherwise unaware students and researchers of the younger generation are being brainwashed into thinking what you suggest below, rather than being taught the parameters to evaluate for themselves which technology is best.

But we should not fall into the opposite mistake either, by making sweeping unsupported statements.

Technologies should be fit for purpose (discussed on this list before).
SPARQL, from what I understand, fits its purpose (querying linked data sets) very nicely, certainly up to a point.
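
For readers new to SPARQL, its core idea is pattern matching over subject-predicate-object triples. Here is a minimal sketch in plain Python of what a basic graph pattern does (a toy triple store, not a real SPARQL engine; all data and names are illustrative):

```python
# Toy triple store.  A SPARQL-style basic graph pattern is a triple
# in which some positions are variables (here, strings starting
# with "?"); matching binds each variable to a value.

TRIPLES = [
    ("alice", "knows",   "bob"),
    ("bob",   "knows",   "carol"),
    ("alice", "worksAt", "acme"),
]

def match_pattern(pattern, triples):
    """Yield one binding dict per stored triple that matches the pattern."""
    for triple in triples:
        binding = {}
        for p, v in zip(pattern, triple):
            if p.startswith("?"):            # variable position: bind it
                if binding.get(p, v) != v:   # repeated var must agree
                    break
                binding[p] = v
            elif p != v:                     # constant: must match exactly
                break
        else:
            yield binding

# Analogue of:  SELECT ?x WHERE { ?x knows bob }
results = list(match_pattern(("?x", "knows", "bob"), TRIPLES))
```

A real engine adds joins over multiple patterns, indexes, and an optimizer, but the matching idea is the same.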

If you have data, for example, to show where SPARQL breaks down, and where it is best complemented or substituted by other approaches,
then you can only do a service to science, including 'semantic web science', if you publish that data.

I have also personally witnessed the research mafias (sorry folks, no disrespect intended toward the many good people who work for scientific cartels) deliberately disregarding (rejecting) whatever research may contradict, rather than promote, support, or validate, the work and approaches they propose (i.e. that does not profit them). So I would expect that even if you did write such a paper, you might have to struggle to get it published.

Then you would have to be prepared to throw yourself into the fray, fight, confront your ideas with others', and also be proven wrong when you are.

It's through our personal struggles that humanity as a whole makes progress; let's try to make it a fun exercise if possible?


On Sat, Oct 10, 2009 at 3:55 AM, John F. Sowa <sowa@xxxxxxxxxxx> wrote:
Paola, Kingsley, and Cecil,

Before commenting on your notes, I want to apologize for an
emotional moment when I used the phrase "profoundly foolish"
about people who are intelligent but had not given fair
consideration to the wide range of available DB technologies.

PDM> ... as I am learning the ropes, I see some benefits/advantages
 > of SPARQL (over SQL, for example).  It would be great if you could
 > produce some data to back up your statement above?

My major concern was not so much about the adoption of SPARQL as
one option among many, but the choice of that approach and the
use of triple stores in preference to the enormous range of DB
technologies that had been designed, developed, and implemented
over the past forty years.

I have no particular love for SQL, and I had been disappointed
that much better notations for relational DBs had been ignored.
My personal preference for a DB query, constraint, and rule
language would be Datalog, which can be viewed as a simplified
version of Prolog that is specialized for DB access.  In fact,
when Ted Codd, the inventor of relational DBs, first saw Prolog,
his immediate reaction was "I wish I had invented that."
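
To give a flavour of what Datalog looks like, here is the classic ancestor program and a toy bottom-up evaluator in Python (my own illustration, not a real Datalog engine; the facts are invented):

```python
# Naive bottom-up evaluation of the Datalog program:
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
# Facts are tuples; the rules are applied until no new facts
# appear (a fixpoint), which is how bottom-up engines work.

parent = {("ann", "bob"), ("bob", "cal"), ("cal", "dee")}

def ancestors(parent_facts):
    anc = set(parent_facts)                # rule 1: every parent edge
    while True:
        new = {(x, y)
               for (x, z) in parent_facts  # rule 2: join parent facts
               for (z2, y) in anc          # with current ancestor facts
               if z == z2} - anc
        if not new:
            return anc                     # fixpoint reached
        anc |= new

result = ancestors(parent)                 # the full transitive closure
```

Note how the rules say *what* an ancestor is, not *how* to traverse the data; the evaluation strategy is entirely the system's choice, which is the point being made here about query notations versus storage.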

But to support my point that the notation for a query language is
independent of the way the data is stored, I'd like to mention a
discussion I had a few years ago with the developers of Objectivity,
which is an object-oriented database that stores the data as graphs.
For the query language, they supported both SQL and a path-based
query language similar to SPARQL.  However, most of their users
preferred to use SQL because they were more familiar with it. So
the Objectivity developers supported SQL as well as other languages
for accessing their database.  Following is their FAQ sheet:


But one of my major criticisms of the Semantic Web is that the W3C
had not provided better integration with relational DBs, since nearly
every commercial web site, both large and small, was built around a
relational DB.  For a summary, see slides 89 and 90 of the following:

KI> My response is about a single point: you can have a multi-model
 > DBMS engine. One capable of being optimized for scenarios specific
 > to a given model. In this case Graph vs Relational.  I think we
 > actually agree on the concept of multi-model DBMS engines, as
 > opposed to one model fits all, right?

Yes, definitely.  I had mistakenly assumed that you were arguing
in support of SPARQL and triple stores in preference to other
methods for DBMS.  One reason why I like Datalog is that it is
a clean notation (much cleaner than SQL), which can be supported
by any kind of underlying DB organization.

SPARQL is a step backwards to the bad old days of CODASYL DBTG
and the database wars between Ted Codd and Charlie Bachman.
Some people think that because I use conceptual graphs I would
prefer a path-based access method.  But that is definitely false.

My favorite graph-based approach is to give the system a query
graph and say "Find all matches to the query graph within a
given semantic distance, and I don't care how you do it."
And by the way, a Datalog expression is a graph, after you
parse it and treat the variables as cross-links.
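
To make the "matches within a given semantic distance" idea concrete, here is a deliberately tiny sketch (the real VivoMind method is not public; the graphs, the edge-count "distance", and all names below are my own stand-ins):

```python
# Toy illustration of "find all matches to a query graph within a
# given semantic distance".  Graphs are sets of labelled edges; the
# "distance" here is just the number of query edges a stored graph
# lacks -- a placeholder for a real semantic metric.

GRAPHS = {
    "g1": {("cat", "on", "mat"), ("mat", "in", "room")},
    "g2": {("cat", "on", "mat")},
    "g3": {("dog", "in", "room")},
}

def within_distance(query, graphs, max_dist):
    """Return {name: distance} for every graph within max_dist of the query."""
    hits = {}
    for name, g in graphs.items():
        dist = len(query - g)        # query edges the stored graph lacks
        if dist <= max_dist:
            hits[name] = dist
    return hits

query = {("cat", "on", "mat"), ("mat", "in", "room")}
hits = within_distance(query, GRAPHS, max_dist=1)
```

The caller states the query graph and the tolerance; *how* the matches are found is left to the system, exactly as described above.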

KI> We shouldn't write off anything; it's about using the best
 > combination of tools for the problem at hand. In this case, the
 > trick is to combine technology and techniques from a range of
 > realms: raw DBMS and Middleware.

I'm happy with that statement.

CL> These messages are not dumped to a database then processed. They
 > are processed in real-time, in memory, off the data stream. If you
 > had time to database them, then you are not in need of real-time
 > analysis. That is not the use case I am talking about here, or
 > in most decision support that we do in healthcare.

You can do all kinds of just-in-time analysis and optimization
on streaming data.  We (at VivoMind) do that on high-speed data
streams, and we definitely do not want some programmer to try
to outguess our optimizer by specifying a path by a SPARQL query.
The automatic optimization is much better, since it's tailored
to the actual data stream, not to somebody's preconceived
idea of what the data stream should be.

KI> As for your processing off a stream, what point are you trying
 > to make about how the data is going to be accessed, reconciled,
 > and meshed?  Where does thinking occur?  Where does remembering
 > occur? What's the grey matter? How does the machine construct
 > frames of references when dealing with these huge streams of
 > disparate data?

What we do at VivoMind is to represent everything in conceptual
graphs and to index the graphs by a Cognitive Signature (TM) as
they arrive.  When new graphs arrive, we compute their Cognitive
Signatures, check whether we ever saw anything similar, and
retrieve the previous cases.

The time to find all graphs within a given semantic distance of
a query graph varies logarithmically with the number of graphs.
Finding one graph out of a billion takes only three times as
long as finding one out of a thousand.  See slides 10 to 13 of
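
The factor of three follows directly from the logarithmic claim, since log(10^9) / log(10^3) = 9/3 = 3:

```python
import math

# If lookup time grows as log(n), the ratio of times for a billion
# graphs versus a thousand graphs is log(10**9) / log(10**3) = 3,
# regardless of the base of the logarithm.
ratio = math.log(10**9) / math.log(10**3)
```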


Paola Di Maio
Networked Research Lab, UK


Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
