ontolog-forum
[Top] [All Lists]

Re: [ontolog-forum] [SMW-devel] [News] Google, Microsoft, Facebook And O

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Fri, 12 Oct 2012 11:49:03 -0400
Message-id: <50783BEF.8090001@xxxxxxxxxxx>
Rich, Simon, and Michael,    (01)

Fundamental issue:  Tim B-L emphasized three basic principles in
the DAML proposal of Feb 2000:  interoperability, diversity, and
heterogeneity.  He said that SWeLL (a superset of propositional
logic, first-order logic, and higher-order logic) was necessary
to support all the heterogeneous systems on the WWW.  He also
mentioned KIF and SQL as languages that had to be supported.    (02)

Pat Hayes worked with the W3C in the early days, and he made
sure that the Common Logic semantics was sufficiently general
to support the requirements for SWeLL.  But if you look at the
DAML reports (which I cited in earlier notes), the original
requirements were watered down in each successive report.  The
final report (in 2006) did not mention the three basic principles
that Tim discussed at length throughout the proposal in 2000.    (03)

SWeLL is one key term that was mentioned in the final report.
But that report said SWeLL was renamed OWL.  That ploy was a
political move to confuse executives who had no understanding
of the technical issues.    (04)

But the original DAML proposal specified SWeLL as a superset of
the expressive power of all the heterogeneous systems on the WWW.
Yet OWL was deliberately restricted to a tiny subset of the SWeLL power. 
  OWL, by itself, cannot support any of the basic principles
in Tim's vision for the Semantic Web.    (05)

RC
> Actually, AND OR and NOT functions can be calculated, and usually
> are done so, with inverted files inside SQL interpreters.  That
> time is very nearly linear rather than polynomial.    (06)

The execution time for any database query (in SQL, SPARQL, or
any other language) is proportional to the number of accesses
to external storage.  For each existential quantifier in a query
(which may be implicit or explicit), there is some number N
of items that could be the relevant value for that quantifier.    (07)

The time for the query depends on the number M of quantifiers.
Since M is usually small, N is the critical value.  With no
indexing, the time is proportional to N^M (polynomial time).
In the best case, the total time could be (log N)^M.    (08)

SS
> I'm not sure that John is allowed to both cite worst-case complexity
> results *and* criticize the direction taken by DAML in the same text    (09)

The complexity results for OWL have nothing to do with the execution
time of a DB query.  OWL was optimized for one specialized part of one
very specialized kind of reasoning:  the time required to classify
some item in a hierarchy.  By imposing tight restrictions on the
expressive power of a DL language, OWL can guarantee that operation
is (a) decidable and (b) computable in polynomial time.    (010)

For some purposes, that property is useful.  But in the grand scheme
of Tim's vision, that issue is so tiny that he never mentioned it in
*any* DAML report.  I'm sure that the DL people who jumped on the DAML
bandwagon hyped it to high heaven.  But Tim was the final editor of
every DAML report, and he kept the word 'decidability' out of them.    (011)

For anybody who is concerned about complexity and execution time in
a large ontology, please look at the Cyc reports.  It has a hierarchy
of over 600,000 concepts.  And the Cyc people (which include Doug F)
have said that decidability was *never* a major concern.  They have
been working on very large reasoning problems for the past 28 years,
and they have *never* encountered a problem for which the kinds of
restrictions imposed by OWL were useful or desirable.    (012)

MB
>  I think every n-tuple alternative to RDF is not exactly like a RDB system...    (013)

I agree.  I *never* said that we should use SQL and RDBs as the only
method for storing, finding, and retrieving data.  Neither did Tim.
But he recognized the fact that nearly every commercial web site was
built around an RDB.  He emphasized both RDBs *and* RDF.    (014)

Tim's emphasis was on interoperability with *heterogeneous* systems.
That implies that all of them are running simultaneously with different
representations and implementations.    (015)

MB
> Both of these restrictions require a fixed schema that is known in advance or
> looked up from the predicate URI. I don't know if that would be a good idea.    (016)

There are different kinds of applications.  When you have fixed-format
data, a fixed schema can be highly efficient.  But when you have highly
variable data of unpredictable types, a fixed schema is terrible.
We need both.  Tim knew that, and he made provision for both in the
DAML proposal of 2000.    (017)

MB
> Triples only offer complex modeling that is potentially slow and
> potentially bad (relations should have higher arity). But probably modeling
> higher arity relations with lower arity relations is not so bad as modeling
> lower arity relations with higher arity relations.    (018)

That's an implementation issue that is independent of the way you
present the data to users or to application programs.  Stonebraker
demonstrated that in an extremely fast implementation of VoltDB:    (019)

Source: 
http://highscalability.com/blog/2010/6/28/voltdb-decapitates-six-sql-urban-myths-and-delivers-internet.html
> VoltDB claims to be 100 times faster than MySQL, up to 13 times faster than
> Cassandra, and 45 times faster than Oracle, with near-linear scaling.    (020)

MB
> There is a performance/design tradeoff with n-tuples. Look at your
> example:
>
>     {{#set_internal:Is president of|Has name=James Madison
>     |Has start year=1801|Has end year=1809|Has vice
>     president#list=George Clinton, Elbridge Gerry}}
>
> The binary has_name property stands on its own and should clearly not be part
> of another relation with higher arity but doing it so makes things faster.    (021)

There are many kinds of tradeoffs for various purposes.  In any case,
that's not my example. It's just one that I took from the SMW web site.    (022)

The issues I want to emphasize are the ones in Tim's original proposal.    (023)

John    (024)

_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J    (025)

<Prev in Thread] Current Thread [Next in Thread>