Re: [ontology-summit] Clarification re Big Data Challenges Synthesis

To: ontology-summit@xxxxxxxxxxxxxxxx
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Tue, 10 Apr 2012 08:55:11 -0400
Message-id: <4F842DAF.8090400@xxxxxxxxxxx>
Doug,    (01)

Before commenting on your points, I'd like to mention three projects
that illustrate the ways that people use highly expressive languages
to process Big Data in very sophisticated ways:    (02)

  1. Experian's use of logic programming to determine everybody's credit
     rating.  This is a multi-billion-dollar company that used Prolog
     so heavily that it bought Prologia, the company founded by
     Alain Colmerauer, who built the first Prolog interpreter.    (03)

  2. Mathematica's use of logic programming and compilation technology
     to support engineers, mathematicians, and statisticians.  Among
     its users are the "quants" who analyze the stock markets,
     process huge volumes of data, and make decisions at microsecond
     speeds.  This is merely a multi-million-dollar company.    (04)

  3. The knowledge-compilation technology that led Bill Andersen
     and his colleagues to found Ontology Works, now High Fleet.
     They're not as big as the above two companies, but they've
     been in business for over a dozen years, and they build
     applications for large customers who process Big Data.    (05)

Re Experian:  They are highly secretive about their technology,
rules, and sources for detecting fraud and checking credit.
It's not possible to cite actual examples of what they do, but
the general trends are clear.  They started with commercial
Prolog software, and they didn't buy Prologia to sell Prolog
software.  I'm sure that they wanted to develop their rule
language and compilation technology to make them more user
friendly for the people who write rules and more efficient
for processing Big Data from all sources, including the WWW.    (06)

Re Mathematica:  They also started with Prolog as the rule language
for writing all the software for analyzing and reasoning with and
about mathematical statements of any kind.  Over the years, their
rule language evolved far beyond Prolog, but they definitely did
*not* reduce expressive power.    (07)

Instead, they made it more "user friendly" for their target audience,
who know mathematics, but are not experts in computational complexity.
The tools give the users maximum expressive power, and they compile
or transform the input language to forms the system can process
efficiently.  After the users are satisfied with the results, the
tools can translate the algorithms to code in FORTRAN or C.    (08)

As for the stock-market gang, those companies moved their computers
to a location that is as close as possible to where the high-speed
Internet feed enters Manhattan.  That gives them an 8-microsecond
advantage over offices on Wall Street.  That shows how much they
value performance.  They process Big Data, but they would never
tolerate RDF bloat.    (09)

Re High Fleet:  I referred to fflogic.pdf for further discussion
of their 1998 paper.  For convenience, I copied the paragraph that
discusses it at the end of this note.  It shows how compilation
technology can extract info from a very expressive language (CycL)
and map it to forms that can be processed efficiently by other tools.
The people who entered the knowledge had the freedom to express it
in any way that was convenient for them.  Then the compiler translated
it to forms that could be efficiently processed for the given problems.    (010)

After the authors started their own company, they no longer had access
to Cyc.  But they continued to use knowledge compilation techniques.
Their primary language is Prolog, but they generate anything their
customers want.  They also accept any kind of input they get, including
RDF and OWL, but they translate those languages to Prolog to improve
expressive power *and* efficiency.    (011)
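
As a rough illustration of what translating RDF input to Prolog can
look like, here is a minimal Python sketch.  It is not a description
of the company's actual tooling, which is not documented in this note.
The predicate name triple/3, the helper names, and the sample data are
all assumptions made for the example.

```python
# Hypothetical sketch: emit one Prolog fact per RDF triple.
# triple/3 and the helper names are illustrative assumptions.

def prolog_atom(term: str) -> str:
    """Quote a string as a Prolog atom, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def triple_to_fact(subject: str, predicate: str, obj: str) -> str:
    """Render a single (subject, predicate, object) triple as a Prolog fact."""
    args = ", ".join(prolog_atom(t) for t in (subject, predicate, obj))
    return f"triple({args})."


def triples_to_program(triples) -> str:
    """Render an iterable of triples as a Prolog program, one fact per line."""
    return "\n".join(triple_to_fact(s, p, o) for s, p, o in triples)


if __name__ == "__main__":
    sample = [
        ("ex:alice", "ex:knows", "ex:bob"),
        ("ex:bob", "rdf:type", "ex:Person"),
    ]
    print(triples_to_program(sample))
```

Once the data is in this form, rules written in Prolog can be applied
to it directly, which is the sense in which the translation adds
expressive power on top of the original triples.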

Note that all three of these companies are commercially successful,
and they have stayed in business for many years.    (012)

>> Then there are the questions about who or what is going to do
>> those transformations and adaptations.  The SME?  The knowledge
>> engineer?    (013)

> Who does the programming -- the programmer.  Who does the
> extract from a complex ontology -- the knowledge engineer.    (014)

Consider the three examples above.  It's very hard to distinguish the
KE from the SME.  The people who write rules for Experian know a great
deal about the subject, and they also know how to write rules in a
very expressive language -- Prolog or whatever Prologia now produces.    (015)

Or consider the "quants" who use Mathematica or whatever to analyze
the stock market.  They combine the functions of SME and KE.  And
for all of them, a software tool is the low-level programmer.    (016)

>> SMEs are experts in their subject, not in any kind of calculation.    (017)

> They should be expert in the sort of calculation performed in their
> field of expertise.    (018)

Consider the people who use Mathematica.  They are experts in using
mathematics to state their problems.  But the system uses a very
wide range of methods for solving those problems.  It automatically
chooses the inference algorithms.  For any specific type of problem,
it can translate the algorithms to efficient code in FORTRAN or C.    (019)

But the people who enter the knowledge don't need to know, learn,
or even worry about computational complexity.  The tools handle that.
And the *tools* can warn the KE, SME, or end user about any issues
of computational complexity for any specific problem.    (020)

> John, these statements are made to contradict the claim that
> a complex ontology necessarily would result in excessive computation.
> It depends upon whether the systems using the ontology decide to
> perform such complex calculations.    (021)

I completely agree.  I never made that claim.  I always said that
the KEs, SMEs, and end users should be allowed to express anything
and everything they know about the domain in any form they prefer.    (022)

> I am addressing the concern that an ontology with rules would
> necessarily slow things down.  The OEs can design the systems so
> that such computations are done only when desired -- just as
> programmers do with code!    (023)

What is an OE?  Is that a human or a software system?  I am assuming
that such optimizations should be done by the software tools.  Of
course, somebody has to implement those tools.  The people at the
three companies mentioned above do that.    (024)

The technology these companies use in successful commercial projects
had already been published when the Semantic Web was just getting
started.  Instead of adopting it, the SW gang ignored it.  That is why
the current SW is an obsolete legacy system.    (025)

______________________________________________________________________    (026)

Source:  http://www.jfsowa.com/pubs/fflogic.pdf    (027)

Although controlled NLs are easy to read, writing them requires training 
for the authors and tools for helping them. Using the logic generated 
from controlled NLs in practical systems also requires tools for mapping 
logic to current software. Both of these tasks could benefit from 
applied research:  the first in human factors, and the second in 
compiler technology. An example of the second is a knowledge compiler 
developed by Peterson et al. (1998), which extracted a subset of axioms 
from the Cyc system to drive a deductive database. It translated Cyc 
axioms, stated in a superset of FOL, to constraints for an SQL database 
and to Horn-clause rules for an inference engine. Although the knowledge 
engineers had used a very expressive dialect of logic, 84% of the axioms 
they wrote could be translated directly to Horn-clause rules (4667 of 
the 5532 axioms extracted from Cyc). The remaining 865 axioms were 
translated to SQL constraints, which would ensure that all database 
updates were consistent with the axioms.    (028)

Peterson, Brian J., William A. Andersen, & Joshua Engel (1998) 
“Knowledge bus: generating application-focused databases from large 
ontologies,” Proc. 5th KRDB Workshop, Seattle, WA. 
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-10/    (029)
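
The split described in the excerpt above (axioms that map directly to
Horn-clause rules versus axioms handled as constraints) can be sketched
in a few lines of Python.  This is a toy illustration only, not the
algorithm of Peterson et al.  It assumes the axioms are already in
clause form, treats a clause with exactly one positive literal as a
rule, and sets every other clause aside as a constraint.  The class
and function names and the sample axioms are invented for the example.

```python
# Toy sketch: partition clauses into Horn-clause rules and constraints.
# Assumes clause form; all names and sample axioms are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    atom: str
    positive: bool = True


def is_definite(clause) -> bool:
    """A definite Horn clause has exactly one positive literal."""
    return sum(1 for lit in clause if lit.positive) == 1


def to_rule(clause) -> str:
    """Render a definite clause as 'head :- body.' or as a bare fact."""
    head = next(lit.atom for lit in clause if lit.positive)
    body = [lit.atom for lit in clause if not lit.positive]
    return f"{head} :- {', '.join(body)}." if body else f"{head}."


def compile_axioms(clauses):
    """Return (rules, constraints): rules as text, the rest left as clauses."""
    rules, constraints = [], []
    for clause in clauses:
        if is_definite(clause):
            rules.append(to_rule(clause))
        else:
            constraints.append(clause)
    return rules, constraints


if __name__ == "__main__":
    axioms = [
        # human(X) implies mortal(X)
        [Literal("mortal(X)"), Literal("human(X)", positive=False)],
        # a plain fact
        [Literal("human(socrates)")],
        # person(X) implies male(X) or female(X): two positive literals
        [Literal("male(X)"), Literal("female(X)"),
         Literal("person(X)", positive=False)],
    ]
    rules, constraints = compile_axioms(axioms)
    print(rules)
    print(len(constraints))
```

In this sketch the disjunctive axiom is the one that falls outside the
rule set, which mirrors the general point of the excerpt: authors write
whatever they need, and the compiler decides where each axiom goes.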
