[ontolog-forum] Big Data Challenges

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Tue, 10 Apr 2012 09:08:05 -0400
Message-id: <4F8430B5.7000907@xxxxxxxxxxx>

Following is a slightly edited version of a note I sent to a different
list, but it's also relevant to this list.    (01)

John Sowa    (02)

-------- Original Message --------    (03)

I'd like to mention three projects that illustrate ways that people use
highly expressive languages to process Big Data in sophisticated ways:    (04)

   1. Experian's use of logic programming to determine everybody's credit
      rating.  This is a multibillion-dollar company that used Prolog
      so heavily that they bought Prologia, the company founded by
      Alain Colmerauer, who built the first Prolog interpreter.    (05)

   2. Mathematica's use of logic programming and compilation technology
      to support engineers, mathematicians, and statisticians.  Among
      their users are the "quants" who analyze the stock markets,
      process huge volumes of data, and make decisions at microsecond
      speeds.  This is merely a multimillion-dollar company.    (06)

   3. The knowledge-compilation technology that led Bill Andersen
      and his colleagues to found Ontology Works, now High Fleet.
      They're not as big as the above two companies, but they've
      been in business for over a dozen years, and they build
      applications for large customers who process Big Data.    (07)

Re Experian:  They are highly secretive about their technology, rules,
and methods for detecting fraud and checking credit. It's not possible
to cite actual examples of what they do, but the general trends are
clear.  They started with commercial Prolog software, and they didn't
buy Prologia to sell Prolog software.  I'm sure that they wanted to
develop their rule language and compilation technology to make them
more user friendly for the people who write rules and more efficient
for processing Big Data from all sources, including the WWW.    (08)

Re Mathematica:  They also started with Prolog as the rule language
for writing all the software for analyzing and reasoning with and
about mathematical statements of any kind.  Over the years, their
rule language evolved far beyond Prolog, but they definitely did
*not* reduce expressive power.    (09)

Instead, they made it more "user friendly" for their target audience,
who know mathematics, but are not experts in computational complexity.
The tools give the users maximum expressive power, and they compile
or transform the input language to forms the system can process
efficiently.  After the users are satisfied with the results, the
tools can translate the algorithms to code in FORTRAN or C.    (010)

As for the stock-market gang, those companies moved their computers
to a location that is as close as possible to where the high-speed
Internet feed enters Manhattan.  That gives them an 8-microsecond
advantage over offices on Wall Street.  That shows how much they
value performance.  They process Big Data, but they would never
tolerate RDF bloat.    (011)
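To put the 8-microsecond figure in physical terms, here is a rough
back-of-the-envelope sketch.  It assumes the delay is one-way
propagation through optical fiber with a typical refractive index of
about 1.47; the note above does not say either of those things, so
both are assumptions, and the function name is just illustrative.

```python
# Rough estimate: how much fiber corresponds to a given propagation delay?
# Assumptions (not from the original note): one-way delay, and a typical
# single-mode fiber refractive index of about 1.47.

SPEED_OF_LIGHT_M_PER_S = 299_792_458   # exact, by definition of the metre
FIBER_REFRACTIVE_INDEX = 1.47          # assumed typical value


def fiber_distance_m(delay_s: float,
                     refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """Return the length of fiber (in metres) light traverses in delay_s seconds."""
    return delay_s * SPEED_OF_LIGHT_M_PER_S / refractive_index


if __name__ == "__main__":
    metres = fiber_distance_m(8e-6)
    print(f"8 microseconds is roughly {metres / 1000:.1f} km of fiber")
```

Under those assumptions the advantage corresponds to somewhat more than
a kilometer and a half of cable, which is about the scale of the move
described above.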

Re High Fleet:  I referred to fflogic.pdf for further discussion
of their 1998 paper.  For convenience, I copied the paragraph that
discusses it at the end of this note.  It shows how compilation
technology can extract info from a very expressive language (CycL)
and map it to forms that can be processed efficiently by other tools.
The people who entered the knowledge had the freedom to express it
in any way that was convenient for them.  Then the compiler translated
it to forms that could be efficiently processed for the given problems.    (012)

After the authors started their own company, they no longer had access
to Cyc.  But they continued to use knowledge compilation techniques.
Their primary language is Prolog, but they generate anything their
customers want.  They also accept any kind of input they get, including
RDF and OWL, but they translate those languages to Prolog to improve
expressive power *and* efficiency.    (013)
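As a minimal sketch of what translating RDF to Prolog can look like,
the following turns subject-predicate-object triples into Prolog facts.
This is only an illustration, not High Fleet's method: the triple/3
predicate name, the sample data, and the helper functions are all
hypothetical.

```python
# Sketch: render RDF-style triples as Prolog facts of the form triple(S, P, O).
# The predicate name triple/3 and the sample triples are hypothetical.

from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def prolog_atom(term: str) -> str:
    """Quote a term as a Prolog atom, escaping backslashes and single quotes."""
    escaped = term.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def triples_to_prolog(triples: Iterable[Triple]) -> List[str]:
    """Return one Prolog fact (as text) per input triple."""
    return [
        f"triple({prolog_atom(s)}, {prolog_atom(p)}, {prolog_atom(o)})."
        for s, p, o in triples
    ]


if __name__ == "__main__":
    sample = [
        ("ex:acme", "rdf:type", "ex:Company"),
        ("ex:acme", "ex:locatedIn", "ex:Baltimore"),
    ]
    for fact in triples_to_prolog(sample):
        print(fact)
```

Once the data is in this form, ordinary Prolog rules can be written
over it, for example `instance_of(X, C) :- triple(X, 'rdf:type', C).`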

Note that all three of these companies are commercially successful,
and they have stayed in business for many years.    (014)

>> Then there are the questions about who or what is going to do
>> those transformations and adaptations.  The SME?  The knowledge
>> engineer?    (015)

> Who does the programming -- the programmer.  Who does the
> extract from a complex ontology -- the knowledge engineer.    (016)

Consider the three examples above.  It's very hard to distinguish the
KE from the SME.  The people who write rules for Experian know a great
deal about the subject, and they also know how to write rules in a
very expressive language -- Prolog or whatever Prologia now produces.    (017)

Or consider the "quants" who use Mathematica or whatever to analyze
the stock market.  They combine the functions of SME and KE.  And
for all of them, a software tool is the low-level programmer.    (018)

>> SMEs are experts in their subject, not in any kind of calculation.    (019)

> They should be expert in the sort of calculation performed in their
> field of expertise.    (020)

Consider the people who use Mathematica.  They are experts in using
mathematics to state their problems.  But the system uses a very
wide range of methods for solving those problems.  It automatically
chooses the inference algorithms.  For any specific type of problem,
it can translate the algorithms to efficient code in FORTRAN or C.    (021)

But the people who enter the knowledge don't need to know, learn,
or even worry about computational complexity.  The tools handle that.
And the *tools* can warn the KE, SME, or end user about any issues
of computational complexity for any specific problem.    (022)

______________________________________________________________________    (023)

Source:  http://www.jfsowa.com/pubs/fflogic.pdf    (024)

Although controlled NLs are easy to read, writing them requires
training for the authors and tools for helping them. Using the logic
generated from controlled NLs in practical systems also requires tools
for mapping logic to current software. Both of these tasks could
benefit from applied research:  the first in human factors, and the
second in compiler technology. An example of the second is a knowledge
compiler developed by Peterson et al. (1998), which extracted a subset
of axioms from the Cyc system to drive a deductive database. It
translated Cyc axioms, stated in a superset of FOL, to constraints
for an SQL database and to Horn-clause rules for an inference engine.
Although the knowledge engineers had used a very expressive dialect
of logic, 84% of the axioms they wrote could be translated directly
to Horn-clause rules (4667 of the 5532 axioms extracted from Cyc).
The remaining 865 axioms were translated to SQL constraints, which
would ensure that all database updates were consistent with the axioms.    (025)

Peterson, Brian J., William A. Andersen, & Joshua Engel (1998)
Knowledge bus: generating application-focused databases from large
ontologies, Proc. 5th KRDB Workshop, Seattle, WA.
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-10/    (026)
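The split described in the excerpt can be illustrated with a small
sketch.  It assumes the axioms have already been converted to clauses
(disjunctions of literals), and it sorts them by the standard test for
Horn clauses: at most one positive literal.  This is not the Peterson
et al. compiler; the data representation and function names are
hypothetical, and the sketch does not generate SQL constraints for the
non-Horn remainder.

```python
# Sketch: separate Horn clauses from non-Horn clauses.
# A clause is a list of (is_positive, atom_text) literals.  A Horn clause
# has at most one positive literal; a definite clause has exactly one.

from typing import List, Tuple

Literal = Tuple[bool, str]
Clause = List[Literal]


def is_horn(clause: Clause) -> bool:
    """True if the clause has at most one positive literal."""
    return sum(1 for positive, _ in clause if positive) <= 1


def to_rule(clause: Clause) -> str:
    """Render a definite clause as Prolog-style text: head :- body."""
    heads = [atom for positive, atom in clause if positive]
    body = [atom for positive, atom in clause if not positive]
    if len(heads) != 1:
        raise ValueError("only definite clauses can be rendered as rules")
    return f"{heads[0]} :- {', '.join(body)}." if body else f"{heads[0]}."


def partition(clauses: List[Clause]) -> Tuple[List[Clause], List[Clause]]:
    """Split clauses into (horn, non_horn)."""
    horn = [c for c in clauses if is_horn(c)]
    non_horn = [c for c in clauses if not is_horn(c)]
    return horn, non_horn


if __name__ == "__main__":
    clauses = [
        [(True, "mortal(X)"), (False, "human(X)")],                     # Horn
        [(True, "male(X)"), (True, "female(X)"), (False, "person(X)")], # not Horn
    ]
    horn, non_horn = partition(clauses)
    print(to_rule(horn[0]))
    # The counts quoted in the excerpt are consistent with each other:
    print(f"{100 * 4667 / 5532:.0f}% Horn, {5532 - 4667} remaining")
```

The second clause says that a person is male or female; the disjunctive
conclusion is what keeps it from being a Horn clause, so in the scheme
described above it would be handled as a constraint rather than a rule.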
