
Re: [ontolog-forum] Is there something I missed?

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Sat, 31 Jan 2009 13:08:26 -0500
Message-id: <4984939A.7030301@xxxxxxxxxxx>
Dear Rich, Matthew, and Alex,    (01)

JFS>> Ted Codd, Chris Date, and others published various proposals
 >> for adding types, type hierarchies, and type constraints to
 >> relational DBs, but they weren't adopted in the SQL standards.    (02)

RC> Actually, SQL enforces foreign key constraints that let the
 > modeler...    (03)

Note your verb 'let'.  I agree that SQL *lets* the modeler write
constraints that can enforce anything that can be stated in first
order logic.  But as you said yourself,    (04)

RC> In my experience, commercial databases are developed in a
 > haphazard way for nearly all commercial applications without
 > the luxury of careful modeling.    (05)

Implementing type hierarchies in an SQL database is possible, but
not the path of least resistance.  Furthermore, any N modelers who
adopt types are likely to find N incompatible ways to implement them.    (06)
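
To make that concrete, here is one common encoding: a supertype table
whose primary key is reused as the primary key of each subtype table.
This is only a sketch; the table names are hypothetical, and Python
with SQLite is used merely to make it runnable.

    import sqlite3

    # One of the N possible encodings of a type hierarchy in plain SQL:
    # a supertype table plus subtype tables that reuse its primary key
    # as a foreign key.  All names here are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks FKs only on request
    conn.executescript("""
        CREATE TABLE party (                 -- supertype
            party_id   INTEGER PRIMARY KEY,
            party_type TEXT NOT NULL
                       CHECK (party_type IN ('person', 'organization'))
        );
        CREATE TABLE person (                -- subtype of party
            party_id   INTEGER PRIMARY KEY
                       REFERENCES party (party_id),
            birth_date TEXT
        );
        CREATE TABLE organization (          -- subtype of party
            party_id   INTEGER PRIMARY KEY
                       REFERENCES party (party_id),
            charter    TEXT
        );
    """)

Another modeler might instead use one wide table with a type column,
or a separate table per leaf type; all three encodings are common,
and none of them interoperates with the others.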

MW> I did some work that effectively does this about 15 years ago,
 > which is captured in a document called "Developing High Quality
 > Data Models".  Part of it is about finding the hidden classes,
 > and part of it is about not hiding them in the first place.
 > You can find it here:
 > http://www.matthew-west.org.uk/Publications.html    (07)

That is good work.  It is the kind of approach that could
and should have formed the basis for integrating RDBs with the
Semantic Web.  Although UML is not sufficiently formalized, tools
that use UML notations could be integrated with logic-based systems.
There was a lot of R & D on deductive databases that could have
provided a far more suitable basis than the notorious layer cake.    (08)

RC> But sometimes, denormalized tables are preferable for performance
 > reasons - not for conceptual modeling reasons.    (09)

I agree.  But the underlying table structure is a performance issue
that should be handled by an automatic (or at least semi-automatic)
optimizer.    (010)

SQL became successful because the queries were optimized by the
compiler.  But optimization techniques have progressed quite far
in the past 40 years, and much more can be done to help knowledge
engineers focus on the knowledge, not the implementation details.    (011)
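
As a sketch of that separation (the schema below is hypothetical, and
SQLite is used only to make the example runnable), the denormalized
table can remain a physical artifact while views present the
conceptual model; an optimizer is then free to change the storage
without disturbing any query written against the views:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Physical table, denormalized for performance (hypothetical):
        CREATE TABLE order_flat (
            order_id      INTEGER,
            customer_id   INTEGER,
            customer_name TEXT,     -- repeated on every line of an order
            item_sku      TEXT,
            quantity      INTEGER
        );
        -- Views that present the conceptual (normalized) model:
        CREATE VIEW customer AS
            SELECT DISTINCT customer_id, customer_name
            FROM order_flat;
        CREATE VIEW order_line AS
            SELECT order_id, customer_id, item_sku, quantity
            FROM order_flat;
    """)

Queries that mention only the views never see order_flat, so the
physical layout can be changed behind them.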

ASh> We may partially extract classes and relationships from the
 > user interface to the database: you know, the labels (words)
 > on forms, pages, and reports.  On the other hand, any database
 > may be converted to a set of sentences (facts), and those
 > sentences should be accepted by the user (domain expert) as
 > natural.  That should be quite enough to begin modeling.    (012)

I agree, but good tools are necessary to support that approach.    (013)

RC> I’m presently looking at a table with tens of millions of rows
 > and 160 some-odd columns.  Each row of that huge table translates
 > into a very, very long sentence!  It would be better to translate
 > each row into a full paragraph, with anaphoric references to
 > earlier concepts expressed in sentences within the paragraphs.    (014)

I wouldn't attempt to generate English from the database.  Instead,
I'd extract English definitions from the documents that describe
the database.  Those would include the technical manuals and reports
as well as the documentation designed for the end users and data
entry clerks.    (015)

RC> S32994 is probably an automatically generated identifier that
 > ties together one or more concepts from the database.  So
 > changing that to some kind of English-like, humanly meaningful
 > phrase is the major difficulty in that kind of translation.    (016)

For an example of a legacy re-engineering project that related COBOL
and SQL code to 100 megabytes of English documentation, see slides
12 to 15 of the following talk:    (017)

    http://www.jfsowa.com/talks/pursue.pdf    (018)

That project required both formal statements suitable for a database
dictionary *and* an English glossary that defined the English terms
and related them to internal identifiers such as S32994.  The system
generated both of them:    (019)

  1. First it analyzed the formal code (COBOL and SQL) and generated
     a DB dictionary that related all the computer-oriented terms
     and identifiers to the programs and files.    (020)

  2. Then it analyzed the English documentation to check for
     inconsistencies between the programs and the English.
     For the terms in the English glossary, it selected one or
     more sentences from the documentation that had the form
     of a definition, and it cross-referenced them to the
     computer-oriented terms.    (021)
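
The selection step in point 2 can be approximated quite simply.  The
sketch below is my own reconstruction of the idea, not code from that
project; the two patterns are assumptions chosen for illustration.

    import re

    # Keep sentences that have the form of a definition, and record any
    # internal identifiers (such as S32994) that they mention.
    DEFINITION = re.compile(r"\b(is|are|means|refers to|is defined as)\b",
                            re.IGNORECASE)
    IDENTIFIER = re.compile(r"\b[A-Z]\d{4,}\b")    # e.g. S32994

    def glossary_entries(sentences):
        entries = []
        for s in sentences:
            if DEFINITION.search(s):               # definition-like form
                entries.append((s, IDENTIFIER.findall(s)))
        return entries

    doc = ["S32994 is the supplier number assigned at data entry.",
           "Print the weekly report before noon."]
    print(glossary_entries(doc))

A real system would use a parser rather than regular expressions, but
the division of labor is the same: find definition-shaped sentences,
then cross-reference the identifiers they contain.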

ASh> Then we automatically translate these sentences to OWL...    (022)

I wouldn't recommend OWL.  Translating a controlled natural language
to and from Common Logic is much simpler and more systematic.  See    (023)

    http://www.jfsowa.com/talks/cl_sowa.pdf    (024)
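
To give one hand-made example of the kind of mapping those slides
discuss (mine, not taken from the slides), the controlled English
sentence "Every employee is a person." corresponds to a single CLIF
formula:

    (forall (x) (if (Employee x) (Person x)))

Because the mapping is systematic in both directions, the same
sentence can be regenerated from the formula, which is what makes
round-trip translation with a controlled natural language tractable.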

John Sowa    (025)


