[Top] [All Lists]

Re: [ontolog-forum] Is there something I missed?

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: "John F. Sowa" <sowa@xxxxxxxxxxx>
Date: Sat, 31 Jan 2009 02:06:54 -0500
Message-id: <4983F88E.8050602@xxxxxxxxxxx>
Pat C, Pat H, Mitch, and Matthew,    (01)

Ted Codd's original relational algebra treated each table with
N columns as an n-adic relation and each row of a table as one
instance of the relation.  The SQL query language was loosely
based on Codd's algebra, the designers were not as knowledgeable
about logic as Codd was, and Codd kept a hands-off attitude toward
the SQL project.  As a result, there was a lot of controversy (i.e.,
confusion) about the meaning of the labels at the top of the columns:    (02)

  1. The labels represent the type of data in the column.    (03)

  2. The labels represent the role played by data in the column.    (04)

  3. The labels are nothing but readable character strings for
     distinguishing the columns instead of the integers 1,...,N.    (05)

  4. The labels represent dyadic relations that link an entity
     represented by the name of the table to entities in each
     row of the table.  (See the comment by Pat H. below.)    (06)

Some database administrators try to apply policies inspired by
#1 or #2, but those are difficult to generalize.  Points #3 and
#4 are more general.    (07)

The most direct mapping of an SQL WHERE-clause to FOL is #3:
treat each table as a single relation with character strings
to label the argument positions instead of integers.  The
mapping obtained by #4 can be systematically mapped to and
from any representation obtained by #3.    (08)

PC> It is also possible to ... let each table represent a relation,
 > and in that case the rows would be instances of that relation.
 > But in this case, the table would be labeled as a relation, not
 > as an individual entity.    (09)

That is Ted Codd's original conception, and it is consistent
with the way that SQL evaluates queries in terms of an RDB.
However, if you have a table labeled "Employee", you can think
of it as a relation that relates each employee's name (or ID)
to other entities, such as manager, salary, department, etc.    (010)

PC> In the RDB's I have seen, each field (column in a table) often
 > (typically?) represents one relation on the type represented by
 > the table, so the number of columns does not have any relation
 > to the arity of the relations used.    (011)

That would be a variation of #1 above.  A column labeled Manager,
for example, would represent a monadic predicate that is true of
each of the individuals named by the character strings in that
column.  However, none of the SQL operators do anything to support
or enforce any such typing.  For example, you would like to say
that Manager is a subtype of Employee, and Employee is a subtype
of Person.  But those constraints would not be enforced by SQL.    (012)

Ted Codd, Chris Date, and others published various proposals for
adding types, type hierarchies, and type constraints to relational
DBs, but they weren't adopted in the standards for SQL.    (013)

PH> But another way is that each column of the table is a binary
 > relation holding between the table and the rows, which are
 > entities. This is the same switcheroo which takes an n-ary
 > relation into a collection of binary 'case roles' (think
 > column names) in the format beloved of linguists. Basically,
 > its the rather trite observation that a 2-d table can be thought
 > of either as a vector of rows or a vector of columns. Both ways work.    (014)

I agree with this point.  This interpretation maps each RDB table
to a collection of dyadic relations, which correspond to the labels
of each column.  However, this mapping would tend to generate an
explosion of binary relations, which tend to be highly inefficient
for large databases.  As Matthew points out (see below), it is
generally more efficient to combine multiple tables with few
columns into a smaller number of large tables with multiple columns.    (015)

MH>> Are you saying that most of the table schemas you see in rdb's
 >> have large numbers of fields because they were purposefully
 >> re-engineered that way (from collections of binary relations)
 >> in order to make access (and joins) more efficient?    (016)

MW> It depends whether the design was done properly or not. If it
 > is done properly then a (so-called) conceptual schema is developed
 > that is neutral to the processing that is done on the data. A
 > conceptual schema is considered "good" if it is highly normalised,
 > i.e. the information is held with no repetitions of the same data.
 > It is then typically denormalised into a physical (table) schema
 > that takes account of the joins that are necessary to do the most
 > processing for the most important uses of the data.    (017)

I agree with Matthew's point.  I'd also like to mention a special
case that attracted a lot of theoretical attention during the
1980s:  "the universal join assumption".  This was an assumption
that all the tables in the DB could be joined into one very
large, totally unnormalized table.  If such a join is possible,
then many theorems can be proved about the relationships among
all the data.    (018)

However, the universal join assumption was almost always *false*
for large practical databases.  As a result, the many published
papers on the topic were useless, because few, if any practical
databases could be consistently joined into a single large table.    (019)

In any case, I agree with Matthew about using large unnormalized
tables to improve efficiency by reducing the number of join
operations needed for common queries.    (020)

John Sowa    (021)

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (022)

<Prev in Thread] Current Thread [Next in Thread>