Re: [ontolog-forum] master data vs. ontologies

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Tue, 17 Feb 2015 13:42:59 -0500
Message-id: <54E38BB3.4060106@xxxxxxxxxxx>
Tom, David, Ed, Kingsley, Bill, and Doug,    (01)

When defining anything, it's helpful to look at the documents
by the people who use the terms.  So I chose three sources:    (02)

  1. Wikipedia: http://en.wikipedia.org/wiki/Master_data    (03)

  2. A citation by Wikipedia of a conference article from 2010 by
     Boris Otto and Alexander Schmidt, "Enterprise Master Data Architecture":
http://mitiq.mit.edu/ICIQ/Documents/IQ%20Conference%202010/Papers/2B1_EnterpriseMasterDataArchitecture.pdf    (04)

  3. A 596-page IBM Redbook from 2012 that goes into the details
     of IBM's version of Master Data Management (MDM).  See
     http://www.redbooks.ibm.com/redbooks/pdfs/sg247956.pdf    (05)

Wikipedia lists three "types of master data":    (06)

  1. "Reference Data is the set of permissible values to be used by other
     (master or transaction) data fields."    (07)

  2. "Master Data is a single source of basic business data used across
     multiple systems, applications, and/or processes."    (08)

  3. "Market Master Data is the single source of basic business data for
     an entire marketplace. Market master data is used among enterprises
     within the value chain. An example of Market Master Data is the UPC
     (Universal Product Code) found on consumer products."    (09)
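
To make the first two definitions concrete, here is a minimal sketch in Python. It is not taken from any of the three sources; the country codes, customer records, orders, and function names are all hypothetical. Reference data constrains a field of the master data, and transaction data points at master records by key.

```python
# Reference data: the permissible values for a field (hypothetical sample).
COUNTRY_CODES = {"US", "DE", "JP"}

# Master data: one shared record per business entity (hypothetical sample).
CUSTOMERS = {
    "C001": {"name": "Acme Corp", "country": "US"},
    "C002": {"name": "Beispiel GmbH", "country": "XX"},
}

# Transaction data: refers to master data by key (hypothetical sample).
ORDERS = [
    {"order_id": 1, "customer": "C001", "amount": 250.0},
    {"order_id": 2, "customer": "C999", "amount": 75.0},
]

def bad_reference_values(master, field, allowed):
    """Return keys of master records whose field is not a permissible value."""
    return sorted(k for k, rec in master.items() if rec[field] not in allowed)

def orphan_transactions(transactions, master):
    """Return ids of transactions that point at a missing master record."""
    return [t["order_id"] for t in transactions if t["customer"] not in master]

invalid_customers = bad_reference_values(CUSTOMERS, "country", COUNTRY_CODES)
orphan_orders = orphan_transactions(ORDERS, CUSTOMERS)
```

In this sketch, customer C002 fails the reference-data check and order 2 has no master record to point at.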

The article by Otto & Schmidt is consistent with the Wikipedia article.
It also cites and compares several "architectures", among which is
the 3-column Zachman "Framework for Information Systems" from 1987.
In 1992, Zachman and I (who were both working at IBM) published the
updated 6-column version:  http://www.jfsowa.com/pubs/sowazach.pdf    (010)

I followed Aristotle in relating each column to one of six question
words:  What? How? Where? Who? When? Why?  I also related it to the
work on conceptual schemas and to logic in E-R notation, predicate
calculus, and conceptual graphs.    (011)

The IBM Redbook mentions logic in the sense of logical data models
(LDM), which it relates to UML and to physical data models for DB2.
It also mentions 'ontology' twice, but with a disclaimer.  It says
that the "IBM Infosphere Data Architect" manages glossaries and
ontologies about documents.  Unfortunately, it says that there is
a "tool break" that makes it difficult to use for MDM:    (012)

> [page 37] However, adopting the glossary and ontology profiles from
> there is a challenging undertaking, and thus we avoid following this
> approach, even though we entertain the idea of a central glossary
> often and encourage you to build one up yourself.    (013)

I find that statement from 2012 extremely frustrating.  I wrote my 1984
book while I was at IBM.  I was teaching, lecturing, and writing about
logic and ontology and their relationship to AI, databases, linguistics,
and the conceptual schema.  I also designed and implemented the ILIS
parser, which Ted Codd and his group adopted for mapping English
queries to SQL:  http://www.jfsowa.com/pubs/ilis.pdf    (014)

There were many, many other IBM researchers and developers who knew,
understood, published, and implemented prototypes and highly usable
tools for processing languages, glossaries, logics, ontologies, and
mapping them to and from DBs, KBs, English, German, French, Italian,
Spanish, Portuguese, Japanese, etc.  None of them became IBM products.    (015)

That incompatible "tool break" is the result of IBM *buying* companies
that developed incompatible solutions to problems. IBM researchers had
recognized and were developing *compatible* tools for solving those
problems since the 1970s. That joint paper with Zachman was just one
of my efforts to relate the many versions to one another.  Around the
same time, I was participating (as an IBM representative) in ANSI and
ISO standards projects and the DARPA-sponsored project for Shared
Reusable Knowledge Bases (SRKB). For references and brief descriptions,
see "SRKB" and "IRDS conceptual schema" in http://www.jfsowa.com/ikl .    (016)

Now to relate that to this thread:    (017)

> I'm concerned that you agree that "ontologies and master data
> represent ... essentially the same thing."    (018)

That depends on how broadly we interpret the word 'essentially'.
In the ikl web page, I've been trying to relate all such projects.    (019)

> ontologies are to master data as types are to instances    (020)

> In my experience, ontologies are every bit as much about the instances
> as database tables. I've never seen an enterprise ontology that was
> not about instances.    (021)

Note that some instances, such as the earth and the sun, are critical
for defining a huge number of types.  Any enterprise (business or
government or NGO) will distinguish privileged instances -- including
the enterprise itself.  In the axioms for arithmetic, 0 and 1 are
treated as privileged instances.  But I agree that most ontologies
will define many more types than special instances.    (022)

> Note also that RDF is essentially about instances; adding theory to
> it is the function of RDF Schema and OWL.    (023)

Yes.  And note that RDFS uses RDF as its base logic.  In effect,
RDFS is a metalanguage for specifying the semantics of RDF data.
Any version of logic can be used as a metalanguage.    (024)
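
One way to picture the metalanguage point is the following minimal Python sketch. It assumes a toy triple set (the names acme, Supplier, Organization, and Agent are hypothetical) and implements only two rules, which correspond to the RDFS entailment patterns for subclass transitivity and type propagation. It is an illustration, not an RDFS reasoner. Note that the schema statements have the same triple form as the instance data they describe.

```python
# Hypothetical data: instance and schema statements share one triple form.
data = {
    ("acme", "rdf:type", "Supplier"),
    ("Supplier", "rdfs:subClassOf", "Organization"),
    ("Organization", "rdfs:subClassOf", "Agent"),
}

def rdfs_closure(triples):
    """Apply two RDFS-style rules repeatedly until nothing new is derived."""
    inferred = set(triples)
    while True:
        new = set()
        for (a, p, b) in inferred:
            if p != "rdfs:subClassOf":
                continue
            for (x, q, y) in inferred:
                # Rule 1: subClassOf is transitive.
                if q == "rdfs:subClassOf" and x == b:
                    new.add((a, "rdfs:subClassOf", y))
                # Rule 2: an instance of a subclass is an instance
                # of the superclass.
                if q == "rdf:type" and y == a:
                    new.add((x, "rdf:type", b))
        if new <= inferred:
            return inferred
        inferred |= new

closure = rdfs_closure(data)
```

The only instance statement in the input says that acme is a Supplier; the schema statements are what license the additional conclusions that acme is an Organization and an Agent.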

> Does any such thing as a non-relational database management system
> actually exist, bearing in mind data is represented using relations?    (025)

Any graph can be mapped to a table -- and vice versa.  Note that
people who never heard of logic, ontology, SQL, or RDF use
spreadsheets for data they want to save and/or print in a table.    (026)
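
A minimal Python sketch of that mapping, under stated assumptions: the graph is a list of labeled edges (subject, predicate, object), the sample data is hypothetical, and each subject has at most one value per predicate. Without that last assumption a cell would need to hold a set of values.

```python
from collections import defaultdict

# Hypothetical sample graph: (subject, predicate, object) edges.
triples = [
    ("acme", "type", "Supplier"),
    ("acme", "locatedIn", "Ohio"),
    ("widget42", "type", "Product"),
    ("widget42", "suppliedBy", "acme"),
]

def graph_to_table(edges):
    """Pivot labeled edges into one row per subject, one column per predicate."""
    rows = defaultdict(dict)
    for subject, predicate, obj in edges:
        rows[subject][predicate] = obj
    return dict(rows)

def table_to_graph(rows):
    """Unpivot each filled cell back into a labeled edge."""
    return [(s, p, o) for s, cols in rows.items() for p, o in cols.items()]

table = graph_to_table(triples)
round_trip = table_to_graph(table)
```

The round trip returns the same set of edges, which is the sense in which the two forms carry the same information. The list of triples is itself already a three-column table.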

> I can't grasp how "Prolog" is an example of a "mapping tool".
> "Mapping", to me, and as I think is intended here, is "data mapping".    (027)

> ETL stands for extract, transform and load, which is a very common
> activity in the database world.  It seems that an ontology-supported
> ETL process would be close to the topic(s) raised here, and there
> appears to be a fair amount of experience with this.    (028)

Yes.  Alain Colmerauer, who built the METEO translator between English
and French, designed and implemented a grammar-writing tool.  After
talking with Bob Kowalski about using logic, he realized that his
grammar rules were very similar to Kowalski's logic.  So Colmerauer
revised his translation tool to implement the first version of Prolog.    (029)

When he retired, Colmerauer sold his Prologia company to Experian
-- one of the companies that monitor everybody's credit ratings.
They use Prolog to write rules for analyzing documents, extracting
the data, and reasoning with and about the data to evaluate credit
ratings.  They're very secretive about their methods.  But they use
Prolog so heavily that they bought the company.    (030)

As another example, Prolog runs circles around XSLT in speed and
generality for mapping XML data to and from other notations.    (031)

To confirm that point, note that IBM had developed UIMA as an XML-based
tool for annotating and processing documents.  For the Watson Jeopardy!
system, they wrote a UIMA-based pattern matcher.  But it was very slow
and limited.    (032)

So they replaced it with Prolog.  It was much faster than the UIMA
tools, and it was far more general, powerful, and easier to use.    (033)

Fundamental principle:  A modest amount of logic can clarify and relate
the patterns that relate and interconnect data of any kind in any
format whatsoever.  Prolog is what XSLT should have been -- and still
could be.  (It's easy to map XSLT rules to Prolog rules for the legacy
stuff -- using Prolog, of course.)    (034)
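
As a rough illustration of rule-driven mapping from XML to another notation, here is a minimal sketch. It is written in Python rather than Prolog, so it shows only the idea of keeping the mapping rules as data, separate from the engine that applies them; it has none of Prolog's unification or backtracking. The XML document, the rule table, and the predicate names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document.
XML = (
    "<orders>"
    "<order id='1'><customer>acme</customer><total>250</total></order>"
    "<order id='2'><customer>globex</customer><total>75</total></order>"
    "</orders>"
)

# Each rule maps a child tag of <order> to a predicate name (hypothetical).
RULES = [
    ("customer", "placed_by"),
    ("total", "has_total"),
]

def apply_rules(xml_text, rules):
    """Emit (predicate, order_id, value) facts for every matching child."""
    root = ET.fromstring(xml_text)
    facts = []
    for order in root.iter("order"):
        for tag, predicate in rules:
            for child in order.findall(tag):
                facts.append((predicate, order.get("id"), child.text))
    return facts

facts = apply_rules(XML, RULES)
```

Changing the target notation then means editing the rule table, not the traversal code.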

John    (035)
