Re: [ontolog-forum] Invoice ontology discussion points/issues

To: Adam Pease <adampease@xxxxxxxxxxxxx>
Cc: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Patrick Cassidy <pcassidy@xxxxxxxxxxxxxxxx>
Date: Fri, 11 Jul 2003 22:05:26 -0400
Message-id: <3F0F6CE6.9040603@xxxxxxxxxxxxxxxx>
Adam's reply to my comments (below) demonstrates that there
are significantly different approaches to representing
"documents" (generically, information-bearing physical
objects and the abstract informational content thereof).
I was pointing out that Cyc (and my own preferences)
include explicitly within the class hierarchy the abstract
informational entities themselves, and the relations between
informational entities, and also the relations between the
informational entities and the physical objects that represent
that information. In SUMO the class hierarchy emphasizes
the physical objects and captures the abstract content only
indirectly, through the "containsInformation" relation.  What is important for
this discussion is to recognize that adopting the SUMO
method of representing documents is not necessary, is
not the method used in Cyc, and is not, I believe, the
best way to represent information for the purpose of
reasoning, even when one's perspective is as restricted
as the representation of an "invoice".  If one chooses
the SUMO representation, one has to be aware that one
may be choosing a representation that is not universally
used and may not become widely adopted.  This will not
be a problem if one simply adds the required abstract
concept classes in the "Abstract" section of SUMO, but
as Adam's note shows, the Teknowledge group does
not like that approach.  My recommendation was that, if
one uses the SUMO higher levels for the Invoice ontology,
one explicitly represent the abstract document, and one
can if one wishes also have a parallel representation
of the physical document and its physical parts in the
"Physical" section of the class hierarchy.    (01)

   As to the specific questions Adam asks:    (02)


[PC]>>    This gets into one of the characteristics of SUMO which differs
 >> from CYC.   In SUMO, "CurrencyMeasure" is an abstract concept that
 >> refers to a unit of measure, which is a currency.  The symbol used
 >> for that currency in a document would be a separate abstract
 >> concept, which, as best I can tell, has no representation in the
 >> SUMO class hierarchy.
 >
 >
[AP]> Do you mean the character "$", for example?  That would be a SUMO
 > &%Character.
 >
    No, I was referring to the generic concept of "US Dollar Currency 
Symbol", which could be "$", "USD", "US$",
and probably others as well.  One of those three is a subtype of
an abstract character.  There is also a single-character dollar symbol
with two vertical strokes.  But in SUMO, only the physical objects
which are representations of these abstract symbols are
explicitly included in the class hierarchy.  Each of these abstract
symbols is a different conceptual object, which has very large numbers
of physical representations scattered in zillions of documents.    (03)


[PC]>> In SUMO, classes of texts tend to be found
 >> explicitly only in the "Physical" section, representing the
 >> physical objects which are texts (instances of book or invoice).
 >> Such documents are related to the more abstract objects which they
 >> represent by the "containsInformation" relation, which points to a
 >> "Proposition". But there are few explicitly reified Conceptual
 >> classes corresponding to the abstract objects.  So there is a
 >> place in the  SUMO class hierarchy for every single copy of a
 >> novel, but no explicit class other than the very generic
 >> "Proposition" for the more abstract idea of the novel of which
 >> the individual artifacts are physical representations.
 >
 >
[AP]> Proposition is the correct class in SUMO for the information
 > content of a text.  Would you want that class defined differently?
 > If so, how?   Would you want a subclass added?  If so, why?
 >
    A novel (as an abstract idea) can be considered as a "Proposition"
in SUMO, and, yes, I would want the abstract "Novel" added as a
subclass.    (04)

WHY?  When one asks "who wrote 'Gone with the Wind'?"  one is not
interested in what printer created some specific book, one is
interested in knowing what person created the conceptual content -
which has millions of representations in physical objects.  For
myself and at least some other ontologists, the conceptual
work is the original and primary object for logical reasoning,
rather than some specific physical embodiment of the work.  I don't
doubt that with the proper inferencing routines, one can somehow
reason about that conceptual work, but this forum needs to be aware
that  there are some who think the conceptual works themselves
can and should be represented directly.
    For complete invoices, it may be that one is more commonly
interested in the physical object itself, but for the various
data elements in an invoice, what is of interest is the
conceptual content, not the physical symbols on a piece of paper.
When one talks or thinks about the "Price" of an object, one
is more commonly concerned with the abstract notion of value,
not with how it is written on a piece of paper.  I think
we need to represent both the idea and the physical symbols
as well.    (05)
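
The distinction can be illustrated with a minimal sketch, written in
Python rather than KIF purely for convenience.  The class names
ConceptualWork and PhysicalCopy and all attribute names are
hypothetical (loosely modeled on the Cyc notion mentioned in my
earlier note); they are not SUMO terms, and the printer names are
made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualWork:
    """Abstract content, independent of any copy (hypothetical class)."""
    title: str
    author: str


@dataclass(frozen=True)
class PhysicalCopy:
    """One physical object embodying a conceptual work (hypothetical class)."""
    work: ConceptualWork
    printer: str


def who_wrote(copy: PhysicalCopy) -> str:
    # Authorship is stored on the abstract work, not on the printed object.
    return copy.work.author


novel = ConceptualWork(title="Gone with the Wind", author="Margaret Mitchell")
# Two distinct physical objects that share one conceptual work.
copies = [PhysicalCopy(novel, "Printer A"), PhysicalCopy(novel, "Printer B")]
```

Under these assumptions the question "who wrote it?" gets the same
answer for every copy, while "who printed it?" can differ per copy.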


    As for the representation of abstract symbols, from a reading of
the documentation of SUMO "Proposition", it does not appear that
symbols per se would be instances, unless they represent
some kind of complete thought:    (06)

"&%Propositions are &%Abstract entities that express a complete
thought or a set of such thoughts. As an example, the formula
'(instance Yojo Cat)' expresses the &%Proposition that the entity
named Yojo is an element of the &%Class of Cats. Note that
propositions are not restricted to the content expressed by individual
  sentences of a &%Language. They may encompass the content expressed
by theories, books, and even whole libraries. It is important to
distinguish &%Propositions from the &%ContentBearingObjects that
express them. A &%Proposition is a piece of information, e.g. that
the cat is on the mat, but a &%ContentBearingObject is an &%Object
that represents this information. A &%Proposition is an abstraction
that may have multiple representations: strings, sounds, icons, etc.
For example, the &%Proposition that the cat is on the mat is
represented here as a string of graphical characters displayed on a
monitor and/or printed on paper, but it can be represented by a
sequence of sounds or by some non-latin alphabet or by some
cryptographic form"    (07)


Perhaps the abstract symbols "USD", "$", or the word "dollars"
are intended to fit under that category, but neither the text nor
the given instances suggest that.    (08)



[PC]>>
 >>    For the "CurrencyCode", if one were going to use SUMO, I would
 >> recommend addition of a subclass of "SymbolicObject" under
 >> "Abstract".  "SymbolicObject" would have as a subclass
 >> "CurrencySymbol".
 >
 >
[AP]> I'd suggest subclassing Character, if one were going to do
 > this, but I  don't see that this is necessary.  The Mid-Level
 > Ontology has  &%DigitCharacter, for example.
 >    (09)

   Again, we are addressing two different things.  Adam is focusing
on the physical object that represents the conceptual notion
of "currency symbol"; I am saying that the abstract notion of
"currency symbol" itself is part of one or more fields in the
abstract "invoice".    (010)


 >
 >
[AP]> You could just use the existing &%refers for this.  In general one
 > could create an endless number of specializations of various
 > concepts, but in order to avoid such a proliferation, ask what are
 > the *differentia* between the new concept and the existing
 > concepts.  If there are no such differentia, then it doesn't make
 > sense to create the new concept.
 >
    It goes without saying that one should create a subclass of
a class in an ontology only if there are additional differentiating
properties that distinguish it from the parent (and if it appears
to be useful for one's purposes).  I will leave the question of how
an abstract "CurrencySymbol" specializes the general notion of
"Proposition" (or "Abstract") as an exercise for the reader.
(hint: it has a specific relation to a CurrencyMeasure).    (011)
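
For readers who prefer something concrete, here is one possible
reading of that hint, sketched in Python.  Everything named here
(AbstractSymbol, CurrencySymbol, the "denotes" attribute, the
symbols_for helper) is an assumption made for illustration, not an
existing SUMO or Cyc term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurrencyMeasure:
    """Stand-in for the unit of measure (e.g. UnitedStatesDollar)."""
    name: str


@dataclass(frozen=True)
class AbstractSymbol:
    """Generic abstract symbol: just a written form (hypothetical class)."""
    form: str


@dataclass(frozen=True)
class CurrencySymbol(AbstractSymbol):
    # The differentia: a CurrencySymbol must denote a CurrencyMeasure.
    denotes: CurrencyMeasure


usd = CurrencyMeasure("UnitedStatesDollar")
symbols = [CurrencySymbol(form, usd) for form in ("$", "USD", "US$")]


def symbols_for(measure, known):
    """Return the written forms of every known symbol denoting `measure`."""
    return sorted(s.form for s in known if s.denotes == measure)
```

In this sketch the subclass adds exactly one required property beyond
its parent, which is the kind of differentia Adam asks for.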


 >
 >
[AP]> I believe this would be a bad path to take.  Our job is not to
 > formalize  text forms, but rather the informational content of a
 > particular kind of  form - an invoice.  So the fact that there is
 > a document with parts that  are text fields is irrelevant.  The
 > relevant issue is that an  information object, which is an invoice,
 > contains a number of subsidiary  information items, which include
 > an address, a total cost etc.
 >
     I'm not sure what part of my suggestion Adam objects to here.
I agree that we primarily want to formalize the information
content, which is why I have suggested the extension of the
"Abstract" section of SUMO to include "Invoice-abstract" and
its abstract parts.    (012)
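
A rough sketch of what I have in mind, again in Python only for
illustration.  The class InvoiceAbstract, its three fields, and the
render function are hypothetical, and the values are invented; a real
invoice ontology would have many more parts.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class InvoiceAbstract:
    """Abstract invoice whose parts are abstract data fields (hypothetical)."""
    issue_date: date
    currency: str      # name of the currency measure, not a printed symbol
    total: Decimal


def render(invoice: InvoiceAbstract, symbol: str) -> str:
    """Produce one textual representation of the same abstract content."""
    return f"{invoice.issue_date.isoformat()} {symbol}{invoice.total}"


# Illustrative values only.
inv = InvoiceAbstract(date(2003, 7, 11), "UnitedStatesDollar", Decimal("125.00"))
with_sign = render(inv, "$")      # "2003-07-11 $125.00"
with_code = render(inv, "USD ")   # "2003-07-11 USD 125.00"
```

The two rendered strings differ as physical representations, but both
are derived from, and refer back to, the one abstract invoice.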

     Pat    (013)


-- 
=============================================
Patrick Cassidy    (014)

MICRA, Inc.                      || (908) 561-3416
735 Belvidere Ave.               || (908) 668-5252 (if no answer)
Plainfield, NJ 07062-2054        || (908) 668-5904 (fax)    (015)

internet:   cassidy@xxxxxxxxx
=============================================    (016)



============================================    (017)

Adam Pease wrote:    (018)

> Pat,
> 
> At 06:35 PM 7/11/2003 -0400, Patrick Cassidy wrote:
> 
>> Just a few comments on some of the questions raised by Mike:
>>
>>
>> >
>> > 2. There is a field called Invoice types.  Did not see an enumerated
>> > list for these but if you find one -- tell me.  Probably need to be
>> > modeled as either subclasses or as axioms.
>> >
>>   In my experience working with domain ontologists, when they
>> talk about a "type" of some object, it invariably has the
>> same logical meaning as a "subclass".  But in any given case,
>> without instances, it can be hard to know what is intended.
>>
>>
>> > 3. There are CurrencyCodes for invoice, tax and pricing.  My take
>> > on this is that this is the same as defining a CurrencyType which
>> > can be modeled in Protege as a slot with a range of "Class" where
>> > you define the parent class and the valid values are any subclass.
>> > So, I mapped this to the SUMO CurrencyMeasure class.
>> >
>>    This gets into one of the characteristics of SUMO which differs
>> from CYC.   In SUMO, "CurrencyMeasure" is an abstract concept that
>> refers to a unit of measure, which is a currency.  The symbol used for
>> that currency in a document would be a separate abstract concept,
>> which, as best I can tell, has no representation in the SUMO
>> class hierarchy.
> 
> 
> Do you mean the character "$", for example?  That would be a SUMO &%Character.
> 
>> In SUMO, classes of texts tend to be found
>> explicitly only in the "Physical" section, representing the physical
>> objects which are texts (instances of book or invoice).  Such
>> documents are related to the more abstract objects which they
>> represent by the "containsInformation" relation, which points to a
>> "Proposition". But there are few explicitly reified Conceptual classes
>> corresponding to the abstract objects.  So there is a place in the 
>> SUMO class hierarchy for every single copy of a novel, but no explicit
>> class other than the very generic "Proposition" for the more abstract
>> idea of the novel of which the individual artifacts are physical
>> representations.
> 
> 
> Proposition is the correct class in SUMO for the information content of 
> a text.  Would you want that class defined differently?  If so, how?  
> Would you want a subclass added?  If so, why?
> 
>>     In Cyc, there is an abstract ("intangible") class "ConceptualWork"
>> with subclasses such as "Book-CW".  Instances of these classes
>> are the unique conceptual content which may be represented in any
>> number of physical objects -- different copies of the same book,
>> or a representation on some electronic data storage medium, in
>> any font.
>>
>>    For the "CurrencyCode", if one were going to use SUMO, I would 
>> recommend addition of a subclass of "SymbolicObject" under 
>> "Abstract".  "SymbolicObject" would have as a subclass "CurrencySymbol".
> 
> 
> I'd suggest subclassing Character, if one were going to do this, but I 
> don't see that this is necessary.  The Mid-Level Ontology has 
> &%DigitCharacter, for example.
> 
>>  This
>> abstract concept (e.g. the idea for a "Pounds" symbol) could then be
>> related to SUMO "CurrencyMeasure" by a new relation, e.g. "hasSymbol"
>> between the CurrencyMeasure (e.g. UnitedStatesDollar) and the
>> abstract symbol (e.g. "DollarSymbol")  In turn, the physical printed
>> dollars symbols would be represented under SUMO "SymbolicString".
>> The existing SUMO "containsInformation" may not be the proper
>> relation between the SymbolicString and the abstract "DollarSymbol";
>> a new relation may be needed.  A subclass of "SymbolicString" such
>> as "CurrencyString" would also be useful.  Then, instances of 
>> CurrencyString would be, for example the individual printed characters
>> "$" or "USD" in some specific text.
> 
> 
> You could just use the existing &%refers for this.  In general one could 
> create an endless number of specializations of various concepts, but in 
> order to avoid such a proliferation, ask what are the *differentia* 
> between the new concept and the existing concepts.  If there are no such 
> differentia, then it doesn't make sense to create the new concept.
> 
>>    Considerations like that above suggest that ideas that are
>> expressed in symbols which ultimately are represented on physical 
>> objects need three levels of representation in an ontology.
>> If this seems complicated, it is only because the human facility
>> can handle such multiple related but subtly different concepts
>> without us even noticing the differences.  But the machines can't,
>> at least not yet.  So I believe that all these different concepts
>> need to be represented, and preferably explicitly so in the
>> class hierarchy.
>>    It also follows that there should be an abstract concept
>> "Invoice-Conceptual" as well as a class for the physical
>> document -- of which there may be multiple copies.
>> Each data field in the invoice would also have to have its own
>> class, each a subclass of "Abstract" (one could create
>> a generic class "TransactionDatum" under "Abstract" to hold all
>> of the data fields that might be relevant to a transaction).
>>
>>
>>
>> > 5. The way "date" is modeled in SUMO seems strange.  In the protege
>> >  version (it may be more natural in KIF), this is modeled as an
>> > Instance of BinaryPredicate.  I get why date is a binary predicate
>> > (there is  documentation)  but if I want a specific date attached as
>> >  a slot, it is a binary  predicate by default -- i.e.
>> >  date(ContainingClass, value).
>>     I think one could add a class "Date" to the SUMO hierarchy under
>> "Day".  The difference from "Day" would be that "Date" has an
>> explicit representation as Month-Day-Year.  This class could then be
>> the "range" class for the relation "date".  But this use of the
>> SUMO relation "date" is I think not what is needed -- see below.
> 
> 
> SUMO is defined in FOL, which allows us to use functions for dates.  
> Since Protege lacks this ability, it's causing confusion.  This is one 
> example of where Protege can hinder, rather than help us.  Rendering or 
> parsing a certain date string format is an issue for a parser or 
> generator, not an ontology.  SUMO handles date information quite well in 
> its existing form.
> 
> 
>> >   Also, in protege there is no way to
>> >  say that a slot should be of type a specific instance (because an
>> >  instance is not a domain, it is a range of some other domain).
>> > So what I did was for the specific dates in the UBL spreadsheet for
>> > Invoice (like "IssueDate") ... I made its value an Instance of Class
>> > TimePoint.
>> >
>>    I haven't been following this line closely -- busy with other
>> issues right now.  So I'm not clear what you want to do in Protege.
>> Slots can be defined on classes and the "range" can be a class or
>> an instance (or a symbol or a number or a string).  There are
>> instance values and default values that can be specified, also.
>> What is it that you can't do?
>>
>> ===================
>>
>>
>>     Generally, what I think needs to be done is to define an abstract
>> concept "Invoice-Conceptual" which would have proper parts which are
>> the data fields, each of which would also be a class of abstract
>> conceptual objects.  Each of these conceptual objects would have a
>> representation which is the physical markings on some
>> physical object (a piece of paper or electron patterns in a
>> computer).  Then the "part" of the Invoice which is a "date" would
>> be, e.g. "TaxPointDate" (a subclass of "Date") and there
>> would also be an "IssueDate" (if "Issuing" were a subclass of
>> "Creation" in SUMO, the IssueDate would be the
>> EndFn(WhenFn(Issuing))).  Each of the Data fields might also
>> have parts which are data fields.  The conceptual objects could
>> also have formatting properties; each of the data fields
>> might be specified as to where on a printed document it would
>> appear.  In this view, the HTML formatting markup in the
>> "ConceptualWork" which is a document in its abstract sense
>> would represent properties of the data fields.
> 
> 
> I believe this would be a bad path to take.  Our job is not to formalize 
> text forms, but rather the informational content of a particular kind of 
> form - an invoice.  So the fact that there is a document with parts that 
> are text fields is irrelevant.  The relevant issue is that an 
> information object, which is an invoice, contains a number of subsidiary 
> information items, which include an address, a total cost etc.
> 
> Adam
> 
> 
> 
>>    If this seems to make sense, I would try to create specific
>> representation for representation, but that would probably
>> require some additions to what is now in SUMO.  Especially, as
>> I mentioned above, I believe that abstract symbolic objects
>> could use more explicit representation.
>>
>>    Pat
>>
>> =============================================
>> Patrick Cassidy
>>
>> MICRA, Inc.                      || (908) 561-3416
>> 735 Belvidere Ave.               || (908) 668-5252 (if no answer)
>> Plainfield, NJ 07062-2054        || (908) 668-5904 (fax)
>>
>> internet:   cassidy@xxxxxxxxx
>> =============================================
>>
>> _________________________________________________________________
>> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
>> Subscribe/Unsubscribe/Config: 
>> http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
>> Shared Files: http://ontolog.cim3.net/file/
>> Community Wiki: http://ontolog.cim3.net/wiki/ To Post: 
>> mailto:ontolog-forum@xxxxxxxxxxxxxxxx
> 
>     (019)
