
Re: [ontolog-forum] Invoice ontology discussion points/issues

To: cassidy@xxxxxxxxx, "[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>, "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Adam Pease <adampease@xxxxxxxxxxxxx>
Date: Fri, 11 Jul 2003 15:53:43 -0700
Message-id: <>
Pat,    (01)

At 06:35 PM 7/11/2003 -0400, Patrick Cassidy wrote:
>Just a few comments on some of the questions raised by Mike:
> >
> > 2. There is a field called Invoice types.  Did not see an enumerated
> > list for these but if you find one -- tell me.  Probably need to be
> > modeled as either subclasses or as axioms.
> >
>   In my experience working with domain ontologists, when they
>talk about a "type" of some object, it invariably has the
>same logical meaning as a "subclass".  But in any given case,
>without instances, it can be hard to know what is intended.
> > 3. There are CurrencyCodes for invoice, tax and pricing.  My take
> > on this is that this is the same as defining a CurrencyType which
> > can be modeled in Protege as a slot with a range of "Class" where
> > you define the parent class and the valid values are any subclass.
> > So, I mapped this to the SUMO CurrencyMeasure class.
> >
>    This gets into one of the characteristics of SUMO which differs
>from CYC.   In SUMO, "CurrencyMeasure" is an abstract concept that
>refers to a unit of measure, which is a currency.  The symbol used for
>that currency in a document would be a separate abstract concept,
>which, as best I can tell, has no representation in the SUMO
>class hierarchy.    (02)

Do you mean the character "$", for example?  That would be a SUMO &%Character.    (03)

>In SUMO, classes of texts tend to be found
>explicitly only in the "Physical" section, representing the physical
>objects which are texts (instances of book or invoice).  Such
>documents are related to the more abstract objects which they
>represent by the "containsInformation" relation, which points to a
>"Proposition". But there are few explicitly reified Conceptual classes
>corresponding to the abstract objects.  So there is a place in the SUMO 
>class hierarchy for every single copy of a novel, but no explicit
>class other than the very generic "Proposition" for the more abstract
>idea of the novel of which the individual artifacts are physical
>representations.    (04)

Proposition is the correct class in SUMO for the information content of a 
text.  Would you want that class defined differently?  If so, how?  Would 
you want a subclass added?  If so, why?    (05)

>     In Cyc, there is an abstract ("intangible") class "ConceptualWork"
>with subclasses such as "Book-CW".  Instances of these classes
>are the unique conceptual content which may be represented in any
>number of physical objects -- different copies of the same book,
>or a representation on some electronic data storage medium, in
>any font.
>    For the "CurrencyCode", if one were going to use SUMO, I would 
> recommend addition of a subclass of "SymbolicObject" under 
> "Abstract".  "SymbolicObject" would have as a subclass "CurrencySymbol".    (06)

I'd suggest subclassing Character, if one were going to do this, but I 
don't see that this is necessary.  The Mid-Level Ontology has 
&%DigitCharacter, for example.    (07)

>  This
>abstract concept (e.g. the idea for a "Pounds" symbol) could then be
>related to SUMO "CurrencyMeasure" by a new relation, e.g. "hasSymbol"
>between the CurrencyMeasure (e.g. UnitedStatesDollar) and the
>abstract symbol (e.g. "DollarSymbol").  In turn, the physical printed
>dollar symbols would be represented under SUMO "SymbolicString".
>The existing SUMO "containsInformation" may not be the proper
>relation between the SymbolicString and the abstract "DollarSymbol";
>a new relation may be needed.  A subclass of "SymbolicString" such
>as "CurrencyString" would also be useful.  Then, instances of 
>CurrencyString would be, for example the individual printed characters
>"$" or "USD" in some specific text.    (08)

You could just use the existing &%refers for this.  In general one could 
create an endless number of specializations of various concepts, but in 
order to avoid such a proliferation, ask what are the *differentia* between 
the new concept and the existing concepts.  If there are no such 
differentia, then it doesn't make sense to create the new concept.    (09)
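
As a rough illustration of the differentia test (a sketch only, not anything that exists in SUMO): treat each concept as a set of defining properties and ask what the proposed concept adds to its parent. The property strings and the RENAMED_CHARACTER concept below are hypothetical, made up for the example.

```python
def differentia(parent: frozenset, proposed: frozenset) -> frozenset:
    """Return the properties that distinguish the proposed concept from its parent."""
    return proposed - parent


def worth_adding(parent: frozenset, proposed: frozenset) -> bool:
    """A proposed specialization is justified only if it has some differentia."""
    return bool(differentia(parent, proposed))


# Hypothetical property descriptions, for illustration only.
CHARACTER = frozenset({"is a symbol in a writing system"})
DIGIT_CHARACTER = CHARACTER | {"denotes a digit"}
RENAMED_CHARACTER = frozenset(CHARACTER)  # a new name, but nothing new said

print(differentia(CHARACTER, DIGIT_CHARACTER))
print(worth_adding(CHARACTER, RENAMED_CHARACTER))
```

The second concept adds only a name, so under this test there would be no reason to create it.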

>    Considerations like that above suggest that ideas that are
>expressed in symbols which ultimately are represented on physical objects 
>need three levels of representation in an ontology.
>If this seems complicated, it is only because the human facility
>can handle such multiple related but subtly different concepts
>without us even noticing the differences.  But the machines can't,
>at least not yet.  So I believe that all these different concepts
>need to be represented, and preferably explicitly so in the
>class hierarchy.
>    It also follows that there should be an abstract concept
>"Invoice-Conceptual" as well as a class for the physical
>document -- of which there may be multiple copies.
>Each data field in the invoice would also have to have its own
>class, each a subclass of "Abstract" (one could create
>a generic class "TransactionDatum" under "Abstract" to hold all
>of the data fields that might be relevant to a transaction).
> > 5. The way "date" is modeled in SUMO seems strange.  In the protege
> >  version (it may be more natural in KIF), this is modeled as an
> > Instance of BinaryPredicate.  I get why date is a binary predicate
> > (there is  documentation)  but if I want a specific date attached as
> >  a slot, it is a binary  predicate by default -- i.e.
> >  date(ContainingClass, value).
>     I think one could add a class "Date" to the SUMO hierarchy under
>"Day".  The difference from "Day" would be that "Date" has an
>explicit representation as Month-Day-Year.  This class could then be
>the "range" class for the relation "date".  But this use of the
>SUMO relation "date" is I think not what is needed -- see below.    (010)

SUMO is defined in FOL, which allows us to use functions for dates.  Since 
Protege lacks this ability, it's causing confusion.  This is one example of 
where Protege can hinder, rather than help us.  Rendering or parsing a 
certain date string format is an issue for a parser or generator, not an 
ontology.  SUMO handles date information quite well in its existing form.    (011)
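
To make the separation concrete, here is a minimal sketch in Python (not SUMO or Protege) in which one date value is rendered to, and parsed from, several string formats. The style names and the render/parse helpers are assumptions made up for this example; only the standard datetime module is used.

```python
from datetime import date, datetime

# Hypothetical named formats; the date value itself does not depend on them.
FORMATS = {
    "iso": "%Y-%m-%d",
    "us": "%m/%d/%Y",
    "european": "%d.%m.%Y",
}


def render(value: date, style: str) -> str:
    """Generator side: turn one date value into a string in the given style."""
    return value.strftime(FORMATS[style])


def parse(text: str, style: str) -> date:
    """Parser side: turn a string in the given style back into a date value."""
    return datetime.strptime(text, FORMATS[style]).date()


issue_date = date(2003, 7, 11)
for style in FORMATS:
    # Each style round-trips to the same underlying value.
    assert parse(render(issue_date, style), style) == issue_date
    print(style, render(issue_date, style))
```

The knowledge being represented is the single value; the three strings are only different surface forms of it.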

> >   Also, in protege there is no way to
> >  say that a slot should be of type a specific instance (because an
> >  instance is not a domain, it is a range of some other domain).
> > So what I did was for the specific dates in the UBL spreadsheet for
> > Invoice (like "IssueDate") ... I made its value an Instance of Class
> > TimePoint.
> >
>    I haven't been following this line closely -- busy with other
>issues right now.  So I'm not clear what you want to do in Protege.
>Slots can be defined on classes and the "range" can be a class or
>an instance (or a symbol or a number or a string).  There are
>instance values and default values that can be specified, also.
>What is it that you can't do?
>     Generally, what I think needs to be done is to define an abstract
>concept "Invoice-Conceptual" which would have proper parts which are
>the data fields, each of which would also be a class of abstract
>conceptual objects.  Each of these conceptual objects would have a
>representation which is the physical markings on some
>physical object (a piece of paper or electron patterns in a
>computer).  Then the "part" of the Invoice which is a "date" would
>be, e.g. "TaxPointDate" (a subclass of "Date") and there
>would also be a "IssueDate" (if "Issuing" were a subclass of
>"Creation" in SUMO, the IssueDate would be the EndFn 
>(WhenFn(Issuing))).  Each of the Data fields might also
>have parts which are data fields.  The conceptual objects could
>also have formatting properties; each of the data fields
>might be specified as to where on a printed document it would
>appear.  In this view, the HTML formatting markup in the
>"ConceptualWork" which is a document in its abstract sense
>would represent properties of the data fields.    (012)

I believe this would be a bad path to take.  Our job is not to formalize 
text forms, but rather the informational content of a particular kind of 
form - an invoice.  So the fact that there is a document with parts that 
are text fields is irrelevant.  The relevant issue is that an information 
object, which is an invoice, contains a number of subsidiary information 
items, which include an address, a total cost etc.    (013)
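
A small sketch of that view, written in Python rather than in any ontology language: the invoice is modeled purely as information content with subsidiary items, and nothing in it says where a field appears on a page. All class and field names and the sample values are hypothetical; they are not taken from UBL or SUMO.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postal_code: str


@dataclass(frozen=True)
class LineItem:
    description: str
    quantity: int
    unit_price: Decimal

    def amount(self) -> Decimal:
        return self.unit_price * self.quantity


@dataclass(frozen=True)
class Invoice:
    """Information content only: no fonts, positions, or markup."""
    issue_date: date
    currency: str
    billing_address: Address
    items: tuple

    def total_cost(self) -> Decimal:
        return sum((item.amount() for item in self.items), Decimal("0"))


# Made-up sample data for illustration.
example = Invoice(
    issue_date=date(2003, 7, 11),
    currency="USD",
    billing_address=Address("1 Example Street", "Exampleville", "00000"),
    items=(
        LineItem("widget", 2, Decimal("10.00")),
        LineItem("gadget", 1, Decimal("5.50")),
    ),
)
print(example.total_cost())
```

Any printed or electronic rendering of such an invoice would be produced from this content by separate code, which is where layout concerns would live.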

Adam    (014)

>    If this seems to make sense, I would try to create specific
>representation for representation, but that would probably
>require some additions to what is now in SUMO.  Especially, as
>I mentioned above, I believe that abstract symbolic objects
>could use more explicit representation.
>    Pat
>Patrick Cassidy
>MICRA, Inc.                      || (908) 561-3416
>735 Belvidere Ave.               || (908) 668-5252 (if no answer)
>Plainfield, NJ 07062-2054        || (908) 668-5904 (fax)
>internet:   cassidy@xxxxxxxxx
>Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
>Shared Files: http://ontolog.cim3.net/file/
>Community Wiki: http://ontolog.cim3.net/wiki/ To Post: 
>mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (015)
