
[ontology-summit] HC-06: ISO 15926 Reference Data Validation Final report

To: ontology-summit@xxxxxxxxxxxxxxxx
From: Victor Agroskin <vic5784@xxxxxxxxx>
Date: Tue, 30 Apr 2013 03:58:39 +0400
Message-id: <CAJbbDjRDLkQLPXBb7+F9qNd0burgN4f5jhBQLgEv=Ud0HROQVA@xxxxxxxxxxxxxx>
Ontology clinic HC-06
http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2013_Hackathon_Clinics_ISO_15926_RefDataValidation
took place on Sat 2013.03.30, 2:00pm - 6:00pm (virtual and in-person
session) and 8:00pm - 10:00pm (virtual session). All times are Moscow
time (UTC+4).    (01)

The in-person session gathered 5 participants in Moscow; more people
from Moscow, St. Petersburg, Surgut, Kiev and Zurich connected online.
During the "open webcast" hour the clinic's team was joined by experts
from the UK, Spain and the USA.    (02)

Implementation work was carried out in the .15926 Editor environment
(downloadable from http://techinvestlab.ru/dot15926Editor ).
All scripts in this report should be executed in the .15926 Editor
Python console.    (03)

Other tools were tried, debugged and improved in the process but did
not deliver reportable results.    (04)

The team focused on adapting the OQuaRE quality metrics (refer to
http://miuras.inf.um.es/oquarewiki/index.php5/Quality_metrics)
to the JORD/PCA reference data library. In the process some
verification tests were designed and executed for the reference data
library.    (05)


1. Preparatory work    (06)

The JORD/PCA reference data library was downloaded from
http://rds.posccaesar.org/downloads/PCA-RDL.owl.zip    (07)

This report was prepared for the RDL available on 2013.04.10, which
has been updated since the version available on 2013.03.30.    (08)

A subset of the reference data was selected for further work, removing
data types which are specific to the ISO 15926 ontology and rarely used
in mainstream OWL ontologies. The final subset includes:    (09)

- only classes of individual (no classes of classes);
- excluding classes of EXPRESS information representation (literal data
types in mainstream OWL ontologies).    (010)

The following code defines this subset (sub-ontology):    (011)

cois = find(type = part2.any.ClassOfIndividual)
expr = find(type=part2.any.ClassOfInformationRepresentation)
subont = (cois-expr)
show(id=subont)    (012)

The root class of the taxonomy tree in the RDL is "ISO 15926-4 THING":    (013)

root = 'http://posccaesar.org/rdl/RDS398732751'    (014)


2. LCOMOnto - Lack of Cohesion in Methods

http://miuras.inf.um.es/oquarewiki/index.php5/Quality_metrics#LCOMOnto_-_Lack_of_Cohesion_in_Methods

Average length of the specialization path leading from the root
taxonomy class to a leaf class (a class without further subclasses).    (015)

The set of all non-leaf classes (classes which have subclasses) and
the set of leaf classes in sub-ontology were found:    (016)

nonleaf = find(type = part2.Specialization, hasSuperclass=out)
leaf = (subont-nonleaf)
show(id=leaf)    (017)

The following code looks for all subclasses of "ISO 15926-4 THING",
then for their subclasses, their subclasses, etc. At each step the
number of leaf classes reached and the total length of the paths
leading to them are accumulated. The search stops when no new
subclasses are found along specialization relationships (it is
important to note that the reference data taxonomy contains cycles).    (018)

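path_sum = 0   # sum of depths at which leaf classes were reached
leaf_num = 0   # number of leaf classes reached
deep = 0       # current depth below the root
# frontier/seen avoid shadowing the Python builtins next() and all()
frontier = set([root])
seen = set()
while frontier:
   deep += 1
   print deep
   # direct subclasses of every class in the current frontier
   frontier = find(type = part2.Specialization, hasSubclass=out, hasSuperclass=frontier)
   found = leaf & frontier
   # drop classes already visited so that cycles do not loop forever
   frontier = frontier - seen
   seen |= frontier
   print len(found)
   path_sum += deep*len(found)
   leaf_num += len(found)
# both operands are integers, so Python 2 truncates the result
LCOMOnto = path_sum/leaf_num
print 'LCOMOnto = '+ str(LCOMOnto)    (019)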

If a class is reached by different paths of the same length, only
one path is included in the calculation. This difference from the
original methodology was not corrected.    (020)
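The level-by-level counting can also be tried outside the Editor on a toy taxonomy. The following is only a minimal sketch in plain Python and does not use the Editor's find API: the subclasses dictionary, the class names and the helper mean_leaf_depth are hypothetical. It illustrates the limitation noted above: class C is reached through both A and B at the same depth, so leaf L2 contributes a single path of length 3 instead of two.

```python
# Toy taxonomy (hypothetical data): superclass -> set of direct subclasses.
subclasses = {
    "THING": {"A", "B"},
    "A": {"C", "L1"},
    "B": {"C"},        # C is reached via A and via B, both at depth 2
    "C": {"L2"},
    "L1": set(),
    "L2": set(),
}

def mean_leaf_depth(subclasses, root):
    """Level-by-level walk mirroring the loop in the report (a sketch)."""
    leaves = {c for c, subs in subclasses.items() if not subs}
    frontier, seen = {root}, set()
    path_sum = leaf_num = depth = 0
    while frontier:
        depth += 1
        # All direct subclasses of the classes in the current frontier.
        frontier = set().union(*(subclasses.get(c, set()) for c in frontier))
        found = leaves & frontier
        # Drop classes visited earlier so that cycles cannot loop forever.
        frontier -= seen
        seen |= frontier
        path_sum += depth * len(found)
        leaf_num += len(found)
    # float() keeps the fractional part under Python 2 as well.
    return float(path_sum) / leaf_num

toy_lcom = mean_leaf_depth(subclasses, "THING")  # (2 + 3) / 2
```

Counting every distinct path instead would give (2 + 3 + 3) / 3 for the same toy data, which is the difference from the original methodology mentioned above.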

The value of metric LCOMOnto = 8.    (021)


3. Free class identification    (022)

The following code shows all classes from this subontology which are
not subclasses of any class, therefore are not connected to "ISO
15926-4 THING" by any chain of specializations. Some of them have
their own subtaxonomies.    (023)


cois = find(type = part2.any.ClassOfIndividual)
expr = find(type=part2.any.ClassOfInformationRepresentation)
subont = (cois-expr)
nonroot = find(type = part2.Specialization, hasSubclass=out)
roots=subont-nonroot
show(id=roots)    (024)

There were 823 such classes at the time of preparation of this report.
The list of URIs is published at
http://15926.org/download/file.php?id=131    (025)
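The same set difference can be illustrated on toy data. This is a minimal sketch in plain Python, assuming specializations are available as (subclass, superclass) pairs; the class names and variables are hypothetical and are not taken from the RDL.

```python
# Hypothetical specialization records: (subclass, superclass) pairs.
specializations = [
    ("PUMP", "THING"),
    ("VALVE", "THING"),
    ("GATE VALVE", "VALVE"),
    ("ORPHAN CHILD", "ORPHAN"),   # a subtaxonomy not connected to THING
]
classes = {"THING", "PUMP", "VALVE", "GATE VALVE",
           "ORPHAN", "ORPHAN CHILD", "LONER"}

# Classes that appear as the subclass in at least one specialization.
nonroot = {sub for sub, _sup in specializations}

# Everything else has no superclass at all.
roots = classes - nonroot

# In this toy data the intended root is itself parentless, so it is
# excluded to leave only the "free" classes.
free = roots - {"THING"}
```

Here ORPHAN has its own subtaxonomy and LONER has none, and both end up in free.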


4. Cycle identification    (026)

The reference data library was checked for specialization cycles. The
following code traces specialization chains from "ISO 15926-4 THING"
until no new classes are found, which gives a set of classes
suspected of cycle membership (this set may also contain classes
reachable by more than one specialization chain). Then, for each member
of this set, a check is performed whether the class is a subclass of
itself.    (027)

root = 'http://posccaesar.org/rdl/RDS398732751'    (028)

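# frontier: classes found at the current level; seen: all classes found so far
frontier = set([root])
seen = set()
cycles = set()    (029)

while True:
   frontier = find(type = part2.Specialization, hasSubclass=out, hasSuperclass=frontier)
   # stop when the current level contains no class that was not seen before
   if not frontier - seen:
      break
   seen |= frontier    (030)

# classes of the last level were all seen before: candidates for cycle membership
found = frontier
show(id = found )    (031)

for item in found:
   frontier = set([item])
   seen = set()
   while True:
      frontier = find(type = part2.Specialization, hasSubclass=out, hasSuperclass=frontier)
      # item is reachable from itself, so it belongs to a cycle
      if item in frontier:
         cycles.add(item)
      if not frontier - seen:
         break
      seen |= frontier    (032)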

show(id = cycles )    (033)

Three classes are identified:    (034)

"CHP 406.4 X 9.53 ASME B36.19M"
http://posccaesar.org/rdl/RDS22601516315 which is declared a subclass
of itself by specialization relationship
http://posccaesar.org/rdl/RDS2260812411    (035)

"SFT CLASS 200" http://posccaesar.org/rdl/RDS978526101 and "WATER"
http://posccaesar.org/rdl/RDS1012769 are each declared a subclass of
the other.    (036)
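Both kinds of cycle found here (a class specializing itself, and two classes specializing each other) can be reproduced on toy data. The following is a minimal sketch in plain Python, not the Editor code: the subclasses dictionary, the class names and the helper descendants are hypothetical. Unlike the script above, it simply tests every class reachable from the root rather than only the last level of the traversal.

```python
# Hypothetical data: superclass -> set of direct subclasses.
subclasses = {
    "THING": {"SELF", "PARENT"},
    "SELF": {"SELF"},      # declared a subclass of itself
    "PARENT": {"X"},
    "X": {"Y"},
    "Y": {"X"},            # X and Y are declared subclasses of each other
}

def descendants(subclasses, start):
    """All classes reachable from start via one or more specializations."""
    frontier, seen = {start}, set()
    while frontier:
        # Direct subclasses of the frontier, minus anything already visited.
        frontier = set().union(
            *(subclasses.get(c, set()) for c in frontier)) - seen
        seen |= frontier
    return seen

# A class is on a cycle exactly when it is its own descendant.
in_cycle = {c for c in descendants(subclasses, "THING")
            if c in descendants(subclasses, c)}
```

PARENT is an ancestor of the X/Y pair but is not reachable from itself, so it is not reported.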


5. WMCOnto - Weighted Method Count    (037)

http://miuras.inf.um.es/oquarewiki/index.php5/Quality_metrics#WMCOnto_-_Weighted_Method_Count    (038)

Mean number of properties and relationships per class.    (039)

This code is optimized for speed, using internal data structures of
the Editor instead of the search APIs. Properties and relationships are
counted as all triples in which an entity from the subontology is the
subject or the object, except annotation property triples (distinguished
by the namespace of the property predicate).    (040)

doc = appdata.active_document    (041)

from iso15926.io.rdf_base import compact_uri, curi_head    (042)

total = 0
cu = set([compact_uri('http://posccaesar.org/rdl/'),
compact_uri('http://www.w3.org/2000/01/rdf-schema#')])    (043)

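for v in subont:
   # triples with v as subject: skip annotation properties
   for t in doc.grTriplesForSubj(v):
      if curi_head(t[1]) not in cu:
         total += 1
   # triples with v as object: all of them are counted
   total += len(set(doc.grTriplesForObj(v)))    (044)

# both operands are integers, so Python 2 truncates the mean
print 'WMCOnto = ' + str(total/len(subont))    (045)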


The value of metric WMCOnto = 5.    (046)
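The counting rule can be checked on a handful of triples. This is a minimal sketch in plain Python that does not use the Editor's internal structures: the triples, the prefixes in ANNOTATION_PREFIXES and the helper wmc are hypothetical stand-ins for the namespace test done with curi_head above.

```python
# Assumed annotation namespaces, written as compact prefixes (hypothetical).
ANNOTATION_PREFIXES = ("rdfs:", "rdl:")

# Hypothetical triples: (subject, predicate, object).
triples = [
    ("ex:Pump",  "rdfs:label",       "PUMP"),               # annotation
    ("ex:Pump",  "rdf:type",         "ex:SomeEntityType"),
    ("ex:Spec1", "ex:hasSubclass",   "ex:Pump"),
    ("ex:Spec1", "ex:hasSuperclass", "ex:Thing"),
    ("ex:Thing", "rdf:type",         "ex:SomeEntityType"),
]
classes = {"ex:Pump", "ex:Thing"}

def wmc(triples, classes):
    """Mean number of non-annotation triples touching each class."""
    total = 0
    for s, p, o in triples:
        # Class as subject: counted unless the predicate is an annotation.
        if s in classes and not p.startswith(ANNOTATION_PREFIXES):
            total += 1
        # Class as object: always counted.
        if o in classes:
            total += 1
    # float() keeps the fractional part under Python 2 as well.
    return float(total) / len(classes)

toy_wmc = wmc(triples, classes)  # 4 counted triples / 2 classes
```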


6. DITOnto - Depth of subsumption hierarchy    (047)

http://miuras.inf.um.es/oquarewiki/index.php5/Quality_metrics#DITOnto_-_Depth_of_subsumption_hierarchy    (048)

Length of the longest path from Thing to a leaf class.    (049)

As calculated by the code in LCOMOnto calculation, DITOnto = 16.    (050)



7. NACOnto - Number of Ancestor Classes    (051)

http://miuras.inf.um.es/oquarewiki/index.php5/Quality_metrics#NACOnto_-_Number_of_Ancestor_Classes    (052)

Mean number of ancestor classes per leaf class.    (053)

The number of leaf classes is 30 044, as shown by the code:    (054)

show(id=leaf)    (055)

The number of direct parents of leaf classes is taken to be the number
of specialization relationships that have a leaf class as subclass,
not the number of distinct parent classes, because each parent class
should be counted once for every child it has. There are 36 965 such
specialization relationships, as shown by the code:    (056)

show(type = part2.Specialization, hasSubclass=leaf)    (057)

NACOnto = 36 965  / 30 044 = 1.23    (058)
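The ratio exceeds 1 because some leaf classes have more than one direct parent. A minimal sketch in plain Python, assuming specializations are given as (subclass, superclass) pairs; the toy class names are hypothetical, and only the two reported totals come from this report.

```python
# Hypothetical specialization records: (subclass, superclass) pairs.
specializations = [
    ("L1", "A"), ("L1", "B"),   # leaf L1 has two direct parents
    ("L2", "A"),
    ("A", "THING"), ("B", "THING"),
]

superclasses = {sup for _sub, sup in specializations}
# Leaves are classes that never appear as a superclass.
leaves = {sub for sub, _sup in specializations} - superclasses

# Specializations whose subclass is a leaf: one per (leaf, parent) pair.
leaf_links = [pair for pair in specializations if pair[0] in leaves]
toy_nac = float(len(leaf_links)) / len(leaves)   # 3 links / 2 leaves

# The same ratio with the totals reported above.
reported_nac = float(36965) / 30044
```

As in the report, only direct parents are counted here, not all ancestors.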


8. NOCOnto - Number of Children    (059)

http://miuras.inf.um.es/oquarewiki/index.php5/Quality_metrics#NOCOnto_-_Number_of_Children    (060)

Mean number of direct subclasses. It is the number of relationships
divided by the number of classes minus the number of relationships of
Thing.    (061)

The number of classes is 41 680, as shown by the code:    (062)

show(id=subont)    (063)

The number of specialization relationships in the taxonomy
(subontology analyzed) is 50 220 as shown by code:    (064)

show(type=part2.Specialization, object= subont)    (065)

The number of subclasses of "ISO 15926-4 THING" is 22.    (066)

NOCOnto = 50 220 / (41 680 - 22) = 1.21    (067)
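The arithmetic can be re-checked with a few lines of plain Python. This is only a sketch using the three totals reported above; the variable names are hypothetical.

```python
# Totals reported above.
specializations = 50220   # specialization relationships in the subontology
classes = 41680           # classes in the subontology
thing_children = 22       # direct subclasses of "ISO 15926-4 THING"

# float() avoids integer truncation under Python 2.
noc_onto = float(specializations) / (classes - thing_children)
print(round(noc_onto, 2))
```

The quotient is roughly 1.2055.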


9. Conclusions    (068)

Normalized metric scores for the analyzed subontology can be looked
up at http://miuras.inf.um.es/oquarewiki/index.php5/Quality_metrics .
Further comparison of the JORD/PCA reference data library with other
ontologies can be carried out once comparable data is collected in
OQuaRE projects.    (069)

Clinic activities were instrumental in bringing together .15926
developers and users, providing an opportunity to demonstrate software
usage patterns. Strengths and weaknesses of the .15926 Editor ontology
exploration environment were identified, which will guide further
development of the software.    (070)

Verification results were reported to the ISO 15926 community and to
the reference data library maintainers on the community forum:
http://15926.org/viewtopic.php?f=5&t=154 .    (071)
