
Re: [ontology-summit] Track 3 - Big Data, Machine Learning and Ontologies

To: Ontology Summit 2012 discussion <ontology-summit@xxxxxxxxxxxxxxxx>
From: Bart Gajderowicz <bgajdero@xxxxxxxxxx>
Date: Wed, 8 Feb 2012 09:41:42 -0500
Message-id: <CABw=6A6u2714F+2oaO16X4FE-61SufMcNFmw_yhyJ6fdNd_2sA@xxxxxxxxxxxxxx>
Correction... I meant to say during the 01/19 conference call:
http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2012_01_19#nid328G

Sorry about that.

On 8 February 2012 00:48, Bart Gajderowicz <bgajdero@xxxxxxxxxx> wrote:
> During the 02/19 conference call, Lucier Brady presented a sample
> problem which can benefit from Automatic Programming.
> I'd like to present a similar idea in terms of Ontology Learning, or
> Ontology Extension.
> *Goal:*
> Enable scientists to make maximum use of big data
> *Proposals:*
> - Extending Ontologies with Data
> - Associating Data with Ontologies
> - Using machine learning on ontologically enhanced/extended data.
> *Simple Use Case:*
> Instead of statistical interpretation (mean, standard deviation, mode,
> etc.), we can show semantic relations in the data and analyse it at a
> more abstract level.
> View data at different levels of abstraction and granularity:
> - How does height relate to weight?
> - How does weight relate to weather?
> - How does weather relate to geographical location?
>   - How does weight relate to geographical location?
> Statistical models reveal correlations, with a degree of certainty.
> Semantic models may tell us about causation.
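The questions above chain attribute-level relations together. As a minimal sketch, assuming the relations have already been captured as triples (the attribute and relation names below are hypothetical, not taken from any published ontology), a breadth-first search can recover the chain of relations that connects weight to geographical location. Finding such a path only points at where a causal link might lie; it does not establish causation by itself.

```python
from collections import deque

# Hypothetical semantic relations between data attributes (illustrative names).
RELATIONS = [
    ("height", "influences", "weight"),
    ("weather", "influences", "weight"),
    ("geographical_location", "determines", "weather"),
]


def relation_path(start, goal, triples):
    """Breadth-first search for a chain of relations linking two attributes.

    Edges are followed in either direction so that a question such as
    "how does weight relate to location?" can be answered; each step keeps
    the original (subject, predicate, object) triple.
    """
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((o, (s, p, o)))
        neighbours.setdefault(o, []).append((s, (s, p, o)))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, triple in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [triple]))
    return None  # no chain of relations connects the two attributes


path = relation_path("weight", "geographical_location", RELATIONS)
for triple in path:
    print(*triple)
```

The search returns the shortest chain, here weight, then weather, then geographical location, which is exactly the derived question in the list above.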
> *My Background*
> I have been researching how to incorporate ontologies into the field of
> machine learning, and I have applied this to ontology matching (at the
> system level, semantic integration). My MSc thesis [2], as well as paper
> [1] in the 2009 Uncertainty Reasoning for the Semantic Web (URSW)
> workshop proceedings, includes my work on these topics. As a well-known
> reference, this work is related to the 2002 work on GLUE by Doan et
> al. [3].
> The following snippet, taken from a paper I'm working on, summarizes
> the latest work on Ontology Learning and on utilizing Decision Trees
> with OWL.
> If anyone is interested in this approach, I would love to hear your
> opinions, use cases, and references in this thread.
> *Ontology Learning*
> Ontology Learning is the area of research that deals with the
> construction and management of ontologies in a systematic way. This
> includes automatically adjusting ontologies to accommodate changes in
> data patterns, as well as to reflect variations in the data being
> represented [4]. Inductive learning [5] specifically uses machine
> learning algorithms to generate ontology extensions and refinements.
> Traditionally, work in this field has concentrated on text-based data
> [4][6], such as articles and research papers. Learning ontologies
> beyond text is still an open problem.
> Inductively derived rules and machine learning have been successfully
> applied to many different applications [7]. d’Amato et al. [5] address
> how this benefits the Semantic Web, and why it’s important to merge
> the worlds of ontologies and machine learning. The authors list
> several approaches that use various sources from the Semantic Web,
> such as folksonomies and Linked Data [8] in order to construct
> ontologies, and call this process Ontology Mining.
> For sources that contain information in the form of text, ontology
> mining is performed using NLP techniques. However, as the authors
> note, semantic rules derived from these sources are not completely
> clear, and the language used to represent these rules is less
> expressive than OWL. Sources with annotated observations
> and some background knowledge make it possible to deduce additional
> information by extending the provided observations and background
> knowledge [9]. Existing annotations act as a starting point in the
> form of a small set of simple rules stored in an ontology. The
> background knowledge acts as an external source and is introduced
> during the learning process. As new information becomes available,
> incremental changes are made to the existing rules, extending and
> refining them in the process.
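A minimal sketch of the incremental refinement described above, under a deliberately propositional simplification: each individual is the set of atomic concepts it is annotated with, and a rule is a conjunction of atoms. The refinement operators of [9] work over full description logic expressions; this sketch only specialises a rule by adding atoms when newly observed negative examples are covered. The concept names (Lake, HighPhosphorus, etc.) are hypothetical.

```python
def covers(rule, individual):
    """A conjunctive rule covers an individual annotated with every atom in it."""
    return rule <= individual


def refine(rule, vocabulary):
    """Downward refinement: specialise a rule by adding one more atom."""
    return [rule | {atom} for atom in sorted(vocabulary - rule)]


def update_rule(rule, positives, negatives, vocabulary):
    """Specialise `rule` until no negative example is covered.

    Greedy: keep only refinements that still cover every positive example and
    pick the one covering the fewest negatives. Returns None when no such
    refinement exists within the vocabulary.
    """
    while any(covers(rule, n) for n in negatives):
        candidates = [r for r in refine(rule, vocabulary)
                      if all(covers(r, p) for p in positives)]
        if not candidates:
            return None
        rule = min(candidates,
                   key=lambda r: sum(covers(r, n) for n in negatives))
    return rule


# Hypothetical annotations; the existing rule for the target concept is {"Lake"}.
vocabulary = {"Lake", "Shallow", "HighPhosphorus", "Urban"}
positives = [{"Lake", "Shallow", "HighPhosphorus"},
             {"Lake", "HighPhosphorus", "Urban"}]
negatives = [{"Lake", "Shallow"}]  # newly observed counter-example

rule = update_rule({"Lake"}, positives, negatives, vocabulary)
print(sorted(rule))  # ['HighPhosphorus', 'Lake']
```

If a later negative example is annotated with every atom in the vocabulary, `update_rule` returns None: no specialisation can exclude it, which signals that background knowledge must extend the vocabulary before the rule can be refined further.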
> Another approach learns concept descriptions for an existing taxonomy
> by clustering annotated data to create meaningful groups that
> represent similar concepts [5]. These types of methods are used for
> ontology evolution, pattern recognition within the ontology, scaling
> large ontologies by incrementally inducing them to a manageable size,
> and finally for building probabilistic ontologies when uncertainty in
> the derived rules is unavoidable.
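The clustering approach can be sketched as follows. The surveyed work relies on dissimilarity measures suited to description logics; as a plainly swapped-in stand-in, this sketch uses Jaccard distance between annotation sets with single-linkage grouping, then proposes the annotations shared by every member of a cluster as a candidate concept description. The individuals and annotations are made up for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; 0 for identical annotation sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster(individuals, threshold):
    """Single-linkage grouping: pairs closer than `threshold` share a cluster.

    Uses a small union-find over all pairs, which is fine for toy data.
    """
    parent = {name: name for name in individuals}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(individuals, 2):
        if jaccard_distance(individuals[a], individuals[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in individuals:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


def describe(group, individuals):
    """Candidate concept description: the annotations every member shares."""
    return set.intersection(*(individuals[n] for n in group))


individuals = {
    "lake1": {"Lake", "Freshwater", "Shallow"},
    "lake2": {"Lake", "Freshwater", "Deep"},
    "bay1": {"Bay", "Saltwater", "Shallow"},
    "bay2": {"Bay", "Saltwater", "Deep"},
}
for group in cluster(individuals, threshold=0.6):
    print(group, sorted(describe(group, individuals)))
```

Here the two lakes and the two bays each fall into one cluster, and the shared annotations {Lake, Freshwater} and {Bay, Saltwater} become the candidate descriptions.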
> For structured data, such as numeric data and database records,
> Stocker et al. [4] apply ontology learning to create domain-ontologies
> of environmental data that originate as numerical measurements. A
> taxonomy of lakes was used to classify various bodies of water. A
> major problem with this is that building ontologies based on numerical
> data is often biased [10]. For example, classifying a body of water as
> being low or high in nitrogen is highly subjective. Stocker et al. [4]
> defined two properties richIn and poorIn as identifying whether a body
> of water is rich in nitrogen or poor in nitrogen, respectively. Their
> data shows that based on variation of nitrogen levels in Finland, the
> threshold for a body of water classified as being richIn nitrogen is
> 0.88. They then compared this value to the Spanish richIn threshold
> for nitrogen in a body of water, which is 8.36. In fact, the Spanish
> poorIn threshold of 0.78 is closer to the Finnish richIn threshold of
> 0.88. Defining qualitative properties such as richIn or poorIn with
> fixed quantitative values can therefore be highly misleading.
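The arithmetic behind this point can be made concrete with the thresholds quoted above. The sketch assumes richIn means a value at or above its threshold and poorIn a value at or below it; no Finnish poorIn threshold is quoted, so none is used. Under those assumptions a hypothetical measurement of 0.9 is richIn in Finland but neither richIn nor poorIn in Spain.

```python
# Thresholds quoted above from Stocker et al. [4], in the source's units.
# Only a richIn threshold is quoted for Finland, so it has no poorIn entry.
THRESHOLDS = {
    "Finland": {"richIn": 0.88},
    "Spain": {"richIn": 8.36, "poorIn": 0.78},
}


def labels(total_nitrogen, region):
    """Qualitative labels implied by fixed regional thresholds.

    Assumes richIn means value >= threshold and poorIn means value <= threshold.
    """
    t = THRESHOLDS[region]
    found = []
    if "richIn" in t and total_nitrogen >= t["richIn"]:
        found.append("richIn")
    if "poorIn" in t and total_nitrogen <= t["poorIn"]:
        found.append("poorIn")
    return found


for region in THRESHOLDS:
    print(region, labels(0.9, region))  # Finland ['richIn'], then Spain []
```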
> To remove subjectivity and bias from the models, Stocker et al. [4]
> propose creating models by deriving semantic rules such as:
> poorIn(?i, Nitrogen) ← totalNitrogen(?i,?x) ∧ lessThanOrEqual(?x,?y)
> and representing them in RDF. There are no values present in the rule
> above, only relations which define the concept poorIn for nitrogen.
> These rules were derived using the k-means clustering algorithm, and a
> general-purpose rule engine was applied to the RDF rules for
> rule-based reasoning which generates a static inference model. The
> SPARQL [11] query language was then used to query the inference models
> for lakes that are richIn and poorIn nitrogen.
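The rule-derivation pipeline can be sketched end to end with toy data. The text does not say how the k-means output is turned into the value bound to ?y; this sketch assumes the midpoint between two cluster centroids, adds a complementary richIn rule for illustration, and uses a list comprehension in place of a SPARQL query. The lake names and measurements are hypothetical; a real setup would use an RDF store, a rule engine and SPARQL.

```python
def kmeans_1d(values, iterations=100):
    """Two-cluster k-means on one-dimensional data; returns (low, high) centroids."""
    low, high = min(values), max(values)  # initialise centroids at the extremes
    for _ in range(iterations):
        a = [v for v in values if abs(v - low) <= abs(v - high)]
        b = [v for v in values if abs(v - low) > abs(v - high)]
        new_low, new_high = sum(a) / len(a), sum(b) / len(b)
        if (new_low, new_high) == (low, high):
            break  # assignments are stable
        low, high = new_low, new_high
    return low, high


def infer(graph, y):
    """Forward-chain poorIn(?i, Nitrogen) <- totalNitrogen(?i, ?x) ∧ ?x <= ?y,
    plus a complementary richIn rule assumed here for illustration."""
    inferred = set()
    for s, p, o in graph:
        if p == "totalNitrogen":
            inferred.add((s, "poorIn" if o <= y else "richIn", "Nitrogen"))
    return graph | inferred


measurements = {"lake_a": 0.3, "lake_b": 0.4, "lake_c": 0.5,
                "lake_d": 1.6, "lake_e": 1.8, "lake_f": 2.0}
graph = {(lake, "totalNitrogen", x) for lake, x in measurements.items()}

low, high = kmeans_1d(list(measurements.values()))
y = (low + high) / 2  # cluster boundary, bound to ?y

model = infer(graph, y)
# Stand-in for: SELECT ?lake WHERE { ?lake :poorIn :Nitrogen }
poor = sorted(s for s, p, o in model if p == "poorIn" and o == "Nitrogen")
print(poor)  # ['lake_a', 'lake_b', 'lake_c']
```

Note that no numeric threshold is written into the rule itself; ?y is computed from whatever data the model is built on, which is the point made above.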
> *OWL as Decision Trees*
> Biological and chemical information is increasingly being published
> and shared using semantic technologies [12][13]. Much of the analysis
> on this type of information has not caught up to the latest
> representation languages such as RDF and OWL. For example, the
> toxicity of chemical products is often analyzed using statistical
> analysis of chemical features. These features focus on a chemical’s
> structure and function. A popular method to achieve this is the
> development of decision trees by mining empirical toxicology data. It
> is beneficial for the representation and analysis to be done in
> compatible, or better yet, the same languages. Chepelev et al. [13]
> have created such decision trees represented in the OWL language
> specifically for toxicity classification. The results are OWL rules
> which classify toxicity features. An OWL reasoner was then used to
> characterize the toxicity of various chemical products. Datasets were
> compared semantically by examining logical equivalences between the
> OWL decision trees. However, the underlying decision trees
> differentiating between toxic and non-toxic classes were not easily
> created due to significant overlap between the two classes. The addition of chemical product
> structure was required to disambiguate the various classification
> rules.
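A sketch of how a decision tree maps onto class-like rules, and how two trees can be compared for logical equivalence. It assumes Boolean features evaluated under a closed-world reading (a missing feature counts as false), which is simpler than the OWL semantics used in [13]; the feature names has_reactive_group and has_blocking_group are hypothetical, not taken from the toxicity data.

```python
from itertools import product

# A node is a class label (str) or (feature, subtree_if_true, subtree_if_false).
# Feature names are hypothetical.
TREE_A = ("has_reactive_group",
          ("has_blocking_group", "NonToxic", "Toxic"),
          "NonToxic")
TREE_B = ("has_blocking_group",
          "NonToxic",
          ("has_reactive_group", "Toxic", "NonToxic"))


def rules(tree, path=()):
    """Flatten a tree into (conditions, label) rules, one per root-to-leaf path.

    Each rule reads like a class definition, e.g.
    Toxic <- has_reactive_group ∧ ¬has_blocking_group.
    """
    if isinstance(tree, str):
        return [(path, tree)]
    feature, if_true, if_false = tree
    return (rules(if_true, path + ((feature, True),)) +
            rules(if_false, path + ((feature, False),)))


def classify(tree, features):
    """Walk the tree; a feature absent from `features` is treated as False."""
    while not isinstance(tree, str):
        feature, if_true, if_false = tree
        tree = if_true if features.get(feature, False) else if_false
    return tree


def equivalent(t1, t2, feature_names):
    """Two trees are logically equivalent if they agree on every assignment."""
    for values in product([True, False], repeat=len(feature_names)):
        features = dict(zip(feature_names, values))
        if classify(t1, features) != classify(t2, features):
            return False
    return True


for conditions, label in rules(TREE_A):
    print(label, conditions)
print(equivalent(TREE_A, TREE_B, ["has_reactive_group", "has_blocking_group"]))
```

The two trees test their features in different orders yet agree on every assignment. This is the kind of equivalence an OWL reasoner can detect between class definitions, which is what makes the semantic comparison of datasets described above possible.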
> Another use of semantic technologies to represent decision trees is
> the work of Holford et al. [14], where the Semantic Web Rule
> Language (SWRL) [15] was used to create decision trees that classify
> human pseudogenes. Specifically, this research focused on the
> relationship between pseudogenes and segmental duplications (SDs), or DNA
> patterns that map to multiple locations on a genome. By representing
> these trees in a Semantic Web language, researchers in the biomedical
> field can share and extend the derived ontologies. In this work, the
> Sequence Ontology (SO) [16], which provides terms and relationships
> for sequence annotations, was extended with data. The data was
> provided by the http://pseudogene.org website. An SWRL reasoner was
> used to ensure consistency and satisfiability of the derived rules.
> It was also possible to query these rules for various types of
> pseudogene classifications. These queries took advantage of both the
> structures in SO and the incorporated data.
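SWRL rules are Horn-like: if every atom in the body holds, the head is asserted. The sketch below forward-chains such a rule over triples until no new facts appear. The rule, the class names (Pseudogene, SegmentalDuplication, SDPseudogene) and the overlaps property are hypothetical illustrations, not the actual rules or Sequence Ontology terms used by Holford et al., and a real SWRL reasoner also applies OWL semantics that this sketch ignores.

```python
def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` equals `fact`, or return None.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    binding = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if binding.setdefault(p, f) != f:
                return None  # variable already bound to a different value
        elif p != f:
            return None
    return binding


def solutions(body, facts, binding=None):
    """Yield every variable binding that satisfies all atoms in `body`."""
    binding = binding or {}
    if not body:
        yield binding
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        b = match(first, fact, binding)
        if b is not None:
            yield from solutions(rest, facts, b)


def forward_chain(rules, facts):
    """Apply Horn rules until no new facts are derived (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for b in solutions(body, facts):
                derived = tuple(b.get(t, t) for t in head)
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new


# Hypothetical rule: Pseudogene(?p) ∧ overlaps(?p, ?s) ∧ SegmentalDuplication(?s)
#                    → SDPseudogene(?p)
RULES = [
    ([("?p", "type", "Pseudogene"),
      ("?p", "overlaps", "?s"),
      ("?s", "type", "SegmentalDuplication")],
     ("?p", "type", "SDPseudogene")),
]
FACTS = {
    ("pg1", "type", "Pseudogene"),
    ("pg2", "type", "Pseudogene"),
    ("sd7", "type", "SegmentalDuplication"),
    ("pg1", "overlaps", "sd7"),
}

result = forward_chain(RULES, FACTS)
print(sorted(s for s, p, o in result if p == "type" and o == "SDPseudogene"))
```

Querying the closed fact set for SDPseudogene instances plays the role of the classification queries described above.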
> As Fanizzi et al. [17] demonstrate, it is desirable not only to learn
> ontologies, but also to learn from them. In their work, an
> existing OWL ontology is used to generate decision trees called
> terminological decision trees which are represented as OWL-DL classes.
> Like their traditional data-based decision tree counterparts,
> terminological decision trees are based on frequent patterns in the
> ontology’s defined OWL roles. Unlike traditional decision trees that
> use conditions such as wa:Direction = ‘North’ or wa:Temp = 30,
> terminological decision trees use rules, called concept descriptions,
> built from the OWL roles defined in the ontology, such as
> ∃hasPart.Worn and ∃hasPart.(¬Replaceable). Such concept descriptions
> take the form:
> SendBack ≡ ∃hasPart.(Worn ⊓ ¬Replaceable).
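The SendBack concept description can be evaluated over a toy set of assertions, and a two-level terminological decision tree can use such descriptions as its node tests. This sketch reads negation under a closed-world assumption, whereas an OWL reasoner works under open-world semantics and would need ¬Replaceable to be entailed. The items, parts, and the Keep and Repair labels are hypothetical, and the tree is written by hand rather than induced as in [17].

```python
# Toy assertions: the parts of each item, and the classes each part belongs to.
HAS_PART = {
    "pump1": {"seal1", "bolt1"},
    "pump2": {"seal2"},
    "pump3": {"bolt2"},
}
TYPES = {
    "seal1": {"Worn"},
    "bolt1": set(),
    "seal2": {"Worn", "Replaceable"},
    "bolt2": set(),
}


def exists_has_part(item, concept):
    """∃hasPart.C: the item has at least one part that is an instance of C."""
    return any(concept(part) for part in HAS_PART.get(item, ()))


def worn(part):
    return "Worn" in TYPES[part]


def worn_and_not_replaceable(part):
    # Worn ⊓ ¬Replaceable, with ¬ read under a closed-world assumption.
    return "Worn" in TYPES[part] and "Replaceable" not in TYPES[part]


def send_back(item):
    """SendBack ≡ ∃hasPart.(Worn ⊓ ¬Replaceable)."""
    return exists_has_part(item, worn_and_not_replaceable)


def tdt_classify(item):
    """A two-level terminological decision tree whose node tests are concepts."""
    if not exists_has_part(item, worn):
        return "Keep"
    return "SendBack" if send_back(item) else "Repair"


for item in sorted(HAS_PART):
    print(item, tdt_classify(item))
```

Each node test is itself a readable concept built from the ontology's roles, rather than an opaque attribute-value condition.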
> An important distinction from traditional decision tree nodes is that
> a concept description is made up of the actual roles defined in the
> ontology. As a result, it is not as cryptic and specific as the type
> of decision tree traditionally created for data classification and
> prediction. This type of tree node is better at reflecting manually
> created and human-readable roles [13].
> *References*
> [1] B. Gajderowicz and A. Sadeghian, "Ontology granulation through
> inductive decision trees," in URSW, ser. CEUR Workshop Proceedings, F.
> Bobillo, P. C. G. da Costa, C. d’Amato, N. Fanizzi, K. B. Laskey, K.
> J. Laskey, T. Lukasiewicz, T. Martin, M. Nickles, M. Pool, and P.
> Smrž, Eds., vol. 527. CEUR-WS.org, 2009, pp. 39–50.
> [2] B. Gajderowicz, "Using decision trees for inductively driven
> semantic integration and ontology matching," Master’s thesis, Ryerson
> University, 250 Victoria Street, Toronto, Ontario,
> Canada, 2011.
> [3] A. Doan, J. Madhavan, P. Domingos, and A. Halevy, "Learning to Map
> Between Ontologies on the Semantic Web," in Proc 11th International
> Conference on World Wide Web (WWW'02), ACM, New York, NY, 2002.
> [4] M. Stocker, M. Rönkkö, F. Villa, and M. Kolehmainen, "The
> relevance of measurement data in environmental ontology learning," in
> Environmental Software Systems. Frameworks of eEnvironment, ser. IFIP
> Advances in Information and Communication Technology, J. Hřebíček,
> G. Schimak, and R. Denzer, Eds. Springer Boston, 2011, vol. 359, pp.
> 445–453.
> [5] C. d’Amato, N. Fanizzi, and F. Esposito, "Inductive learning for
> the semantic web: What does it buy?" Semantic Web, vol. 1, no. 1, pp.
> 53–59, 2010.
> [6] L. Zhou, "Ontology learning: state of the art and open issues,"
> Information Technology and Management, vol. 8, no. 3, pp. 241–252,
> Sep. 2007.
> [7] I. Witten and E. Frank, Data Mining: Practical machine learning
> tools and techniques, 2nd ed. San Francisco: Morgan Kaufmann
> Publishers, 2005.
> [8] C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee, "Linked data on
> the web (ldow2008)," in Proceeding of the 17th international
> conference on World Wide Web, ser. WWW ’08. New York, NY, USA: ACM,
> 2008, pp. 1265–1266.
> [9] J. Lehmann and P. Hitzler, "Concept learning in description logics
> using refinement operators," Mach. Learn., vol. 78, pp. 203–250,
> January 2010.
> [10] B. Brodaric and M. Gahegan, "Experiments to examine the situated
> nature of geoscientific concepts," Spatial Cognition and Computation:
> An Interdisciplinary Journal, vol. 7, no. 1, pp. 61–95, 2007.
> [11] E. Prud’hommeaux and A. Seaborne, "SPARQL Query Language for
> RDF," W3C Recommendation, January 2008. [Online]. Available:
> http://www.w3.org/TR/rdf-sparql-query/
> [12] F. Belleau, M.-A. Nolin, N. Tourigny, P. Rigault, and J.
> Morissette, "Bio2RDF: towards a mashup to build bioinformatics
> knowledge systems," Journal of Biomedical Informatics, vol. 41, no. 5,
> pp. 706–716, 2008.
> [13] D. K. Leonid L. Chepelev and M. Dumontier, "Chemical hazard
> estimation and method comparison with OWL-encoded toxicity decision
> trees," in OWLED 2011 OWL: Experiences and Directions, June 2011.
> [14] M. E. Holford, E. Khurana, K.-H. Cheung, and M. Gerstein, "Using
> semantic web rules to reason on an ontology of pseudogenes,"
> Bioinformatics, vol. 26, pp. i71–i78, June 2010. [Online]. Available:
> http://dx.doi.org/10.1093/bioinformatics/btq173
> [15] I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B.
> Grosof, and M. Dean, "SWRL: A Semantic Web Rule Language Combining OWL
> and RuleML," W3C Member Submission, World Wide Web Consortium, Tech.
> Rep., May 2004.
> [16] K. Eilbeck and S. E. Lewis, "Sequence ontology annotation guide:
> Conference papers," Comp. Funct. Genomics, vol. 5, pp. 642–647,
> December 2004.
> [17] N. Fanizzi, C. d’Amato, and F. Esposito, "Towards the induction
> of terminological decision trees," in Proceedings of the 2010 ACM
> Symposium on Applied Computing, ser. SAC ’10. New York, NY, USA: ACM,
> 2010, pp. 1423–1427.
> Thanks
> --
> Bart Gajderowicz, MSc.
> Ryerson University
> http://www.scs.ryerson.ca/~bgajdero

