
[ontolog-forum] April 20 session on tagging ontolog content

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: Nicolas F Rouquette <nicolas.rouquette@xxxxxxxxxxxx>
Date: Thu, 06 Apr 2006 12:10:11 -0700
Message-id: <44356793.9050505@xxxxxxxxxxxx>
Denise,    (01)

Today, you described an example of a functional requirement you intend 
to discuss on April 20,
which I summarize below:    (02)

What do users want to do in terms of tagging or using tagged Ontolog 
content?    (03)

In my view, there is an implied architecture/process context behind this 
requirement that might help put the
discussion in a broader perspective. There are 5 parts to this context.    (04)

1) a context of workflow processes/activities    (05)

i.e., the user (an actor) interacts (activities) with ontolog (a service)    (06)

Distinguishing the current context (untagged ontolog) from the desired 
context (tagged ontolog)
may help explain the purpose of tagging in the context of a process 
optimization/improvement problem.    (07)

2) process improvement/optimization    (08)

For example, if the rationale for tagging ontolog is to improve 
searching for meeting calls/emails pertaining
to a particular topic, then we're talking about query optimization. To 
understand what kind of optimization
is involved, we need to describe this type of query in the broader 
context of the other activities that use/contribute
to the ontolog forum (i.e., a database).    (09)

It would help to describe what this optimization problem is in the 
context of an explicitly defined workflow/process
and of explicitly defined/analyzable data (I believe you said that a 
requirement for tagging is having an ontology of
the information being tagged).    (010)

Describing "tagging ontolog" as a process optimization problem requires 
explicitly defined & analyzable ontologies.    (011)

3) knowledge base / semantic processing    (012)

At minimum, we need two kinds of ontologies:    (013)

- workflow/process (e.g., in the style of PSL/FLOWS) to describe 
"query" as a process activity
- metrics to talk about the performance of activities and about the 
utility/value of data (e.g., query results, hit/miss, ...)    (014)

Collecting metrics about process activities and about the data these 
activities operate on requires an instrumented workflow architecture
that enables the systematic analysis/mining of data about the occurrence 
of activities as well as about the input/output data of these activities. 
Furthermore, there's really not much of a difference between 
domain-specific data (e.g., ontolog wiki pages) and data from 
instrumented workflows/processes:
it's just different kinds of data. This makes UIMA not just a technology 
solution but an architecture philosophy to approach this problem.    (015)
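
As a rough illustration of what "instrumented" could mean here, the sketch 
below records each occurrence of an activity together with its input/output 
data, then mines the log for simple metrics. Everything in it is hypothetical: 
the names ActivityEvent, WorkflowLog and hit_rate, the actors, and the sample 
data are assumptions made for illustration, and none of it is UIMA or PSL code.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ActivityEvent:
    """One occurrence of a workflow activity (hypothetical record layout)."""
    actor: str
    activity: str
    inputs: dict
    outputs: dict


@dataclass
class WorkflowLog:
    """In-memory log of activity occurrences and their input/output data."""
    events: list = field(default_factory=list)

    def record(self, actor, activity, inputs, outputs):
        event = ActivityEvent(actor, activity, inputs, outputs)
        self.events.append(event)
        return event

    def occurrences(self):
        # How often each kind of activity occurred.
        return Counter(e.activity for e in self.events)

    def hit_rate(self, activity="query"):
        # Fraction of runs of `activity` that returned at least one result.
        runs = [e for e in self.events if e.activity == activity]
        if not runs:
            return 0.0
        hits = sum(1 for e in runs if e.outputs.get("results"))
        return hits / len(runs)


log = WorkflowLog()
log.record("user_a", "query", {"terms": ["tagging"]}, {"results": ["page-1"]})
log.record("user_b", "query", {"terms": ["uima"]}, {"results": []})
log.record("user_b", "post", {"subject": "tagging"}, {"message_id": "m-1"})

print(log.occurrences())
print(log.hit_rate("query"))
```

The log entries are ordinary data, so the same analysis tools can be applied 
to them as to the wiki pages or emails themselves.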

4) data analysis    (016)

Search engines like Google rely on data mining as an analytical means 
to extract knowledge (e.g., page ranking).
In turn, this knowledge (i.e., data) drives the optimization of core 
activities of the workflow (i.e., search queries).
From an artificial intelligence / machine learning perspective, page 
ranking optimization is a form of unsupervised learning
where there is no a-priori learning objective except raw performance 
improvement. With tagging, we can put a supervised
learning perspective on the optimization problem, i.e., minimizing false 
positive/negative hits of tag-based searching.
For example, it makes sense to compare different tag ontologies, tagging 
algorithms, tag-based search algorithms, etc.
We can also use semantic descriptions of the workflow activities to 
evaluate optimal points in the workflow for injecting
tags, tag validation, etc. There are lots of technologies that can 
help to do this (data mining, support vector machines, etc.)
but fundamentally, it is worth optimizing the activities we trust and 
the information we consider valuable. This brings me
to the last part.    (017)
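
To make the false positive/negative objective concrete, here is a minimal 
sketch that scores one tag-based search against a hand-labeled set of 
relevant messages. The function names (search_by_tag, evaluate), the message 
identifiers and the labels are all made up for illustration; a real 
comparison of tag ontologies or tagging algorithms would need actual labeled 
data from the forum.

```python
def search_by_tag(tag_index, tag):
    """Return the set of items carrying `tag` (hypothetical tag index)."""
    return set(tag_index.get(tag, ()))


def evaluate(retrieved, relevant):
    """Count false positives/negatives and derive precision and recall."""
    true_pos = len(retrieved & relevant)
    false_pos = len(retrieved - relevant)   # returned but not relevant
    false_neg = len(relevant - retrieved)   # relevant but not returned
    precision = true_pos / (true_pos + false_pos) if retrieved else 0.0
    recall = true_pos / (true_pos + false_neg) if relevant else 0.0
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Made-up data: which messages carry the tag, and which a human judged relevant.
tag_index = {"tagging": ["msg-1", "msg-2", "msg-4"]}
relevant = {"msg-1", "msg-2", "msg-3"}

report = evaluate(search_by_tag(tag_index, "tagging"), relevant)
print(report)
```

Running the same evaluation over several tag ontologies or tagging 
algorithms gives a like-for-like basis for comparing them.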

5) trust/secure information & processing    (018)

One way to describe this part is that untagged ontolog content is 
"unsecure/untrusted" information.
Tags also have an implicit trust value. We'd trust the tags from an 
expert but might be skeptical w.r.t. the
search results based on tags that a neophyte created. In social 
networks, the friend-of-a-friend (FOAF) ontology
is a kind of limited trust/secure information ontology. Using a 
versatile architecture for data analysis/mining/pattern matching, etc.,
and guidance from problem-solving/learning algorithms (e.g., SOAR), we 
can use trust/security as a metric
for the quality/utility of tags and drive the overall optimization 
problem as an issue of maximizing trust/security in live, evolving 
large-scale semantic databases.    (019)
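
One simple way to picture trust as a metric for tag quality is to weight each 
tag by the trust placed in whoever applied it, and to rank search results by 
the accumulated weight. The sketch below assumes exactly that; the trust 
values, the default for unknown taggers, and the function name 
trust_weighted_search are illustrative assumptions, not part of FOAF or SOAR.

```python
from collections import defaultdict


def trust_weighted_search(taggings, trust, tag, default_trust=0.1):
    """Rank items tagged with `tag` by the summed trust of their taggers.

    `taggings` is a list of (item, tag, tagger) triples; `trust` maps a
    tagger to a score in [0, 1]. Both are hypothetical inputs.
    """
    scores = defaultdict(float)
    for item, item_tag, tagger in taggings:
        if item_tag == tag:
            scores[item] += trust.get(tagger, default_trust)
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


taggings = [
    ("msg-1", "ontology", "expert"),
    ("msg-2", "ontology", "neophyte"),
    ("msg-2", "ontology", "visitor"),   # unknown tagger, gets default trust
    ("msg-3", "workflow", "expert"),
]
trust = {"expert": 0.9, "neophyte": 0.2}

ranked = trust_weighted_search(taggings, trust, "ontology")
print(ranked)
```

Here a single expert tag outranks two tags from less trusted taggers, which 
is the behavior described above; a learned trust model could replace the 
fixed scores.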

-- Nicolas.    (020)

Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx    (021)
