Denise, (01)
Today, you described an example of a functional requirement you intend
to discuss on April 20,
which I summarize below: (02)
What do users want to do in terms of tagging or using tagged Ontolog
content? (03)
In my view, there is an implied architecture/process context behind this
requirement that might help put the
discussion in a broader perspective. There are 5 parts to this context. (04)
1) a context of workflow processes/activities (05)
i.e., the user (an actor) interacts (activities) with ontolog (a service) (06)
Distinguishing the current context (untagged ontolog) from the desired
context (tagged ontolog)
may help explain the purpose of tagging in the context of a process
optimization/improvement problem. (07)
2) process improvement/optimization (08)
For example, if the rationale for tagging ontolog is to improve
searches of meeting calls/emails pertaining
to a particular topic, then we're talking about query optimization. To
understand what kind of optimization
is involved, we need to describe this type of query in the broader
context of the other activities that use/contribute
to the ontolog forum (i.e., a database). (09)
It would help to describe what this optimization problem is in the
context of an explicitly defined workflow/process
and of explicitly defined/analyzable data (I believe you said that a
requirement for tagging is having an ontology of
the information being tagged). (010)
Describing "tagging ontolog" as a process optimization problem requires
explicitly defined & analyzable ontologies. (011)
3) knowledge base / semantic processing (012)
At minimum, we need two kinds of ontologies: (013)
- workflow/process (e.g., in the style of PSL/FLOWS) to describe
"query" as a process activity
- metrics to talk about the performance of activities and about the
utility/value of data (e.g., query results, hit/miss, ...) (014)
Collecting metrics about process activities and about the data these
activities operate on requires an instrumented workflow architecture
that enables the systematic analysis/mining of data about the occurrence
of activities as well as about the input/output data of these activity
occurrences.
Furthermore, there's really not much of a difference between
domain-specific data (e.g., ontolog wiki pages) and data from
instrumented workflows/processes:
it's just different kinds of data. This makes UIMA not just a technology
solution but an architecture philosophy to approach this problem. (015)
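
As a rough illustration of what "instrumented workflow" could mean here, the sketch below wraps a toy query activity so that every occurrence is recorded together with its input, output and duration. All names (ActivityLog, ActivityOccurrence, PAGES, search) and the page contents are hypothetical and made up for this example; it does not use UIMA or PSL/FLOWS, it only shows that activity occurrences end up as ordinary data that can be mined like wiki pages.

```python
import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ActivityOccurrence:
    """One recorded occurrence of a workflow activity (hypothetical schema)."""
    activity: str
    inputs: tuple
    output: Any
    seconds: float


@dataclass
class ActivityLog:
    """Collects activity occurrences so they can be analyzed later."""
    occurrences: List[ActivityOccurrence] = field(default_factory=list)

    def instrument(self, activity: str) -> Callable:
        """Return a decorator that logs each call as an occurrence of `activity`."""
        def decorator(func: Callable) -> Callable:
            @functools.wraps(func)
            def wrapper(*args):
                start = time.perf_counter()
                result = func(*args)
                elapsed = time.perf_counter() - start
                self.occurrences.append(
                    ActivityOccurrence(activity, args, result, elapsed)
                )
                return result
            return wrapper
        return decorator


log = ActivityLog()

# Toy stand-in for forum content (assumed data): page name -> text.
PAGES = {
    "page_a": "tagging requirements discussion",
    "page_b": "upper ontology summit notes",
}


@log.instrument("query")
def search(term: str) -> list:
    """Return the names of pages whose text contains the term."""
    return sorted(name for name, text in PAGES.items() if term in text)


search("tagging")
search("ontology")
search("uima")

# The log is now just another data set: for example, count queries with no hits.
misses = [o for o in log.occurrences if not o.output]
print(len(log.occurrences), "queries,", len(misses), "with no hits")
```

The same log could feed the hit/miss metrics mentioned above; a real deployment would need a persistent store and a shared vocabulary for activity names.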
4) data analysis (016)
Search engines like Google dwell on data mining as an analytical means
to extract knowledge (e.g., page ranking).
In turn, this knowledge (i.e., data) drives the optimization of core
activities of the workflow (i.e., search queries).
From an artificial intelligence / machine learning perspective, page
ranking optimization is a form of unsupervised learning
where there is no a-priori learning objective except raw performance
improvement. With tagging, we can put a supervised
learning perspective on the optimization problem, i.e., minimizing false
positive/negative hits of tag-based searching.
For example, it makes sense to compare different tag ontologies, tagging
algorithms, tag-based search algorithms, etc...
We can also use semantic descriptions of the workflow activities to
evaluate optimal points in the workflow for injecting
tags, tag validation, etc... There are lots of technologies that can
help to do this (data mining, support vector machines, etc...)
but fundamentally, it is worth optimizing the activities we trust and
the information we consider valuable. This brings me
to the last part. (017)
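
To make the "false positive/negative hits" objective concrete, here is a minimal sketch that scores a tag-based search against a hand-labeled set of relevant messages and compares two tagging schemes. The message ids, tags, relevance labels and function names are all invented for illustration; precision and recall are used as one common way to summarize the two error types, not as the only possible metric.

```python
from typing import Dict, Set


def tag_search(tags_by_doc: Dict[str, Set[str]], tag: str) -> Set[str]:
    """Return the documents that carry the given tag."""
    return {doc for doc, tags in tags_by_doc.items() if tag in tags}


def evaluate(retrieved: Set[str], relevant: Set[str]) -> dict:
    """Count hits and both kinds of misses, then derive precision and recall."""
    tp = len(retrieved & relevant)   # correct hits
    fp = len(retrieved - relevant)   # false positives: returned but not relevant
    fn = len(relevant - retrieved)   # false negatives: relevant but not returned
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Assumed ground truth: messages a human judged relevant to "tagging".
RELEVANT = {"msg1", "msg2", "msg4"}

# Two hypothetical tagging schemes applied to the same four messages.
SCHEME_A = {
    "msg1": {"tagging"},
    "msg2": {"tagging"},
    "msg3": {"tagging"},    # mis-tagged: produces a false positive
    "msg4": {"workflow"},   # tag missing: produces a false negative
}
SCHEME_B = {
    "msg1": {"tagging"},
    "msg2": {"tagging"},
    "msg3": {"trust"},
    "msg4": {"tagging"},
}

result_a = evaluate(tag_search(SCHEME_A, "tagging"), RELEVANT)
result_b = evaluate(tag_search(SCHEME_B, "tagging"), RELEVANT)
print("scheme A:", result_a)
print("scheme B:", result_b)
```

The same comparison applies unchanged to different tag ontologies or search algorithms, as long as a labeled set of relevant results is available.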
5) trust/secure information & processing (018)
One way to describe this part is that untagged ontolog content is
"unsecure/untrusted" information.
Tags also have an implicit trust value. We'd trust the tags from an
expert but might be skeptical w.r.t. the
search results based on tags that a neophyte created. In social
networks, the friend-of-a-friend (FOAF) ontology
is a kind of limited trust/secure information ontology. Using a
versatile architecture for data analysis/mining/pattern matching/etc...
and guidance from problem-solving/learning algorithms (e.g., SOAR), we
can use trust/security as a metric
for the quality/utility of tags and drive the overall optimization
problem as an issue of maximizing trust/security in live, evolving
large-scale semantic databases. (019)
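
One simple way to picture trust as a metric on tags is to weight each tag assertion by the trust placed in its author and rank search results accordingly. The sketch below is only an assumption-laden toy: the trust values, the tagger names, the tag assertions, and the "noisy-or" rule for combining several taggers are all choices made for this example, not something taken from FOAF or SOAR.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed trust levels per tagger, in [0, 1].
TRUST: Dict[str, float] = {"expert": 0.9, "neophyte": 0.3}

# Hypothetical tag assertions: (document, tag, author).
ASSERTIONS: List[Tuple[str, str, str]] = [
    ("msg1", "tagging", "expert"),
    ("msg2", "tagging", "neophyte"),
    ("msg3", "tagging", "neophyte"),
    ("msg3", "tagging", "expert"),
    ("msg4", "trust", "expert"),
]


def ranked_search(tag: str,
                  assertions: List[Tuple[str, str, str]],
                  trust: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank documents carrying `tag` by the combined trust of their taggers.

    Trust values are combined with a noisy-or: the tag is doubted only if
    every author who applied it is doubted. Unknown authors get trust 0.
    """
    doubt = defaultdict(lambda: 1.0)
    for doc, assigned_tag, author in assertions:
        if assigned_tag == tag:
            doubt[doc] *= 1.0 - trust.get(author, 0.0)
    scores = {doc: 1.0 - d for doc, d in doubt.items()}
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = ranked_search("tagging", ASSERTIONS, TRUST)
for doc, score in ranking:
    print(f"{doc}: {score:.2f}")
```

A message tagged by both an expert and a neophyte ranks above one tagged by the expert alone, which is one possible reading of "maximizing trust"; other combination rules would give different orderings.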
-- Nicolas. (020)
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Subscribe/Unsubscribe/Config:
http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx (021)