Marcos,
(comments below)
On 5/21/2013 12:31 AM, Osorno, Marcos wrote:
John,
Thank you for the insights.
Great question and one
that has had my interest for quite a while.
Testing usually centers on two critical elements: 1) what
are the requirements that must be met by the system, and
2) is the testing to the spec done by a) inspection,
b) demonstration, or c) test?
Those certainly seem like good categories. It seems like I
encounter two general categories of systems: (1) generalized
knowledge systems (like Wikipedia) and (2) domain specific
applications of ontologies/schemas within other applications
(like Yelp or Google Maps). In the first case, I'd be curious
about general fundamentals for evaluating the requirements and
specifications of a generalized KR system. I think the second
case is more difficult because it requires analysis of the role
of the ontology within the context of the domain.
It sounds like the systems you are dealing with are
deterministic, that is, as long as you specify sufficient
requirements you get only one answer to a question. In a GPS system
you can get an answer you didn't expect if you haven't specified
some constraining metric such as "no highways". If that is the goal,
then there needs to be a concerted effort to identify all the
metrics that constrain the possible types of responses. In this
case, the answer to your question can be restated as a statistics
problem: "How many random tests do I need to perform for a specified
confidence factor?" My view is that as systems become more and more
complex, we need to match the complexity of the testing to one that
"makes sense" for the volume of data being processed. With the most
complex data set I've worked on, which was on the order of 150
million points of dirty data, a psychometrician was involved to make
estimates and craft the statistics processing. (And with humans, we
use a therapist who is familiar with abnormal behavior, and they
are still fighting over the diagnostic process.)
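As a rough illustration (assuming independent, randomly drawn test
cases that all pass, which is of course an idealization), the
arithmetic looks something like this in Python:

    import math

    def tests_needed(max_failure_rate, confidence):
        # Number of failure-free random tests needed to claim, at the given
        # confidence, that the true failure rate is below max_failure_rate.
        # Follows from requiring (1 - max_failure_rate)**n <= 1 - confidence.
        return math.ceil(math.log(1.0 - confidence) /
                         math.log(1.0 - max_failure_rate))

    # To claim a defect rate below 1% with 95% confidence:
    print(tests_needed(0.01, 0.95))   # 299 failure-free tests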
The questions you ask
are answered differently depending on these two primary
issues. For example, your question about KR is best
answered by treating it as a derived requirement. The
design approach addresses the problem and may select one
of a number of different types of KR. (A solid reason I
prefer to talk about data structures and methods, rather
than implementation languages.)
Lately, I've been thinking about it in similar ways but more
closely related to engineering cost: availability of libraries,
complexity of supporting code, complexity of join operations,
availability of support IT staff/developers, complexity of
backend support systems, etc. However, that still doesn't really
help me nail down how well the model performs as a possible
representation of the world for the system nor does it help make
any sort of case for using anything more esoteric like OWL or CL
in lieu of simple one-off JSON/XML or DB ER representations.
The world of no-SQL makes delaying ontological decision making
even easier since I'm burdened less by the persistence layer
(though I still have to map the business logic). I'm drawn to
the concept of A/B and usability testing for models to see if
different models help users to better answer questions about the
domain or derive deeper insights. We often tweak the UI, but
how do we capture similar KR feedback and tweak the model? How
do we test various alternative representations? Also, I would
argue that often the representation isn't the derived
requirement, but rather that the representation is fairly
central to many systems while the UI and other implementation
details are actually the derived requirements. I believe that
many of our newer web-based systems are effectively knowledge
systems helping represent the world for us to aid in our daily
lives and decision making more than they are brick-and-mortar,
meatspace applications. Yet, while we talk about usability quite
a bit, we don't really focus on the evaluation of
representativeness, insightfulness, etc.
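If model variants were put in front of users the way UI variants
are, the comparison could be run as an ordinary two-sample test.
A toy sketch of what I have in mind (the counts and the choice of
Fisher's exact test are just for illustration):

    from scipy.stats import fisher_exact

    # Hypothetical task-success counts for users answering the same domain
    # questions against two alternative representations (numbers invented).
    model_a = [42, 18]   # successes, failures with, say, an OWL-backed model
    model_b = [31, 29]   # successes, failures with a one-off JSON schema

    # Fisher's exact test asks whether the two success rates plausibly
    # differ only by chance; a small p-value suggests the model matters.
    odds_ratio, p_value = fisher_exact([model_a, model_b])
    print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")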
Welcome to the world of assessment. There are a few general-purpose
tools, but the questions being asked often exceed the capability of
the tools. And management doesn't always appreciate the amount of
work needed to get the answers they need. The developer (and test
designer) must be a fairly good toolsmith, able to find the
testing solution that is tailored to the scope of the project. That
is why languages such as R and Python have come about that offer
quick, incremental testing. The goal for these types of projects is
not to find all the errors in the system, but to find the level of
errors that is appropriate to the project. Commercial electronics
test systems are typically designed to find 97-98% of
manufacturing bugs. Those last few percent are found by the consumer
at a lower cost. Still, occasionally the consumer uses the device
in an unexpected way that reveals unanticipated bugs. The way
to address this is through α (alpha) and β (beta)
testing using groups of increasing size.
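One way to put a number on "the level of errors appropriate to the
project" is to bound the residual defect rate from the test results
themselves; a rough sketch (the Clopper-Pearson bound here is just
one reasonable choice, again assuming random, independent tests):

    from scipy.stats import beta

    def residual_defect_bound(failures, tests, confidence=0.95):
        # Clopper-Pearson upper bound on the true defect rate, given the
        # number of failing tests observed out of `tests` random runs.
        if failures >= tests:
            return 1.0
        return beta.ppf(confidence, failures + 1, tests - failures)

    # After 500 random tests with 3 failures, the defect rate is below
    # this bound with 95% confidence:
    print(f"{residual_defect_bound(3, 500):.3%}")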
The problem statement
starts the design process and shapes the subsequent stages
of the process. If you have an idea of the types of
problems you have in mind, it will aid the discussion.
Right now I'm dealing
with projects where the model is the product, which is what's
making things complicated from an evaluation perspective.
The use case includes a variety of new schemas and standards
for sharing computer security information. The schemas are
well thought out and comprehensive. But since a generalized
model is the product, I'm not sure I have the tools/approach
to evaluate the model without re-inventing it as I go. This
means that I have to sort out theoretical use cases and
requirements for the evaluation of the schema which is fine.
The trickier part is populating the model with anything
resembling realistic data or a real use case. This is
troubling because at that point am I evaluating the model as
a possible representation of the world or am I evaluating
how difficult the model is to populate?
This is a common problem with pushing the edges of new technology.
If you are working beyond the proximal zone, the tools you need
are never there. If you can identify other groups (physics labs
or similar) you might be able to set up a consortium that can
contribute resources for common well-defined problems. That is one
of the benefits of speaking at conferences to describe the work you
need to do. You can also lobby in forums for assistance. The free
software community likes to see a core application or demonstration
before others will consider it worthwhile to cooperate.
-John Bottoms
FirstStar Systems
Concord, MA
Cheers,
Marcos
=-=-=-=
Marcos Osorno
JHU/APL