
Re: [ontolog-forum] Unit testing and usability validation of schemas and

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: John Bottoms <john@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 21 May 2013 10:20:39 -0400
Message-id: <519B82B7.20601@xxxxxxxxxxxxxxxxxxxx>
Marcos,
(comments below)

On 5/21/2013 12:31 AM, Osorno, Marcos wrote:
John,

Thank you for the insights.

Great question and one that has had my interest for quite a while.
Testing usually centers on two critical elements: (1) what are the requirements that must be met by the system, and (2) is testing to the spec done by (a) inspection, (b) demonstration, or (c) test?

Those certainly seem like good categories. It seems like I encounter two general categories of systems: (1) generalized knowledge systems (like Wikipedia) and (2) domain-specific applications of ontologies/schemas within other applications (like Yelp or Google Maps). In the first case, I'd be curious about general fundamentals for evaluating the requirements and specifications of a generalized KR system. I think the second case is more difficult because it requires analysis of the role of the ontology within the context of the domain.
It sounds like the systems you are dealing with are deterministic, that is, you specify sufficient requirements to get only one answer to a question. In a GPS system you can get an answer you didn't expect if you haven't specified some constraining metric such as "no highways". If that is the goal, then there needs to be a concerted effort to identify all the metrics that constrain the possible types of responses. In this case, your question can be restated as a statistics problem: "How many random tests do I need to perform for a specified confidence factor?" My view is that as systems become more and more complex, we need to match the complexity of the testing to one that "makes sense" for the volume of data being processed. With the most complex data set I've worked on, which is on the order of 150 million points of dirty data, a psychometrician was involved to make estimates and craft the statistics processing. (And with humans, we use a therapist who is familiar with abnormal behavior, and they are still fighting over the diagnostic process.)
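To make that statistics question concrete: if each random test independently has a chance of tripping a given class of defect, the number of consecutive passing tests needed for a target confidence is a one-line calculation. A minimal sketch in Python (the failure rate and confidence level here are illustrative assumptions, not numbers from any particular project):

    import math

    def tests_needed(max_failure_rate: float, confidence: float) -> int:
        # Zero-failure binomial test: number of consecutive passing random
        # tests needed to claim, at the given confidence, that the true
        # failure rate is at or below max_failure_rate.
        return math.ceil(math.log(1.0 - confidence)
                         / math.log(1.0 - max_failure_rate))

    # E.g., to be 95% confident the failure rate is under 1 in 1,000:
    print(tests_needed(0.001, 0.95))   # 2995, close to the "rule of three" 3/p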

The questions you ask are answered differently depending on these two primary issues. For example, your question about KR is best answered by treating it as a derived requirement. The design approach addresses the problem and may select one of a number of different types of KR. (A solid reason I prefer to talk about data structures and methods rather than implementation languages.)

Lately, I've been thinking about it in similar ways but more closely related to engineering cost: availability of libraries, complexity of supporting code, complexity of join operations, availability of support IT staff/developers, complexity of backend support systems, etc. However, that still doesn't really help me nail down how well the model performs as a possible representation of the world for the system, nor does it help make any sort of case for using anything more esoteric like OWL or CL in lieu of simple one-off JSON/XML or DB ER representations. The world of NoSQL makes delaying ontological decision making even easier since I'm burdened less by the persistence layer (though I still have to map the business logic). I'm drawn to the concept of A/B and usability testing for models to see if different models help users better answer questions about the domain or derive deeper insights. We often tweak the UI, but how do we capture similar KR feedback and tweak the model? How do we test various alternative representations? Also, I would argue that often the representation isn't the derived requirement, but rather that the representation is fairly central to many systems while the UI and other implementation details are actually the derived requirements. I believe that many of our newer web-based systems are effectively knowledge systems, representing the world for us to aid our daily lives and decision making, more than they are brick-and-mortar, meatspace applications. Yet, while we talk about usability quite a bit, we don't really focus on the evaluation of representativeness, insightfulness, etc.
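One way to ground that A/B idea, sketched under assumptions rather than from any established procedure: give two user groups the same domain questions, one group backed by model A and one by model B, and compare task-success rates with an ordinary contingency test. The counts below are made-up placeholders for real usability-session data:

    from scipy.stats import chi2_contingency

    # [correct, missed] task counts per model; placeholder numbers.
    model_a = [42, 18]   # 42 of 60 questions answered correctly with model A
    model_b = [51, 9]    # 51 of 60 answered correctly with model B

    chi2, p_value, dof, expected = chi2_contingency([model_a, model_b])
    print(f"p = {p_value:.3f}")   # small p suggests the representations differ

The same harness could log time-to-answer or depth-of-insight ratings instead of raw correctness; the point is that KR alternatives can be put in front of users exactly the way UI alternatives are.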
Welcome to the world of assessment. There are a few general-purpose tools, but the questions being asked often exceed the capability of the tools. And management doesn't always appreciate the amount of work needed to get the answers they need. The developer (and test designer) must be a fairly good toolsmith, able to find the testing solution that is tailored to the scope of the project. That is why languages such as R and Python have come about that offer quick, incremental testing. The goal for these types of projects is not to find all the errors in the system, but to find the level of errors that is appropriate to the project. Commercial electronics test systems are typically designed to find 97-98% of manufacturing bugs. The last few percent are found by the consumer at a lower cost. Still, occasionally, a consumer uses the device in an unexpected way and reveals new bugs. The way to address this is through α and β testing using groups of increasing size.
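One small toolsmith trick for judging whether the remaining error level is "appropriate to the project" is capture-recapture (the Lincoln-Petersen estimator, borrowed here as an illustration, not something from the original discussion): have two testers or test suites hunt independently, and the overlap in what they find estimates the size of the total defect pool. A sketch with invented counts:

    def lincoln_petersen(found_by_a: int, found_by_b: int, overlap: int) -> float:
        # Estimate total defects from two independent hunts (capture-recapture).
        return found_by_a * found_by_b / overlap

    # Invented example: tester A finds 30 bugs, tester B finds 25, 15 in common.
    total = lincoln_petersen(30, 25, 15)   # ~50 defects estimated overall
    found = 30 + 25 - 15                   # 40 distinct defects actually found
    print(f"estimated coverage: {found / total:.0%}")   # ~80%, short of 97-98%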

The problem statement starts the design process and shapes the subsequent stages of the process. If you have an idea of the types of problems you have in mind, it will aid the discussion.

Right now I'm dealing with projects where the model is the product, which is what's making things complicated from an evaluation perspective. The use case includes a variety of new schemas and standards for sharing computer security information. The schemas are well thought out and comprehensive. But since a generalized model is the product, I'm not sure I have the tools/approach to evaluate the model without re-inventing it as I go. This means that I have to sort out theoretical use cases and requirements for the evaluation of the schema, which is fine. The trickier part is populating the model with anything resembling realistic data or a real use case. This is troubling because at that point, am I evaluating the model as a possible representation of the world, or am I evaluating how difficult the model is to populate?
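One pragmatic half-measure when the model itself is the product: generate synthetic records and measure how much of the schema a plausible producer can actually fill in. The sketch below assumes a JSON Schema rendering of the model and uses the jsonschema package; the schema fragment and record are invented stand-ins, not drawn from any real security-sharing standard:

    from jsonschema import Draft7Validator

    # Invented fragment standing in for a real sharing schema.
    schema = {
        "type": "object",
        "required": ["indicator", "first_seen", "confidence"],
        "properties": {
            "indicator":  {"type": "string"},
            "first_seen": {"type": "string", "format": "date-time"},
            "confidence": {"type": "integer", "minimum": 0, "maximum": 100},
        },
    }

    # A synthetic record such as a producer might realistically manage.
    record = {"indicator": "203.0.113.7", "confidence": 60}

    errors = list(Draft7Validator(schema).iter_errors(record))
    for err in errors:
        print(err.message)   # e.g. "'first_seen' is a required property"

    # Crude "population cost": required fields the producer could not supply.
    print(f"{len(errors)} gap(s) out of {len(schema['required'])} required fields")

Tracking that gap count across realistic producers at least separates "the model misrepresents the world" from "the model is too expensive to populate".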
This is a common problem with pushing the edges of new technology. If you are working beyond the proximal zone, the tools you need are never there. If you can identify other groups (physics labs or similar), you might be able to set up a consortium that contributes resources to common, well-defined problems. That is one of the benefits of speaking at conferences to describe the work you need to do. You can also lobby in forums for assistance. The free software community likes to see a core application or demonstration before it considers cooperation worthwhile.

-John Bottoms
 FirstStar Systems
 Concord, MA

Cheers,

Marcos
=-=-=-=
Marcos Osorno
JHU/APL


