Re: [ontolog-forum] unnatural language

To: "[ontolog-forum] " <ontolog-forum@xxxxxxxxxxxxxxxx>
From: David Eddy <deddy@xxxxxxxxxxxxx>
Date: Fri, 26 Mar 2010 11:27:19 -0400
Message-id: <96E1EB29-0CB0-488F-A08F-79F3AB94DF7A@xxxxxxxxxxxxx>
All -

Where (if?) in the spectrum of this discussion does "unnatural language" fit?

"Natural language" obviously has a 50+ year (just in my memory) run.  In the beginning the driving motivation was the Cold War & translating Russian documents into English.  Both the originating & ending documents were intended for human consumption.  A basic foundation, then & now, was fundamentally word counting... extracting statistics from a large corpus of text to define/infer meaning.

By "unnatural language" I mean the language used inside our computer systems, where there are major differences from normal, human-centric documents:

- most commercial software is never subjected to peer review (e.g. being read/reviewed/revised for readability).  If code works, the only person to look at it will be the programmer(s).  While it is widely known in the software profession that peer review of code is tremendously productive, it is not a widely practiced discipline.

- software is a "language" written to work, not to be read for understanding

- for a variety of often non-negotiable "reasons" the nouns in software can be short & cryptic.  To a software program, M0760 and MENSA-FL are equally meaningful.  A human may find such words/nouns/labels a tad on the opaque side.

- the corpus of software text for an application isn't going to be even remotely close to the size needed for the statistical significance that would make MT (machine translation) tools happy

- a "typical application" (in an organization) is written in 6+ software languages

- software, like financial "management", is a fashion business.  COBOL & FORTRAN are "out" (although still in widespread use) and Java and Python are "in."  In 20 years Java & Python will be out (although still in widespread use in legacy, i.e. working, systems) & something else will be in.

- the language drift issue... EAM --> ADP --> EDP --> MIS --> IS --> IT --> ?????.....  Point being... when I open a 25-year-old piece of code, how do I know what the cryptic words mean?

- the chasm between business intent & working code is broad.  By the time someone gets around to writing code, most of the context has been consciously stripped away

THAT's what I mean by "unnatural language."

Isn't a major objective here to make it easier/more effective to get data (not the same as information) out of systems & then combine said data in new & useful ways?

David Eddy


Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/  
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/  
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/ 
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx