Where (if anywhere) in the spectrum of this discussion does "unnatural language" fit?
"Natural language" obviously has had a 50+ year run (just in my memory). In the beginning the driving motivation was the Cold War & translating Russian documents into English. Both the source & target documents were intended for human consumption. The basic foundation, then & now, was word counting: extracting statistics from a large corpus of text to define or infer meaning.
By "unnatural language" I mean the language used inside our computer systems, where there are major differences from normal, human-centric documents:
- most commercial software is never subjected to peer review (i.e. being read/reviewed/revised for readability). If code works, the only person to look at it will be the programmer(s). While it is widely known in the software profession that peer review of code is tremendously productive, it is not a widely practiced discipline.
- software is a "language" written to work, not to be read for understanding
- for a variety of often non-negotiable "reasons" the nouns in software can be short & cryptic. To a software program, M0760 and MENSA-FL are equally meaningful. A human may find such words/nouns/labels a tad on the opaque side.
- the corpus of software text for a single application isn't going to be even remotely large enough to give MT (machine translation) tools the statistical significance they need
- a "typical application" (in an organization) is written in 6+ software languages
- software, like financial "management", is a fashion business. COBOL & FORTRAN are "out" (although still in widespread use) and Java & Python are "in." In 20 years Java & Python will be out (although still in widespread use in legacy, i.e. working, systems) & something else will be in.
- the language drift issue... EAM --> ADP --> EDP --> MIS --> IS --> IT --> ?????..... Point being... when I open a 25-year-old piece of code, how do I know what the cryptic words meant to the people who wrote them?
- the chasm between business intent & working code is wide. By the time someone gets around to writing code, most of the context has been consciously stripped away.
THAT's what I mean by "unnatural language."
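To make the word-counting point concrete, here is a rough sketch in Python. The code fragment it counts is invented for illustration (only M0760 and MENSA-FL come from the bullets above; WS-AMT and P-100 are made-up names), and the tokenizer is a crude assumption, not how any real MT tool works.

```python
import re
from collections import Counter

# Hypothetical legacy-style fragment, invented for illustration.
# M0760 and MENSA-FL are the labels mentioned in the text; the rest is made up.
SOURCE = """
MOVE M0760 TO WS-AMT.
ADD MENSA-FL TO WS-AMT.
IF WS-AMT > M0760 PERFORM P-100.
"""


def count_tokens(text):
    """Count identifier-like tokens: the crude 'word counting' step."""
    # A token starts with a letter and may contain letters, digits, hyphens.
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9-]*", text)
    return Counter(tokens)


counts = count_tokens(SOURCE)

# Print tokens from most to least frequent.
for token, n in counts.most_common():
    print(f"{token:10s} {n}")
```

In this toy fragment the cryptic labels each show up only once or twice, and nothing around them says what they stand for. Counting works on natural language because the same word turns up thousands of times in readable context; a single application's code rarely offers that.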
Isn't a major objective here to make it easier/more effective to get data (not the same as information) out of systems & then combine said data in new & useful ways?