At 11:46 PM +0700 2/14/08, paola.dimaio@xxxxxxxxx wrote:
>Thanks Matthew
>sounds like simple rules then? How does the system know if a word is a
>proper name, though? Does it use an ontology or taxonomy as a
>reference? I'll look into it. (01)
No, it uses simple criteria based essentially on
statistical information. Proper names tend to be
capitalized, may have initials (recognizable by
the dot convention) and are either drawn from a
known list (Smith, Warren, etc.) or are not
English words. It also uses clues from
surrounding words when possible, e.g. "The
commander of the fort, Captain Arlington,
said..." tells you that there is a person
(because only people can be commanders: a small
piece of ontological common sense) whose surname
is "Arlington" (because when (Western) people are
referred to by one name, it's their surname:
another piece), all because the construction
"The <a>, <b>, <verb>" is used when <b> is
coreferential with <The <a>>. These tools have
large corpora of rules like this (not 'simple'!)
which they use to decide which matches are
likeliest. They are very complex AI systems, the
state of the text-comprehender's art at present. (02)
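
To make the flavour of such rules concrete, here is a minimal sketch in
Python. It is an illustration only, not how any particular tool works:
the word lists, the equal weighting of clues and the names name_score,
find_person and APPOSITIVE are all assumptions made up for this example,
and a real system would have far larger resources and many more rules.

```python
import re

# Hypothetical, tiny stand-ins for the large lists a real system would use.
KNOWN_SURNAMES = {"Smith", "Warren"}
ENGLISH_WORDS = {"the", "commander", "of", "fort", "captain", "said"}
PERSON_ROLES = {"commander", "captain", "president"}


def name_score(token: str) -> int:
    """Count how many surface clues suggest that `token` is a proper name."""
    score = 0
    if token[:1].isupper():
        score += 1  # proper names tend to be capitalized
    if re.fullmatch(r"[A-Z]\.", token):
        score += 1  # an initial, recognizable by the dot convention
    if token in KNOWN_SURNAMES:
        score += 1  # drawn from a known list
    if token.lower() not in ENGLISH_WORDS:
        score += 1  # not an ordinary English word
    return score


# "The <a>, <b>, <verb>": <b> is taken to be coreferential with "the <a>".
# An optional "of the ..." phrase and an optional title before <b> are allowed.
APPOSITIVE = re.compile(
    r"\b[Tt]he (?P<a>[a-z]+)(?: of the [a-z]+)?, "
    r"(?:(?P<title>[A-Z][a-z]+) )?(?P<b>[A-Z][a-z]+), (?P<verb>[a-z]+)"
)


def find_person(sentence: str):
    """Return the role and surname if the appositive rule fires, else None."""
    m = APPOSITIVE.search(sentence)
    # Common-sense check: only people can hold roles such as "commander".
    if m and m.group("a") in PERSON_ROLES:
        return {"role": m.group("a"), "surname": m.group("b")}
    return None
```

Under these assumptions, find_person("The commander of the fort, Captain
Arlington, said nothing.") yields the role "commander" and the surname
"Arlington", while name_score gives "Smith" a higher score than "fort".
A real tool would weigh many competing rules of this kind statistically
rather than trusting any single match.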
Pat (03)
>
>p
>
>
>On 2/14/08, matthew.west@xxxxxxxxx <matthew.west@xxxxxxxxx> wrote:
>> Dear Paola,
>>
>> > > if you consider this sort of thing to be a tool for computer
>> > > assisted ontology development, then it can be very helpful,
>> > > particularly where we are talking about extracting brute facts.
>> > > However, if you are talking about more general ontology
>> > > extraction, then of course the ontology produced is going to be
>> > > no better than that of the document considered, and there are
>> > > the usual issues with ambiguity that computers usually struggle
>> > > with, especially using words with different senses in close
>> > > proximity.
>> > >
>> >
>> > I agree, and think that such functionality can be useful to aid
>> > concept extraction for further refinement. I don't understand how
>> > the system identifies the classes, objects and relations. Do they
>> > have to be in RDF? Or how does it do it? (haven't read the
>> > documentation)
>>
>> MW: What I have seen is a kind of NLP parsing, so nouns are classes,
>> proper names are individuals, spotting patterns for names of people,
>> addresses, and company names, dates, and then understanding the patterns
>> around certain key words, so when you see something like:
>> "Shell bought XYZ co from ABC Corp on July 17th 2006"
>> it can create the appropriate records of the activity, when it happened
>> and who the parties involved were.
>>
>> Regards
>>
>> Matthew West
>> Reference Data Architecture and Standards Manager
>> Shell International Petroleum Company Limited
>> Registered in England and Wales
>> Registered number: 621148
>> Registered office: Shell Centre, London SE1 7NA, United Kingdom
>>
>> Tel: +44 20 7934 4490 Mobile: +44 7796 336538
>> Email: matthew.west@xxxxxxxxx
>> http://www.shell.com
>> http://www.matthew-west.org.uk/
>>
>> >
>> >
>> > Paola Di Maio
>> >
>> > _________________________________________________________________
>> > Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
>> > Subscribe/Config: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
>> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
>> Shared Files: http://ontolog.cim3.net/file/
>> Community Wiki: http://ontolog.cim3.net/wiki/
>> To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
>>
>
>
>--
>Paola Di Maio
>School of IT
>www.mfu.ac.th
>*********************************************
> (04)
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 cell
http://www.ihmc.us/users/phayes phayesAT-SIGNihmc.us
http://www.flickr.com/pathayes/collections (05)