Re: [ontolog-forum] Average Daily Word Exposure

To: "[ontolog-forum]" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: Ali Hashemi <ali@xxxxxxxxx>
Date: Thu, 12 Aug 2010 14:41:58 -0400
Message-id: <AANLkTinCLwQuz02iS62c-XfQ0ciS0=zCReP4+whzLryi@xxxxxxxxxxxxxx>
Hi John,

The BNC does not boast of 100 million unique words, but of 100 million words that include repetitions. Depending on what numbers you use for an average novel, that's 750-2000 novels.
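
As a rough check on that range, here is a minimal sketch. The per-novel lengths of 50,000 and 133,000 words are assumed values picked to bracket the range; they are not figures from the BNC documentation.

```python
# Rough conversion of a corpus size in running words (tokens,
# repetitions included) into an equivalent number of novels.

BNC_TOKENS = 100_000_000  # the 100 million word figure quoted in this thread


def novels_equivalent(corpus_tokens, words_per_novel):
    """Return how many novels of the given length add up to the corpus."""
    return corpus_tokens / words_per_novel


# Assumed novel lengths (hypothetical, for illustration only).
for words_per_novel in (50_000, 133_000):
    count = novels_equivalent(BNC_TOKENS, words_per_novel)
    print(f"{words_per_novel} words per novel: about {count:.0f} novels")
```

A shorter assumed novel gives the upper end of the range, a longer one the lower end.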

When I said exposed, I meant just that -- the number of words that reach our visual / auditory cortices, even if the rest of our brains don't consciously process them. So if I'm in a busy cafeteria, that's all the words picked up by my ears (the mic); if I'm walking down the street, it's all the street signs, adverts, etc. that I happen to glance at (the video camera).

But really, I'd be happy with any study that has an explicit methodology for daily word counts in any communication modality, regardless of whether the words were merely exposed, processed, or attended to, so long as it states which. And preferably in English or a non-agglutinative language - I'm not sure how useful it would be for a language like German.

Indeed, given that an experiment such as the one mentioned above won't be happening any time soon, I'm just hoping to piece together other studies (if any) to get a sense of what 100 million words means in terms of the amount of time it would take for a single individual to "naturally" be exposed to / process etc. such a corpus.

Best,
Ali

On Thu, Aug 12, 2010 at 1:36 PM, John Bottoms <john@xxxxxxxxxxxxxxxxxx> wrote:
Ali,

I think you need to qualify it even further. What do you mean
by "exposed"? If I pick up a newspaper am I exposed? If the
telly is on but I'm not listening, am I exposed?

One way to bound the task is to look at human word processing
capabilities. It maxes out at around 400 wpm on average, despite
what your reading teacher told you. And the average person reads,
listens, and comprehends for just a few hours at that rate each
day. The scenario you select would then tell you the gross number
of words of exposure per day. But most of those contain repeats,
and cannot be compared to the unique words in a corpus.
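
A minimal sketch of that bound, counting gross words with repeats included. The 2 to 4 hours per day range is an assumed reading of "a few hours", and the 100 million word target is the BNC figure mentioned in this thread; neither comes from a measured study.

```python
# Upper-bound estimate of gross daily word exposure (repeats included)
# from a processing rate and the hours per day spent at that rate.

def words_per_day(words_per_minute, hours_per_day):
    """Gross words processed per day at a constant rate."""
    return words_per_minute * 60 * hours_per_day


def days_to_reach(corpus_tokens, words_per_minute, hours_per_day):
    """Days needed to accumulate corpus_tokens running words."""
    return corpus_tokens / words_per_day(words_per_minute, hours_per_day)


MAX_WPM = 400                # the ~400 wpm ceiling stated above
CORPUS_TOKENS = 100_000_000  # BNC size in running words

# Assumed values for "a few hours" (hypothetical).
for hours in (2, 3, 4):
    daily = words_per_day(MAX_WPM, hours)
    days = days_to_reach(CORPUS_TOKENS, MAX_WPM, hours)
    print(f"{hours} h/day: {daily} words/day, "
          f"{days:.0f} days (~{days / 365:.1f} years)")
```

Because the rate is a ceiling, the resulting day counts are lower bounds on the time needed; a lower sustained rate lengthens them proportionally.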

Further, your question about the BNC corpus should be clarified
to describe how a metric could be applied to compare it to your
daily word exposure list or count.

-John Bottoms
 FirstStar
 Concord, MA USA
 T: 978-505-9878

On 8/12/2010 11:55 AM, Ali Hashemi wrote:
> Hi John,
>
> Thanks for the reply. Your response made me realize that I need to
> clarify my question.
>
> I'm not looking just for unique words / vocabulary. This question is
> motivated by BNC's 100 million word corpus and trying to get a sense of
> what that number actually means.
>
> How many days worth of word exposure is the BNC? Though to be more
> precise, see the earlier question phrasing below. (And yes, I know the
> BNC covers many different domains, while most people stay within a few
> domains over a period of weeks.)
>
> Had I the resources and funding to actually conduct an experiment, I
> would probably mic and/or video-camera N people for X days, and then do
> an average word count based on that. (Obviously trying to control for
> population demographics, and seasonal changes in behaviour among other
> confounding factors...)
>
> I also suspect that much has changed since Ogden's day. We net, we watch
> TV, we phone, we walk by billboards, subway ads, hear conversations in
> the street etc. etc. and we occasionally still have face-to-face
> conversations :P.
>
> So I'm not looking for average daily vocabulary count (though that data
> would be derivable from the above experiment), but for the more expansive
> average daily word exposure count.
>
> I wake up, and begin my interaction with the world. By the time I go to
> sleep, how many /words/ has my brain consciously / unconsciously been
> exposed to?
>
> Cheers,
> Ali
>
> On Thu, Aug 12, 2010 at 11:38 AM, John Bottoms <john@xxxxxxxxxxxxxxxxxx> wrote:
>
>     Ali,
>
>     That is an old and classic question. Charles Kay Ogden addressed
>     it when he was exploring the concepts underpinning "Basic English"
>     in 1925. He arrived at a multi-tiered solution with a core
>     vocabulary of 850 words and 0 or more technical vocabularies
>     on top of that.
>
>     So, you probably knew that, but it does point out a number of
>     issues. Ogden was working with rural settlements at an earlier
>     time than today. Common English was not nearly as rich as today,
>     and cultures were not as highly mixed.
>
>     Short of replicating his work your best approach might be to
>     select a corpus of representative works minus an appropriate
>     number of noise words, and see what comes of it. We use 100 and
>     500 word noise lists, and you can readily find lists on the web.
>     We have found that 6000-8000 words is fairly typical but your
>     mileage may vary. I'm sure the context which calls for a number
>     of technical vocabularies will be the determining factor.
>
>     David Eddy maintains a list of "common" usage terms for computing,
>     such as ZIP, SS#, and employeeID, and it contains a brazillion
>     entries.
>
>     -John Bottoms
>       FirstStar
>       Concord, MA USA
>       T: 978-505-9878
>
>     On 8/12/2010 10:43 AM, Ali Hashemi wrote:
>      > Hi All,
>      >
>      > I've been digging around for the past few days and have hit a
>      > dead end.
>      >
>      > I'm looking for the average number of words that an average western
>      > adult is exposed to daily.
>      >
>      > Any combination of words:
>      > spoken
>      > heard
>      > read
>      > seen
>      > would help.
>      >
>      > The main "sources" I've found are via:
>      >
>      > http://www.boston.com/news/globe/ideas/articles/2006/09/24/sex_on_the_brain/
>      > which focus on words spoken. And as that article notes (and I've
>      > confirmed for the studies I've been able to track down), almost
>      > none of those "sources" actually cite or indicate how the word
>      > count was derived.
>      >
>      > Does anyone have any leads on something a bit more substantive?
>      > And on anything that goes beyond spoken to also heard + read + seen?
>      >
>      > Thanks,
>      > Ali
>      >
>      > --
>      > www.reseed.ca
>      > www.pinkarmy.org
>      >
>      > (•`'·.¸(`'·.¸(•)¸.·'´)¸.·'´•) .,.,
>      >
>      >
>      >
>      >
>      > _________________________________________________________________
>      > Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
>      > Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
>      > Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
>      > Shared Files: http://ontolog.cim3.net/file/
>      > Community Wiki: http://ontolog.cim3.net/wiki/
>      > To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>      > To Post: mailto:ontolog-forum@xxxxxxxxxxxxxxxx
>      >
>
