MatthewL,
Your original questions were very broad in scope. They also appear to
be Fermi questions, which can only be answered, if at all, with
estimates based on some assumptions. The DIK pyramid suggests
that data, information and knowledge may all be considered facts
with some relevance.
(http://en.wikipedia.org/wiki/DIKW)
You seem to have a good idea of what constitutes a fact, and no one
can dispute your interests. When you address a community, the answers
will vary by participant. Also, your facts below mix physical facts
with conceptual facts, which is fine but broadens the scope of your
questions. Given your broad query, I believe it is appropriate to
drill down and qualify what a fact is. It might help if you could
share the basis for your questions.
Your questions have been of significant interest to me, although it
has been some time since I looked at them. In response to your
question, there is little scholarship in this area as far as I know,
but the field is so large that it is difficult to track the work. In
addition, the people who know what information is stored on websites
are not forthcoming about sharing it.
Google's existing dataset is reportedly on the order of 1 petabyte,
but I can't verify that number.
(http://www.quirkeysolutions.com/index.php?option=com_content&view=article&id=103:how-big-is-google-s-database&catid=21&Itemid=224)
I know that at Harvard they are working with satellite systems
that store 1.4 TB per day. Are those facts? It is not uncommon to see
new-data volumes of this magnitude for other systems.
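To put the 1.4 TB per day figure next to the (unverified) 1 PB figure
for Google, here is a small back-of-envelope sketch. It assumes decimal
units (1 PB = 1000 TB) and a constant daily rate; both are simplifying
assumptions of mine, not facts from the sources above.

```python
# Back-of-envelope arithmetic for the 1.4 TB/day satellite figure.
# Assumptions (not from the original message): decimal units and a
# constant daily storage rate.
TB_PER_DAY = 1.4
DAYS_PER_YEAR = 365
TB_PER_PB = 1000

tb_per_year = TB_PER_DAY * DAYS_PER_YEAR   # roughly 511 TB
pb_per_year = tb_per_year / TB_PER_PB      # roughly 0.51 PB

# Days at this rate to accumulate the unverified 1 PB figure cited for Google.
days_to_one_pb = TB_PER_PB / TB_PER_DAY    # roughly 714 days

print(f"{tb_per_year:.0f} TB/year, {pb_per_year:.2f} PB/year")
print(f"{days_to_one_pb:.0f} days to reach 1 PB")
```

In other words, a single system at that rate would match the cited
Google figure in about two years, which is one reason raw storage
volume says little about how many facts exist.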
Most of the research in this area traces back to a paper by Blair
and Maron. Salton also wrote on the subject, but neither of these
works has been updated. Blair and Maron's findings were associated
with IBM's STAIRS product and related to the query response rates of
large document sets. Their corpus was on the order of 40,000
documents. Their finding was that the information a search returns
depends on how broadly the query is stated. There appears to be a
power curve in the distribution of responses to queries of varying
specificity.
(http://deepblue.lib.umich.edu/bitstream/2027.42/28883/1/0000719.pdf)
You will find references to Salton's papers in this document. His
book is "Automatic Text Processing: The Transformation, Analysis, and
Retrieval of Information by Computer."
Returning to how one might respond to the Fermi question, there may
be ways to bound estimates based on the size of document sets. If the
facts to which you refer are of value at the university level, and
you assume universities share a common core set of facts, then you
might be able to track changes in the size or budgets of libraries.
This would need to be done for corporate libraries as well, and then
for online documents and private libraries, including those of
individual professionals. This would be a significant undertaking.
You might instead estimate size relative to Google's online size.
Either way, you would need to determine what percentage of the
content is facts.
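One way to make such a bound concrete is to treat the estimate as a
product of factors, give each factor a low and a high guess, and
multiply the low ends and the high ends separately. The sketch below
is only an illustration: the factor names and every range are
hypothetical placeholders, not figures from this thread. Using the
geometric mean of the two bounds as a single-number estimate is a
common Fermi convention, not something proposed above.

```python
import math

# Hypothetical (low, high) guesses for each factor. None of these numbers
# come from the original message; they only show how bounds propagate.
factors = {
    "libraries": (1e4, 1e5),
    "unique_documents_per_library": (1e3, 1e5),
    "statements_per_document": (10, 1e3),
    "fraction_that_are_facts": (0.01, 0.1),
}

def fermi_bounds(ranges):
    """Multiply all low ends and all high ends to get overall bounds."""
    low = math.prod(lo for lo, _ in ranges.values())
    high = math.prod(hi for _, hi in ranges.values())
    # Geometric mean of the bounds: a common single-number Fermi estimate.
    point = math.sqrt(low * high)
    return low, high, point

low, high, point = fermi_bounds(factors)
print(f"low={low:.1e}, high={high:.1e}, point={point:.1e}")
```

The spread between the bounds (six orders of magnitude with these
placeholder ranges) shows how much each factor would need to be
narrowed, for example by the library-size tracking described above,
before the count of facts becomes meaningful.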
Another approach would be to estimate how many facts are created
each day or year. Again, relevance is important for weeding out
insignificant facts.
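A rate-based estimate can be sketched the same way: a daily creation
rate, a relevance filter, and a time span, optionally with the rate
growing each year. The function name, the growth model, and all input
values below are hypothetical assumptions for illustration only.

```python
def relevant_facts_created(facts_per_day, fraction_relevant, years,
                           annual_growth=0.0):
    """Hypothetical helper: sum relevant facts created over `years`.

    The daily rate is multiplied by (1 + annual_growth) after each year,
    e.g. annual_growth=0.10 means 10% more new facts per day each year.
    """
    total = 0.0
    rate = facts_per_day
    for _ in range(years):
        total += rate * 365 * fraction_relevant
        rate *= 1 + annual_growth
    return total

# Hypothetical inputs: 1e6 candidate facts/day, 1% judged relevant,
# 10 years, 10% annual growth in the daily rate.
estimate = relevant_facts_created(1e6, 0.01, years=10, annual_growth=0.10)
print(f"{estimate:.2e}")
```

The relevance fraction does the "weeding out"; as with the
document-size approach, the result is only as good as that assumption.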
-John Bottoms
FirstStar Systems
Concord, MA USA
On 5/24/2012 2:27 PM, matthew lange wrote:
I am really feeling like my thread has been hijacked
by people who like to read their own writing:> conjecture. I
have purposefully avoided quoting any one person--but you know who
you are.
Perhaps folks are afraid to read/respond to my real-world examples
of facts, or did my propositions just get lost in the list mud?
Here again are some examples of facts. I would be delighted if
someone would attempt to bound factual knowledge so that it
could be quantified--or otherwise provide succinct reasons
why my examples are not facts.
Fact examples:
- The earth revolves around the sun.
- The Greek letter Pi represents the irrational number that is
  the ratio between a circle's circumference and diameter.
- A calorie is the amount of energy it takes to raise the
  temperature of 1cc of water 1 deg. C at sea level.
- Chemical X contains Y calories of available energy (of
  course substituting where appropriate).
Are these not facts? Are they not countable?
Again, aside from bending the space-time continuum, or dismissing
laws of nature like thermodynamics...I fail to see the need for
relativism here...or, what am I missing? If you agree that these
are facts, then let's get pragmatic and enumerate the
properties/boundaries around the nature of a fact.
Also, I must express my displeasure with several members'
netiquette on this list:
1) In addition to Mr. West, my name is also Matthew (this is a
FACT)--please use unambiguous identifiers in responses
2) Spell Check is courteous--not a fact, but perhaps an opinion
shared by many--one or two misspelled words I can
understand...but some of these posts are ridiculous.
Best,
~mc
_________________________________________________________________
Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
Shared Files: http://ontolog.cim3.net/file/
Community Wiki: http://ontolog.cim3.net/wiki/
To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J