Great note!
Ron
On 02/12/2012 10:47 AM, John F Sowa wrote:
> Dear Matthew,
>
> MW
>> OK, so based on this list, big data is mostly about massively parallel
>> data warehousing and the low level technologies that support various
>> approaches to this, particularly on cheap hardware.
> I agree that many of the terms in the list suggest that conclusion --
> the terms 'Hadoop', 'map/reduce', and the names of software designed
> to process huge amounts of data. Map/reduce is an algorithm published
> by Google, and Hadoop is an open source implementation by Yahoo.
>
> But Google and Yahoo process huge amounts of web data. Cost/performance
> is very important for them, but so is semantics. They also process,
> store, and analyze the links in and among web pages and their contents.
>
> The letter D is represented by Data warehousing. But note slide 5:
>> But as data volumes explode, data warehouse systems are rapidly
>> changing. They need to store more data -- and more kinds of data
>> -- making their management a challenge.
>
> http://www.crn.com/slide-shows/data-center/240142568/big-data-buzzwords-from-a-to-z.htm?pgno=5
>
> MW
>> In addition, analysis of unstructured data is thrown in as well, but
>> I guess that is just an input to the data warehousing.
> Note the terms 'text analytics', 'geospatial analysis', 'quantitative
> data analytics', and 'visualization'. They are definitely concerned
> about the semantics of everything in the warehouse.
>
> As slide 5 says, data warehousing means much more than it did when
> the term was introduced 25 years ago. Google and Yahoo, for example,
> have enormous data warehouses, but each web page has as much semantic
> information about its content as they can derive from it and from
> all the pages it's linked to and from.
>
> Note all the terms that refer to databases: 'columnar database',
> 'NoSQL', and 'relational database'. The term 'extract, transform,
> and load' addresses the issues of aligning and mapping independent
> databases. Note 'sharding' for partitioning databases -- that
> requires a lot of semantics about a database and how it's used.
> Note 'Whirr' for "running libraries for data cloud services".
>
> Note Kafka -- a messaging system developed by LinkedIn and
> contributed to the Apache Foundation. LinkedIn maintains a lot
> of semantic information about their members and their interests.
>
> The tools contributed to the Apache Foundation are developed and used
> by multibillion dollar corporations. They hire the best graduates from
> the best universities. Compared to them, the academics who designed
> OWL and SPARQL are amateurs who don't understand the problems.
>
> I don't believe we should dump everything developed by the SW.
> But if the academics want mainstream IT to adopt their toys, they
> have to understand the problems of mainstream IT. They can start
> by studying what industry does now and needs to do in the future.
>
> John
>
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>
--
Ron Wheeler
President
Artifact Software Inc
email: rwheeler@xxxxxxxxxxxxxxxxxxxxx
skype: ronaldmwheeler
phone: 866-970-2435, ext 102