Re: [ontolog-forum] So you want to be a Data Scientist?

To: ontolog-forum@xxxxxxxxxxxxxxxx
From: Kingsley Idehen <kidehen@xxxxxxxxxxxxxx>
Date: Fri, 28 Dec 2012 15:18:38 -0500
Message-id: <50DDFE9E.7030509@xxxxxxxxxxxxxx>
On 12/28/12 10:48 AM, John F Sowa wrote:
> That's the title of an article by Charles Rose:
>
>      http://www.dataversity.net/so-you-want-to-be-a-data-scientist/
>
> The skills required for data scientists have a high overlap with skills
> that would be very useful for ontologists.  Some excerpts:
>
> CR
>> Do you ever wonder what Facebook does with all those likes that
>> people give and receive? Or how Netflix figures out exactly what
>> movies to recommend to you? Or how Google can infer exactly what
>> you are trying to search for as soon as you begin typing something
>> into the search box? What about the ads on LinkedIn that relates
>> directly to your profile, the music lists in iTunes and any other
>> scores of connections that just happen? Those examples are just
>> a few of the multitudes of data-related instances constantly being
>> collected and analyzed all the time.
> Rose quotes people he calls "some of the biggest names in the data
> management field" who define what a data scientist does:
>
>> “Data scientists are part digital trend spotter and part storyteller
>> stitching various pieces of information together. These are people
>> or teams at organizations that sift through the explosion of data to
>> discover what the data is telling them.” Anjul Bhambhri, Vice President
>> of Big Data Products at IBM.
> IBM has a VP of Big Data Products, but not a VP of ontology or the
> Semantic Web.
>
>> “A data scientist is that unique blend of skills that can both unlock
>> the insights of data and tell a fantastic story via the data.”
>> Dr. DJ Patil is a Data Scientist in Residence at Greylock Partners,
>> as well as the former Chief Scientist, Chief Security Officer and
>> Head of Analytics and Data Teams at the LinkedIn Corporation.
> They have "Data Teams", but not ontology teams.
>
>> “A data scientist is a rare hybrid, a computer scientist with the
>> programming abilities to build software to scrape, combine, and manage
>> data from a variety of sources and a statistician who knows how to
>> derive insights from the information within. S/he combines the skills
>> to create new prototypes with the creativity and thoroughness to ask and
>> answer the deepest questions about the data and what secrets it holds.”
>> Jake Porway, Data without Borders and New York Times.
> Note the emphasis on integrating theory and practice.  One ontologist
> who contributes to this forum claimed that "tools aren't interesting".
> That attitude helps explain why people who have real work to do don't
> find ontology interesting.
>
>> Data scientists are “analytically-minded, statistically and
>> mathematically sophisticated data engineers who can infer insights
>> about business and other complex systems from large quantities of data.”
>> Steve Hillion, Vice President of Analytics at EMC Greenplum.
> Neither Charles Rose nor the people he quotes mention logic, ontology,
> the Semantic Web, or any of the SW notations and tools.
>
> CR
>> What does it take to be a Data Scientist?
>> 1. Mathematics: Data Scientists must be competent mathematicians...
>> 2. Statistical Analysis: A strong knowledge of R, SAS, SciPy, Stata, SPSS...
>> 3. Programming/Scripting Languages: ... C/C++, Java, PHP, Ruby, Perl, Python...
>> 4. Relational Databases: Know your way around SQL-based systems...
>> 5. Distributed Computing Systems and Tools: NoSQL platforms...
>> 6. Data Mining: Learn the primary tools used in Data Mining today...
>> 7. Data Modeling: ... able to understand the models, present them to
>>     C-Level Executives and ... the many modeling tools/techniques/methodologies
>>     such as ERWin, Agile, ORM diagrams, UML class diagrams, CRC cards,
>>     conceptual/logical/physical schema, DDL, Bachman diagrams, Zachman Framework...
>> 8. Visualization: ... tools such as Flare, HighCharts, AmCharts, D3.js,
>>     Google Visualization API, Raphael.js ...  Data Scientists have to tell
>>     a story with their data; they must provide a data narrative that anyone
>>     in the enterprise can follow, understand and utilize.
>> 9. Creativity and Innovation: ... Data Scientists must be able to innovate
>>     the collection, analysis and usage of data ... in novel  and fantastic
>>     ways so all that “enterprise-critical” data is put to advantageous use.
>> 10. Communication and Business Perspicacity: Data Scientists are crossbreeds,
>>     the amalgamation of IT expertise and business smarts...
>> 11. Education: ... Math, Statistics, Computer Science, Engineering or some
>>     other related technical field... A BS in one of those fields is a must and
>>     an MS shows the ability to work within a system, complete tasks with deadlines
>>     and a background in theoretical principles. Add to that MS many years of
>>     experience in the field...
> That web site also has links to a talk on "Practical Data Modeling"
> by Peter Aiken:
>
> http://www.dataversity.net/data-ed-slides-practical-data-modeling/9792/
>
> Type 50 in the box to jump to slide 50, which summarizes "7 mistakes
> you can't afford to make in enterprise data modeling."  Ontologists
> who want to make ontology useful should also avoid those mistakes.
>
> John
>   
> _________________________________________________________________
> Message Archives: http://ontolog.cim3.net/forum/ontolog-forum/
> Config Subscr: http://ontolog.cim3.net/mailman/listinfo/ontolog-forum/
> Unsubscribe: mailto:ontolog-forum-leave@xxxxxxxxxxxxxxxx
> Shared Files: http://ontolog.cim3.net/file/
> Community Wiki: http://ontolog.cim3.net/wiki/
> To join: http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage#nid1J
>   
>
>
John,

As per usual, I agree with your core insights and key message. That 
said, I would like to add that "Big Data" and "Data Scientists" are 
buzzwords *primarily* conjured up by marketeers who lack profound 
understanding of the fundamental challenges that have dogged data since 
the inception of modern computing. For instance, they are ultimately 
peddling data silos because they just can't comprehend the deeper 
virtues associated with open data access, integration and dissemination 
(in the right form, to the right people or machines, at the optimal time).

BTW -- building on your SlideShare reference:

1. http://www.slideshare.net/Dataversity/dataed-online-a-practical-approach-to-data-modeling-12007838/50

-- that's a hyperlink that denotes the specific slide you mentioned 
(Data Scientists still don't appreciate the value of hyperlinks as 
denotation mechanisms and super keys).

2. http://linkeddata.uriburner.com/about/id/entity/http/www.slideshare.net/Dataversity/dataed-online-a-practical-approach-to-data-modeling-12007838

-- a link (in the form of a Linked Data URI style of super key) that 
denotes the actual SlideShare-hosted presentation.

3. http://bit.ly/TuVFkW -- an entity description page that lends itself to 
follow-your-nose exploration through a Linked Data based Entity 
Relationship Graph.

4. http://bit.ly/RlTth3 -- presentations grouped by topic.
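
As a rough illustration of how links 1 and 2 relate to the presentation's 
own address, here is a minimal Python sketch. The URI layout it uses is 
inferred only from the two examples in this message (prefix, then the 
scheme, then host and path; slide number appended as a final path 
segment), so treat it as an assumption rather than a documented 
URIBurner or SlideShare rule. The function and constant names are 
hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: prefix copied from the example URI in item 2 of this message.
URIBURNER_ENTITY_PREFIX = "http://linkeddata.uriburner.com/about/id/entity"


def entity_uri(source_url: str) -> str:
    """Build a URIBurner-style entity URI that denotes the resource at source_url.

    The layout (prefix / scheme / host + path) is inferred from one example
    and is not taken from any service documentation.
    """
    parts = urlsplit(source_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {source_url!r}")
    return f"{URIBURNER_ENTITY_PREFIX}/{parts.scheme}/{parts.netloc}{parts.path}"


def slide_link(deck_url: str, slide_number: int) -> str:
    """Build a hyperlink that denotes one slide of a deck.

    Assumes the slide number is simply appended as the last path segment,
    as in item 1 of this message.
    """
    return f"{deck_url.rstrip('/')}/{slide_number}"


if __name__ == "__main__":
    deck = ("http://www.slideshare.net/Dataversity/"
            "dataed-online-a-practical-approach-to-data-modeling-12007838")
    print(slide_link(deck, 50))  # hyperlink for the slide John mentioned
    print(entity_uri(deck))      # super-key style URI for the whole deck
```

The point of the sketch is only that both links are derived mechanically 
from one identifier, which is what lets a hyperlink act as a key.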

Personally, I think we might as well add the moniker "Data Artist" to 
the mix. This would describe those who are interested in remixing 
heterogeneous data sources and who, as artists do, share those remixes by 
leveraging existing standards such as those that drive the Internet and 
World Wide Web :-)

--

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen



