
[ontolog-forum] So you want to be a Data Scientist?

To: "'[ontolog-forum] '" <ontolog-forum@xxxxxxxxxxxxxxxx>
From: John F Sowa <sowa@xxxxxxxxxxx>
Date: Fri, 28 Dec 2012 10:48:11 -0500
Message-id: <50DDBF3B.4090208@xxxxxxxxxxx>

That's the title of an article by Charles Rose:    (01)

    http://www.dataversity.net/so-you-want-to-be-a-data-scientist/    (02)

The skills required for data scientists have a high overlap with skills
that would be very useful for ontologists.  Some excerpts:    (03)

CR
> Do you ever wonder what Facebook does with all those likes that
> people give and receive? Or how Netflix figures out exactly what
> movies to recommend to you? Or how Google can infer exactly what
> you are trying to search for as soon as you begin typing something
> into the search box? What about the ads on LinkedIn that relates
> directly to your profile, the music lists in iTunes and any other
> scores of connections that just happen? Those examples are just
> a few of the multitudes of data-related instances constantly being
> collected and analyzed all the time.    (04)

Rose quotes people he calls "some of the biggest names in the data
management field" who define what a data scientist does:    (05)

> “Data scientists are part digital trend spotter and part storyteller
> stitching various pieces of information together. These are people
> or teams at organizations that sift through the explosion of data to
> discover what the data is telling them.” Anjul Bhambhri, Vice President
> of Big Data Products at IBM.    (06)

IBM has a VP of Big Data Products, but not a VP of ontology or the 
Semantic Web.    (07)

> “A data scientist is that unique blend of skills that can both unlock
> the insights of data and tell a fantastic story via the data.”
> Dr. DJ Patil is a Data Scientist in Residence at Greylock Partners,
> as well as the former Chief Scientist, Chief Security Officer and
> Head of Analytics and Data Teams at the LinkedIn Corporation.    (08)

They have "Data Teams", but not ontology teams.    (09)

> “A data scientist is a rare hybrid, a computer scientist with the
> programming abilities to build software to scrape, combine, and manage
> data from a variety of sources and a statistician who knows how to
> derive insights from the information within. S/he combines the skills
> to create new prototypes with the creativity and thoroughness to ask and
> answer the deepest questions about the data and what secrets it holds.”
> Jake Porway, Data without Borders and New York Times.    (010)

Note the emphasis on integrating theory and practice.  One ontologist
who contributes to this forum claimed that "tools aren't interesting".
That attitude helps explain why people who have real work to do don't
find ontology interesting.    (011)

> Data scientists are “analytically-minded, statistically and
> mathematically sophisticated data engineers who can infer insights
> about business and other complex systems from large quantities of data.”
> Steve Hillion, Vice President of Analytics at EMC Greenplum.    (012)

Neither Charles Rose nor the people he quotes mention logic, ontology,
the Semantic Web, or any of the SW notations and tools.    (013)

CR
> What does it take to be a Data Scientist?
> 1. Mathematics: Data Scientists must be competent mathematicians...
> 2. Statistical Analysis: A strong knowledge of R, SAS, SciPy, Stata, SPSS...
> 3. Programming/Scripting Languages: ... C/C++, Java, PHP, Ruby, Perl,
>    Python...
> 4. Relational Databases: Know your way around SQL-based systems...
> 5. Distributed Computing Systems and Tools: NoSQL platforms...
> 6. Data Mining: Learn the primary tools used in Data Mining today...
> 7. Data Modeling: ... able to understand the models, present them to
>    C-Level Executives and ... the many modeling tools/techniques/methodologies
>    such as ERWin, Agile, ORM diagrams, UML class diagrams, CRC cards,
>    conceptual/logical/physical schema, DDL, Bachman diagrams, Zachman
>    Framework...
> 8. Visualization: ... tools such as Flare, HighCharts, AmCharts, D3.js,
>    Google Visualization API, Raphael.js ...  Data Scientists have to tell
>    a story with their data; they must provide a data narrative that anyone
>    in the enterprise can follow, understand and utilize.
> 9. Creativity and Innovation: ... Data Scientists must be able to innovate
>    the collection, analysis and usage of data ... in novel and fantastic
>    ways so all that “enterprise-critical” data is put to advantageous use.
> 10. Communication and Business Perspicacity: Data Scientists are crossbreeds,
>    the amalgamation of IT expertise and business smarts...
> 11. Education: ... Math, Statistics, Computer Science, Engineering or some
>    other related technical field... A BS in one of those fields is a must and
>    an MS shows the ability to work within a system, complete tasks with
>    deadlines and a background in theoretical principles. Add to that MS
>    many years of
>    experience in the field...    (014)

That web site also has links to a talk on "Practical Data Modeling"
by Peter Aiken:    (015)

http://www.dataversity.net/data-ed-slides-practical-data-modeling/9792/    (016)

Type 50 in the box to jump to slide 50, which summarizes "7 mistakes
you can't afford to make in enterprise data modeling."  Ontologists
who want to make ontology useful should also avoid those mistakes.    (017)

John    (018)

