Republic of Mathematics blog

So you want to be a data scientist?

Posted by: Gary Ernest Davis on: May 6, 2012

Well, listen up.

Here are some well-known data scientists:

What do you need to know, and know how to do, to be a data scientist and hang with these cool folks?

First, you need to be able to hack and scrub data – lots and lots of data, usually messy.  To do this you’ll need to be familiar with a language such as Perl or Python, and keep an eye on the Julia language.
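To give a flavor of what scrubbing looks like, here is a minimal sketch in Python using only the standard library. The sample data, the column names and the `scrub` helper are all made up for illustration; real data will be messier and will need its own rules.

```python
import csv
import io

# Hypothetical messy input: stray whitespace, inconsistent case, missing ages.
RAW = """name, city ,age
 Alice ,boston,34
BOB,  Boston ,
carol,NEW YORK,n/a
"""


def scrub(text):
    """Return a list of cleaned row dicts from messy CSV text."""
    reader = csv.reader(io.StringIO(text))
    # Normalize header names: trim whitespace and lowercase.
    header = [h.strip().lower() for h in next(reader)]
    rows = []
    for raw in reader:
        if not raw:
            continue  # skip completely empty lines
        cells = [c.strip() for c in raw]
        row = dict(zip(header, cells))
        row["name"] = row["name"].title()
        row["city"] = row["city"].title()
        # Keep the age only if it is a plain non-negative integer.
        row["age"] = int(row["age"]) if row["age"].isdigit() else None
        rows.append(row)
    return rows


cleaned = scrub(RAW)
for row in cleaned:
    print(row)
```

The point of the sketch is the pattern: normalize the header, trim every cell, and turn anything you can't parse into an explicit missing value rather than guessing.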

You should know how to work at the command line in a Unix environment, and how to interact with APIs.

You need to know how to do a decent statistical analysis of data (again, lots and lots of it). You should at least know how to carry out an exploratory data analysis, run a regression (maybe even a loess regression), and design an experiment to test a hypothesis.
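As a small taste of the regression part, here is a sketch of an ordinary least-squares line fit in plain Python. The data points and the `fit_line` name are invented for the example, and in practice you would likely reach for R or a statistics library rather than rolling your own.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that lie close to the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A loess regression builds on the same idea by fitting many such local lines, each weighted toward nearby points.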

As part of your statistical background you should be fluent in R, and be up to speed with Python pandas.

You should know how to design and test algorithms, and be familiar with data mining and machine learning.

Database programming, of the SQL variety, should be your bread and butter.
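If SQL is new to you, a quick way to practice is Python's built-in `sqlite3` module, which needs no server. The sketch below uses a made-up `visits` table and made-up rows; the query itself is the bread-and-butter kind: group, count and sum.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visitor TEXT, page TEXT, seconds INTEGER)")

# Hypothetical page-view records.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("ann", "/home", 12),
        ("ann", "/blog", 95),
        ("raj", "/home", 7),
        ("raj", "/blog", 40),
        ("raj", "/about", 15),
    ],
)

# Number of visits and total time on site per visitor.
totals = conn.execute(
    "SELECT visitor, COUNT(*), SUM(seconds) "
    "FROM visits GROUP BY visitor ORDER BY visitor"
).fetchall()
conn.close()

for visitor, n_visits, total_seconds in totals:
    print(visitor, n_visits, total_seconds)
```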

Then you need to be very familiar with techniques of data visualization.

It would help if you knew how to carry out a simulation, and also knew something about Hadoop and MapReduce or, nowadays, High Performance Computer Cluster management.
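The MapReduce idea itself is easy to try on a laptop. The sketch below is a toy word count in plain Python, run in a single process; it is not Hadoop, and the function names and sample documents are invented for illustration. It only shows the three steps: map each record to key-value pairs, group the pairs by key, then reduce each group.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Made-up input documents.
documents = ["big data big ideas", "data tells a story"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

On a real cluster the map and reduce steps run on many machines at once, and the framework handles the grouping in between.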

If all that isn’t enough, you also need to be able to communicate a story really well.

If you lack some or all of these skills, you need to get up to speed by yourself or find someone, somewhere, to teach you.

This sounds like a lot, and it is – yet the work of a data scientist is so interesting, so rewarding, and so important, that you’ll figure out how to do it.

Here are links to the data scientists above:
