# Data analysis: is it easier in Mathematica®?

Posted by: Gary Ernest Davis on: June 12, 2015

I wrote a post yesterday on defining functions in R for beginners. By “beginners” I mean people who are just learning R, just starting out in data analysis, or both.

Today I want to show how this might be easier to do in Mathematica®.

First let’s define the function zsf[data,k] which calculates the proportion of data that lies within k standard deviations of the mean of the given data set:

zsf[data_, k_] := Length[Cases[data, x_ /; Abs[x - Mean[data]] <= k*StandardDeviation[data]]]/Length[data]

The code “Cases[data, x_ /; Abs[x - Mean[data]] <= k*StandardDeviation[data]]” keeps those elements of the data set, called x, that are within k standard deviations of the mean of the data.
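To see what the pattern test does, here is a small sketch that can be checked by hand. The five-point list `sample` is made up for illustration (it is not the data used in this post), and `zsfDemo` is a hypothetical name for the same idea, written with the built-in `Mean`, so the block runs on its own.

```mathematica
(* Hypothetical toy data, chosen so the answer can be checked by hand *)
sample = {1, 2, 3, 4, 5};

(* Same idea as zsf, under the illustrative name zsfDemo *)
zsfDemo[data_, k_] :=
  Length[Cases[data, x_ /; Abs[x - Mean[data]] <= k*StandardDeviation[data]]]/Length[data]

Mean[sample]               (* 3 *)
StandardDeviation[sample]  (* Sqrt[5/2], about 1.58 *)

(* Elements within 1 standard deviation of the mean *)
Cases[sample, x_ /; Abs[x - Mean[sample]] <= StandardDeviation[sample]]  (* {2, 3, 4} *)

(* Proportion of the data within 1 standard deviation *)
zsfDemo[sample, 1]         (* 3/5 *)
```

Note that `StandardDeviation` is the sample standard deviation (divisor n - 1), the same convention as `sd` in R, so the two languages should agree on this calculation.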

As in R, we import the data as a text file from a URL:

The “List” argument to Import tells Mathematica® to read the text file as an ordered list, one element per line, which in R would be seen as a vector.
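The import statement itself is not reproduced here, so the following is only a sketch of the shape such a call takes. The URL is a placeholder and an assumption, not the address of the data set used in this post. To keep the example runnable offline, the same "List" format is applied to an inline string with `ImportString`.

```mathematica
(* Placeholder URL: an assumption for illustration, not the post's actual data source *)
url = "https://example.com/nosmoke.txt";

(* Shape of the call: each line of the text file becomes one list element *)
(* nosmokedata = Import[url, "List"]; *)

(* Offline stand-in: the same "List" format applied to an inline string *)
demo = ImportString["12\n15\n9\n20", "List"]

(* One list element per line of input *)
Length[demo]
```

With the real URL substituted, the commented-out line would assign the imported list to `nosmokedata`, the name used in the rest of this post.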

We plot a histogram of the data:

Histogram[nosmokedata]

We calculate the fraction of data that lies within 1 standard deviation of the mean and express it both as an exact fraction and as a floating-point number:

zsf[nosmokedata, 1]
N[%]

270/371

0.727763

Then we plot zsf[nosmokedata, k] as a function of k over the range 0 through 4, subdivided into 20 equal intervals, and also present the results in table form:

T = Table[{N[k], N[zsf[nosmokedata, k]]}, {k, 0, 4, 4/20}];
TableForm[T, TableHeadings -> {None, {"k", "zsf[k]"}}]
ListPlot[T, Joined -> True, Mesh -> All]
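As a quick check of how the grid of k values works, here is a sketch on a made-up five-point list (again hypothetical data, not the data from this post, with `zsfDemo` as an illustrative name). It keeps the results exact so each entry can be verified by hand, and it confirms that 20 equal intervals give 21 points.

```mathematica
(* Hypothetical toy data and an illustrative copy of the function *)
sample = {1, 2, 3, 4, 5};
zsfDemo[data_, k_] :=
  Length[Cases[data, x_ /; Abs[x - Mean[data]] <= k*StandardDeviation[data]]]/Length[data]

(* Exact proportions for k = 0, 1/2, 1, 3/2, 2 *)
grid = Table[{k, zsfDemo[sample, k]}, {k, 0, 2, 1/2}]
(* {{0, 1/5}, {1/2, 1/5}, {1, 3/5}, {3/2, 1}, {2, 1}} *)

(* The iterator {k, 0, 4, 4/20} visits both endpoints: 20 intervals, 21 points *)
Length[Range[0, 4, 4/20]]  (* 21 *)

(* Same plot style as above, on the toy grid *)
ListPlot[N[grid], Joined -> True, Mesh -> All]
```

The proportion can only stay level or rise as k grows, which is why the joined plot is a non-decreasing step-like curve that ends at 1.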

Well, that’s it … the whole analysis could have been written up very nicely in Mathematica® and saved as a PDF, or saved as a CDF and placed on the Web as an interactive document.

R has similar capabilities, so you pays your money and you takes your choice.

I just feel data analysts should be aware there is a choice.

Now if Wolfram (Steve) could lower the price of Mathematica® to \$50 …