# What is a function, anyway?

x

For programming purposes, one should think of a function as a type of object that takes certain types of INPUTS, and – once those inputs are given – produces a definite OUTPUT.

A simple example is the AVERAGE function, for which the input is a list of numbers

$x_1,\ldots, x_n$

and the output is the average

$\frac{x_1+\ldots +x_n}{n}$

We can think of the AVERAGE function schematically as follows:

The “object” that acts as input to the function is a finite list of numeric data. The “object” that is output by the function is a single numeric value. The function itself is also an “object”.

# Functions in R

x

You can define functions yourself in R using the function command. When you create a function it is an object, that can be used in R like any other object.

A function in R has 3 components:

• The function body. This is the entire code inside the function.
• The function arguments. These are the types of input needed for the function to work. They are specified in the body of the function.
• The function environment. This is a specification of the location of the inputs to the function.

The structure of a function will typically look like this:

functionname<-function(argument1, argument2, … ){computations using the arguments}

## An example

In this example we use a function to compute the fraction of data points, from a given data set, that lie within $k>0$ standard deviations from the mean.

We INPUT a vector of numerical data $c(x_1,\ldots , x_n)$ and a positive number $k$.

Note that the input vector is a single object in R: it has numerical components, yet so far as R is concerned it is a column vector (= an ordered list).

We OUTPUT a single number: the fraction of data points, from$c(x_1,\ldots , x_n)$, that lie within $k>0$ standard deviations from the mean of $c(x_1,\ldots , x_n)$.

In specifying this function we will do something that is very common – we will make use of other, existing R functions, such as mean and standard deviation (among others).

### Let’s see how this might work

Let’s name the function zsf (for “z-score function”).

So our definition will begin:

zsf<-function(argument1, argument2, … ){computations using the arguments}

and we have to fill in the arguments and the computations to get the output.

The function arguments will be a column vector and a numeric value. Let’s call the data vector argument data, and the numeric argument k:

zsf<-function(data, k){computations using the arguments}

To do the computation in the body of the function we have to compute the mean and standard deviation of the input data set:

mean(data), sd(data)

Note that the functions mean and sd are part of the R stats package, which loads automatically when you open R.

Now we have to carry out the remaining computations that will give us the output:

• First, we extract from the data those data points that are within k standard deviations of the mean m. We do this using the built-in subset and function:

subset(data, abs(data-mean(data))<=k*sd(data))

• Then we want the ratio of the size of this data subset to the full data set:

length(subset(data, abs(data-mean(data))<=k*sd(data)))/length(data)

Stringing this together, the final form for the function is:

zsf<-function(data, k){length(subset(data, abs(data-mean(data))<=k*sd(data)))/length(data)}

### Realizing the function in R

To realize this function in R we first need to import a data set. Here is a data set consisting of birth weights, in ounces, of babies of mothers who did not smoke during pregnancy, taken from the Stat Labs Data Page, and formatted as a .txt file: nosmoke

By right-clicking on the “nosmoke” file link you can get the URL of this file, which is http://www.blog.republicofmath.com/wp-content/uploads/2015/06/nosmoke.txt

We import this data into R, name it “nosmoke” and define the z-score function as above, and use it to  compute what fraction of the data is within 1 standard deviation of the mean:

• First, the read.table command reads in the nosmoke data, which has no header row, and for which we have blank separators:

• However, when R reads the data into memory in this format, it is a data frame and not a simple column vector. You can see this by entering:

> str(nosmoke)

to get the structure of the data frame “nosmoke”. You will see in the structure of the data frame a header for the birth weights. In our case the header, given by R, is ‘V1’:

‘data.frame’: 742 obs. of 1 variable:
\$ V1: int 120 113 123 136 138 132 120 140 114 115 …

We can extract the column headed ‘V1’ as follows:

> nosmokedata=nosmoke[[‘V1’]]

The “nosmokedata” is now a column vector, and we can, for example, plot a histogram of this data:

> hist(nosmokedata)

• Now we define the z-score function:

> zsf<-function(data,k){length(subset(data,abs(data-mean(data))<=k*sd(data)))/length(data)}

• And then we use the function to calculate what fraction of the data is within 1 standard deviation of the mean:

> zsf(nosmokedata,1)
[1] 0.7277628

So, approximately 72.8% of the data is within 1 standard deviation of the mean.

### Plotting the function

Given a data set, such as the “nosmokedata” above, we can vary k and plot the fraction of data within k standard deviations of the mean, as simply a function of k (with the data as a given: in other words, with the data argument fixed).

• First we need to calculate a potential range of values for the variable k.

We calculate:

> (mean(nosmokedata)-min(nosmokedata))/sd(nosmokedata)
[1] 3.911052

and

> (max(nosmokedata)-mean(nosmokedata))/sd(nosmokedata)
[1] 3.043495

and so see that all the data lies within 4 standard deviations of the mean.

• We then create a vector of possible values for k using the seq function:

> values<-seq(0, 4, by=4/20)

Here we have chosen the range from 0 through 4, and divided that into 20 equally spaced intervals. A readout shows us the computed range of k values:

> values
[1] 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0

We check that “values” is a vector:

> is.vector(values)
[1] TRUE

• Now we apply a function of 1 variable, k, to this vector of values.

First we define the function of the variable k:

> func<-function(k){zsf(nosmokedata,k)}

Then we use the function lapply to create a list as a result of applying this function of 1 variable to the vector “values”, and we unlist the result to get a data frame:

> y=unlist(lapply(values, func))

• We combine these two data frames with the cbind function, to get a data frame of k values paired with zsf(nosmokedata, k) values, and then plot that data frame:

> pairs<-cbind(values,z)

> plot(pairs,type=”b”,main=”zsf(k) as a function of k”, xlab=”k”, ylab=”zsf(k)”)

The plot allows us to see visually how the proportion of data within k standard deviations of the mean varies with k.

Note: the option type=“b” in the plot function draws both points and lines.

A skilled R programmer could have done this quicker and slicker: I hope I have laid out enough information so that you can see how functions are defined, can be applied to vectors, and plotted to give useful visual information.

### Saving the function

Having defined a function such as the zsf function above, we would like to save it for future use.

Customizing R at Quick R byRob Kabacoff has a useful discussion on loading user-defined functions into R on startup.

x

## The number 3608528850368400786036725

Ben Vitale (@BenVitale) announced that 3608528850368400786036725 is  a 25 digit number with the property that  each number formed by its first n digits is divisible by n, for n from 1 through 25.

How could we find similar numbers, and could there be a larger number than 3608528850368400786036725 with this property? (which we will call the Vitale property)

The numbers with 2 digits having the Vitale property are just the even numbers between 10 and 98.

We will investigate this issue using Mathematica®, so here’s our list of such 2 digit numbers generated in Mathematica®:

vitaleproperty[2] = Range[10, 99, 2]

{10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,

52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98}

How about numbers with 3 digits with the Vitale property?

They have to be obtained from even numbers by adding a digit to the left, and must be divisible by 3.

Here’s how Mathematica® can generate all the numbers obtained by appending a digit from a given number:

adddigits[n_] := Table[FromDigits[Append[IntegerDigits[n], d]], {d, 0, 9}]

For example, here’s what we get by applying the adddigits function to 248:

{2480, 2481, 2482, 2483, 2484, 2485, 2486, 2487, 2488, 2489}

To obtain 3 digit numbers with the Vitale property we have to append a digit to even numbers, and check which resulting 3 digit numbers are divisible by 3:

Cases[Flatten[Map[adddigits, vitaleproperty[2]]], x_ /; Mod[x, 3] == 0]

{102,105,108,120,123,126,129,141,144,147,162,165,168,180,183,186,189,201,

204,207,222,225,228,240,243,246,249,261,264,267,282,285,288,300,303,

306,309,321,324,327,342,345,348,360,363,366,369,381,384,387,402,405,

408,420,423,426,429,441,444,447,462,465,468,480,483,486,489,501,504,

507,522,525,528,540,543,546,549,561,564,567,582,585,588,600,603,606,

609,621,624,627,642,645,648,660,663,666,669,681,684,687,702,705,708,

720,723,726,729,741,744,747,762,765,768,780,783,786,789,801,804,807,

822,825,828,840,843,846,849,861,864,867,882,885,888,900,903,906,909,

921,924,927,942,945,948,960,963,966,969,981,984,987}

These are the 3 digit numbers divisible by 3, who’s first 2 digits are divisible by 2.

This suggests how we can recursively build n digit numbers with the Vitale property, from n-1 digit numbers with the Vitale property:

vitaleproperty[n_] := vitaleproperty[n] = Cases[Flatten[Map[adddigits, vitaleproperty[n – 1]]], x_ /; Mod[x, n] == 0]

For example:

vitaleproperty[4]

yields:

{1020,1024,1028,1052,1056,1080,1084,1088,1200,1204,1208,1232,1236,1260,1264,

1268,1292,1296,1412,1416,1440,1444,1448,1472,1476,1620,1624,1628,1652,1656,

1680,1684,1688,1800,1804,1808,1832,1836,1860,1864,1868,1892,1896,2012,2016,

2040,2044,2048,2072,2076,2220,2224,2228,2252,2256,2280,2284,2288,2400,

2404,2408,2432,2436,2460,2464,2468,2492,2496,2612,2616,2640,2644,2648,2672,

2676,2820,2824,2828,2852,2856,2880,2884,2888,3000,3004,3008,3032,3036,

3060,3064,3068,3092,3096,3212,3216,3240,3244,3248,3272,3276,3420,3424,3428,

3452,3456,3480,3484,3488,3600,3604,3608,3632,3636,3660,3664,3668,3692,3696,

3812,3816,3840,3844,3848,3872,3876,4020,4024,4028,4052,4056,4080,4084,

4088,4200,4204,4208,4232,4236,4260,4264,4268,4292,4296,4412,4416,4440,

4444,4448,4472,4476,4620,4624,4628,4652,4656,4680,4684,4688,4800,4804,

4808,4832,4836,4860,4864,4868,4892,4896,5012,5016,5040,5044,5048,5072,

5076,5220,5224,5228,5252,5256,5280,5284,5288,5400,5404,5408,5432,5436,

5460,5464,5468,5492,5496,5612,5616,5640,5644,5648,5672,5676,5820,5824,5828,

5852,5856,5880,5884,5888,6000,6004,6008,6032,6036,6060,6064,6068,6092,

6096,6212,6216,6240,6244,6248,6272,6276,6420,6424,6428,6452,6456,6480,6484,

6488,6600,6604,6608,6632,6636,6660,6664,6668,6692,6696,6812,6816,6840,

6844,6848,6872,6876,7020,7024,7028,7052,7056,7080,7084,7088,7200,7204,

7208,7232,7236,7260,7264,7268,7292,7296,7412,7416,7440,7444,7448,7472,7476,

7620,7624,7628,7652,7656,7680,7684,7688,7800,7804,7808,7832,7836,7860,

7864,7868,7892,7896,8012,8016,8040,8044,8048,8072,8076,8220,8224,8228,

8252,8256,8280,8284,8288,8400,8404,8408,8432,8436,8460,8464,8468,8492,

8496,8612,8616,8640,8644,8648,8672,8676,8820,8824,8828,8852,8856,8880,

8884,8888,9000,9004,9008,9032,9036,9060,9064,9068,9092,9096,9212,9216,

9240,9244,9248,9272,9276,9420,9424,9428,9452,9456,9480,9484,9488,9600,

9604,9608,9632,9636,9660,9664,9668,9692,9696,9812,9816,9840,9844,9848,9872,9876}

The size of the collection of n-digit numbers with the Vitale property grows with n for a while, but then begins to decrease:

T = Table[{n, Length[vitaleproperty[n]]}, {n, 2, 15}]

{{2,45},{3,150},{4,375},{5,750},{6,1200},{7,1713},{8,2227},{9,2492},{10,2492},{11,2225},{12,2041},{13,1575},{14,1132},{15,770}}

A plot shows this initials increase and then decrease quite markedly:

ListPlot[T, PlotStyle -> Red, PlotMarkers -> {Automatic, Medium}]

So, lets’s calculate, and plot, the number of n-digit numbers with the Vitale property versus n for n from 2 through 26:

T = Table[{n, Length[vitaleproperty[n]]}, {n, 2, 26}]

{{2,45},{3,150},{4,375},{5,750},{6,1200},{7,1713},{8,2227},{9,2492},{10,2492},{11,2225},{12,2041},{13,1575},{14,1132},{15,770},{16,571},{17,335},{18,180},{19,90},{20,44},{21,18},{22,12},{23,6},{24,3},{25,1},{26,0}}

ListPlot[T, PlotStyle -> Red, PlotMarkers -> {Automatic, Medium}]

So we see that there is exactly one 25 digit number with the Vitale property – the venerable 3608528850368400786036725 – and that appending any of the digits 0 through 9 to this number does not result in a 26 digit number that is divisible by 26. This means that 3608528850368400786036725 is the LARGEST number with the Vitale property.

Kudos to Ben Vitale for finding it!

Postscript: Éric Angelini points out that the number 3608528850368400786036725, and its properties discussed above, appears in the Online Encyclopedia of Integer Sequences.