My colleague Mercedes McGowen and I examined a measure of individual student gain among pre-service elementary teachers, related to Richard Hake’s use of mean gain in the study of reform classes in undergraduate physics.

The gain statistic assesses the amount individual students increase their test scores from initial-test to final-test, as a proportion of the possible increase for each student.
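Here is a minimal Python sketch that makes the definition concrete. It assumes scores are percentages (maximum 100) and that a student with initial score pre and final score post has gain (post − pre)/(100 − pre). The function names are hypothetical and chosen only for this illustration.

```python
def individual_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Fraction of the possible increase that a student actually achieved.

    gain = (post - pre) / (max_score - pre). It is undefined when the
    initial score is already the maximum, and negative if the score drops.
    """
    if pre >= max_score:
        raise ValueError("gain is undefined for a student who starts at the maximum score")
    return (post - pre) / (max_score - pre)


def mean_individual_gain(scores: list[tuple[float, float]], max_score: float = 100.0) -> float:
    """Average of the per-student gains over a list of (pre, post) pairs."""
    gains = [individual_gain(pre, post, max_score) for pre, post in scores]
    return sum(gains) / len(gains)


# A student who moves from 40 to 70 gains 30 of a possible 60 points.
print(individual_gain(40, 70))  # 0.5
```

The mean of the individual gains is not, in general, the gain computed from class-average scores. For two students with (pre, post) scores (40, 70) and (80, 80), the individual gains are 0.5 and 0, averaging 0.25, while the class averages 60 and 75 give (75 − 60)/(100 − 60) = 0.375.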

We examined the written work in mathematics classes of pre-service elementary teachers with very high gain and of those with very low gain, and showed that these groups exhibit distinct psychological attitudes and dispositions to learning mathematics.

We showed a small but statistically significant increase in average gain when course goals focus on patterns, connections, and meaning making in mathematics.

A common belief is that students with low initial-test scores will have higher gains, and students with high initial-test scores will have lower gains. We showed that this is not correct for a cohort of pre-service elementary teachers.




Given two data sets, it is not unreasonable to ask whether they have similar distributions.

For example, if we produce one data set of 500 numbers between 0 and 1, chosen uniformly and randomly (at least as randomly as we can using a pseudo-random number generator), and another data set of 500 numbers distributed normally, with mean 0 and variance 1, then eyeballing their histograms tells us, even if we did not know it beforehand, that they are not similarly distributed.
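A minimal Python sketch of this experiment, assuming NumPy is available. The seed and the 14 common bins are arbitrary choices made here, not taken from the original Mathematica work. It prints a crude side-by-side text histogram, with # for the uniform sample and * for the normal one.

```python
import numpy as np

rng = np.random.default_rng(2013)  # arbitrary seed, for reproducibility
uniform_data = rng.uniform(0.0, 1.0, size=500)          # 500 draws from U(0, 1)
normal_data = rng.normal(loc=0.0, scale=1.0, size=500)  # 500 draws from N(0, 1)

# Use the same bin edges for both samples so the counts are directly comparable.
low = min(uniform_data.min(), normal_data.min())
high = max(uniform_data.max(), normal_data.max())
edges = np.linspace(low, high, 15)  # 14 equal-width bins spanning both samples
uniform_counts, _ = np.histogram(uniform_data, bins=edges)
normal_counts, _ = np.histogram(normal_data, bins=edges)

# One character per 10 points, rounded down.
for left, right, u, n in zip(edges[:-1], edges[1:], uniform_counts, normal_counts):
    print(f"[{left:6.2f}, {right:6.2f})  {'#' * (int(u) // 10):<30} {'*' * (int(n) // 10)}")
```

The # bars sit in the two or three bins covering [0, 1], while the * bars form a bell shape centred on 0, which is the eyeball comparison described above.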



A more considered way of approaching the question of whether two data sets are similarly distributed is to use one or more goodness-of-fit tests. Several are in common use, including the Pearson Chi-Square and Cramér-von Mises tests that appear below.

Mathematica® incorporates these and other goodness-of-fit tests in the function DistributionFitTest[].

These goodness-of-fit tests basically perform a hypothesis test, with the null hypothesis H_0 that the data sets are identically distributed and the alternative hypothesis H_a that they are not.

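The goodness-of-fit tests return a p-value: the probability, assuming H_0 is true, of getting data sets at least as different as the ones observed. A small p-value is therefore evidence that the data sets are not identically distributed.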

So, if we carry out the Pearson Chi-Square test on the uniform and normal data sets above, we get the exceptionally small p-value 5.32085 × 10⁻⁸⁵, indicating very strongly that the two data sets are not similarly distributed.
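Mathematica's DistributionFitTest[] handles the binning internally, and the details are not given here. As a hedged sketch of one standard way to run a two-sample Pearson Chi-Square test in Python (NumPy and SciPy assumed): bin both data sets on common bins, then apply a chi-square test of homogeneity to the resulting 2 × k table of counts with scipy.stats.chi2_contingency. The function name pearson_two_sample_pvalue and the choice of 20 bins at quantiles of the pooled data are assumptions of this sketch, so its p-values will not match Mathematica's exactly.

```python
import numpy as np
from scipy.stats import chi2_contingency


def pearson_two_sample_pvalue(x, y, n_bins=20):
    """Hypothetical helper: chi-square test of homogeneity on binned counts.

    Bin edges are quantiles of the pooled sample, so each bin holds roughly
    (len(x) + len(y)) / n_bins pooled points and no expected count is zero.
    """
    pooled = np.concatenate([x, y])
    edges = np.quantile(pooled, np.linspace(0.0, 1.0, n_bins + 1))
    x_counts, _ = np.histogram(x, bins=edges)
    y_counts, _ = np.histogram(y, bins=edges)
    table = np.vstack([x_counts, y_counts])   # 2 x n_bins contingency table
    table = table[:, table.sum(axis=0) > 0]   # drop bins empty in both samples
    chi2, p, dof, expected = chi2_contingency(table)
    return p


rng = np.random.default_rng(0)
p = pearson_two_sample_pvalue(rng.uniform(0.0, 1.0, 500), rng.normal(0.0, 1.0, 500))
print(f"p-value: {p:.3g}")
```

Quantile-based bins put about the same number of pooled points in every bin, which matters because chi2_contingency raises an error when an expected count is zero, and the chi-square approximation is poor when expected counts are small.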

The p-value from a Pearson Chi-Square test is a random variable: if we carry out the test on two other data sets, one from a uniform distribution, the other from a normal distribution, we will get a somewhat different p-value. How different? The plot below shows the variation in p-values when we simulated choosing 500 uniformly distributed numbers and 500 normally distributed numbers 1,000 times:

[Figure: Distribution fit normal-uniform]


We see that despite some variation the values are all very small, as we would expect.
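The repeated experiment can be sketched in Python as follows, assuming NumPy and SciPy. The binned chi-square helper pearson_two_sample_pvalue and the wrapper simulate_pvalues are hypothetical names for this illustration; the binning (20 bins at pooled-sample quantiles) is an assumption, not Mathematica's method.

```python
import numpy as np
from scipy.stats import chi2_contingency


def pearson_two_sample_pvalue(x, y, n_bins=20):
    """Hypothetical helper: chi-square test of homogeneity on pooled-quantile bins."""
    edges = np.quantile(np.concatenate([x, y]), np.linspace(0.0, 1.0, n_bins + 1))
    table = np.vstack([np.histogram(x, bins=edges)[0], np.histogram(y, bins=edges)[0]])
    table = table[:, table.sum(axis=0) > 0]  # drop bins empty in both samples
    chi2, p, dof, expected = chi2_contingency(table)
    return p


def simulate_pvalues(draw_x, draw_y, trials=1000, seed=0):
    """Run the two-sample test `trials` times on fresh data and collect the p-values."""
    rng = np.random.default_rng(seed)
    return np.array([pearson_two_sample_pvalue(draw_x(rng), draw_y(rng))
                     for _ in range(trials)])


p_values = simulate_pvalues(lambda r: r.uniform(0.0, 1.0, 500),
                            lambda r: r.normal(0.0, 1.0, 500))
print(f"largest of {p_values.size} p-values: {p_values.max():.3g}")
```

Even the largest of the 1,000 p-values is tiny, because roughly half of each normal sample lies below 0, where the uniform sample has no points at all.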

Now let’s see what happens if we choose 500 points from a uniform distribution and 500 more points from the same uniform distribution. We expect the Pearson Chi-Square test to return a reasonably high p-value, indicating that we cannot reject the idea that the data come from the same distribution. We did this once and got a satisfying p-value of 0.741581.

But what if we repeat this experiment 1,000 times? How will the p-values vary?

The plot below shows the result of 1,000 simulations of choosing two data sets of 500 points each from the same uniform distribution:

[Figure: Distribution fit uniform-uniform]

These p-values seem reasonably uniformly spread between 0 and 1. Are they? The Cramér-von Mises goodness-of-fit test indicates that we cannot reject the hypothesis that these p-values are uniformly distributed in the interval [0,1].
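A Python sketch of the uniform-versus-uniform experiment and the follow-up check, assuming NumPy and SciPy (scipy.stats.cramervonmises needs SciPy 1.6 or later). pearson_two_sample_pvalue is a hypothetical binned chi-square helper, an approximation of rather than a copy of what Mathematica does. One design difference from the original: instead of resetting the seed at each trial, every trial draws from a single NumPy Generator, whose successive draws serve as independent samples.

```python
import numpy as np
from scipy.stats import chi2_contingency, cramervonmises


def pearson_two_sample_pvalue(x, y, n_bins=20):
    """Hypothetical helper: chi-square test of homogeneity on pooled-quantile bins."""
    edges = np.quantile(np.concatenate([x, y]), np.linspace(0.0, 1.0, n_bins + 1))
    table = np.vstack([np.histogram(x, bins=edges)[0], np.histogram(y, bins=edges)[0]])
    table = table[:, table.sum(axis=0) > 0]  # drop bins empty in both samples
    chi2, p, dof, expected = chi2_contingency(table)
    return p


rng = np.random.default_rng(1)
# Both samples come from the same U(0, 1) distribution, so H_0 is true every time.
p_values = np.array([
    pearson_two_sample_pvalue(rng.uniform(0.0, 1.0, 500), rng.uniform(0.0, 1.0, 500))
    for _ in range(1000)
])

# Cramér-von Mises test of the hypothesis that the p-values are uniform on [0, 1].
result = cramervonmises(p_values, "uniform")
print(f"mean p-value: {p_values.mean():.3f}")
print(f"Cramér-von Mises p-value: {result.pvalue:.3f}")
```

Because the chi-square statistic here is computed from counts, its p-values are only approximately uniform; a large Cramér-von Mises p-value means the simulation gives no evidence against uniformity.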

We set the significance level for the Pearson Chi-Square test at 0.01, so we would expect that about 1 time in 100 the test will indicate that the two data sets are not from the same distribution, even though they are. In 1,000 trials we would expect about 10 such instances, and that is more or less what we find.

The uniform distribution of the p-values is, at first glance, quite surprising. But the p-values are themselves random variables, so every so often they will indicate something other than what we know to be the case, at a rate fixed by the significance level we set beforehand. For example, with the significance level set at 0.05, about 5% of the time the Pearson Chi-Square test indicates that the two data sets are not from the same distribution, even though they are.
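How close to 10 (or 50) should "more or less" be? A short Python sketch, assuming SciPy and assuming that each of the 1,000 trials independently rejects a true H_0 with probability exactly α (which holds when the p-values are exactly uniform), so the number of false rejections is binomial with n = 1000 and success probability α.

```python
from scipy.stats import binom

trials = 1000
for alpha in (0.01, 0.05):
    expected = trials * alpha
    # Central interval containing at least 95% of the binomial probability.
    low, high = binom.interval(0.95, trials, alpha)
    print(f"alpha = {alpha}: expect about {expected:.0f} false rejections, "
          f"central 95% range {low:.0f} to {high:.0f}")
```

Counts of false rejections anywhere in these ranges are consistent with the test behaving exactly as its significance level promises.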



  1. We randomly reset the seed for the pseudo-random number generator in Mathematica® at each of the 1,000 simulations.
  2. The result of uniformly distributed p-values for data sets from the same distribution is not peculiar to the Pearson Chi-Square test.
  3. The uniform distribution of p-values under the null hypothesis is proved here.
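The result in note 3 follows from a short, standard argument, sketched here under the assumption that, when H_0 is true, the test statistic T has a continuous, strictly increasing distribution function F, and that large values of T count as evidence against H_0, so the p-value is p = 1 − F(T). For 0 ≤ u ≤ 1:

```latex
\begin{align*}
\Pr(p \le u) &= \Pr\bigl(F(T) \ge 1 - u\bigr) \\
             &= \Pr\bigl(T \ge F^{-1}(1 - u)\bigr) \\
             &= 1 - F\bigl(F^{-1}(1 - u)\bigr) \\
             &= 1 - (1 - u) = u .
\end{align*}
```

So p is uniform on [0, 1], and in particular Pr(p ≤ α) = α: a test at significance level α rejects a true H_0 a fraction α of the time. The Pearson Chi-Square statistic computed from counts is discrete and only approximately chi-square distributed, so for it the uniformity is approximate.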










Questions about some infinite series and necklaces of partitions of labeled beads

November 5, 2013

We learn early in a study of infinite series that the geometric series sums to 1. Sometimes you will see this sort of reasoning: so so which is somewhat suspect in light of Euler’s “argument”: so . We need first to know that  converges absolutely, which we can do, for example by a use of the […]

The CATs of mathematics

November 2, 2013

Mathematics has numerous cats. For instance, there’s: Catalan numbers, which appear all over the place in counting situations; the Catalan constant; category theory; the catenary curve; the catenoid surface; a Catalan surface; catastrophe theory. But here’s a Cat that will never appear in the Encyclopedia of Mathematics: The totally pissed Cat: Yeah, I’m looking at […]

Why Jared and Brittany will never be good at math.2

September 19, 2013

Jared was pleased how easy algebra was. He thought it might be hard, but he was finding it much easier than he had imagined. “Hey, Brittany, look at this,” he said. “How simple is this?” Brittany looked at Jared’s work on the problem of calculating how far the point is from the origin of the […]

Why Jared and Brittany will never be good at math.1

September 15, 2013

Jared “simplified” in algebra class. His math teacher asked how he got that. “Easy!” said Jared. “Just cancel the x’s.” Jared was pretty sure he was becoming skilled at canceling. “But what if you put into ?” the teacher asked Jared. “The result would be 2, not 1.” Brittany jumped into the conversation: “You can’t do […]

When is an integral not an integral?

September 14, 2013

No surprise to anyone really that students get confused by the difference between definite and indefinite integrals. The so-called indefinite integral is not really an integral at all, not in the sense of area: it’s the solution set to a differential equation. It’s not even usually a single function at all, but a whole […]

The Rime of the Data Scientist

September 13, 2013

The Rime of the Data Scientist (with apologies to Samuel Taylor Coleridge) Part I It is a Data Scientist, And he stoppeth one of three. `By thy Python code and glittering eye, Now wherefore stopp’st thou me?   The classroom doors are opened wide, And I am next one in; The others are met, the […]

The leading (base 10) digit of an integer

June 25, 2013

Surely the leading (= left-most) digit of a positive integer is an obvious thing? Just stare at the integer (e.g. 7823) and observe the left-most digit (7, in this example)? Suppose, however, that you wanted to find the leading digit of a very large list of positive integers, a list so large it was hard […]