I like to think of statistics not as pure math, but as baking with math. Other kinds of math, like geometry, linear algebra, calculus, or probability theory, remain within coherent abstract systems, which follow from self-evident properties of numbers, sets, and equations. Many of the formulas in statistics, on the other hand, are crutches and workarounds; they are meaningful because they work, not necessarily because they cohere in a logical system.

The baking metaphor helps me understand statistics, because I've always been a systems thinker, which means that I retain knowledge best when I can connect it to a logical system. Let's take one of the most basic concepts in statistics, for example: variance. Mathematically, the variance is the expected squared distance of any instance of x from the expected value of x. And the expected value of x, in turn, is a probability-weighted sum over the set of possible x values (the sample space). But what does that mean, in practical terms?
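In symbols, for the discrete case (which is all the dice and batter examples below will need):

```latex
% Expected value: a probability-weighted sum over the sample space
E(x) = \sum_{x} x \cdot P(x)

% Variance: the expected squared distance of x from its expected value
\mathrm{Var}(x) = E\left[ (x - E(x))^2 \right]
```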

Variance, in baking terms, should be thought of as a measure of how well mixed your batter is. The batter is the sample space, and taking a spoonful of that batter is your sample. If you put cinnamon in the batter, how much cinnamon do you expect will be in your spoonful? The expected amount of any ingredient in a spoonful of very well mixed batter is going to be the same as the overall proportion of that ingredient within the whole, right?

On the other hand, if you mix your batter very badly, let's say, there might be huge clumps of flour in one spoonful, huge clumps of sugar in another, and globs of egg in yet another. The variance of any given ingredient is going to be really high in poorly mixed batter, and really low in well mixed batter, because well mixed batter is incorporated. Taking spoonfuls (samples) of poorly mixed batter means you might get 50% flour in one spoonful and only 2% flour in another.
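To make that concrete, here's a minimal Python sketch. The spoonful measurements are hypothetical, not from a real batter, but both imaginary recipes are 60% flour overall; only the mixing differs:

```python
from statistics import pvariance

# Hypothetical flour proportions measured in five spoonfuls of each batter.
poorly_mixed = [0.50, 0.02, 0.95, 0.70, 0.83]
well_mixed = [0.59, 0.61, 0.60, 0.58, 0.62]

print(pvariance(poorly_mixed))  # ~0.106 -- spoonfuls vary wildly
print(pvariance(well_mixed))    # 0.0002 -- every spoonful looks alike
```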

Well mixed batter will have about the same proportion of flour in every spoonful, right? This proportion will match the expected amount of flour, right? And what is the expected amount of flour? Again, it's just the proportion of flour that you added in the first place. So, let's say your batter is 5 cups total, and 3 cups of that is flour. If the flour is completely incorporated, i.e. well mixed in, you'd expect any spoonful of the batter to consist of three-fifths flour. (Obviously, when comparing wet versus dry ingredients it would be best to compare by weight, but most people don't weigh their ingredients, although some do.)

In terms of sample space, the expected value E(x) is often described as a probability-weighted sum of the values of x in the sample space. Rolling a six-sided die, for instance, has a sample space of {1,2,3,4,5,6}, and the probability of each outcome, or value of x, is just the proportion of that value in the sample space. In the case of rolling a fair die, each "ingredient" is equally likely, or occupies the same proportion of the sample space, so the ultimate expected value is just the sum of all x's divided by n, which is no different from a simple average.
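Here's that collapse from weighted sum to simple average, as a quick Python check:

```python
# Expected value of a fair six-sided die as a probability-weighted sum.
sample_space = [1, 2, 3, 4, 5, 6]
p = 1 / len(sample_space)  # every outcome occupies an equal share

print(sum(x * p for x in sample_space))       # 3.5
print(sum(sample_space) / len(sample_space))  # 3.5 -- the simple average
```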

I start to lose the thread on E(x) almost immediately, when it's stated in terms of probability, because I think of probability as the theoretical possibility of an event, and expectation as the feeling of anticipation of an event. I have trouble connecting something static, like the amount of something in a sample, with a dynamic event, i.e. the probability of an occurrence. Baking helps connect these concepts for me. If you want me to tell you the probability of finding flour in a spoonful of batter (i.e. the proportion of flour in the spoonful), I can't even begin to hazard a guess unless I know how well mixed that batter is (the variance of the batter)!

How can we find the variance of the batter? Unlike the area of a circle, the null space of a matrix, or the derivative of a polynomial, I don't and can't know the variance of the batter in any a priori way, right? The only way for me to possibly know how well mixed some batter is, would be to test it empirically! Take one spoonful and analyze what is in it, then repeat, etc. If I find much more flour than expected in one spoonful, and much less flour than expected in another spoonful, I know that my batter is poorly mixed, that is, it's highly variable.
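Here's a rough simulation of that empirical procedure. The spoonful function is a toy stand-in for a real batter; its true flour proportion of 0.6 and its mixedness knob are both assumptions for illustration:

```python
import random
from statistics import variance

random.seed(42)

def spoonful(mixedness):
    """Toy model: flour proportion in one spoonful of a hypothetical batter.

    The recipe's true flour proportion is 0.6; mixedness controls how badly
    mixed the batter is (0 would mean perfectly incorporated).
    """
    p = random.gauss(0.6, mixedness)
    return min(max(p, 0.0), 1.0)  # a proportion can't leave [0, 1]

# The empirical test: take 100 spoonfuls and measure each one.
spoonfuls = [spoonful(mixedness=0.2) for _ in range(100)]
print(variance(spoonfuls))  # a sample estimate of the batter's variance
```

The more spoonfuls you take, the closer that estimate gets to the batter's true variance.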

Variance becomes the foundation of so many concepts in statistics: the standard deviation, covariance, correlation, the correlation coefficient, the coefficient of determination, and probability density. All of these concepts are ultimately constructions that use variance as a basis. Probability density, too, is just a representation of how well mixed the "batter" is. If you record the proportion of flour in every spoonful you take, put those proportions on the x-axis, and plot how often each one occurs, the shape that emerges is your probability density curve. It tells you how many spoonfuls are well mixed, and how many spoonfuls are poorly mixed.
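Sketching that curve with the same toy batter model as above (again, the numbers are assumed, not measured):

```python
import random
from collections import Counter

random.seed(1)

# A thousand spoonfuls from a hypothetical, moderately mixed batter.
spoonfuls = [min(max(random.gauss(0.6, 0.1), 0.0), 1.0) for _ in range(1000)]

# Bin each spoonful's flour proportion and print a crude text histogram;
# the shape that emerges approximates the probability density curve.
bins = Counter(round(p, 1) for p in spoonfuls)
for proportion in sorted(bins):
    print(f"{proportion:.1f} {'#' * (bins[proportion] // 10)}")
```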

Transitioning from pure math to applied math will always be tricky, and there may be better metaphors that are more instructive for other people. But for me, I'll always find comfort in the fact that statistics is nothing more than baking with math!