I’m frequently told that probabilities are the limit of relative frequencies for an infinite number of repetitions. It sounds nice: it defines a difficult concept – probabilities – in terms of a simple one – frequencies – and even gives us a way to measure probabilities, if we fudge the “infinite” part a bit. The problem with this definition? It is not true.

First of all, this limit does not exist. If one makes an infinite sequence of zeroes and ones by throwing a fair coin (fudging away this pesky infinity again), calling the result of the $i$th throw $s_i$, the relative frequency after $n$ throws is

\[ f_n = \frac1n\sum_{i=1}^{n}s_i.\] What then should $\lim_{n\to\infty}f_n$ be? $1/2$? Why? All sequences of zeros and ones are equally possible – they are even equally probable! What is wrong with choosing the sequence $s = (0,0,0,\ldots)$? Or even the sequence $(0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,\ldots)$, whose frequencies do not converge to any number, but oscillate forever (between roughly $1/3$ and $2/3$, if the blocks keep doubling in length)? If for some reason one chooses a nice sequence like $s=(0,1,0,1,0,1,\ldots)$, for which the frequencies do converge to $1/2$, what is wrong with reordering it to obtain $s' = (s_1,s_3,s_2,s_5,s_7,s_4,\ldots)$ instead, with limit $1/3$?
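These three sequences are easy to check numerically. The following is only a minimal sketch: the function names are made up for illustration, and the third sequence assumes that the pattern hidden behind the "$\ldots$" keeps doubling the length of each block.

```python
from itertools import islice

def s(i):
    """Alternating sequence, 1-indexed: s_1 = 0, s_2 = 1, s_3 = 0, ..."""
    return (i + 1) % 2

def alternating():
    i = 1
    while True:
        yield s(i)
        i += 1

def reordered():
    """s' = (s_1, s_3, s_2, s_5, s_7, s_4, ...): two odd-indexed terms, then one even-indexed."""
    odd, even = 1, 2
    while True:
        yield s(odd)
        yield s(odd + 2)
        yield s(even)
        odd, even = odd + 4, even + 2

def doubling_blocks():
    """(0, 1,1, 0,0,0,0, 1,...): alternating blocks whose length doubles (an assumption)."""
    bit, length = 0, 1
    while True:
        for _ in range(length):
            yield bit
        bit, length = 1 - bit, 2 * length

def frequency(seq, n):
    """Relative frequency f_n of ones among the first n terms."""
    return sum(islice(seq, n)) / n

for n in (15, 31, 63, 127, 9999):
    print(n,
          frequency(alternating(), n),
          frequency(reordered(), n),
          frequency(doubling_blocks(), n))
```

The same zeros and ones give $1/2$ or $1/3$ depending only on their order, and under the doubling assumption the block sequence never settles down at all.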

No, no, no, you complain. It is true that all sequences are equiprobable, but most of them have limiting frequency $1/2$. Moreover, it is a theorem that the frequencies converge – it is the law of large numbers! How can you argue against a theorem?

Well, what do you mean by “most”? This is already a probabilistic concept! And according to which measure? It cannot be a fixed measure, otherwise it would say that the limiting frequency is *always* $1/2$, independently of the single-throw probability $p$. On the other hand, if one allows it to depend on $p$, one can indeed define a measure on the set of infinite sequences such that “most” sequences have limiting frequency $p$. A probability measure. So you’re not explaining the single-throw probability in terms of the limiting frequencies, but rather in terms of the probabilities of the limiting frequencies. Which is kind of a problem, if “probability” is what you wanted to explain in the first place. The same problem happens with the law of large numbers. Its statement is that

\[\forall \epsilon >0 \quad \lim_{n\to\infty}\text{Pr}(|f_n -p|\ge \epsilon) = 0,\] so it only says that the *probability* of observing a frequency that differs from $p$ by at least $\epsilon$ goes to $0$ as the number of trials goes to infinity.
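To see what the statement does and does not give you, one can compute the quantity inside the limit exactly for independent throws. This is a rough sketch: `tail_probability` is a hypothetical helper, and the binomial distribution is taken for granted, which of course already presupposes a probability measure.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, p, eps):
    """Exact Pr(|f_n - p| >= eps) for n independent throws with single-throw probability p."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) >= eps:
            # binomial weight of observing exactly k ones in n throws
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, float(tail_probability(n, p, eps)))
```

The printed numbers shrink as $n$ grows, but they are themselves probabilities, and for any finite $n$ they are strictly positive.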

But enough with mocking frequentism. Much more eloquent dismissals have already been written, several times over, and as the Brazilian saying goes, one shouldn’t kick a dead dog. Rather, I want to imagine a world where frequentism is *true*.

What would it take? Well, the most important thing is to make the frequencies converge to the probability in the infinite limit. One also needs, though, the frequencies to be a good approximation to the probability even for a finite number of trials, otherwise empiricism goes out of the window. My idea, then, is to allow the frequencies to fluctuate within some error bars, but never beyond. One could, for example, take the $5\sigma$ standard for scientific discoveries that particle physicists use, and declare it to be a fundamental law of Nature: it is only possible to observe a frequency $f_n$ if

\[f_n \in \left(p-5\frac{\sigma}{\sqrt{n}},p+5\frac{\sigma}{\sqrt{n}}\right).\] Trivially, then, $\lim_{n\to\infty}f_n = p$, and even better, if we want to measure some probability within error $\epsilon$, we only need $n \ge 25\sigma^2/\epsilon^2$ trials, so for example 62500 throws are enough to tomograph any coin within error $10^{-2}$, since $\sigma \le 1/2$ for a coin.
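To spell out the count implied by this interval: assuming $\sigma=\sqrt{p(1-p)}$ is the single-throw standard deviation (the form used in the $2^{-25}$ example further on), the half-width of the interval drops to $\epsilon$ once

\[5\frac{\sigma}{\sqrt{n}} \le \epsilon \iff n \ge \frac{25\sigma^2}{\epsilon^2}, \qquad\text{and since}\quad \sigma^2 = p(1-p)\le\frac14, \quad n \ge \frac{25}{4\epsilon^2}\ \text{suffices for any coin}.\]

For $\epsilon = 10^{-2}$ this gives $25/(4\cdot 10^{-4}) = 62500$ throws.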

In this world, the gambler’s fallacy is not a fallacy, but a law of Nature. If one starts throwing a fair coin and observes 24 heads in a row, it is literally impossible to observe another heads in the next throw. It’s as if there is a purpose pushing the frequencies towards the mean. It captures well our intuition about randomness. It is also completely insane: 25 heads are impossible only at the start of a sequence. If before them one had obtained 24 tails, 25 heads are perfectly fine. Also, it’s not as if 25 heads are impossible because their probability is too low. The probability of 24 heads, one tails, and another heads is even lower.
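These claims can be checked directly against the $5\sigma$ law. The sketch below uses hypothetical helper names, encodes heads as 1 and tails as 0, and uses exact rational arithmetic because 25 heads in a row land exactly on the boundary of the open interval.

```python
from fractions import Fraction

def observable(heads, n, p=Fraction(1, 2), k=5):
    """True if f_n = heads/n lies strictly inside (p - k*sigma/sqrt(n), p + k*sigma/sqrt(n)).

    Squared form, to avoid square roots: (heads - n*p)^2 < k^2 * sigma^2 * n,
    with sigma^2 = p*(1 - p).
    """
    return (heads - n * p) ** 2 < k * k * p * (1 - p) * n

def sequence_observable(throws, p=Fraction(1, 2), k=5):
    """Check the law after every throw of a finite sequence (1 = heads, 0 = tails)."""
    heads = 0
    for n, t in enumerate(throws, start=1):
        heads += t
        if not observable(heads, n, p, k):
            return False
    return True

print(sequence_observable([1] * 24))             # 24 heads from the start
print(sequence_observable([1] * 25))             # 25 heads from the start
print(sequence_observable([0] * 24 + [1] * 25))  # 24 tails, then 25 heads
print(sequence_observable([1] * 24 + [0, 1]))    # 24 heads, one tails, one heads
```

Only the second sequence is forbidden, even though the fourth one is less probable under the usual rules.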

Even worse, if the probability you’re trying to tomograph is the one of obtaining 24 heads followed by one tails, then the frequency $f_1$ must be inside the interval \[[0,2^{-25}+5\sqrt{2^{-25}(1-2^{-25})}]\approx [0,5\cdot 2^{-12.5}],\] which is only possible if $f_1 = 0$. That is, it is impossible to observe tails after observing 24 heads, as it would make $f_1=1$, but it is also impossible to observe heads. So in this world Nature would need to keep track not only of all the coin throws, but also which statistics you are calculating about them, and also find a way to keep you from observing contradictions, presumably by not allowing any coin to be thrown at all.
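A quick numerical check of this interval, as a sketch only: `k` is an assumed parameter for the number of standard deviations, and the conclusion comes out the same for any width between one and five $\sigma$.

```python
from fractions import Fraction
from math import sqrt

p = Fraction(1, 2**25)      # probability of 24 heads followed by one tails
sigma = sqrt(p * (1 - p))   # single-trial standard deviation, as a float

def bounds(k, n=1):
    """Interval (p - k*sigma/sqrt(n), p + k*sigma/sqrt(n)) for the frequency f_n."""
    half_width = k * sigma / sqrt(n)
    return float(p) - half_width, float(p) + half_width

def allowed_f1(k):
    """Values of f_1 (either 0 or 1) that fall strictly inside the interval."""
    lower, upper = bounds(k)
    return [f for f in (0, 1) if lower < f < upper]

for k in (1, 5):
    print(k, bounds(k), allowed_f1(k))
```

In both cases the upper bound stays far below 1, so the only value of $f_1$ that survives is 0.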

The people who tell you “that probabilities are the limit of relative frequencies for an infinite number of repetitions” make a more fundamental mistake than merely using a definition which isn’t true. They confuse interpretations of axiomatic theories with definitions. In *Geometry and Experience* from 1922 (before all those discussions about interpretations of quantum mechanics), Einstein started his explanation of the role of axiomatic theories by saying “As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”

Avoiding interpretation as often as possible is not a bad strategy. However, if the outputs of your model are predictions involving probabilities (like the daily weather forecast, or a geological survey on the chance that an earthquake of magnitude 6.7 or greater will occur before the year 2030 in the San Francisco Bay Area), then avoiding interpretation is no longer an option. The same is true if the inputs to your model include probabilities. However, this does not extend to internal probabilities. The fact that an infinite sequence does not have a probability (or that it has probability 0 if you insist on assigning a probability), or that there are non-measurable sets of infinite sequences to which it is impossible to assign any probability, does not need to be interpreted (i.e. connected to reality).

No, I think what they are doing is perfectly legitimate; if one could show that the limiting relative frequency indeed equals the probability, this would make it possible to reduce the concept of probabilities to that of relative frequencies.

Alternatively, if they could show that it does have a well-defined value, even if it does not equal the probability (as is the case when one considers the relative frequencies of finite sequences), then they would be proposing new axioms for probability theory.

The only problem with what they’re doing is that it does not work.

When I think of definitions, I have in mind something like the definition of inertial and gravitational mass within Newton’s theory. If you want to give a similar definition for probability, then you first need something analogous to Newton’s theory within which that definition lives and makes sense. But probability is not a concept like mass. I thought that the people who tell you “that probabilities are …” were making a mistake on this level. Maybe they are, but I think I now understand that you have something else in mind when evaluating their proposal.

A better analogy would be the notion of absolute space in Newton’s theory. It is a convenient ontological overcommitment, in the sense that if somebody tried to interpret absolute velocity with respect to physical reality, I would tell them that it does not need to be interpreted (similar to the internal probabilities in my first comment). But the notion of Euclidean space is independent of Newton’s theory, and one might want to define it, in the sense of giving one or more “explicit models” of it, which reduce it to something more basic.

This is what Descartes did in a certain sense. He reduced the n-dimensional Euclidean space to n one-dimensional coordinates, i.e. n real numbers. The real numbers can be reduced by Dedekind cuts to rational numbers, and rational numbers are a sufficiently clear (basic) notion that no further reduction is required. (Dedekind cuts alone without rational numbers risk being circular, and could produce the surreal numbers instead of the real numbers.)

The Cartesian product also occurs for probabilities, where it models independence. This is a very important notion, which can often be interpreted independently of the probabilities themselves. But after that it gets really difficult, since now you need to come up with an intuitive model in which sigma-algebras (and probability measures) make sense, and still describe everything in terms of single events (more or less).