What is the probability of an infinite sequence of coin tosses?

It’s 0, except on the trivial cases where it is 1.

But clearly this is the wrong way to formulate the question, as there are interesting things to be said about the probabilities of infinite sequences of coin tosses. The situation is analogous to uniformly sampling real numbers from the $[0,1]$ interval: the probability of obtaining any specific number is just 0. The solution, however, is simple: we ask instead what is the probability of obtaining a real number in a given subinterval. The analogous solution works for the case of coin tosses: instead of asking the probability of a single infinite sequence, one can ask the probability of obtaining an infinite sequence that starts with a given finite sequence.

To be more concrete, let’s say that the probability of obtaining Heads in a single coin toss is $p$, and for brevity let’s denote the outcome Heads by 1 and Tails by 0. Then the probability of obtaining the sequence 010 is $p(1-p)^2$, which is the same as the probability of obtaining the sequence 0100 or the sequence 0101, which is the same as the probability of obtaining a sequence in the set {01000, 01001, 01010, 01011}, which is the same as the probability of obtaining an infinite sequence that starts with 010.

There is nothing better to do with infinite sequences of zeroes and ones than mapping them into a real number in the interval $[0,1]$, so we shall do that. The set of infinite sequences that start with 010 are then very conveniently represented by the interval $[0.010,0.010\bar1]$, also known as $[0.010,0.011]$ for those who do not like infinite strings of ones, or $[0.25,0.375]$ for those who do not like binary. Saying then that the probability of obtaining a sequence in $[0.010,0.010\bar{1}]$ is $p(1-p)^2$ is assigning a measure to this interval, which we write as
\[ \rho([0.010,0.010\bar{1}]) = p(1-p)^2 \]Now if we can assign a sensible probability to every interval contained in $[0,1]$ we can actually extend it into a proper probability measure over the set of infinite sequences of coin tosses using standard measure-theoretical arguments. For me this is the right answer to the question posed on the title of this post.

So, how do we go about assigning a sensible probability to every interval contained in $[0,1]$? Well, the argument of the previous paragraph can clearly be extended to any interval of the form $[k/2^n, (k+1)/2^n]$. We just need write $k$ in the binary basis, padded with zeroes on the left until it reaches $n$ binary digits, and count the number of 0s and 1s. In symbols:
\[ \rho\left(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right]\right) = p^{n_1(k,n)}(1-p)^{n_0(k,n)} \]The extension to any interval where the extremities are binary fractions is straightforward. We just break them down into intervals where the numerators differ by one and apply the previous rule. In symbols:
\[ \rho\left(\left[\frac{k}{2^n}, \frac{l+1}{2^n}\right]\right) = \sum_{i=k}^{l} p^{n_1(i,n)}(1-p)^{n_0(i,n)} \]We are essentially done, since we can approximate any real number as well as we want we want by using binary fractions 1. But life is more than just binary fractions, so I’ll show explicitly how to deal with the interval
\[[0,1/3] = [0,0.\bar{01}] \]The key thing is to choose a nice sequence of binary fractions $a_n$ that converges to $1/3$. It is convenient to use a monotonically increasing sequence, because then we don’t need to worry about minus signs. If furthermore the sequence starts with $0$, then \[ [0,1/3] = \bigcup_{n\in \mathbb N} [a_n,a_{n+1}] \] and
\[ \rho([0,1/3]) = \sum_{n\in \mathbb N} \rho([a_n,a_{n+1}]) \] An easy sequence that does the job is $(0,0.01,0.0101,0.010101,\ldots)$. It lets us write the interval as
\[ [0,1/3] = [0.00, 0.00\bar{1}] \cup [0.0100, 0.0100\bar{1}] \cup [0.010100, 0.010100\bar{1}] \cup … \] which gives us a simple interpretation of $\rho([0,1/3])$: it is the probability of obtaining a sequence of outcomes starting with 00, or 0100, or 010100, etc. The formula for the measure of $[a_n,a_{n+1}]$ is also particularly simple:
\[ \rho([a_n,a_{n+1}]) = p^{n-1}(1-p)^{n+1} \] so the measure of the whole interval is just a geometric series:
\[ \rho([0,1/3]) = (1-p)^2\sum_{n\in\mathbb N} \big(p(1-p)\big)^{n-1} = \frac{(1-p)^2}{1-p(1-p)} \]

It might feel like something is missing because we haven’t examined irrational numbers. Well, not really, because the technique used to do $1/3$ clearly applies to them, as we only need a binary expansion of the desired irrational. But still, this is not quite satisfactory, because the irrationals that we know and love like $1/e$ or $\frac{2+\sqrt2}4$ have a rather complicated and as far as I know patternless binary expansion, so we will not be able to get any nice formula for them. On the other hand, one can construct some silly irrationals like the binary Liouville constant
\[ \ell = \sum_{n\in\mathbb N} 2^{-n!} \approx 0.110001000000000000000001\]whose binary expansion is indeed very simple: every $n!$th binary digit is a one, and the rest are zeroes. The measure of the $[0,\ell]$ interval is then
\[ \rho([0,\ell]) = \sum_{n\in \mathbb N} \left(\frac{p}{1-p}\right)^{n-1} (1-p)^{n!} \]Which I have no idea how to sum (except for the case $p=1/2$ ;)

But I feel that something different is still missing. We have constructed a probability measure over the set of coin tosses, but what I’m used to think of as “the probability” for uncountable sets is the probability density, and likewise I’m used to visualize a probability measure by making a plot of its density. Maybe one can “derive” the measure $\rho$ to obtain a probability density over the set of coin tosses? After all, the density is a simple derivative for well-behaved measures, or the Radon-Nikodym derivative for more naughty ones. As it turns out, $\rho$ is too nasty for that. The only condition that a probability measure needs to satisfy in order to have a probability density is that it needs to attribute measure zero to every set of Lebesgue measure zero, and $\rho$ fails this condition. To show that, we shall construct a set $E$ such that its Lebesgue measure $\lambda(E)$ is zero, but $\rho(E)=1$.

Let $E_n$ be the set of infinite sequences that start with a $n$-bit sequence that contains at most $k$ ones2. Then
\[ \rho(E_n) = \sum_{i=0}^k \binom ni p^i (1-p)^{n-i} \] and
\[ \lambda(E_n) = 2^{-n} \sum_{i=0}^k \binom ni \] These formulas might look nasty if you haven’t fiddled with entropies for some time, but they actually have rather convenient bounds, which are valid for $p < k/n < 1/2$: \[ \rho(E_n) \ge 1 - 2^{-n D\left( \frac kn || p\right)} \] and \[ \lambda(E_n) \le 2^{-n D\left( \frac kn || \frac 12\right)} \] where $D(p||q)$ is the relative entropy of $p$ with respect to $q$. They show that if $k/n$ is smaller than $1/2$ then $\lambda(E_n)$ is rather small (loosely speaking, the number of sequences whose fraction of ones is strictly less than $1/2$ is rather small), and that if $k/n$ is larger than $p$ then $\rho(E_n)$ is rather close to one (so again loosely speaking, what this measure does is weight the counting of sequences towards $p$ instead of $1/2$: if $k/n$ were smaller than $p$ then $\rho(E_n)$ would also be rather small).

If we now fix $k/n$ in this sweet range (e.g. by setting $k = \lfloor n(p + 0.5)/2\rfloor$)3 then
\[ E = \bigcap_{i \in \mathbb N} \bigcup_{n \ge i} E_n,\]is the set we want, some weird kind of limit of the $E_n$. Then I claim, skipping the boring proof, that
\[ \rho(E) = 1 \]and
\[ \lambda(E) = 0 \]But don’t panic. Even without a probability density, we can still visualize a probability measure by plotting its cumulative distribution function
\[ f(x) = \rho([0,x]) \]which for $p = 1/4$ is this cloud-like fractal:
Cumulative distribution function of probability measure rho

This entry was posted in Uncategorised. Bookmark the permalink.