A colleague of mine, Simon Morelli, told me about a fascinating puzzle, the two-envelope paradox: you can take one of two envelopes with money inside them, one promised to contain $x$ and the other $2x$ for some $x > 0$. You take an envelope and see that it has a quantity $y$ inside. Now you are given the option of switching to the other envelope. Should you?
Naïvely, you might reason that with probability 1/2 the other envelope will contain $2y$, and with probability 1/2 it will contain $y/2$, so the expectation value of switching is
\[ \frac12\cdot 2y + \frac12\cdot\frac{y}2 = \frac54y > y,\] and therefore it is always advantageous to switch, independently of the value of $y$. You don’t even need to look at it; you can just switch immediately after taking the envelope. And if offered the chance to switch again, you would. At this point, though, most people would have realised that something is wrong and would stop the nonsense; only badly programmed robots would be stuck in an infinite loop.
What did go wrong, though? Checking the Wikipedia article about it won’t help: it’s a Library of Babel with incorrect solutions, correct solutions, several versions of the problem, and general babble. We have to work out the solution ourselves (and by doing that, increase the size of the library, but hopefully also the proportion of correct solutions).
Well, the assertion that this probability above equals 1/2 is obviously suspect; we need to justify it somehow. It is either the probability that the envelope you chose, given $y$, is the one with the smaller amount of money, so $p(S|y)$, or the probability that it is the one with the bigger amount of money, so $p(B|y)$. The correct expectation value is then
\[ 2y p(S|y) + \frac{y}2 p(B|y),\] and the question of whether this is always larger than $y$ is the question of whether
\[ p(S|y) > 1/3 \] for all $y$ (substitute $p(B|y) = 1 - p(S|y)$ into the expectation value and simplify). How do we calculate it? By Bayes’ rule we have that
\[ p(S|y) = \frac{ f(y|S)p(S) } { f(y|S)p(S) + f(y|B)p(B) } = \frac1{1+\frac{f(y|B)}{f(y|S)} }, \] where $f(y|S)$ and $f(y|B)$ are the probability densities of having $y$ in the envelopes $S$ and $B$, respectively, and we are assuming that $p(S) = p(B)$, that is, that the probability of picking either envelope to start with is the same.
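To make the formula concrete, here is a minimal Python sketch of it, assuming equal prior probabilities for the two envelopes as in the text. The function names `posterior_small` and `expected_after_switch` are my own, and the densities passed in are placeholders for whatever $f(y|S)$ and $f(y|B)$ turn out to be.

```python
from typing import Callable

Density = Callable[[float], float]


def posterior_small(f_s: Density, f_b: Density, y: float) -> float:
    """p(S|y) from Bayes' rule, assuming p(S) = p(B) = 1/2."""
    num = f_s(y)
    den = f_s(y) + f_b(y)
    if den == 0:
        raise ValueError("y has zero density under both envelopes")
    return num / den


def expected_after_switch(f_s: Density, f_b: Density, y: float) -> float:
    """2y * p(S|y) + (y/2) * p(B|y), the expectation value of switching."""
    p = posterior_small(f_s, f_b, y)
    return 2 * y * p + (y / 2) * (1 - p)


# Toy check: if the two densities coincide at y we recover the naive 5y/4.
flat = lambda y: 1.0
print(posterior_small(flat, flat, 10.0))        # 0.5
print(expected_after_switch(flat, flat, 10.0))  # 12.5
```

Whether the two densities can actually coincide for every $y$ is exactly what has to be checked.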
We just need to know these densities then to solve the problem. Since they’re not given we can’t actually know whether it is a good idea to switch, but we can reason about which densities give us which conclusions, and in particular which density gives us the paradoxical conclusion that switching is always advantageous.
First note that the densities are not independent; since envelope $B$ is constrained to have twice the amount of envelope $S$, it follows that
\[ f(y|B) = \frac12 f(y/2|S). \] Now let
\[ f(y|S) = \frac1M [y \le M] \quad\text{and}\quad f(y|B) = \frac1{2M} [y \le 2M],\] that is, we assume that the amount of money in envelope $S$ is uniformly distributed from 0 to some upper bound $M$ (and hence the amount in envelope $B$ from 0 to $2M$). A perfectly reasonable, physical, probability density. It results in $p(S|y) = 2/3$ for $y \le M$, and $p(S|y) = 0$ for $M < y \le 2M$. Which agrees very well with intuition: if your envelope contains less than $M$, chances are you got the shorter end of the stick and you should switch, and if it has more than that you certainly got lucky, and shouldn’t switch.
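These two numbers are easy to check by simulation. The following is a rough Monte Carlo sketch, not a proof: it draws the smaller amount uniformly from 0 to $M$, hands you one of the two envelopes with probability 1/2, and counts how often you hold envelope $S$ when $y \le M$ and when $y > M$. The function name, sample size and seed are arbitrary choices of mine.

```python
import random


def estimate_posterior_small(M: float = 1.0, n: int = 200_000, seed: int = 0):
    """Estimate P(S | y <= M) and P(S | y > M) for the uniform density."""
    rng = random.Random(seed)
    small_low = total_low = 0
    small_high = total_high = 0
    for _ in range(n):
        x = rng.uniform(0.0, M)            # amount in the smaller envelope
        holding_small = rng.random() < 0.5  # fair choice of envelope
        y = x if holding_small else 2 * x   # amount you actually see
        if y <= M:
            total_low += 1
            small_low += holding_small
        else:
            total_high += 1
            small_high += holding_small
    return small_low / total_low, small_high / total_high


low, high = estimate_posterior_small()
# low should come out close to 2/3; high is exactly 0, because only
# envelope B can contain more than M.
print(low, high)
```

Since $p(S|y)$ is constant on each of the two regions, the frequencies over a whole region estimate the same numbers as the pointwise formula.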
Now we can let $M$ go to infinity, which is unphysical, but gives us a well-defined result: $p(S|y) = 2/3$ for all $y$, and you should always switch. Which again agrees with intuition, twofold: first of all, you made a nonsensical assumption, and got a nonsensical answer. Secondly, if every positive real number is in fact equally likely, it is always more probable that you got the smallest one. What doesn’t agree with intuition is that $p(S|y)$ is $2/3$ instead of $1/2$, as the naïve calculation assumed, presumably for such a uniform distribution. Can we make it be equal to $1/2$ with some other density?
For that we need $f(y|S) = f(y|B) = \frac12 f(y/2|S)$ for all $y$. Now I’m not a mathematician, but I bet that the unique solution to this functional equation, given some regularity assumption, is $f(y|S) = k/y$ for a positive constant $k$. Which is indeed the Jeffreys prior for a number distributed on the positive half-line, a much more meaningful notion of uniform distribution than the previous flat one, which is the natural choice on the whole real line. This is not a normalisable density, though, which is probably for the best: it would give the paradoxical conclusion that it is always advantageous to switch envelopes.
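As a quick check that $k/y$ does satisfy the condition (this only verifies the candidate, it does not prove uniqueness), we have
\[ \frac12 f(y/2|S) = \frac12\cdot\frac{k}{y/2} = \frac{k}{y} = f(y|S), \] and it cannot be normalised because
\[ \int_a^b \frac{k}{y}\,dy = k\log\frac{b}{a} \] diverges both for $a \to 0$ and for $b \to \infty$.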
We can make this a bit more precise if we consider it as a limit of normalisable densities. Let then
\[ f(y|S) = \frac1{y \log(M^2)} [1/M \le y \le M] \]
and
\[ f(y|B) = \frac1{y \log(M^2)} [2/M \le y \le 2M]. \]
This gives $p(S|y) = 1$ for $y \in [1/M,2/M]$, as you certainly got envelope $S$, the desired $p(S|y) = 1/2$ for $y \in [2/M,M]$, and $p(S|y) = 0$ for $y \in [M,2M]$, as you certainly got envelope $B$. If we let $M$ go to infinity again, we have indeed $p(S|y) = 1/2$ for all $y$, as our intuition would predict, but probably shouldn’t.
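The three regions can be checked numerically for a finite $M$. This is a small sketch under the densities just defined, with $f(y|B)$ obtained from $f(y|S)$ through the constraint $f(y|B) = \frac12 f(y/2|S)$; the function names and the test value $M = 100$ are my own choices.

```python
import math


def f_small(y: float, M: float) -> float:
    """Truncated 1/y density for envelope S on [1/M, M]."""
    if 1.0 / M <= y <= M:
        return 1.0 / (y * math.log(M**2))
    return 0.0


def f_big(y: float, M: float) -> float:
    """Density for envelope B, from f(y|B) = (1/2) f(y/2|S)."""
    return 0.5 * f_small(y / 2, M)


def posterior_small(y: float, M: float) -> float:
    """p(S|y) with equal priors; y must lie in [1/M, 2M]."""
    fs, fb = f_small(y, M), f_big(y, M)
    return fs / (fs + fb)


M = 100.0
for y in (0.015, 1.0, 150.0):  # bottom, middle and top region
    print(y, posterior_small(y, M))
```

Increasing `M` pushes the two boundary regions further out, but for any finite `M` they are still there.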
Another intuition says that you should always be indifferent to switching, as you have no idea about what $f(y|S)$ is, so learning $y$ doesn’t give you any information. Maybe there exists an $f(y|S)$ for which this is true? We need $2f(y|S) = f(y|B) = \frac12 f(y/2|S)$, which I bet has as unique solution $f(y|S) = k/y^2$. Again this is not normalisable, so this intuition is on rather shaky ground.
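Again we can at least verify the candidate:
\[ f(y|B) = \frac12 f(y/2|S) = \frac12\cdot\frac{k}{(y/2)^2} = \frac{2k}{y^2} = 2f(y|S), \] so that $p(S|y) = 1/(1+2) = 1/3$ and the expectation value of switching is
\[ 2y\cdot\frac13 + \frac{y}2\cdot\frac23 = y, \] exactly what you already have. The density fails to be normalisable because the integral of $k/y^2$ diverges at $y \to 0$.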
From all these examples, one might think that the paradox only arises from unnormalisable densities, but this is not the case. As shown by Broome, it also arises from the innocent-looking
\[ f(y|S) = \frac1{(1+y)^2}. \] It results in
\[ p(S|y) = \frac{y^2 + 4y + 4}{3y^2 + 8y + 6},\] which is indeed always strictly larger than $1/3$.
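The closed form can be checked with exact rational arithmetic. This is a sketch using Python’s standard `fractions` module; it takes $f(y|B) = \frac12 f(y/2|S)$ from above, and the function names and sample points are mine.

```python
from fractions import Fraction


def f_small(y: Fraction) -> Fraction:
    """Broome's density for envelope S."""
    return 1 / (1 + y) ** 2


def f_big(y: Fraction) -> Fraction:
    """Density for envelope B, from f(y|B) = (1/2) f(y/2|S)."""
    return f_small(y / 2) / 2


def posterior_small(y: Fraction) -> Fraction:
    """p(S|y) from Bayes' rule with equal priors."""
    return f_small(y) / (f_small(y) + f_big(y))


def closed_form(y: Fraction) -> Fraction:
    return (y**2 + 4 * y + 4) / (3 * y**2 + 8 * y + 6)


def expected_after_switch(y: Fraction) -> Fraction:
    p = posterior_small(y)
    return 2 * y * p + (y / 2) * (1 - p)


for y in (Fraction(1, 100), Fraction(1), Fraction(7), Fraction(1000)):
    p = posterior_small(y)
    assert p == closed_form(y) and p > Fraction(1, 3)
    assert expected_after_switch(y) > y

print(posterior_small(Fraction(1)))        # 9/17
print(expected_after_switch(Fraction(1)))  # 22/17
```

A handful of sample points is of course no proof, but the inequality $p(S|y) > 1/3$ reduces to $4y + 6 > 0$, which holds for every $y > 0$.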
No, the real problem is that these densities are unphysical. Broome claims that his isn’t:
“Both my examples of paradoxical distributions are decent statistical distributions. There are processes for generating either: […] My continuous distribution can be generated as follows. Pick a number $t$ at random from a uniform distribution between 0 and 1, and let $y$ be $t/(1-t)$. Then $y$ has the required distribution.”
While it is true that this $y$ would have the required distribution, there’s no conceivable physical process that could generate it. You would need $t$ to actually come from the continuum, in particular being able to get arbitrarily close to 1 so that $y$ gets arbitrarily large. The continuum is only a mathematical abstraction, though; any physical process for generating $t$ will need to discretize it. If $\epsilon$ is the smallest step you can take, then the largest $t$ strictly smaller than 1 is $1-\epsilon$, which gives you the largest $y$ as $(1-\epsilon)/\epsilon$. But having any upper bound on $y$ is enough to dispel the paradox.
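To see the effect of the upper bound, here is a sketch under a simplifying assumption of mine: instead of modelling the discrete grid itself, I keep $t$ continuous but restrict it to $[0, 1-\epsilon]$, so that the smaller amount has Broome’s density renormalised and cut off at $(1-\epsilon)/\epsilon$. The function names and the value of `eps` are illustrative.

```python
def f_small(y: float, eps: float) -> float:
    """Broome's density for envelope S, cut off at x_max = (1 - eps) / eps."""
    x_max = (1 - eps) / eps
    if 0 < y <= x_max:
        # The integral of 1/(1+y)^2 from 0 to x_max equals 1 - eps.
        return 1 / ((1 - eps) * (1 + y) ** 2)
    return 0.0


def f_big(y: float, eps: float) -> float:
    """Density for envelope B, from f(y|B) = (1/2) f(y/2|S)."""
    return 0.5 * f_small(y / 2, eps)


def posterior_small(y: float, eps: float) -> float:
    fs, fb = f_small(y, eps), f_big(y, eps)
    return fs / (fs + fb)


def gain_from_switching(y: float, eps: float) -> float:
    """Expectation value of switching minus what you already hold."""
    p = posterior_small(y, eps)
    return 2 * y * p + (y / 2) * (1 - p) - y


eps = 0.01  # the largest smaller amount is then about 99
for y in (50.0, 150.0):
    print(y, posterior_small(y, eps), gain_from_switching(y, eps))
```

Below the cutoff nothing changes, and switching still looks good. Above it you certainly hold envelope $B$, and switching loses half of what you have, so it is no longer advantageous for all $y$.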
But Broome also presented a discrete distribution – we put $2^n$ and $2^{n+1}$ in the envelopes with the probability $2^n/3^{n+1}$ for all non-negative integers $n$. This doesn’t look unphysical. For example, in a radioactive decay we can check the state of decay every second and return the number of seconds since start if decay happened. This gives us a distribution on a countable but infinite set of events.
That’s also unphysical, because it requires you to be able to generate arbitrarily large integers. If you’re using radioactive decay to generate your random numbers, you need to be able to wait infinitely long for decay to happen. Which is of course impossible. As soon as you have a finite upper bound on how long you’re willing to wait — say until the sun becomes a red giant — you change the probability distribution, and the paradox disappears.
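The same thing can be checked exactly for the discrete distribution. In this sketch I assume the simplest possible cutoff: pairs with $n$ larger than some `n_max` never occur, and the remaining probabilities keep their relative weights $2^n/3^{n+1}$. The overall normalisation cancels in the posterior, so I leave it out. The function names are mine, and $k$ labels the observed amount $y = 2^k$.

```python
from fractions import Fraction


def weight(n: int, n_max: int) -> Fraction:
    """Relative probability of the pair (2^n, 2^(n+1)), cut off at n_max."""
    if 0 <= n <= n_max:
        return Fraction(2**n, 3 ** (n + 1))
    return Fraction(0)


def posterior_small(k: int, n_max: int) -> Fraction:
    """Probability of holding the smaller amount, given y = 2^k.

    Valid for 0 <= k <= n_max + 1, the amounts that can actually occur.
    """
    ws = weight(k, n_max)      # y = 2^k is the smaller amount of pair n = k
    wb = weight(k - 1, n_max)  # or the bigger amount of pair n = k - 1
    return ws / (ws + wb)


def expected_other(k: int, n_max: int) -> Fraction:
    """Expected content of the other envelope, given y = 2^k."""
    p = posterior_small(k, n_max)
    y = Fraction(2) ** k
    return 2 * y * p + (y / 2) * (1 - p)


n_max = 10
print(posterior_small(3, n_max), expected_other(3, n_max))    # 2/5 44/5
print(posterior_small(11, n_max), expected_other(11, n_max))  # 0 1024
```

For $y = 8$ the other envelope is worth $44/5 > 8$ on average, as in the untruncated case. For the largest possible amount, $y = 2^{11}$, the other envelope certainly contains $2^{10}$, so switching is a sure loss.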