I’m not happy with any of them. Before going into detail why, I need to say what are my criteria for a satisfactory derivation of the Born role:

- It needs to use postulates that either come from the non-probabilistic part of quantum mechanics, or are themselves better motivated than the Born rule itself.
- The resulting Born rule must predict that repeated measurements on a state $\alpha\ket{0}+\beta\ket{1}$ will often yield relative frequencies of 0 close to $|\alpha|^2/(|\alpha|^2+|\beta|^2)$.

The first criterion is hopefully uncontroversial: if the derivation of the Born rule depends on the probabilistic part of quantum mechanics it is circular, and if it uses more arcane postulates than the Born rule itself, why bother?

The second criterion tries to sidestep the eternal war about what is the definition of probability, focussing on the one point that everyone should agree about: probabilities must predict the frequencies. Should, but of course they don’t. Subjectivists insist that probabilities are the degrees of belief of a rational agent, leaving the connection with frequencies at best indirect. To the extent that this is a matter of semantics, fine, I concede the point, define probabilities to be subjective, but then I don’t care about them. I care about whatever objective property of the world that causes the frequencies to come out as they do. Call it propensity, call it chance, call it objective probability, doesn’t matter. This thing is what the Born rule is about, not about what is going on in the heads of rational agents.

And there *is* such a thing, there is a regularity in Nature that we have to explain. If you pass 1000 photons through a polariser, then a half-wave plate oriented at $\pi/4$ radians, and then measure their polarisation, you will often find the number of photons with horizontal polarisation to be close to 500.1 To simulate this experiment,2 I generated 8000 hopefully random bits from `/dev/urandom`

, divided them into sets of 1000, and counted the number of ones in each set, obtaining $504, 495, 517, 472, 504, 538, 498$, and $488$. We often got something close to 500. This is the regularity in Nature, this is the brute fact that a derivation of the Born rule must explain. Everything else is stamp collection.3

I’ve emphasized the second criterion because it’s where most derivations fail. Consider the most fashionable one, by Deutsch and Wallace, which I reviewed here. They have some axioms about what it means to be a rational agent, some axioms about what is a measurement, and some axioms about how rational agents should be indifferent to some transformations done on the quantum state. From these axioms they conclude that rational agents should bet on the outcomes of a quantum experiment with probabilities given by the Born rule. Who cares? Should we really believe that the statistics of an experiment will be constrained by rationality axioms? And conversely, if you can show that the statistics of a quantum experiment follow the Born rule, doesn’t it become obvious that rational agents should bet that they do, making the whole decision-theoretic argument superfluous? It’s worth noting that this same criticism applies to my derivation, as it is just a cleaned up version of the Deutsch-Wallace argument.

Let’s move on to Vaidman, Carroll, and Sebens. Their derivations differ on several important points, but I’m interested here in their common point: they passionately argue that probability is about uncertainty, that a genuine source of uncertainty in Many-Worlds is self-locating uncertainty, and that locality implies that your self-locating uncertainty must be given by the Born rule. Arguing about whether probability is uncertainty is a waste of time4, but their second point is well-taken: after a measurement has been done and before you know the outcome, you are genuinely uncertain about in which branch of the wavefunction you are. I just don’t see how could this be of fundamental relevance. I can very well do the experiment with my eyes glued to the screen of the computer, so that I’m at first aware that all possible outcomes will happen, and then aware of what the outcome in my branch is, without ever passing through a moment of uncertainty in between. Decoherence does work fast enough for that to happen.5 What now? No probability anymore? And then it appears when I close my eyes for a few seconds? That makes sense if probability is only in my head, but then you’re not talking about how Nature works, and I don’t care about your notion of probability.

Żurek’s turn, now. His proof follows the pattern of all the ones I talked about until now: it argues that probabilities should be invariant under some transformations on entangled states, and from a bunch of equalities gets the Born rule. An important different is that he refuses to define what probabilities are, perhaps in order to avoid the intractable debate. I think it’s problematic for his argument, though. One can coherently argue about what agents can know about, what they should care about, as the other authors did. But an undefined notion? It can do anything! Without saying what probability is how can you say it is invariant under some transformation? Also, how can this undefined notion explain the regularity of the frequencies? Maybe it has nothing to do with frequencies, who knows?

Let’s go back to Everett’s original attempt. It is of a completely different nature: he wanted to derive a measure $\mu$ on the set of worlds, according to which most worlds would observe statistics close to what the Born rule predicted. I think this makes perfect sense, and satisfies the second criterion. The problem is the first. He assumed that $\mu$ must be additive on a superposition of orthogonal worlds, i.e., that if $\ket{\psi} = \sum_i \ket{\psi_i}$ then

\[ \mu(\ket{\psi}) = \sum_i \mu(\ket{\psi_i}). \] I have no problem with this. But then he also assumed that $\mu$ must depend only on the 2-norm of the state, i.e., that $\mu(\ket{\psi}) = f(\langle \psi|\psi\rangle)$. This implies that

\[ f\left( \sum_i \langle \psi_i|\psi_i\rangle \right) = \sum_i f(\langle \psi_i|\psi_i\rangle), \] which is just Cauchy’s equation. Assuming that $\mu$ is continuous or bounded is enough to imply that $\mu(\ket{\psi}) = \alpha \langle \psi|\psi\rangle$ for some positive constant $\alpha$, that can be taken equal to 1, and we have a measure that gives rise to the Born rule.

But why assume that $\mu$ must depend only on the 2-norm of the state? This unjustified assumption is what is doing all the work. If we assume instead that $\mu$ depends only on the $p$-norm of the state, the same argument implies that $\mu(\ket{\psi}) = \|\ket{\psi}\|_p^p$, an absolute value to the $p$ rule, instead of an absolute value squared rule. Everett would be better off simply postulating that $\mu(\ket{\psi}) = \langle \psi|\psi\rangle$. Had he done that, it would be clear that the important thing was replacing the mysterious the Born rule, with its undefined probabilities, with the crystal clear Born measure. Instead, the whole debate was about how proper was his derivation of the Born measure itself.

The last proof I want to talk about, by Mandolesi, agrees with Everett’s idea of probability as measure theory on worlds, and as such satisfies the second criterion. The difficulty is deriving the Born measure. To do that, he uses not only the quantum states $\ket{\psi}$, but the whole rays $R_\psi = \{c \ket{\psi}; c \in \mathbb C \setminus 0\}$. He then considers a subset $R_{\psi,U} = \{c \ket{\psi}; c \in U\}$ of the ray, for some measurable subset $U$ of the complex numbers, to which he assigns the Lebesgue measure $\lambda(R_{\psi,U})$. Given some orthogonal decomposition of a ray $c \ket{\psi} = \sum_i c \ket{\psi_i}$ – which is most interesting when the states $\ket{\psi_i}$ correspond to worlds – the fraction of worlds of kind $i$ is given by

\[ \frac{\lambda(R_{\psi_i,U})}{\lambda(R_{\psi,U})} = \frac{\langle \psi_i|\psi_i \rangle}{\langle \psi|\psi \rangle}, \] which does not depend on $U$, and coincides with the Born measure.

I think he’s definitely using postulates more arcane than the Born measure itself. He needs not only the rays and their subsets, by also the Lebesgue measure on the complex line, for which his only justification is that it is natural. No argument here, it is very natural, but so is the Born measure! It is just an inner product, and is there anything more natural than taking inner products in a Hilbert space?

To conclude, I’m skeptical that there can even be a satisfactory derivation of the Born rule. It obviously doesn’t follow from the non-probabilistic axioms alone; we need anyway an additional postulate, and it’s hard to think of a more natural one than $\mu(\ket{\psi}) = \langle \psi|\psi\rangle$. Perhaps we should focus instead in showing that such an axiomatization already gives us everything we need? In this vein, there’s this curious paper by Saunders, which doesn’t try to derive the Born rule, but instead carefully argues that the mod-squared amplitudes play all the roles we require from objective probabilities. I find it curious because he never mentions the interpretation of the mod-squared amplitudes as the amounts of worlds with a given outcome, which is what makes the metaphysics intelligible.

]]>Still, still. The result is that there’s yet another paper perpetuating the use of $p$-values, and moreover my colleagues might see it and think that *I* defend them! The horror! My reputation would never recover. To set things right, I’m writing this post here to explain why $p$-values are nonsense, and restate our results in terms of Bayes factors.

First of all, what is a $p$-value? It is the probability of observing the data you got, or something more extreme, under the assumption that the null hypothesis is true2. If this probability is less than some arbitrary threshold, often set at $0.05$ or $10^{-5}$, the null hypothesis is said to be rejected.

The very definition already raises several red flags. Why are we considering the probability of “something more extreme”, instead of just the probability of the data we actually got? Also, what counts as “more extreme”? It doesn’t only sound ambiguous, it is ambiguous. And what about this arbitrary threshold? How should we set it, and why should we care about this number? More worrisome, how can we possibly regard the null hypothesis in isolation? In practice we always have another hypothesis in mind, but come on. What if we do reject the null hypothesis, but the alternative hypothesis attributes only slightly higher probability to the observed outcome? Obviously the experiment was just inconclusive, but apparently not if you take $p$-values seriously.

Perhaps these are just counterintuitive properties of a definition that has a solid basis in probability theory, and we need to just grow up and accept that life is not as simple as we would want. Well, as it turns out, there’s absolutely no basis for using $p$-values. Pearson just pulled the definition out of his ass, for use in his $\chi^2$-test. People then ran with it and started applying it everywhere.

In stark contrast, a Bayes factor is defined simply as the ratio of the probabilities of observing the data you got, given the or the alternativenull hypotheses:

\[ K := \frac{p(D|H_1)}{p(D|H_0)}.\] It suffers none of the problems that plague the definition of $p$-value: We only consider the actually observed data, no “more extreme” events are relevant. It explicitly depends on both hypotheses, there’s no attempt to reject an hypothesis in isolation. There’s no arbitrary threshold to set, it is just a number with a clear operational interpretation.

More importantly, it doesn’t come out of anybody’s ass, but from Bayes’ theorem: it is the data we need to calculate the posterior probability $p(H_0|D)$. In the case of two hypotheses, we have that

\[ p(H_0|D) = \frac{p(D|H_0)p(H_0)}{p(D|H_0)p(H_0) + p(D|H_1)p(H10)} = \frac1{1+K\frac{p(H_1)}{p(H_0)}}.\] How do our results look when written in terms of the Bayes factor then? Our goal was to minimize the expected $p$-value from a Bell test. The expression for the expected $p$-value is a hideous expression that we have no hope of optimizing directly, so with a lot of work we showed that the expected $p$-value after $n$ rounds $\langle p_n \rangle$ is upperbounded by

\[ \langle p_n \rangle \le (1-\chi^2)^n, \] where $\chi := \omega_q-\omega_\ell$ is the gap of the nonlocal game, and $\omega_q$ and $\omega_\ell$ are the Tsirelson and local bounds. We can easily maximize the gap, and a large gap is a sufficient condition for having a small $\langle p_n \rangle$, so that’s we did. We suspect that a large gap is also a necessary condition, and that the lower bound $\frac12(1-\chi)^n \le \langle p_n \rangle$ holds, but we couldn’t prove it.

With the Bayes factor we have pure bliss. The expression for the expected Bayes factor after $n$ rounds $\langle K_n \rangle$ is simple, no bound is needed:

\[ \langle K_n \rangle = \left( 1 + \frac{\chi^2}{\omega_\ell(1-\omega_\ell)} \right)^n. \] Maybe we can even optimize that directly. In any case, a large gap is also a sufficient condition for a large $\langle K_n \rangle$, as we have the bound

\[ \frac1{(1-\chi)^n} \le \langle K_n \rangle, \] valid for $\chi \ge 1/2$3 In contrast to the $p$-value case, it’s obvious from the expression that a large gap is *not* necessary for a large $\langle K_n \rangle$. We even have an explicit example of a nonlocal game with arbitrarily small gap and arbitrarily large $\langle K_n \rangle$, the Khot-Vishnoi game. Ironically enough, we use this same example in the paper to argue for the opposite conclusion, as it has $\langle p_n \rangle$ close to 1.

The Khot-Vishnoi game is of widespread interest in the literature because its ratio $\omega_q/\omega_\ell$ is arbitrarily large, which implies a large resistance to noise. We wanted to argue that this is useless if $\omega_q$ is too small, as the favourable outcome would happen too seldom in a finite experiment. As pointed out above, this is not true for this game, but the reasoning is still sound: if we have a nonlocal game with $\omega_q = 1/d^{3/4}$ and $\omega_\ell = 1/d$, the ratio is $d^{1/4}$, which does get arbitrarily large with $d$, but $\langle K_n \rangle$ goes roughly like $1+n/\sqrt{d}$ for large $d$, so the large ratio doesn’t help at all.

It’s easy to show that

\[ \langle K_n \rangle \le \left( \frac{\omega_q}{\omega_\ell} \right)^n, \] so having a large ratio is a necessary condition for a large $\langle K_n \rangle$. We have then a nice duality between the ratio and the gap: the gap is sufficient, but not necessary, and the ratio is necessary, but not sufficient.

I don’t think that beauty implies true; otherwise neutrinos would be massless. But it’s a joy when you have an independent argument for something to be true, and it turns out to be quite beautiful as well.

]]>Gödel has a Freezer which contains 4 people in a state of suspended animation, with you among them. He picks 2 of them randomly, with uniform probability, to send to Hilbert’s Hotel, a nice place which is never full. The other 2 are doomed to stay in the Freezer forever. Being a bit mad and a bit evil, Gödel decides to do the transfer in a capricious way. He sends them via two tram trips, each with 3 people: one of the chosen ones, who stays at Hilbert’s Hotel at the end of the trip, and the doomed ones, who are taken back to the Freezer. Not being too evil, Gödel erases the memory of the doomed ones after each trip so that they won’t be tortured by their dream of freedom. You wake up in the tram, are told about the whole setup, and ask yourself: what’s the probability that I’m going to Hilbert’s Hotel?

There are two answers that seem obviously correct, but conflict with each other:4

- $1/2$ – As 2 out of 4 people are going to Hilbert’s Hotel, and by assumption they were picked randomly and uniformly, the probability of each person must be $1/2$. Since everybody will wake up with certainty and ask themselves the question, you don’t learn anything by waking up, so the probability stays $1/2$.
- $1/3$ – Out of the 3 people in the tram only one is going to Hilbert’s Hotel. Since there’s nothing distinguishing anyone there, they must all have the same probability of being the chosen one, which must therefore be $1/3$.

Now suppose there are $n$ people instead of 4, with still only 2 doomed ones. The probability $1/2$ becomes $(n-2)/n$, while $1/3$ stays $1/3$. The probability that you will go to Hilbert’s Hotel is therefore rather high. But you wake up in the tram – which will happen in any case – and now suddenly you should be sad that your probability is only $1/3$? How can that possibly be true?

The difficulty here is that we’re trying to work with an informal, intuitive notion of probability. That’s not good enough. If we want to have actual answers we need to figure out what probability *means*. We should first notice that the paradox is about *subjective* probability. The probability that Gödel used to pick the chosen ones can be taken to be objective, but that’s in the past, now the objective probability that you’ll go to Hilbert’s Hotel is either zero or one, you just don’t know what it is. So even if we had an amazing and universally agreed upon definition of objective probability that wouldn’t help us at all.

What definition of subjective probability do we have then? I’m happy with the decision-theoretical one: you say that an event $E$ happens with subjective probability $p$ if $p$ is the fair price you assign to a bet that pays 1€ if $E$ happens, and nothing otherwise. Fair here means that you don’t expect to win or lose money from this bet, so you’d be fine buying or selling the bet for this this price. For example, if $E$ happens with objective probability $q$, and you pay $p$ for the bet, then with probability $q$ you win $(1-p)$€, and with probability $1-q$ you win $-p\,$€. The expected value is $q(1-p)-(1-q)p=q-p$, so you will expect to win money if $p < q$, and you will expect to lose money if $p>q$, so the fair price is $p=q$.

You wake up in the tram then, and consider, what is the fair price of a bet that pays 1€ if I end up at Hilbert’s Hotel? There are two cases to consider: if you are a chosen one, you’ll pay $p$€ and win 1€, so your net gain is $(1-p)$€. But if you’re a doomed one, you’ll pay $p$€ and win nothing, with a net gain of $-p$€. In this trip. But if you are a doomed one you will certainly do another trip (or will have already done it). So you’ll do the same bet twice, paying in total $-2p$€. The bet just became less attractive, and it stands to reason that you’ll be willing to pay less for it! The expectation value is $(1-p)/2 -2p/2 = (1-3p)/2$, which is zero for $p=1/3$.

But wait. Is it fair to make the doomed one pay twice for the bet? Well, the bet is imaginary in any case, but the doomed one is who wakes up twice and asks themselves the question twice. Paying for the bet twice is just a formalisation of being wrong twice. Intuitively, you might not want to guess you’re a doomed one because even though the probability is $1/2$, when you’re wrong you’ll be wrong twice.

Maybe we shouldn’t count the mistake of the doomed one twice, because their memory will be erased between the trips? Wiseman suggested this would be natural in my original post about the Sleeping Beauty. I find it rather artificial. If we erase their memory after the second trip as well (as the statement of the paradox says), should we then never count their mistake? And thus assign subjective probability 1 to going to Hilbert’s Hotel? Or in a completely different situation, should I assign probability 1 to winning a game of Russian roulette, as my memory will be erased if I lose? I think this is really changing the question from being wrong about your prediction than knowing that you were wrong.

It does raise an interesting question, though, of what does it take to make the subjective probability of going to Hilbert’s Hotel match the objective probability $1/2$. One could get rid of the multiple awakenings, by saying that Gödel will only let the passengers of the tram wake up in the first trip. But this is not enough, because now the probability that you will wake up if given that you are a chosen one is $1/2$, while the probability that you will wake up given that you are a doomed one is still 1, so the expectation value is $(1-p)/4 – p/2 = (1-3p)/4$, so the fair price is still $1/3$.

It does work if Gödel decides to wake up each of the four people exactly once, say all passengers of the tram wake up once, in the first trip, and the remaining chosen one wakes up alone in the second trip (and somehow doesn’t notice that the other two passengers are still asleep). Now everybody will wake up once, and only once, with probability 1, and the expectation value is $(1-p)/2 – p/2 = 1/2 – p$, and finally we have $p=1/2$. But the problem clearly changed.

]]>The question I wanted to answer was which beautiful angles have beautiful cosines? My completely objective definitions are that a beautiful angle is of the form $\frac{p}{q}\pi$ for small integers $p,q$, and a beautiful cosine is of the form $\frac{a}{b}+\sqrt{\frac{c}{d}}$, again for small integers. Surprisingly, the question can be answered completely, and it is not even difficult if you know number theory (I don’t).

The first thing to notice is that any algebraic number uniquely defines its minimal polynomial, which is the polynomial with rational coefficients (and leading coefficient one) of least degree that has this algebraic number as a root. So we only need to compute the minimal polynomial of $\alpha$ and see if it matches the minimal polynomial of $\cos(\frac{p}{q}\pi)$ for any $p,q$. Which doesn’t really help, because what the hell is the minimal polynomial of $\cos(\frac{p}{q}\pi)$?

Luckily though the minimal polynomial of $e^{i\pi\frac{p}{q}}$ instead is very well characterized: it is known as a cyclotomic polynomial, and it is given simply by $\Phi_n(x) := \prod_k (x-e^{i2\pi\frac{k}{n}})$, over all $k$ that are coprime to $n$.

Since $\cos(\frac{p}{q}\pi) = \alpha$ iff $e^{i\frac{p}{q}\pi} = \alpha \pm i\sqrt{1-\alpha^2}$ what we do instead is to compute the minimal polynomial of $\alpha + i\sqrt{1-\alpha^2}$ (which is equal to the minimal polynomial of $\alpha – i\sqrt{1-\alpha^2}$), and see if it matches some cyclotomic polynomial.

Let’s start by disregarding the trivial cases where $\frac{a}{b}+\sqrt{\frac{c}{d}}$ is $1,0,$ or $-1$. We know that $\cos(0),\cos(\pi/2),\cos(-\pi/2),$ and $\cos(\pi)$ give them, there’s no need for the big guns. We’ll assume from here on that $\frac{a}{b}+\sqrt{\frac{c}{d}}$ is never an integer.

The first interesting case is when $c=0$, so we’re just asking which beautiful angles have rational cosines. To answer that we need the minimal polynomial of $z=\frac{a}{b}+i\sqrt{1-\frac{a}{b}^2}$. It’s clear that $(x-z)(x-z^*)$ works, and we can’t get rid of the root $z^*$, so that’s the minimal one. Explicitly it is

\[ x^2 -2 \frac{a}{b} x +1, \]so it can only match cyclotomic polynomials of degree 2. Well, the degree of a cyclotomic polynomial $\Phi_n(x)$ is simply the number of integers $k \le n$ that are coprime to $n$, so we need to find which integers $n$ have exactly 2 smaller integers that are coprime to it.

This is kind of a strange question, but the reverse is rather straightforward: the number of integers coprime to $n$ is the definition of Euler’s totient function $\varphi(n)$. It has the simple lower bound of $\varphi(n) \ge \sqrt{n/2}$, which allows us to brute-force our problem: we just need to look at $n$ up to 8 and we’ll have all integers with exactly 2 coprimes. They are 3,4, and 6, so we just need to check whether our minimal polynomial matches $(x-e^{i2\pi\frac{1}{3}})(x-e^{i2\pi\frac{2}{3}})$, $(x-e^{i2\pi\frac{1}{4}})(x-e^{i2\pi\frac{3}{4}})$, or $(x-e^{i2\pi\frac{1}{6}})(x-e^{i2\pi\frac{5}{6}})$. Which is kind of overkill, because we already have the angles, so we can simply check instead which of them have rational cosines, which turn out to be $\cos(\frac{2}{3}\pi) = \cos(\frac{4}{3}\pi) = -\frac{1}{2}$, and $\cos(\frac{1}{3}\pi) = \cos(\frac{5}{3}\pi) = \frac{1}{2}$. And these are the only ones.

Moving on to the next interesting case, we set $a=0$, so that we are asking which beautiful angles have cosines equal to $\sqrt{\frac{c}{d}}$. We need then to figure out the minimal polynomial of $z = \sqrt{\frac{c}{d}} +i\sqrt{1-\frac{c}{d}}$. Now the polynomial $(x-z)(x-z^*)$ doesn’t work anymore, as it has irrational coefficients, but this is easy to fix by considering instead $(x-z)(x-z^*)(x+z)(x+z^*)$, which does work and is also clearly the minimal polynomial. Working it out, we get

\[x^4 + \left(2-4\frac{c}{d}\right)x^2 + 1.\] Calculating which integers have exactly four coprimes, like before, we see that they are 5, 8, 10, and 12, which is way too much work to check by hand, so we’ll do something else instead: another nice property of cyclotomic polynomials is that they only have integer coefficients, so we only need to look for $c,d$ such that $4\frac{c}{d}$ is integer. The only nontrivial cases are $c=1,d=2$ and $c=3,d=4$, which tells us that the only beautiful cosines that are square roots of rationals are $\pm\sqrt{1/2}$ and $\pm\sqrt{3/4}$. The angles that give these cosines are the well-known $\pi/4,7\pi/4,3\pi/4,5\pi/4$ and $\pi/6,11\pi/6,5\pi/6,5\pi/6$.

To finalize let’s consider cosines of the form $\frac{a}{b}+\sqrt{\frac{c}{d}}$ for $a \neq 0$ and $c \neq 0$. We need the minimal polynomial for the hideous thing

\[z = \frac{a}{b}+\sqrt{\frac{c}{d}}+i\sqrt{1-\left(\frac{a}{b}+\sqrt{\frac{c}{d}}\right)^2}.\] The previous case does not work, but if you think about it, it is easy to see that with

\[w = \frac{a}{b}-\sqrt{\frac{c}{d}}+i\sqrt{1-\left(\frac{a}{b}-\sqrt{\frac{c}{d}}\right)^2}\] the polynomial $(x-z)(x-z^*)(x-w)(x-w^*)$ will do the job. Working it out with Mathematica, because that’s already above what I can do by hand, we get

\[x^4 – 4\frac{a}{b}x^3 + \left(2+4\frac{a^2}{b^2}-4\frac{c}{d}\right)x^2 – 4\frac{a}{b}x + 1.\] At this point it’s better to just look up what are the cyclotomic polynomials for $n=5,8,10,12$, which as we said before are the only ones with degree 4. They are

\begin{align*}

\Phi_5(x) &= x^4 + x^3 + x^2 + x +1 \\

\Phi_8(x) &= x^4 + 1 \\

\Phi_{10}(x) &= x^4 – x^3 + x^2 – x + 1 \\

\Phi_{12}(x) &= x^4 – x^2 + 1,

\end{align*}which is not that bad. $\Phi_8(x)$ and $\Phi_{12}(x)$ are right out because they require $a=0$, which we analysed before. To make our minimal polynomial equal to $\Phi_5(x)$ we need $a=-1$ and $b=4$, which tells us that $c/d = 5/16$. We have a match! It implies that the cosines of the angles $2\pi/5, 4\pi/5, 6\pi/5, 8\pi/5$ must be $-\frac{1}{4}+\sqrt{\frac{5}{16}}$ or $-\frac{1}{4}-\sqrt{\frac{5}{16}}$. A similar matching occurs with $\Phi_{10}(x)$, the only difference is that $a=1$ instead of $-1$. It implies that the cosines of the angles $\pi/5, 3\pi/5, 7\pi/5, 9\pi/5$ must be $\frac{1}{4}+\sqrt{\frac{5}{16}}$ or $\frac{1}{4}-\sqrt{\frac{5}{16}}$. And that’s it! There can be no more.

To summarize, we found out that the only possible beautiful cosines of beautiful angles are $0, \pm1, \pm\frac{1}{2},\pm\frac{1}{\sqrt{2}},\pm\frac{\sqrt{3}}{2},\frac{1\pm\sqrt{5}}{4},\frac{-1\pm\sqrt{5}}{4}$. And the beautiful angles with these cosines are all the integer multiples of $\pi,\pi/2,\pi/3,\pi/4,\pi/5,\pi/6$. But be careful, this holds really only for cosines, the sines of integer multiples of $\pi/5$ are never beautiful.2

Funnily enough, these are precisely the angles and cosines we learn in school, except for $\pi/5$. Usually we only learn a couple of examples that are good enough, and leave behind an incredible wealth of interesting cases. But here no, out teachers taught us all the angles and cosines that were worth knowing! $\pi/5$ is an understandable exception, it is not as beautiful as the others, and we should anyway demand both the sines and cosines to be beautiful.

]]>A quick search through the literature revealed that nobody had answered this question, so I proved it was impossible and wrote a short and simple paper about it. There’s also the plot twist that implementing control-$U$ is often easy in practice, as any realistic implementation of an unknown unitary will have some extra information that makes it possible, but that’s another story.

In any case, at that time I believed that if one is given a large number of copies of $U$, one could just do process tomography on it, and thus easily implement control-$e^{i\phi(U)}U$. Luckily I didn’t write that down anywhere, because a couple of weeks ago Gavorová et al. proved that it is not possible to implement control-$e^{i\phi(U)}U$ for *any* number of copies of $U$, so in particular even tomography cannot do it.

Now how could this possibly be true? One can always do tomography, right? And if you did tomography, you can implement any function you want of the thing you tomographed, right? It turns out it is not so simple. If you successfully tomograph $U$, what you end up with is with a unitary $U’$ such that $\|\mathfrak{C}(U’) – \mathfrak{C}(U)\| < \epsilon$, where $\mathfrak{C}(U)$ is the Choi-Jamiołkowski representation of $U$. This complication is unfortunately necessary, as $U$ is not physically observable, only $\mathfrak{C}(U)$ is, for the very reason that $\mathfrak{C}(e^{i\alpha}U)$ does not depend on the global phase $\alpha$. It is just not true that $\|U’-U\| < \epsilon$, because we would need to tomograph the global phase to make it happen.

Ok, great, we have tomography, how about the function then? We want that $\|f\big(\mathfrak{C}(U’)\big)-f\big(\mathfrak{C}(U)\big)\| < \delta$, but this only follows from $\|\mathfrak{C}(U’) – \mathfrak{C}(U)\| < \epsilon$ if $f$ is continuous! If $f$ is not continuous then by definition there will exist a pair $U,U’$ such that $\mathfrak{C}(U’),\mathfrak{C}(U)$ are arbitrary close but $f\big(\mathfrak{C}(U’)\big),f\big(\mathfrak{C}(U)\big)$ are not. And Gavorová et al. showed precisely that if $f$ implements the controlled unitary it must be discontinuous.

The source of the discontinuity is not really the controlling, but just the killing off the global phase, so the seemingly simpler task of producing a unitary that does not depend on the global phase is equally impossible.

To be more precise, let $f$ be a function that takes $\mathfrak{C}(U)$ to a unitary $V$ such that $\mathfrak{C}(U) = \mathfrak{C}(V)$. Or equivalently, let $g$ be a function such that $g(U) = g(e^{i\alpha}U)$ for all $\alpha$ and $\mathfrak{C}\big(g(U)\big) = \mathfrak{C}(U)$. Then it is impossible to tomograph $U$ and produce $U’$ such that $\|f\big(\mathfrak{C}(U’)\big)-f\big(\mathfrak{C}(U)\big)\| < \delta$, or that $\|g(U’)-g(U)\| < \delta$.

To see why, let’s consider a specific example of such a function, $g(U) = \frac{u^*}{|u|}U$, where $u$ is the first nonzero element of the first column of $U$. Let

\[U = \begin{pmatrix} \sqrt{p} & -\sqrt{1-p}\ e^{i\beta} \\ \sqrt{1-p}\ e^{i\alpha} & \sqrt{p}\ e^{i(\alpha+\beta)} \end{pmatrix} \] be an arbitrary unitary of dimension $2$ with a global phase already removed, so that for $p > 0$ we have $g(U) = U$, and for $p=0$ we have

\[g(U) = \begin{pmatrix} 0 & -e^{i(\beta-\alpha)} \\ 1 & 0 \end{pmatrix}. \]Suppose that $p >0$, $\alpha=\pi$, and $\beta=0$, so that

\[U = \begin{pmatrix} \sqrt{p} & -\sqrt{1-p} \\ -\sqrt{1-p} & -\sqrt{p}\end{pmatrix}. \] Now if $p$ is small enough, this is very close to

\[U’ = \begin{pmatrix} 0 & -1 \\ -1 & 0\end{pmatrix}, \] so tomography might very well return us that instead. The problem is that

\[g(U’) = \begin{pmatrix} 0 & 1 \\ 1 & 0\end{pmatrix}, \] which is as far from $g(U)$ as it gets.

You might object that the probability of obtaining such $U’$ is extremely small; it is given roughly2 by the probability of measuring the state $U|0\rangle$ in the computational basis and always obtaining outcome 1, which is $(1-p)^n$. It is way too high for small $p$! In particular for fixed $n$, there is always a $p$ such that this probability is higher than any failure probability $f$ that we accept.

This breaks the fundamental promise of tomograhy: that for any error tolerance $\epsilon > 0$ and failure probability $f > 0$ you just need to measure $n$ copies of $U$ in order to obtain an estimate $U’$ that with probability $1-f$ is $\epsilon$-close to $U$. Sure, for any $U$ there will be an $n$ such that tomography works, but this doesn’t help us since we don’t know what $U$ is before we tomographed it!

Ok, ok, so tomography cannot converge to $g(U)$; it does converge instead to $g(U)$ modulo some random phase. Doesn’t seem so dramatic, does it? Since you didn’t care about the global phase to start with, why should you worry about whether it is random or a function of $U$? Well, yes, if you just do tomography it doesn’t matter, but perhaps you have some other, efficient protocol that does the control modulo a random phase. Now in every iteration of the protocol you might get a different phase, and if you just ignore this you’re going to lose coherence.

]]>First of all, what is the problem? Quantum mechanics is local. Very local in fact, if you look at QFT. It’s Lorentz covariant, by construction. Non-relativistic quantum mechanics is not *very* local, as it is still non-relativistic, but still much more local than people usually think: the only outrageously nonlocal thing about it is the collapse of the wavefunction, but everybody knows that it is not real. Even Bell’s theorem doesn’t imply nonlocality, as long as you don’t pretend there is a wavefunction collapse happening.

Nevertheless, quantum states seem rather nonlocal. If Alice and Bob share an entangled state, and Alice acts locally on it, the quantum state changes in a seemingly global way, oblivious to the locality of her action. Furthermore, you can’t separate it in Alice’s part of the state and Bob’s part of the state, it only exists as an indivisible whole. Sure, you can define reduced density matrices for Alice and for Bob, but in this way you lose a lot of information, the quantum state is much more than the union of these parts.2

What Deutsch and Hayden showed is that it doesn’t need to be like this. It is possible to define a kind of quantum state that you *can* separate in Alice’s part and Bob’s part, in such a way that the global quantum state is just the union of both parts, and they both evolve in a perfectly local way: whatever Alice does cannot change Bob’s part and vice-versa. I’ll call them ontic states, because perfectly local states seem perfectly real.

The key idea is to define an ontic state based on the dynamics of quantum mechanics, since they are local: instead of focusing on a quantum state $\ket{\psi}$ we focus on the unitary $U$ such that

\[\ket{\psi} = U\ket{0},\]where $\ket{0}$ is a standard fixed state, like how on quantum circuits we usually assume that all qubits are initialized on the $\ket{0}$ state. The ontic state of the global system is then just a Choi-Jamiołkowski representation of $U$2

\[ \lambda(U) := \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger \ket{j}\bra{i} U.\]So far so boring. Things get interesting when we represent the ontic states of the *parts* of the quantum system. First we rewrite the global ontic state explicitly as a composite system:

\[ \lambda^{AB}(U) := \sum_{ijkl} \ket{i}\bra{j} \otimes \ket{k}\bra{l} \otimes U^\dagger \big( \ket{j}\bra{i} \otimes \ket{l}\bra{k} \big)U,\] and we name the tensor factors as $A_1,B_1,A_2,B_2$, in that order. The ontic state of Alice’s part is then

\[ \lambda^{A}(U) := \tr_{B_1} \lambda^{AB}(U) = \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger \big(\ket{j}\bra{i} \otimes \id\big) U,\] and the ontic state of Bob’s part is

\[ \lambda^{B}(U) := \tr_{A_1} \lambda^{AB}(U) = \sum_{kl} \ket{k}\bra{l} \otimes U^\dagger \big( \id \otimes \ket{l}\bra{k}\big) U.\]First we show that the ontic states of the parts do evolve locally: if Bob applies an unitary $V$ to his part of the system, the global unitary is now $U’ = (\id \otimes V) U$. Alice’s ontic state is

\begin{align*}

\lambda^{A}(U’) &= \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger (\id \otimes V^\dagger)\big(\ket{j}\bra{i} \otimes \id\big) (\id \otimes V)U \\

&= \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger \big(\ket{j}\bra{i} \otimes V^\dagger V \big) U \\

&= \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger \big(\ket{j}\bra{i} \otimes \id \big) U \\

&=\lambda^{A}(U),

\end{align*} so Bob cannot change it, whatever he does.

Now we show that the ontic state of the global system is just the union of the ontic states of the parts. In fact it’s just the product of them, with some strategically placed indentities:

\[\lambda^{AB}(U) = \big[\lambda^{A}(U)\otimes \id_{B_1}\big]\big[\lambda^{B}(U)\otimes \id_{A_1}\big].\] That’s it. A demonstration in a couple of lines of something that was widely regarded as impossible. Neat, no?

We should also remark that these ontic states do give you back the usual quantum states. The global quantum state is

\[ \rho^{AB} := \bra{0}_{A_2B_2} \lambda^{AB}(U) \ket{0}_{A_2B_2} = U\ket{0}\bra{0}U^\dagger, \] and the reduced density matrices are given by the analogous formulas:

\[ \rho^{A} := \bra{0}_{A_2B_2} \lambda^{A}(U) \ket{0}_{A_2B_2} = \tr_B\big(U\ket{0}\bra{0}U^\dagger\big), \] and

\[ \rho^{B} := \bra{0}_{A_2B_2} \lambda^{B}(U) \ket{0}_{A_2B_2} = \tr_A\big(U\ket{0}\bra{0}U^\dagger\big).\]

One might wonder if it is possible to do a similar trick using just the quantum state $\ket{\psi} = U\ket{0}$ instead of the unitary $U$ that produces it. As Deutsch and Hayden argue (and Raymond-Robichaud gives a formal proof), the answer is no. The crucial problem is that $(X \otimes \id)\ket{\phi^+} = (\id \otimes X)\ket{\phi^+}$. That is, Alice can act on a quantum state in a completely local way, but you can’t tell from looking at resulting state whether it was changed by Alice or by Bob. This never happens if we look at $U$ instead: if $(S\otimes \id)U = (\id \otimes T)U$ then $S$ and $T$ must be just identity with a phase.

With the introduction done, we can speak about Bédard’s talk itself. He was interested in the number of parameters that are necessary to describe the ontic states. He showed that global ontic state of $n$ quantum systems of dimension $d$ requires $d^{2n}-1$ parameters. This might be obvious from the Choi-Jamiołkowski representation, as it is just the number of parameters of the global unitary $U$ (the global phase doesn’t count), but it was not obvious from the Deutsch-Hayden notation he was working with. More interestingly, he showed that the ontic state of a *single* quantum system needs $(1-d^{-2})d^{2n}$ parameters. Proving this is difficult in either notation, but intuitively we can expect this number to be true, since the ontic state of a single system does not depend on unitaries that act on the other systems. Since these have $d^{2n-2}-1$ parameters, we subtract this from the number of parameters of the global unitary and voilà!

Now, the dimension of the ontic state of a single system grows exponentially with the number of systems. This might seem strange, but Bédard had a good argument about why this is actually necessary: the quantum state of $n$ systems of dimension $d$ already needs $2d^n-2$ parameters. Since we want the local ontic states to recover at least the global quantum state, if follows that each must have at least $\approx d^n/n$ parameters, if we assume they are uniformly spread.

There was a lively discussion after the talk about Wallace and Timpson’s criticism of the Deutsch-Hayden model. They notice that for any unitary W that acts on the whole Hilbert space except $\ket{0}$, the unitary $U(1\oplus W)$ will give rise to the same quantum states, and therefore to the same observations, but in general different ontic states. They argue that this $W$ is akin to a gauge freedom, and we should identify all these different ontic states, lest we suffer from indeterminism and (ontic) state underdetermination. If we do that, however, we end up back with just regular quantum states and their nonlocality, making the Deutsch-Hayden model completely pointless.

Wallace and Timpson missed a very important point, however. Doing quantum state tomography is not the only way to determine $U$; this would be a hopelessly positivist point of view. No, determining $U$ is determining what are the dynamics of our theory. Doing this is what science is all about, and we use all tools we can to do that, more importantly assuming that the fundamental dynamics do not change with time, and that we can reproduce experiments. Once we know that fundamental dynamics of a (closed) system, we know what $U$ is, there is no underdetermination; the fact that $U(1\oplus W)$ would also give rise to the same observations is not relevant, it’s just a fun fact. The transformation from $U$ to $U(1\oplus W)$ is not a gauge transformation, it is a transformation from the real dynamics to some incorrect dynamics.

]]>The first problem with this attitude, of course, is that it’s solipsism. QBists maintain that quantum states are subjective, merely a degree of belief, and therefore everything that is ultimately a quantum state is also subjective: measurement outcomes, Hamiltonians, unitaries, atoms, molecules, trees, capivaras, other people, planets, galaxies, etc. Not yourself, though. Despite being made of atoms, you somehow avoid this subjectiveness, and are unquestionable real. You are also definitely not a quantum state, and cannot be in a superposition. But isn’t that inconsistent with other people being subjective? No, say the QBists. Quantum theory, they maintain, is a *single-user* theory. You can use it, and it will work for you. Jacques can also use it, and it will work for him. But you both together? Nope, that’s forbidden. And when your predictions conflict with Jacques’? It doesn’t matter, everything is subjective anyway. Hence, solipsism.

Hopefully I don’t need to spend more than a couple of lines explaining why solipsism is such a terrible idea, because I won’t. I’ll merely remark that solipsism can explain any conceivable experience you might have, being therefore no explanation at all. Also that it is not even possible for two different people to agree that solipsism is a great idea, because they will each have their own, incompatible version of solipsism.

Solipsism has such a bad reputation that QBists bristle at this accusation and reply, offended, that of course they are not solipsists, they accept the existence of an objective reality. When pressed on what this objective reality is, then, if not quantum states, they reply that Hilbert space dimension is definitely real. And… that’s it? We just have some Hilbert spaces with some given dimensions, and we’re supposed to model the world with that? Of course not, they say, clearly Hilbert space dimension is not enough. But they can’t say what else is real, and don’t seem terribly interested in finding out. QBism is nearly two decades old, and still they only go on and on about the subjective stuff. I find this attitude incomprehensible. Surely after you identify which parts of your theory are merely subjective, you would discard them and focus on understanding the objective parts, you know, to understand the world out there?

Ok, if they can’t say what is real, can they at least say how their subjective quantum states relate to reality? Are they a probability distribution over the unknown real stuff, as in a $\psi$-epistemic model? Some sort of hidden-variable theory? No, no, no, they say, of course not, they would then be vulnerable to all sorts of no-go theorems. Their quantum states are just beliefs, and how these beliefs relate to reality can never be specified. See, you can never be wrong if you don’t say anything definite!

Let’s turn our attention to the QBist “explanation” for two quantum puzzles. First, the old classic, the nonlocal collapse of the wave function. The explanation, as you might have guessed, is that it doesn’t matter, because the wave function is just subjective anyway. Well, ok, but what happens in reality then? How are the nonlocal correlations produced? Their answer is “Who knows, who cares, and what is this reality thing anyway? And by the way we’re definitely not solipsists!” Over and over again I hear somebody repeating that this QBist “explanation” works as well as the explanation for the nonlocal collapse of *classical* correlations. It doesn’t. In the classical case the probability distribution that collapses is indeed subjective, arising from ignorance of the actual state of the system. And we can explain how the correlations arise, because the local hidden-variables theory is actually true! Classical relativistic mechanics is indeed local and deterministic. But to apply this explanation to the quantum regime makes no sense: the local hidden-variables theory is *not* true. And to simply ignore the necessity of being able to produce the correlations in a local way, as the QBists do, is just ridiculous.

The second puzzle is Wigner’s friend, the *gedankenexperiment* that’s all the rage in the arXiv nowadays. I have explained it in detail before, but the short version is that Wigner’s friend is stuck inside a decoherence-free box, where she makes a measurement on a qubit that’s in the state $\frac{\ket{0}+\ket{1}}{\sqrt2}$. Assuming that the measurement is unitary, this results in the usual world-splitting represented as

\[ \ket{\text{Wigner}}\frac{\ket{\text{friend}_0}\ket{0} + \ket{\text{friend}_1}\ket{1}}{\sqrt2}. \]Wigner will make a measurement of his friend together with the qubit in the Bell basis, expecting to get result $\Phi^+$ with probability 1. If the friend is a Many-Worlder, both of her copies friend$_0$ and friend$_1$ will have no trouble agreeing with this conclusion, but if she is a QBist both of her copies will believe that her measurement caused a collapse, and therefore that Wigner will get result $\Phi^+$ or $\Phi^-$ with probability $1/2$. It takes a bit of mental gymnastics to get to this result, as QBism forbids her from assigning a quantum state to herself and to consider what Wigner will observe. In any case, her expectations are incompatible with Wigner’s, but that’s ok, as everything is subjective anyway and therefore it is not a problem if different agents assign different probabilities to the same event.

That’s what Rüdiger Schack said in a lecture in Vienna several years ago. Apparently this conclusion was too solipsistic even for the QBists, because they recently released a paper with a different take on this: now they refuse to say which probabilities the friend will assign to Wigner’s measurement, saying that it depends on too many things. They emphasize that the friend is not required to agree with Wigner, and that she can anyway learn what the relevant probabilities are if the experiment is repeated many times.

Now that’s just lame. This is a *gedankenexperiment*, for the love of Zeus, you’re free to specify whatever you think was not precise enough in order to get an answer. Just refusing to give one is the same trick of never being wrong by never saying anything.

That’s why I much prefer Bohmian mechanics and collapse models to QBism. At least they’re trying to do the right thing, and they’re wrong. QBism is not even trying, and is not even wrong.

**EDIT**: Shortly after posting this I got an angry email from Chris Fuchs. He sent a link to the post to his group, cc’ed to me, with the comment:

An actual scientist would wait until understanding what he was talking about before speaking. But the author of this surely knows his own inadequacies deep inside. My guess is they eat at him.

I think he needs a hug.

]]>- “It’s not a mistake.” (and they’re right)
- “Indeed, that’s wrong, I’ll correct it.”
- “It’s not a mistake.” (and they’re wrong)
- “Indeed, that’s wrong, but I’ll not correct it.”
- “Indeed, that’s wrong, people already told me about it, but I’ll not correct it.”

The first two need no explanation. The third one is of course frustrating to hear, but sometimes people will just disagree about what is a mistake or not, and we need to live and let live. The fourth one is really annoying. Your paper is wrong, just fix it, how can you leave something wrong on the arXiv? It’s so easy to fix! But maybe it really doesn’t make a difference, perhaps I’m the only one that even read that part of the paper and got sidetracked by it. But the fifth one really drives me crazy: if several people told you about the mistake it shows that it does matter, it is actually causing problems.

Anyway, I’m ranting about this because I’ve recently updated several of my papers on the arXiv to fix some mistakes. Maybe all my papers are wrong, but at least I correct them when I’m told about the problem. I even get happy about it, it’s a proof that somebody actually reads them. And when the mistake is in Appendix C, wow, respect, you need some determination to reach all the way down there.

Luckily in all the cases the mistakes were only in the proofs, and didn’t change any of the actual results of the papers. In two of the cases, arXiv:1401.8127 and arXiv:1611.08535, it’s easy to understand why: we just had written down the result in a particular way in the first draft, then decided to change the presentation for clarity, did a sloppy job in the rewriting, and ended up with some inconsistent equations. Another case, arXiv:1706.09854, it’s a bit mysterious: we had a completely wrong derivation leading up to a correct quantum circuit. I guess what happened is that we had found the quantum circuit in a different way, and afterwards thought about a way to derive it that didn’t involve black magic, because you really can’t write that you got the answer by sacrificing a black chicken for Exu at some crossroads. We can’t remember what was the original derivation, if it ever existed, but now we managed to find another derivation without black magic that does work.

The last case, arXiv:1903.10526, was really just blind luck. I really had the wrong thing in mind when deriving the result, which does follow from the wrong argument. It’s just that, by chance, it was correct. Zombie Schrödinger guided my hands on this one.

]]>Naïvely, you might reason that with probability 1/2 the other envelope will contain $2y$, and with probability 1/2 it will contain $y/2$, so the expectation value of switching is

\[ \frac122y + \frac12\frac{y}2 = \frac54y > y,\] and therefore it is always advantageous to switch, independently of the value of $y$. You don’t even need to look at it, you can just switch immediately after taking the envelope. And if offered the chance to switch again, you would. Although at this point most people would have realised that something is wrong and would stop the nonsense, only badly programmed robots would be stuck in an infinite loop.

What did go wrong, though? Checking the Wikipedia article about it won’t help, it’s a Library of Babel with incorrect solutions, correct solutions, several versions of the problem, and general babble. We have to work out the solution ourselves (and by doing that, increasing the size of the library, but hopefully increasing the proportion of correct solutions).

Well, the assertion that this probability above equals 1/2 is obviously suspect, we need to justify that somehow. It is either the probability that the envelope you chose, given $y$, is the one with the smallest amount of money, so $p(S|y)$, or the probability that it is the one with the biggest amount of money, so $p(B|y)$. The correct expectation value is then

\[ 2y p(S|y) + \frac{y}2 p(B|y),\] and the question of whether this is always larger than $y$ is the question of whether

\[ p(S|y) > 1/3 \] for all $y$. How do we calculate it? By Bayes’ rule we have that

\[ p(S|y) = \frac{ f(y|S)p(S) } { f(y|S)p(S) + f(y|B)p(B) } = \frac1{1+\frac{f(y|B)}{f(y|S)} }, \] where $f(y|S)$ and $f(y|B)$ are the probability densities of having $y$ in the envelopes $S$ and $B$, respectively, and we are assuming that $p(S) = p(B)$, that is, that the probability of picking either envelope to start with is the same.

We just need to know these densities then to solve the problem. Since they’re not given we can’t actually know whether it is a good idea to switch, but we can reason about which densities give us which conclusions, and in particular which density gives us the paradoxical conclusion that switching is always advantageous.

First not that the densities are not independent; since envelope $B$ is constrained to have twice the amount of envelope $S$, it follows that2

\[ f(y|B) = \frac12 f(y/2|S). \] Now let

\[ f(y|S) = \frac1M [y \le M] \quad\text{and}\quad f(y|B) = \frac1{2M} [y \le 2M],\] that is, we assume that the amount of money in envelope $B$ is uniformly distributed from 0 to some upper bound $M$. A perfectly reasonable, physical, probability density. It results in $p(S|y) = 2/3$ for $y \le M$, and $p(S|y) = 0$ for $y > M$. Which agrees very well with intuition: if your envelope contains less than $M$, chances are you got the shorter end of the stick and you should switch, and if has more than that you certainly got lucky, and shouldn’t switch.

Now we can let $M$ go to infinity, which is unphysical, but gives us a well-defined result: $p(S|y) = 2/3$ for all $y$, and you should always switch. Which again agrees with intuition, twofold: first of all, you make a nonsensical assumption, and got a nonsensical answer. Secondly, if every positive real number is in fact equally likely, it is always more probable that you got the smallest one. What doesn’t agree with intuition is that $p(S|y)$ is $2/3$ instead of $1/2$, as the naïve calculation assumed, presumably for such a uniform distribution. Can we make it be equal to $1/2$ with some other density?

For that we need $f(y|S) = f(y|B) = \frac12 f(y/2|S)$ for all $y$. Now I’m not a mathematician, but I bet that the unique solution to this functional equation, given some regularity assumption, is $f(y|S) = k/y$ for a positive constant $k$. Which is indeed the Jeffrey’s prior for a number distributed on the positive half-line, a much more meaningful notion of uniform distribution than the previous one, which makes sense for the whole real line. This is not a normalisable density, though, which is probably for the best: it would give the paradoxical conclusion that it is always advantageous to switch envelope.

We can make this a bit more precise if we consider it as a limit of normalisable densities. Let then

\[ f(y|S) = \frac1{y \log(M^2)} [1/M \le y \le M] \]

and

\[ f(y|B) = \frac1{y \log(M^2)} [2/M \le y \le 2M]. \]

This gives $p(S|y) = 1$ for $y \in [1/M,2/M]$, as you certainly got envelope $S$, the desired $p(S|y) = 1/2$ for $y \in [2/M,M]$, and $p(S|y) = 0$ for $y \in [M,2M]$, as you certainly got envelope $B$. If we let $M$ go to infinity again, we have indeed $p(S|y) = 1/2$ for all $y$, as our intuition would predict, but probably shouldn’t.

Another intuition says that you should always be indifferent to switching, as you have no idea about what $f(y|S)$ is, so learning $y$ doesn’t give you any information. Maybe there exists an $f(y|S)$ for which this is true? We need $2f(y|S) = f(y|B) = \frac12 f(y/2|S)$, which I bet has as unique solution $f(y|S) = k/y^2$. Again this is not normalisable, so this intuition is on rather shaky ground.

From all these examples, one might think that the paradox only arises from unnormalisable densities, but this is not the case. As shown by Broome, it also arises from the innocent-looking

\[ f(y|S) = \frac1{(1+y)^2}. \] It results in

\[ p(S|y) = \frac{y^2 + 4y + 4}{3y^2 + 8y + 6},\] which is indeed always strictly larger than $1/3$.

No, the real problem is that these densities are unphysical. Broome claims that his isn’t:

Both my examples of paradoxical distributions are decent statistical

distributions. There are processes for generating either: […] My continuous distribution can be generated as follows. Pick a number $t$ at random from a uniform distribution between 0 and 1, and let $y$ be $t/(1-t)$. Then $y$ has the required distribution.

While it is true that this $y$ would have the required distribution, there’s no conceivable physical process that could generate it. You would need $t$ to actually come from the continuum, in particular being able to get arbitrarily close to 1 so that $y$ gets arbitrarily large. The continuum is only a mathematical abstraction, though, any physical process for generating $t$ will need to discretize it; if $\epsilon$ is the smallest step you can do, then the largest $t$ strictly smaller than 1 is $1-\epsilon$, which gives you the largest $y$ as $(1-\epsilon)/\epsilon$. But having any upper bound on $y$ is enough to dispel the paradox.

]]>No, the problem is, what the hell is your DNA? It needs to respect the consistency condition that the DNA of you-father is exactly the same as the DNA of you-child. But the DNA of you-child is determined by the recombination of the DNA of you-father with the DNA of your mother. For instance, the pair of homologous chromosomes 1 of you-father, F1 and F2, recombine with the your mother’s pair, M1 and M2, giving the possible pairs of you-child as F1M1, F1M2, F2M1, and F2M2 (ignoring crossover). Now F1F2 must be equal to one of these four pairs, which means that at least one of your chromosomes must be equal to your mother’s! That’s nothing surprising if you don’t time-travel, but if you do, it means that at some point you share a chromosome with this girl you just met.

Let’s say then that F1 = M1. Then the possible pairs of you-child are M1M1, M1M2, F2M1, and F2M2. The first possibility is bound to give you some horrible genetic disease, so let’s exclude that3. The second possibility would make you an identical twin of your mother, which is fine for chromosome 1, but cannot be the case for the sex chromosomes, unless you are one of the exceedingly rare true hermaphrodites and possibly also your own mother.

This leaves us with the most prosaic alternatives F2M1 or F2M2. Which are apparently unproblematic, until you ask where does F2 come from. It doesn’t depend on anything from the past, even probabilistic. It’s completely arbitrary, its genes don’t even need to be human, apart from the constraint that they shouldn’t make reproduction with humans impossible.

If that doesn’t disturb you, consider Everett and Roman’s undeserved Fields medal paradox2: one day you wake up to find on your desk a paper, signed by you, with a proof of the Riemann hypothesis. You read the paper, check that the proof is in fact correct, and publish it. You become famous for the result and eventually receive the Fields medal. You then find a time machine, and use it to deposit the paper on your desk just before you found it. Now, who wrote this proof? Who deserves the Fields medal?

]]>