# Probability from decision theory

There is a problem in the world that is even more pressing than the position of cheese in the cheeseburger emoji: namely, that nobody™ understands the Deutsch-Wallace theorem. I’ve talked to a lot of people about it, and the usual reaction I get is that they have heard of it, are vaguely interested in how one can prove the Born rule, but have no idea how Deutsch and Wallace actually did it.

It’s hard to blame them. The original paper by Deutsch is notoriously idiosyncratic: he even neglected to mention that one of his assumptions was the Many-Worlds interpretation[1]! Several people wrote papers trying to understand it: Barnum et al. mistakenly concluded that Deutsch was simply wrong, Gill made a valiant effort but gave up without a conclusion, and Wallace finally succeeded, formalising Deutsch’s proof and putting it in context.

Wallace was not successful, however, in popularising the theorem. I think this is because his paper is a 27-page mess. It did not help, either, that Wallace quickly moved on to improving Deutsch’s theorem, providing an even more complicated proof from weaker assumptions, leaving the community with no easy entry point in this confusing literature.

To fill this hole, then, I’m writing two “public service” blog posts. The first (this one) is to explain how to derive probabilities from decision theory, and the second is to show how this decision-theoretical argument, together with the Many-Worlds interpretation, yields the Born rule.

Unlike Deutsch, I’m going to use a standard decision theory, taken from the excellent “The Foundations of Causal Decision Theory” by James Joyce. We’re going to consider a simple betting scenario, where an agent decides how much to pay to take part in a game where they receive $a$ euros if event $E$ happens, and $b$ euros if event $\lnot E$ happens[2]. The game is then defined by the vector $(a,b)$, and the agent wants to decide its value $V(a,b)$.

The first rationality axiom we demand is that if the game is certain to pay them back $c$ euros, they must assign value $c$ to the game. This means that the agent is indifferent to betting per se: they don’t demand some extra compensation to go through the effort of betting, nor do they accept losing money just to experience the thrill of betting (unlike real gambling addicts, I must say). The axiom is then

• Constancy: $V(c,c) = c$.

The second axiom we demand is that if for a pair of games $(a,b)$ and $(c,d)$ it happens that $a \ge c$ and $b \ge d$, that is, if in both cases where $E$ happens or $\lnot E$ happens the first game pays a reward that is larger than or equal to that of the second game, then the agent must value the first game no less than the second game. The axiom is then

• Dominance: if $(a,b) \ge (c,d)$ then $V(a,b) \ge V(c,d)$.

The third and last axiom we need sounds very innocent: if the agent is willing to pay $V(a,b)$ to play the game with rewards $(a,b)$, and thinks that playing for rewards $(c,d)$ is worth $V(c,d)$, then the price they should pay for getting the rewards $(a+c,b+d)$ must be $V(a,b) + V(c,d)$. In other words: it shouldn’t matter if tickets for the game with rewards $(a+c,b+d)$ are sold at once, or broken down into first a ticket for rewards $(a,b)$ followed by a ticket for rewards $(c,d)$. The axiom is then

• Additivity: $V(a+c,b+d) = V(a,b) + V(c,d)$.

One problem with Additivity is that real agents don’t behave like this. People usually assign values such that $V(a+c,b+d) < V(a,b) + V(c,d)$, because if you have nothing then 10€ might be the difference between life and death, whereas if you already have 10,000€ then 10€ is just a nice gift. Besides not matching reality, this linear utility function implied by Additivity causes pathological decisions such as the St. Petersburg paradox or Pascal’s Wager. But these problems do not appear if the amounts at stake are small compared to the agent’s wealth, which we can assume to be the case, and Additivity makes for a rather simple and elegant decision theory, so we’ll use it anyway[3]. After all, I’m not writing for the people whose objection to the Deutsch-Wallace theorem is that Deutsch’s decision theory implies linear utilities, but rather for those whose objection is “What the hell is going on?”.

Now, to work. First we shall show how Additivity allows us to write the value of any game as a function of the value of the elementary games $(1,0)$ and $(0,1)$. Additivity immediately implies that
$V(a,b) = V(a,0) + V(0,b),$ and that for any positive integer $n$
$V(na,0) = nV(a,0).$ Taking now $a=1/n$, the previous equation gives us that $V(1,0) = nV(1/n,0),$ or that $V(1/n,0) = \frac1n V(1,0).$ Considering $m$ such games, we have now that $V(m/n,0) = \frac{m}{n} V(1,0)$ for any positive rational $m/n$. We can extend this to all rationals if we remember that by Constancy $V(0,0) = 0$ and that by Additivity
$V(0,0) = V(m/n,0) + V(-m/n,0).$ Now one could extend this argument to all reals by taking some continuity assumption, but I don’t think it is interesting to do so. I’d rather assume that one can only have rational amounts of euros[4]. Anyway, now we have shown that for all rational $a$ and $b$ we have that
$V(a,b) = aV(1,0) + bV(0,1).$ What is left to see is that the values of the elementary games $(1,0)$ and $(0,1)$ behave like probabilities. If we consider Constancy with $c=1$ we have that $V(1,0) + V(0,1) = 1,$ so these “probabilities” are normalised. If we now consider Dominance, we get that
$V(1,0) \ge V(0,0) = 0,$ and likewise $V(0,1) \ge 0$, so the “probabilities” are non-negative. Is there anything left to show? Well, if you are a Bayesian, no. The probability of an event $E$ is defined as the price a rational agent would pay for a lottery ticket that gives them 1€ if $E$ happens and nothing otherwise. Bayesians have the obligation to show that these probabilities obey the usual Kolmogorov axioms, but on the interpretational side there is nothing left to explain.
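For those who prefer to poke at things, here is a minimal Python sketch of the conclusion: once the agent has fixed a price $p = V(1,0)$ for the elementary game, the value of every game with rational rewards follows, and the three axioms can be checked directly. The helper name `make_value` and the sample price $1/3$ are my own illustrative choices, not anything taken from Joyce or Deutsch.

```python
from fractions import Fraction

def make_value(p):
    """Return V(a, b) = a*p + b*(1 - p), where p = V(1, 0) is the agent's
    price for the elementary game (1, 0). Hypothetical helper for illustration."""
    p = Fraction(p)
    if not 0 <= p <= 1:
        raise ValueError("Dominance and Constancy force 0 <= V(1,0) <= 1")

    def V(a, b):
        # Rational rewards only, as in the text.
        return Fraction(a) * p + Fraction(b) * (1 - p)

    return V

V = make_value(Fraction(1, 3))  # an agent who pays 1/3 of a euro for (1, 0)

constancy_ok = V(5, 5) == 5
additivity_ok = V(2 + 7, 3 + 1) == V(2, 3) + V(7, 1)
dominance_ok = V(4, 2) >= V(3, 2) and V(4, 2) >= V(4, 1)
normalised_ok = V(1, 0) + V(0, 1) == 1

print(constancy_ok, additivity_ok, dominance_ok, normalised_ok)
```

Exact rationals are used on purpose, so that the checks are equalities and not floating-point approximations.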

# On the morality of blackholing in conferences

Consider the entirely hypothetical situation where you are in a physics conference with really bad wifi, either because the router has a hard limit on the number of devices that can connect simultaneously, or the bandwidth is too small to handle everyone’s OwnClouds trying to sync, or it is a D-Link. The usual approach is just to be pissed off and complain to the organizers, to no avail (while ignoring the talks and trying to reconnect like crazy). Here I’d like to describe a different approach that, if not morally commendable, at least leads to more results: blackholing.

To blackhole, what you do is to create a hotspot with your phone with the same name, encryption type, and password as the conference wifi. You then disable the data connection of your phone, and turn on the hotspot. What happens is that the devices of the people close to you will automatically disconnect from the conference router and connect to your hotspot instead, since they will think that your hotspot is a repeater with a stronger signal. But since you disabled your data connection, they are connecting to a sterile hotspot, so you are creating a kind of wifi black hole. To the people far from you, however, this is a gift from the gods, as they stay connected to the conference router, and can use the bandwidth that was freed up by the poor souls that fell in your black hole.

The question is, is it moral to do this? Obviously the people who did fall in your black hole are not going to like it, but one thing to notice is that this technique is intrinsically altruistic, as you cannot use wifi either, since you are in the middle of the black hole (and as far as I know it is not possible to defend oneself against it). It is even more altruistic if you like to sit close to your friends, who will then sacrifice their wifi in favour of a more distant acquaintance. It does become immoral if you arrange with a friend to sit close to the conference router, and you blackhole some random people far from it with the specific intent of giving your friend wifi, without caring about the other people who will also get it.

But let’s consider that you don’t have such tribalistic morals, and consider everyone’s welfare equally. Then the question is whether the utility of $n$ people with bad wifi is smaller than the utility of $k$ people with no wifi and $n-k$ people with good wifi, that is, whether
$n\, U(\text{bad wifi}) \le k\,U(\text{no wifi}) + (n-k)\,U(\text{good wifi}).$ Now, assuming that the utility is a function only of the bandwidth available, this simplifies to
$n\,U(B/n) \le k\,U(0) + (n-k)\,U(B/(n-k)),$ where $B$ is the total bandwidth of the conference router. Therefore, to determine whether blackholing is moral or not we need to find out how people’s happiness scales as a function of the available bandwidth.

One immediately sees that if the happiness scales linearly with the bandwidth, it is indifferent whether to blackhole or not. But to make relevant moral judgements, we need to find out what the actual utility functions are. By asking people around, I empirically determined that
$U(x) = \frac{1}{1+\left(\frac{B_0}{x}\right)^2},$ where $B_0$ is the critical bandwidth that allows people to do basic surfing (so that $U(0) = 0$). Substituting in the previous inequality, we see that blackholing is moral iff
$k \le \frac{n^2 - \left(\frac{B}{B_0}\right)^2}{n},$ which is better understood if we rewrite $\frac{B}{B_0} = fn$, that is, as the fraction $f$ of people that can do basic surfing with the given bandwidth. We have then
$k \le (1-f^2)n,$ which shows that if $f = 1$ it is never moral to blackhole, whereas if $f \approx 0$ it always is. In a hypothetical conference held in Paraty with $n=100$ and $\frac{B}{B_0} = 50$, it is moral to blackhole up to $k=75$ people.
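As a sanity check on the algebra, here is a small brute-force sketch in Python. It compares the total utility with and without blackholing for every $k$, using the utility function above written in the equivalent form $x^2/(x^2 + B_0^2)$ so that $U(0) = 0$ needs no special case. The function names are mine, and bandwidth is measured in arbitrary units with $B_0 = 1$.

```python
from fractions import Fraction

def utility(x, b0):
    """U(x) = 1 / (1 + (B0/x)^2), rewritten as x^2 / (x^2 + B0^2)."""
    x, b0 = Fraction(x), Fraction(b0)
    return x * x / (x * x + b0 * b0)

def total_utility(n, k, bandwidth, b0):
    """Total utility when k of the n people are blackholed (k < n) and the
    remaining n - k share the router's bandwidth equally."""
    share = Fraction(bandwidth) / (n - k)
    return k * utility(0, b0) + (n - k) * utility(share, b0)

def max_moral_k(n, bandwidth, b0):
    """Largest k for which blackholing k people does not lower total utility."""
    baseline = total_utility(n, 0, bandwidth, b0)
    return max(k for k in range(n)
               if total_utility(n, k, bandwidth, b0) >= baseline)

print(max_moral_k(100, 50, 1))  # the Paraty example: n = 100, B/B0 = 50
```

The brute-force answer can then be compared with the bound $k \le (1-f^2)n$ for whatever conference you happen to be stuck in.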

# Is gravity quantum?

Last week two curious papers appeared on the arXiv, one by Marletto and Vedral, and the other by Bose et al., proposing to test whether the gravitational field must be quantized. I think they have a nice idea there, that is a bit obscured by all the details they put in the papers, so I hope the authors will forgive me for butchering their argument down to the barest of the bones.

The starting point is a worryingly common idea that maybe the reason why a quantum theory of gravity is so damn difficult to make is because gravity is not actually quantum. While concrete models of “non-quantum gravity” tend to be pathological or show spectacular disagreement with experiment, there is still a lingering hope that somehow a non-quantum theory of gravity will be made to work, or that at least a semi-classical model like QFT in a curved spacetime will be enough to explain all the experimental results we’ll ever get. Marletto and Bose’s answer? Kill it with fire.

Their idea is to put two massive particles (like neutrons) side-by-side in two Mach-Zehnder interferometers, in such a way that their gravitational interaction is non-negligible in only one of the combinations of arms, and measure the resulting entanglement as proof of the quantumness of the interaction.

More precisely, the particles start in the state $\ket{L}\ket{L},$ which after the first beam splitter in each of the interferometers gets mapped to $\frac{\ket{L} + \ket{R}}{\sqrt2}\frac{\ket{L} + \ket{R}}{\sqrt2} = \frac12(\ket{LL} + \ket{LR} + \ket{RL} + \ket{RR}),$ which is where the magic happens: we can put these interferometers together in such a way that the right arm of the first interferometer is very close to the left arm of the second interferometer, and all the other arms are far away from each other. If the basic rules of quantum mechanics apply to gravitational interactions, this should give a phase shift corresponding to the gravitational potential energy to the $\ket{RL}$ member of the superposition, resulting in the state
$\frac12(\ket{LL} + \ket{LR} + e^{i\phi}\ket{RL} + \ket{RR}),$ which can even be made maximally entangled if we manage to make $\phi = \pi$. Bose promises that he can get us $\phi \approx 10^{-4}$, which would be a tiny but detectable amount of entanglement. If we now complete the interferometers with a second beam splitter, we can do complete tomography of this state, and in particular measure its entanglement.
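To get a feeling for how much entanglement a given phase buys, here is a short Python sketch that computes the concurrence of the state above. The choice of concurrence as the entanglement measure is mine (the papers use their own witnesses); for a pure two-qubit state with amplitudes $(a,b,c,d)$ it is $2|ad - bc|$, which for this state works out to $|\sin(\phi/2)|$.

```python
import cmath
import math

def concurrence(phi):
    """Concurrence of (|LL> + |LR> + e^{i phi}|RL> + |RR>)/2.

    Amplitudes are taken in the order LL, LR, RL, RR; for a pure two-qubit
    state the concurrence is 2|ad - bc|, ranging from 0 (product state)
    to 1 (maximally entangled).
    """
    a = b = d = 0.5
    c = 0.5 * cmath.exp(1j * phi)
    return 2 * abs(a * d - b * c)

for phi in (0.0, 1e-4, math.pi / 2, math.pi):
    print(phi, concurrence(phi))
```

So the promised $\phi \approx 10^{-4}$ corresponds to a concurrence of about $5\times 10^{-5}$: tiny, as advertised.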

Now I’m not sure about what “non-quantum gravity” can do, but if it can allow superpositions of masses to get entangled via gravitational interactions, the “non-quantum” part of its name is as appropriate as the “Democratic” in Democratic People’s Republic of Korea.

# How quantum teleportation actually works

EDIT: Philip Ball has updated his article on Nature News, correcting the most serious of its errors. While everyone makes mistakes, few actually admit to them, so I think this action is rather praiseworthy. Correspondingly, I’m removing criticism of that mistake in my post.

Recently I have read an excellent essay by Philip Ball on the measurement problem: clear, precise, non-technical, free of bullshit and mysticism. I was impressed: a journalist managed to dispel confusion about a theme that even physicists themselves are confused about. It might be worth checking out what this guy writes in the future.

I was not so impressed, however, when I saw his article about quantum teleportation, reporting on the amazing feat by Jian-Wei Pan’s group of teleporting a quantum state from a ground station to a satellite. While Philip was careful to note that nothing observable is going on faster than light, he still claims that something unobservable is going on faster than light, and that there is some kind of conspiracy by Nature to cover that up. This is not only absurd on its face, but also needs the discredited notion of wavefunction collapse to make sense, which Philip himself noted was replaced by decoherence as a model of how measurements happen. For these reasons, very few physicists still take this description of the teleportation protocol seriously. It would be nice if the media would report on the current understanding of the community instead of repeating misconceptions from the 90s.

But enough ranting. I think the best way to counter the spreading of misinformation about quantum mechanics is not to just criticize people who get it wrong, but instead to give the correct explanation about the phenomena. I’m going to explain it twice, first in a non-technical way in the hope of helping interested laypeople, and then in a technical way, for people who do know quantum mechanics. So, without further ado, here’s how quantum teleportation actually works (this is essentially Deutsch and Hayden’s description):

Alice has a quantum bit, which she wants to transmit to Bob. Quantum bits are a bit like classical bits as they can be in the states 0 or 1 (and therefore used to store information like blogs or photos[5]), and entirely unlike classical bits as they can also be in a superposition of 0 and 1. Now if Alice had a classical bit, it would be trivial to transmit it to Bob: she would just use the internet. But the internet cannot handle superpositions between 0 and 1: if you tried to send a qubit via the internet you would lose this superposition information (the Dutch are working on this, though). To preserve this superposition information Alice would need an expensive direct optical fibre connection to Bob’s place, that we assume she doesn’t have.

What can she do? She can try to measure this superposition information, record it in classical bits, and transmit those via the internet. But superposition information is incredibly finicky: if Alice has only one copy of the qubit, she cannot obtain it. She can only get a good approximation to it if she measures several copies of the qubit. Which she might not have, or even if she does, it will be only an approximation to her qubit, not the real deal.

So again, what can she do? That’s where quantum teleportation comes in. If Alice and Bob share a Bell state (a kind of entangled state), they can use it to transmit this fragile superposition information perfectly. Alice needs to do a special kind of measurement — called Bell basis measurement — in the qubit she wants to transmit together with her part of the Bell state. Now, this is where everyone’s brains melt and all the faster-than-light nonsense comes from. It appears that after Alice does her measurement the part of the Bell state that belongs to Bob instantaneously becomes the qubit Alice wanted to send, just with some error that depends on her measurement result. In order to correct the error, Bob then needs to know Alice’s measurement result, which he can only find out after a light signal has had time to propagate from her lab to his. So it is as if Nature did send the qubit faster than light, but cleverly concealed this fact with this error, just so that we wouldn’t see any violation of relativity. Come on. Trying to put ourselves back in the centre of the universe, are we?

Anyway, this narrative only makes sense if you believe in some thoroughly discredited interpretations of quantum mechanics[2]. If you haven’t kept your head buried in the sand in the last decades, you know that measurements work through decoherence: Alice’s measurement is not changing the state of Bob in any way. She is just entangling her qubit with the Bell state and herself and anything else that comes in the way. And this entanglement spreads just through normal interactions: photons going around, molecules colliding with each other. Everything very decent and proper, nothing faster than light.

Now, in this precious moment after she has done her measurement and before this cloud of decoherence has had time to spread to Bob’s place, we can compare the silly story told in the previous paragraph with reality. We can compute the information about Alice’s qubit that is available in Bob’s place, and see that it is precisely zero. Nature is not trying to conceal anything from us, it is just a physical fact that the real quantum state that describes Alice and Bob’s systems is a complicated entangled state that contains no information about Alice’s qubit in Bob’s end. But the cool thing about quantum teleportation is that if Bob knows the measurement result he is able to sculpt Alice’s qubit out of this complicated entangled state. But he doesn’t, because the measurement result cannot get to him faster than light.

Now, if we wait a couple of nanoseconds more, the cloud of decoherence hits Bob, and then we are actually in the situation where Bob’s part of the Bell state has become Alice’s qubit, modulo some easily correctable error. But now there is no mystery to it: the information got there via decoherence, no faster than light.

Now, for the technical version: Alice has a qubit $\ket{\Gamma} = \alpha\ket{0} + \beta\ket{1}$, which she wishes to transmit to Bob, but she does not have a good noiseless quantum transmission channel that she can use, just a classical one (aka the Internet). So what can they do? Luckily they have a maximally entangled state $\ket{\phi^+} = \frac1{\sqrt2}(\ket{00}+\ket{11})$ saved from the time when they did have a good quantum channel, so they can just teleport $\ket{\Gamma}$.

To do that, note that the initial state they have, written in the order Alice’s state, Alice’s part of $\ket{\phi^+}$, and Bob’s part of $\ket{\phi^+}$, is
$\ket{\Gamma}\ket{\phi^+} = \frac{1}{\sqrt2}( \alpha\ket{000}+\alpha\ket{011} + \beta\ket{100} + \beta\ket{111}),$ and if we rewrite the first two subsystems in the Bell basis we obtain
$\ket{\Gamma}\ket{\phi^+} = \frac{1}{2}( \ket{\phi^+}\ket{\Gamma} + \ket{\phi^-}Z\ket{\Gamma} + \ket{\psi^+}X\ket{\Gamma} + \ket{\psi^-}XZ\ket{\Gamma}),$ so we see that conditioned on Alice’s state being a Bell state, Bob’s state is just a simple function of $\ket{\Gamma}$. Note that at this point nothing was done to the quantum system, so Bob’s state did not change in any way. If we calculate the reduced density matrix at his lab, we see that it is the maximally mixed state, which contains no information about $\ket{\Gamma}$ whatsoever.

Now, clearly we want Alice to measure her subsystems in the Bell basis to make progress. She does that, first applying an entangling operation to map the Bell states to the computational basis, and then she makes the measurement in the computational basis.[3] After the entangling operation, the state is
$\frac{1}{2}( \ket{00}\ket{\Gamma} + \ket{01}Z\ket{\Gamma} + \ket{10}X\ket{\Gamma} + \ket{11}XZ\ket{\Gamma}),$ and making a measurement in the computational basis — for now modelled in a coherent way — and storing the result in two extra qubits results in the state
$\frac{1}{2}( \ket{00}\ket{00}\ket{\Gamma} + \ket{01}\ket{01}Z\ket{\Gamma} + \ket{10}\ket{10}X\ket{\Gamma} + \ket{11}\ket{11}XZ\ket{\Gamma}).$ Now something was done to this state, but still there is no information at Bob’s: his reduced density matrix is still the maximally mixed state. Looking at this entangled state, though, we see that if Bob applies the operations $\mathbb{I}$, $Z$, $X$, or $ZX$ to his qubit, conditioned on the measurement result being 00, 01, 10, or 11 respectively, he will extract $\ket{\Gamma}$ from it. So Alice simply sends the qubits with the measurement result to Bob, who uses them to get $\ket{\Gamma}$ on his side, the teleportation protocol is over, and Alice and Bob lived happily ever after. Nothing faster than light happened, and the information from Alice to Bob clearly travelled through the qubits with the measurement results. The interesting thing we saw was that by expending one $\ket{\phi^+}$ and by sending two classical bits we can transmit one quantum bit. Everything ok?
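If you don’t feel like doing the algebra by hand, the following Python sketch checks the two claims numerically for an arbitrary example qubit: Bob’s reduced density matrix is maximally mixed before he learns anything, and each Bell outcome leaves him with $\ket{\Gamma}$ up to the stated correction. It uses plain state vectors and projects directly onto the Bell states instead of simulating the entangling gate; the names (`BELL`, `CORRECTION`, and so on) and the sample amplitudes are my own choices.

```python
import math

S = 1 / math.sqrt(2)

# Bell states of Alice's two qubits, as {(q0, q1): amplitude}.
BELL = {
    "phi+": {(0, 0): S, (1, 1): S},
    "phi-": {(0, 0): S, (1, 1): -S},
    "psi+": {(0, 1): S, (1, 0): S},
    "psi-": {(0, 1): S, (1, 0): -S},
}

def apply_x(v):
    return [v[1], v[0]]

def apply_z(v):
    return [v[0], -v[1]]

# Bob's correction for each outcome: it undoes I, Z, X, XZ respectively.
CORRECTION = {
    "phi+": lambda v: list(v),
    "phi-": apply_z,
    "psi+": apply_x,
    "psi-": lambda v: apply_z(apply_x(v)),  # ZX undoes XZ
}

def initial_state(gamma):
    """Amplitudes of |Gamma>|phi+>, indexed by (q0, q1, q2); q2 is Bob's qubit."""
    return {(q0, q1, q2): gamma[q0] * (S if q1 == q2 else 0.0)
            for q0 in (0, 1) for q1 in (0, 1) for q2 in (0, 1)}

def bob_reduced_density_matrix(state):
    """Trace out Alice's two qubits."""
    return [[sum(state[(q0, q1, r)] * state[(q0, q1, c)].conjugate()
                 for q0 in (0, 1) for q1 in (0, 1))
             for c in (0, 1)] for r in (0, 1)]

def bob_state_given(state, outcome):
    """Bob's normalised qubit, conditioned on Alice finding the given Bell state."""
    bell = BELL[outcome]
    v = [sum(amp * state[(q0, q1, q2)] for (q0, q1), amp in bell.items())
         for q2 in (0, 1)]  # Bell amplitudes are real, so no conjugate needed
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

gamma = [0.6 + 0j, 0.8j]  # example qubit: alpha = 0.6, beta = 0.8i
state = initial_state(gamma)
rho_bob = bob_reduced_density_matrix(state)
recovered = {name: CORRECTION[name](bob_state_given(state, name)) for name in BELL}

print(rho_bob)
print(recovered)
```

Note that `rho_bob` does not depend on `gamma` at all, which is the whole point.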

No, no, no, no, no!, you complain. What was this deal about modelling a measurement coherently? This makes no sense, measurements must by definition cause lots of decoherence! Indeed, we’re getting there. Now with decoherence, the state after the measurement in the computational basis is $\frac{1}{2}( \ket{E_{00}}\ket{00}\ket{00}\ket{\Gamma} + \ket{E_{01}}\ket{01}\ket{01}Z\ket{\Gamma} + \ket{E_{10}}\ket{10}\ket{10}X\ket{\Gamma} + \ket{E_{11}}\ket{11}\ket{11}XZ\ket{\Gamma}),$ where $\ket{E_{ij}}$ is the state of the environment, labelled according to the result of the measurement. You see that there is no collapse of the wavefunction[4]: in particular Bob’s state is in the same entangled superposition as before, and his reduced density matrix is still the maximally mixed state. Moreover, like any physical process, decoherence spreads at most at the speed of light, so even after Alice has been engulfed by the decoherence and has obtained a definite measurement result, Bob will still for some time remain unaffected by it, with the state still being adequately described by the above superposition. Only after a relativity-respecting time interval will he become engulfed as well, coherence will be killed, and the state relative to him and Alice will be adequately described by (e.g.) $\ket{E_{10}}\ket{10}\ket{10}X\ket{\Gamma}.$ Now we are in the situation people usually describe: his qubit is in a definite state, and he merely does not know which it is. Alice then sends him the measurement result (10) via the Internet, from which he deduces that he needs to apply operation $X$ to recover $\ket{\Gamma}$, and now the teleportation protocol is truly over.

# Pure quantum operations

Everybody knows how to derive what are the most general operations one can apply to a quantum state. You just need to assume that a quantum operation

1. Is linear.
2. Maps quantum states to quantum states.
3. Still maps quantum states to quantum states when applied to a part of a quantum system.

And you can prove that such quantum operations are the well-known completely positive and trace preserving maps, which can be conveniently represented using the Kraus operators or the Choi-Jamiołkowski isomorphism.

But what if one does not want general quantum operations, but wants to single out pure quantum operations? Can one have such an axiomatic description, a derivation from intuitive[5] assumptions?

Well, the usual argument one sees in textbooks to show that the evolution of quantum states must be given by a unitary assumes that the evolution

1. Is linear.
2. Maps pure quantum states to pure quantum states.

From this, you get that a quantum state $\ket{\psi}$ is mapped to a quantum state $U\ket\psi$ for a linear operator $U$, and furthermore since by definition quantum states have 2-norm equal to 1, we need the inner product $\bra\psi U^\dagger U \ket\psi$ to be 1 for all $\ket\psi$, which implies that $U$ must be a unitary matrix.

The only problem with this argument is that it is false, as the map
$\mathcal E(\rho) = \ket\psi\bra\psi \operatorname{tr} \rho,$ which simply discards the input $\rho$ and prepares the fixed state $\ket\psi$ instead, is linear, maps pure states to pure states, and is not unitary. The textbooks are fine, as they usually go through this argument before density matrices are introduced, and either implicitly or explicitly state that the evolution takes state vectors to state vectors. But this is not good enough for us, as this restriction to state vectors is unjustified and does not satisfy our requirement of being an “intuitive assumption”.

Luckily, the fix is easy: we just need to add the analogue of the third assumption used in the derivation of general quantum operations. If we assume that a pure quantum operation

1. Is linear.
2. Maps pure quantum states to pure quantum states.
3. Still maps pure quantum states to pure quantum states when applied to a part of a quantum system.

then we can prove that pure quantum operations are just unitaries[2]. Since the proof is simple, I’m going to show it in full.
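To see the third assumption doing its job, here is a small Python sketch showing that the discard-and-prepare map $\mathcal E(\rho) = \ket\psi\bra\psi \operatorname{tr}\rho$ passes assumption 2 but fails assumption 3: applied to half of a maximally entangled pair of qubits it outputs a state with purity $1/2$. Matrices are plain nested lists, and the function names and the choice $\ket\psi = \ket{+}$ are mine.

```python
import math

def discard_and_prepare(rho, psi):
    """E(rho) = |psi><psi| tr(rho)."""
    trace = sum(rho[i][i] for i in range(len(rho)))
    return [[trace * psi[r] * psi[c].conjugate() for c in range(len(psi))]
            for r in range(len(psi))]

def apply_to_second_half(channel, d):
    """(I ⊗ channel)(|phi+><phi+|), using linearity on the matrix units |i><j|."""
    size = d * d
    out = [[0j] * size for _ in range(size)]
    for i in range(d):
        for j in range(d):
            unit = [[1.0 if (r == i and c == j) else 0.0 for c in range(d)]
                    for r in range(d)]
            block = channel(unit)
            for r in range(d):
                for c in range(d):
                    out[i * d + r][j * d + c] += block[r][c] / d
    return out

def purity(rho):
    """tr(rho^2), which equals 1 exactly when rho is pure."""
    n = len(rho)
    return sum(rho[r][c] * rho[c][r] for r in range(n) for c in range(n)).real

plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
E = lambda rho: discard_and_prepare(rho, plus)
identity = lambda rho: rho

pure_input = [[1.0, 0.0], [0.0, 0.0]]              # |0><0|
print(purity(E(pure_input)))                       # pure in, pure out
print(purity(apply_to_second_half(identity, 2)))   # doing nothing keeps |phi+> pure
print(purity(apply_to_second_half(E, 2)))          # E on half of |phi+> gives a mixed state
```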

Let $\mathcal F$ be the pure quantum operation we are interested in. If we apply it to the second subsystem of a maximally entangled state, $\ket{\phi^+} = \frac1{\sqrt d}\sum_{i=1}^d \ket{ii}$, by assumption 3 the result will be a pure state, which we call $\ket{\varphi}$. In symbols, we have
$\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+}) = \ket{\varphi}\bra{\varphi},$ where $\mathcal I$ represents doing nothing to the first subsystem. Now the beautiful thing about the maximally entangled state is that if $\mathcal F$ is a linear map then $\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})$ contains all the information about $\mathcal F$. In fact, if we know $\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})$ we can know how $\mathcal F$ acts on any matrix $\rho$ via the identity
$\mathcal F (\rho) = d \operatorname{tr}_\text{in} [(\rho^T \otimes \mathbb I) \mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})],$
where the factor $d$ compensates for the normalisation of $\ket{\phi^+}$.
This is the famous Choi-Jamiołkowski isomorphism[3]. Now let’s use the fact that the result $\ket{\varphi}\bra{\varphi}$ is a pure state. If we write it down in the computational basis
$\ket\varphi = \sum_{i,j=1}^d \varphi_{ij} \ket{i j},$ we see that if we define a matrix $\Phi$ with elements $\Phi_{ij} = \varphi_{ji} \sqrt d$ then $\ket\varphi = \mathbb I \otimes \Phi \ket{\phi^+}$[4], so
$\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+}) = (\mathbb I \otimes \Phi) \ket{\phi^+}\bra{\phi^+} (\mathbb I \otimes \Phi^\dagger).$
Using the identity above we have that
$\mathcal F(\rho) = \Phi \rho \Phi^\dagger,$ and since $\operatorname{tr}(\mathcal F(\rho)) = 1$ for every $\rho$ we have that $\Phi^\dagger\Phi = \mathbb I$, so $\Phi$ is an isometry. If in addition we demand that $\mathcal F(\rho)$ has the same dimension as $\rho$, then $\Phi$ must be a square matrix, and therefore has a right inverse which is equal to its left inverse, so $\Phi$ is a unitary.
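The index gymnastics in $\Phi_{ij} = \varphi_{ji}\sqrt d$ is easy to get backwards, so here is a tiny Python check of the convention. It starts from a known unitary, builds the coefficients $\varphi_{ij}$ of $(\mathbb I \otimes U)\ket{\phi^+}$, and reads $\Phi$ back off. The example matrix is deliberately not symmetric so that a misplaced transpose would show up; the function names are mine.

```python
import math

def entangled_image(U):
    """Coefficients phi[i][j] of (I ⊗ U)|phi+> = sum_ij phi_ij |i j>.

    Since |phi+> = (1/sqrt d) sum_i |i i> and U|i> = sum_j U[j][i] |j>,
    we have phi_ij = U[j][i] / sqrt(d).
    """
    d = len(U)
    return [[U[j][i] / math.sqrt(d) for j in range(d)] for i in range(d)]

def operator_from_vector(phi):
    """Phi with elements Phi_ij = phi_ji * sqrt(d), as in the text."""
    d = len(phi)
    return [[phi[j][i] * math.sqrt(d) for j in range(d)] for i in range(d)]

def gram(Phi):
    """Phi^dagger Phi, which should be the identity for an isometry."""
    d = len(Phi)
    return [[sum(complex(Phi[k][r]).conjugate() * Phi[k][c] for k in range(d))
             for c in range(d)] for r in range(d)]

U = [[0.0, -1.0], [1.0, 0.0]]  # a non-symmetric real unitary (rotation by 90 degrees)
Phi = operator_from_vector(entangled_image(U))
print(Phi)
print(gram(Phi))
```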

This result is so amazing, so difficult, and so ground-breaking that the referees allowed me to include it as a footnote in my most recent paper without bothering to ask for a proof or a reference. But joking aside, I’d be curious to know if somebody already wrote this down, as a quick search through the textbooks revealed nothing.

But how about Wigner’s theorem, I hear you screaming. Well, Wigner was not concerned with deriving what were the quantum operations, but what were the symmetry transformations one could apply to quantum states. Because of this he did not assume linearity, which was not relevant to him (and in fact would make his theorem wrong, as one can have perfectly good anti-linear symmetries, such as time reversal). Also, he assumed that symmetry transformations preserve inner products, which is too technical for my purposes.

# What is the probability of an infinite sequence of coin tosses?

It’s 0, except in the trivial cases where it is 1.

But clearly this is the wrong way to formulate the question, as there are interesting things to be said about the probabilities of infinite sequences of coin tosses. The situation is analogous to uniformly sampling real numbers from the $[0,1]$ interval: the probability of obtaining any specific number is just 0. The solution, however, is simple: we ask instead what is the probability of obtaining a real number in a given subinterval. The analogous solution works for the case of coin tosses: instead of asking the probability of a single infinite sequence, one can ask the probability of obtaining an infinite sequence that starts with a given finite sequence.

To be more concrete, let’s say that the probability of obtaining Heads in a single coin toss is $p$, and for brevity let’s denote the outcome Heads by 1 and Tails by 0. Then the probability of obtaining the sequence 010 is $p(1-p)^2$, which is the same as the probability of obtaining the sequence 0100 or the sequence 0101, which is the same as the probability of obtaining a sequence in the set {01000, 01001, 01010, 01011}, which is the same as the probability of obtaining an infinite sequence that starts with 010.

There is nothing better to do with infinite sequences of zeroes and ones than mapping them into a real number in the interval $[0,1]$, so we shall do that. The set of infinite sequences that start with 010 is then very conveniently represented by the interval $[0.010,0.010\bar1]$, also known as $[0.010,0.011]$ for those who do not like infinite strings of ones, or $[0.25,0.375]$ for those who do not like binary. Saying then that the probability of obtaining a sequence in $[0.010,0.010\bar{1}]$ is $p(1-p)^2$ is assigning a measure to this interval, which we write as
$\rho([0.010,0.010\bar{1}]) = p(1-p)^2$
Now if we can assign a sensible probability to every interval contained in $[0,1]$ we can actually extend it into a proper probability measure over the set of infinite sequences of coin tosses using standard measure-theoretical arguments. For me this is the right answer to the question posed in the title of this post.

So, how do we go about assigning a sensible probability to every interval contained in $[0,1]$? Well, the argument of the previous paragraph can clearly be extended to any interval of the form $[k/2^n, (k+1)/2^n]$. We just need to write $k$ in binary, padded with zeroes on the left until it reaches $n$ binary digits, and count the number of 0s and 1s, which we call $n_0(k,n)$ and $n_1(k,n)$. In symbols:
$\rho\left(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right]\right) = p^{n_1(k,n)}(1-p)^{n_0(k,n)}$
The extension to any interval where the extremities are binary fractions is straightforward. We just break them down into intervals where the numerators differ by one and apply the previous rule. In symbols:
$\rho\left(\left[\frac{k}{2^n}, \frac{l+1}{2^n}\right]\right) = \sum_{i=k}^{l} p^{n_1(i,n)}(1-p)^{n_0(i,n)}$
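The two rules above translate almost word for word into Python. This is just a sketch with function names of my own choosing; passing $p$ as a `Fraction` keeps everything exact.

```python
from fractions import Fraction

def rho_dyadic(k, n, p):
    """Measure of [k/2^n, (k+1)/2^n]: the probability that the first n
    tosses spell out k in binary (padded with zeroes on the left)."""
    if not 0 <= k < 2 ** n:
        raise ValueError("need 0 <= k < 2^n")
    bits = format(k, "b").zfill(n)
    ones = bits.count("1")
    zeros = n - ones
    return p ** ones * (1 - p) ** zeros

def rho_interval(k, l, n, p):
    """Measure of [k/2^n, (l+1)/2^n], as a sum of dyadic pieces."""
    return sum(rho_dyadic(i, n, p) for i in range(k, l + 1))

p = Fraction(1, 3)
print(rho_dyadic(2, 3, p))       # sequences starting with 010
print(rho_interval(4, 5, 4, p))  # sequences starting with 0100 or 0101
print(rho_interval(0, 7, 3, p))  # the whole interval [0, 1]
```

The first two numbers agree, as they should, since 0100 and 0101 together are just 010 followed by anything.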
We are essentially done, since we can approximate any real number as well as we want by using binary fractions[5]. But life is more than just binary fractions, so I’ll show explicitly how to deal with the interval
$[0,1/3] = [0,0.\bar{01}]$

The key thing is to choose a nice sequence of binary fractions $a_n$ that converges to $1/3$. It is convenient to use a monotonically increasing sequence, because then we don’t need to worry about minus signs. If furthermore the sequence starts with $0$, then $[0,1/3] = \bigcup_{n\in \mathbb N} [a_n,a_{n+1}]$ and
$\rho([0,1/3]) = \sum_{n\in \mathbb N} \rho([a_n,a_{n+1}])$
An easy sequence that does the job is $(0,0.01,0.0101,0.010101,\ldots)$. It lets us write the interval as
$[0,1/3] = [0.00, 0.00\bar{1}] \cup [0.0100, 0.0100\bar{1}] \cup [0.010100, 0.010100\bar{1}] \cup \ldots$
which gives us a simple interpretation of $\rho([0,1/3])$: it is the probability of obtaining a sequence of outcomes starting with 00, or 0100, or 010100, etc. The formula for the measure of $[a_n,a_{n+1}]$ is also particularly simple:
$\rho([a_n,a_{n+1}]) = p^{n-1}(1-p)^{n+1}$
so the measure of the whole interval is just a geometric series:
$\rho([0,1/3]) = (1-p)^2\sum_{n\in\mathbb N} \big(p(1-p)\big)^{n-1} = \frac{(1-p)^2}{1-p(1-p)}$
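If you don’t trust my geometric series, the following Python sketch compares a truncated sum with the closed form. The helper names and the cutoff of 200 terms are arbitrary choices of mine.

```python
def rho_one_third(p, terms=200):
    """Truncated sum of rho([a_n, a_{n+1}]) = p^(n-1) (1-p)^(n+1) over n = 1, 2, ..."""
    return sum(p**(n - 1) * (1 - p)**(n + 1) for n in range(1, terms + 1))


def closed_form(p):
    """The geometric series summed by hand."""
    return (1 - p)**2 / (1 - p * (1 - p))


for p in (0.25, 0.5, 0.9):
    print(p, rho_one_third(p), closed_form(p))
```

For $p = 1/2$ the measure $\rho$ is just the length of the interval, so both columns should come out as $1/3$.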

It might feel like something is missing because we haven’t examined irrational numbers. Well, not really, because the technique used to do $1/3$ clearly applies to them, as we only need a binary expansion of the desired irrational. But still, this is not quite satisfactory, because the irrationals that we know and love like $1/e$ or $\frac{2+\sqrt2}4$ have a rather complicated and as far as I know patternless binary expansion, so we will not be able to get any nice formula for them. On the other hand, one can construct some silly irrationals like the binary Liouville constant
$\ell = \sum_{n\in\mathbb N} 2^{-n!} \approx 0.110001000000000000000001$
whose binary expansion is indeed very simple: every $n!$th binary digit is a one, and the rest are zeroes. The measure of the $[0,\ell]$ interval is then
$\rho([0,\ell]) = \sum_{n\in \mathbb N} \left(\frac{p}{1-p}\right)^{n-1} (1-p)^{n!}$
which I have no idea how to sum (except for the case $p=1/2$ ;)
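Not being able to sum it in closed form doesn’t stop us from summing it numerically, since the terms die off absurdly fast. Here is a small Python sketch; the function name and the cutoff of six terms are my own choices (the seventh term already underflows double precision for $p=1/2$).

```python
from math import factorial


def rho_liouville(p, terms=6):
    """Partial sum of (p/(1-p))^(n-1) * (1-p)^(n!) for n = 1, ..., terms."""
    return sum((p / (1 - p))**(n - 1) * (1 - p)**factorial(n)
               for n in range(1, terms + 1))


# The binary Liouville constant itself, truncated at the same number of terms
ell = sum(2.0**(-factorial(n)) for n in range(1, 7))

print(rho_liouville(0.5), ell)  # for p = 1/2 the measure of [0, ell] is just ell
print(rho_liouville(0.25))
```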

But I feel that something different is still missing. We have constructed a probability measure over the set of coin tosses, but what I’m used to thinking of as “the probability” for uncountable sets is the probability density, and likewise I’m used to visualizing a probability measure by making a plot of its density. Maybe one can “derive” the measure $\rho$ to obtain a probability density over the set of coin tosses? After all, the density is a simple derivative for well-behaved measures, or the Radon-Nikodym derivative for more naughty ones. As it turns out, $\rho$ is too nasty for that. The only condition that a probability measure needs to satisfy in order to have a probability density is that it needs to attribute measure zero to every set of Lebesgue measure zero, and $\rho$ fails this condition whenever $p \neq 1/2$. To show that, we shall construct a set $E$ such that its Lebesgue measure $\lambda(E)$ is zero, but $\rho(E)=1$.

Let $E_n$ be the set of infinite sequences that start with an $n$-bit sequence that contains at most $k$ ones[2]. Then
$\rho(E_n) = \sum_{i=0}^k \binom ni p^i (1-p)^{n-i}$
and
$\lambda(E_n) = 2^{-n} \sum_{i=0}^k \binom ni$
These formulas might look nasty if you haven’t fiddled with entropies for some time, but they actually have rather convenient bounds, which are valid for $p < k/n < 1/2$:
$\rho(E_n) \ge 1 - 2^{-n D\left( \frac kn || p\right)}$
and
$\lambda(E_n) \le 2^{-n D\left( \frac kn || \frac 12\right)}$
where $D(p||q)$ is the relative entropy of $p$ with respect to $q$. They show that if $k/n$ is smaller than $1/2$ then $\lambda(E_n)$ is rather small (loosely speaking, the number of sequences whose fraction of ones is strictly less than $1/2$ is rather small), and that if $k/n$ is larger than $p$ then $\rho(E_n)$ is rather close to one (so again loosely speaking, what this measure does is weight the counting of sequences towards $p$ instead of $1/2$: if $k/n$ were smaller than $p$ then $\rho(E_n)$ would also be rather small).
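To get a feeling for how quickly these bounds bite, here is a Python sketch that evaluates both measures exactly and compares them with the bounds. I’m assuming the relative entropy is measured in bits (base-2 logarithm), which is what the powers of two suggest; the function names and the values $p = 1/4$, $n = 200$ are just illustrative.

```python
from math import comb, log2


def rel_entropy(a, b):
    """Binary relative entropy D(a||b) in bits, for 0 < a, b < 1."""
    return a * log2(a / b) + (1 - a) * log2((1 - a) / (1 - b))


def rho_E(n, k, p):
    """rho(E_n): probability of at most k ones among the first n tosses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def lam_E(n, k):
    """lambda(E_n): fraction of n-bit strings with at most k ones."""
    return sum(comb(n, i) for i in range(k + 1)) / 2**n


p, n = 0.25, 200
k = int(n * (p + 0.5) / 2)  # k/n sits halfway between p and 1/2

print(rho_E(n, k, p), 1 - 2**(-n * rel_entropy(k / n, p)))  # measure, lower bound
print(lam_E(n, k), 2**(-n * rel_entropy(k / n, 0.5)))       # measure, upper bound
```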

If we now fix $k/n$ in this sweet range (e.g. by setting $k = \lfloor n(p + 0.5)/2\rfloor$)[3] then
$E = \bigcap_{i \in \mathbb N} \bigcup_{n \ge i} E_n,$
is the set we want, some weird kind of limit of the $E_n$. Then I claim, skipping the boring proof, that
$\rho(E) = 1$
and
$\lambda(E) = 0$

But don’t panic. Even without a probability density, we can still visualize a probability measure by plotting its cumulative distribution function
$f(x) = \rho([0,x])$
which for $p = 1/4$ is this cloud-like fractal:
[Figure: plot of $f(x)$ for $p = 1/4$]
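If you want to draw the curve yourself, the cumulative distribution function can be read off from the binary expansion of $x$, one bit at a time: every time a bit of $x$ is one, all the sequences that have a zero at that position lie to the left of $x$. Here is a Python sketch of this idea; the name `cdf` and the truncation at 60 bits are my own choices.

```python
def cdf(x, p, depth=60):
    """f(x) = rho([0, x]), computed from the first `depth` binary digits of x."""
    total = 0.0
    weight = 1.0  # measure of the dyadic interval (the prefix) we are currently inside
    for _ in range(depth):
        x *= 2
        if x >= 1:  # next bit of x is 1: the half with a 0 here lies entirely to the left
            total += weight * (1 - p)
            weight *= p
            x -= 1
        else:       # next bit of x is 0: we just descend into the left half
            weight *= 1 - p
    return total


# Sample the curve on a grid, ready to be fed to your favourite plotting tool
points = [(i / 1000, cdf(i / 1000, 0.25)) for i in range(1001)]
print(cdf(0.5, 0.25), cdf(1 / 3, 0.25))
```

As a consistency check, `cdf(1/3, p)` should agree with the closed form for $\rho([0,1/3])$, and for $p=1/2$ the function is just $f(x) = x$.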

# Crackpots in my inbox

Often people ask me why I’m not more open-minded about ideas that defy the scientific consensus. Maybe global warming is just a conspiracy? Maybe Bell’s theorem is in fact wrong? Maybe the EmDrive does provide thrust without using propellant? Maybe the E-Cat can make cold fusion? I mean, it is not logically impossible for some outsider to be correct while the entire scientific community is wrong. Wasn’t Galileo burned at the stake (sic) for defying the scientific consensus? Why should I then dismiss this nonsense outright, without reading it through and considering it carefully?

Well, for starters the scientific method has advanced a lot since the time of Galileo. Instead of asserting dogma we are busy looking at every tiny way experiment can deviate from theory. And if you do prove the theory wrong, you do not get burned at the stake (sic), but get a Nobel Prize (like the prize given for the discovery of neutrino oscillations in 2015). So I’m naturally very suspicious of outsiders claiming to have found glaring mistakes in the theory.

But the real problem is the sheer number of would-be Galileos incessantly spamming researchers about their revolutionary theories (despite not being exactly famous, I get to join the fun because they usually write to every academic email address they find online. I can only wonder what Stephen Hawking’s inbox looks like). It is already a lot of work to keep myself up to date with the serious papers in my field. Imagine if I also had to read every email that proved Einstein wrong?

Without further ado, I’d like to illustrate this point by showing here the most entertaining crackpots that have spammed me:

Probably the most well-known is Gabor Fekete, who has a truly amazing website to expound his theories (don’t forget to press Ctrl or click with the right button of the mouse while you’re there!). Apparently he doesn’t like the square root in the Lorentz factor, and has a nice animation showing it being erased. If you do that I guess you’ll be able to explain all of physics with eight digits accuracy. He has recently taken to spoofing his emails to make it look like they were sent by Nobel laureates, probably thinking that his theories would be accepted if they came from a famous source. While the forgery itself was well-made (one needs to look carefully at the source code of the email to detect it), the content of the email kind of gives it away. Maybe if he had spent his time studying physics instead of the SMTP protocol…

Another persistent spammer is Sorin Cosofret, who started a newsletter about his theories to unwilling subscribers. They are about classical electromagnetism, relativity, quantum mechanics, planetary dynamics, cosmology, chemistry… apparently everything is wrong, but he knows how to correct it. He also has a website that, if not as flashy as Gabor Fekete’s, is at least available in Romanian, English, French, German, and Spanish.

A more aggressive one is stefan:sattler, who has a problem with the known laws of planetary mechanics, and wants the scientific community to help in publicising his “Sattler’s Law of planetary mechanics”. After sending 5 emails in one month he lost his patience, and gave us 48 hours to do it, threatening to publish all our names and email addresses if we didn’t (you know, the names and email addresses that are publicly available). He told us:

Go now and REPENT – go now and try to offer redemption for the guilt and responsibility you all have loaded upon your shoulders.

Time is ticking – you have 48 hours – the JUDGEMENTS ARE BEING WRITTEN RIGHT NOW…..

I haven’t heard from him since.

More recently, I got an email from an anonymous crackpot who maintains a prolific YouTube channel in Croatian dedicated to showing that the Earth is flat. It was entertaining to see that the crackpot sent me emails to both my University of Vienna address and to my University of Cologne address, each signed as a different person pretending to be interested in whether the videos were correct.

If you want to defy the scientific consensus, first study it for a few years. Then publish a peer-reviewed paper (reputable journals do accept some pretty outlandish stuff). Then I’ll listen to you.

# My shortest research program ever

$t=0:00$

SIMPLICIO: These quantum gravity people! Always claiming that the world is fundamentally discrete! It’s so stupid!
INGENUO: Humm why is it stupid? They do have good reasons to think that.
SIMPLICIO: But come on, even the most discrete thing ever, the qubit, already needs continuous parameters to be described!
INGENUO: Well, yes, but it’s not as if you can take these parameters seriously. You can’t really access them with arbitrary precision.
SIMPLICIO: What do you mean? They are continuous! I can make any superposition between $\ket{0}$ and $\ket{1}$ that I want, there are no holes in the Bloch sphere, or some magical hand that will stop me from producing the state $\sin(1)\ket{0} + \cos(1)\ket{1}$ as precisely as I want.
INGENUO: Yeah, but even if you could do it, what’s the operational meaning of $\sin(1)\ket{0} + \cos(1)\ket{1}$? It’s not as if you can actually measure the coefficients back. The problem is that if you estimate the coefficients by sampling $n$ copies of this state the number of bits you get goes like $\frac12\,\log(n)$. And this is just hopeless. Even if you have some really bright source that produces $10^6$ photons per second and you do some black magic to keep it perfectly stable for a week, you only get something like 20 bits. So operationally speaking you might as well write
$0.11010111011010101010\ket{0} + 0.10001010010100010100\ket{1}$
SIMPLICIO: Pff, operationally. Operationally it also makes no difference whether the remains of Galileo are still inside Jupiter or not. It doesn’t mean I’m going to assume they magically disappeared. Same thing about the 21st bit. It’s there, even if you can’t measure it.
INGENUO: I would take lessons from operational arguments more seriously. You know, Einstein came up with relativity by taking seriously the idea that time is what a clock measures.
SIMPLICIO: ¬¬. So you are seriously arguing that there might be only 20 bits in a qubit.
INGENUO: Yep.
SIMPLICIO: Come on. Talk is cheap. If you want to defend that you need to come up with a toy theory that is not immediately in contradiction with experiment where the state of a qubit is literally encoded in a finite number of bits.
INGENUO: Hmmm. I need to piss about it. (Goes to the bathroom)
$t = 0:10$
INGENUO: Ok, so if we have $b$ bits we can encode $2^b$ different states. And as long as $b$ is large enough and these states are more-or-less uniformly spread around the Bloch sphere we should be able to model any experiment as well as we want. So we only need to find some family of polyhedrons with $2^b$ vertices that tend to a sphere in the limit of infinite $b$ and we have the qubit part of the theory!
SIMPLICIO: Hey, not so fast! How about the transformations that you can do on these states? Surely you cannot allow unitaries that would map one of these $2^b$ states to some state not encoded in your scheme.
INGENUO: Ok…
SIMPLICIO: So you have some set of allowed transformations that is not the set of all unitaries. And this set of allowed transformations clearly must satisfy some basic properties, like you can compose them and you do not get outside of the set, and it must always be possible to invert any of the transformations.
INGENUO: Yeah, sure. But what are you getting at?
SIMPLICIO: Well, they must form a group. A subgroup of $U(2)$, to be more precise. And since we don’t care about the global phase, make it a subgroup of $SU(2)$, for simplicity.
INGENUO: Oh. Well, we just need to check which are the subgroups of $SU(2)$, surely we’ll find something that works. (Both start reading Wikipedia.)
$t=0:20$
SIMPLICIO: Humm, so it turns out that the finite subgroups of $SO(3)$ are rather lame. You either have the symmetry groups of the platonic solids, which are too finite, or two families of subgroups that can get arbitrarily large, the cyclic and the dihedral groups.
INGENUO: Argh. What are these things?
SIMPLICIO: The cyclic group is just the rotations of the sphere by some rational angle around a fixed axis, and the dihedral group is just the cyclic group together with a reflection along the same axis. So you can put your states either in the vertices of a polygon inscribed in the equator of the Bloch sphere, or in the vertices of a prism.
INGENUO: Ugh. They are not nearly as uniform as I hoped. So I guess the best one can do is put the states in the vertices of a dodecahedron.
SIMPLICIO: Beautiful. So instead of 20 bits you can have 20 states. Almost there!
$t=0:21$
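For the record, INGENUO’s figure of about 20 bits is easy to reproduce. The Python sketch below just redoes the arithmetic from the dialogue, under the assumption that the logarithm is base 2 and taking the source rate and the one-week duration at face value.

```python
from math import log2

photons_per_second = 10**6
seconds_per_week = 7 * 24 * 3600

n = photons_per_second * seconds_per_week  # copies of the state measured in one week
bits = 0.5 * log2(n)                       # the (1/2) log(n) scaling from the dialogue

print(n, bits)
```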

# The sleeping beauty problem: a foray into experimental metaphysics

One of the most intriguing consequences of Bell’s theorem is the idea that one can do experimental metaphysics: to take some eminently metaphysical concepts such as determinism, causality, and free will, and extract from them actual experimental predictions, which can be tested in the laboratory. The results of said tests can then be debated forever without ever deciding the original metaphysical question.

It was with such ideas in mind that I learned about the Sleeping Beauty problem, so I immediately thought: why not simply do an experimental test to solve the problem?

The setup is as follows: you are the Sleeping Beauty, and today is Sunday. I’m going to flip a coin, and hide the result from you. If the coin falls on heads, I’m going to give you a sleeping pill that will make you sleep until Monday, and terminate the experiment after you wake up. If it falls on tails instead, I’m also going to give you the pill that makes you sleep until Monday, but after your awakening I’m going to give you a second pill that erases your memory and makes you sleep until Tuesday. At each awakening I’m going to ask you: what is the probability[4] that the coin fell on tails?

There are two positions usually defended by philosophers:

1. $p(T) = 1/2$. This is defended by Lewis and Bostrom, roughly because before going to sleep the probability was assumed to be one half (i.e. that the coin is fair), and by waking up you do not learn anything you didn’t know before, so the probability should not change.
2. $p(T) = 2/3$. This is defended by Elga and Bostrom, roughly because the three possible awakenings (heads on Monday, tails on Monday, and tails on Tuesday) are indistinguishable from your point of view, so you should assign all of them the same probability. Since in two of them the coin has fallen on tails, the probability of tails must be two-thirds.

Well, seems like the perfect question to answer experimentally, no? Give drugs to people, and ask them to bet on the coins being heads or tails. See who wins more money, and we’ll know who is right! There are, however, two problems with this experiment. The first is that it is not so easy to erase people’s memories. Hitting them hard on the head or giving them enough alcohol usually does the trick, but it doesn’t work reliably, and I don’t know where I could find volunteers that thought the experiment was worth the side effects (brain clots or a massive hangover). And, frankly, even if I did find volunteers (maybe overenthusiastic philosophy students?), these methods are just too grisly for my taste.

Luckily a colleague of mine (Marie-Christine) found an easy solution: just require people to place their bets in advance. Since they are not supposed to be able to know in which of the three awakenings they are, it makes no sense for them to bet differently in different awakenings (in fact, they should even be unable to bet differently on different awakenings without access to a random number generator. Whether they have one in their brains is another question). So if you decide to bet on heads, and then “awake” on Tuesday, too bad, you have to make the bad bet anyway.

With that solved, we get to the second problem: it is not rational to ever bet on heads. If you believe that the probability is $1/2$ you should be indifferent between heads and tails, and if you believe that the probability is $2/3$ you should definitely bet on tails. In fact, if you believe that the probability is $1/2$ but have even the slightest doubt that your reasoning is correct, you should bet on tails anyway just to be on the safe side.

This problem can be easily solved, simply by biasing the coin a bit towards heads, such that the probability of heads (if you believed in $1/2$) is now slightly above one half, while keeping the probability of tails (if you believed in $2/3$) still above one half. To calculate the exact numbers we use a neat little formula from Sebens and Carroll, which says that the probability of you being the observer labelled by $i$ within a set of observers with identical subjective experiences is
$p(i) = \frac{w_i}{\sum_j w_j},$
where $w_i$ is the Born-rule weight of your situation, and the $w_j$ are the Born-rule weights of all observers in the subjectively-indistinguishable situation.

Let’s say that the coin has an (objective, quantum, given by the Born rule) probability $p$ of falling on heads. The probability of being one of the tails observers is then simply the sum of the Born-rule weight of the Monday tails observer (which is simply $1-p$) with the Born-rule weight of the Tuesday tails observer (also $1-p$), divided by the sum of the Born-rule weights of all three observers ($1-p$, $1-p$, and $p$), so
$p(T) = \frac{2(1-p)}{2(1-p) + p}.$
For elegance, let’s make this probability be equal to the objective probability of the coin falling on heads, so that both sides of the philosophical dispute will bet on their preferred solution with the same odds. Solving $p = (2 - 2p)/(2-p)$, that is, $p^2 - 4p + 2 = 0$ with $0 \le p \le 1$, gives us then
$p = 2-\sqrt{2} \approx 0.58,$
which makes the problem quantum, and thus on topic for this blog, since it features the magical $\sqrt2$.[2]
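A quick numerical check of this fixed point, as a Python sketch. The function name `p_tails_thirder` is mine; it simply encodes the formula for $p(T)$ above.

```python
from math import sqrt


def p_tails_thirder(p_heads):
    """Credence of being a tails observer, for a coin with Born-rule weight p_heads on heads."""
    w_heads = p_heads       # the single heads observer (Monday)
    w_tails = 1 - p_heads   # each of the two tails observers (Monday and Tuesday)
    return 2 * w_tails / (2 * w_tails + w_heads)


# p = (2 - 2p)/(2 - p) is equivalent to p^2 - 4p + 2 = 0, whose root in [0, 1] is 2 - sqrt(2)
p = 2 - sqrt(2)

print(p, p_tails_thirder(p))  # the two numbers should coincide
print(p_tails_thirder(0.5))   # a fair coin gives back the usual two-thirds
```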

With all this in hand, time to do the experiment. I gathered 17 impatient hungry physicists in a room, and after explaining all of this to them, I asked them to bet on either heads or tails. The deal was that the bet was a commitment to buy, in each awakening, a ticket that would pay them 1€ in case they were right. Since the betting odds were set to be $0.58$, the price for each ticket was 0.58€.

After each physicist committed to a bet, I ran my biased quantum random number generator (actually just the function rand from Octave with the correct weighting), and cashed the bets (once when the result was heads, twice when the result was tails).

There were four possible situations: if the person bet on tails and the result was tails, they paid me 1.16€ for the tickets and got 2€ back, netting 0.84€ (this happened 4 times). If the person bet on heads and the result was tails, they paid me 1.16€ again, but got nothing back, netting -1.16€ (this happened 2 times). If the person bet on tails and the result was heads, they paid me 0.58€ for the ticket and got nothing back, netting -0.58€ (this happened 4 times). Finally, if the person bet on heads and the result was heads, they paid 0.58€ for the ticket and got 1€ back, netting 0.42€ (this happened once).
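The bookkeeping of these four situations fits in a few lines of Python. This is only a sketch of the rules as I described them (one ticket per awakening, two awakenings if the coin falls on tails); the names `PRICE`, `PAYOUT` and `net` are made up for the illustration.

```python
PRICE = 0.58   # price of one ticket, in euros
PAYOUT = 1.00  # what a winning ticket pays, in euros


def net(bet, result):
    """Net winnings of one participant who committed to `bet` when the coin shows `result`."""
    awakenings = 2 if result == "tails" else 1  # tails means Monday and Tuesday
    won = PAYOUT * awakenings if bet == result else 0.0
    return won - PRICE * awakenings


for bet in ("tails", "heads"):
    for result in ("tails", "heads"):
        print(bet, result, round(net(bet, result), 2))
```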

So on average the people who bet on tails profited 0.13€, while the people who bet on heads lost 0.61€. The prediction of the $2/3$ theory was that they should profit nothing when betting on tails, and lose 0.16€ when betting on heads. The prediction of the $1/2$ theory was the converse: whoever bets on tails loses 0.16€, while whoever bets on heads breaks even. In the end the match was not that good, but still the data clearly favours the $2/3$ theory. Once again, physics comes to the rescue of philosophy, solving experimentally a long-standing metaphysical problem!
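The two sets of predictions can be reproduced with the Python sketch below. Note the assumption I’m making here: the predicted profits are read as expected profit per ticket, with the rounded value $0.58$ used both as the ticket price and as the probability each theory assigns to its favourite outcome. The names are mine.

```python
PRICE = 0.58  # ticket price, equal to the betting odds used in the experiment


def expected_profit_per_ticket(prob_win):
    """Expected profit of a ticket that costs PRICE and pays 1 euro with probability prob_win."""
    return prob_win * (1 - PRICE) - (1 - prob_win) * PRICE


# 2/3 theory: at each awakening the coin shows tails with probability ~0.58
thirder = {"tails": expected_profit_per_ticket(0.58),
           "heads": expected_profit_per_ticket(0.42)}

# 1/2 theory: the coin shows heads with probability ~0.58
halfer = {"heads": expected_profit_per_ticket(0.58),
          "tails": expected_profit_per_ticket(0.42)}

print(thirder)
print(halfer)
```

Each theory says you break even when betting on its favourite outcome and lose about 0.16€ per ticket otherwise, which is just the difference between the winning probability and the price.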

Speaking more seriously, of course the philosophers knew, since the first paper on the subject, that the experimental results would be like this, and that is why nobody bothered to do the experiment. They just thought that this was not a decisive argument, as the results are determined by how you operationalise the Sleeping Beauty problem, and the question was always about what is the correct operationalisation (or, in other words, what probability is supposed to be). Me, I think that whatever probability is, it should be something with a clear operational meaning. And since I don’t know any natural operationalisation that will give the $1/2$ answer, I’m happy with the $2/3$ theory.

# Understanding Bell’s theorem part 3: the Many-Worlds version

This post is based on discussions with Harvey Brown, Eric Cavalcanti, and Nathan Walk. At least one of them peacefully disagrees with everything written here.

After going through two versions of Bell’s theorem, one might hope to be done with it. Well, this was the situation in 1975, and judging by the huge amount of literature produced since then about Bell’s theorem, I think it is clear that the scientific community is far from being done with it. Why is that so? One reason is that many people really don’t want to give up any of the assumptions behind the simple version of Bell’s theorem: they are used to classical mechanics, which offers them a world with determinism and no action at a distance, and they want to keep it that way. But if you ask the quantum community more specifically, they do not lose any sleep over the simple version: they are happy to give up determinism and keep no action at a distance. Instead, the real thorn in their side is the failure of local causality. It is after all a well-motivated locality assumption that, even if it is not demanded by relativity, seems to be a plausible extrapolation from it. Furthermore, the failure of local causality is not even a brute experimental fact that people must just accept and be done with it. To see that your probabilities have changed as a result of a measurement done in a space-like separated region you need to know the result of said measurement. And then it is not space-like separated anymore, it has moved to your past light cone.

But this is just an abstract complaint about the theorem, that doesn’t suggest any obvious solution. A more concrete problem, which is much easier to address, is that both the simple and the nonlocal versions blissfully ignore the Many-Worlds interpretation. Even if you don’t find this interpretation compelling, it is taken seriously by a big part of the scientific community, and I don’t think it is defensible to simply ignore it when discussing the foundations of quantum mechanics.

So how do we reformulate Bell’s theorem to take the Many-Worlds interpretation into account? On this point the literature is rather disappointing, as nobody seems to have tried to do that. The papers I know either exclude the Many-Worlds interpretation via an explicit assumption, or simply note that Bell’s theorem does not apply to it, as the derivation implicitly assumes that measurements have a single outcome. This is true, but rather unsatisfactory. Should we conclude then that Bell’s theorem is just a mistake? And how about local causality, is it violated or not? And how about quantum key distribution, does it work at all, or do we need to change cryptosystems if we believe that Many-Worlds is true?

Let us start by examining local causality, or more precisely one of the equations we used in the derivation:
$p(a|bxy\lambda) = p(a|x\lambda)$
This says that the probability of Alice obtaining outcome $a$ depends only on her setting $x$ and the physical state $\lambda$, and not on Bob’s setting $y$ or his outcome $b$. We immediately have a problem: what can “Bob’s outcome $b$” possibly mean in the Many-Worlds interpretation? After all if Alice and Bob share an entangled state $\frac{\ket{00}+\ket{11}}{\sqrt2}$, then before Bob’s measurement their joint state is
$\ket{\text{Alice}}\frac{\ket{00}+\ket{11}}{\sqrt2}\ket{\text{Bob}}$
which, after his measurement, becomes
$\ket{\text{Alice}}\frac{\ket{00\text{Bob}_0}+\ket{11\text{Bob}_1}}{\sqrt2}$
So there is no such thing as “Bob’s outcome”. There are two copies of Bob, each seeing a different outcome. Maybe we can then use $b=\frac{\ket{00\text{Bob}_0}+\ket{11\text{Bob}_1}}{\sqrt2}$ in that equation, instead of $b=0$ or $b=1$. Does it work then? Well, there is still the problem that the equation is about the “probability of Alice obtaining outcome $a$”. But we know that there is also no such thing as “Alice’s outcome”: there will be two copies of Alice, each seeing a different outcome. So from a third-person perspective it makes no sense to talk about the “probability of Alice obtaining outcome $a$”. On the other hand, from Alice’s perspective she will experience a single outcome (if you experience more than one outcome, I want to know what you are smoking), so we can talk about probabilities in a first-person, decision-theoretic way. The equation is then about how much Alice should bet on experiencing outcome $a$, or more precisely the maximum she should pay for a ticket that gives her 1€ if the outcome she experiences is $a$.

So, how much should she? Well, the right hand side $p(a|x\lambda)$ is easy to decide: she only knows that she is making a measurement on a half of an entangled state, whose reduced density matrix is $\mathbb{1}/2$. Her probabilities are $1/2$, independently of the basis in which she measures. How about the left hand side of the equation, $p(a|bxy\lambda)$? Well, now she knows in addition that Bob is in the state $\frac{\ket{00\text{Bob}_0}+\ket{11\text{Bob}_1}}{\sqrt2}$ (we are assuming for simplicity that he measured in the Z basis). So what? How does that help her in predicting which outcome she will experience? This state has no bias towards 0 or 1, and there is no more information, outside her future light cone, that could help her make the prediction. This is no surprise, as in the Many-Worlds interpretation whatever Bob does is assumed to be a unitary, and unitaries applied to one half of an entangled state cannot affect the probabilities of measurements on the other half. For Bob’s measurement to affect Alice in any way it would have to cause a collapse of the wave function, and this is precisely what the Many-Worlds interpretation says does not happen. We must therefore conclude that $p(a|bxy\lambda) = 1/2$ and that this bastardised version of local causality is respected.
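The claim that a unitary on Bob’s side cannot change Alice’s probabilities is easy to check numerically. The Python sketch below is a simplification: the unitary acts only on Bob’s qubit (not on Bob himself), and the particular matrix `U` is an arbitrary example I made up. The helper names are mine as well.

```python
import cmath
import math


def apply_on_bob(U, psi):
    """Apply a 2x2 unitary U to Bob's qubit; psi[2*a + b] is the amplitude of |a>_Alice |b>_Bob."""
    return [sum(U[b_new][b] * psi[2 * a + b] for b in range(2))
            for a in range(2) for b_new in range(2)]


def reduced_alice(psi):
    """Alice's reduced density matrix, obtained by tracing out Bob's qubit."""
    return [[sum(psi[2 * a + b] * psi[2 * a2 + b].conjugate() for b in range(2))
             for a2 in range(2)] for a in range(2)]


r = 1 / math.sqrt(2)
phi_plus = [r, 0, 0, r]  # (|00> + |11>)/sqrt(2)

# A hypothetical unitary for Bob: a rotation by 0.7 rad with a relative phase of 0.3 rad
c, s, ph = math.cos(0.7), math.sin(0.7), cmath.exp(0.3j)
U = [[c, -s * ph.conjugate()], [s * ph, c]]

print(reduced_alice(phi_plus))                   # before Bob acts
print(reduced_alice(apply_on_bob(U, phi_plus)))  # after Bob acts
```

Both matrices should come out as $\mathbb{1}/2$ up to rounding, so Alice’s betting odds stay at $1/2$ whatever Bob does.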

Does this imply that Bell inequalities are not violated in the Many-Worlds interpretation? Of course not! To derive them we needed the version of local causality where Bob had a single outcome. Can we still use it in some way? Well, Bob does obtain a single outcome from Alice’s point of view after they interact in the future (and become decohered with respect to each other), so then (and only then) we can talk about the joint probabilities $p(ab|xy)$. As eloquently put by Brown and Timpson:

We can only think of the correlations between measurement outcomes on the two sides of the experiment actually obtaining in the overlap of the future light-cones of the measurement events—they do not obtain before then and—a fortiori—they do not obtain instantaneously.

But at this point in time the assumption of local causality becomes ill-motivated: Bob’s measurement is now in Alice’s past light-cone, and it is perfectly legit for her probabilities to depend on it. The information from it had, after all, to sluggishly crawl the intervening space in order to influence her.

So the nonlocal version of Bell’s theorem simply falls apart in the Many-Worlds interpretation. Can we still derive some version of Bell’s theorem from well-motivated assumptions, or do we need to give up and say that it simply doesn’t make sense? Well, I wouldn’t be writing this post if I didn’t have a solution.

To do it, we start by formalising the version of local causality presented above. It says that Alice’s probability of experiencing outcome $a$ depends only on stuff in her past light-cone $\Lambda,$ and not on anything else in the entire region $\Gamma$ outside her future light-cone.

• Generalised local causality:  $p(a|\Gamma) = p(a|\Lambda)$.

Note that we had to condition on the entire region $\Gamma$ instead of only on Bob’s lab because the state $\frac{\ket{00\text{Bob}_0}+\ket{11\text{Bob}_1}}{\sqrt2}$ is defined in the former region, not in the latter.

I think it is fair to call this generalised local causality because it reduces to local causality if one assumes that Bob’s measurement had a single outcome, via some sort of wavefunction collapse. Note also that in the Many-Worlds interpretation generalised local causality is essentially the same thing as no action at a distance. This is because Many-Worlds is a deterministic theory (not in the sense that the outcome of the measurement is predictable, but in the sense that the post-measurement state is uniquely determined by the pre-measurement state), and therefore conditioning on the post-measurement state doesn’t bring us any additional information. This is not really a surprise, since local causality also reduces to no action at a distance for deterministic theories.

This brings us to the second assumption needed to derive the Many-Worlds version of Bell’s theorem. Since we have now some sort of no action at a distance, one might expect some sort of determinism to do the job and complete the derivation. This is indeed the case, but the terminology here becomes unfortunately confusing, because as explained above Many-Worlds is a deterministic theory, but not in the sense demanded by determinism. The assumption we need is predictability, i.e., that an observer with access to the physical state $\lambda$ can predict the measurement outcomes[3]. As wittily put by Howard Wiseman, determinism means that “God does not play dice”, and predictability means that “God does not let us play dice”. Putting it in a more boring way, we simply write

• Predictability:  $p(ab|xy\lambda) \in \{0,1\}$.

Using then predictability together with generalised local causality we can again prove Bell’s theorem, following the same steps we did for the simple version. The interesting thing is that while generalised local causality is always true, there are some situations where predictability holds and some where it doesn’t, and a violation of a Bell inequality implies that it does not hold.

I think it is instructive to consider some concrete examples to see how this works. The simplest case is where Alice and Bob share a pure product state and Eve knows it. For example their joint state could be
$\ket{\text{Alice}}\ket{00}\ket{\text{Bob}}\ket{\text{Eve}^{00}}$
In this case it is clear that Eve can predict the result of their measurements (in the computational basis) and that therefore they cannot violate any Bell inequality. A slightly less simple case is where they all start in this same state, but Alice and Bob do a measurement in the superposition basis. Now Eve cannot predict the result of the measurement, but Alice and Bob still cannot violate a Bell inequality. This is ok, because predictability is a sufficient condition for a Bell inequality to hold, not a necessary one.

A more interesting case is where Alice and Bob share a maximally entangled state and Eve again knows it:
$\ket{\text{Alice}}\frac{\ket{00}+\ket{11}}{\sqrt2}\ket{\text{Bob}}\ket{\text{Eve}^{\phi^+}}$
Eve’s knowledge doesn’t help her predict the outcome of the measurement, because there is no outcome of the measurement to be predicted. Both outcomes will happen, and eventually decoherence will make two copies of Eve, one in the 00 branch and another in the 11 branch, both ignorant of which branch they are in. In this case Alice and Bob will violate a Bell inequality, and correctly conclude that Eve couldn’t have possibly predicted their outcomes.

The most interesting case is where Alice and Bob share the mixed state $\frac12\ket{00}\bra{00} + \frac12\ket{11}\bra{11}$ and Eve holds its purification. Their joint state is
$\ket{\text{Alice}}\frac{\ket{00}\ket{\text{Eve}^{0}}+\ket{11}\ket{\text{Eve}^{1}}}{\sqrt2}\ket{\text{Bob}}$
and it is clear that both copies of Eve, $\ket{\text{Eve}^{0}}$ and $\ket{\text{Eve}^{1}}$ can predict the result of Alice and Bob’s measurements, and that they cannot violate any Bell inequality. Note that this state represents the case where Eve does her measurement before Alice and Bob. She could also make it after them, it makes no difference.
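To make the contrast between the last two examples concrete, here is a Python sketch that evaluates the CHSH expression for the maximally entangled state and for the mixed state. The choice of the CHSH inequality, the measurement angles (the usual optimal ones for the entangled state, all in the X-Z plane), and the function names are my assumptions for the illustration.

```python
import math


def observable(theta):
    """cos(theta) Z + sin(theta) X, a qubit measurement with outcomes +1 and -1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]


def correlator(psi, ta, tb):
    """<psi| A(ta) (x) B(tb) |psi> for a real two-qubit state with amplitudes psi[2*a + b]."""
    A, B = observable(ta), observable(tb)
    return sum(psi[2 * a + b] * A[a][a2] * B[b][b2] * psi[2 * a2 + b2]
               for a in range(2) for b in range(2)
               for a2 in range(2) for b2 in range(2))


def chsh(corr):
    """CHSH expression for a correlator function corr(ta, tb) at fixed settings."""
    a0, a1, b0, b1 = 0.0, math.pi / 2, math.pi / 4, -math.pi / 4
    return corr(a0, b0) + corr(a0, b1) + corr(a1, b0) - corr(a1, b1)


def entangled(ta, tb):
    r = 1 / math.sqrt(2)
    return correlator([r, 0, 0, r], ta, tb)  # (|00> + |11>)/sqrt(2)


def mixed(ta, tb):
    # Equal mixture of |00> and |11>: average the correlators of the two branches
    return 0.5 * correlator([1, 0, 0, 0], ta, tb) + 0.5 * correlator([0, 0, 0, 1], ta, tb)


print(chsh(entangled))  # should exceed the local bound of 2
print(chsh(mixed))      # should stay below it
```

This only checks one choice of settings, of course; the statement that the mixed state cannot violate any Bell inequality is the general one made in the text.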

This concludes the Many-Worlds version of Bell’s theorem, and my series of posts about it. I hope that they helped clear some of the misunderstandings about it, and that even if you disagree with my conclusions, you would agree that I’m asking the right questions. I’d like to finish with a quotation from the man himself:

The “many world interpretation” seems to me an extravagant, and above all an extravagantly vague, hypothesis. I could almost dismiss it as silly. And yet… It may have something distinctive to say in connection with the “Einstein-Podolsky-Rosen puzzle,” and it would be worthwhile, I think, to formulate some precise version of it to see if this is really so.