## A frequentist’s dream

I’m frequently told that probabilities are the limit of relative frequencies for an infinite number of repetitions. It sounds nice: it defines a difficult concept – probabilities – in terms of a simple one – frequencies – and even gives us a way to measure probabilities, if we fudge the “infinite” part a bit. The problem with this definition? It is not true.

First of all, this limit does not exist. If one makes an infinite sequence of zeroes and ones by throwing a fair coin (fudging away this pesky infinity again), calling the result of the $i$th throw $s_i$, the relative frequency after $n$ throws is
$f_n = \frac1n\sum_{i=1}^{n}s_i.$ What should then $\lim_{n\to\infty}f_n$ be? $1/2$? Why? All sequences of zeros and ones are equally possible – they are even equally probable! What is wrong with choosing the sequence $s = (0,0,0,\ldots)$? Or even the sequence $(0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,\ldots)$, whose frequencies do not converge to any number, but oscillate eternally (between $1/3$ and $2/3$, if the block lengths keep doubling)? If for some reason one chooses a nice sequence like $s=(0,1,0,1,0,1,\ldots)$, for which the frequencies do converge to $1/2$, what is wrong with reordering it to obtain $s' = (s_1,s_3,s_2,s_5,s_7,s_4,\ldots)$ instead, with limit $1/3$?
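To make these examples concrete, here is a minimal Python sketch that computes $f_n$ on finite prefixes of three of the sequences above. The helper names (`running_frequency`, `alternating`, `reordered`, `doubling_blocks`) are my own, and I am assuming the oscillating sequence continues with blocks of doubling length. Finite prefixes can only suggest the limiting behaviour, not prove it.

```python
from fractions import Fraction
from itertools import islice

def running_frequency(bits, n):
    """Relative frequency f_n of ones among the first n terms of an iterable."""
    return Fraction(sum(islice(bits, n)), n)

def alternating():
    """s = (0, 1, 0, 1, ...): zeros at odd positions, ones at even positions."""
    while True:
        yield 0
        yield 1

def reordered():
    """s' = (s_1, s_3, s_2, s_5, s_7, s_4, ...): two odd-indexed terms, then one even-indexed."""
    while True:
        yield 0  # an odd-indexed term of s
        yield 0  # the next odd-indexed term of s
        yield 1  # the next even-indexed term of s

def doubling_blocks():
    """(0, 1,1, 0,0,0,0, ...): alternating blocks whose length doubles (an assumption)."""
    bit, length = 0, 1
    while True:
        for _ in range(length):
            yield bit
        bit, length = 1 - bit, 2 * length
```

Calling `running_frequency(alternating(), 1000)` gives exactly $1/2$ and `running_frequency(reordered(), 3000)` gives exactly $1/3$, while for `doubling_blocks()` the values at the block boundaries $n = 3, 7, 15, 31$ are $2/3$, $2/7$, $2/3$, $10/31$: the running frequency keeps swinging instead of settling.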

No, no, no, you complain. It is true that all sequences are equiprobable, but most of them have limiting frequency $1/2$. Moreover, it is a theorem that the frequencies converge – it is the law of large numbers! How can you argue against a theorem?

Well, what do you mean by “most”? This is already a probabilistic concept! And according to which measure? It cannot be a fixed measure, otherwise it would say that the limiting frequency is always $1/2$, independently of the single-throw probability $p$. On the other hand, if one allows it to depend on $p$, one can indeed define a measure on the set of infinite sequences such that “most” sequences have limiting frequency $p$. A probability measure. So you’re not explaining the single-throw probability in terms of the limiting frequencies, but rather in terms of the probabilities of the limiting frequencies. Which is kind of a problem, if “probability” is what you wanted to explain in the first place. The same problem happens with the law of large numbers. Its statement is that
$\forall \epsilon >0 \quad \lim_{n\to\infty}\text{Pr}(|f_n -p|\ge \epsilon) = 0,$ so it only says that the probability of observing a frequency that differs from $p$ by at least $\epsilon$ goes to $0$ as the number of trials goes to infinity.

But enough with mocking frequentism. Much more eloquent dismissals have already been written, several times over, and as the Brazilian saying goes, one shouldn’t kick a dead dog. Rather, I want to imagine a world where frequentism is true.

What would it take? Well, the most important thing is to make the frequencies converge to the probability in the infinite limit. One also needs, though, the frequencies to be a good approximation to the probability even for a finite number of trials, otherwise empiricism goes out of the window. My idea, then, is to allow the frequencies to fluctuate within some error bars, but never beyond. One could, for example, take the $5\sigma$ standard for scientific discoveries that particle physicists use, and declare it to be a fundamental law of Nature: it is only possible to observe a frequency $f_n$ if
$f_n \in \left(p-5\frac{\sigma}{\sqrt{n}},p+5\frac{\sigma}{\sqrt{n}}\right).$ Trivially, then, $\lim_{n\to\infty}f_n = p$, and even better, if we want to measure some probability within error $\epsilon$, we only need $n \ge 25\sigma^2/\epsilon^2$ trials; since $\sigma \le 1/2$ for a coin, 62500 throws are enough to tomograph any coin within error $10^{-2}$.

In this world, the gambler’s fallacy is not a fallacy, but a law of Nature. If one starts throwing a fair coin and observes 24 heads in a row, it is literally impossible to observe another heads in the next throw. It’s as if there is a purpose pushing the frequencies towards the mean. It captures well our intuition about randomness. It is also completely insane: 25 heads are impossible only at the start of a sequence. If before them one had obtained 24 tails, 25 heads are perfectly fine. Also, it’s not as if 25 heads are impossible because their probability is too low. The probability of 24 heads, one tails, and another heads is even lower.
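The claims in this paragraph are easy to check numerically. Below is a small Python sketch of the hypothetical $5\sigma$ law; the function name `is_allowed` and its parameters are my own, and $\sigma=\sqrt{p(1-p)}$ is the single-throw standard deviation.

```python
from math import sqrt

def is_allowed(heads, n, p=0.5, k=5.0):
    """True if the frequency heads/n lies strictly inside p +/- k*sigma/sqrt(n)."""
    sigma = sqrt(p * (1 - p))         # single-throw standard deviation
    half_width = k * sigma / sqrt(n)  # allowed fluctuation after n throws
    f_n = heads / n
    return p - half_width < f_n < p + half_width
```

For a fair coin `is_allowed(24, 24)` is true but `is_allowed(25, 25)` is false, since the interval after 25 throws is exactly the open interval $(0,1)$. On the other hand `is_allowed(25, 49)` (24 tails followed by 25 heads) and `is_allowed(25, 26)` (24 heads, one tails, one heads) are both true.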

Even worse, if the probability you’re trying to tomograph is the one of obtaining 24 heads followed by one tail, then the frequency $f_1$ must be inside the interval $\left[0,2^{-25}+5\sqrt{2^{-25}(1-2^{-25})}\right)\approx [0,5\cdot 2^{-12.5}),$ which is only possible if $f_1 = 0$. That is, it is impossible to observe tails after observing 24 heads, as it would make $f_1=1$, but it is also impossible to observe heads. So in this world Nature would need to keep track not only of all the coin throws, but also which statistics you are calculating about them, and also find a way to keep you from observing contradictions, presumably by not allowing any coin to be thrown at all.


## Mixed states and true randomness

Recently two nice papers appeared on the arXiv, the more recent by Galley and Masanes, and the older by López Grande et al. They are both – although a bit indirectly – about the age-old question of the equivalence between proper and improper mixtures.

A proper mixture is when you prepare the states $\ket{0}$ and $\ket{1}$ with probability $p$ and $1-p$, obtaining the density matrix
$\rho_\text{proper} = p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}.$ An improper mixture is when you prepare the entangled state $\sqrt{p}\ket{0}\ket{0} + \sqrt{1-p}\ket{1}\ket{1}$ and discard the second subsystem, obtaining the density matrix $\rho_\text{improper} = p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}.$ The question is then why do these different preparation procedures give rise to the same statistics (and therefore it is legitimate to represent them with the same density matrix).

Well, do they? I’m not so sure about that! The procedure to prepare the proper mixture is rather vague, so we can’t really answer whether it is appropriate to represent it via the density matrix $\rho_\text{proper}$. To remove the vagueness, I asked an experimentalist how she prepared the state $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$ that was necessary for an experiment. “Easy”, she told me, “I prepared $n$ copies of $\ket{0}$, $n$ copies of $\ket{1}$, and then combined the statistics.”

This sounds like preparing the state $\ket{0}^{\otimes n} \otimes \ket{1}^{\otimes n}$, not like preparing $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$. Do they give the same statistics? Well, if I measure all states in the $Z$ basis, exactly $\frac12$ of the results will be $0$. But if I measure $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$ in the $Z$ basis $2n$ times, the probability that $\frac12$ of the results are $0$ is
$\frac{1}{2^{2n}} {2n \choose n} \approx \frac{1}{\sqrt{n\pi}},$ so just by looking at this statistic I can guess with high probability which was the preparation. It is even easier to do that if I disregard her instructions and look at the order of the results: getting $n$ zeroes followed by $n$ ones is a dead giveaway.
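As a quick sanity check of this estimate, the following Python sketch compares the exact probability with the approximation $1/\sqrt{n\pi}$. The function names are mine, and `math.comb` needs Python 3.8 or later.

```python
from math import comb, pi, sqrt

def prob_exactly_half(n):
    """Probability of getting exactly n zeros in 2n fair Z-basis measurements."""
    return comb(2 * n, n) / 2 ** (2 * n)

def stirling_estimate(n):
    """The large-n approximation 1/sqrt(n*pi) quoted in the text."""
    return 1 / sqrt(pi * n)

for n in (1, 10, 100, 1000):
    print(n, prob_exactly_half(n), stirling_estimate(n))
```

Already for $n=100$ the two values agree to about three significant digits, and both are below $6\%$, so seeing exactly half zeros is strong evidence for the experimentalist's deterministic preparation.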

Maybe one should prepare these states using a random number generator instead? If one uses the function rand() from MATLAB to decide whether to prepare $\ket{0}$ or $\ket{1}$ at each round, one can easily pass the two randomness tests I mentioned above. Maybe it can even pass all common randomness tests available in the literature; I don’t know how good rand() is. It cannot, however, pass all randomness tests, as rand() is a deterministic algorithm using a finite seed, and is therefore restricted to outputting computable sequences of bits. One can, in fact, attack it, and this is the core of the paper of López Grande et al., showing how one can distinguish a sequence of bits that came from rand() from a truly random one. More generally, even the best pseudorandom number generators we have are designed to be indistinguishable from truly random sources only by polynomial-time tests, and fail against exponential-time algorithms.

Clearly pseudorandomness is not enough to generate proper mixtures; how about true randomness instead? Just use a quantum random number generator to prepare bits with probabilities $p$ and $1-p$, and use these bits to prepare $\ket{0}$ or $\ket{1}$. Indeed, this is what people do when they are serious about preparing mixed states, and the statistics really are indistinguishable from those of improper mixtures. But why? To answer that, we need to model the quantum random number generator physically. We start by preparing a “quantum coin” in the state
$\sqrt{p}\ket{H}+\sqrt{1-p}\ket{T},$ which we should measure in the $\{\ket{H},\ket{T}\}$ basis to generate the random bits. Going to the Church of the Larger Hilbert Space, we model the measurement as
$\sqrt{p}\ket{H}\ket{M_H}+\sqrt{1-p}\ket{T}\ket{M_T},$ and conditioned on the measurement we prepare $\ket{0}$ or $\ket{1}$, obtaining the state
$\sqrt{p}\ket{H}\ket{M_H}\ket{0}+\sqrt{1-p}\ket{T}\ket{M_T}\ket{1}.$ We then discard the quantum coin and the measurement result, obtaining finally
$p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1},$ which is just the desired state, but now it is an improper mixture. So, at least in the Many-Worlds interpretation, there is no mystery about why proper and improper mixtures are equivalent: they are physically the same thing!
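The last step, discarding the coin and the measurement record, is just a partial trace. Here is a minimal pure-Python sketch of it; the dictionary encoding of the state and the names `reduced_state` and `branching_state` are assumptions of mine, chosen only for illustration.

```python
from math import sqrt

def reduced_state(amplitudes):
    """Trace out coin and memory; return the 2x2 density matrix of the last qubit.

    `amplitudes` maps (coin, memory, system) labels to amplitudes,
    where `system` is 0 or 1.
    """
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for (c1, m1, s1), a1 in amplitudes.items():
        for (c2, m2, s2), a2 in amplitudes.items():
            if (c1, m1) == (c2, m2):  # labels of the discarded systems must match
                rho[s1][s2] += a1 * a2.conjugate()
    return rho

def branching_state(p):
    """sqrt(p)|H>|M_H>|0> + sqrt(1-p)|T>|M_T>|1> as a label -> amplitude map."""
    return {("H", "M_H", 0): sqrt(p), ("T", "M_T", 1): sqrt(1 - p)}

print(reduced_state(branching_state(0.3)))
```

The off-diagonal entries vanish because the two branches carry orthogonal coin and memory labels, leaving $p$ and $1-p$ on the diagonal (up to floating-point rounding). If both branches shared the same labels, the coherences $\sqrt{p(1-p)}$ would survive.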

(A closely related question, which has a closely related answer, is why is it equivalent to prepare the states $\ket{0}$ or $\ket{1}$ with probability $\frac12$ each, or the states $\ket{+}$ or $\ket{-}$, again with probability $\frac12$? The equivalence fails for pseudorandomness, as shown by López Grande et al.; if we use true randomness instead, we are preparing the states
$\frac1{\sqrt{2}}(\ket{H}\ket{0}+\ket{T}\ket{1})\quad\text{or}\quad\frac1{\sqrt{2}}(\ket{H}\ket{+}+\ket{T}\ket{-})$ and discarding the coin. But note that if one applies a Hadamard to the coin of the first state one obtains the second, so the difference between them is just a unitary on a system that is discarded anyway; no wonder we can’t tell the difference! More generally, any two purifications of the same density matrix must be related by a unitary on the purifying system.)

Galley and Masanes want to invert the question, and ask for which quantum-like theories proper and improper mixtures are equivalent. To be able to tackle this question, we need to define what improper mixtures even are in a quantum-like theory. They proceed by analogy with quantum mechanics: if one has a bipartite state $\ket{\psi}$, and is doing measurements $E_i$ only on the first system, the probabilities one obtains are given by
$p(i) = \operatorname{tr}( (E_i \otimes \mathbb I) \ket{\psi}\bra{\psi} ),$ and the improper mixture is defined as the operator $\rho_\text{improper}$ for which
$p(i) = \operatorname{tr}( E_i \rho_\text{improper})$ for all measurements $E_i$.

In their case, they are considering a quantum-like theory that is still based on quantum states, but whose probabilities are not given by the Born rule $p(i) = \operatorname{tr}(E_i \ket{\phi}\bra{\phi})$, but by some more general function $p(i) = F_i (\ket{\phi})$. One can then define the probabilities obtained by local measurements on a bipartite state as
$p(i) = F_i \star \mathbb I (\ket{\psi}),$ for some composition rule $\star$ and trivial measurement $\mathbb I$, and from that an improper mixture as the operator $\omega_\text{improper}$ such that
$p(i) = F_i (\omega_\text{improper})$ for all measurements $F_i$.

Defining proper mixtures, on the other hand, is easy: if one can prepare the states $\ket{0}$ or $\ket{1}$ with probabilities $p$ and $1-p$, their proper mixture is the operator $\omega_\text{proper}$ such that for all measurements $F_i$
$p(i) = F_i(\omega_\text{proper}) = p F_i(\ket{0}) + (1-p) F_i(\ket{1}).$ That is, easy if one can generate true randomness that is not reducible to quantum-like randomness. I don’t think this makes sense, as one would have to consider a world where reductionism fails, or at least one where quantum-like mechanics is not the fundamental theory. Such non-reducible probabilities are uncritically assumed to exist anyway by people working on GPTs all the time.

Now with both proper and improper mixtures properly defined, one can answer the question of whether they are equivalent: the answer is a surprising no, for any alternative probability rule that respects some basic consistency conditions. This has the intriguing consequence that if we were to modify the Born rule while keeping the rest of quantum mechanics intact, a wedge would be driven between the probabilities that come from the fundamental theory and some “external” probabilities coming from elsewhere. This would put the Many-Worlds interpretation under intolerable strain.

But such an abstract “no” result is not very interesting; I find it much more satisfactory to exhibit a concrete alternative to the Born rule where the equivalence fails. Galley and Masanes propose the function
$F_i(\ket{\psi}) = \operatorname{tr}(\hat F_i (\ket{\psi}\bra{\psi})^{\otimes 2})$ for some positive matrices $\hat F_i$ restricted by their consistency conditions. It is easy to see that the proper mixture of $\ket{0}$ and $\ket{1}$ described above is given by
$\omega_\text{proper} = p \ket{00}\bra{00} + (1-p)\ket{11}\bra{11}.$ In quantum mechanics one would try to make it by discarding half of the state $\sqrt{p}\ket{0}\ket{0} + \sqrt{1-p}\ket{1}\ket{1}$. Here it doesn’t work, as nothing does, but I want to know what it gives us anyway. It is not easy to see that the improper mixture is given by the weirdo
\begin{multline} \omega_\text{improper} = (p^2 + \frac{p(1-p)}{3})\ket{00}\bra{00} + \\ \frac{2p(1-p)}{3} (\ket{01}+\ket{10})(\bra{01}+\bra{10}) + ((1-p)^2 + \frac{p(1-p)}{3})\ket{11}\bra{11}.\end{multline}


## What is cool about quantum teleportation?

This post should be accessible to anyone with a school education.

Everybody™ thinks that quantum teleportation is lame. Even science-loving xkcd thinks quantum teleportation is lame, and has a comic mocking it:

This breaks my heart. I want to actually defend quantum teleportation, and show that Randall Munroe is wrong. Quantum teleportation is not a “particle statistics thing”. It is not worse than classical teleportation. Quantum teleportation is cool.

Let us start with something that everyone agrees is cool: Star Trek teleportation. Some Scottish engineer presses a button, you dematerialize in the spaceship, and instantly rematerialize on the surface of an unexplored planet. This is completely impossible, so we have to get rid of non-essential elements to make it possible.

First of all, the “instant” part of the rematerialization. Einstein doesn’t like it, so we replace it with a suitable light-speed delay. After all, I don’t really care if I get to Mars instantly, or after a 20-minute delay. It is still much better than getting on a rocket and playing Breath of the Wild for months while being bombarded with space radiation. Second and last, the “unexplored planet” part. How the hell are you supposed to rematerialize yourself alone on an unexplored planet? You’ve been dematerialized, probably into photons, there isn’t much you can do. Is the spaceship supposed to do the rebuilding? How? Remotely moving stuff around with atomic precision? Like some kind of long-range optical tweezers? Much easier to just build a rematerialization station on Mars. Yes, this stops you from teleporting to places “Where no man has gone before”, but let’s be honest, do you really want to teleport somewhere without first making sure that the locals don’t find you tasty?

So with these changes, a realistic version of Star Trek teleportation becomes: Some Scottish engineer presses a button, you dematerialize in the spaceship, and after a suitable light-speed delay rematerialize on the surface of an already-explored planet. Easy, right? Sounds like just a glorified 3D printer. The Scottish engineer measures you with exquisitely high precision, sends the data via email, and the 3D printer makes a copy at the destination. And the Scottish engineer shoots you.

Wait what? Why is the Scottish engineer shooting you? And by what right is the 3D-printed copy claiming to be you on Mars? And what if the Scottish engineer does not shoot you, and the 3D-printed copy grabs a rocket back to Earth while you’re still stuck in the spaceship? What if your spouse wants to have sex with the copy? Who gets custody of the kids? What if you want to have sex with your copy? These vexing questions are often asked, but seldom answered.

An unexpected easy-way-out appeared in 1982, when Wootters and Żurek proved the no-cloning theorem. This theorem does what it says on the tin: it shows that it is not possible to clone a quantum state. Problem is, we don’t know if the whole quantum state of a human being is needed to define their interesting properties, or some classical approximation is good enough. But there are people seriously speculating that the whole quantum state is in fact needed, so we’ll assume that this is the case so that we can go on with the story.

What happens in this case? Well now human beings are unclonable, unique. You can still be Star Trek-teleported to Mars, but now instead of measuring and shooting you, the Scottish engineer transfers your quantum state to a bunch of photons using something called a SWAP gate, reducing your body to a shapeless heap of atoms in the process. The photons fly away to the rematerialization station, where a second SWAP gate transfers the quantum state to a suitably prepared shapeless heap of atoms, which then becomes you. The crucial difference is that your original body is destroyed not because you are uncomfortable with the idea of having a copy, or because the Scottish engineer hates you, but because it is fundamentally impossible to do the teleportation otherwise. The process is more akin to moving than to copying, and captures the original Star Trek intuition of transporting people without any unwholesome killing business.

But this is not yet quantum teleportation! So why should we bother with it? What is missing? Well, the problem is that quantum states are rather fragile. Send your quantum state through the Martian atmosphere, and you’ll see what happens. Well, no, you won’t see, because you’ll be dead. Your quantum state will most likely decohere away, and at the rematerialization station only a classical approximation will arrive which, as we assumed above, is not really you.

One can try to encode the quantum state in a way that it is resistant against atmospheric turbulence (and people are working on it), but quantum teleportation offers a more elegant solution: we just need to supply some entangled pairs of particles to the dematerialization and rematerialization stations, and you can be teleported via email, just like before! Good old email, that can be copied, stored, and resent as often as we want. But with the added bonus that once you are dematerialized, you can only be rematerialized once, and this can only happen at the rematerialization station on Mars.

The catch, of course, is how you supply the entangled pairs to start with. Nature allows no cheating: it costs exactly one entangled pair to teleport an entangled pair. Well, one can just try sending them through the atmosphere. Since they don’t encode any information, it is not a problem if we lose half of them. If the atmosphere is so bad that almost none of them gets through, then one can just do it the hard way: save them in a quantum memory, and send them via rocket.

So, is quantum teleportation cool?


## Wallace’s version of the Deutsch-Wallace theorem, part 2

To conclude the proof of Wallace’s version of the Deutsch-Wallace theorem, we shall add the Equivalence theorem from the previous post to a pretty weak decision theory, and show that if you are rational and live in a universe described by the Many-Worlds interpretation, you must bet according to the Born rule.

The first rationality assumption we need is pretty minimal: we only demand Amir to have preferences between games that are in a precise sense coherent. They must be transitive, in the sense that if he would rather vote for Strache than for Kern, and would prefer to vote for Kern over Kurz, then he must choose to vote for Strache over Kurz. He must also have definite preferences about any pair of games: either he thinks that Strache is better than Kurz, or that Kurz is better than Strache, or he is indifferent between them. He is not allowed to say that they are not comparable. Note that we are not judging whether his preferences are politically coherent, or whether voting for Strache is at all a good idea. The axiom is then:

• Ordering: Amir’s preferences between games, written as $G \succeq G'$, define a total order in the set of games: if $G \succeq G'$, and $G' \succeq G''$, then $G \succeq G''$. Moreover, for any two games $G$ and $G'$, either $G \succ G'$, or $G \sim G'$, or $G \prec G'$.

This means that $\succeq$ behaves like the usual $\ge$ relation between real numbers.

The second and last rationality assumption we shall use is rather stronger, but I think still pretty well-justified. We demand that Amir’s preferences between games must remain consistent while he plays: if he prefers game $G$ to game $G'$, he cannot change his mind if $G$ and $G'$ are offered as rewards inside another game:

• Consistency: Let $\alpha \neq 0$, and consider the games $\ket{F} = \alpha\ket{M_0}\ket{G} + \beta\ket{M_1}\ket{z}$ and $\ket{F'} = \alpha\ket{M_0}\ket{G'} + \beta\ket{M_1}\ket{z},$ that differ only on the game given as a reward when the measurement result is $M_0$. Then $F \succeq F'$ iff $G \succeq G'$.

It is easy to check that Consistency actually implies all three relations $F \succ F'$ iff $G \succ G',$ $F \sim F'$ iff $G \sim G',$ and $F \prec F'$ iff $G \prec G'$.

These assumptions, together with Indifference and Substitution, are enough to imply the

• Born rule: Suppose you are rational, and consider the games
$\ket{G} = \sum_i \alpha_i\ket{M_i}\ket{z_i}\quad\text{and}\quad\ket{G'} = \sum_i \beta_i\ket{D_i}\ket{w_i}.$ Then there exists a function $u$ such that
$u(G) = \sum_i |\alpha_i|^2 u(z_i)$ and $G \succ G' \iff u(G) > u(G').$ Moreover, $u$ is unique modulo the choice of a zero and a unity.

This theorem says that you are free to decide your preferences between the rewards: these will define their utility. Your freedom ends here, however: the probabilities that you assign to obtaining said rewards must be given by the Born rule, on pain of irrationality.

A comment is also in order about the uniqueness: the choice of a zero and a unity is analogous to the one that must be done in a temperature scale. In the Celsius scale, for example, the zero is chosen as the freezing point of the water, and the unity as $1/100$ the difference between the freezing point and the boiling point. In the Fahrenheit scale, the zero is chosen as the coldest temperature in Gdańsk’s winter, and the unity as $1/96$ the difference between the temperature of Gdańsk’s winter and the temperature of the blood of a healthy male. In any case, the choice of these two values defines the temperature scale uniquely, and the same is true for utility, as implied by the following theorem:

• Uniqueness: If $u$ is a utility, then $\mathcal F(u)$ is a utility if and only if $\mathcal F(u) = au+b$ for some real numbers $a,b$ such that $a>0$.

The proof of the ‘if’ direction is easy: just note that $\mathcal F(u(G)) = a\sum_i |\alpha_i|^2 u(z_i) + b = au(G)+b,$ and that such positive affine transformations preserve the ordering of real numbers. The proof of the ‘only if’ direction is not particularly hard, but it is a bit longer and I shall skip it. Since the choice of a value for the utility at two rewards $x$ and $y$ is enough to fix $a$ and $b$, the claim follows.

But enough foreplay, now we need to start proving the Born rule theorem in earnest. We’ll build it out of two lemmas: Slider, that says that the weights of a game with rewards $x$ and $y$ behave like a tuner for the preferences, and Closure, that says that as we move this slider we are bound to hit any reward between $x$ and $y$.

• Slider: Let $x$ and $y$ be rewards such that $x \succ y$, and consider the games
$\ket{G} = \sqrt{p}\ket{M_0}\ket{x} + \sqrt{1-p}\ket{M_1}\ket{y}$ and
$\ket{G'} = \sqrt{q}\ket{M_0}\ket{x} + \sqrt{1-q}\ket{M_1}\ket{y}.$ Then $G \succ G'$ iff $p > q$.

Proof: suppose $p > q$. Then we can define the games
$\ket{F} = \sqrt{q}\ket{M_0}\ket{x} + \sqrt{p-q}\ket{M_1}\ket{x} + \sqrt{1-p}\ket{M_2}\ket{y}$ and
$\ket{F'} = \sqrt{q}\ket{M_0}\ket{x} + \sqrt{p-q}\ket{M_1}\ket{y} + \sqrt{1-p}\ket{M_2}\ket{y}.$
Note that the weights of rewards $x$ and $y$ in the game $F$ are $p$ and $1-p$, and in the game $F'$ they are $q$ and $1-q$, so by Equivalence we have that $F \sim G$ and $F' \sim G'$. Since $F$ and $F'$ differ only in the reward for outcome $M_1$, and $x \succ y$, Consistency implies that $F \succ F'$, and transitivity gives us $G \succ G'$. To prove the other direction, note that $p = q$ implies directly that $G \sim G'$, and $p < q$ implies $G \prec G'$ by the flipped argument.

• Closure: Let $x,y$, and $z$ be rewards such that $x \succ y$ and $x \succeq z \succeq y$, and let
$\ket{G_p} = \sqrt{p}\ket{M_0}\ket{x} + \sqrt{1-p}\ket{M_1}\ket{y}.$
Then there exists a unique $p_z$ such that $z \sim \ket{G_{p_z}}.$

Proof: since $\succeq$ is a total order, for all $p$ it must be the case that either $z\succ \ket{G_p}, \quad z \sim \ket{G_p},\quad\text{or}\quad z \prec \ket{G_p}.$ Moreover, Slider tells us that there exists a critical $p_z$ such that
\begin{align*}
p < p_z \quad &\Rightarrow \quad \ket{G_p} \prec z, \\
p > p_z \quad &\Rightarrow \quad \ket{G_p} \succ z.
\end{align*} Then some continuity argument will conclude that $z \sim \ket{G_{p_z}}$.

Now for the main proof: Let $x$ and $y$ be fixed rewards such that $x \succ y$. Set $u(x)$ and $u(y)$ to be any real numbers such that $u(x) > u(y)$, defining the unity and the zero of the utility function. Now because of Closure for every reward $z$ such that $x \succeq z \succeq y$ there will be a unique number $p_z$ such that
$z \sim \sqrt{p_z}\ket{M_0}\ket{x} + \sqrt{1-p_z}\ket{M_1}\ket{y}.$ Define then
$u(z) = p_z u(x) + (1-p_z) u(y).$ We want to show that the utilities so defined do represent the preferences between any two rewards $z$ and $w$ in the sense that $z \succ w$ iff $u(z) > u(w)$. Suppose that $u(z) > u(w)$. This is the case iff $p_z > p_w$, which by Slider is equivalent to $\sqrt{p_z}\ket{M_0}\ket{x} +\sqrt{1-p_z}\ket{M_1}\ket{y} \succ \sqrt{p_w}\ket{M_0}\ket{x} + \sqrt{1-p_w}\ket{M_1}\ket{y},$ which is equivalent to $z \succ w$.
Now we want to show that for any game $\ket{G} = \sqrt{q}\ket{M_0}\ket{z} + \sqrt{1-q}\ket{M_1}\ket{w}$ its utility is given by $u(G) = q u(z) + (1-q) u(w),$ as advertised. By Consistency, we can replace $z$ and $w$ in $G$ by their equivalent games, and we have that
\begin{multline}
\ket{G} \sim \sqrt{q p_z}\ket{M_0}\ket{M_0}\ket{x} + \sqrt{q(1-p_z)}\ket{M_0}\ket{M_1}\ket{y} + \\ \sqrt{(1-q)p_w}\ket{M_1}\ket{M_0}\ket{x} +\sqrt{(1-q)(1-p_w)}\ket{M_1}\ket{M_1}\ket{y}. \end{multline} By Equivalence,
$\ket{G} \sim \sqrt{q p_z + (1-q)p_w}\ket{M_0}\ket{x} + \sqrt{q(1-p_z)+(1-q)(1-p_w)}\ket{M_1}\ket{y},$
and since $x \succeq G \succeq y$, its utility is given by the above formula, so
\begin{align*}
u(G) &= (q p_z + (1-q)p_w)u(x) + (q(1-p_z)+(1-q)(1-p_w))u(y)\\
&= q u(z) + (1-q) u(w),
\end{align*} as we wanted to show.
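The algebra in this last step can be checked with exact rational arithmetic. The sketch below uses Python's `fractions` module; the function names and the sample numbers are arbitrary choices of mine, not part of the proof.

```python
from fractions import Fraction

def reward_utility(p, u_x, u_y):
    """u(z) = p_z u(x) + (1 - p_z) u(y) for a reward z equivalent to G_{p_z}."""
    return p * u_x + (1 - p) * u_y

def game_utility(q, p_z, p_w, u_x, u_y):
    """Utility of G after collapsing it, via Equivalence, to a game over x and y."""
    weight_x = q * p_z + (1 - q) * p_w
    weight_y = q * (1 - p_z) + (1 - q) * (1 - p_w)
    return weight_x * u_x + weight_y * u_y

# Arbitrary illustrative values: weights in [0, 1], utilities with u(x) > u(y).
q, p_z, p_w = Fraction(1, 3), Fraction(3, 4), Fraction(1, 5)
u_x, u_y = Fraction(10), Fraction(-2)

lhs = game_utility(q, p_z, p_w, u_x, u_y)
rhs = q * reward_utility(p_z, u_x, u_y) + (1 - q) * reward_utility(p_w, u_x, u_y)
print(lhs == rhs)
```

Using `Fraction` instead of floats makes the comparison exact, so the equality holds identically rather than up to rounding.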

With this argument we have proved the Born rule theorem for any game inside the interval $[y,x] = \{z: x\succeq z \succeq y\}$. This would be enough if we were to assume that the set of rewards was something so lame, but since we want to deal with more interesting reward sets – like $\mathbb R$ – we cannot stop now. It is fortunately not hard to complete the proof: consider a sequence of intervals $[y_i,x_i]$ such that all of them contain $[y,x]$ and that their union equals the set of rewards. By the above proof, in each such interval there exists a utility function $f_i$ that satisfies the requirements. We want to show that these functions agree with each other, and as such define a unique utility over the whole set of rewards. For that, consider a reward $z$ in $[y_i,x_i]\cap [y_j,x_j]$ for some $i,j$. Then it must be the case that either $x\succeq z \succeq y$, or $x\succ y \succ z$, or $z \succ x \succ y$. By Closure, there exist unique $p_z,p_y$, and $p_x$ such that
\begin{align*}
z &\sim \sqrt{p_z}\ket{M_0}\ket{x} + \sqrt{1-p_z}\ket{M_1}\ket{y}, \\
y &\sim \sqrt{p_y}\ket{M_0}\ket{x} + \sqrt{1-p_y}\ket{M_1}\ket{z}, \\
x &\sim \sqrt{p_x}\ket{M_0}\ket{z} + \sqrt{1-p_x}\ket{M_1}\ket{y}. \\
\end{align*}Since $f_i$ and $f_j$ are utilities over this interval, we must have that for $k=i,j$
\begin{align*}
f_k(z) &= p_zf_k(x) + (1-p_z)f_k(y), \\
f_k(y) &= p_yf_k(x) + (1-p_y)f_k(z), \\
f_k(x) &= p_xf_k(z) + (1-p_x)f_k(y). \\
\end{align*}Now, we use our freedom to set the zero and the unity of the utilities to choose $f_k(y) = u(y)$ and $f_k(x) = u(x)$, taking these equations to
\begin{align*}
f_k(z) &= p_zu(x) + (1-p_z)u(y), \\
u(y) &= p_yu(x) + (1-p_y)f_k(z), \\
u(x) &= p_xf_k(z) + (1-p_x)u(y), \\
\end{align*}which uniquely define $f_k(z)$ in all three situations, implying that $f_i(z)=f_j(z)$. Setting $u(z)$ to be this common value, we have defined a unique utility function over the whole set of rewards, and we’re done.


## Wallace’s version of the Deutsch-Wallace theorem, part 1

One might still be worried about Deutsch’s Additivity. What if it is actually necessary to prove the Born rule? In this case one wouldn’t be able to use the Born rule in the Many-Worlds interpretation without committing oneself to stupid decisions, such as giving away all your money to take part in St. Petersburg’s lottery. Should one give up on the Many-Worlds interpretation then? Or start betting against the Born rule? If these thoughts are keeping you awake at night, then you need Wallace’s version of the Deutsch-Wallace theorem, that replaces Deutsch’s simplistic decision theory with a proper one that allows for bounded utilities.

Wallace’s insight was to realise that the principles of Indifference and Substitution do all the real work in Deutsch’s argument: they are already enough to imply the mod-squared amplitude part of Born’s rule. The connection of those mod-squared amplitudes with probabilities then follows from the other, decision-theoretical principles, but those are incidental, and can be replaced wholesale with a proper decision theory.

More precisely, Wallace used Indifference and Substitution to prove a theorem called Equivalence, which states that Amir must be indifferent between games that assign equal Born-rule weights to the same rewards.

It was not at all obvious to me why this should be a strong result. After all, if I say that the games $\ket{G} = \alpha\ket{M_0}\ket{r_0}+\beta\ket{M_1}\ket{r_1}$ and
$\ket{G'} = \gamma\ket{D_0}\ket{r_0}+\delta\ket{D_1}\ket{r_1}$ are equivalent if $|\alpha|^2=|\gamma|^2$ and $|\beta|^2=|\delta|^2$, it will also be true that they are equivalent if $|\alpha|=|\gamma|$ and $|\beta|=|\delta|$, so we haven’t actually learned anything about the “square” part of the Born rule, we have only learned that the phases of the amplitudes are irrelevant. Or have we?

Actually, Equivalence shows its power only when we consider sums of mod-squared amplitudes. It says, for example, that the game (taken unnormalised for clarity) $\ket{G} = 2\ket{M_0}\ket{r_0}+\ket{M_1}\ket{r_1}$ is equivalent to the game
$\ket{G’} = \ket{M_0}\ket{r_0}+\ket{M_1}\ket{r_0}+\ket{M_2}\ket{r_0}+\ket{M_3}\ket{r_0}+\ket{M_4}\ket{r_1},$ as they both assign weight $4$ to reward $r_0$ and weight $1$ to reward $r_1$. Some alternative version of Equivalence that summed the modulus of the amplitudes instead, as it would be appropriate in classical probability theory, would claim that $G$ was actually equivalent to
$\ket{G’^\prime} = \ket{M_0}\ket{r_0}+\ket{M_1}\ket{r_0}+\ket{M_2}\ket{r_1},$ as they both would assign weight $2$ to reward $r_0$ and weight $1$ to reward $r_1$, a decidedly non-quantum result.
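The bookkeeping in this example is easy to check numerically. Here is a minimal Python sketch, assuming a game is encoded as a list of (amplitude, reward) pairs; that encoding and the name `reward_weights` are illustrative choices, not anything taken from Wallace’s paper.

```python
from collections import defaultdict

def reward_weights(game, power=2):
    """Sum |amplitude|**power over branches, grouped by reward.

    `game` is a list of (amplitude, reward) pairs, a hypothetical
    encoding of an unnormalised game state.
    """
    weights = defaultdict(float)
    for amplitude, reward in game:
        weights[reward] += abs(amplitude) ** power
    return dict(weights)

G = [(2, "r0"), (1, "r1")]
G1 = [(1, "r0")] * 4 + [(1, "r1")]  # the five-branch game G' from the text
G2 = [(1, "r0")] * 2 + [(1, "r1")]  # the three-branch game G'' from the text

# Equivalence (mod-squared weights) matches G with G', not with G''.
print(reward_weights(G) == reward_weights(G1))
print(reward_weights(G) == reward_weights(G2))
# The hypothetical "sum of moduli" rule would match G with G'' instead.
print(reward_weights(G, power=1) == reward_weights(G2, power=1))
```

With `power=2` the two games that Equivalence declares equivalent get identical weight tables, and with `power=1` it is the decidedly non-quantum pairing that matches.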

Having hopefully convinced you that Equivalence is actually worthwhile, let’s proceed to prove it. The proof is actually very similar to the one presented in the previous post, so if you think it is obvious how to adapt it you can safely skip to the next post, where we’ll do the decision-theory part of the proof. Below I’ll write down the proof of Equivalence anyway just for shits and giggles.

First let’s state it properly:

• Equivalence: Consider two games $\ket{G} = \sum_{ij}\alpha_{ij}\ket{M_i}\ket{r_j}\quad\text{and}\quad \ket{G’} = \sum_{ij}\beta_{ij}\ket{D_i}\ket{r_j}.$ If every reward $r_j$ has the same Born-rule weight in both games, that is, if $\forall j\quad \sum_i|\alpha_{ij}|^2 = \sum_i|\beta_{ij}|^2,$ then $G \sim G’$.

Note that unlike in the previous post we’re not stating that these games have the same value, but rather that Amir is indifferent between them, which we represent with the $\sim$ relation. We do this because we want to eventually prove that Amir’s preferences can be represented by such a value function, so it feels a bit inelegant to start with the assumption that it exists.

Now, let’s recall Indifference and Substitution from the previous post, slightly reworded to remove reference to the values of the games:

• Indifference: If two games $G$ and $G’$ differ only by the labels of the measurements, then $G \sim G’$.
• Substitution: Amir must be indifferent between the game $\ket{G} = \alpha\ket{M_0}\ket{r_0} + \beta\ket{M_1}\ket{r_1}$ and the composite game
\begin{align*}
\ket{G’} &= \alpha\ket{M_0}\ket{G’^\prime} + \beta\ket{M_1}\ket{r_1} \\
&= \alpha\gamma\ket{M_0}\ket{D_0}\ket{r_0} + \alpha\delta\ket{M_0}\ket{D_1}\ket{r_0} + \beta\ket{M_1}\ket{r_1},
\end{align*} where instead of receiving $r_0$ Amir plays the trivial game $\ket{G’^\prime} = \gamma\ket{D_0}\ket{r_0} + \delta\ket{D_1}\ket{r_0}.$

And to the proof. First we show that any complex phases are irrelevant. For that, consider the game
$\ket{G} = \alpha e^{i\phi}\ket{M_0}\ket{r_0}+\beta e^{i\varphi}\ket{M_1}\ket{r_1}.$ By Substitution, we can replace the rewards $r_0$ and $r_1$ with the degenerate games $e^{-i\phi}\ket{D_0}\ket{r_0}$ and $e^{-i\varphi}\ket{D_1}\ket{r_1}$, and Amir must be indifferent between $G$ and the game $\ket{G’} = \alpha \ket{M_0}\ket{D_0}\ket{r_0}+\beta\ket{M_1}\ket{D_1}\ket{r_1}.$ Since $G’$ can be obtained from a third game $\ket{G’^\prime} = \alpha \ket{M_0}\ket{r_0}+\beta \ket{M_1}\ket{r_1}$ via Substitution, this accumulation of measurements does not matter either, and we conclude that Amir must be indifferent to any phases.

This allows us to restrict our attention to positive amplitudes. It does not, however, allow us to restrict our attention to amplitudes which are square roots of rational numbers, but we shall do it anyway because the argument for all real numbers is boring. Consider then two games $\ket{G} = \sum_{ij}\sqrt{\frac{p_{ij}}{q_{ij}}}\ket{M^j_i}\ket{r_j}\quad\text{and}\quad \ket{G’} = \sum_{ij}\sqrt{\frac{a_{ij}}{b_{ij}}}\ket{D^j_i}\ket{r_j}$ for which
$\forall j\quad \sum_i\frac{p_{ij}}{q_{ij}} = \sum_i\frac{a_{ij}}{b_{ij}}.$ We shall show that $G \sim G’$. First focus on the reward $r_0$. We can rewrite the amplitudes of the measurement results that give $r_0$ so that they have the same denominator in both games by defining $d_0 = \prod_i q_{i0}b_{i0},$ and the integers $p_{i0}’ = d_0 p_{i0}/q_{i0}$ and $a_{i0}’ = d_0 a_{i0}/b_{i0}$, so that
$\frac{p’_{i0}}{d_{0}} = \frac{p_{i0}}{q_{i0}}\quad\text{and}\quad\frac{a’_{i0}}{d_{0}} = \frac{a_{i0}}{b_{i0}}.$ The parts of the games associated to reward $r_0$ are then
$\frac1{\sqrt{d_0}}\sum_i \sqrt{p_{i0}’}\ket{M^0_i}\ket{r_0} \quad\text{and}\quad \frac1{\sqrt{d_0}}\sum_i \sqrt{a_{i0}’}\ket{D^0_i}\ket{r_0},$ and using again Substitution we replace the reward given for measurement results $M^0_i$ and $D^0_i$ with the trivial games $\frac1{\sqrt{p_{i0}’}}\sum_{k=1}^{p_{i0}’}\ket{P_k}\ket{r_0}\quad\text{and}\quad \frac1{\sqrt{a_{i0}’}}\sum_{k=1}^{a_{i0}’}\ket{P_k}\ket{r_0},$ obtaining
$\frac1{\sqrt{d_0}}\sum_i\sum_{k=1}^{p_{i0}’}\ket{M^0_i}\ket{P_k}\ket{r_0} \quad\text{and}\quad \frac1{\sqrt{d_0}}\sum_i \sum_{k=1}^{a_{i0}’}\ket{D^0_i}\ket{P_k}\ket{r_0},$ which are just uniform superpositions, with $\sum_ip_{i0}’$ terms on the left hand side and $\sum_ia_{i0}’$ on the right hand side. Judicious use of Indifference and Substitution can as before erase the differences in the piles of measurements, taking them to
$\frac1{\sqrt{d_0}}\sum_{l=1}^{\sum_ip_{i0}’}\ket{C_l}\ket{r_0} \quad\text{and}\quad \frac1{\sqrt{d_0}}\sum_{l=1}^{\sum_ia_{i0}’}\ket{C_l}\ket{r_0}.$ Now by assumption we have that $\sum_i\frac{p’_{i0}}{d_{0}} = \sum_i\frac{p_{i0}}{q_{i0}} = \sum_i\frac{a_{i0}}{b_{i0}} = \sum_i\frac{a’_{i0}}{d_{0}},$ so the number of terms on both sides is the same, and the $r_0$ parts of the games are equivalent. Since this same argument can be repeated for all other $r_j$, Equivalence is proven.
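The common-denominator step of the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_into_uniform` and the example weights are made up, and the sketch multiplies the reduced denominators that `Fraction` stores, which is a harmless variation on the $d_0$ defined above.

```python
from fractions import Fraction
from math import prod

def split_into_uniform(weights_g, weights_gp):
    """Return a common denominator d and, for each branch, the number of
    equal-amplitude pieces (amplitude 1/sqrt(d)) it is split into."""
    d = prod(w.denominator for w in weights_g + weights_gp)
    counts_g = [int(w * d) for w in weights_g]
    counts_gp = [int(w * d) for w in weights_gp]
    return d, counts_g, counts_gp

# Hypothetical mod-squared amplitudes of the branches that pay r_0 in each game.
weights_g = [Fraction(1, 2), Fraction(1, 6)]
weights_gp = [Fraction(1, 3), Fraction(1, 3)]

d, counts_g, counts_gp = split_into_uniform(weights_g, weights_gp)
print(d, counts_g, counts_gp)
# Equal total Born weight means equal numbers of uniform branches.
print(sum(counts_g) == sum(counts_gp))
```

Both games end up as uniform superpositions with the same number of terms paying $r_0$, which is exactly what the last step of the proof needs.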

## Deutsch’s version of the Deutsch-Wallace theorem

With the decision theory from the previous post already giving us probabilities, all that is left to do is add the Many-Worlds interpretation and show that these probabilities must actually be given by the Born rule. Sounds easy, no?

But we don’t actually need the whole Many-Worlds interpretation, just some stylized part of it that deals with simple measurement scenarios. We only need to say that when someone makes a measurement on (e.g.) a qubit in the state $\alpha\ket{0} + \beta\ket{1},$ what happens is not a collapse into $\ket{0}$ or $\ket{1}$, but rather a unitary evolution into the state $\alpha\ket{0}\ket{M_0}+\beta\ket{1}\ket{M_1},$ which represents a macroscopic superposition of the measurement device showing result $M_0$ when the qubit is in the state $\ket{0}$ with the device showing result $M_1$ when the qubit is in the state $\ket{1}$.

We want to use these measurements to play the decision-theoretical games we were talking about in the previous post. To do that, we just say that Amir will get a reward depending on the measurement result: reward $r_0$ if the result is $M_0$, and reward $r_1$ if the result is $M_1$. We can represent this simply by appending the reward into the correct branch of the above macroscopic superposition, taking it to $\alpha\ket{0}\ket{M_0}\ket{r_0}+\beta\ket{1}\ket{M_1}\ket{r_1}.$ Since this state has all the information we need to define the game – the amplitudes, the measurement results, and the rewards – we can use it as the representation of the game. So when we need to write down a game $G$, we shall do this by using the state
$\ket{G} = \alpha\ket{M_0}\ket{r_0} + \beta\ket{M_1}\ket{r_1}.$ And this is pretty much all we need from quantum mechanics.

Now we need to state two further rationality axioms, and we can proceed to the proof. The first one is that Amir must not care about what we call the measurement results: if $0$ and $1$, or $\uparrow$ and $\downarrow$, or $H$ and $V$, it doesn’t matter. If two games are the same thing but for the labels of the measurement results, Amir must value these games equally:

• Indifference: If two games $G$ and $G’$ differ only by the labels of the measurements, then $V(G) = V(G’)$.

The other axiom says that Amir must be indifferent between receiving reward $r_0$ or playing a game that gives reward $r_0$ independently of the measurement result, even when this reward was part of a previous game:

• Substitution: Amir must value equally the game $\ket{G} = \alpha\ket{M_0}\ket{r_0} + \beta\ket{M_1}\ket{r_1}$ and the composite game
\begin{align*}
\ket{G’} &= \alpha\ket{M_0}\ket{G’^\prime} + \beta\ket{M_1}\ket{r_1} \\
&= \alpha\gamma\ket{M_0}\ket{D_0}\ket{r_0} + \alpha\delta\ket{M_0}\ket{D_1}\ket{r_0} + \beta\ket{M_1}\ket{r_1},
\end{align*} where instead of receiving $r_0$ Amir plays the trivial game $\ket{G’^\prime} = \gamma\ket{D_0}\ket{r_0} + \delta\ket{D_1}\ket{r_0}.$

Now, to the proof. Consider the games
\begin{align*}
\ket{G} &= \frac1{\sqrt2}(\ket{M_0}\ket{r_0} + \ket{M_1}\ket{r_1}) \\
\ket{G’} &= \frac1{\sqrt2}(\ket{M_0}\ket{r_1} + \ket{M_1}\ket{r_0}) \\
\ket{G’^\prime} &= \frac1{\sqrt2}(\ket{M_0}\ket{r_0+r_1} + \ket{M_1}\ket{r_0+r_1}) \\
\end{align*}
By Additivity, from the previous post, we have that
$V(G’^\prime) = V(G) + V(G’),$ and from Constancy that $V(G’^\prime) = r_0 + r_1$, so we already know that $V(G) + V(G’) = r_0 + r_1$. But the games $G$ and $G’$ are just relabellings of each other, so by Indifference we must have $V(G) = V(G’)$, so we can conclude that $V(G) = \frac12(r_0+r_1)$ or, in other words, that quantum states with amplitude $1/\sqrt{2}$ click with probability $1/2$.

We can easily extend this argument to show that games involving a uniform superposition of $n$ states $\ket{G} = \frac{1}{\sqrt{n}}\sum_{i=0}^{n-1}\ket{M_i}\ket{r_i}$ must have value $V(G) = \frac1n\sum_{i=0}^{n-1}r_i.$ Now we need to deal with non-uniform superpositions. Consider the games
$\ket{G} = \sqrt{\frac{2}{3}}\ket{M_0}\ket{r_0} + \frac{1}{\sqrt3}\ket{M_1}\ket{r_1}$ and
$\ket{G’} = \frac1{\sqrt2}(\ket{D_0}\ket{r_0} + \ket{D_1}\ket{r_0}).$ By Constancy the value of $G’$ is $r_0$, and by Substitution the value of $G$ must be equal to the value of
\begin{align*}
\ket{G’^\prime} &= \sqrt{\frac{2}{3}}\ket{M_0}\ket{G’} + \frac{1}{\sqrt3}\ket{M_1}\ket{r_1} \\
&= \frac{1}{\sqrt3}\ket{M_0}\ket{D_0}\ket{r_0} + \frac{1}{\sqrt3}\ket{M_0}\ket{D_1}\ket{r_0} + \frac{1}{\sqrt3}\ket{M_1}\ket{r_1}.
\end{align*}
But $G’^\prime$ is just a uniform superposition, so from the previous argument we know that
$V(G’^\prime) = \frac13(r_0+r_0+r_1),$ and therefore that
$V(G) = \frac23r_0+\frac13r_1.$ Using analogous applications of Substitution we can show that for any positive integers $n$ and $m$ the value of the game
$\ket{G} = \sqrt{\frac{n}{n+m}}\ket{M_0}\ket{r_0} + \sqrt{\frac{m}{n+m}}\ket{M_1}\ket{r_1}$ is
$V(G) = \frac{n}{n+m}r_0+\frac{m}{n+m}r_1,$ and we are pretty much done. To extend the argument to any positive real amplitudes one only needs a continuity assumption, and to extend it to arbitrary complex amplitudes we can do a little trick: consider the game with a single outcome $\ket{G} = e^{i\phi}\ket{M_0}\ket{r_0}.$ By Constancy the value of this game is $r_0$, independently of the phase $e^{i\phi}$. Now consider the game
$\ket{G} = \sqrt{\frac{n}{n+m}}e^{i\phi}\ket{M_0}\ket{r_0} + \sqrt{\frac{m}{n+m}}e^{i\varphi}\ket{M_1}\ket{r_1}.$ By Substitution we can replace the rewards $\ket{r_0}$ and $\ket{r_1}$ with the single outcome games $e^{-i\phi}\ket{D_0}\ket{r_0}$ and $e^{-i\varphi}\ket{D_1}\ket{r_1}$ without changing its value, so the phases play no role in determining the value of the game.

To summarize, we have shown that the value of a game $\ket{G} = \alpha\ket{M_0}\ket{r_0} + \beta\ket{M_1}\ket{r_1}$ must be given by $V(G) = |\alpha|^2r_0 + |\beta|^2r_1,$ which is just the Born rule.
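For rational mod-squared amplitudes, the Substitution argument can be mimicked in code: split every branch into equal-weight pieces, average the rewards over the resulting uniform superposition, and compare with the Born-rule value. The following is a rough sketch, with `value_by_splitting` and `born_value` as hypothetical helper names and the rewards $10$ and $1$ chosen arbitrarily.

```python
from fractions import Fraction
from math import prod

def value_by_splitting(branches):
    """Value of a game given as (squared_amplitude, reward) pairs, obtained by
    splitting each branch into equal-weight pieces and averaging the rewards."""
    d = prod(w.denominator for w, _ in branches)
    rewards = []
    for w, r in branches:
        rewards += [r] * int(w * d)  # w*d uniform branches, each paying r
    return Fraction(sum(rewards), len(rewards))

def born_value(branches):
    """Born-rule value: sum of |amplitude|^2 times reward."""
    return sum(w * r for w, r in branches)

# The sqrt(2/3), 1/sqrt(3) example from the text, with rewards r_0 = 10, r_1 = 1.
game = [(Fraction(2, 3), 10), (Fraction(1, 3), 1)]
print(value_by_splitting(game), born_value(game))
```

Both numbers agree, as the proof says they must; the sketch of course assumes the uniform-superposition result rather than deriving it.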

## Probability from decision theory

There exists a problem in the world that is even more pressing than the position of cheese in the cheeseburger emoji: namely that nobody™ understands the Deutsch-Wallace theorem. I’ve talked to a lot of people about it, and the usual reaction I get is that they have heard of it, are vaguely interested in how one can prove the Born rule, but have no idea how Deutsch and Wallace actually did it.

It’s hard to blame them. The original paper by Deutsch is notoriously idiosyncratic: he even neglected to mention that one of his assumptions was the Many-Worlds interpretation! Several people wrote papers trying to understand it: Barnum et al. mistakenly concluded that Deutsch was simply wrong, Gill made a valiant effort but gave up without a conclusion, and Wallace finally succeeded, clarifying Deutsch’s proof and putting it in context.

Wallace was not successful, however, in popularising the theorem. I think this is because his paper is a 27-page mess. It did not help, either, that Wallace quickly moved on to improving and formalising Deutsch’s theorem, providing an even more complicated proof from weaker assumptions, leaving the community with no easy entry point into this confusing literature.

To fill this hole, then, I’m writing two “public service” blog posts. The first (this one) is to explain how to derive probabilities from decision theory, and the second is to show how this decision-theoretical argument, together with the Many-Worlds interpretation, yields the Born rule.

Unlike Deutsch, I’m going to use a standard decision theory, taken from the excellent “The Foundations of Causal Decision Theory” by James Joyce. We’re going to consider a simple betting scenario, where an agent – called Amir – decides how much to pay to take part in a game where he receives $a$ euros if event $E$ happens, and $b$ euros if event $\lnot E$ happens. The game is then defined by the vector $(a,b)$, and the maximal price Amir accepts to pay for it is its value $V(a,b)$.

The first rationality axiom we demand is that if the game is certain to pay him back $c$ euros, he must assign value $c$ to the game. This means that Amir is indifferent to betting per se: he doesn’t demand some extra compensation to go through the effort of betting, nor does he accept to lose money just to experience the thrill of betting (unlike real gambling addicts, I must say). The axiom is then

• Constancy: $V(c,c) = c$.

The second axiom we demand is that if for a pair of games $(a,b)$ and $(c,d)$ it happens that $a \ge c$ and $b \ge d$, that is, if in both cases where $E$ happens or $\lnot E$ happens the first game pays a reward that is larger than or equal to that of the second game, then Amir must value the first game no less than the second game. The axiom is then

• Dominance: if $(a,b) \ge (c,d)$ then $V(a,b) \ge V(c,d)$.

The third and last axiom we need sounds very innocent: if Amir is willing to pay $V(a,b)$ to play the game with rewards $(a,b)$, and thinks that playing for rewards $(c,d)$ is worth $V(c,d)$, then the price he should pay for getting the rewards $(a+c,b+d)$ must be $V(a,b) + V(c,d)$. In other words: it shouldn’t matter if tickets for the game with rewards $(a+c,b+d)$ are sold at once, or broken down into first a ticket for rewards $(a,b)$ followed by a ticket for rewards $(c,d)$. The axiom is then

• Additivity: $V(a+c,b+d) = V(a,b) + V(c,d)$.

One problem with Additivity is that real agents don’t behave like this. People usually assign values such that $V(a+c,b+d) < V(a,b) + V(c,d)$, because if you have nothing then 10€ might be the difference between life and death, whereas if you already have 10,000€ then 10€ is just a nice gift. Besides not matching reality, this linear utility function implied by Additivity causes pathological decisions such as the St. Petersburg paradox or Pascal’s Wager. But these problems do not appear if the amounts at stake are small compared to Amir’s wealth, which we can assume to be the case, and Additivity makes for a rather simple and elegant decision theory, so we’ll use it anyway. After all, I’m not writing for the people whose objection to the Deutsch-Wallace theorem is that Deutsch’s decision theory implies linear utilities, but rather for those whose objection is “What the hell is going on?”.

Now, to work. First we shall show how Additivity allows us to write the value of any game as a function of the value of the elementary games $(1,0)$ and $(0,1)$. Additivity immediately implies that
$V(a,b) = V(a,0) + V(0,b),$ and that for any positive integer $n$
$V(na,0) = nV(a,0).$ Taking now $a=1/n$, the previous equation gives us that $V(1,0) = nV(1/n,0),$ or that $V(1/n,0) = \frac1n V(1,0).$ Considering $m$ such games, we have now that $V(m/n,0) = \frac{m}{n} V(1,0)$ for any positive rational $m/n$. We can extend this to all rationals if we remember that by Constancy $V(0,0) = 0$ and that by Additivity
$V(0,0) = V(m/n,0) + V(-m/n,0).$ Now one could extend this argument to all reals by taking some continuity assumption, but I don’t think it is interesting to do so. I’d rather assume that one can only have rational amounts of euros. Anyway, now we have shown that for all rational $a$ and $b$ we have that
$V(a,b) = aV(1,0) + bV(0,1).$ What is left to see is that the values of the elementary games $(1,0)$ and $(0,1)$ behave like probabilities. If we consider Constancy with $c=1$, together with Additivity, we have that $V(1,0) + V(0,1) = V(1,1) = 1,$ so these “probabilities” are normalised. If we now consider Dominance, we get that
$V(1,0) \ge V(0,0) = 0,$ so the “probabilities” are non-negative. Is there anything left to show? Well, if you are a Bayesian, no. The probability of an event $E$ is defined as the price a rational agent would pay for a lottery ticket that gives them 1€ if $E$ happens and nothing otherwise. Bayesians have the obligation to show that these probabilities obey the usual Kolmogorov axioms, but on the interpretational side there is nothing left to explain.
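As a sanity check in the other direction, one can verify that a value function of the form just derived does satisfy the three axioms. The sketch below assumes an arbitrary price $V(1,0) = 1/3$ and uses a hypothetical helper `make_value`; it checks the axioms on a handful of rational payoffs and is not a proof.

```python
from fractions import Fraction

def make_value(p):
    """Value function V(a, b) = p*a + (1 - p)*b, where p plays the role of V(1, 0)."""
    def V(a, b):
        return p * a + (1 - p) * b
    return V

V = make_value(Fraction(1, 3))  # the price 1/3 is an arbitrary assumption
a, b, c, d = Fraction(5), Fraction(-2), Fraction(7, 2), Fraction(1, 4)

print(V(c, c) == c)                          # Constancy
print(V(a + c, b + d) == V(a, b) + V(c, d))  # Additivity
print(V(a + 1, b) >= V(a, b))                # Dominance, on one example
print(V(1, 0) + V(0, 1) == 1)                # the "probabilities" are normalised
```

Exact rational arithmetic is used so that the equalities hold exactly, in line with the assumption that one can only have rational amounts of euros.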

## On the morality of blackholing in conferences

Consider the entirely hypothetical situation where you are in a physics conference with really bad wifi. Either because the router has a hard limit on the number of devices that can connect simultaneously, or the bandwidth is too small to handle everyone’s OwnClouds trying to sync, or it is a D-Link. The usual approach is just to be pissed off and complain to the organizers, to no avail (while ignoring the talks and trying to reconnect like crazy). Here I’d like to describe a different approach, that if not morally commendable at least leads to more results: blackholing.

To blackhole, what you do is to create a hotspot with your phone with the same name, encryption type, and password as the conference wifi. You then disable the data connection of your phone, and turn on the hotspot. What happens is that the devices of the people close to you will automatically disconnect from the conference router and connect to your hotspot instead, since they will think that your hotspot is a repeater with a stronger signal. But since you disabled your data connection, they are connecting to a sterile hotspot, so you are creating a kind of wifi black hole. To the people far from you, however, this is a gift from the gods, as they keep connected to the conference router, and can use the bandwidth that was freed up by the poor souls that fell in your black hole.

The question is, is it moral to do this? Obviously the people who did fall in your black hole are not going to like it, but one thing to notice is that this technique is intrinsically altruistic, as you cannot use wifi either, since you are in the middle of the black hole (and as far as I know it is not possible to defend oneself against it). It is even more altruistic if you like to sit close to your friends, who will then sacrifice their wifi in favour of a more distant acquaintance. It does become immoral if you arrange with a friend to sit close to the conference router, and you blackhole some random people far from it with the specific intent of giving your friend wifi, without caring about the other people who will also get it.

But let’s consider that you don’t have such tribalistic morals, and consider everyone’s welfare equally. Then the question is whether the utility of $n$ people with bad wifi is smaller than the utility of $k$ people with no wifi and $n-k$ people with good wifi, that is, whether
$n\, U(\text{bad wifi}) \le k\,U(\text{no wifi}) + (n-k)\,U(\text{good wifi}).$ Now, assuming that the utility is a function only of the bandwidth available, this simplifies to
$n\,U(B/n) \le k\,U(0) + (n-k)\,U(B/(n-k)),$ where $B$ is the total bandwidth of the conference router. Therefore, to determine whether blackholing is moral or not we need to find out how people’s happiness scales as a function of the available bandwidth.

One immediately sees that if the happiness scales linearly with the bandwidth, it is indifferent whether to blackhole or not. But to make relevant moral judgements, we need to find out what the actual utility functions are. By asking people around, I empirically determined that
$U(x) = \frac{1}{1+\left(\frac{B_0}{x}\right)^2},$ where $B_0$ is the critical bandwidth that allows people to do basic surfing. Substituting in the previous inequality, we see that blackholing is moral iff
$k \le \frac{n^2 - \left(\frac{B}{B_0}\right)^2}{n},$ which is better understood if we rewrite $\frac{B}{B_0} = fn$, that is, in terms of the fraction $f$ of people that can do basic surfing with the given bandwidth. We have then
$k \le (1-f^2)n,$ which shows that if $f = 1$ it is never moral to blackhole, whereas if $f \approx 0$ it always is. In a hypothetical conference held in Paraty with $n=100$ and $\frac{B}{B_0} = 50$, it is moral to blackhole up to $k=75$ people.
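The bound can be double-checked by brute force, without any algebra. Here is a minimal sketch, assuming the utility function above with $U(0)=0$ and measuring bandwidth in units of $B_0$; the function names are illustrative only.

```python
from fractions import Fraction

def utility(x, b0):
    """U(x) = 1 / (1 + (b0/x)^2), taking U(0) = 0."""
    if x == 0:
        return Fraction(0)
    return 1 / (1 + (Fraction(b0) / Fraction(x)) ** 2)

def total_utility(n, k, bandwidth, b0):
    """Total utility when k of the n people are blackholed (requires k < n)."""
    return k * utility(0, b0) + (n - k) * utility(Fraction(bandwidth, n - k), b0)

def max_moral_k(n, bandwidth, b0):
    """Largest k for which blackholing does not lower the total utility."""
    baseline = total_utility(n, 0, bandwidth, b0)
    return max(k for k in range(n) if total_utility(n, k, bandwidth, b0) >= baseline)

# The Paraty example: n = 100 people, B/B_0 = 50.
print(max_moral_k(100, 50, 1))  # → 75
```

Exact fractions are used because the inequality is saturated at $k=75$, where floating-point rounding could otherwise tip the comparison either way.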

## Is gravity quantum?

Last week two curious papers appeared on the arXiv, one by Marletto and Vedral, and the other by Bose et al., proposing to test whether the gravitational field must be quantized. I think they have a nice idea there, that is a bit obscured by all the details they put in the papers, so I hope the authors will forgive me for butchering their argument down to the barest of the bones.

The starting point is a worryingly common idea that maybe the reason why a quantum theory of gravity is so damn difficult to make is because gravity is not actually quantum. While concrete models of “non-quantum gravity” tend to be pathological or show spectacular disagreement with experiment, there is still a lingering hope that somehow a non-quantum theory of gravity will be made to work, or that at least a semi-classical model like QFT in a curved spacetime will be enough to explain all the experimental results we’ll ever get. Marletto and Bose’s answer? Kill it with fire.

Their idea is to put two massive particles (like neutrons) side-by-side in two Mach-Zehnder interferometers, in such a way that their gravitational interaction is non-negligible in only one of the combinations of arms, and measure the resulting entanglement as proof of the quantumness of the interaction.

More precisely, the particles start in the state $\ket{L}\ket{L},$ which after the first beam splitter in each of the interferometers gets mapped to $\frac{\ket{L} + \ket{R}}{\sqrt2}\frac{\ket{L} + \ket{R}}{\sqrt2} = \frac12(\ket{LL} + \ket{LR} + \ket{RL} + \ket{RR}),$ which is where the magic happens: we can put these interferometers together in such a way that the right arm of the first interferometer is very close to the left arm of the second interferometer, and all the other arms are far away from each other. If the basic rules of quantum mechanics apply to gravitational interactions, this should give a phase shift corresponding to the gravitational potential energy to the $\ket{RL}$ member of the superposition, resulting in the state
$\frac12(\ket{LL} + \ket{LR} + e^{i\phi}\ket{RL} + \ket{RR}),$ which can even be made maximally entangled if we manage to make $\phi = \pi$. Bose promises that he can get us $\phi \approx 10^{-4}$, which would be a tiny but detectable amount of entanglement. If we now complete the interferometers with a second beam splitter, we can do complete tomography of this state, and in particular measure its entanglement.
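To get a feeling for how much entanglement a given phase buys, one can compute the concurrence of this state, which for a pure two-qubit state $a\ket{LL}+b\ket{LR}+c\ket{RL}+d\ket{RR}$ is $2|ad-bc|$. The choice of concurrence as the entanglement measure is an assumption of this sketch, not something taken from the papers; for this state it works out to $|\sin(\phi/2)|$.

```python
import cmath
import math

def concurrence(phi):
    """Concurrence 2|ad - bc| of (|LL> + |LR> + e^{i phi}|RL> + |RR>)/2."""
    a = b = d = 0.5
    c = 0.5 * cmath.exp(1j * phi)
    return 2 * abs(a * d - b * c)

print(concurrence(0.0))      # no phase: product state, no entanglement
print(concurrence(math.pi))  # phi = pi: maximally entangled
print(concurrence(1e-4))     # the phase quoted in the text
```

For small $\phi$ the concurrence is approximately $\phi/2$, so a phase of $10^{-4}$ corresponds to a concurrence of about $5\times10^{-5}$.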

Now I’m not sure about what “non-quantum gravity” can do, but if it can allow superpositions of masses to get entangled via gravitational interactions, the “non-quantum” part of its name is as appropriate as the “Democratic” in Democratic People’s Republic of Korea.

## How quantum teleportation actually works

EDIT: Philip Ball has updated his article on Nature News, correcting the most serious of its errors. While everyone makes mistakes, few actually admit to them, so I think this action is rather praiseworthy. Correspondingly, I’m removing criticism of that mistake in my post.

Recently I have read an excellent essay by Philip Ball on the measurement problem: clear, precise, non-technical, free of bullshit and mysticism. I was impressed: a journalist managed to dispel confusion about a theme that even physicists themselves are confused about. It might be worth checking out what this guy writes in the future.

I was not so impressed, however, when I saw his article about quantum teleportation, reporting on Jian-Wei Pan’s group’s amazing feat of teleporting a quantum state from a ground station to a satellite. While Philip was careful to note that nothing observable is going on faster than light, he still claims that something unobservable is going on faster than light, and that there is some kind of conspiracy by Nature to cover that up. This is not only absurd on its face, but also needs the discredited notion of wavefunction collapse to make sense, which Philip himself noted was replaced by decoherence as a model of how measurements happen. For these reasons, very few physicists still take this description of the teleportation protocol seriously. It would be nice if the media would report on the current understanding of the community instead of repeating misconceptions from the 90s.

But enough ranting. I think the best way to counter the spreading of misinformation about quantum mechanics is not to just criticize people who get it wrong, but instead to give the correct explanation about the phenomena. I’m going to explain it twice, first in a non-technical way in the hope of helping interested laypeople, and then in a technical way, for people who do know quantum mechanics. So, without further ado, here’s how quantum teleportation actually works (this is essentially Deutsch and Hayden’s description):

Alice has a quantum bit, which she wants to transmit to Bob. Quantum bits are a bit like classical bits as they can be in the states 0 or 1 (and therefore used to store information like blogs or photos), and entirely unlike classical bits as they can also be in a superposition of 0 and 1. Now if Alice had a classical bit, it would be trivial to transmit it to Bob: she would just use the internet. But the internet cannot handle superpositions between 0 and 1: if you tried to send a qubit via the internet you would lose this superposition information (the Dutch are working on this, though). To preserve this superposition information Alice would need an expensive direct optical fibre connection to Bob’s place, that we assume she doesn’t have.

What can she do? She can try to measure this superposition information, record it in classical bits, and transmit those via the internet. But superposition information is incredibly finicky: if Alice has only one copy of the qubit, she cannot obtain it. She can only get a good approximation to it if she measures several copies of the qubit. Which she might not have, or even if she does, it will be only an approximation to her qubit, not the real deal.

So again, what can she do? That’s where quantum teleportation comes in. If Alice and Bob share a Bell state (a kind of entangled state), they can use it to transmit this fragile superposition information perfectly. Alice needs to do a special kind of measurement — called Bell basis measurement — in the qubit she wants to transmit together with her part of the Bell state. Now, this is where everyone’s brains melt and all the faster-than-light nonsense comes from. It appears that after Alice does her measurement the part of the Bell state that belongs to Bob instantaneously becomes the qubit Alice wanted to send, just with some error that depends on her measurement result. In order to correct the error, Bob then needs to know Alice’s measurement result, which he can only find out after a light signal has had time to propagate from her lab to his. So it is as if Nature did send the qubit faster than light, but cleverly concealed this fact with this error, just so that we wouldn’t see any violation of relativity. Come on. Trying to put ourselves back in the centre of the universe, are we?

Anyway, this narrative only makes sense if you believe in some thoroughly discredited interpretations of quantum mechanics. If you haven’t kept your head buried in the sand in the last decades, you know that measurements work through decoherence: Alice’s measurement is not changing the state of Bob in any way. She is just entangling her qubit with the Bell state and herself and anything else that comes in the way. And this entanglement spreads just through normal interactions: photons going around, molecules colliding with each other. Everything very decent and proper, nothing faster than light.

Now, in this precious moment after she has done her measurement and before this cloud of decoherence has had time to spread to Bob’s place, we can compare the silly story told in the previous paragraph with reality. We can compute the information about Alice’s qubit that is available in Bob’s place, and see that it is precisely zero. Nature is not trying to conceal anything from us, it is just a physical fact that the real quantum state that describes Alice and Bob’s systems is a complicated entangled state that contains no information about Alice’s qubit in Bob’s end. But the cool thing about quantum teleportation is that if Bob knows the measurement result he is able to sculpt Alice’s qubit out of this complicated entangled state. But he doesn’t, because the measurement result cannot get to him faster than light.

Now, if we wait a couple of nanoseconds more, the cloud of decoherence hits Bob, and then we are actually in the situation where Bob’s part of the Bell state has become Alice’s qubit, modulo some easily correctable error. But now there is no mystery to it: the information got there via decoherence, no faster than light.

Now, for the technical version: Alice has a qubit $\ket{\Gamma} = \alpha\ket{0} + \beta\ket{1}$, which she wishes to transmit to Bob, but she does not have a good noiseless quantum transmission channel that she can use, just a classical one (aka the Internet). So what can they do? Luckily they have a maximally entangled state $\ket{\phi^+} = \frac1{\sqrt2}(\ket{00}+\ket{11})$ saved from the time when they did have a good quantum channel, so they can just teleport $\ket{\Gamma}$.

To do that, note that the initial state they have, written in the order Alice’s state, Alice’s part of $\ket{\phi^+}$, and Bob’s part of $\ket{\phi^+}$, is
$\ket{\Gamma}\ket{\phi^+} = \frac{1}{\sqrt2}( \alpha\ket{000}+\alpha\ket{011} + \beta\ket{100} + \beta\ket{111}),$ and if we rewrite the first two subsystems in the Bell basis we obtain
$\ket{\Gamma}\ket{\phi^+} = \frac{1}{2}( \ket{\phi^+}\ket{\Gamma} + \ket{\phi^-}Z\ket{\Gamma} + \ket{\psi^+}X\ket{\Gamma} + \ket{\psi^-}XZ\ket{\Gamma}),$ so we see that conditioned on Alice’s state being a Bell state, Bob’s state is just a simple function of $\ket{\Gamma}$. Note that at this point nothing was done to the quantum system, so Bob’s state did not change in any way. If we calculate the reduced density matrix at his lab, we see that it is the maximally mixed state, which contains no information about $\ket{\Gamma}$ whatsoever.

Now, clearly we want Alice to measure her subsystems in the Bell basis to make progress. She does that by first applying an entangling operation to map the Bell states to the computational basis, and then making the measurement in the computational basis. After the entangling operation, the state is
$\frac{1}{2}( \ket{00}\ket{\Gamma} + \ket{01}Z\ket{\Gamma} + \ket{10}X\ket{\Gamma} + \ket{11}XZ\ket{\Gamma}),$ and making a measurement in the computational basis — for now modelled in a coherent way — and storing the result in two extra qubits results in the state
$\frac{1}{2}( \ket{00}\ket{00}\ket{\Gamma} + \ket{01}\ket{01}Z\ket{\Gamma} + \ket{10}\ket{10}X\ket{\Gamma} + \ket{11}\ket{11}XZ\ket{\Gamma}).$ Now something was done to this state, but still there is no information at Bob’s: his reduced density matrix is still the maximally mixed state. Looking at this entangled state, though, we see that if Bob applies the operations $\mathbb{I}$, $Z$, $X$, or $ZX$ to his qubit, conditioned on the measurement result being 00, 01, 10, or 11 respectively, he will extract $\ket{\Gamma}$ from it. So Alice simply sends the qubits with the measurement result to Bob, who uses them to get $\ket{\Gamma}$ on his side, the teleportation protocol is over, and Alice and Bob lived happily ever after. Nothing faster than light happened, and the information from Alice to Bob clearly travelled through the qubits with the measurement results. The interesting thing we saw was that by expending one $\ket{\phi^+}$ and by sending two classical bits we can transmit one quantum bit. Everything ok?
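None of this needs to be taken on faith: the Bell-basis decomposition, Bob’s maximally mixed reduced state, and the corrections can all be checked with a few lines of plain Python. The sketch below works with the initial state $\ket{\Gamma}\ket{\phi^+}$ only; the amplitudes $\alpha = 0.6$, $\beta = 0.8i$ and all function names are arbitrary illustrative choices.

```python
import math

S = 1 / math.sqrt(2)

# Bell states of Alice's two qubits, as real coefficient tables bell[q0][q1].
BELL = {
    "phi+": [[S, 0], [0, S]],
    "phi-": [[S, 0], [0, -S]],
    "psi+": [[0, S], [S, 0]],
    "psi-": [[0, S], [-S, 0]],
}

def apply_x(v):
    return [v[1], v[0]]

def apply_z(v):
    return [v[0], -v[1]]

# Bob's correction for each Bell outcome: I, Z, X, and ZX (X first, then Z).
CORRECTION = {
    "phi+": lambda v: v,
    "phi-": apply_z,
    "psi+": apply_x,
    "psi-": lambda v: apply_z(apply_x(v)),
}

def initial_state(alpha, beta):
    """Amplitudes psi[q0][q1][q2] of |Gamma>|phi+>."""
    gamma = [complex(alpha), complex(beta)]
    return [[[gamma[q0] * S if q1 == q2 else 0j for q2 in (0, 1)]
             for q1 in (0, 1)] for q0 in (0, 1)]

def bob_branch(psi, bell):
    """Bob's unnormalised state in the branch where Alice's qubits are in `bell`."""
    return [sum(bell[q0][q1] * psi[q0][q1][q2] for q0 in (0, 1) for q1 in (0, 1))
            for q2 in (0, 1)]

def bob_reduced_density_matrix(psi):
    """Trace out Alice's two qubits."""
    return [[sum(psi[q0][q1][i] * psi[q0][q1][j].conjugate()
                 for q0 in (0, 1) for q1 in (0, 1))
             for j in (0, 1)] for i in (0, 1)]

alpha, beta = 0.6, 0.8j
psi = initial_state(alpha, beta)
rho = bob_reduced_density_matrix(psi)
# Each branch has amplitude 1/2, so rescale by 2 after applying the correction.
recovered = {name: [2 * amp for amp in CORRECTION[name](bob_branch(psi, bell))]
             for name, bell in BELL.items()}
print(rho)
print(recovered)
```

In every branch the corrected state comes out as $(\alpha, \beta)$ up to rounding, while `rho` is half the identity regardless of $\alpha$ and $\beta$: Bob’s lab contains no information about $\ket{\Gamma}$ until he learns which correction to apply.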

No, no, no, no, no!, you complain. What was this deal about modelling a measurement coherently? This makes no sense, measurements must by definition cause lots of decoherence! Indeed, we’re getting there. Now with decoherence, the state after the measurement in the computational basis is $\frac{1}{2}( \ket{E_{00}}\ket{00}\ket{00}\ket{\Gamma} + \ket{E_{01}}\ket{01}\ket{01}Z\ket{\Gamma} + \ket{E_{10}}\ket{10}\ket{10}X\ket{\Gamma} + \ket{E_{11}}\ket{11}\ket{11}XZ\ket{\Gamma}),$ where $\ket{E_{ij}}$ is the state of the environment, labelled according to the result of the measurement. You see that there is no collapse of the wavefunction: in particular Bob’s state is in the same entangled superposition as before, and his reduced density matrix is still the maximally mixed state. Moreover, like any physical process, decoherence spreads at most as fast as the speed of light, so even after Alice has been engulfed by the decoherence and has obtained a definite measurement result, Bob will still for some time remain unaffected by it, with the state still being adequately described by the above superposition. Only after a relativity-respecting time interval will he become engulfed as well, coherence will be killed, and the state relative to him and Alice will be adequately described by (e.g.) $\ket{E_{10}}\ket{10}\ket{10}X\ket{\Gamma}.$ Now we are in the situation people usually describe: his qubit is in a definite state, and he merely does not know which it is. Alice then sends him the measurement result, 10, via the Internet, from which he deduces that he needs to apply operation $X$ to recover $\ket{\Gamma}$, and now the teleportation protocol is truly over.