Which is a really bizarre argument. Yes, frequentism is nonsense, and yes, subjective probability makes perfect sense. But that’s all that is true about it. No, objective probability is not the same thing as frequentism, and no, subjective probability is not the only probability that exists. Come on, that’s denying the premise! The question is interesting precisely because we strongly believe that objective probability exists; either because of quantum mechanics, or more directly from the observation of radioactive decay. Does anybody seriously believe that whether some atom decays or not depends on the opinion of an agent? There even existed natural nuclear reactors, where chain reactions occurred long before any agent existed to wonder about them.

In any case, it seems that philosophers won’t do anything about it. What can we say about objective probability, though? It is easy to come up with some desiderata: it should be *objective*, to start with. The probability of some radioactive atom decaying should just be a property of the atom, not a property of some agent betting about it. Agents and bets are still important, though, as it should make sense to bet according to the objective probabilities. In other words, Lewis’ Principal Principle should hold: rational agents should set their subjective probabilities to be equal to the objective probabilities, if the latter are known1. Last but not least, objective probabilities should be connected to relative frequencies via the law of large numbers, that is, we need that

\[ \text{Pr}(|f_N-p|\ge\varepsilon) \le 2e^{-2N\varepsilon^2}, \] or, in words, the (multi-trial) probability that the frequency deviates more than $\varepsilon$ from the (single-trial) probability after $N$ trials goes down exponentially with $\varepsilon$ and $N$ 2.

I think it is also easy to come up with a definition of objective probability that fulfills these desiderata, if we model objectively random processes as *deterministic* branching processes. Let’s say we are interested in the decay of an atom. Instead of saying that it either decays or not, we say that the world *branches* in several new worlds, in some of which the atom decays, and some of which it does not. Moreover, we say that we can somehow count the worlds, that is, that we can attribute a measure $\mu(E)$ to the set of worlds where event $E$ happens and a measure $\mu(\neg E)$ to the set of worlds where event $\neg E$ happens. Then we say that the objective probability of $E$ is

\[p(E) = \frac{\mu(E)}{\mu(E)+\mu(\neg E)}.\] Now, before you tune out, saying that this is nonsense because the Many-Worlds interpretation is false, so we shouldn’t consider branching, let me introduce a toy theory where this deterministic branching is *literally true* by fiat. In this way we can separate the question of whether the Many-Worlds interpretation is true from the question of whether deterministic branching explains objective probability.

This toy theory was introduced by Adrian Kent to argue that probability makes no sense in the Many-Worlds interpretation. Well, I think it is a great illustration of how probability actually makes perfect sense. It goes like this: the universe is a deterministic computer simulation3 where some agents live. In this universe there is a wall with two lamps, and below each a display that shows a non-negative integer. This wall also has a “play” button which, when pressed, makes one of the lamps light up.

The agents there can’t really predict which lamp will light up, but they have learned two things about how the wall works. The first is that if the number below a lamp is zero, that lamp never lights up. The second is that if the numbers are set to $n_L$ and $n_R$, respectively, and they press “play” multiple times, the fraction of times where the left lamp lights up is often close to $n_L/(n_L+n_R)$.

What is going on, of course, is that when “play” is pressed the whole computer simulation is deleted and $n_L+n_R$ new ones are initiated, $n_L$ with the left lamp lit, and $n_R$ with the right lamp lit. My proposal is to define the objective probability of some event as the proportion of simulations where this event happens, as this quantity fulfills all our desiderata for objective probability.
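The world-counting can be made concrete in a few lines of code. This is a minimal sketch of my own (the function names are mine, not Kent’s): each press of “play” replaces every simulation with $n_L$ left-lamp copies and $n_R$ right-lamp copies, and we simply count worlds.

```python
from fractions import Fraction

def branch(worlds, n_L, n_R):
    """Each world splits into n_L left-lamp and n_R right-lamp successors."""
    return [history + [outcome]
            for history in worlds
            for outcome in ['L'] * n_L + ['R'] * n_R]

n_L, n_R = 3, 1
worlds = [[]]
for _ in range(4):          # press "play" four times
    worlds = branch(worlds, n_L, n_R)

# single-trial objective probability: proportion of worlds where 'L' came first
p_L = Fraction(sum(w[0] == 'L' for w in worlds), len(worlds))
assert p_L == Fraction(n_L, n_L + n_R)

# and most worlds see a frequency of 'L' close to 3/4
freqs = [w.count('L') / 4 for w in worlds]
close = sum(abs(f - 0.75) <= 0.25 for f in freqs) / len(freqs)
print(len(worlds), p_L, close)
```

Even after only four presses, about 95% of the $4^4 = 256$ worlds see a frequency within $0.25$ of $3/4$.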

This clearly fulfills the “objectivity” desideratum, as a proportion of simulations is a property of the world, not some agent’s opinion. It also respects the “law of large numbers” desideratum. To see that, first notice that for a single trial the proportion of simulations where the left lamp lights up is

\[p(L) = \frac{n_L}{n_L+n_R}.\] Now the number of simulations where the left lamp lights up $k$ times out of $N$ trials is

\[ {N \choose k}n_L^kn_R^{N-k},\] so if we divide by total number of simulations $(n_L+n_R)^N$, we see that the proportion of simulations where the left lamp lit $k$ times out of $N$ is given by \[\text{Pr}(N,k) = {N \choose k}p(L)^k(1-p(L))^{N-k}.\]Since this is formally identical to the binomial distribution, it allows us to prove a theorem formally identical to the law of large numbers:

\[ \text{Pr}(|k/N-p(L)|\ge\varepsilon) \le 2e^{-2N\varepsilon^2}, \]which says that the (multi-trial) proportion of simulations where the frequency deviates more than $\varepsilon$ from the (single-trial) proportion of simulations after $N$ trials goes down exponentially with $\varepsilon$ and $N$.
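As a sanity check (my own, not part of the argument), one can compare the exact binomial tail with the $2e^{-2N\varepsilon^2}$ bound numerically:

```python
from math import comb, exp

def tail(N, p, eps):
    """Exact Pr(|k/N - p| >= eps) for the binomial distribution."""
    return sum(comb(N, k) * p**k * (1 - p)**(N - k)
               for k in range(N + 1)
               if abs(k / N - p) >= eps)

p, eps = 0.75, 0.1
for N in (10, 100, 1000):
    # the Hoeffding-style bound holds at every N
    assert tail(N, p, eps) <= 2 * exp(-2 * N * eps**2)
print(tail(100, p, eps))
```

At $N=1000$ the bound is already about $4\times 10^{-9}$, so essentially no simulation sees a frequency far from $3/4$.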

Last but not least, to see that it fulfills the “Principal Principle” desideratum, we need to use the decision-theoretic definition of subjective probability: the subjective probability $s(L)$ of an event $L$ is the highest price a rational agent should pay to play a game where they receive $1$€ if event $L$ happens and nothing otherwise. In the $n_L$ simulations where the left lamp lit the agent ends up with $(1-s(L))$ euros, and in the $n_R$ simulations where the right lamp lit the agent ends up with $-s(L)$ euros. If the agent cares equally about all its future selves, they should agree to pay $s(L)$ as long as \[(1-s(L))n_L-s(L)n_R \ge 0,\]which translates to \[s(L) \le \frac{n_L}{n_L+n_R},\] so indeed the agent should bet according to the objective probability if they know $n_L$ and $n_R$4.
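The break-even computation can be checked numerically; this little sketch (names mine) just sums the payoff over all successor simulations:

```python
from fractions import Fraction

n_L, n_R = 3, 1

def total_payoff(s):
    """Net payoff, summed over all n_L + n_R successor simulations,
    of paying price s for a bet that returns 1 euro if the left lamp lit."""
    return (1 - s) * n_L + (-s) * n_R

s_max = Fraction(n_L, n_L + n_R)
assert total_payoff(s_max) == 0                      # break-even price
assert total_payoff(s_max - Fraction(1, 100)) > 0    # cheaper bets are worth taking
assert total_payoff(s_max + Fraction(1, 100)) < 0    # dearer ones are not
print(s_max)
```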

And this is it. Since it fulfills all our desiderata, I claim that deterministic branching does explain objective probability. Furthermore, it is the only coherent explanation I know of. It is hard to argue that nobody will ever come up with a single-world notion of objective probability that makes sense, but at least in one point such a notion will always be unsatisfactory: why would something be in principle impossible to predict? Current answers are limited to saying that quantum mechanics says so, or that if we could predict the result of a measurement we would run into trouble with Bell’s theorem. But that’s not really an explanation, it’s just saying that there is no alternative. Deterministic branching theories do offer an explanation, though: you cannot predict which outcome will happen because all will.

Now the interesting question is whether this argument applies to the actual Many-Worlds interpretation, and we can get a coherent definition of objective probability there. The short answer is that it’s complicated. The long answer is the paper I wrote about it =)

I’m happy with how the article turned out (no bullshit, conveys complex concepts in understandable language, quotes me ;), but there is a point about it that I’d like to nitpick: Ball writes that it was not “immediately obvious” whether the probabilities should be given by $\psi$ or $\psi^2$. Well, it might not have been immediately obvious to Born, but this is just because he was not familiar with Schrödinger’s theory2. Schrödinger, on the other hand, was very familiar with his own theory, and in the very paper where he introduced the Schrödinger equation he discussed at length the meaning of the quantity $|\psi|^2$. He got it wrong, but my point here is that he *knew* that $|\psi|^2$ was the right quantity to look at. It was obvious to him because the Schrödinger evolution is unitary, and absolute values squared behave well under unitary evolution.

Born’s contribution was, therefore, not mathematical, but conceptual. What he introduced was not the $|\psi|^2$ formula, but the idea that this is a probability. And the difficulty we have with the Born rule until today is conceptual, not mathematical. Nobody doubts that the probability must be given by $|\psi|^2$, but people are still puzzled by these high-level, ill-defined concepts of probability and measurement in an otherwise reductionist theory. And I think one cannot hope to understand the Born rule without understanding what probability is.

Which is why I don’t think the papers of Masanes et al. and Cabello can *explain* the Born rule. They refuse to tackle the conceptual difficulties, and focus on the mathematical ones. What they can explain is why quantum theory immediately goes down in flames if we replace the Born rule with anything else. I don’t want to minimize this result: it is nontrivial, and solves something that was bothering me for a long time. I’ve always wanted to find a minimally reasonable alternative to the Born rule for my research, and now I know that there isn’t one.

This is what I like, by the way, in the works of Saunders, Deutsch, Wallace, Vaidman, Carroll, and Sebens. They tackle the conceptual difficulties with probability and measurement head on3. I’m not satisfied with their answers, for several reasons, but at least they are asking the right questions.

The well-known argument by Frauchiger and Renner about the consistency of quantum mechanics has finally been published (in Nature Communications). With publication came a substantial change to the conclusion of the paper: while the old version claimed that “no single-world interpretation can be logically consistent”, the new version claims that “quantum theory cannot be extrapolated to complex systems” or, to use the title, that “quantum theory cannot consistently describe the use of itself”.

This is clearly bollocks. We need to find out, though, where exactly the argument has gone wrong. Several discussions popped up on the internet to do so, for example in Scott Aaronson’s blog, but to my surprise nobody pointed out the obvious mistake: the predictions that Frauchiger and Renner claim to follow from quantum mechanics do not actually follow from quantum mechanics. In fact, they are outright wrong.

For example, take the first of the predictions that appear on Table 3 of the paper. $\bar{\text{F}}$ measures $r=\text{tails}$ and claims: “I am certain that W will observe $w = \text{fail}$ at time $n$:$31$”. By assumption, though, $\bar{\text{F}}$ is in an isolated laboratory and their measurement is described by a unitary transformation. This implies that the state of lab L at time $n$:$30$ will be given either by

\[ \frac{3}{\sqrt{10}}\ket{\text{fail}}_\text{L} + \frac{1}{\sqrt{10}}\ket{\text{ok}}_\text{L}\quad\text{or}\quad\frac{1}{\sqrt{2}}\ket{\text{fail}}_\text{L} - \frac{1}{\sqrt{2}}\ket{\text{ok}}_\text{L},\]depending on the result of $\bar{\text{W}}$’s measurement. Therefore, it is not certain that W will observe $w = \text{fail}$; this will happen with probability $9/10$ or $1/2$, respectively.

To obtain the prediction the authors write in Table 3, one would need to assume that $\bar{\text{F}}$’s measurement caused a collapse of the state of their laboratory – contrary to the assumption of unitarity. In this case, the state at time $n$:$30$ would in fact be given by

\[ \ket{\text{fail}}_\text{L},\]independently of the result of $\bar{\text{W}}$’s measurement, and W would indeed observe $w = \text{fail}$ with certainty. But then W would never observe $w = \text{ok}$, and the paradox desired by the authors would never emerge.

To make this point more clear, I will describe how precisely the same problem arises in the original Wigner’s friend *gedankenexperiment*, so that people who are not familiar with Frauchiger and Renner’s argument can follow it. It goes like this:

Wigner is outside a perfectly isolated laboratory, and inside it there is a friend who is going to make a measurement on a qubit. Their initial state is

\[\ket{\text{Wigner}}\ket{\text{friend}}\frac{\ket{0}+\ket{1}}{\sqrt2}.\]If we assume that the measurement of the friend is a unitary transformation, after the measurement their state becomes

\[\ket{\text{Wigner}}\frac{\ket{\text{friend}_0}\ket{0} + \ket{\text{friend}_1}\ket{1}}{\sqrt2}.\]Now the friend is asked to predict what Wigner will observe if he makes a measurement on the qubit. Frauchiger and Renner claim that, using quantum mechanics, the friend can predict that “If I observed 0, then Wigner will observe 0 with certainty”4.

Wait, what? The quantum prediction is clearly that Wigner will observe 0 with probability 1/2. The claimed prediction only follows if we assume that the friend’s measurement caused a collapse.

And both assumptions are fine, actually. If there is no collapse, the prediction of 0 with probability 1/2 is correct and leads to no inconsistency, and if there is a collapse the prediction of 0 with probability 1 is correct and leads to no inconsistency. We only get an inconsistency if we insist that from the point of view of the friend there is a collapse, from the point of view of Wigner there is no collapse, and somehow both points of view are correct.

**Update:** After a long discussion with Renato, I think I understand his point of view. He thinks that this assumption of “collapse and no collapse” is just part of quantum mechanics, so it doesn’t need to be stated separately. Well, I think this is one hell of an unstated assumption, and in any case hardly part of the consensus about quantum mechanics. More technically, I think Frauchiger and Renner’s formalization of quantum mechanics — called [Q] — does not imply “collapse and no collapse”, it is too vague for that, so there is really a missing assumption in the argument.

This is of course nonsense. Bell’s theorem is not only a rather simple piece of mathematics, with a few-lines proof that can be understood by high-school students, but also the foundation of an entire field of research — quantum information theory. It has been studied, verified, and improved upon by thousands of scientists around the world.

The form of Bell’s theorem that is relevant for the article at hand is that for all probability distributions $\rho(\lambda)$ and response functions $A(a,\lambda)$ and $B(b,\lambda)$ with range $\{-1,+1\}$ we have that

\begin{multline*}
-2 \le \sum_\lambda \rho(\lambda) \Big[A(a_1,\lambda)B(b_1,\lambda)+A(a_1,\lambda)B(b_2,\lambda) \\ +A(a_2,\lambda)B(b_1,\lambda)-A(a_2,\lambda)B(b_2,\lambda)\Big] \le 2
\end{multline*}

The author’s proposed counterexample? It’s described in equations (3.48) and (3.49): A binary random variable $\lambda$ that can take values $-1$ or $+1$, with $\rho(-1)=\rho(+1)=1/2$, and response functions $A(a,\pm1)=\pm1$ and $B(b,\pm1)=\mp1$. That’s it. Just perfectly anti-correlated results, that do not even depend on the local settings $a$ and $b$. The value of the Bell expression above is simply $-2$.
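One can check in a couple of lines that this “counterexample” sits comfortably inside the bound; the code below (my own) just evaluates the Bell expression for those response functions:

```python
def A(a, lam):   # Alice's response function: ignores her setting a
    return lam

def B(b, lam):   # Bob's: perfectly anti-correlated with Alice, ignores b
    return -lam

rho = {-1: 0.5, +1: 0.5}   # the binary hidden variable of eqs. (3.48)-(3.49)

S = sum(rho[lam] * (A(1, lam) * B(1, lam) + A(1, lam) * B(2, lam)
                    + A(2, lam) * B(1, lam) - A(2, lam) * B(2, lam))
        for lam in rho)
assert -2 <= S <= 2   # squarely inside Bell's bound
print(S)   # -2.0
```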

Now how could Open Science let such trivial nonsense pass? They do provide the “Review History” of the article, so we can see what happened: there were two referees who pointed out that the manuscript was wrong, one who was unsure, and two who issued a blanket approval without engaging with the contents. And the editor decided to accept it anyway.

What now? Open Science can recover a bit of its reputation by withdrawing this article, as Annals of Physics did with a previous version, but I’m never submitting an article to them.

Before I start ranting about what I find so objectionable about it, I’ll present the proof of this version of Bell’s theorem the best I can. So, what is counterfactual definiteness? It is the assumption that not only the measurement you did in fact do has a definite answer, but also the measurement you did *not* do has a definite answer. It feels a lot like determinism, but it is not really the same thing, as the assumption is silent about *how* the result of the counterfactual measurement is determined, it just says that it *is*. To be more clear, let’s take a look at the data that comes from a real Bell test, the Delft experiment:2

N | $x$ | $y$ | $a$ | $b$ |
---|---|---|---|---|
1 | 0 | 0 | 1 | 1 |
2 | 0 | 0 | 0 | 0 |
3 | 1 | 1 | 1 | 0 |
4 | 1 | 1 | 0 | 1 |
5 | 0 | 0 | 1 | 1 |
6 | 1 | 1 | 1 | 0 |
7 | 0 | 0 | 1 | 0 |
8 | 1 | 0 | 1 | 1 |
9 | 0 | 0 | 1 | 1 |
10 | 0 | 1 | 0 | 0 |

The first column indicates the rounds of the experiment, the $x$ and $y$ columns indicate the settings of Alice and Bob, and the $a$ and $b$ columns the results of their measurements. If one assumes counterfactual definiteness, then definite results must also exist for the measurements that were *not* made, for example in the first round there must exist results corresponding to the setting $x=1$ for Alice and $y=1$ for Bob. This data would then be just part of some more complete data table, for example this:

N | $a_0$ | $a_1$ | $b_0$ | $b_1$ |
---|---|---|---|---|
1 | 1 | 0 | 1 | 1 |
2 | 0 | 1 | 0 | 1 |
3 | 1 | 1 | 0 | 0 |
4 | 1 | 0 | 1 | 1 |
5 | 1 | 1 | 1 | 1 |
6 | 1 | 1 | 0 | 0 |
7 | 1 | 1 | 0 | 0 |
8 | 1 | 1 | 1 | 0 |
9 | 1 | 0 | 1 | 0 |
10 | 0 | 0 | 1 | 0 |

In this table the column $a_0$ has the results of Alice’s measurements when her setting is $x=0$, and so on. The entries corresponding to the settings actually used in the Delft experiment (cf. the first table) are the real data points; the rest are hypothetical results I filled in for the measurements that were not made.

What is the problem with assuming counterfactual definiteness, then? A complete table certainly exists. But it makes it possible to do something that wasn’t possible before: we can evaluate the entire CHSH game in every single round, instead of having to choose a single pair of settings. As a quick reminder, to win the CHSH game Alice and Bob must give the same answers when their settings are $(0,0)$, $(0,1)$, or $(1,0)$, and give different answers when their setting is $(1,1)$. In other words, they must have $a_0=b_0$, $a_0=b_1$, $a_1=b_0$, and $a_1 \neq b_1$. But if you try to satisfy all these equations simultaneously, you get that $a_0=b_0=a_1 \neq b_1 = a_0$, a contradiction. At most, you can satisfy 3 out of the 4 equations3. Then since in every row the score in the CHSH game is at most $3/4$, if we sample randomly from each row a pair of $a_x,b_y$ we have that

\[ \frac14(p(a_0=b_0) + p(a_0=b_1) + p(a_1=b_0) + p(a_1\neq b_1)) \le \frac34,\]

which is the CHSH inequality.

But if you select the actual Delft data from each row, the score will be $0.9$. Contradiction? Well, no, because you didn’t sample randomly, but just chose $1$ out of $4^{10}$ possibilities, which would happen with probability $1/4^{10} \approx 10^{-6}$ if you actually did it randomly. One can indeed violate the CHSH inequality by luck, it is just astronomically unlikely.
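Here is the whole calculation in code (my own illustration, using the filled-in table above): every row scores at most $3/4$, yet selecting the actually-measured entries yields $0.9$.

```python
rows = [  # (a0, a1, b0, b1): measured Delft entries plus the hypothetical fill-ins
    (1, 0, 1, 1), (0, 1, 0, 1), (1, 1, 0, 0), (1, 0, 1, 1), (1, 1, 1, 1),
    (1, 1, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 0, 1, 0), (0, 0, 1, 0),
]
settings = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 0),
            (1, 1), (0, 0), (1, 0), (0, 0), (0, 1)]  # (x, y) actually used per round

def win(x, y, a, b):
    """CHSH: the answers must differ exactly when both settings are 1."""
    return (a != b) == (x == 1 and y == 1)

def row_score(a0, a1, b0, b1):
    return sum(win(x, y, (a0, a1)[x], (b0, b1)[y])
               for x in (0, 1) for y in (0, 1)) / 4

assert all(row_score(*r) <= 0.75 for r in rows)   # no row can beat 3/4

# ...but cherry-picking the actually-measured entries scores 0.9
delft = sum(win(x, y, r[x], r[2 + y])
            for (x, y), r in zip(settings, rows)) / len(rows)
print(delft)   # 0.9
```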

Proof presented, so now ranting: what is wrong with this version of the theorem? It is just so *lame*! It doesn’t even explicitly deal with the issue of locality, which is fundamental in all other versions of the theorem4! The conclusion that one takes from it, according to Asher Peres himself, is that “Unperformed experiments have no results”. To which the man in the street could reply “Well, duh, of course unperformed experiments have no results, why are you wasting my time with this triviality?”. It leaves the reader with the impression that they only need to give up the notion that unperformed experiments have results, and they are from then on safe from Bell’s theorem. But this is not true at all! The other proofs of Bell’s theorem still hold, so you still need to give up either *determinism* or *no action at a distance*, if you consider the simple version, or unconditionally give up *local causality*, if you consider the nonlocal version, or choose between *generalised local causality* and living in a single world, if you consider the Many-Worlds version.

What about the mainstream interpretations, then? In Časlav’s neo-Copenhagen interpretation the measurement results are observer-dependent (otherwise this would be a rather schizophrenic paper). In QBism they are explicitly subjective2, as almost everything else. In Many-Worlds there isn’t a single observer after a measurement, but several of them, each with their own measurement result.

How can this be? Časlav’s argument is as simple as it gets in quantum foundations: Bell’s theorem. In its simple version, Bell’s theorem dashes the old hope that quantum mechanics could be made deterministic: if the result of a spin measurement were pre-determined, then you wouldn’t be able to win the CHSH game with probability higher than $3/4$, unless some hidden action-at-a-distance was going on. But let’s suppose you did the measurement. Surely now the weirdness is over, right? You left the quantum realm, where everything is fuzzy and complicated, and entered the classical realm, where everything is solid and clear. So solid and clear that if somebody else does a measurement on you, their measurement result will be pre-determined, right?

Well, if it were pre-determined, then people doing measurements on people doing measurements wouldn’t be able to win the CHSH game with probability higher than $3/4$, unless some hidden action-at-a-distance was going on. But if quantum mechanics holds at *every* scale, then again one can win it with probability $\frac{2+\sqrt{2}}{4}$.

This highlights the fundamental confusion in Frauchiger and Renner’s argument: they consider which outcome some observer thinks another observer will experience, but are not careful to distinguish the different copies of an observer that will experience different outcomes. I’ve reformulated their argument to make this point explicit here, and it works fine, but it undermines their conclusion that in single-world but not many-world theories observers will make contradictory assertions about which outcomes other observers will experience. Well, yes, but the point is that this contradiction is resolved in many-world theories by allowing different copies of an observer to experience different outcomes, whereas this recourse is not available in single-world theories.

First of all, this limit does not exist. If one makes an infinite sequence of zeroes and ones by throwing a fair coin (fudging away this pesky infinity again), calling the result of the $i$th throw $s_i$, the relative frequency after $n$ throws is

\[ f_n = \frac1n\sum_{i=1}^{n}s_i.\] What should then $\lim_{n\to\infty}f_n$ be? $1/2$? Why? All sequences of zeros and ones are equally possible – they are even equally probable! What is wrong with choosing the sequence $s = (0,0,0,\ldots)$? Or even the sequence $(0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,\ldots)$, whose frequencies do not converge to any number, but eternally oscillate between $0$ and $1$? If for some reason one chooses a nice3 sequence like $s=(0,1,0,1,0,1,\ldots)$, for which the limit does converge to $1/2$, what is wrong with reordering it to obtain $s' = (s_1,s_3,s_2,s_5,s_7,s_4,\ldots)$ instead, with limit $1/3$?
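The reordering trick is easy to check numerically; this sketch (my own) builds $s'$ by taking two odd-indexed terms of $s$ for every even-indexed one:

```python
from itertools import count

def s(i):                  # s_1, s_2, s_3, ... = 0, 1, 0, 1, ...
    return (i + 1) % 2

def s_prime(n):
    """First n terms of s' = (s_1, s_3, s_2, s_5, s_7, s_4, ...)."""
    odd = (2 * k + 1 for k in count())    # indices 1, 3, 5, ...
    even = (2 * k + 2 for k in count())   # indices 2, 4, 6, ...
    out = []
    while len(out) < n:
        out += [s(next(odd)), s(next(odd)), s(next(even))]
    return out[:n]

# same terms, different order: the running frequency now converges to 1/3
f = sum(s_prime(300_000)) / 300_000
assert abs(f - 1/3) < 1e-6
print(f)
```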

No, no, no, you complain. It is true that all sequences are equiprobable, but most of them have limiting frequency $1/2$. Moreover, it is a theorem that the frequencies converge – it is the law of large numbers! How can you argue against a theorem?

Well, what do you mean by “most”? This is already a probabilistic concept! And according to which measure? It cannot be a fixed measure, otherwise it would say that the limiting frequency is *always* $1/2$, independently of the single-throw probability $p$. On the other hand, if one allows it to depend on $p$, one can indeed define a measure on the set of infinite sequences such that “most” sequences have limiting frequency $p$. A probability measure. So you’re not explaining the single-throw probability in terms of the limiting frequencies, but rather in terms of the probabilities of the limiting frequencies. Which is kind of a problem, if “probability” is what you wanted to explain in the first place. The same problem happens with the law of large numbers. Its statement is that

\[\forall \epsilon >0 \quad \lim_{n\to\infty}\text{Pr}(|f_n -p|\ge \epsilon) = 0,\] so it only says that the *probability* of observing a frequency different from $p$ goes to $0$ as the number of trials goes to infinity.

But enough with mocking frequentism. Much more eloquent dismissals have already been written, several times over, and as the Brazilian saying goes, one shouldn’t kick a dead dog. Rather, I want to imagine a world where frequentism is *true*.

What would it take? Well, the most important thing is to make the frequencies converge to the probability in the infinite limit. One also needs, though, the frequencies to be a good approximation to the probability even for a finite number of trials, otherwise empiricism goes out of the window. My idea, then, is to allow the frequencies to fluctuate within some error bars, but never beyond. One could, for example, take the $5\sigma$ standard for scientific discoveries that particle physics uses, and declare it to be a fundamental law of Nature: it is only possible to observe a frequency $f_n$ if

\[f_n \in \left(p-5\frac{\sigma}{\sqrt{n}},p+5\frac{\sigma}{\sqrt{n}}\right).\] Trivially, then, $\lim_{n\to\infty}f_n = p$, and even better, if we want to measure some probability within error $\epsilon$, we only need $n > 25\sigma^2/\epsilon^2$ trials, so for example 62500 throws are enough to tomograph any coin within error $10^{-2}$.
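To see the arithmetic of this stipulated law (my own illustration, taking the worst case $\sigma = 1/2$ for a coin):

```python
from math import sqrt

sigma = 0.5   # worst case for a coin: p = 1/2

def guaranteed_error(n):
    """Half-width of the allowed frequency window after n throws."""
    return 5 * sigma / sqrt(n)

# the window shrinks below 10**-2 once n >= 25 * sigma**2 / eps**2 = 62500
assert guaranteed_error(62500) <= 1e-2
assert guaranteed_error(62499) > 1e-2
print(guaranteed_error(62500))   # 0.01
```

Unlike in our world, this error bar is deterministic, not statistical: no run of throws can ever land outside it.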

In this world, the gambler’s fallacy is not a fallacy, but a law of Nature. If one starts throwing a fair coin and observes 24 heads in a row, it is literally impossible to observe another heads in the next throw. It’s as if there is a purpose pushing the frequencies towards the mean. It captures well our intuition about randomness. It is also completely insane: 25 heads are impossible only at the start of a sequence. If before them one had obtained 24 tails, 25 heads are perfectly fine. Also, it’s not as if 25 heads are impossible because their probability is too low. The probability of 24 heads, one tails, and another heads is even lower.

Even worse, if the probability you’re trying to tomograph is the one of obtaining 24 heads followed by one tail, then the frequency $f_1$ must be inside the interval \[[0,2^{-25}+5\sqrt{2^{-25}(1-2^{-25})}]\approx [0,2^{-10}],\]which is only possible if $f_1 = 0$. That is, it is impossible to observe tails after observing 24 heads, as it would make $f_1=1$, but it is also impossible to observe heads. So in this world Nature would need to keep track not only of all the coin throws, but also which statistics you are calculating about them, and also find a way to keep you from observing contradictions, presumably by not allowing any coin to be thrown at all.

A proper mixture is when you prepare the states $\ket{0}$ and $\ket{1}$ with probability $p$ and $1-p$, obtaining the density matrix

\[ \rho_\text{proper} = p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}.\] An improper mixture is when you prepare the entangled state $\sqrt{p}\ket{0}\ket{0} + \sqrt{1-p}\ket{1}\ket{1}$ and discard the second subsystem, obtaining the density matrix \[ \rho_\text{improper} = p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}.\] The question is then why do these different preparation procedures give rise to the same statistics (and therefore it is legitimate to represent them with the same density matrix).

Well, do they? I’m not so sure about that! The procedure to prepare the proper mixture is rather vague, so we can’t really answer whether it is appropriate to represent it via the density matrix $\rho_\text{proper}$. To remove the vagueness, I asked an experimentalist how she prepared the state $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$ that was necessary for an experiment. “Easy”, she told me, “I prepared $n$ copies of $\ket{0}$, $n$ copies of $\ket{1}$, and then combined the statistics.”

This sounds like preparing the state $\ket{0}^{\otimes n} \otimes \ket{1}^{\otimes n}$, not like preparing $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$. Do they give the same statistics? Well, if I measure all states in the $Z$ basis, exactly $\frac12$ of the results will be $0$. But if I measure $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$ in the $Z$ basis $2n$ times, the probability that $\frac12$ of the results are $0$ is

\[ \frac{1}{2^{2n}} {2n \choose n} \approx \frac{1}{\sqrt{n\pi}},\] so just by looking at this statistic I can guess with high probability which was the preparation. It is even easier to do that if I disregard her instructions and look at the order of the results: getting $n$ zeroes followed by $n$ ones is a dead giveaway.
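The approximation is just Stirling’s formula at work; a quick check (my own):

```python
from math import comb, pi, sqrt

for n in (10, 100, 1000):
    exact = comb(2 * n, n) / 4**n           # Pr(exactly n zeroes in 2n trials)
    approx = 1 / sqrt(n * pi)
    assert abs(exact / approx - 1) < 1 / n  # relative error shrinks like 1/(8n)
print(exact, approx)
```

Already at $n=100$ the two agree to about one part in a thousand, and the probability of a perfectly balanced count is down to about 6%.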

Maybe one should prepare these states using a random number generator instead? If one uses the function `rand()` from MATLAB to decide whether to prepare $\ket{0}$ or $\ket{1}$ at each round one can easily pass the two randomness tests I mentioned above. Maybe it can even pass all common randomness tests available in the literature; I don’t know how good `rand()` is. It cannot, however, pass *all* randomness tests, as `rand()` is a deterministic algorithm using a finite seed, and is therefore restricted to outputting computable sequences of bits. One can, in fact, attack it, and this is the core of the paper of López Grande et al., showing how one can distinguish a sequence of bits that came from `rand()` from a truly random one. More generally, even the best pseudorandom number generators we have are designed to be indistinguishable from truly random sources only by polynomial-time tests, and fail against exponential-time algorithms.

Clearly pseudorandomness is not enough to generate proper mixtures; how about true randomness instead? Just use a quantum random number generator to prepare bits with probabilities $p$ and $1-p$, and use these bits to prepare $\ket{0}$ or $\ket{1}$. Indeed, this is what people do when they are serious about preparing mixed states, and the statistics really are indistinguishable from those of improper mixtures. But why? To answer that, we need to model the quantum random number generator physically. We start by preparing a “quantum coin” in the state

\[ \sqrt{p}\ket{H}+\sqrt{1-p}\ket{T},\] which we should measure in the $\{\ket{H},\ket{T}\}$ basis to generate the random bits. Going to the Church of the Larger Hilbert Space, we model the measurement as

\[ \sqrt{p}\ket{H}\ket{M_H}+\sqrt{1-p}\ket{T}\ket{M_T},\] and conditioned on the measurement we prepare $\ket{0}$ or $\ket{1}$, obtaining the state

\[ \sqrt{p}\ket{H}\ket{M_H}\ket{0}+\sqrt{1-p}\ket{T}\ket{M_T}\ket{1}.\] We then discard the quantum coin and the measurement result, obtaining finally

\[ p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1},\] which is just the desired state, but now it is an improper mixture. So, at least in the Many-Worlds interpretation, there is no mystery about why proper and improper mixtures are equivalent: they are physically the same thing!
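The whole derivation can be verified mechanically; this sketch (my own, using numpy) builds the three-party state and traces out the coin and the measurement result:

```python
import numpy as np

p = 0.3
H, T = np.array([1., 0.]), np.array([0., 1.])    # quantum coin
MH, MT = np.array([1., 0.]), np.array([0., 1.])  # measurement result
zero, one = np.array([1., 0.]), np.array([0., 1.])

# sqrt(p)|H>|M_H>|0> + sqrt(1-p)|T>|M_T>|1>
psi = (np.sqrt(p) * np.kron(np.kron(H, MH), zero)
       + np.sqrt(1 - p) * np.kron(np.kron(T, MT), one))
rho = np.outer(psi, psi)

# discarding coin and result = partial trace over the first two subsystems
rho_sys = np.trace(rho.reshape(4, 2, 4, 2), axis1=0, axis2=2)

assert np.allclose(rho_sys, np.diag([p, 1 - p]))
print(rho_sys)
```

The reduced state is exactly $p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}$, with no off-diagonal terms surviving.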

(A closely related question, which has a closely related answer, is why it is equivalent to prepare the states $\ket{0}$ or $\ket{1}$ with probability $\frac12$ each, or the states $\ket{+}$ or $\ket{-}$, again with probability $\frac12$. The equivalence fails for pseudorandomness, as shown by López Grande et al.; if we use true randomness instead, we are preparing the states

\[ \frac1{\sqrt{2}}(\ket{H}\ket{0}+\ket{T}\ket{1})\quad\text{or}\quad\frac1{\sqrt{2}}(\ket{H}\ket{+}+\ket{T}\ket{-})\] and discarding the coin. But note that if one applies a Hadamard to the coin of the first state one obtains the second, so the difference between then is just a unitary on a system that is discarded anyway; no wonder we can’t tell the difference! More generally, any two purifications of the same density matrix must be related by a unitary on the purifying system.)
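And the Hadamard claim can be checked in the same way (again my own verification code):

```python
import numpy as np

H, T = np.array([1., 0.]), np.array([0., 1.])    # coin
zero, one = np.array([1., 0.]), np.array([0., 1.])
plus, minus = (zero + one) / np.sqrt(2), (zero - one) / np.sqrt(2)

psi1 = (np.kron(H, zero) + np.kron(T, one)) / np.sqrt(2)
psi2 = (np.kron(H, plus) + np.kron(T, minus)) / np.sqrt(2)

# a Hadamard on the coin maps the first purification onto the second
Hadamard = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
assert np.allclose(np.kron(Hadamard, np.eye(2)) @ psi1, psi2)

def reduced(psi):
    """Discard the coin: partial trace over the first qubit."""
    rho = np.outer(psi, psi)
    return np.trace(rho.reshape(2, 2, 2, 2), axis1=0, axis2=2)

# both preparations leave the same state behind
assert np.allclose(reduced(psi1), reduced(psi2))
print(reduced(psi1))   # the maximally mixed state I/2
```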

Galley and Masanes want to invert the question, and ask for *which* quantum-like theories proper and improper mixtures are equivalent. To be able to tackle this question, we need to define what improper mixtures even are in a quantum-like theory. They proceed by analogy with quantum mechanics: if one has a bipartite state $\ket{\psi}$ and does measurements $E_i$ only on the first system, the probabilities one obtains are given by

\[ p(i) = \operatorname{tr}( (E_i \otimes \mathbb I) \ket{\psi}\bra{\psi} ),\] and the improper mixture is defined as the operator $\rho_\text{improper}$ for which

\[ p(i) = \operatorname{tr}( E_i \rho_\text{improper})\] for all measurements $E_i$.

In their case, they are considering a quantum-like theory that is still based on quantum states, but whose probabilities are not given by the Born rule $p(i) = \operatorname{tr}(E_i \ket{\phi}\bra{\phi})$, but by some more general function $p(i) = F_i (\ket{\phi})$. One can then define the probabilities obtained by local measurements on a bipartite state as

\[ p(i) = (F_i \star \mathbb I) (\ket{\psi}),\] for some composition rule $\star$ and trivial measurement $\mathbb I$, and from that an improper mixture as the operator $\omega_\text{improper}$ such that

\[ p(i) = F_i (\omega_\text{improper})\] for all measurements $F_i$.

Defining proper mixtures, on the other hand, is easy: if one can prepare the states $\ket{0}$ or $\ket{1}$ with probabilities $p$ and $1-p$, their proper mixture is the operator $\omega_\text{proper}$ such that for all measurements $F_i$

\[ p(i) = F_i(\omega_\text{proper}) = p F_i(\ket{0}) + (1-p) F_i(\ket{1}).\] That is, it is easy if one can generate true randomness that is not reducible to quantum-like randomness. I don’t think this makes sense, as one would have to consider a world where reductionism fails, or at least one where quantum-like mechanics is not the fundamental theory. Such non-reducible probabilities are uncritically assumed to exist anyway by people working on GPTs all the time2.

Now with both proper and improper mixtures properly defined, one can answer the question of whether they are equivalent: the answer is a surprising no, for any alternative probability rule that respects some basic consistency conditions. This has the intriguing consequence that if we were to modify the Born rule while keeping the rest of quantum mechanics intact, a wedge would be driven between the probabilities that come from the fundamental theory and some “external” probabilities coming from elsewhere. This would put the Many-Worlds interpretation under intolerable strain.

But such an abstract “no” result is not very interesting; I find it much more satisfactory to exhibit a concrete alternative to the Born rule where the equivalence fails. Galley and Masanes propose the function

\[ F_i(\ket{\psi}) = \operatorname{tr}(\hat F_i (\ket{\psi}\bra{\psi})^{\otimes 2})\] for some positive matrices $\hat F_i$ restricted by their consistency conditions. It is easy to see that the proper mixture of $\ket{0}$ and $\ket{1}$ described above is given by2

\[ \omega_\text{proper} = p \ket{00}\bra{00} + (1-p)\ket{11}\bra{11}.\] In quantum mechanics one would try to make it by discarding half of the state $\sqrt{p}\ket{0}\ket{0} + \sqrt{1-p}\ket{1}\ket{1}$. Here it doesn’t work, as nothing does, but I want to know what it gives us anyway. It is not easy to see that the improper mixture is given by the weirdo

\begin{multline} \omega_\text{improper} = (p^2 + \frac{p(1-p)}{3})\ket{00}\bra{00} + \\ \frac{2p(1-p)}{3} (\ket{01}+\ket{10})(\bra{01}+\bra{10}) + ((1-p)^2 + \frac{p(1-p)}{3})\ket{11}\bra{11}.\end{multline}
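One can at least sanity-check this weirdo numerically: both operators have unit trace and are positive semidefinite, yet they differ whenever $0 < p < 1$. A small sketch (the candidate measurement $\hat F$ below is just a generic positive matrix, ignoring their consistency conditions, since linearity of the trace does not depend on them):

```python
import numpy as np

p = 0.3
basis = np.eye(4)  # two-qubit computational basis |00>, |01>, |10>, |11>
P = lambda v: np.outer(v, v)  # rank-one (unnormalized) projector

omega_proper = p * P(basis[0]) + (1 - p) * P(basis[3])

omega_improper = ((p**2 + p*(1-p)/3) * P(basis[0])
                  + (2*p*(1-p)/3) * P(basis[1] + basis[2])
                  + ((1-p)**2 + p*(1-p)/3) * P(basis[3]))

for omega in (omega_proper, omega_improper):
    assert np.isclose(np.trace(omega), 1)               # unit trace
    assert np.all(np.linalg.eigvalsh(omega) > -1e-12)   # positive semidefinite

# The two operators differ for 0 < p < 1, so some quadratic-rule
# measurement can tell the proper mixture from the improper one.
assert not np.allclose(omega_proper, omega_improper)

# omega_proper does reproduce the defining convex combination for any
# candidate measurement operator F, by linearity of the trace:
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
F = A @ A.T  # a generic positive matrix (consistency conditions ignored)
lhs = np.trace(F @ omega_proper)
rhs = p * np.trace(F @ P(basis[0])) + (1 - p) * np.trace(F @ P(basis[3]))
assert np.isclose(lhs, rhs)
```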

This breaks my heart. I want to actually defend quantum teleportation, and show that Randall Munroe is wrong. Quantum teleportation is not a “particle statistics thing”.3 It is not worse than classical teleportation. Quantum teleportation is cool.

Let us start with something that everyone agrees is cool: Star Trek teleportation2. Some Scottish engineer presses a button, you dematerialize in the spaceship, and instantly rematerialize on the surface of an unexplored planet. This is completely impossible, so we have to get rid of non-essential elements to make it possible.

First of all, the “instant” part of the rematerialization. Einstein doesn’t like it, so we replace it with a suitable light-speed delay. After all, I don’t really care if I get to Mars instantly, or after a 20-minute delay. It is still much better than getting on a rocket and playing Breath of the Wild for months while being bombarded with space radiation. Second and last, the “unexplored planet” part. How the hell are you supposed to rematerialize yourself alone on an unexplored planet? You’ve been dematerialized, probably into photons3; there isn’t much you can do. Is the spaceship supposed to do the rebuilding? How? Remotely moving stuff around with atomic precision? Like some kind of long-range optical tweezers? Much easier to just build a rematerialization station on Mars. Yes, this stops you from teleporting to places “Where no man has gone before”4, but let’s be honest, do you really want to teleport somewhere without first making sure that the locals don’t find you tasty?

So with these changes, a realistic version of Star Trek teleportation becomes: Some Scottish engineer presses a button, you dematerialize in the spaceship, and after a suitable light-speed delay rematerialize on the surface of an already-explored planet. Easy, right? Sounds like just a glorified 3D printer. The Scottish engineer measures you with exquisitely high precision, sends the data via email, and the 3D printer makes a copy at the destination. And the Scottish engineer shoots you.

Wait, what? Why is the Scottish engineer shooting you? And with what right does the 3D-printed copy claim to be you on Mars? And what if the Scottish engineer does not shoot you, and the 3D-printed copy grabs a rocket back to Earth while you’re still stuck in the spaceship? What if your spouse wants to have sex with the copy? Who gets custody of the kids? What if *you* want to have sex with your copy? These vexing questions are often asked, but seldom answered.

An unexpected easy way out appeared in 1982, when Wootters and Żurek proved the no-cloning theorem. This theorem does what it says on the tin: it shows that it is not possible to clone a quantum state. Problem is, we don’t know whether the whole quantum state of a human being is needed to define their interesting properties, or whether some classical approximation is good enough. But there are people seriously speculating that the whole quantum state is in fact needed, so we’ll assume that this is the case so that we can go on with the story.5

What happens in this case? Well, now human beings are unclonable, unique. You can still be Star Trek-teleported to Mars, but now instead of being measured and shot by the Scottish engineer, they transfer your quantum state to a bunch of photons using something called a SWAP gate, reducing your body to a shapeless heap of atoms in the process. The photons fly away to the rematerialization station, where a second SWAP gate transfers the quantum state to a suitably prepared shapeless heap of atoms, which then becomes you. The crucial difference is that your original body is destroyed not because you are uncomfortable with the idea of having a copy, or because the Scottish engineer hates you, but because it is fundamentally impossible to do the teleportation otherwise. The process is more akin to moving than to copying, and captures the original Star Trek intuition of transporting people without any unwholesome killing business.

But this is not yet quantum teleportation! Why should we bother with it, then? What is missing? Well, the problem is that quantum states are rather fragile. Send your quantum state through the Martian atmosphere, and you’ll see what happens. Well, no, you won’t see, because you’ll be dead. Your quantum state will most likely decohere away, and at the rematerialization station only a classical approximation will arrive, which, as we assumed above, is not really you.

One can try to encode the quantum state in a way that is resistant to atmospheric turbulence (and people are working on it), but quantum teleportation offers a more elegant solution: we just need to supply some entangled pairs of particles to the dematerialization and rematerialization stations, and you can be teleported via email, just like before! Good old email, that can be copied, stored, and resent as often as we want. But with the added bonus that once you are dematerialized, you can only be rematerialized once, and this can only happen at the rematerialization station on Mars.

The catch, of course, is how you supply the entangled pairs to start with. Nature allows no cheating: it costs exactly one entangled pair to teleport an entangled pair. Well, one can just try sending them through the atmosphere. Since they don’t encode any information, it is not a problem if we lose half of them. If the atmosphere is so bad that almost none of them gets through, then one can just do it the hard way: save them in a quantum memory, and send them via rocket.
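The protocol underlying all this is standard quantum teleportation: a Bell measurement on your qubit and half of the shared pair, two classical bits sent ahead, and a Pauli correction at the destination. It can be checked with a small statevector simulation; a sketch, with a single qubit standing in for a person:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

# State to teleport (qubit 0) and a shared Bell pair (qubits 1 and 2).
psi = np.array([0.6, 0.8j])
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
state = np.kron(psi, bell)

# The four Bell states, in the order Phi+, Phi-, Psi+, Psi-,
# and the Pauli correction each outcome requires on qubit 2.
bell_basis = [np.array([1, 0, 0, 1]), np.array([1, 0, 0, -1]),
              np.array([0, 1, 1, 0]), np.array([0, 1, -1, 0])]
bell_basis = [b / np.sqrt(2) for b in bell_basis]
corrections = [I2, Z, X, X @ Z]

for b, U in zip(bell_basis, corrections):
    # Bell measurement on qubits 0,1 with outcome b, then correction U.
    branch = np.kron(np.outer(b, b.conj()), I2) @ state
    prob = np.vdot(branch, branch).real
    assert np.isclose(prob, 0.25)            # each outcome equally likely
    out = np.kron(np.eye(4), U) @ branch / np.sqrt(prob)
    # Qubit 2 now carries psi: the corrected state is b (x) psi
    # up to a global phase.
    target = np.kron(b, psi)
    assert np.isclose(abs(np.vdot(target, out)), 1)
```

Note that the outcome probabilities carry no information about $\psi$, which is why the two classical bits can be emailed in the open without leaking your quantum state.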

So, is quantum teleportation cool?

The first rationality assumption we need is pretty minimal: we only demand that Amir have preferences between games that are coherent in a precise sense. They must be transitive: if he would rather vote for Strache than for Kern, and would prefer to vote for Kern over Kurz, then he must choose to vote for Strache over Kurz. He must also have definite preferences about any pair of games: either he thinks that Strache is better than Kurz, or that Kurz is better than Strache, or he is indifferent between them. He is not allowed to say that they are not comparable. Note that we are not judging whether his preferences are *politically* coherent, or whether voting for Strache is at all a good idea. The axiom is then:

**Ordering**: Amir’s preferences between games, written as $G \succeq G'$, define a total order in the set of games: if $G \succeq G'$, and $G' \succeq G''$, then $G \succeq G''$. Moreover, for any two games $G$ and $G'$, either $G \succ G'$, or $G \sim G'$, or $G \prec G'$.

This means that $\succeq$ behaves like the usual $\ge$ relation between real numbers6.

The second and last rationality assumption we shall use is rather stronger, but I think still pretty well-justified. We demand that Amir’s preferences between games must remain consistent while he plays: if he prefers game $G$ to game $G’$, he cannot change his mind if $G$ and $G’$ are offered as rewards inside another game:

**Consistency**: Let $\alpha \neq 0$, and consider the games \[\ket{F} = \alpha\ket{M_0}\ket{G} + \beta\ket{M_1}\ket{z}\] and \[\ket{F'} = \alpha\ket{M_0}\ket{G'} + \beta\ket{M_1}\ket{z},\] that differ only in the game given as a reward when the measurement result is $M_0$. Then $F \succeq F'$ iff $G \succeq G'$.

It is easy to check that **Consistency** actually implies all three relations: $F \succ F'$ iff $G \succ G'$, $F \sim F'$ iff $G \sim G'$, and $F \prec F'$ iff $G \prec G'$.

These assumptions, together with **Indifference** and **Substitution**, are enough to imply the

**Born rule**: Suppose you are rational, and consider the games

\[\ket{G} = \sum_i \alpha_i\ket{M_i}\ket{z_i}\quad\text{and}\quad\ket{G'} = \sum_i \beta_i\ket{D_i}\ket{w_i}.\] Then there exists a function $u$ such that

\[ u(G) = \sum_i |\alpha_i|^2 u(z_i)\] and \[G \succ G' \iff u(G) > u(G').\] Moreover, $u$ is unique modulo the choice of a zero and a unity.

This theorem says that you are free to decide your preferences between the rewards: these will define their utility. Your freedom ends here, however: the probabilities that you assign to obtaining said rewards must be given by the Born rule, on pain of irrationality.

A comment is also in order about the uniqueness: the choice of a zero and a unity is analogous to the one that must be made for a temperature scale. In the Celsius scale, for example, the zero is chosen as the freezing point of water, and the unity as $1/100$ the difference between the freezing point and the boiling point. In the Fahrenheit scale, the zero is chosen as the coldest temperature in Gdańsk’s winter, and the unity as $1/96$ the difference between the temperature of Gdańsk’s winter and the temperature of the blood of a healthy male. In any case, the choice of these two values defines the temperature scale uniquely, and the same is true for utility, as implied by the following theorem:

**Uniqueness**: If $u$ is a utility, then $\mathcal F(u)$ is a utility if and only if $\mathcal F(u) = au+b$ for some real numbers $a,b$ such that $a>0$.

The proof of the ‘if’ direction is easy: just note that \[\mathcal F(u(G)) = a\sum_i |\alpha_i|^2 u(z_i) + b = au(G)+b,\] and that such positive affine transformations preserve the ordering of real numbers. The proof of the ‘only if’ direction is not particularly hard, but it is a bit longer and I shall skip it2. Since the choice of a value for the utility at two rewards $x$ and $y$ is enough to fix $a$ and $b$, the claim follows.
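The temperature analogy can be made completely concrete: fixing the value of the scale at two points determines the affine map, and hence the whole scale, uniquely. A small sketch using the standard Celsius-to-Fahrenheit calibration (the numbers are the usual ones for water, not anything from the theorem):

```python
import numpy as np

# Two scales related by u' = a*u + b with a > 0: fixing u' at two
# points determines a and b uniquely. Here: 0 C = 32 F, 100 C = 212 F.
u = np.array([0.0, 100.0])         # calibration points in the first scale
u_prime = np.array([32.0, 212.0])  # their assigned values in the second

a, b = np.linalg.solve(np.column_stack([u, np.ones(2)]), u_prime)
assert np.isclose(a, 9/5) and np.isclose(b, 32)

# Positive affine maps preserve convex (Born-rule) combinations,
# which is why the transformed u is still a utility:
p, ux, uy = 0.3, 50.0, 20.0
assert np.isclose(a * (p*ux + (1-p)*uy) + b, p*(a*ux + b) + (1-p)*(a*uy + b))
```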

But enough foreplay, now we need to start proving the Born rule theorem in earnest. We’ll build it out of two lemmas: **Slider**, that says that the weights of a game with rewards $x$ and $y$ behave like a tuner for the preferences, and **Closure**, that says that as we move this slider we are bound to hit any reward between $x$ and $y$.

**Slider**: Let $x$ and $y$ be rewards such that $x \succ y$, and consider the games

\[\ket{G} = \sqrt{p}\ket{M_0}\ket{x} + \sqrt{1-p}\ket{M_1}\ket{y}\] and

\[\ket{G'} = \sqrt{q}\ket{M_0}\ket{x} + \sqrt{1-q}\ket{M_1}\ket{y}.\] Then $G \succ G'$ iff $p > q$.

Proof: suppose $p > q$. Then we can define the games

\[ \ket{F} = \sqrt{q}\ket{M_0}\ket{x} + \sqrt{p-q}\ket{M_1}\ket{x} + \sqrt{1-p}\ket{M_2}\ket{y}\] and

\[ \ket{F'} = \sqrt{q}\ket{M_0}\ket{x} + \sqrt{p-q}\ket{M_1}\ket{y} + \sqrt{1-p}\ket{M_2}\ket{y}.\]

Note that the weights of rewards $x$ and $y$ in the game $F$ are $p$ and $1-p$, and in the game $F'$ they are $q$ and $1-q$, so by **Equivalence** we have that $F \sim G$ and $F' \sim G'$. Since **Consistency** implies that $F \succ F'$, transitivity gives us $G \succ G'$. To prove the other direction, note that $p = q$ implies directly that $G \sim G'$, and $p < q$ implies $G \prec G'$ by the flipped argument.

**Closure**: Let $x,y$, and $z$ be rewards such that $x \succ y$ and $x \succeq z \succeq y$, and let

\[\ket{G_p} = \sqrt{p}\ket{M_0}\ket{x} + \sqrt{1-p}\ket{M_1}\ket{y}.\]

Then there exists a unique $p_z$ such that\[z \sim \ket{G_{p_z}}.\]

Proof: since $\succeq$ is a total order, for all $p$ it must be the case that either \[z\succ \ket{G_p}, \quad z \sim \ket{G_p},\quad\text{or}\quad z \prec \ket{G_p}.\] Moreover, **Slider** tells us that there exists a critical $p_z$ such that

\begin{align*}

p > p_z \quad &\Rightarrow \quad \ket{G_p} \succ z \\

p < p_z \quad &\Rightarrow \quad \ket{G_p} \prec z
\end{align*}
A continuity argument then shows that $z \sim \ket{G_{p_z}}$.

Now for the main proof: Let $x$ and $y$ be fixed rewards such that $x \succ y$. Set $u(x)$ and $u(y)$ to be any real numbers such that $u(x) > u(y)$, defining the unity and the zero of the utility function3. Now because of **Closure** for every reward $z$ such that $x \succeq z \succeq y$ there will be a unique number $p_z$ such that

\[ z \sim \sqrt{p_z}\ket{M_0}\ket{x} + \sqrt{1-p_z}\ket{M_1}\ket{y}.\] Define then

\[ u(z) = p_z u(x) + (1-p_z) u(y). \] We want to show that the utilities so defined do represent the preferences between any two rewards $z$ and $w$ in the sense that $z \succ w$ iff $u(z) > u(w)$. Suppose that $u(z) > u(w)$. This is the case iff $p_z > p_w$, which by **Slider** is equivalent to \[\sqrt{p_z}\ket{M_0}\ket{x} +\sqrt{1-p_z}\ket{M_1}\ket{y} \succ \sqrt{p_w}\ket{M_0}\ket{x} + \sqrt{1-p_w}\ket{M_1}\ket{y},\] which is equivalent to $z \succ w$.

Now we want to show that for any game \[\ket{G} = \sqrt{q}\ket{M_0}\ket{z} + \sqrt{1-q}\ket{M_1}\ket{w}\] its utility is given by \[ u(G) = q u(z) + (1-q) u(w),\] as advertised. By **Consistency**, we can replace $z$ and $w$ in $G$ by their equivalent games, and we have that

\begin{multline}

\ket{G} \sim \sqrt{q p_z}\ket{M_0}\ket{M_0}\ket{x} + \sqrt{q(1-p_z)}\ket{M_0}\ket{M_1}\ket{y} + \\ \sqrt{(1-q)p_w}\ket{M_1}\ket{M_0}\ket{x} +\sqrt{(1-q)(1-p_w)}\ket{M_1}\ket{M_1}\ket{y}. \end{multline} By **Equivalence**,

\[\ket{G} \sim \sqrt{q p_z + (1-q)p_w}\ket{M_0}\ket{x} + \sqrt{q(1-p_z)+(1-q)(1-p_w)}\ket{M_1}\ket{y},\]

and since $x \succeq G \succeq y$, its utility is given by the above formula, so

\begin{align*}

u(G) &= (q p_z + (1-q)p_w)u(x) + (q(1-p_z)+(1-q)(1-p_w))u(y)\\

&= q u(z) + (1-q) u(w),

\end{align*} as we wanted to show.
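For the skeptical, the final algebraic step can be double-checked numerically with arbitrary weights and utilities; a small sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
q, pz, pw = rng.random(3)   # arbitrary weights in (0, 1)
ux, uy = 3.0, -1.0          # arbitrary values for u(x) and u(y)

uz = pz*ux + (1 - pz)*uy    # utility of reward z, as defined via Closure
uw = pw*ux + (1 - pw)*uy    # utility of reward w

# Utility of G read off from the merged two-outcome game ...
uG_merged = (q*pz + (1 - q)*pw)*ux + (q*(1 - pz) + (1 - q)*(1 - pw))*uy
# ... equals the advertised q*u(z) + (1-q)*u(w).
assert np.isclose(uG_merged, q*uz + (1 - q)*uw)
```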

With this argument we have proved the Born rule theorem for any game inside the interval $[y,x] = \{z: x\succeq z \succeq y\}$. This would be enough if we were to assume that the set of rewards was something so lame as a bounded interval, but since we want to deal with more interesting reward sets – like $\mathbb R$ – we cannot stop now. It is fortunately not hard to complete the proof: consider a sequence of intervals $[y_i,x_i]$ such that all of them contain $[y,x]$ and their union equals the set of rewards. By the above proof, in each such interval there exists a utility function $f_i$ that satisfies the requirements. We want to show that these functions agree with each other, and as such define a unique utility over the whole set of rewards. For that, consider a reward $z$ in $[y_i,x_i]\cap [y_j,x_j]$ for some $i,j$. Then it must be the case that either $x\succeq z \succeq y$, or $x\succ y \succ z$, or $z \succ x \succ y$. By **Closure**, there exist unique $p_z,p_y$, and $p_x$ such that

\begin{align*}

z &\sim \sqrt{p_z}\ket{M_0}\ket{x} + \sqrt{1-p_z}\ket{M_1}\ket{y}, \\

y &\sim \sqrt{p_y}\ket{M_0}\ket{x} + \sqrt{1-p_y}\ket{M_1}\ket{z}, \\

x &\sim \sqrt{p_x}\ket{M_0}\ket{z} + \sqrt{1-p_x}\ket{M_1}\ket{y}. \\

\end{align*}Since $f_i$ and $f_j$ are utilities over this interval, we must have that for $k=i,j$

\begin{align*}

f_k(z) &= p_zf_k(x) + (1-p_z)f_k(y), \\

f_k(y) &= p_yf_k(x) + (1-p_y)f_k(z), \\

f_k(x) &= p_xf_k(z) + (1-p_x)f_k(y). \\

\end{align*}Now, we use our freedom to set the zero and the unity of the utilities to choose $f_k(y) = u(y)$ and $f_k(x) = u(x)$, taking these equations to

\begin{align*}

f_k(z) &= p_zu(x) + (1-p_z)u(y), \\

u(y) &= p_yu(x) + (1-p_y)f_k(z), \\

u(x) &= p_xf_k(z) + (1-p_x)u(y), \\

\end{align*}which uniquely define $f_k(z)$ in all three situations, implying that $f_i(z)=f_j(z)$. Setting $u(z)$ to be this common value, we have defined a unique utility function over the whole set of rewards, and we’re done.
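As a final sanity check, each of the three equations is linear in $f_k(z)$ with a nonzero coefficient (the orderings are strict, so $p_y < 1$ and $p_x > 0$), and therefore each fixes $f_k(z)$ uniquely once $u(x)$ and $u(y)$ are chosen. A small numerical sketch with arbitrary values:

```python
import numpy as np

ux, uy = 3.0, -1.0          # the chosen unity and zero of the utility
pz, py, px = 0.4, 0.7, 0.6  # arbitrary weights delivered by Closure

# The unique solution of each equation for f_k(z):
fz_case1 = pz*ux + (1 - pz)*uy       # case x >= z >= y
fz_case2 = (uy - py*ux) / (1 - py)   # case x > y > z
fz_case3 = (ux - (1 - px)*uy) / px   # case z > x > y

# Each value solves its defining equation, so f_i(z) and f_j(z),
# being subject to the same equation, must coincide.
assert np.isclose(uy, py*ux + (1 - py)*fz_case2)
assert np.isclose(ux, px*fz_case3 + (1 - px)*uy)
```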