I was interested in his most recent derivation, which, besides structural assumptions about measurements and probabilities, needs two substantial assumptions: no-signalling and the possibility of state estimation, or *state estimation* for brevity. No-signalling is well-motivated and well-understood, but I was curious about *state estimation*. What does it mean? What does a theory that violates it look like?

The precise definition is that *state estimation* is true if there is a finite set of measurement outcomes whose probabilities completely determine the quantum state. Or conversely, if *state estimation* fails, then for any finite set of measurement outcomes there are two different quantum states that give the same probabilities for all these outcomes. This is clearly not obeyed by quantum mechanics in the case of infinite-dimensional systems (you need to know the probability at each point in space to completely determine the wavefunction, which is an infinite set of outcomes), so the authors require it only for finite-dimensional systems.

How bad is it to violate it for finite-dimensional systems, then? What can you learn about the quantum state with a reasonably small number of measurement outcomes? A good approximation, or would you have little idea about what the quantum state is? It seems that the former is the case. To illustrate that, we came up with a rather artificial theory where the measurements allow you to deterministically read off bits from some representation of the quantum state; for the case of a qubit $\ket{\psi}=\cos\theta\ket{0}+e^{i\varphi}\sin\theta\ket{1}$ a measurement would tell you the $n$th bit of $\theta$ or $\varphi$. It is clear that this theory violates *state estimation*: for any finite set of measurements there will be a largest $n$ that they can reach, and therefore any pair of quantum states that differ only on bits higher than $n$ will be indistinguishable for this set of measurements. It is also clear that this violation is harmless: with only $2n$ measurements we can get an $n$-bit approximation for any qubit, which is much better than what can be done in reality! In reality we need about $2^n$ measurements to estimate the probabilities, and therefore the amplitudes, with such an accuracy.
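The counting argument can be played with in a few lines of Python. This is only a sketch under an assumed encoding: the angle is rescaled to a number in $[0,1)$ and a measurement returns one digit of its binary expansion. The helper names `read_bit` and `estimate` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

def read_bit(x, k):
    """Hypothetical deterministic measurement: the k-th binary digit of x in [0, 1)."""
    return int(x * 2**k) % 2

def estimate(x, n):
    """Rebuild x to n bits from n single-bit measurements."""
    return sum(Fraction(read_bit(x, k), 2**k) for k in range(1, n + 1))

theta = Fraction(5, 7)   # stands in for theta rescaled to [0, 1) (assumed encoding)
n = 10
approx = estimate(theta, n)
error = abs(theta - approx)

# A different state that agrees with theta on the first n bits:
other = approx + Fraction(1, 2**(n + 3))
same_bits = all(read_bit(theta, k) == read_bit(other, k) for k in range(1, n + 1))

print(float(error), same_bits)
```

Here `n` measurements pin the angle down to within $2^{-n}$, while `other` shows a distinct state that no measurement up to bit `n` can tell apart from `theta`.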

This already tells us that *state estimation* is too strong; it needs at least to be qualified somehow in order to exclude the deterministic theory above. What does it mean in probabilistic theories, though? An often considered toy theory is one where the structure of quantum mechanics is kept as it is, but the exponent in the Born rule is changed from $2$ to some $n$. More precisely, let the probability of obtaining outcome $i$ when measuring the state $\psi$ in the orthogonal basis $\{\ket{e_i}\}$ be \[ p(i|\psi) = \frac{|\langle e_i|\psi\rangle|^n}{\sum_{i'}|\langle e_{i'}|\psi\rangle|^n}. \] An interesting feature of this theory is that a finite set of measurement outcomes can distinguish all pure states (in fact the same measurements that distinguish them in quantum theory), so *state estimation* can only fail here for mixed states.

A nice example is the pair of ensembles

\[\omega_A = \{(p,\ket{0}),(1-p,\ket{1})\}\] and \[\omega_B = \{(1/2,p^\frac1n\ket{0}+(1-p)^\frac1n\ket{1}),(1/2,p^\frac1n\ket{0}-(1-p)^\frac1n\ket{1})\}.\] In quantum mechanics ($n=2$) they are equivalent, both being represented by the density matrix

\[ \rho = \begin{pmatrix} p & 0 \\ 0 & 1-p \end{pmatrix}. \] If $n\neq 2$, though, they are not equivalent anymore, even though they give the same probabilities for any measurements in the X, Y, and Z bases. To distinguish them we just need to measure the ensembles in the basis \[\{p^\frac1n\ket{0}+(1-p)^\frac1n\ket{1},(1-p)^\frac1n\ket{0}-p^\frac1n\ket{1}\}.\] The probability of obtaining the first outcome for ensemble $\omega_A$ is $p^2 +(1-p)^2$, and for ensemble $\omega_B$ it is some complicated expression that depends on $n$.
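To see the numbers, here is a small NumPy sketch of the modified rule applied to the two ensembles. The function names (`prob`, `ensemble_prob`, `ensembles`) and the sample values $p=0.3$, $n=4$ are my own choices for illustration; the only physics input is the formula for $p(i|\psi)$ above, averaged over the ensemble.

```python
import numpy as np

def prob(state, basis, n):
    """Outcome probabilities under the modified rule with exponent n.

    `basis` holds orthogonal (not necessarily normalised) vectors as rows.
    """
    basis = basis / np.linalg.norm(basis, axis=1, keepdims=True)
    weights = np.abs(basis.conj() @ state) ** n
    return weights / weights.sum()

def ensemble_prob(ensemble, basis, n):
    """Average the outcome probabilities over a list of (weight, state) pairs."""
    return sum(w * prob(s, basis, n) for w, s in ensemble)

def ensembles(p, n):
    """Build omega_A, omega_B and the distinguishing basis from the text."""
    a, b = p ** (1 / n), (1 - p) ** (1 / n)
    omega_A = [(p, np.array([1.0, 0.0])), (1 - p, np.array([0.0, 1.0]))]
    omega_B = [(0.5, np.array([a, b])), (0.5, np.array([a, -b]))]
    basis = np.array([[a, b], [b, -a]])
    return omega_A, omega_B, basis

for n in (2, 4):
    omega_A, omega_B, basis = ensembles(0.3, n)
    print(n, ensemble_prob(omega_A, basis, n)[0], ensemble_prob(omega_B, basis, n)[0])
```

For $n=2$ the two printed probabilities coincide, as they must for equivalent ensembles; for $n=4$ they differ, while the value for $\omega_A$ stays at $p^2+(1-p)^2$.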

Now this is by no means a proof, but it makes me suspect that it will be rather easy to distinguish any two ensembles that are not equivalent, by making a measurement that contains one of the pure states that were mixed in to make the ensemble. Then if we divide the Bloch sphere into a number of regions, assigning a measurement to cover each such region, we can do state estimation to a good enough approximation. Unlike the deterministic theory explored above, in this toy theory it is clearly more laborious to do state estimation than in quantum mechanics, but it is still firmly within the realm of possibility.

What now, then? If the possibility of state estimation is not a good assumption from which to derive the Born rule, is there a derivation in this operational framework that follows from better assumptions? It turns out that Galley himself has such a derivation, based only on similar structural assumptions together with no-signalling and purification, with no need for *state estimation*. But rather ironically, here the roles flip: while I find purification an excellent axiom to use, Galley is not a fan.

Let me elaborate. Purification is the assumption that every mixed state (like the ensembles above) is obtained by ignoring part of a pure state. It implies then that there are no “external” probabilities in the theory; if you want to flip a coin in order to mix two pure states, you had better model that coin inside the theory, and as a pure state. Now Galley doesn’t find purification so nice: for one thing, because classical theories fail purification, and also because it feels like postulating that your theory is universal, which is a big step to take, in particular when the theory in question is quantum mechanics.

Well, I find that classical theories failing purification is just one more example in a huge pile of examples of how classical theories are wrong. In this particular case they are wrong by being essentially deterministic, and only allowing for probabilities when they are put there by hand. About postulating the universality of the theory, indeed that is a big assumption, but so what? I don’t think good assumptions need to be self-evidently true, I just think they should be well-motivated and physically meaningful.

*Addendum*: A natural question to ask is whether both no-signalling and purification are necessary in such a derivation. It turns out that the answer is yes: the toy theory where the exponent in the Born rule is $n$ respects purification, when extended in the obvious way for composite systems, but violates no-signalling, and Galley’s rule respects no-signalling but violates purification.

So, we are interested in the total amount of CO$_2$ that a vehicle emits over its lifetime. It is the emissions caused by producing it in the first place, $P$, plus the amount of CO$_2$ it emits per km $\eta$ times the distance $L$ it travels over its lifetime: $P + \eta L$. To get a number that is easier to relate to, we divide everything by $L$ and get the effective emissions per km $\frac{P}{L}+\eta$. We want to compare a diesel vehicle with an electric vehicle, so we want to know whether

\[\frac{P_E}{L}+\eta_E\quad\text{or}\quad\frac{P_D}{L}+\eta_D\] is bigger.

Assume that $P_E > P_D$, because of the extra emissions needed to produce the battery of the electric vehicle, and that $\eta_E < \eta_D$, as it is much more efficient to extract energy from oil in a big power plant than in an internal combustion engine.

Now, what could you do to make the electric vehicles look bad? Well, since their production causes more emissions, you want to emphasise that in the equation, and since they emit less CO$_2$ when running, you want to downplay that. How? We have three variables, so we have three ways of manipulating the numbers: we can multiply $P_E$ and $P_D$ by some large number $n_P$ (e.g. by assuming that the factories producing the cars are powered purely by oil shale), we can divide $\eta_E$ and $\eta_D$ by some large number $n_\eta$ (e.g. by assuming the cars are always run at maximal efficiency), and we can divide $L$ by some large number $n_L$ (assuming that the car is scrapped after a few kilometres).

What is the effect of doing that? If the real numbers say that electric vehicles are better, that is, that

\[\frac{P_E}{L}+\eta_E < \frac{P_D}{L}+\eta_D,\]which is equivalent to

\[ \frac{P_E-P_D}{L(\eta_D-\eta_E)} < 1,\]then the manipulations of the previous paragraph amount to multiplying the left hand side of this inequality by $n_Pn_Ln_\eta$; if we want to flip it we just need to make $n_P,n_L,$ and $n_\eta$ large enough so that \[ n_Pn_Ln_\eta\frac{P_E-P_D}{L(\eta_D-\eta_E)} > 1.\]

And what did the authors of the study do? All of the above. Most interestingly, they used the NEDC driving cycle to calculate $\eta_D$ and $\eta_E$, a ridiculously efficient driving cycle that has been discarded in favour of the less unrealistic WLTC. They did this because WLTC numbers hadn’t yet been released for the Tesla Model 3, the electric car they used for the comparison. They claim that this is not a problem, because NEDC favours city driving, where electric cars excel, so if anything this incorrect assumption would be tilting the scales in favour of electric cars. As we have seen, though, this is not the case: pretending that the cars are more efficient than they are tilts the scales in favour of the diesels.

Another mistake the authors made is to assume that cars only last 10 years or 150,000 km before going to the junkyard, which is about half of the actual number. Again this tilts the scales in favour of the diesels, as the production of electric cars causes more emissions. The reason they made this mistake is that they assumed that the battery of an electric car would only last this long, which is false for two reasons: first because a Tesla battery retains more than 90% of its capacity after 250,000 km, hardly junkyard material, and second because batteries that have in fact degraded too much to be useful in a car, say retaining only 70% of their capacity, do not go to the junkyard, but instead are reused for applications where the energy/weight ratio doesn’t matter, like grid storage.

The third mistake the authors made is exaggerating the emissions caused by production, using a discredited study that claimed that producing the lithium-ion battery causes 145 kg CO$_2$/kWh. The peer-reviewed number I could find is 97 kg CO$_2$/kWh for production in China. Even that seems too high, though, as Tesla’s batteries are produced in the Gigafactory 1 in Nevada, which has a cleaner energy mix, and should eventually be powered fully by rooftop solar. One thing that might look like a mistake but isn’t is that the authors don’t consider the emissions caused by producing the components that are common to both electrics and diesels: wheels, body, seats, etc. Of course, ignoring that means that the number you get is not effective emissions per km, but it doesn’t change which car is the best, as that depends only on the difference $P_E-P_D$.

With the theory done, let’s get to the numbers. The authors use $P_E = 10,875,000$ gCO$_2$, $\eta_E = 83$ gCO$_2$/km, $P_D = 0$ gCO$_2$, $\eta_D = 144$ gCO$_2$/km, and $L=150,000$ km, which results in the effective emissions

\[ E = 155\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 144\text{ gCO}_2/\text{km},\]their absurd conclusion that electric vehicles emit more CO$_2$. Now what I find amazing is that this conclusion requires all three mistakes working together; correct any of the three and it flips.

First we correct $\eta_E$ and $\eta_D$ using the WLTC numbers (which are still too optimistic, but are the best I’ve got), which are already available for both the Model 3 (16 kWh/100 km) and the Mercedes C 220 d (5.1 l/100 km), resulting in $\eta_E = 88$ gCO$_2$/km and $\eta_D = 163$ gCO$_2$/km, and the effective emissions

\[ E = 160\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 163\text{ gCO}_2/\text{km}.\] Next we keep the wrong $\eta_E$ and $\eta_D$ and just correct $L$, setting it to $250,000$ km, resulting in the effective emissions

\[ E = 126\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 144\text{ gCO}_2/\text{km}.\] Next we keep the wrong $\eta_E, \eta_D$, and $L$, correcting only the emissions caused by the production of the battery. Putting 97 kg CO$_2$/kWh results in $P_E = 7,275,000$ gCO$_2$ and effective emissions

\[ E = 131\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 144\text{ gCO}_2/\text{km}.\] To finalize, let’s calculate the true numbers, correcting all three mistakes at once and also taking into account the emissions caused by producing the parts common in both vehicles. I couldn’t find a good number for that, just some estimates that put it around 20 tons of CO$_2$. Using this results in $P_E = 27,275,000$ gCO$_2$, $\eta_E = 88$ gCO$_2$/km, $P_D = 20,000,000$ gCO$_2$, $\eta_D = 163$ gCO$_2$/km, and $L=250,000$ km, and effective emissions \[ E = 197\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 243\text{ gCO}_2/\text{km}.\]
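For anyone who wants to redo the arithmetic, here is a short Python sketch that evaluates $P/L+\eta$ for each scenario. The function name `effective` and the scenario labels are mine; every number is copied from the text above.

```python
def effective(P, eta, L):
    """Effective emissions in gCO2/km: production spread over the lifetime, plus running."""
    return P / L + eta

# (P_E, eta_E, P_D, eta_D, L) for each scenario, in gCO2, gCO2/km and km
scenarios = {
    "study as published":    (10_875_000, 83, 0, 144, 150_000),
    "WLTC consumption only": (10_875_000, 88, 0, 163, 150_000),
    "lifetime only":         (10_875_000, 83, 0, 144, 250_000),
    "battery figure only":   (7_275_000, 83, 0, 144, 150_000),
    "all three + common":    (27_275_000, 88, 20_000_000, 163, 250_000),
}

results = {}
for name, (P_E, eta_E, P_D, eta_D, L) in scenarios.items():
    E, D = effective(P_E, eta_E, L), effective(P_D, eta_D, L)
    results[name] = (E, D)
    print(f"{name:24s} E = {E:6.1f}  D = {D:6.1f}  electric better: {E < D}")
```

The figures quoted in the text are these values with the fractional part dropped. Only the first scenario has the electric car ahead in emissions; correcting any single input is enough to flip the comparison.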

It doesn’t look very impressive, though. Only 19% less emissions? Is all the trouble worth it? The point is that none of the emissions of electric vehicles are necessary: as the grid cleans up both their production and operation will be CO$_2$-free. Diesels, though, will always burn diesel, so at best they will cause only the tailpipe emissions, and the ultimate numbers will be \[ E = 0\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 135\text{ gCO}_2/\text{km}.\] There is no need to wait, though: electric vehicles are better for the environment than diesels. Not in the future, not depending on magical technologies, not in Norway, but right here, and right now. And this is only about CO$_2$ emissions; electric vehicles also have the undeniable benefit of not poisoning the atmosphere in densely populated cities.

Which is a really bizarre argument. Yes, frequentism is nonsense, and yes, subjective probability makes perfect sense. But that’s all that is true about it. No, objective probability is not the same thing as frequentism, and no, subjective probability is not the only probability that exists. Come on, that’s denying the premise! The question is interesting precisely because we strongly believe that objective probability exists; either because of quantum mechanics, or more directly from the observation of radioactive decay. Does anybody seriously believe that whether some atom decays or not depends on the opinion of an agent? There even existed natural nuclear reactors, where chain reactions occurred long before any agent existed to wonder about them.

In any case, it seems that philosophers won’t do anything about it. What can we say about objective probability, though? It is easy to come up with some desiderata: it should be *objective*, to start with. The probability of some radioactive atom decaying should just be a property of the atom, not a property of some agent betting about it. Agents and bets are still important, though, as it should make sense to bet according to the objective probabilities. In other words, Lewis’ Principal Principle should hold: rational agents should set their subjective probabilities to be equal to the objective probabilities, if the latter are known. Last but not least, objective probabilities should be connected to relative frequencies via the law of large numbers, that is, we need that

\[ \text{Pr}(|f_N-p|\ge\varepsilon) \le 2e^{-2N\varepsilon^2}, \] or, in words, the (multi-trial) probability that the frequency deviates more than $\varepsilon$ from the (single-trial) probability after $N$ trials goes down exponentially with $\varepsilon$ and $N$.

I think it is also easy to come up with a definition of objective probability that fulfills these desiderata, if we model objectively random processes as *deterministic* branching processes. Let’s say we are interested in the decay of an atom. Instead of saying that it either decays or not, we say that the world *branches* in several new worlds, in some of which the atom decays, and some of which it does not. Moreover, we say that we can somehow count the worlds, that is, that we can attribute a measure $\mu(E)$ to the set of worlds where event $E$ happens and a measure $\mu(\neg E)$ to the set of worlds where event $\neg E$ happens. Then we say that the objective probability of $E$ is

\[p(E) = \frac{\mu(E)}{\mu(E)+\mu(\neg E)}.\] Now, before you shut off saying that this is nonsense, because the Many-Worlds interpretation is false, so we shouldn’t consider branching, let me introduce a toy theory where this deterministic branching is *literally true* by fiat. In this way we can separate the question of whether the Many-Worlds interpretation is true from the question of whether deterministic branching explains objective probability.

This toy theory was introduced by Adrian Kent to argue that probability makes no sense in the Many-Worlds interpretation. Well, I think it is a great illustration of how probability actually makes perfect sense. It goes like this: the universe is a deterministic computer simulation where some agents live. In this universe there is a wall with two lamps, and below each a display that shows a non-negative integer. This wall also has a “play” button, that when pressed makes either of the lamps light up.

The agents there can’t really predict which lamp will light up, but they have learned two things about how the wall works. The first is that if the number below a lamp is zero, that lamp never lights up. The second is that if the numbers are set to $n_L$ and $n_R$, respectively, and they press “play” multiple times, the fraction of times where the left lamp lights up is often close to $n_L/(n_L+n_R)$.

What is going on, of course, is that when “play” is pressed the whole computer simulation is deleted and $n_L+n_R$ new ones are initiated, $n_L$ with the left lamp lit, and $n_R$ with the right lamp lit. My proposal is to define the objective probability of some event as the proportion of simulations where this event happens, as this quantity fulfills all our desiderata for objective probability.

This clearly fulfills the “objectivity” desideratum, as a proportion of simulations is a property of the world, not some agent’s opinion. It also respects the “law of large numbers” desideratum. To see that, first notice that for a single trial the proportion of simulations where the left lamp lights up is

\[p(L) = \frac{n_L}{n_L+n_R}.\] Now the number of simulations where the left lamp lights up $k$ times out of $N$ trials is

\[ {N \choose k}n_L^kn_R^{N-k},\] so if we divide by the total number of simulations $(n_L+n_R)^N$, we see that the proportion of simulations where the left lamp lit $k$ times out of $N$ is given by \[\text{Pr}(N,k) = {N \choose k}p(L)^k(1-p(L))^{N-k}.\] Since this is formally identical to the binomial distribution, it allows us to prove a theorem formally identical to the law of large numbers:

\[ \text{Pr}(|k/N-p(L)|\ge\varepsilon) \le 2e^{-2N\varepsilon^2}, \]which says that the (multi-trial) proportion of simulations where the frequency deviates more than $\varepsilon$ from the (single-trial) proportion of simulations after $N$ trials goes down exponentially with $\varepsilon$ and $N$.
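The counting can be checked exactly with integer arithmetic. The sketch below counts, among all $(n_L+n_R)^N$ simulations, those whose frequency of “left” deviates from $p(L)$ by at least $\varepsilon$, and compares the proportion with the bound above. The function name and the sample values ($n_L=1$, $n_R=3$, $N=200$, $\varepsilon=1/10$) are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb, exp

def proportion_deviating(n_L, n_R, N, eps):
    """Exact proportion of simulations where |k/N - p(L)| >= eps after N trials."""
    p = Fraction(n_L, n_L + n_R)
    total = (n_L + n_R) ** N
    # comb(N, k) * n_L**k * n_R**(N-k) simulations have the left lamp lit k times
    bad = sum(comb(N, k) * n_L**k * n_R**(N - k)
              for k in range(N + 1)
              if abs(Fraction(k, N) - p) >= eps)
    return Fraction(bad, total)

n_L, n_R, N, eps = 1, 3, 200, Fraction(1, 10)
exact = proportion_deviating(n_L, n_R, N, eps)
bound = 2 * exp(-2 * N * float(eps) ** 2)
print(float(exact), "<=", bound)
```

No randomness enters anywhere: the quantity being bounded is a ratio of two integers that count simulations.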

Last but not least, to see that it fulfills the “Principal Principle” desideratum, we need to use the decision-theoretic definition of subjective probability: the subjective probability $s(L)$ of an event $L$ is the highest price a rational agent should pay to play a game where they receive $1$€ if event $L$ happens and nothing otherwise. In the $n_L$ simulations where the left lamp lit the agent ends up with $(1-s(L))$ euros, and in the $n_R$ simulations where the right lamp lit the agent ends up with $-s(L)$ euros. If the agent cares equally about all their future selves, they should accept to pay $s(L)$ as long as \[(1-s(L))n_L-s(L)n_R \ge 0,\]which translates to \[s(L) \le \frac{n_L}{n_L+n_R},\] so indeed the agent should bet according to the objective probability if they know $n_L$ and $n_R$.
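As a sanity check of the betting argument, this sketch sums the agent's winnings over all successor simulations for a few prices. The function name `total_payoff` and the values $n_L=3$, $n_R=5$ are illustrative assumptions.

```python
from fractions import Fraction

def total_payoff(s, n_L, n_R):
    """Winnings in euros, summed over all successor simulations, at price s."""
    return (1 - s) * n_L - s * n_R

n_L, n_R = 3, 5
p = Fraction(n_L, n_L + n_R)        # proportion of simulations with the left lamp lit
step = Fraction(1, 100)

for s in (p - step, p, p + step):
    print(s, total_payoff(s, n_L, n_R))
```

The total is positive below $n_L/(n_L+n_R)$, zero exactly at it, and negative above it, so that proportion is the highest acceptable price.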

And this is it. Since it fulfills all our desiderata, I claim that deterministic branching does explain objective probability. Furthermore, it is the only coherent explanation I know of. It is hard to argue that nobody will ever come up with a single-world notion of objective probability that makes sense, but at least in one point such a notion will always be unsatisfactory: why would something be in principle impossible to predict? Current answers are limited to saying that quantum mechanics says so, or that if we could predict the result of a measurement we would run into trouble with Bell’s theorem. But that’s not really an explanation, it’s just saying that there is no alternative. Deterministic branching theories do offer an explanation, though: you cannot predict which outcome will happen because all will.

Now the interesting question is whether this argument applies to the actual Many-Worlds interpretation, and we can get a coherent definition of objective probability there. The short answer is that it’s complicated. The long answer is the paper I wrote about it =)

I’m happy with how the article turned out (no bullshit, conveys complex concepts in understandable language, quotes me ;), but there is a point about it that I’d like to nitpick: Ball writes that it was not “immediately obvious” whether the probabilities should be given by $\psi$ or $\psi^2$. Well, it might not have been immediately obvious to Born, but this is just because he was not familiar with Schrödinger’s theory. Schrödinger, on the other hand, was very familiar with his own theory, and in the very paper where he introduced the Schrödinger equation he discussed at length the meaning of the quantity $|\psi|^2$. He got it wrong, but my point here is that he *knew* that $|\psi|^2$ was the right quantity to look at. It was obvious to him because the Schrödinger evolution is unitary, and absolute values squared behave well under unitary evolution.

Born’s contribution was, therefore, not mathematical, but conceptual. What he introduced was not the $|\psi|^2$ formula, but the idea that this is a probability. And the difficulty we have with the Born rule to this day is conceptual, not mathematical. Nobody doubts that the probability must be given by $|\psi|^2$, but people are still puzzled by these high-level, ill-defined concepts of probability and measurement in an otherwise reductionist theory. And I think one cannot hope to understand the Born rule without understanding what probability is.

Which is why I don’t think the papers of Masanes et al. and Cabello can *explain* the Born rule. They refuse to tackle the conceptual difficulties, and focus on the mathematical ones. What they can explain is why quantum theory immediately goes down in flames if we replace the Born rule with anything else. I don’t want to minimize this result: it is nontrivial, and solves something that was bothering me for a long time. I’ve always wanted to find a minimally reasonable alternative to the Born rule for my research, and now I know that there isn’t one.

This is what I like, by the way, in the works of Saunders, Deutsch, Wallace, Vaidman, Carroll, and Sebens. They tackle the conceptual difficulties with probability and measurement head on. I’m not satisfied with their answers, for several reasons, but at least they are asking the right questions.

The well-known argument by Frauchiger and Renner about the consistency of quantum mechanics has finally been published (in Nature Communications). With publication came a substantial change to the conclusion of the paper: while the old version claimed that “no single-world interpretation can be logically consistent”, the new version claims that “quantum theory cannot be extrapolated to complex systems” or, to use the title, that “quantum theory cannot consistently describe the use of itself”.

This is clearly bollocks. We need to find out, though, where exactly the argument has gone wrong. Several discussions popped up on the internet to do so, for example in Scott Aaronson’s blog, but to my surprise nobody pointed out the obvious mistake: the predictions that Frauchiger and Renner claim to follow from quantum mechanics do not actually follow from quantum mechanics. In fact, they are outright wrong.

For example, take the first of the predictions that appear on Table 3 of the paper. $\bar{\text{F}}$ measures $r=\text{tails}$ and claims: “I am certain that W will observe $w = \text{fail}$ at time $n$:$31$”. By assumption, though, $\bar{\text{F}}$ is in an isolated laboratory and their measurement is described by a unitary transformation. This implies that the state of lab L at time $n$:$30$ will be given either by

\[ \frac{3}{\sqrt{10}}\ket{\text{fail}}_\text{L} + \frac{1}{\sqrt{10}}\ket{\text{ok}}_\text{L}\quad\text{or}\quad\frac{1}{\sqrt{2}}\ket{\text{fail}}_\text{L} - \frac{1}{\sqrt{2}}\ket{\text{ok}}_\text{L},\]depending on the result of $\bar{\text{W}}$’s measurement. Therefore, it is not certain that W will observe $w = \text{fail}$; this will happen with probability $9/10$ or $1/2$, respectively.

To obtain the prediction the authors write in Table 3, one would need to assume that $\bar{\text{F}}$’s measurement caused a collapse of the state of their laboratory – contrary to the assumption of unitarity. In this case, the state at time $n$:$30$ would in fact be given by

\[ \ket{\text{fail}}_\text{L},\]independently of the result of $\bar{\text{W}}$’s measurement, and W would indeed observe $w = \text{fail}$ with certainty. But then W would never observe $w = \text{ok}$, and the paradox desired by the authors would never emerge.

To make this point more clear, I will describe how precisely the same problem arises in the original Wigner’s friend *gedankenexperiment*, so that people who are not familiar with Frauchiger and Renner’s argument can follow it. It goes like this:

Wigner is outside a perfectly isolated laboratory, and inside it there is a friend who is going to make a measurement on a qubit. Their initial state is

\[\ket{\text{Wigner}}\ket{\text{friend}}\frac{\ket{0}+\ket{1}}{\sqrt2}.\]If we assume that the measurement of the friend is a unitary transformation, after the measurement their state becomes

\[\ket{\text{Wigner}}\frac{\ket{\text{friend}_0}\ket{0} + \ket{\text{friend}_1}\ket{1}}{\sqrt2}.\]Now the friend is asked to predict what Wigner will observe if he makes a measurement on the qubit. Frauchiger and Renner claim that, using quantum mechanics, the friend can predict that “If I observed 0, then Wigner will observe 0 with certainty”.

Wait, what? The quantum prediction is clearly that Wigner will observe 0 with probability 1/2. The claimed prediction only follows if we assume that the friend’s measurement caused a collapse.
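The two predictions are easy to reproduce numerically. In this sketch the friend's memory is modelled as a single qubit (index 0 for “saw 0”, index 1 for “saw 1”), which is a simplifying assumption of mine, and Wigner's own state is left out since it factors out. The function name `prob_qubit_zero` is also just for illustration.

```python
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Friend (first factor) and qubit (second factor) after a unitary measurement
unitary_state = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
# The same pair if the friend's measurement collapsed the state onto outcome 0
collapsed_state = np.kron(ket0, ket0)

def prob_qubit_zero(state):
    """Born probability that measuring the qubit (second factor) gives 0."""
    amplitudes = state.reshape(2, 2)        # rows: friend, columns: qubit
    return float(np.sum(np.abs(amplitudes[:, 0]) ** 2))

print(prob_qubit_zero(unitary_state), prob_qubit_zero(collapsed_state))
```

The first number is the prediction without collapse and the second the prediction with collapse; certainty only shows up in the latter.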

And both assumptions are fine, actually. If there is no collapse, the prediction of 0 with probability 1/2 is correct and leads to no inconsistency, and if there is a collapse the prediction of 0 with probability 1 is correct and leads to no inconsistency. We only get an inconsistency if we insist that from the point of view of the friend there is a collapse, from the point of view of Wigner there is no collapse, and somehow both points of view are correct.

**Update:** After a long discussion with Renato, I think I understand his point of view. He thinks that this assumption of “collapse and no collapse” is just part of quantum mechanics, so it doesn’t need to be stated separately. Well, I think this is one hell of an unstated assumption, and in any case hardly part of the consensus about quantum mechanics. More technically, I think Frauchiger and Renner’s formalization of quantum mechanics — called [Q] — does not imply “collapse and no collapse”, it is too vague for that, so there is really a missing assumption in the argument.

This is of course nonsense. Bell’s theorem is not only a rather simple piece of mathematics, with a few-lines proof that can be understood by high-school students, but also the foundation of an entire field of research — quantum information theory. It has been studied, verified, and improved upon by thousands of scientists around the world.

The form of Bell’s theorem that is relevant for the article at hand is that for all probability distributions $\rho(\lambda)$ and response functions $A(a,\lambda)$ and $B(b,\lambda)$ with range $\{-1,+1\}$ we have that

\begin{multline*}
-2 \le \sum_\lambda \rho(\lambda) \Big[A(a_1,\lambda)B(b_1,\lambda)+A(a_1,\lambda)B(b_2,\lambda) \\ +A(a_2,\lambda)B(b_1,\lambda)-A(a_2,\lambda)B(b_2,\lambda)\Big] \le 2
\end{multline*}

The author’s proposed counterexample? It’s described in equations (3.48) and (3.49): A binary random variable $\lambda$ that can take values $-1$ or $+1$, with $\rho(-1)=\rho(+1)=1/2$, and response functions $A(a,\pm1)=\pm1$ and $B(b,\pm1)=\mp1$. That’s it. Just perfectly anti-correlated results, that do not even depend on the local settings $a$ and $b$. The value of the Bell expression above is simply $-2$.
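This is quick to check by brute force. For a fixed $\lambda$ the response functions reduce to a pair of values in $\{-1,+1\}$ per party, so there are only 16 deterministic strategies, and averaging over $\rho(\lambda)$ can only give convex combinations of their values. The sketch below enumerates them and then evaluates the proposed counterexample; the function name `chsh` is mine.

```python
from itertools import product

def chsh(A, B):
    """Bell expression for fixed lambda; A[i], B[j] in {-1, +1} are the responses to settings i, j."""
    return A[0] * B[0] + A[0] * B[1] + A[1] * B[0] - A[1] * B[1]

# All 16 deterministic strategies
values = [chsh(A, B)
          for A in product([-1, 1], repeat=2)
          for B in product([-1, 1], repeat=2)]

# The proposed counterexample: lambda = ±1 with weight 1/2, A = lambda, B = -lambda
counterexample = sum(0.5 * chsh((lam, lam), (-lam, -lam)) for lam in (-1, 1))

print(sorted(set(values)), counterexample)
```

Every deterministic strategy gives $\pm2$, so no mixture can leave the interval $[-2,2]$, and the “counterexample” sits on its lower edge.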

Now how could Open Science let such trivial nonsense pass? They do provide the “Review History” of the article, so we can see what happened: there were two referees that pointed out that the manuscript was wrong, one that was unsure, and two that issued a blanket approval without engaging with the contents. And the editor decided to accept it anyway.

What now? Open Science can recover a bit of its reputation by withdrawing this article, as Annals of Physics did with a previous version, but I’m never submitting an article to them.

Before I start ranting about what I find so objectionable about it, I’ll present the proof of this version of Bell’s theorem the best I can. So, what is counterfactual definiteness? It is the assumption that not only the measurement you did in fact do has a definite answer, but also the measurement you did *not* do has a definite answer. It feels a lot like determinism, but it is not really the same thing, as the assumption is silent about *how* the result of the counterfactual measurement is determined, it just says that it *is*. To be more clear, let’s take a look at the data that comes from a real Bell test, the Delft experiment:

N | $x$ | $y$ | $a$ | $b$ |
---|---|---|---|---|
1 | 0 | 0 | 1 | 1 |
2 | 0 | 0 | 0 | 0 |
3 | 1 | 1 | 1 | 0 |
4 | 1 | 1 | 0 | 1 |
5 | 0 | 0 | 1 | 1 |
6 | 1 | 1 | 1 | 0 |
7 | 0 | 0 | 1 | 0 |
8 | 1 | 0 | 1 | 1 |
9 | 0 | 0 | 1 | 1 |
10 | 0 | 1 | 0 | 0 |

The first column indicates the rounds of the experiment, the $x$ and $y$ columns indicate the settings of Alice and Bob, and the $a$ and $b$ columns the results of their measurements. If one assumes counterfactual definiteness, then definite results must also exist for the measurements that were *not* made, for example in the first round there must exist results corresponding to the setting $x=1$ for Alice and $y=1$ for Bob. This data would then be just part of some more complete data table, for example this:

N | $a_0$ | $a_1$ | $b_0$ | $b_1$ |
---|---|---|---|---|
1 | 1 | 0 | 1 | 1 |
2 | 0 | 1 | 0 | 1 |
3 | 1 | 1 | 0 | 0 |
4 | 1 | 0 | 1 | 1 |
5 | 1 | 1 | 1 | 1 |
6 | 1 | 1 | 0 | 0 |
7 | 1 | 1 | 0 | 0 |
8 | 1 | 1 | 1 | 0 |
9 | 1 | 0 | 1 | 0 |
10 | 0 | 0 | 1 | 0 |

In this table the column $a_0$ has the results of Alice’s measurements when her setting is $x=0$, and so on. The real data points, corresponding to the Delft experiment, are in black, and I filled in red the hypothetical results for the measurements that were not made.

What is the problem with assuming counterfactual definiteness, then? A complete table certainly exists. But it makes it possible to do something that wasn’t before: we can evaluate the entire CHSH game in every single round, instead of having to choose a single pair of settings. As a quick reminder, to win the CHSH game Alice and Bob must give the same answers when their settings are $(0,0)$, $(0,1)$, or $(1,0)$, and give different answers when their settings are $(1,1)$. In other words, they must have $a_0=b_0$, $a_0=b_1$, $a_1=b_0$, and $a_1 \neq b_1$. But if you try to satisfy all these equations simultaneously, you get that $a_0=b_0=a_1 \neq b_1 = a_0$, a contradiction. At most, you can satisfy 3 out of the 4 equations. Then since in every row the score in the CHSH game is at most $3/4$, if we sample randomly from each row a pair of $a_x,b_y$ we have that

\[ \frac14(p(a_0=b_0) + p(a_0=b_1) + p(a_1=b_0) + p(a_1\neq b_1)) \le \frac34,\]

which is the CHSH inequality.
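The claim that no complete row can satisfy more than 3 of the 4 conditions can be checked by brute force, since there are only 16 possible rows. The following is a minimal sketch; the names `chsh_score`, `scores` and `best` are mine and do not come from any paper or library.

```python
from itertools import product

def chsh_score(a0, a1, b0, b1):
    """Fraction of the four CHSH conditions that one complete row satisfies."""
    a, b = (a0, a1), (b0, b1)
    wins = 0
    for x, y in product((0, 1), repeat=2):
        same = a[x] == b[y]
        # Equal answers win, except for settings (1, 1) where they must differ.
        wins += (not same) if (x, y) == (1, 1) else same
    return wins / 4

# Enumerate all 16 possible rows (a0, a1, b0, b1) of a complete table.
scores = {row: chsh_score(*row) for row in product((0, 1), repeat=4)}
best = max(scores.values())
print(best)
```

The maximum over all rows is $3/4$, and the first row of the table above, $(1,0,1,1)$, is one of the rows that reach it.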

But if you select the actual Delft data from each row, the score will be $0.9$. Contradiction? Well, no, because you didn’t sample randomly, but just chose $1$ out of $4^{10}$ possibilities, which would happen with probability $1/4^{10} \approx 10^{-6}$ if you actually did it randomly. One can indeed violate the CHSH inequality by luck; it is just astronomically unlikely.

Proof presented, so now ranting: what is wrong with this version of the theorem? It is just so *lame*! It doesn’t even explicitly deal with the issue of locality, which is fundamental in all other versions of the theorem! The conclusion that one takes from it, according to Asher Peres himself, is that “Unperformed experiments have no results”. To which the man in the street could reply “Well, duh, of course unperformed experiments have no results, why are you wasting my time with this triviality?”. It leaves the reader with the impression that they only need to give up the notion that unperformed experiments have results, and they are from then on safe from Bell’s theorem. But this is not true at all! The other proofs of Bell’s theorem still hold, so you still need to give up either *determinism* or *no action at a distance*, if you consider the simple version, or unconditionally give up *local causality*, if you consider the nonlocal version, or choose between *generalised local causality* and living in a single world, if you consider the Many-Worlds version.

What about the mainstream interpretations, then? In Časlav’s neo-Copenhagen interpretation the measurement results are observer-dependent (otherwise this would be a rather schizophrenic paper). In QBism they are explicitly subjective, like almost everything else. In Many-Worlds there isn’t a single observer after a measurement, but several of them, each with their own measurement result.

How can this be? Časlav’s argument is as simple as it gets in quantum foundations: Bell’s theorem. In its simple version, Bell’s theorem dashes the old hope that quantum mechanics could be made deterministic: if the result of a spin measurement were pre-determined, then you wouldn’t be able to win the CHSH game with probability higher than $3/4$, unless some hidden action-at-a-distance was going on. But let’s suppose you did the measurement. Surely now the weirdness is over, right? You left the quantum realm, where everything is fuzzy and complicated, and entered the classical realm, where everything is solid and clear. So solid and clear that if somebody else does a measurement on you, their measurement result will be pre-determined, right?

Well, if it were pre-determined, then people doing measurements on people doing measurements wouldn’t be able to win the CHSH game with probability higher than $3/4$, unless some hidden action-at-a-distance was going on. But if quantum mechanics holds at *every* scale, then again one can win it with probability $\frac{2+\sqrt{2}}{4}$.
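For reference, the value $\frac{2+\sqrt{2}}{4}$ can be reproduced with the textbook strategy for the ordinary CHSH game: share the state $\frac{1}{\sqrt2}(\ket{00}+\ket{11})$ and measure along real directions rotated by suitable angles. The sketch below assumes this standard strategy; the angle choices and the helper names `basis_vector` and `p_equal` are my own illustration, not something taken from the argument above.

```python
import math

def basis_vector(angle, outcome):
    """Real qubit measurement direction: outcome 0 is |0> rotated by `angle`, outcome 1 is orthogonal."""
    if outcome == 0:
        return (math.cos(angle), math.sin(angle))
    return (-math.sin(angle), math.cos(angle))

def p_equal(alpha, beta):
    """Born-rule probability that a = b on the state (|00> + |11>)/sqrt(2)."""
    total = 0.0
    for outcome in (0, 1):
        va, vb = basis_vector(alpha, outcome), basis_vector(beta, outcome)
        amplitude = (va[0] * vb[0] + va[1] * vb[1]) / math.sqrt(2)
        total += amplitude ** 2
    return total

ALICE = {0: 0.0, 1: math.pi / 4}          # assumed angles for settings x = 0, 1
BOB = {0: math.pi / 8, 1: -math.pi / 8}   # assumed angles for settings y = 0, 1

win = 0.0
for x in (0, 1):
    for y in (0, 1):
        same = p_equal(ALICE[x], BOB[y])
        # Settings (1, 1) are won with different answers, all others with equal answers.
        win += (1 - same) if (x, y) == (1, 1) else same
win /= 4
print(win, (2 + math.sqrt(2)) / 4)
```

Both printed numbers agree up to floating-point error, and both exceed the bound of $3/4$ for pre-determined results.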

This highlights the fundamental confusion in Frauchiger and Renner’s argument, where they consider which outcome some observer thinks that another observer will experience, but are not careful to distinguish the different copies of an observer that will experience different outcomes. I’ve reformulated their argument to make this point explicit here, and it works fine, but undermines their conclusion that in single-world but not many-world theories observers will make contradictory assertions about which outcomes other observers will experience. Well, yes, but the point is that this contradiction is resolved in many-world theories by allowing different copies of an observer to experience different outcomes, and this recourse is not available in single-world theories.

First of all, this limit does not exist. If one makes an infinite sequence of zeroes and ones by throwing a fair coin (fudging away this pesky infinity again), calling the result of the $i$th throw $s_i$, the relative frequency after $n$ throws is

\[ f_n = \frac1n\sum_{i=1}^{n}s_i.\] What should then $\lim_{n\to\infty}f_n$ be? $1/2$? Why? All sequences of zeros and ones are equally possible – they are even equally probable! What is wrong with choosing the sequence $s = (0,0,0,\ldots)$? Or even the sequence $(0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,\ldots)$, whose frequencies do not converge to any number, but eternally oscillate between $0$ and $1$? If for some reason one chooses a nice sequence like $s=(0,1,0,1,0,1,\ldots)$, for which the frequencies do converge to $1/2$, what is wrong with reordering it to obtain $s' = (s_1,s_3,s_2,s_5,s_7,s_4,\ldots)$ instead, with limit $1/3$?
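These sequences are easy to play with numerically. The sketch below computes $f_n$ for the alternating sequence, for its reordering, and for the block sequence. It assumes that the block sequence continues with each block twice as long as the previous one, which is only my reading of the "$\ldots$"; with that choice the frequencies keep swinging between roughly $1/3$ and $2/3$, whereas faster-growing blocks would push them closer to $0$ and $1$. All function names are hypothetical.

```python
from itertools import count, islice

def s(i):
    """The alternating sequence s = (0,1,0,1,...), with 1-based index i."""
    return 0 if i % 2 == 1 else 1

def reordered():
    """s' = (s_1, s_3, s_2, s_5, s_7, s_4, ...): two odd-indexed terms, then one even-indexed term."""
    odd, even = 1, 2
    while True:
        yield s(odd)
        yield s(odd + 2)
        yield s(even)
        odd, even = odd + 4, even + 2

def doubling_blocks():
    """(0,1,1,0,0,0,0,1,...): alternating blocks, each assumed twice as long as the previous one."""
    bit, length = 0, 1
    while True:
        for _ in range(length):
            yield bit
        bit, length = 1 - bit, 2 * length

def frequency(bits, n):
    """Relative frequency f_n of ones among the first n terms."""
    return sum(islice(bits, n)) / n

print(frequency(map(s, count(1)), 3000))   # original ordering
print(frequency(reordered(), 3000))        # same terms, different order
for k in range(10, 14):                    # frequencies at the end of each block
    print(2**k - 1, frequency(doubling_blocks(), 2**k - 1))
```

The reordering uses every term of $s$ exactly once, yet the frequency moves from $1/2$ to $1/3$, and the block sequence never settles down at all.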

No, no, no, you complain. It is true that all sequences are equiprobable, but most of them have limiting frequency $1/2$. Moreover, it is a theorem that the frequencies converge – it is the law of large numbers! How can you argue against a theorem?

Well, what do you mean by “most”? This is already a probabilistic concept! And according to which measure? It cannot be a fixed measure, otherwise it would say that the limiting frequency is *always* $1/2$, independently of the single-throw probability $p$. On the other hand, if one allows it to depend on $p$, one can indeed define a measure on the set of infinite sequences such that “most” sequences have limiting frequency $p$. A probability measure. So you’re not explaining the single-throw probability in terms of the limiting frequencies, but rather in terms of the probabilities of the limiting frequencies. Which is kind of a problem, if “probability” is what you wanted to explain in the first place. The same problem happens with the law of large numbers. Its statement is that

\[\forall \epsilon >0 \quad \lim_{n\to\infty}\text{Pr}(|f_n -p|\ge \epsilon) = 0,\] so it only says that the *probability* of observing a frequency that differs from $p$ by at least $\epsilon$ goes to $0$ as the number of trials goes to infinity.

But enough with mocking frequentism. Much more eloquent dismissals have already been written, several times over, and as the Brazilian saying goes, one shouldn’t kick a dead dog. Rather, I want to imagine a world where frequentism is *true*.

What would it take? Well, the most important thing is to make the frequencies converge to the probability in the infinite limit. One also needs, though, the frequencies to be a good approximation to the probability even for a finite number of trials, otherwise empiricism goes out of the window. My idea, then, is to allow the frequencies to fluctuate within some error bars, but never beyond. One could, for example, take the $5\sigma$ standard for scientific discoveries that particle physicists use, and declare it to be a fundamental law of Nature: it is only possible to observe a frequency $f_n$ if

\[f_n \in \left(p-5\frac{\sigma}{\sqrt{n}},p+5\frac{\sigma}{\sqrt{n}}\right).\] Trivially, then, $\lim_{n\to\infty}f_n = p$, and even better, if we want to measure some probability within error $\epsilon$, the half-width $5\sigma/\sqrt{n}$ must be smaller than $\epsilon$, so we only need $n > 25\sigma^2/\epsilon^2$ trials; for example, since $\sigma \le 1/2$ for a coin, 62500 throws are enough to tomograph any coin within error $10^{-2}$.

In this world, the gambler’s fallacy is not a fallacy, but a law of Nature. If one starts throwing a fair coin and observes 24 heads in a row, it is literally impossible to observe another heads in the next throw. It’s as if there is a purpose pushing the frequencies towards the mean. It captures well our intuition about randomness. It is also completely insane: 25 heads are impossible only at the start of a sequence. If before them one had obtained 24 tails, 25 heads are perfectly fine. Also, it’s not as if 25 heads are impossible because their probability is too low. The probability of 24 heads, one tails, and another heads is even lower.
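The examples in this paragraph can be checked directly against the $5\sigma$ law. The sketch below assumes that $\sigma=\sqrt{p(1-p)}$ is the single-throw standard deviation and that the law has to hold after every throw; `allowed` and `sequence_allowed` are hypothetical helper names.

```python
import math

def allowed(f, n, p=0.5, k=5):
    """True if frequency f after n throws lies in the open interval p +/- k*sigma/sqrt(n)."""
    sigma = math.sqrt(p * (1 - p))          # assumed: single-throw standard deviation
    half_width = k * sigma / math.sqrt(n)
    return p - half_width < f < p + half_width

def sequence_allowed(throws, p=0.5):
    """Check the law after every throw (1 = heads, 0 = tails)."""
    heads = 0
    for n, throw in enumerate(throws, start=1):
        heads += throw
        if not allowed(heads / n, n, p):
            return False
    return True

print(sequence_allowed([1] * 24))               # 24 heads in a row
print(sequence_allowed([1] * 25))               # a 25th heads
print(sequence_allowed([0] * 24 + [1] * 25))    # 25 heads after 24 tails
print(sequence_allowed([1] * 24 + [0, 1]))      # 24 heads, one tails, another heads
```

Only the second sequence is forbidden: at $n=25$ the interval is exactly $(0,1)$, which excludes $f_{25}=1$, even though the fourth sequence is less probable.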

Even worse, if the probability you’re trying to tomograph is the one of obtaining 24 heads followed by one tail, then the frequency $f_1$ must be inside the interval \[[0,2^{-25}+5\sqrt{2^{-25}(1-2^{-25})}]\approx [0,5\cdot 2^{-12.5}],\] which is only possible if $f_1 = 0$. That is, it is impossible to observe tails after observing 24 heads, as it would make $f_1=1$, but it is also impossible to observe heads. So in this world Nature would need to keep track not only of all the coin throws, but also which statistics you are calculating about them, and also find a way to keep you from observing contradictions, presumably by not allowing any coin to be thrown at all.

A proper mixture is when you prepare the states $\ket{0}$ and $\ket{1}$ with probability $p$ and $1-p$, obtaining the density matrix

\[ \rho_\text{proper} = p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}.\] An improper mixture is when you prepare the entangled state $\sqrt{p}\ket{0}\ket{0} + \sqrt{1-p}\ket{1}\ket{1}$ and discard the second subsystem, obtaining the density matrix \[ \rho_\text{improper} = p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}.\] The question is then why do these different preparation procedures give rise to the same statistics (and therefore it is legitimate to represent them with the same density matrix).

Well, do they? I’m not so sure about that! The procedure to prepare the proper mixture is rather vague, so we can’t really answer whether it is appropriate to represent it via the density matrix $\rho_\text{proper}$. To remove the vagueness, I asked an experimentalist how she prepared the state $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$ that was necessary for an experiment. “Easy”, she told me, “I prepared $n$ copies of $\ket{0}$, $n$ copies of $\ket{1}$, and then combined the statistics.”

This sounds like preparing the state $\ket{0}^{\otimes n} \otimes \ket{1}^{\otimes n}$, not like preparing $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$. Do they give the same statistics? Well, if I measure all states in the $Z$ basis, exactly $\frac12$ of the results will be $0$. But if I measure $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$ in the $Z$ basis $2n$ times, the probability that $\frac12$ of the results are $0$ is

\[ \frac{1}{2^{2n}} {2n \choose n} \approx \frac{1}{\sqrt{n\pi}},\] so just by looking at this statistic I can guess with high probability which was the preparation. It is even easier to do that if I disregard her instructions and look at the order of the results: getting $n$ zeroes followed by $n$ ones is a dead giveaway.
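To get a feeling for how quickly this probability drops, one can compare the exact binomial expression with the $1/\sqrt{n\pi}$ approximation for a few values of $n$. This is a minimal sketch using only the standard library; `prob_exactly_half` is a name I made up.

```python
import math

def prob_exactly_half(n):
    """Probability that exactly n of 2n fair-coin results are 0."""
    return math.comb(2 * n, n) / 4**n

for n in (10, 100, 1000):
    exact = prob_exactly_half(n)
    approx = 1 / math.sqrt(math.pi * n)
    print(n, exact, approx)
```

Already for $n=100$ a truly mixed state gives exactly half zeroes only about 6% of the time, whereas the experimentalist's preparation gives it every time.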

Maybe one should prepare these states using a random number generator instead? If one uses the function `rand()` from MATLAB to decide whether to prepare $\ket{0}$ or $\ket{1}$ at each round, one can easily pass the two randomness tests I mentioned above. Maybe it can even pass all common randomness tests available in the literature; I don’t know how good `rand()` is. It cannot, however, pass *all* randomness tests, as `rand()` is a deterministic algorithm using a finite seed, and is therefore restricted to outputting computable sequences of bits. One can, in fact, attack it, and this is the core of the paper of López Grande et al., showing how one can distinguish a sequence of bits that came from `rand()` from a truly random one. More generally, even the best pseudorandom number generators we have are designed to be indistinguishable from truly random sources only by polynomial-time tests, and fail against exponential-time algorithms.

Clearly pseudorandomness is not enough to generate proper mixtures; how about true randomness instead? Just use a quantum random number generator to prepare bits with probabilities $p$ and $1-p$, and use these bits to prepare $\ket{0}$ or $\ket{1}$. Indeed, this is what people do when they are serious about preparing mixed states, and the statistics really are indistinguishable from those of improper mixtures. But why? To answer that, we need to model the quantum random number generator physically. We start by preparing a “quantum coin” in the state

\[ \sqrt{p}\ket{H}+\sqrt{1-p}\ket{T},\] which we should measure in the $\{\ket{H},\ket{T}\}$ basis to generate the random bits. Going to the Church of the Larger Hilbert Space, we model the measurement as

\[ \sqrt{p}\ket{H}\ket{M_H}+\sqrt{1-p}\ket{T}\ket{M_T},\] and conditioned on the measurement we prepare $\ket{0}$ or $\ket{1}$, obtaining the state

\[ \sqrt{p}\ket{H}\ket{M_H}\ket{0}+\sqrt{1-p}\ket{T}\ket{M_T}\ket{1}.\] We then discard the quantum coin and the measurement result, obtaining finally

\[ p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1},\] which is just the desired state, but now it is an improper mixture. So, at least in the Many-Worlds interpretation, there is no mystery about why proper and improper mixtures are equivalent: they are physically the same thing!
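The last step, discarding the coin and the measurement record, is just a partial trace, and it is short enough to do by hand in code. The sketch below stores the amplitudes of the three-party state in a dictionary and traces out the first two subsystems. The labelling ($0$ for $\ket{H}$, $\ket{M_H}$ and $\ket{0}$, $1$ for the others) and the function name `reduced_state` are my own conventions.

```python
import math

def reduced_state(p):
    """Density matrix of the prepared qubit after discarding coin and measurement record."""
    # Amplitudes indexed by (coin, record, system); they are real, so no conjugation is needed.
    amp = {(0, 0, 0): math.sqrt(p), (1, 1, 1): math.sqrt(1 - p)}
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for (coin, record, system), a in amp.items():
        for (coin2, record2, system2), a2 in amp.items():
            # Partial trace: only terms with matching discarded labels contribute.
            if (coin, record) == (coin2, record2):
                rho[system][system2] += a * a2
    return rho

print(reduced_state(0.3))
```

The off-diagonal entries vanish because the two branches carry orthogonal coin and record states, which leaves $p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}$ up to rounding.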

(A closely related question, which has a closely related answer, is why is it equivalent to prepare the states $\ket{0}$ or $\ket{1}$ with probability $\frac12$ each, or the states $\ket{+}$ or $\ket{-}$, again with probability $\frac12$? The equivalence fails for pseudorandomness, as shown by López Grande et al.; if we use true randomness instead, we are preparing the states

\[ \frac1{\sqrt{2}}(\ket{H}\ket{0}+\ket{T}\ket{1})\quad\text{or}\quad\frac1{\sqrt{2}}(\ket{H}\ket{+}+\ket{T}\ket{-})\] and discarding the coin. But note that if one applies a Hadamard to the coin of the first state one obtains the second, so the difference between them is just a unitary on a system that is discarded anyway; no wonder we can’t tell the difference! More generally, any two purifications of the same density matrix must be related by a unitary on the purifying system.)
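The Hadamard claim in the parenthesis can be verified with a few lines of arithmetic. The sketch below writes both states in the basis $\ket{H}\ket{0},\ket{H}\ket{1},\ket{T}\ket{0},\ket{T}\ket{1}$ and applies the Hadamard to the coin only; the dictionary encoding and the helper `apply_to_coin` are illustrative choices of mine.

```python
import math

r = 1 / math.sqrt(2)

# Amplitudes indexed by (coin, system); coin 0/1 = H/T, system in the computational basis.
first = {(0, 0): r, (1, 1): r}                                   # (|H>|0> + |T>|1>)/sqrt(2)
second = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): -0.5}   # (|H>|+> + |T>|->)/sqrt(2)

HADAMARD = ((r, r), (r, -r))

def apply_to_coin(gate, state):
    """Apply a 2x2 gate to the coin, leaving the system untouched."""
    out = {}
    for (coin, system), amp in state.items():
        for new_coin in (0, 1):
            key = (new_coin, system)
            out[key] = out.get(key, 0.0) + gate[new_coin][coin] * amp
    return out

rotated = apply_to_coin(HADAMARD, first)
print(all(math.isclose(rotated[k], second[k]) for k in second))
```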

Galley and Masanes want to invert the question, and ask for *which* quantum-like theories proper and improper mixtures are equivalent. To be able to tackle this question, we need to define what improper mixtures even are in a quantum-like theory. They proceed by analogy with quantum mechanics: if one has a bipartite state $\ket{\psi}$ and is doing measurements $E_i$ only on the first system, the probabilities one obtains are given by

\[ p(i) = \operatorname{tr}( (E_i \otimes \mathbb I) \ket{\psi}\bra{\psi} ),\] and the improper mixture is defined as the operator $\rho_\text{improper}$ for which

\[ p(i) = \operatorname{tr}( E_i \rho_\text{improper})\] for all measurements $E_i$.

In their case, they are considering a quantum-like theory that is still based on quantum states, but whose probabilities are not given by the Born rule $p(i) = \operatorname{tr}(E_i \ket{\phi}\bra{\phi})$, but by some more general function $p(i) = F_i (\ket{\phi})$. One can then define the probabilities obtained by local measurements on a bipartite state as

\[ p(i) = F_i \star \mathbb I (\ket{\psi}),\] for some composition rule $\star$ and trivial measurement $\mathbb I$, and from that an improper mixture as the operator $\omega_\text{improper}$ such that

\[ p(i) = F_i (\omega_\text{improper})\] for all measurements $F_i$.

Defining proper mixtures, on the other hand, is easy: if one can prepare the states $\ket{0}$ or $\ket{1}$ with probabilities $p$ and $1-p$, their proper mixture is the operator $\omega_\text{proper}$ such that for all measurements $F_i$

\[ p(i) = F_i(\omega_\text{proper}) = p F_i(\ket{0}) + (1-p) F_i(\ket{1}).\] That is, easy if one can generate true randomness that is not reducible to quantum-like randomness. I don’t think this makes sense, as one would have to consider a world where reductionism fails, or at least one where quantum-like mechanics is not the fundamental theory. Such non-reducible probabilities are uncritically assumed to exist anyway by people working on GPTs all the time.

Now with both proper and improper mixtures properly defined, one can answer the question of whether they are equivalent: the answer is a surprising no, for any alternative probability rule that respects some basic consistency conditions. This has the intriguing consequence that if we were to modify the Born rule while keeping the rest of quantum mechanics intact, a wedge would be driven between the probabilities that come from the fundamental theory and some “external” probabilities coming from elsewhere. This would put the Many-Worlds interpretation under intolerable strain.

But such an abstract “no” result is not very interesting; I find it much more satisfactory to exhibit a concrete alternative to the Born rule where the equivalence fails. Galley and Masanes propose the function

\[ F_i(\ket{\psi}) = \operatorname{tr}(\hat F_i (\ket{\psi}\bra{\psi})^{\otimes 2})\] for some positive matrices $\hat F_i$ restricted by their consistency conditions. It is easy to see that the proper mixture of $\ket{0}$ and $\ket{1}$ described above is given by

\[ \omega_\text{proper} = p \ket{00}\bra{00} + (1-p)\ket{11}\bra{11}.\] In quantum mechanics one would try to make it by discarding half of the state $\sqrt{p}\ket{0}\ket{0} + \sqrt{1-p}\ket{1}\ket{1}$. Here it doesn’t work, as nothing does, but I want to know what it gives us anyway. It is not easy to see that the improper mixture is given by the weirdo

\begin{multline} \omega_\text{improper} = (p^2 + \frac{p(1-p)}{3})\ket{00}\bra{00} + \\ \frac{2p(1-p)}{3} (\ket{01}+\ket{10})(\bra{01}+\bra{10}) + ((1-p)^2 + \frac{p(1-p)}{3})\ket{11}\bra{11}.\end{multline}