Pure quantum operations

Everybody knows how to derive the most general operations one can apply to a quantum state. You just need to assume that a quantum operation

  1. Is linear.
  2. Maps quantum states to quantum states.
  3. Still maps quantum states to quantum states when applied to a part of a quantum system.

And you can prove that such quantum operations are the well-known completely positive and trace preserving maps, which can be conveniently represented using the Kraus operators or the Choi-Jamiołkowski isomorphism.

But what if one does not want general quantum operations, but wants to single out pure quantum operations? Can one have such an axiomatic description, a derivation from intuitive[1] assumptions?

Well, the usual argument one sees in textbooks to show that the evolution of quantum states must be given by a unitary assumes that the evolution

  1. Is linear.
  2. Maps pure quantum states to pure quantum states.

From this, you get that a quantum state $\ket{\psi}$ is mapped to a quantum state $U\ket\psi$ for a linear operator $U$, and furthermore since by definition quantum states have 2-norm equal to 1, we need the inner product $\bra\psi U^\dagger U \ket\psi$ to be 1 for all $\ket\psi$, which implies that $U$ must be a unitary matrix.

The only problem with this argument is that it is false, as the map
\[ \mathcal E(\rho) = \ket\psi\bra\psi \operatorname{tr} \rho, \]which simply discards the input $\rho$ and prepares the fixed state $\ket\psi$ instead is linear, maps pure states to pure states, and is not unitary. The textbooks are fine, as they usually go through this argument before density matrices are introduced, and either implicitly or explicitly state that the evolution takes state vectors to state vectors. But this is not good enough for us, as this restriction to state vectors is both unjustified, and does not satisfy our requirement of being an “intuitive assumption”.

Luckily, the fix is easy: we just need to add the analogue of the third assumption used in the derivation of general quantum operations. If we assume that a pure quantum operation

  1. Is linear.
  2. Maps pure quantum states to pure quantum states.
  3. Still maps pure quantum states to pure quantum states when applied to a part of a quantum system.

then we can prove that pure quantum operations are just unitaries[2]. Since the proof is simple, I’m going to show it in full.
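To see concretely why the third assumption rules out the discard-and-prepare map $\mathcal E$ from above, one can apply it to half of a maximally entangled state and look at the purity of the result. The following is a minimal sketch for $d=2$ in plain Python; the function names and the choice $\ket\psi = \ket0$ are illustrative assumptions, not part of the argument.

```python
# Sketch: apply E(rho) = |psi><psi| tr(rho) to the second half of |phi+> (d = 2)
# and compute the purity tr(J^2) of the output J. All names are illustrative.

d = 2
psi = [1.0, 0.0]  # the fixed state prepared by E (an arbitrary real choice)

def discard_and_prepare(rho):
    """E(rho) = |psi><psi| * tr(rho)."""
    tr = sum(rho[i][i] for i in range(d))
    return [[psi[a] * psi[b] * tr for b in range(d)] for a in range(d)]

def unit(i, j):
    """Matrix unit |i><j|."""
    return [[1.0 if (a, b) == (i, j) else 0.0 for b in range(d)] for a in range(d)]

def apply_to_second_half(channel):
    """(I (x) channel)(|phi+><phi+|) = (1/d) sum_ij |i><j| (x) channel(|i><j|)."""
    J = [[0.0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            out = channel(unit(i, j))  # linearity: it suffices to act on matrix units
            for a in range(d):
                for b in range(d):
                    J[i * d + a][j * d + b] = out[a][b] / d
    return J

def purity(M):
    """tr(M^2); equals 1 exactly when M is a pure state."""
    n = len(M)
    return sum(M[r][c] * M[c][r] for r in range(n) for c in range(n))

J_discard = apply_to_second_half(discard_and_prepare)
J_identity = apply_to_second_half(lambda rho: rho)
print(purity(J_discard))   # 0.5 = 1/d: a mixed state, so assumption 3 fails
print(purity(J_identity))  # 1.0: doing nothing keeps the state pure
```

The map $\mathcal E$ passes assumption 2 but leaves the first subsystem maximally mixed, which is exactly what assumption 3 forbids.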

Let $\mathcal F$ be the pure quantum operation we are interested in. If we apply it to the second subsystem of a maximally entangled state, $\ket{\phi^+} = \frac1{\sqrt d}\sum_{i=1}^d \ket{ii}$, by assumption 3 the result will be a pure state, which we call $\ket{\varphi}$. In symbols, we have
\[ \mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+}) = \ket{\varphi}\bra{\varphi}, \]where $\mathcal I$ represents doing nothing to the first subsystem. Now the beautiful thing about the maximally entangled state is that if $\mathcal F$ is a linear map then $\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})$ contains all the information about $\mathcal F$. In fact, if we know $\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})$ we can know how $\mathcal F$ acts on any matrix $\rho$ via the identity
\[ \mathcal F (\rho) = d\,\operatorname{tr}_\text{in} [(\rho^T \otimes \mathbb I) \mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})]. \]
This is the famous Choi-Jamiołkowski isomorphism[3]. Now let’s use the fact that the result $\ket{\varphi}\bra{\varphi}$ is a pure state. If we write it down in the computational basis
\[\ket\varphi = \sum_{i,j=1}^d \varphi_{ij} \ket{i j}, \]we see that if we define a matrix $\Phi$ with elements $\Phi_{ij} = \varphi_{ji} \sqrt d$ then $\ket\varphi = \mathbb I \otimes \Phi \ket{\phi^+}$[4], so
\[ \mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+}) = (\mathbb I \otimes \Phi) \ket{\phi^+}\bra{\phi^+} (\mathbb I \otimes \Phi^\dagger).\]
Using the identity above we have that
\[ \mathcal F(\rho) = \Phi \rho \Phi^\dagger, \]and since $\operatorname{tr}(\mathcal F(\rho)) = 1$ for every $\rho$ we have that $\Phi^\dagger\Phi = \mathbb I$, so $\Phi$ is an isometry. If in addition we demand that $\mathcal F(\rho)$ has the same dimension as $\rho$, then $\Phi$ must be a square matrix, and therefore has a right inverse which is equal to its left inverse, so $\Phi$ is a unitary.

This result is so amazing, so difficult, and so ground-breaking that the referees allowed me to include it as a footnote in my most recent paper without bothering to ask for a proof or a reference. But joking aside, I’d be curious to know if somebody already wrote this down, as a quick search through the textbooks revealed nothing.

But how about Wigner’s theorem, I hear you screaming. Well, Wigner was not concerned with deriving what were the quantum operations, but what were the symmetry transformations one could apply to quantum states. Because of this he did not assume linearity, which was not relevant to him (and in fact would make his theorem wrong, as one can have perfectly good anti-linear symmetries, such as time reversal). Also, he assumed that symmetry transformations preserve inner products, which is too technical for my purposes.

What is the probability of an infinite sequence of coin tosses?

It’s 0, except in the trivial cases where it is 1.

But clearly this is the wrong way to formulate the question, as there are interesting things to be said about the probabilities of infinite sequences of coin tosses. The situation is analogous to uniformly sampling real numbers from the $[0,1]$ interval: the probability of obtaining any specific number is just 0. The solution, however, is simple: we ask instead what is the probability of obtaining a real number in a given subinterval. The analogous solution works for the case of coin tosses: instead of asking the probability of a single infinite sequence, one can ask the probability of obtaining an infinite sequence that starts with a given finite sequence.

To be more concrete, let’s say that the probability of obtaining Heads in a single coin toss is $p$, and for brevity let’s denote the outcome Heads by 1 and Tails by 0. Then the probability of obtaining the sequence 010 is $p(1-p)^2$, which is the same as the probability of obtaining the sequence 0100 or the sequence 0101, which is the same as the probability of obtaining a sequence in the set {01000, 01001, 01010, 01011}, which is the same as the probability of obtaining an infinite sequence that starts with 010.

There is nothing better to do with infinite sequences of zeroes and ones than mapping them into a real number in the interval $[0,1]$, so we shall do that. The set of infinite sequences that start with 010 is then very conveniently represented by the interval $[0.010,0.010\bar1]$, also known as $[0.010,0.011]$ for those who do not like infinite strings of ones, or $[0.25,0.375]$ for those who do not like binary. Saying then that the probability of obtaining a sequence in $[0.010,0.010\bar{1}]$ is $p(1-p)^2$ is assigning a measure to this interval, which we write as
\[ \rho([0.010,0.010\bar{1}]) = p(1-p)^2 \]
Now if we can assign a sensible probability to every interval contained in $[0,1]$ we can actually extend it into a proper probability measure over the set of infinite sequences of coin tosses using standard measure-theoretical arguments. For me this is the right answer to the question posed in the title of this post.

So, how do we go about assigning a sensible probability to every interval contained in $[0,1]$? Well, the argument of the previous paragraph can clearly be extended to any interval of the form $[k/2^n, (k+1)/2^n]$. We just need to write $k$ in binary, padded with zeroes on the left until it reaches $n$ binary digits, and count the number of 0s and 1s, $n_0(k,n)$ and $n_1(k,n)$. In symbols:
\[ \rho\left(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right]\right) = p^{n_1(k,n)}(1-p)^{n_0(k,n)} \]
The extension to any interval where the extremities are binary fractions is straightforward. We just break them down into intervals where the numerators differ by one and apply the previous rule. In symbols:
\[ \rho\left(\left[\frac{k}{2^n}, \frac{l+1}{2^n}\right]\right) = \sum_{i=k}^{l} p^{n_1(i,n)}(1-p)^{n_0(i,n)} \]
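The two rules above are easy to check with exact arithmetic. Here is a small sketch in Python; the function names `prefix_measure` and `dyadic_measure` and the bias $p = 1/3$ are assumptions made only for this illustration.

```python
from fractions import Fraction

def prefix_measure(bits, p):
    """rho of the set of infinite sequences starting with `bits` (a string of 0s and 1s)."""
    ones = bits.count("1")
    zeros = bits.count("0")
    return p**ones * (1 - p)**zeros

def dyadic_measure(k, l, n, p):
    """rho([k/2^n, (l+1)/2^n]) as a sum over the n-bit prefixes k, k+1, ..., l."""
    return sum(prefix_measure(format(i, f"0{n}b"), p) for i in range(k, l + 1))

p = Fraction(1, 3)                   # an arbitrary bias chosen for the illustration
print(prefix_measure("010", p))      # p(1-p)^2 = 4/27
print(dyadic_measure(2, 2, 3, p))    # the same set written as [2/8, 3/8]
print(dyadic_measure(4, 5, 4, p))    # refined to 4 bits: 0100 and 0101
print(dyadic_measure(0, 7, 3, p))    # the whole interval [0, 1], which gives 1
```

Refining an interval into longer prefixes does not change its measure, which is what makes the assignment consistent.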
We are essentially done, since we can approximate any real number as well as we want by using binary fractions [5]. But life is more than just binary fractions, so I’ll show explicitly how to deal with the interval
\[[0,1/3] = [0,0.\bar{01}] \]

The key thing is to choose a nice sequence of binary fractions $a_n$ that converges to $1/3$. It is convenient to use a monotonically increasing sequence, because then we don’t need to worry about minus signs. If furthermore the sequence starts with $0$, then \[ [0,1/3] = \bigcup_{n\in \mathbb N} [a_n,a_{n+1}] \] and
\[ \rho([0,1/3]) = \sum_{n\in \mathbb N} \rho([a_n,a_{n+1}]) \] An easy sequence that does the job is $(0,0.01,0.0101,0.010101,\ldots)$. It lets us write the interval as
\[ [0,1/3] = [0.00, 0.00\bar{1}] \cup [0.0100, 0.0100\bar{1}] \cup [0.010100, 0.010100\bar{1}] \cup \ldots \] which gives us a simple interpretation of $\rho([0,1/3])$: it is the probability of obtaining a sequence of outcomes starting with 00, or 0100, or 010100, etc. The formula for the measure of $[a_n,a_{n+1}]$ is also particularly simple:
\[ \rho([a_n,a_{n+1}]) = p^{n-1}(1-p)^{n+1} \] so the measure of the whole interval is just a geometric series:
\[ \rho([0,1/3]) = (1-p)^2\sum_{n\in\mathbb N} \big(p(1-p)\big)^{n-1} = \frac{(1-p)^2}{1-p(1-p)} \]
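As a sanity check, the partial sums of this series can be compared with the closed form. The sketch below uses hypothetical helper names and takes $n$ to start at 1, as in the sequence $(0, 0.01, 0.0101, \ldots)$.

```python
from fractions import Fraction

def one_third_partial_sum(p, terms):
    """Sum of rho([a_n, a_{n+1}]) = p^(n-1) (1-p)^(n+1) for n = 1..terms."""
    return sum(p**(n - 1) * (1 - p)**(n + 1) for n in range(1, terms + 1))

def one_third_closed_form(p):
    """The geometric-series result (1-p)^2 / (1 - p(1-p))."""
    return (1 - p)**2 / (1 - p * (1 - p))

p = Fraction(1, 4)
print(float(one_third_partial_sum(p, 60)), float(one_third_closed_form(p)))
print(one_third_closed_form(Fraction(1, 2)))  # fair coin: 1/3, the Lebesgue measure
```

For a fair coin the measure reduces to the length of the interval, as it should.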

It might feel like something is missing because we haven’t examined irrational numbers. Well, not really, because the technique used to do $1/3$ clearly applies to them, as we only need a binary expansion of the desired irrational. But still, this is not quite satisfactory, because the irrationals that we know and love like $1/e$ or $\frac{2+\sqrt2}4$ have a rather complicated and as far as I know patternless binary expansion, so we will not be able to get any nice formula for them. On the other hand, one can construct some silly irrationals like the binary Liouville constant
\[ \ell = \sum_{n\in\mathbb N} 2^{-n!} \approx 0.110001000000000000000001\]whose binary expansion is indeed very simple: every $n!$th binary digit is a one, and the rest are zeroes. The measure of the $[0,\ell]$ interval is then
\[ \rho([0,\ell]) = \sum_{n\in \mathbb N} \left(\frac{p}{1-p}\right)^{n-1} (1-p)^{n!} \]which I have no idea how to sum (except for the case $p=1/2$ ;)

But I feel that something different is still missing. We have constructed a probability measure over the set of coin tosses, but what I’m used to thinking of as “the probability” for uncountable sets is the probability density, and likewise I’m used to visualizing a probability measure by making a plot of its density. Maybe one can “derive” the measure $\rho$ to obtain a probability density over the set of coin tosses? After all, the density is a simple derivative for well-behaved measures, or the Radon-Nikodym derivative for more naughty ones. As it turns out, $\rho$ is too nasty for that. The only condition that a probability measure needs to satisfy in order to have a probability density is that it needs to attribute measure zero to every set of Lebesgue measure zero, and $\rho$ fails this condition. To show that, we shall construct a set $E$ such that its Lebesgue measure $\lambda(E)$ is zero, but $\rho(E)=1$.

Let $E_n$ be the set of infinite sequences that start with an $n$-bit sequence that contains at most $k$ ones[2]. Then
\[ \rho(E_n) = \sum_{i=0}^k \binom ni p^i (1-p)^{n-i} \] and
\[ \lambda(E_n) = 2^{-n} \sum_{i=0}^k \binom ni \] These formulas might look nasty if you haven’t fiddled with entropies for some time, but they actually have rather convenient bounds, which are valid for $p < k/n < 1/2$: \[ \rho(E_n) \ge 1 - 2^{-n D\left( \frac kn || p\right)} \] and \[ \lambda(E_n) \le 2^{-n D\left( \frac kn || \frac 12\right)} \] where $D(p||q)$ is the relative entropy of $p$ with respect to $q$. They show that if $k/n$ is smaller than $1/2$ then $\lambda(E_n)$ is rather small (loosely speaking, the number of sequences whose fraction of ones is strictly less than $1/2$ is rather small), and that if $k/n$ is larger than $p$ then $\rho(E_n)$ is rather close to one (so again loosely speaking, what this measure does is weight the counting of sequences towards $p$ instead of $1/2$: if $k/n$ were smaller than $p$ then $\rho(E_n)$ would also be rather small).
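To get a feeling for how quickly these two quantities separate, one can evaluate them exactly and compare them with the entropy bounds. This is only a numerical sketch: the function names are made up, $p = 1/4$ is an arbitrary choice, and $k$ is taken halfway between $np$ and $n/2$.

```python
from math import comb, floor, log2

def rho_En(n, k, p):
    """rho(E_n): probability of at most k ones in n tosses with P(1) = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def lambda_En(n, k):
    """lambda(E_n): fraction of n-bit strings with at most k ones."""
    return sum(comb(n, i) for i in range(k + 1)) / 2**n

def rel_entropy(a, b):
    """Binary relative entropy D(a||b) in bits, for 0 < a, b < 1."""
    return a * log2(a / b) + (1 - a) * log2((1 - a) / (1 - b))

p = 0.25
for n in (50, 200, 800):
    k = floor(n * (p + 0.5) / 2)   # keeps k/n strictly between p and 1/2
    rho_bound = 1 - 2 ** (-n * rel_entropy(k / n, p))
    lam_bound = 2 ** (-n * rel_entropy(k / n, 0.5))
    # each row: n, k, rho(E_n), its lower bound, lambda(E_n), its upper bound
    print(n, k, rho_En(n, k, p), rho_bound, lambda_En(n, k), lam_bound)
```

As $n$ grows, $\rho(E_n)$ approaches one while $\lambda(E_n)$ approaches zero, both exponentially fast.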

If we now fix $k/n$ in this sweet range (e.g. by setting $k = \lfloor n(p + 0.5)/2\rfloor$)[3] then
\[ E = \bigcap_{i \in \mathbb N} \bigcup_{n \ge i} E_n,\]
is the set we want, some weird kind of limit of the $E_n$. Then I claim, skipping the boring proof, that
\[ \rho(E) = 1 \]and
\[ \lambda(E) = 0 \]

But don’t panic. Even without a probability density, we can still visualize a probability measure by plotting its cumulative distribution function
\[ f(x) = \rho([0,x]) \]which for $p = 1/4$ is this cloud-like fractal:
[Figure: Cumulative distribution function of probability measure rho]
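A curve like this can be evaluated point by point by reading off the binary digits of $x$: every digit 1 of $x$ contributes the measure of the sequences that agree with $x$ up to that position and then have a 0. The following is a sketch of one way to do it, not necessarily how the original plot was produced; the function name `cdf` and the 60-digit cutoff are assumptions.

```python
def cdf(x, p, bits=60):
    """f(x) = rho([0, x]) from the first `bits` binary digits of x in [0, 1)."""
    total = 0.0
    prefix = 1.0            # rho of the set of sequences sharing x's digits so far
    for _ in range(bits):
        x *= 2
        if x >= 1:          # next digit of x is 1: sequences with a 0 here lie below x
            total += prefix * (1 - p)
            prefix *= p
            x -= 1
        else:               # next digit of x is 0
            prefix *= 1 - p
    return total

p = 0.25
print(cdf(0.375, p))                                   # 0.703125 = rho(00) + rho(010)
print(cdf(1 / 3, p), (1 - p)**2 / (1 - p * (1 - p)))   # matches the geometric series
```

Evaluating `cdf` on a fine grid of $x$ values and plotting the result gives the curve; for $p = 1/2$ it reduces to the straight line $f(x) = x$.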

Crackpots in my inbox

Often people ask me why I’m not more open-minded about ideas that defy the scientific consensus. Maybe global warming is just a conspiracy? Maybe Bell’s theorem is in fact wrong? Maybe the EmDrive does provide thrust without using propellant? Maybe the E-Cat can make cold fusion? I mean, it is not logically impossible for some outsider to be correct while the entire scientific community is wrong. Wasn’t Galileo burned at the stake (sic) for defying the scientific consensus? Why should I then dismiss this nonsense outright, without reading it through and considering it carefully?

Well, for starters the scientific method has advanced a lot since the time of Galileo. Instead of asserting dogma we are busy looking at every tiny way experiment can deviate from theory. And if you do prove the theory wrong, you do not get burned at the stake (sic), but get a Nobel Prize (like the prize given in 2015 for the discovery of neutrino oscillations). So I’m naturally very suspicious of outsiders claiming to have found glaring mistakes in the theory.

But the real problem is the sheer number of would-be Galileos incessantly spamming researchers about their revolutionary theories (despite not being exactly famous, I get to join the fun because they usually write to every academic email address they find online. I can only wonder what Stephen Hawking’s inbox looks like). It is already a lot of work to keep myself up to date with the serious papers in my field. Imagine if I also had to read every email that proved Einstein wrong?

Without further ado, I’d like to illustrate this point by showing here the most entertaining crackpots that have spammed me:

Probably the most well-known is Gabor Fekete, who has a truly amazing website to expound his theories (don’t forget to press Ctrl or click with the right button of the mouse while you’re there!). Apparently he doesn’t like the square root in the Lorentz factor, and has a nice animation showing it being erased. If you do that I guess you’ll be able to explain all of physics with eight digits accuracy. He has recently taken to spoofing his emails to make it look like they were sent by Nobel laureates, probably thinking that his theories would be accepted if they came from a famous source. While the forgery itself was well-made (one needs to look carefully at the source code of the email to detect it), the content of the email kind of gives it away. Maybe if he had spent his time studying physics instead of the SMTP protocol…

Another persistent spammer is Sorin Cosofret, who started a newsletter about his theories to unwilling subscribers. They are about classical electromagnetism, relativity, quantum mechanics, planetary dynamics, cosmology, chemistry… apparently everything is wrong, but he knows how to correct it. He also has a website, which, if not as flashy as Gabor Fekete’s, is at least available in Romanian, English, French, German, and Spanish.

A more aggressive one is stefan:sattler, who has a problem with the known laws of planetary mechanics, and wants the scientific community to help in publicising his “Sattler’s Law of planetary mechanics”. After sending 5 emails in one month he lost his patience, and gave us 48 hours to do it, threatening to publish all our names and email addresses if we don’t (you know, the names and email addresses that are publicly available). He told us

Go now and REPENT – go now and try to offer redemption for the guilt and responsibility you all have loaded upon your shoulders.

Time is ticking – you have 48 hours – the JUDGEMENTS ARE BEING WRITTEN RIGHT NOW…..

I haven’t heard from him since.

More recently, I got an email from an anonymous crackpot who maintains a prolific YouTube channel in Croatian dedicated to showing that the Earth is flat. It was entertaining to see that the crackpot sent me emails to both my University of Vienna address and to my University of Cologne address, each signed as a different person pretending to be interested in whether the videos were correct.

If you want to defy the scientific consensus, first study it for a few years. Then publish a peer-reviewed paper (reputable journals do accept some pretty outlandish stuff). Then I’ll listen to you.

My shortest research program ever

\[ t=0:00 \]

SIMPLICIO: These quantum gravity people! Always claiming that the world is fundamentally discrete! It’s so stupid!
INGENUO: Humm why is it stupid? They do have good reasons to think that.
SIMPLICIO: But come on, even the most discrete thing ever, the qubit, already needs continuous parameters to be described!
INGENUO: Well, yes, but it’s not as if you can take these parameters seriously. You can’t really access them with arbitrary precision.
SIMPLICIO: What do you mean? They are continuous! I can make any superposition between $\ket{0}$ and $\ket{1}$ that I want, there are no holes in the Bloch sphere, or some magical hand that will stop me from producing the state $\sin(1)\ket{0} + \cos(1)\ket{1}$ as precisely as I want.
INGENUO: Yeah, but even if you could do it, what’s the operational meaning of $\sin(1)\ket{0} + \cos(1)\ket{1}$? It’s not as if you can actually measure the coefficients back. The problem is that if you estimate the coefficients by sampling $n$ copies of this state the number of bits you get goes like $\frac12\,\log(n)$. And this is just hopeless. Even if you have some really bright source that produces $10^6$ photons per second and you do some black magic to keep it perfectly stable for a week, you only get something like 20 bits. So operationally speaking you might as well write
\[ 0.11010111011010101010\ket{0} + 0.10001010010100010100\ket{1}\]
SIMPLICIO: Pff, operationally. Operationally it also makes no difference whether the remains of Galileo are still inside Jupiter or not. It doesn’t mean I’m going to assume they magically disappeared. Same thing about the 21st bit. It’s there, even if you can’t measure it.
INGENUO: I would take lessons from operational arguments more seriously. You know, Einstein came up with relativity by taking seriously the idea that time is what a clock measures.
SIMPLICIO: ¬¬. So you are seriously arguing that there might be only 20 bits in a qubit.
SIMPLICIO: Come on. Talk is cheap. If you want to defend that you need to come up with a toy theory that is not immediately in contradiction with experiment where the state of a qubit is literally encoded in a finite number of bits.
INGENUO: Hmmm. I need to piss about it. (Goes to the bathroom)
\[t = 0:10\]
INGENUO: Ok, so if we have $b$ bits we can encode $2^b$ different states. And as long as $b$ is large enough and these states are more-or-less uniformly spread around the Bloch sphere we should be able to model any experiment as well as we want. So we only need to find some family of polyhedrons with $2^b$ vertices that tend to a sphere in the limit of infinite $b$ and we have the qubit part of the theory!
SIMPLICIO: Hey, not so fast! How about the transformations that you can do on these states? Surely you cannot allow unitaries that would map one of these $2^b$ states to some state not encoded in your scheme.
SIMPLICIO: So you have some set of allowed transformations that is not the set of all unitaries. And this set of allowed transformations clearly must satisfy some basic properties, like you can compose them and you do not get outside of the set, and it must always be possible to invert any of the transformations.
INGENUO: Yeah, sure. But what are you getting at?
SIMPLICIO: Well, they must form a group. A subgroup of $U(2)$, to be more precise. And since we don’t care about the global phase, make it a subgroup of $SU(2)$, for simplicity.
INGENUO: Oh. Well, we just need to check which are the subgroups of $SU(2)$, surely we’ll find something that works. (Both start reading Wikipedia.)
SIMPLICIO: Humm, so it turns out that the finite subgroups of $SO(3)$ are rather lame. You either have the Platonic solids, which are too finite, or two subgroups that can get arbitrarily large, the cyclic and the dihedral groups.
INGENUO: Argh. What are these things?
SIMPLICIO: The cyclic group is just the rotations of the sphere by some rational angle around a fixed axis, and the dihedral group is just the cyclic group together with a reflection along the same axis. So you can put your states either in the vertices of a polygon inscribed in the equator of the Bloch sphere, or in the vertices of a prism.
INGENUO: Ugh. They are not nearly as uniform as I hoped. So I guess the best one can do is put the states in the vertices of an icosahedron.
SIMPLICIO: Beautiful. So instead of 20 bits you can have 20 states. Almost there!
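For the record, INGENUO’s estimate of roughly 20 bits from a week of photons can be reproduced in a couple of lines. This sketch assumes the logarithm in $\frac12\log(n)$ is taken in base 2, which is what counting bits suggests; the variable names are illustrative.

```python
from math import log2

photons_per_second = 10**6
seconds_per_week = 7 * 24 * 60 * 60
n = photons_per_second * seconds_per_week   # copies of the state measured in one week
bits = 0.5 * log2(n)                        # assumption: the logarithm is base 2
print(n, round(bits, 1))                    # 604800000000 19.6
```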

The sleeping beauty problem: a foray into experimental metaphysics

One of the most intriguing consequences of Bell’s theorem is the idea that one can do experimental metaphysics: to take some eminently metaphysical concepts such as determinism, causality, and free will, and extract from them actual experimental predictions, which can be tested in the laboratory. The results of said tests can then be debated forever without ever deciding the original metaphysical question.

It was with such ideas in mind that I learned about the Sleeping Beauty problem, so I immediately thought: why not simply do an experimental test to solve the problem?

The setup is as follows: you are the Sleeping Beauty, and today is Sunday. I’m going to flip a coin, and hide the result from you. If the coin falls on heads, I’m going to give you a sleeping pill that will make you sleep until Monday, and terminate the experiment after you wake up. If it falls on tails instead, I’m also going to give you the pill that makes you sleep until Monday, but after your awakening I’m going to give you a second pill that erases your memory and makes you sleep until Tuesday. At each awakening I’m going to ask you: what is the probability[4] that the coin fell on tails?

There are two positions usually defended by philosophers:

  1. $p(T) = 1/2$. This is defended by Lewis and Bostrom, roughly because before going to sleep the probability was assumed to be one half (i.e. that the coin is fair), and by waking up you do not learn anything you didn’t know before, so the probability should not change.
  2. $p(T) = 2/3$. This is defended by Elga and Bostrom, roughly because the three possible awakenings (heads on Monday, tails on Monday, and tails on Tuesday) are indistinguishable from your point of view, so you should assign all of them the same probability. Since two of them have the coin fallen on tails, the probability of tails must be two-thirds.

Well, seems like the perfect question to answer experimentally, no? Give drugs to people, and ask them to bet on the coins being heads or tails. See who wins more money, and we’ll know who is right! There are, however, two problems with this experiment. The first is that it is not so easy to erase people’s memories. Hitting them hard on the head or giving them enough alcohol usually does the trick, but it doesn’t work reliably, and I don’t know where I could find volunteers that thought the experiment was worth the side effects (brain clots or a massive hangover). And, frankly, even if I did find volunteers (maybe overenthusiastic philosophy students?), these methods are just too grisly for my taste.

Luckily a colleague of mine (Marie-Christine) found an easy solution: just require people to place their bets in advance. Since they are not supposed to be able to know in which of the three awakenings they are, it makes no sense for them to bet differently in different awakenings (in fact, they should even be unable to bet differently on different awakenings without access to a random number generator. Whether they have one in their brains is another question). So if you decide to bet on heads, and then “awake” on Tuesday, too bad, you have to make the bad bet anyway.

With that solved, we get to the second problem: it is not rational to ever bet on heads. If you believe that the probability is $1/2$ you should be indifferent between heads and tails, and if you believe that the probability is $2/3$ you should definitely bet on tails. In fact, if you believe that the probability is $1/2$ but have even the slightest doubt that your reasoning is correct, you should bet on tails anyway just to be on the safe side.

This problem can be easily solved, simply by biasing the coin a bit towards heads, such that the probability of heads (if you believed in $1/2$) is now slightly above one half, while keeping the probability of tails (if you believed in $2/3$) still above one half. To calculate the exact numbers we use a neat little formula from Sebens and Carroll, which says that the probability of you being the observer labelled by $i$ within a set of observers with identical subjective experiences is
\[ p(i) = \frac{w_i}{\sum_j w_j}, \]
where $w_i$ is the Born-rule weight of your situation, and the $w_j$ are the Born-rule weights of all observers in the subjectively-indistinguishable situation.

Let’s say that the coin has a (objective, quantum, given by the Born rule) probability $p$ of falling on heads. The probability of being one of the tail observers is then simply the sum of the Born-rule weight of the Monday tail observer (which is simply $1-p$) with the Born-rule weight of the Tuesday tail observer (also $1-p$), divided by the sum of the Born-rule weights of all three observers ($1-p$, $1-p$, and $p$), so
\[ p(T) = \frac{2(1-p)}{2(1-p) + p}.\]
For elegance, let’s make this probability be equal to the objective probability of the coin falling on heads, so that both sides of the philosophical dispute will bet on their preferred solution with the same odds. Solving $p = (2 - 2p)/(2-p)$ gives us then
\[ p = 2-\sqrt{2} \approx 0.58,\]
which makes the problem quantum, and thus on topic for this blog, since it features the magical $\sqrt2$.[2]
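For those who want to see the algebra, the solution is a one-line quadratic. Multiplying both sides of $p = (2-2p)/(2-p)$ by $2-p$ gives
\[ p(2-p) = 2-2p \quad\Longleftrightarrow\quad p^2 - 4p + 2 = 0 \quad\Longleftrightarrow\quad p = 2 \pm \sqrt2, \]
and only the root $p = 2-\sqrt2$ lies between 0 and 1. Plugging it back in confirms the fixed point:
\[ p(T) = \frac{2(1-p)}{2-p} = \frac{2(\sqrt2-1)}{\sqrt2} = 2-\sqrt2 = p. \]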

With all this in hand, time to do the experiment. I gathered 17 impatient hungry physicists in a room, and after explaining all of this to them, I asked them to bet on either heads or tails. The deal was that the bet was a commitment to buy, in each awakening, a ticket that would pay them 1€ in case they were right. Since the betting odds were set to be $0.58$, the price for each ticket was 0.58€.

After each physicist committed to a bet, I ran my biased quantum random number generator (actually just the function rand from Octave with the correct weighting), and cashed the bets (once when the result was heads, twice when the result was tails).

There were four possible situations: if the person bet on tails and the result was tails, they paid me 1.16€ for the tickets and got 2€ back, netting 0.84€ (this happened 4 times). If the person bet on heads and the result was tails, they paid me 1.16€ again, but got nothing back, netting -1.16€ (this happened 2 times). If the person bet on tails and the result was heads, they paid me 0.58€ for the ticket and got nothing back, netting -0.58€ (this happened 4 times). Finally, if the person bet on heads and the result was heads, they paid 0.58€ for the ticket and got 1€ back, netting 0.42€ (this happened once).
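The four net payoffs follow mechanically from the rules: one ticket per awakening, 0.58€ per ticket, 1€ per winning ticket. A minimal sketch, with the function and constant names chosen here for illustration:

```python
PRICE = 0.58   # price of one ticket in euros, as set in the text
PAYOUT = 1.00  # a winning ticket pays 1 euro

def net_payoff(bet, result):
    """Net gain in euros for one participant, given their bet and the coin result."""
    awakenings = 2 if result == "tails" else 1   # a ticket is bought at every awakening
    winnings = PAYOUT * awakenings if bet == result else 0.0
    return round(winnings - PRICE * awakenings, 2)

for bet in ("tails", "heads"):
    for result in ("tails", "heads"):
        print(bet, result, net_payoff(bet, result))
```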

So on average the people who bet on tails profited 0.13€, while the people who bet on heads lost 0.61€. The prediction of the $2/3$ theory was that they should profit nothing when betting on tails, and lose 0.16€ when betting on heads. The prediction of the $1/2$ theory was the converse: who bets on tails loses 0.16€, while who bets on heads breaks even. In the end the match was not that good, but still the data clearly favours the $2/3$ theory. Once again, physics comes to the rescue of philosophy, solving experimentally a long-standing metaphysical problem!

Speaking more seriously, of course the philosophers knew, since the first paper on the subject, that the experimental results would be like this, and that is why nobody bothered to do the experiment. They just thought that this was not a decisive argument, as the results are determined by how you operationalise the Sleeping Beauty problem, and the question was always about what is the correct operationalisation (or, in other words, what probability is supposed to be). Me, I think that whatever probability is, it should be something with a clear operational meaning. And since I don’t know any natural operationalisation that will give the $1/2$ answer, I’m happy with the $2/3$ theory.

Understanding Bell’s theorem part 3: the Many-Worlds version

This post is based on discussions with Harvey Brown, Eric Cavalcanti, and Nathan Walk. At least one of them peacefully disagrees with everything written here.

After going through two versions of Bell’s theorem, one might hope to be done with it. Well, this was the situation in 1975, and judging by the huge amount of literature produced since then about Bell’s theorem, I think it is clear that the scientific community is far from being done with it. Why is that so? One reason is that many people really don’t want to give up any of the assumptions behind the simple version of Bell’s theorem: they are used to classical mechanics, which offers them a world with determinism and no action at a distance, and they want to keep it that way. But if you ask the quantum community more specifically, they do not lose any sleep over the simple version: they are happy to give up determinism and keep no action at a distance. Instead, the real thorn in their side is the failure of local causality. It is after all a well-motivated locality assumption that, even if it is not demanded by relativity, seems to be a plausible extrapolation from it. Furthermore, the failure of local causality is not even a brute experimental fact that people must just accept and be done with. To see that your probabilities have changed as a result of a measurement done in a space-like separated region you need to know the result of said measurement. And then it is not space-like separated anymore, it has moved to your past light cone.

But this is just an abstract complaint about the theorem, that doesn’t suggest any obvious solution. A more concrete problem, which is much easier to address, is that both the simple and the nonlocal versions blissfully ignore the Many-Worlds interpretation. Even if you don’t find this interpretation compelling, it is taken seriously by a big part of the scientific community, and I don’t think it is defensible to simply ignore it when discussing the foundations of quantum mechanics.

So how do we reformulate Bell’s theorem to take the Many-Worlds interpretation into account? On this point the literature is rather disappointing, as nobody seems to have tried to do that. The papers I know either exclude the Many-Worlds interpretation via an explicit assumption, or simply note that Bell’s theorem does not apply to it, as the derivation implicitly assumes that measurements have a single outcome. This is true, but rather unsatisfactory. Should we conclude then that Bell’s theorem is just a mistake? And how about local causality, is it violated or not? And how about quantum key distribution, does it work at all, or do we need to change cryptosystems if we believe that Many-Worlds is true?

Let us start by examining local causality, or more precisely one of the equations we used in the derivation:
\[ p(a|bxy\lambda) = p(a|x\lambda) \]
this says that the probability of Alice obtaining outcome $a$ depends only on her setting $x$ and the physical state $\lambda$, and not on Bob’s setting $y$ or his outcome $b$. We immediately have a problem: what can “Bob’s outcome $b$” possibly mean in the Many-Worlds interpretation? After all if Alice and Bob share an entangled state $\frac{\ket{00}+\ket{11}}{\sqrt2}$, then before Bob’s measurement their joint state is
\[ \ket{\text{Alice}}\frac{\ket{00}+\ket{11}}{\sqrt2}\ket{\text{Bob}} \]
which, after his measurement, becomes
\[ \ket{\text{Alice}}\frac{\ket{00\text{Bob}_0}+\ket{11\text{Bob}_1}}{\sqrt2}\]
So there is no such thing as “Bob’s outcome”. There are two copies of Bob, each seeing a different outcome. Maybe we can then use $b=\frac{\ket{00\text{Bob}_0}+\ket{11\text{Bob}_1}}{\sqrt2}$ in that equation, instead of $b=0$ or $b=1$. Does it work then? Well, there is still the problem that the equation is about the “probability of Alice obtaining outcome $a$”. But we know that there is also no such thing as “Alice’s outcome”: there will be two copies of Alice, each seeing a different outcome. So from a third-person perspective it makes no sense to talk about the “probability of Alice obtaining outcome $a$”. On the other hand, from Alice’s perspective she will experience a single outcome (if you experience more than one outcome, I want to know what you are smoking), so we can talk about probabilities in a first-person, decision-theoretic way. The equation is then about how much Alice should bet on experiencing outcome $a$, or more precisely the maximum she should pay for a ticket that gives her 1€ if the outcome she experiences is $a$.

So, how much should she? Well, the right hand side $p(a|x\lambda)$ is easy to decide: she only knows that she is making a measurement on one half of an entangled state, whose reduced density matrix is $\mathbb{1}/2$. Her probabilities are $1/2$, independently of the basis in which she measures. How about the left hand side of the equation, $p(a|bxy\lambda)$? Well, now she knows in addition that Bob is in the state $\frac{\ket{00\text{Bob}_0}+\ket{11\text{Bob}_1}}{\sqrt2}$ (we are assuming for simplicity that he measured in the Z basis). So what? How does that help her in predicting which outcome she will experience? This state has no bias towards 0 or 1, and there is no more information, outside her future light cone, that could help her make the prediction. This is no surprise, as in the Many-Worlds interpretation whatever Bob does is assumed to be a unitary, and unitaries applied to one half of an entangled state cannot affect the probabilities of measurements on the other half. For Bob’s measurement to affect Alice in any way it would have to cause a collapse of the wave function, and this is precisely what the Many-Worlds interpretation says does not happen. We must therefore conclude that $p(a|bxy\lambda) = 1/2$ and that this bastardised version of local causality is respected.
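The claim that a unitary on Bob’s side cannot change Alice’s probabilities is easy to check numerically. The snippet below is only a minimal sketch under my own modelling assumptions: Bob is represented by a single memory qubit, his Z-basis measurement by a CNOT onto that memory, and the names `measure_bob` and `alice_reduced_state` are made up for the illustration.

```python
import numpy as np

# Qubit order: Alice's qubit, Bob's qubit, Bob's memory.
# Assumption: one memory qubit is a stand-in for the whole observer "Bob".
ket0 = np.array([1.0, 0.0])
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)
state_before = np.kron(phi_plus, ket0)  # memory starts in a "ready" state |0>

# Bob's Z-basis measurement, modelled as a CNOT from his qubit onto his memory.
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
measure_bob = np.kron(np.eye(2), cnot)  # acts trivially on Alice's qubit
state_after = measure_bob @ state_before

def alice_reduced_state(state):
    """Trace out Bob's qubit and memory, leaving Alice's 2x2 density matrix."""
    psi = state.reshape(2, 4)  # rows: Alice's qubit, columns: everything else
    return psi @ psi.conj().T

rho_before = alice_reduced_state(state_before)
rho_after = alice_reduced_state(state_after)
print(rho_before)
print(rho_after)
```

Both reduced states come out as $\mathbb{1}/2$, so in this toy model Alice’s betting odds are the same before and after Bob’s measurement.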

Does this imply that Bell inequalities are not violated in the Many-Worlds interpretation? Of course not! To derive them we needed the version of local causality where Bob had a single outcome. Can we still use it in some way? Well, Bob does obtain a single outcome from Alice’s point of view after they interact in the future (and become decohered with respect to each other), so then (and only then) we can talk about the joint probabilities $p(ab|xy)$. As eloquently put by Brown and Timpson:

We can only think of the correlations between measurement outcomes on the two sides of the experiment actually obtaining in the overlap of the future light-cones of the measurement events—they do not obtain before then and—a fortiori—they do not obtain instantaneously.

But at this point in time the assumption of local causality becomes ill-motivated: Bob’s measurement is now in Alice’s past light-cone, and it is perfectly legit for her probabilities to depend on it. The information from it had, after all, to sluggishly crawl across the intervening space in order to influence her.

So the nonlocal version of Bell’s theorem simply falls apart in the Many-Worlds interpretation. Can we still derive some version of Bell’s theorem from well-motivated assumptions, or do we need to give up and say that it simply doesn’t make sense? Well, I wouldn’t be writing this post if I didn’t have a solution.

To do it, we start by formalising the version of local causality presented above. It says that Alice’s probability of experiencing outcome $a$ depends only on stuff in her past light-cone $\Lambda,$ and not on anything else in the entire region $\Gamma$ outside her future light-cone.

  • Generalised local causality:  $p(a|\Gamma) = p(a|\Lambda)$.

Note that we had to condition on the entire region $\Gamma$ instead of only on Bob’s lab because the state $\frac{\ket{00\text{Bob}_0}+\ket{11\text{Bob}_1}}{\sqrt2}$ is defined in the former region, not in the latter.

I think it is fair to call this generalised local causality because it reduces to local causality if one assumes that Bob’s measurement had a single outcome, via some sort of wavefunction collapse. Note also that in the Many-Worlds interpretation generalised local causality is essentially the same thing as no action at a distance. This is because Many-Worlds is a deterministic theory (not in the sense that the outcome of the measurement is predictable, but in the sense that the post-measurement state is uniquely determined by the pre-measurement state), and therefore conditioning on the post-measurement state doesn’t bring us any additional information. This is not really a surprise, since local causality also reduces to no action at a distance for deterministic theories.

This brings us to the second assumption needed to derive the Many-Worlds version of Bell’s theorem. Since we have now some sort of no action at a distance, one might expect some sort of determinism to do the job and complete the derivation. This is indeed the case, but the terminology here becomes unfortunately confusing, because as explained above Many-Worlds is a deterministic theory, but not in the sense demanded by determinism. The assumption we need is predictability, i.e., that an observer with access to the physical state $\lambda$ can predict the measurement outcomes[3]. As wittily put by Howard Wiseman, determinism means that “God does not play dice”, and predictability means that “God does not let us play dice”. Putting it in a more boring way, we simply write

  • Predictability:  $p(ab|xy\lambda) \in \{0,1\}$.

Using then predictability together with generalised local causality we can again prove Bell’s theorem, following the same steps we did for the simple version. The interesting thing is that while generalised local causality is always true, there are some situations where predictability holds and some where it doesn’t, and a violation of a Bell inequality implies that it does not hold.

I think it is instructive to consider some concrete examples to see how this works. The simplest case is where Alice and Bob share a pure product state and Eve knows it. For example their joint state could be
\[ \ket{\text{Alice}}\ket{00}\ket{\text{Bob}}\ket{\text{Eve}^{00}} \]
In this case it is clear that Eve can predict the result of their measurements (in the computational basis) and that therefore they cannot violate any Bell inequality. A slightly less simple case is where they all start in this same state, but Alice and Bob do a measurement in the superposition basis. Now Eve can not predict the result of the measurement, but Alice and Bob still cannot violate a Bell inequality. This is ok, because predictability is a sufficient condition for a Bell inequality to hold, not a necessary one.

A more interesting case is where Alice and Bob share a maximally entangled state and Eve again knows it:
\[ \ket{\text{Alice}}\frac{\ket{00}+\ket{11}}{\sqrt2}\ket{\text{Bob}}\ket{\text{Eve}^{\phi^+}} \]
Eve’s knowledge doesn’t help her predict the outcome of the measurement, because there is no outcome of the measurement to be predicted. Both outcomes will happen, and eventually decoherence will make two copies of Eve, one in the 00 branch and another in the 11 branch. Both ignorant of which branch they are in. In this case Alice and Bob will violate a Bell inequality, and correctly conclude that Eve couldn’t have possibly predicted their outcomes.

The most interesting case is where Alice and Bob share the mixed state $\frac12\ket{00}\bra{00} + \frac12\ket{11}\bra{11}$ and Eve holds its purification. Their joint state is
\[ \ket{\text{Alice}}\frac{\ket{00}\ket{\text{Eve}^{0}}+\ket{11}\ket{\text{Eve}^{1}}}{\sqrt2}\ket{\text{Bob}}\]
and it is clear that both copies of Eve, $\ket{\text{Eve}^{0}}$ and $\ket{\text{Eve}^{1}}$ can predict the result of Alice and Bob’s measurements, and that they cannot violate any Bell inequality. Note that this state represents the case where Eve does her measurement before Alice and Bob. She could also make it after them, it makes no difference.
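One can also check numerically that this mixed state is useless for violating the CHSH inequality. The sketch below is an illustration, not a proof: it assumes the CHSH game from the simple version (win when $a\oplus b = xy$), restricts both parties to $\pm1$-valued observables in the X-Z plane, and only scans a coarse grid of angles. It uses the fact that $p(a\oplus b = 0|xy) = (1+E_{xy})/2$ for the correlator $E_{xy} = \operatorname{tr}(\rho\, A_x\otimes B_y)$, so that $p_\text{succ} = \frac12 + \frac18(E_{00}+E_{01}+E_{10}-E_{11})$. The function names are mine.

```python
import numpy as np
from itertools import product

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

# Alice and Bob's reduced state: (|00><00| + |11><11|)/2.
rho = np.diag([0.5, 0.0, 0.0, 0.5])

def observable(theta):
    """A +/-1-valued qubit observable in the X-Z plane (an assumed restriction)."""
    return np.cos(theta) * Z + np.sin(theta) * X

def p_succ(alpha0, alpha1, beta0, beta1):
    """CHSH winning probability computed from the four correlators E[x][y]."""
    A = [observable(alpha0), observable(alpha1)]
    B = [observable(beta0), observable(beta1)]
    E = [[np.trace(rho @ np.kron(A[x], B[y])) for y in (0, 1)] for x in (0, 1)]
    return 0.5 + (E[0][0] + E[0][1] + E[1][0] - E[1][1]) / 8

# Coarse grid over measurement angles; it includes theta = 0 (measuring Z).
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
best = max(p_succ(*thetas) for thetas in product(angles, repeat=4))
print(best)
```

On this grid the best value is the local bound $3/4$, reached when everybody simply measures in the computational basis, which is exactly the strategy that Eve can predict.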

This concludes the Many-Worlds version of Bell’s theorem, and my series of posts about it. I hope that they helped clear up some of the misunderstandings about it, and that even if you disagree with my conclusions, you would agree that I’m asking the right questions. I’d like to finish with a quotation from the man himself:

The “many world interpretation” seems to me an extravagant, and above all an extravagantly vague, hypothesis. I could almost dismiss it as silly. And yet… It may have something distinctive to say in connection with the “Einstein-Podolsky-Rosen puzzle,” and it would be worthwhile, I think, to formulate some precise version of it to see if this is really so.

Understanding Bell’s theorem part 2: the nonlocal version

Continuing the series on Bell’s theorem, I will now write about its most popular version, the one that people have in mind when they talk about quantum nonlocality: the version that Bell proved in his 1975 paper The theory of local beables.

But first things first: why do we even need another version of the theorem? Is there anything wrong with the simple version? Well, a problem that Bohmians have with it is that its conclusion is heavily slanted against their theory: quantum mechanics clearly respects no conspiracy and no action at a distance, but clearly does not respect determinism, so the most natural interpretation of the theorem is that trying to make quantum mechanics deterministic is a bad idea. The price you have to pay is having action at a distance in your theory, as Bohmian mechanics has. Because of this the Bohmians prefer to talk about another version of the theorem, that lends some support to the idea that the world is in some sense nonlocal.

There is also legitimate criticism to be made against the simple version of Bell’s theorem: namely that the assumption of determinism is too strong. This is easy to see, as we can cook up indeterministic correlations that are even weaker than the deterministic ones: if Alice and Bob play the CHSH game randomly they achieve $p_\text{succ} = 1/2$, well below the bound of $3/4$. This implies that just giving up on determinism does not allow you to violate Bell inequalities. You need to lose something more precious than that. What exactly?
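To see where the $1/2$ comes from, recall the CHSH game from the simple version: Alice and Bob win if $a\oplus b = xy$, with the inputs chosen uniformly. If they output uniformly random bits regardless of their inputs, then $p(ab|xy) = 1/4$ for all $a,b,x,y$, and for each of the four input pairs exactly two of the four output pairs win, so
\[ p_\text{succ} = \frac14\sum_{xy}\sum_{ab}\delta_{a\oplus b,xy}\,\frac14 = \frac14\cdot 4\cdot 2\cdot\frac14 = \frac12 \]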

The first attempt to answer this question was made by Clauser and Horne in 1974. Their proof goes like this: from no conspiracy, the probabilities decompose as
\[ p(ab|xy) = \sum_\lambda p(\lambda)p(ab|xy\lambda) \]
Then, they introduce their new assumption

  • Factorisability:   $p(ab|xy\lambda) = p(a|x\lambda)p(b|y\lambda)$.

which makes the probabilities reduce to
\[ p(ab|xy) = \sum_\lambda p(\lambda)p(a|x\lambda)p(b|y\lambda) \]
Noting that for any coefficients $M^{ab}_{xy}$ the Bell expression
\[ p_\text{succ} = \sum_{abxy} \sum_\lambda M^{ab}_{xy} p(\lambda)p(a|x\lambda)p(b|y\lambda) \]
is upperbounded by deterministic probability distributions $p(a|x\lambda)$ and $p(b|y\lambda)$, the rest of the proof of the simple version of Bell’s theorem applies, and we’re done.
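The claim that factorised probabilities cannot beat the deterministic bound can be probed numerically. The following is a small sanity check, not a proof: for a fixed $\lambda$ it samples random local response probabilities and evaluates the CHSH game of the simple version (win when $a\oplus b = xy$). The encoding of a strategy as "probability of outputting 1" and the name `chsh_success` are my own choices.

```python
import itertools
import random

def chsh_success(p_alice, p_bob):
    """p_succ for a factorised strategy at a fixed lambda.

    p_alice[x] is the probability that Alice outputs 1 on setting x,
    p_bob[y] is the same for Bob. Winning condition: a XOR b == x AND y.
    """
    total = 0.0
    for x, y, a, b in itertools.product((0, 1), repeat=4):
        if (a ^ b) == (x & y):
            pa = p_alice[x] if a == 1 else 1 - p_alice[x]
            pb = p_bob[y] if b == 1 else 1 - p_bob[y]
            total += 0.25 * pa * pb  # inputs are uniformly distributed
    return total

rng = random.Random(0)  # fixed seed so the run is reproducible
best_random = max(
    chsh_success([rng.random(), rng.random()], [rng.random(), rng.random()])
    for _ in range(10_000)
)
print(best_random)
```

None of the sampled strategies exceeds $3/4$, as expected: $p_\text{succ}$ is linear in each of $p(a|x\lambda)$ and $p(b|y\lambda)$ separately, so its maximum is attained at the extreme points, which are the deterministic distributions.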

So they can prove Bell’s theorem only from the assumptions of no conspiracy and factorisability, without assuming determinism. The problem is how to motivate factorisability. It is not a simple and intuitive condition like determinism or no action at a distance, that my mum understands, but some weird technical stuff. Why would she care about probabilities factorising?

The justification that Clauser and Horne give is just that factorisability

…is a natural expression of a field-theoretical point of view, which in turn is an extrapolation from the common-sense view that there is no action at a distance.

What are they talking about? Certainly not about quantum fields, which do not factorise. Maybe about classical fields? But only those without correlations, because otherwise they don’t factorise either! Or are they thinking about deterministic fields? But then there would be no improvement with respect to the simple version of the theorem! And anyway why do they claim that it is an extrapolation of no action at a distance? They don’t have a derivation to be able to claim such a thing! It is hard for me to understand how anyone could have taken this assumption seriously. If I were allowed to just take some arbitrary technical condition as an assumption I could prove anything I wanted.

Luckily this unsatisfactory situation only lasted one year, as in 1975 Bell managed to find a proper motivation for factorisability, deriving it from his notion of local causality. Informally, it says that causes are close to their effects (my mum is fine with that). A bit more formally, it says that probabilities of events in a spacetime region $A$ depend only on stuff in its past light cone $\Lambda$, and not on stuff in a space-like separated region $B$ (my mum is not so fine with that). So we have

  • Local causality:   $p(A|\Lambda,B) = p(A|\Lambda)$.


How do we derive factorisability from that? Start by applying Bayes’ rule
\[p(ab|xy\lambda) = p(a|bxy\lambda)p(b|xy\lambda)\]
and consider Alice’s probability $p(a|bxy\lambda)$: obtaining an outcome $a$ certainly counts as an event in $A$, and Alice’s setting $x$ and the physical state $\lambda$ certainly count as stuff in $\Lambda$. On the other hand, $b$ and $y$ are clearly stuff in $B$. So we have
\[ p(a|bxy\lambda) = p(a|x\lambda) \]
Doing the analogous reasoning for Bob’s probability $p(b|xy\lambda)$ (and swapping $A$ with $B$ in the definition of local causality) we have
\[ p(b|xy\lambda) = p(b|y\lambda) \]
and substituting this back we get
\[p(ab|xy\lambda) = p(a|x\lambda)p(b|y\lambda)\]
which is just factorisability.

So there we have it, a perfectly fine derivation of Bell’s theorem, using only two simple and well-motivated assumptions: no conspiracy and local causality. There is no need for the technical assumption of factorisability. Because of this it annoys me to no end when people implicitly conflate factorisability and local causality, or even explicitly state that they are equivalent.

Is there any other way of motivating factorisability, or are we stuck with local causality? A popular way to do it nowadays is through Reichenbach’s principle, which states that if two events A and B are correlated, then either A influences B, B influences A, or there is a common cause C such that
\[ p(AB|C) = p(A|C)p(B|C)\]
It is easy to see that this directly implies factorisability for the Bell scenario.

It is often said that Reichenbach’s principle embodies the idea that correlations cry out for explanations. This is bollocks. It demands the explanation to have a very specific form, namely the factorised one. Why? Why doesn’t an entangled state, for example, count as a valid explanation? If you ask an experimentalist that just did a Bell test, I don’t think she (more precisely Marissa Giustina) will tell you that the correlations came out of nowhere. I bet she will tell you that the correlations are there because she spent years in a cold, damp, dusty basement without phone reception working on the source and the detectors to produce them. Furthermore, the idea that “if the probabilities factorise, you have found the explanation for the correlation” does not actually work.

I think the correct way to deal with Bell correlations is not to throw your hands in the air and claim that they cannot be explained, but to develop a quantum Reichenbach principle to tell which correlations have a quantum explanation and which not. This is currently a hot research topic.

But leaving those grandiose claims aside, is there a good motivation for Reichenbach’s principle? I don’t think so. Reichenbach himself motivated his principle from considerations about entropy and the arrow of time, which simply do not apply to a simple quantum state of two qubits. There may be another motivation other than his original one, but I don’t know of any.

To conclude, as far as I know local causality is really the only way to motivate factorisability. If you don’t like the simple version of Bell’s theorem, you are pretty much stuck with the nonlocal version. But does it also have its problems? Well, the sociological one is its name, which leads to the undying idea in the popular culture that quantum mechanics allows for faster than light signalling or even travelling. But the real one is that it doesn’t allow you to do quantum key distribution based on Bell’s theorem (note that the usual quantum key distribution is based on quantum mechanics itself, and only uses Bell’s theorem as a source of inspiration).

If you use the simple version of Bell’s theorem and believe in no action at a distance, a violation of a Bell inequality implies not only that your outcomes are correlated with Bob’s, but also that they are in principle unpredictable, so you managed to share a secret key with him, which you can use for example for a one-time pad (which raises the question of why don’t Bohmians march in the street against funding for research in QKD). But if you use the nonlocal version of Bell’s theorem and violate a Bell inequality, you only find out that your outcomes are not locally causal – they can still be deterministic and nonlocal.[2]

Update: Rewrote the paragraph about QKD.

Understanding Bell’s theorem part 1: the simple version

To continue with the series of “public service” posts, I will write the presentation of Bell’s theorem that I would like to have read when I was learning it. My reaction at the time was, I believe, similar to most students’: what the fuck am I reading? And my attempts to search the literature to understand what was going on only made my bewilderment worse, as the papers disagree about what are the assumptions in Bell’s theorem, what are the names of the assumptions, what is the conclusion we should take from Bell’s theorem, and even what Bell’s theorem even is! Given this widespread confusion, it is no wonder that so many crackpots obsess about it!

This is the first of a series of three posts about several versions of Bell’s theorem. I’m starting with what I believe is by consensus the simplest version: the one proved by Clauser, Horne, Shimony, and Holt in 1969, based on Bell’s original version from 1964.

The theorem is about explaining the statistics observed by two experimenters, Alice and Bob, that are making measurements on some physical system in a space-like separated way. The details of their experiment are not important for the theorem (of course, they are important for actually doing the experiment). What is important is that each experimenter has two possible settings, named 0 and 1, and for each setting the measurement has two possible outcomes, again named 0 and 1.

Of course it is not actually possible to have only two settings in a real experiment: usually the measurement depends on a continuous parameter, like the angle with which you set a wave plate, or the phase of the laser with which you hit an ion, and you are only able to set this continuous parameter with finite precision. But this is not a problem, as we only need to define in advance that “this angle corresponds to setting 0” and “this angle corresponds to setting 1”. If the angles are not a good approximation to the ideal settings you are just going to get bad statistics.

Analogously, it is also not actually possible to have only two outcomes for each measurement, most commonly because you lost a photon and no detector clicked, but also because you can have multiple detections, or you might be doing a measurement on a continuous variable, like position. Again, the important thing is that you define in advance which outcomes correspond to the 0 outcome, and which outcomes correspond to the 1 outcome. Indeed, this is exactly what was done in the recent loophole-free Bell tests: they defined the no-detection outcome to correspond to the outcome 1.

Having their settings and outcomes defined like this, our experimenters measure some conditional probabilities $p(ab|xy)$, where $a,b$ are Alice and Bob’s outcomes, and $x,y$ are their settings. Now they want to explain these correlations. How did they come about? Well, they obtained them by measuring some physical system $\lambda$ (that can be a quantum state, or something more exotic like a Bohmian corpuscle) that they did not have complete control over, so it is reasonable to write the probabilities as arising from an averaging over different values of $\lambda$. So they decompose the probabilities as
\[ p(ab|xy) = \sum_\lambda p(\lambda|xy)p(ab|xy\lambda) \]
Note that this is not an assumption, just a mathematical identity. If you are an experimental superhero and can really make your source emit the same quantum state in every single round of the experiment you just get a trivial decomposition with a single $\lambda$ (incidentally, by Caratheodory’s theorem one needs only 13 different $\lambda$s to write this decomposition, so the use of integrals over $\lambda$ in some proofs of Bell’s theorem is rather overkill).

The first assumption that we use in the proof is that the physical system $\lambda$ is not correlated with the settings $x$ and $y$, that is $p(\lambda|xy) = p(\lambda)$. I think this assumption is necessary to even do science, because if it were not possible to probe a physical system independently of its state, we couldn’t hope to be able to learn what its actual state is. It would be like trying to find a correlation between smoking and cancer when your sample of patients is chosen by a tobacco company. This assumption is variously called “freedom of choice”, “no superdeterminism”, or “no conspiracy”. I think “freedom of choice” is a really bad name, as in actual experiments nobody chooses the settings: instead they are determined by a quantum random number generator or by the bit string of “Doctor Who”. As for “no superdeterminism”, I think the name is rather confusing, as the assumption has nothing to do with determinism: it is possible to respect it in a deterministic theory, and it is possible to violate it in an indeterministic theory. Instead I’ll go with “no conspiracy”:

  • No conspiracy:   $p(\lambda|xy) = p(\lambda)$.

With this assumption the decomposition of the probabilities simplifies to
\[ p(ab|xy) = \sum_\lambda p(\lambda)p(ab|xy\lambda) \]

The second assumption that we’ll use is that the outcomes $a$ and $b$ are deterministic functions of the settings $x$ and $y$ and the physical system $\lambda$. This assumption is motivated by the age-old idea that the indeterminism we see in quantum mechanics is only a result of our ignorance about the physical system we are measuring, and that as soon as we have a complete specification of it (given by $\lambda$) the probabilities would disappear from consideration and a deterministic theory would be recovered. This assumption is often called “realism”. I find this name incredibly stupid. Are the authors that use it really saying that they cannot conceive of an objective reality that is not deterministic? And that such a complex concept as realism reduces to mere determinism? And furthermore they are blissfully ignoring the existence of collapse models, which are realistic but fundamentally indeterministic. As far as I know the name realism was coined by Bernard d’Espagnat in a Scientific American article from 1979, and since then it caught on. Maybe people liked it because Einstein, Podolsky and Rosen defended that a deterministic quantity is for sure real (but they did not claim that indeterministic quantities are not real), I don’t know. But I refuse to use it, I’ll go with the very straightforward and neutral name “determinism”.

  • Determinism:   $p(ab|xy\lambda) \in \{0,1\}$.

An immediate consequence of this assumption is that $p(ab|xy\lambda) = p(a|xy\lambda)p(b|xy\lambda)$ and therefore that the decomposition of $p(ab|xy)$ becomes
\[ p(ab|xy) = \sum_\lambda p(\lambda)p(a|xy\lambda)p(b|xy\lambda) \]

The last assumption we’ll need is that the probabilities that Alice sees do not depend on which setting Bob used for his measurement, i.e., that $p(a|xy\lambda) = p(a|x\lambda)$. The motivation for it is that since the measurements are made in a space-like separated way, a signal would have to travel from Bob’s lab to Alice’s faster than light in order to influence her result. Relativity does not like it, but does not outright forbid it either, if you are ok with having a preferred reference frame (I’m not). Even before the discovery of relativity Newton already found such action at a distance rather distasteful:

It is inconceivable that inanimate Matter should, without the Mediation of something else, which is not material, operate upon, and affect other matter without mutual Contact… That Gravity should be innate, inherent and essential to Matter, so that one body may act upon another at a distance thro’ a Vacuum, without the Mediation of any thing else, by and through which their Action and Force may be conveyed from one to another, is to me so great an Absurdity that I believe no Man who has in philosophical Matters a competent Faculty of thinking can ever fall into it.

Without using such eloquence, my own worry is that giving up on this would put into question how we can ever isolate a system in order to do measurements on it whose result does not depend on the state of the rest of the universe.

This assumption was called in the literature “locality”, “no signalling”, and “no action at a distance”. My only beef with “locality” is that this word is overused, so nobody really knows what it means; “no signalling”, on the other hand is just bad, as the best example we have of a theory that violates this assumption — Bohmian mechanics — does not actually let us signal with it. I’ll go again for the more neutral word and stick with “no action at a distance”.

  • No action at a distance:   $p(a|xy\lambda) = p(a|x\lambda)$ and $p(b|xy\lambda) = p(b|y\lambda)$.

With this assumption we have the final decomposition of the conditional probabilities as
\[ p(ab|xy) = \sum_\lambda p(\lambda)p(a|x\lambda)p(b|y\lambda) \]
This is what we need to prove a Bell inequality. Consider the sum of probabilities
\[ \begin{aligned} p_\text{succ} = \frac14\Big(&p(00|00) + p(11|00) + p(00|01) + p(11|01) \\ +\,&p(00|10) + p(11|10) + p(01|11) + p(10|11)\Big) \end{aligned} \]
This can be interpreted as the probability of success in a game where Alice and Bob receive inputs $x$ and $y$ from a referee, and must return equal outputs if the inputs are 00, 01, or 10, and must return different outputs if the inputs are 11.

We want to prove an upper bound to $p_\text{succ}$ from the decomposition of the conditional probabilities derived above. First we rewrite it as
\[ p_\text{succ} = \sum_{abxy} M^{ab}_{xy} p(ab|xy) = \sum_{abxy} \sum_\lambda M^{ab}_{xy} p(\lambda)p(a|x\lambda)p(b|y\lambda) \]
where $M^{ab}_{xy} = \frac14\delta_{a\oplus b,xy}$ are the coefficients defined by the above sum of probabilities. Note now that
\[ p_\text{succ} \le \max_\lambda \sum_{abxy} M^{ab}_{xy} p(a|x\lambda)p(b|y\lambda) \]
as the convex combination over $\lambda$ can only reduce the value of $p_\text{succ}$. And since the functions $p(a|x\lambda)$ and $p(b|y\lambda)$ are assumed to be deterministic, there can only be a finite number of them (in fact 4 different functions for Alice and 4 for Bob), so we can do the maximization over $\lambda$ simply by trying all 16 possibilities. Doing that, we see that
\[p_\text{succ} \le \frac34\]
for theories that obey no conspiracy, determinism, and no action at a distance. This is the famous CHSH inequality.
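The "trying all 16 possibilities" step is short enough to do by machine. Here is a minimal sketch; the encoding of a deterministic function of one bit as a tuple (output for input 0, output for input 1) is my own convention.

```python
from itertools import product

# A deterministic strategy is a pair of functions x -> a and y -> b.
# Each function of one bit is encoded as (output for input 0, output for input 1).
functions = list(product((0, 1), repeat=2))  # the 4 functions of one bit

def p_succ(alice, bob):
    """Fraction of the 4 equally likely input pairs where a XOR b equals x AND y."""
    wins = sum((alice[x] ^ bob[y]) == (x & y) for x, y in product((0, 1), repeat=2))
    return wins / 4

values = {(alice, bob): p_succ(alice, bob) for alice in functions for bob in functions}
best = max(values.values())
print(best, sorted(set(values.values())))
```

Every one of the 16 strategies wins on either one or three of the four input pairs, so the best a deterministic strategy can do is $3/4$, for instance by always answering 0.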

On the other hand, according to quantum mechanics it is possible to obtain
\[p_\text{succ} = \frac{2 + \sqrt2}{4}\]
and a violation of the bound $3/4$ was observed experimentally, so at least one of the three assumptions behind the theorem must be false. Which one?
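For the curious, the quantum value can be reproduced with a few lines of numpy. This is a sketch under a specific choice that is not spelled out in the post: the state $\frac{\ket{00}+\ket{11}}{\sqrt2}$, Alice measuring $Z$ and $X$, and Bob measuring $(Z+X)/\sqrt2$ and $(Z-X)/\sqrt2$, with outcome 0 assigned to eigenvalue $+1$ and outcome 1 to eigenvalue $-1$.

```python
import numpy as np
from itertools import product

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # (|00> + |11>)/sqrt(2)

# Assumed measurement choice: one standard set of observables for this state.
alice_obs = [Z, X]
bob_obs = [(Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)]

def projector(observable, outcome):
    """Projector onto eigenvalue +1 (outcome 0) or -1 (outcome 1)."""
    return (I2 + (-1) ** outcome * observable) / 2

def prob(a, b, x, y):
    """Born-rule probability p(ab|xy) on the shared state."""
    op = np.kron(projector(alice_obs[x], a), projector(bob_obs[y], b))
    return float(phi_plus @ op @ phi_plus)

p_succ = sum(
    0.25 * prob(a, b, x, y)
    for a, b, x, y in product((0, 1), repeat=4)
    if (a ^ b) == (x & y)
)
print(p_succ, (2 + np.sqrt(2)) / 4)
```

The two printed numbers agree, so this choice of measurements attains $\frac{2+\sqrt2}{4}$, comfortably above $3/4$.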

If your interpretation of quantum mechanics has a single world but no collapse, you have a problem

To inaugurate this blog I want to talk about Daniela Frauchiger and Renato Renner’s polemical new paper, Single-world interpretations of quantum theory cannot be self-consistent. Since lots of people want to understand what the paper is saying, but do not want to go through its rather formal language, I thought it would be useful to present the argument here in a more friendly way.

To put the paper in context, it is better to first go through a bit of history.

Understanding unitary quantum mechanics is tough. The first serious attempt to do it only came in 1957, when Everett proposed the Many-Worlds interpretation. The mainstream position within the physics community was not to try to understand unitary quantum mechanics, but to modify it, through some ill-defined collapse rule, and some ill-defined prohibition against describing humans with quantum mechanics. But this solution has fallen out of favour nowadays, as experiments show that larger and larger physical systems do obey quantum mechanics, and very few people believe that collapse is a physical process. The most widely accepted interpretations nowadays postulate that the dynamics are fundamentally unitary, and that collapse only happens in the mind of the observer.

But this seems a weird position to be in, to assume the same dynamics as Many-Worlds, but to postulate that there is anyway a single world. You are bound to get into trouble. What sort of trouble is that? This is the question that the paper explores.

That you do get into trouble was first shown by Deutsch in his 1985 paper Quantum theory as a universal physical theory, where he presents a much improved version of Wigner’s friend gedankenexperiment (if you want to read something truly insane, take a look at Wigner’s original version). It goes like this:

Wigner is outside a perfectly isolated laboratory, and inside it there is a friend who is going to make a measurement on a qubit. Their initial state is

\[ \ket{\text{Wigner}}\ket{\text{friend}}\frac{\ket{0}+\ket{1}}{\sqrt2} \]

After the friend does his measurement, their state becomes

\[ \ket{\text{Wigner}}\frac{\ket{\text{friend}_0}\ket{0} + \ket{\text{friend}_1}\ket{1}}{\sqrt2} \]

At this point, the friend writes a note certifying that he has indeed done the measurement, but without revealing which outcome he has seen. The state becomes

\[ \ket{\text{Wigner}}\frac{\ket{\text{friend}_0}\ket{0} + \ket{\text{friend}_1}\ket{1}}{\sqrt2}\ket{\text{I did the measurement}} \]

Now Wigner undoes his friend’s measurement and applies a Hadamard on the qubit (i.e., rotates them to the Bell basis), mapping the state to

\[ \ket{\text{Wigner}}\ket{\text{friend}}\ket{0}\ket{\text{I did the measurement}} \]

Finally, Wigner and his friend can meet and discuss what they will get if they measure the qubit in the computational basis. Believing in Many-Worlds, Wigner says that they will see the result 0 with certainty. The friend is confused. His memory was erased by Wigner, and the only thing he has is this note in his own handwriting saying that he has definitely done the measurement. Believing in a single world, he deduces he was either in the state $\ket{\text{friend}_0}\ket{0}$ or $\ket{\text{friend}_1}\ket{1}$, and therefore that the qubit, after Wigner’s manipulations, is either in the state $\frac{\ket{0}+\ket{1}}{\sqrt2}$ or $\frac{\ket{0}-\ket{1}}{\sqrt2}$, and that the result of the measurement will be either 0 or 1 with equal probability.
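The two predictions can be compared in a toy simulation. This is only a sketch under heavy simplifications of my own: the friend is a single memory qubit (with the "ready" state and $\ket{\text{friend}_0}$ both represented by $\ket0$), the measurement is a CNOT from the qubit onto the memory, and the note is left out since it just factors out of everything.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

# Qubit order: friend's memory, measured qubit.
# The measurement copies the qubit's value into the memory (CNOT controlled on the qubit).
measure = np.array([[1, 0, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0],
                    [0, 1, 0, 0]], dtype=float)

# Wigner's operation: undo the measurement, then apply a Hadamard to the qubit.
wigner = np.kron(np.eye(2), H) @ measure.T

def prob_zero(state):
    """Probability that a computational-basis measurement of the qubit gives 0."""
    return abs(state[0]) ** 2 + abs(state[2]) ** 2

# Wigner's (unitary) description: the superposition survives the friend's measurement.
after_measurement = measure @ np.kron(ket0, plus)
p_unitary = prob_zero(wigner @ after_measurement)

# The friend's single-world description: one definite branch, he just forgot which.
p_branch0 = prob_zero(wigner @ np.kron(ket0, ket0))  # was friend_0 with qubit 0
p_branch1 = prob_zero(wigner @ np.kron(ket1, ket1))  # was friend_1 with qubit 1
print(p_unitary, p_branch0, p_branch1)
```

Wigner’s description gives outcome 0 with probability 1, while either of the friend’s branches gives 0 with probability $1/2$, which is the disagreement described above.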

So we have a contradiction, but not a very satisfactory one, as there isn’t an outcome that, if obtained, falsifies the single world theory (Many-Worlds, on the other hand, is falsified if the outcome is 1). The best one can do is repeat the experiment many times and say something like: I obtained N zeroes in a row, which means that the probability that Many-Worlds is correct is $1/(1+2^{-N})$, and the probability that the single world theory is correct is $1/(1+2^{N})$.
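
These two numbers are what Bayes' rule gives if one starts with equal prior probabilities for the two theories (an assumption that the formulas leave implicit). A minimal sketch with exact fractions, where the function name is just for illustration:

```python
from fractions import Fraction

def posterior_many_worlds(n_zeros, prior=Fraction(1, 2)):
    """Posterior probability of Many-Worlds after n_zeros outcomes 0 in a row."""
    like_mw = Fraction(1)                # Many-Worlds: outcome 0 with certainty
    like_sw = Fraction(1, 2) ** n_zeros  # single world: each 0 has probability 1/2
    return prior * like_mw / (prior * like_mw + (1 - prior) * like_sw)

for n in (1, 5, 10):
    p_mw = posterior_many_worlds(n)
    # Compare with the closed forms 1/(1+2^-N) and 1/(1+2^N)
    assert p_mw == 1 / (1 + Fraction(1, 2) ** n)
    assert 1 - p_mw == Fraction(1, 1 + 2 ** n)
    print(n, p_mw, 1 - p_mw)
```

With a different prior the closed forms change, but the single-world posterior still shrinks by a factor of roughly 2 with every additional zero.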

Can we strengthen this contradiction? This is one of the things Frauchiger and Renner want to do. Luckily, this strengthening can be done without going through their full argument, as a simpler scenario suffices.

Consider now two experimenters, Alice and Bob, who are perfectly isolated from each other but for a single qubit that both can access. The state of everyone starts as

\[ \ket{\text{Alice}}\frac{\ket{0}+\ket{1}}{\sqrt2}\ket{\text{Bob}} \]

and Alice makes a first measurement on the qubit, mapping the state to

\[ \frac{\ket{\text{Alice}_0}\ket{0}+\ket{\text{Alice}_1}\ket{1}}{\sqrt2}\ket{\text{Bob}} \]

Now focus on one of Alice’s copies, say Alice$_0$. If she believes in a single world, she believes that Bob will definitely see outcome 0 as well. But from Bob’s point of view both outcomes are still possible. If he goes on to do the experiment and sees outcome 1 it is over, the single world theory is falsified.

This argument has the obvious disadvantage of not being testable, as Alice$_0$ and Bob$_1$ will never meet, and therefore nobody will see the contradiction. Still, I find it an uncomfortable contradiction to have, even if hidden from view. And as far as I understand, this is all that Frauchiger and Renner have to say against Bohmian mechanics.

The full version of their argument is necessary to argue against a deeply personalistic single-world interpretation, where one would only demand a single world to exist for themselves, and allow everyone else to be in Many-Worlds. This would correspond to taking the point of view of Wigner in the first gedankenexperiment, or the point of view of Alice$_0$ in the second. As far as I’m aware nobody actually defends such an interpretation, but it does look similar to QBism to me.

To the argument, then. Their scenario is a double Wigner’s friend where we have two friends, F1 and F2, and two wigners, A and W. The gedankenexperiment starts with a quantum coin in a biased superposition of heads and tails:

\[ \frac1{\sqrt3}\ket{h} + \sqrt{\frac23}\ket{t} \]

At time t=0:10 F1 measures the coin in the computational basis, mapping the state to

\[ \frac1{\sqrt3}\ket{h}\ket{F1_h} + \sqrt{\frac23}\ket{t}\ket{F1_t} \]

To avoid clutter, I will redefine the degrees of freedom of this coin to be part of F1’s degrees of freedom, and write simply

\[ \frac1{\sqrt3}\ket{F1_h} + \sqrt{\frac23}\ket{F1_t} \]

Now, F1 prepares a qubit in the state $\ket{0}$ if she saw heads, or the state $\ket{+}$ if she saw tails, mapping the state to

\[ \frac1{\sqrt3}\ket{F1_h}\ket{0} + \sqrt{\frac23}\ket{F1_t}\ket{+} \]

F1 sends this qubit to F2, who measures it in the computational basis at time t=0:20, mapping the state to (I’m writing the state of the qubit inside F2’s state to avoid clutter, as before)

\[ \frac1{\sqrt3}\ket{F1_h}\ket{F2_0} + \frac1{\sqrt3}\ket{F1_t}\ket{F2_0} + \frac1{\sqrt3}\ket{F1_t}\ket{F2_1}\]

Note that the term in the superposition that has $\ket{F2_1}$ also has $\ket{F1_t}$, and no other term in the superposition has $\ket{F2_1}$. Based on that, F2 reasons: If there is a copy of F2 that sees 1 at t=0:20, there must be a copy of F1 that saw tails at t=0:10.
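
The amplitudes and F2's inference can be checked with a few lines of code. This is a sketch, not part of the original argument: the state is stored as a dictionary indexed by (F1's record, F2's record), and the variable names are invented for the example.

```python
import math

# Amplitudes of the coin as seen by F1: heads 1/sqrt3, tails sqrt(2/3)
coin = {"h": 1 / math.sqrt(3), "t": math.sqrt(2 / 3)}

# Qubit F1 prepares, written in the computational basis:
# heads -> |0>, tails -> |+> = (|0> + |1>)/sqrt2
prepared = {
    "h": {0: 1.0},
    "t": {0: 1 / math.sqrt(2), 1: 1 / math.sqrt(2)},
}

# F2's measurement copies the qubit value into F2's memory, so the joint
# state is indexed by (F1's record, F2's record).
state = {}
for f1, coin_amp in coin.items():
    for f2, qubit_amp in prepared[f1].items():
        state[(f1, f2)] = coin_amp * qubit_amp

for key, amp in state.items():
    print(key, round(amp, 6))

# F2's inference: collect F1's record in every branch where F2 saw 1.
f1_records_when_f2_saw_1 = {f1 for (f1, f2), amp in state.items()
                            if f2 == 1 and abs(amp) > 1e-12}
print(f1_records_when_f2_saw_1)
```

All three branches come out with amplitude $1/\sqrt3$, and the only record of F1 that appears together with F2 seeing 1 is tails.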

F1, on her side, knows that this is happening, and furthermore she knows that W will at time t=0:40 measure F2 in the basis \[\{\ket{F2_+},\ket{F2_-}\} = \left\{\frac{\ket{F2_0} + \ket{F2_1}}{\sqrt2},\frac{\ket{F2_0} - \ket{F2_1}}{\sqrt2}\right\}\]
On that basis, the current state reads
\[ \frac1{\sqrt6}\ket{F1_h}\ket{F2_+} + \frac1{\sqrt6}\ket{F1_h}\ket{F2_-} + \sqrt{\frac23}\ket{F1_t}\ket{F2_+} \]
Based on this, F1 reasons: If there is a copy of F1 that sees tails at t=0:10, there must be a copy of W that sees $\color{red} F2_+$ at t=0:40.
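
F1's statement rests on the $\ket{F1_t}\ket{F2_-}$ amplitude vanishing. A short numerical check, again only a sketch with the state stored as a dictionary indexed by (F1's record, F2's record):

```python
import math

amp0 = 1 / math.sqrt(3)
s = 1 / math.sqrt(2)

# State after F2's measurement, in the computational basis for F2
state = {("h", 0): amp0, ("t", 0): amp0, ("t", 1): amp0}

# Rewrite F2's record in the +/- basis, using
# |F2_0> = (|F2_+> + |F2_->)/sqrt2 and |F2_1> = (|F2_+> - |F2_->)/sqrt2
in_pm = {}
for (f1, f2), amp in state.items():
    minus_sign = 1 if f2 == 0 else -1
    in_pm[(f1, "+")] = in_pm.get((f1, "+"), 0.0) + s * amp
    in_pm[(f1, "-")] = in_pm.get((f1, "-"), 0.0) + minus_sign * s * amp

for key, amp in sorted(in_pm.items()):
    print(key, round(amp, 6))
```

The heads branch splits evenly between $F2_+$ and $F2_-$ with amplitude $1/\sqrt6$ each, while in the tails branch the two contributions to $F2_-$ cancel and all the weight, $\sqrt{2/3}$, sits on $F2_+$.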

Now it is A’s turn to act. She measures at time t=0:30 the state of F1 in the basis
\[\{\ket{F1_+},\ket{F1_-}\} = \left\{\frac{\ket{F1_h} + \ket{F1_t}}{\sqrt2},\frac{\ket{F1_h} - \ket{F1_t}}{\sqrt2}\right\}\]
In this basis (and the computational basis for F2) the state of F1 and F2 is
\[ \frac2{\sqrt6}\ket{F1_+}\ket{F2_0} + \frac1{\sqrt6}\ket{F1_+}\ket{F2_1} - \frac1{\sqrt6}\ket{F1_-}\ket{F2_1} \]
Based on this state, A reasons: If there is a copy of A that sees $-$ at t=0:30, there must be a copy of F2 that saw 1 at t=0:20.
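
For readers who want to check this change of basis: inverting the definitions gives $\ket{F1_h} = (\ket{F1_+}+\ket{F1_-})/\sqrt2$ and $\ket{F1_t} = (\ket{F1_+}-\ket{F1_-})/\sqrt2$, so that $\ket{F1_h}+\ket{F1_t} = \sqrt2\,\ket{F1_+}$. Substituting into the state after F2's measurement,
\[ \frac1{\sqrt3}\left(\ket{F1_h}+\ket{F1_t}\right)\ket{F2_0} + \frac1{\sqrt3}\ket{F1_t}\ket{F2_1} = \frac2{\sqrt6}\ket{F1_+}\ket{F2_0} + \frac1{\sqrt6}\left(\ket{F1_+}-\ket{F1_-}\right)\ket{F2_1} \]
The $\ket{F1_-}\ket{F2_0}$ contributions from heads and tails cancel, which is why the only term containing $\ket{F1_-}$ comes with $\ket{F2_1}$.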

Finally, Wigner acts. He measures at time t=0:40 the state of A and the state of F2, both in the superposition basis. The full state is, before the measurement (with F1 included within A)

\[ \frac3{\sqrt{12}}\ket{A_+}\ket{F2_+} - \frac1{\sqrt{12}}\ket{A_-}\ket{F2_+} + \frac1{\sqrt{12}}\ket{A_+}\ket{F2_-} + \frac1{\sqrt{12}}\ket{A_-}\ket{F2_-} \]
so Wigner obtains the result $A_-,F2_-$ with probability $1/12$. Let us assume he did so (otherwise we just repeat the experiment a few times, and this result will happen with exponentially high probability). Furthermore, we assume that Wigner is the superobserver for whom there is a single world, so there is only one copy of him. Collecting this statement together with the statements in red, we have:

  1. The only copy of Wigner sees $A_-,F2_-$ at t=0:40.
  2. If there is a copy of A that sees $-$ at t=0:30, there must be a copy of F2 that saw 1 at t=0:20.
  3. If there is a copy of F2 that sees 1 at t=0:20, there must be a copy of F1 that saw tails at t=0:10.
  4. If there is a copy of F1 that sees tails at t=0:10, there must be a copy of W that sees $F2_+$ at t=0:40.
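
The probability $1/12$ for the outcome $A_-,F2_-$ can be checked numerically. The sketch below makes a few assumptions for the sake of illustration: A's record is identified with F1's (as in the text), the state is a dictionary indexed by (F1's record, F2's record), and the helper `to_plus_minus` is a made-up name.

```python
import math

S = 1 / math.sqrt(2)
AMP = 1 / math.sqrt(3)

# State after F2's measurement, indexed by (F1's record, F2's record)
state = {("h", 0): AMP, ("t", 0): AMP, ("t", 1): AMP}

def to_plus_minus(state, slot, zero_label):
    """Rewrite one slot of the state in the +/- basis.

    `zero_label` is the record playing the role of |0> (it expands as
    (|+> + |->)/sqrt2); any other record expands as (|+> - |->)/sqrt2.
    """
    out = {}
    for key, amp in state.items():
        minus_sign = 1 if key[slot] == zero_label else -1
        for label, coeff in (("+", S), ("-", minus_sign * S)):
            new_key = key[:slot] + (label,) + key[slot + 1:]
            out[new_key] = out.get(new_key, 0.0) + coeff * amp
    return out

# A's basis for F1 (slot 0), then W's basis for F2 (slot 1)
final = to_plus_minus(to_plus_minus(state, 0, "h"), 1, 0)
probabilities = {key: abs(amp) ** 2 for key, amp in final.items()}

for key, p in sorted(probabilities.items()):
    print(key, round(p * 12, 6), "/ 12")
```

The four outcomes come out with probabilities $9/12$, $1/12$, $1/12$ and $1/12$, the first being $A_+,F2_+$, so the outcome $A_-,F2_-$ that the argument needs does occur with probability $1/12$.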

Following the chain of implications, we have:

  1. The only copy of Wigner sees $A_-,F2_-$ at t=0:40.
  2. There is a copy of A that saw $-$ at t=0:30.
  3. There is a copy of F2 that saw 1 at t=0:20.
  4. There is a copy of F1 that saw tails at t=0:10.
  5. There is a copy of W that sees $F2_+$ at t=0:40.

The last statement contradicts the first, since the only copy of Wigner saw $F2_-$.

What should we conclude from this? Is this kind of reasoning valid? The discussions about this paper that I have witnessed have focussed on two questions: Are the red statements even valid, in isolation? Assuming that they are valid, is it legitimate to combine them in this way?

Instead of giving my own opinion, I’d like to state what different interpretations make of this argument.

Collapse models: I told you so.

Copenhagen (old style): Results of measurements must be described classically. If you try to describe them with quantum states you get nonsense.

Copenhagen (new style): There exist no facts of the world per se, there exist facts only relative to observers. It is meaningless to compare facts relative to different observers.

QBism: A measurement result is a personal experience of the agent who made the measurement. An agent can not use quantum mechanics to talk about another agent’s personal experience.

Bohmian mechanics: I don’t actually know what Bohmians make of this. But since Bohmians know about the surrealism of their trajectories, know that “empty” waves have an effect on the “real” waves, know that their solution to the measurement problem is no better than Many-Worlds’, and still find Bohmian mechanics compelling, I guess they will keep finding it compelling no matter what. On this point, I agree with Deutsch: pilot-wave theories are parallel-universes theories in a state of chronic denial.

What do you think?

Update: Rewrote the history paragraph, as it was just wrong. Thanks to Harvey Brown for pointing that out.
Update 2: Changed QBist statement to more accurately reflect the QBist’s point of view.

Hello, world!

Since I routinely write papers, and I have empirical evidence that they were read by people other than the authors and the referees, I conjecture that people might actually be interested in reading what I write! Therefore I’m starting this blog to post things I wanted to write about that, while scientific, are not really scientific papers. Better than using arXiv as a blog ;p

Even though I’m not a native English speaker, I’ll dare to write in Shakespeare’s language anyway. So one shouldn’t expect to find Shakespeare-worthy material here (I assure you it wouldn’t be much better if I were to write in Portuguese). I’ll do this simply because I want to write about physics, and physics is done in English nowadays.