Understanding Bell’s theorem part 1: the simple version

To continue with the series of “public service” posts, I will write the presentation of Bell’s theorem that I would like to have read when I was learning it. My reaction at the time was, I believe, similar to most students’: what the fuck am I reading? And my attempts to search the literature to understand what was going on only made my bewilderment worse, as the papers disagree about what the assumptions of Bell’s theorem are, what the assumptions should be called, what conclusion we should draw from the theorem, and even what Bell’s theorem even is! Given this widespread confusion, it is no wonder that so many crackpots obsess about it!

This is the first of a series of three posts about several versions of Bell’s theorem. I’m starting with what I believe is by consensus the simplest version: the one proved by Clauser, Horne, Shimony, and Holt in 1969, based on Bell’s original version from 1964.

The theorem is about explaining the statistics observed by two experimenters, Alice and Bob, that are making measurements on some physical system in a space-like separated way. The details of their experiment are not important for the theorem (of course, they are important for actually doing the experiment). What is important is that each experimenter has two possible settings, named 0 and 1, and for each setting the measurement has two possible outcomes, again named 0 and 1.

Of course it is not actually possible to have only two settings in a real experiment: usually the measurement depends on a continuous parameter, like the angle with which you set a wave plate, or the phase of the laser with which you hit an ion, and you are only able to set this continuous parameter with finite precision. But this is not a problem, as we only need to define in advance that “this angle corresponds to setting 0” and “this angle corresponds to setting 1”. If the angles are not a good approximation to the ideal settings you are just going to get bad statistics.

Analogously, it is also not actually possible to have only two outcomes for each measurement, most commonly because you lost a photon and no detector clicked, but also because you can have multiple detections, or you might be doing a measurement on a continuous variable, like position. Again, the important thing is that you define in advance which outcomes correspond to the 0 outcome, and which outcomes correspond to the 1 outcome. Indeed, this is exactly what was done in the recent loophole-free Bell tests: they defined the no-detection outcome to correspond to the outcome 1.

Having their settings and outcomes defined like this, our experimenters measure some conditional probabilities $p(ab|xy)$, where $a,b$ are Alice and Bob’s outcomes, and $x,y$ are their settings. Now they want to explain these correlations. How did they come about? Well, they obtained them by measuring some physical system $\lambda$ (that can be a quantum state, or something more exotic like a Bohmian corpuscle) that they did not have complete control over, so it is reasonable to write the probabilities as arising from an averaging over different values of $\lambda$. So they decompose the probabilities as
\[ p(ab|xy) = \sum_\lambda p(\lambda|xy)p(ab|xy\lambda) \]
Note that this is not an assumption, just a mathematical identity. If you are an experimental superhero and can really make your source emit the same quantum state in every single round of the experiment you just get a trivial decomposition with a single $\lambda$ (incidentally, since the behaviours $p(ab|xy)$ live in a 12-dimensional affine space, by Carathéodory’s theorem one needs only 13 different $\lambda$s to write this decomposition, so the use of integrals over $\lambda$ in some proofs of Bell’s theorem is rather overkill).

The first assumption that we use in the proof is that the physical system $\lambda$ is not correlated with the settings $x$ and $y$, that is $p(\lambda|xy) = p(\lambda)$. I think this assumption is necessary to even do science, because if it were not possible to probe a physical system independently of its state, we couldn’t hope to be able to learn what its actual state is. It would be like trying to find a correlation between smoking and cancer when your sample of patients is chosen by a tobacco company. This assumption is variously called “freedom of choice”, “no superdeterminism”, or “no conspiracy”. I think “freedom of choice” is a really bad name, as in actual experiments nobody chooses the settings: instead they are determined by a quantum random number generator or by the bit string of “Doctor Who”. As for “no superdeterminism”, I think the name is rather confusing, as the assumption has nothing to do with determinism — it is possible to respect it in a deterministic theory, and it is possible to violate it in an indeterministic theory. Instead I’ll go with “no conspiracy”:

  • No conspiracy:   $p(\lambda|xy) = p(\lambda)$.

With this assumption the decomposition of the probabilities simplifies to
\[ p(ab|xy) = \sum_\lambda p(\lambda)p(ab|xy\lambda) \]

The second assumption that we’ll use is that the outcomes $a$ and $b$ are deterministic functions of the settings $x$ and $y$ and the physical system $\lambda$. This assumption is motivated by the age-old idea that the indeterminism we see in quantum mechanics is only a result of our ignorance about the physical system we are measuring, and that as soon as we have a complete specification of it — given by $\lambda$ — the probabilities would disappear from consideration and a deterministic theory would be recovered. This assumption is often called “realism”. I find this name incredibly stupid. Are the authors that use it really saying that they cannot conceive of an objective reality that is not deterministic? And that such a complex concept as realism reduces to mere determinism? Furthermore, they are blissfully ignoring the existence of collapse models, which are realistic but fundamentally indeterministic. As far as I know the name “realism” was coined by Bernard d’Espagnat in a Scientific American article from 1979, and since then it has caught on. Maybe people liked it because Einstein, Podolsky and Rosen argued that a deterministic quantity is for sure real (but they did not claim that indeterministic quantities are not real), I don’t know. But I refuse to use it; I’ll go with the very straightforward and neutral name “determinism”.

  • Determinism:   $p(ab|xy\lambda) \in \{0,1\}$.

An immediate consequence of this assumption is that $p(ab|xy\lambda) = p(a|xy\lambda)p(b|xy\lambda)$ and therefore that the decomposition of $p(ab|xy)$ becomes
\[ p(ab|xy) = \sum_\lambda p(\lambda)p(a|xy\lambda)p(b|xy\lambda) \]

The last assumption we’ll need is that the probabilities that Alice sees do not depend on which setting Bob used for his measurement, i.e., that $p(a|xy\lambda) = p(a|x\lambda)$. The motivation for it is that since the measurements are made in a space-like separated way, a signal would have to travel from Bob’s lab to Alice’s faster than light in order to influence her result. Relativity does not like it, but does not outright forbid it either, if you are ok with having a preferred reference frame (I’m not). Even before the discovery of relativity Newton already found such action at a distance rather distasteful:

It is inconceivable that inanimate Matter should, without the Mediation of something else, which is not material, operate upon, and affect other matter without mutual Contact… That Gravity should be innate, inherent and essential to Matter, so that one body may act upon another at a distance thro’ a Vacuum, without the Mediation of any thing else, by and through which their Action and Force may be conveyed from one to another, is to me so great an Absurdity that I believe no Man who has in philosophical Matters a competent Faculty of thinking can ever fall into it.

Without using such eloquence, my own worry is that giving this up would put into question how we can ever isolate a system in order to make measurements on it whose results do not depend on the state of the rest of the universe.

This assumption has been called “locality”, “no signalling”, and “no action at a distance” in the literature. My only beef with “locality” is that this word is overused, so nobody really knows what it means; “no signalling”, on the other hand, is just bad, as the best example we have of a theory that violates this assumption — Bohmian mechanics — does not actually let us signal with it. I’ll go again for the more neutral name and stick with “no action at a distance”.

  • No action at a distance:   $p(a|xy\lambda) = p(a|x\lambda)$ and $p(b|xy\lambda) = p(b|y\lambda)$.

With this assumption we have the final decomposition of the conditional probabilities as
\[ p(ab|xy) = \sum_\lambda p(\lambda)p(a|x\lambda)p(b|y\lambda) \]
This is what we need to prove a Bell inequality. Consider the sum of probabilities
\[ p_\text{succ} = \frac14\Big(p(00|00) + p(11|00) + p(00|01) + p(11|01) + p(00|10) + p(11|10) + p(01|11) + p(10|11)\Big) \]
This can be interpreted as the probability of success in a game where Alice and Bob receive inputs $x$ and $y$ from a referee, and must return equal outputs if the inputs are 00, 01, or 10, and must return different outputs if the inputs are 11.

We want to prove an upper bound to $p_\text{succ}$ from the decomposition of the conditional probabilities derived above. First we rewrite it as
\[ p_\text{succ} = \sum_{abxy} M^{ab}_{xy} p(ab|xy) = \sum_{abxy} \sum_\lambda M^{ab}_{xy} p(\lambda)p(a|x\lambda)p(b|y\lambda) \]
where $M^{ab}_{xy} = \frac14\delta_{a\oplus b,xy}$ are the coefficients defined by the above sum of probabilities. Note now that
\[ p_\text{succ} \le \max_\lambda \sum_{abxy} M^{ab}_{xy} p(a|x\lambda)p(b|y\lambda) \]
as the convex combination over $\lambda$ can only reduce the value of $p_\text{succ}$. And since the functions $p(a|x\lambda)$ and $p(b|y\lambda)$ are assumed to be deterministic, there can only be a finite number of them (in fact 4 different functions for Alice and 4 for Bob), so we can do the maximization over $\lambda$ simply by trying all 16 possibilities. Doing that, we see that
\[p_\text{succ} \le \frac34\]
for theories that obey no conspiracy, determinism, and no action at a distance. This is the famous CHSH inequality.
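Since each deterministic $\lambda$ amounts to a pair of functions, one for Alice and one for Bob, the maximization over the 16 possibilities can be checked by brute force. A minimal sketch in Python:

```python
from itertools import product

# Deterministic strategies: Alice outputs a = f(x), Bob outputs b = g(y),
# where f and g are functions from {0,1} to {0,1}. Encode each function
# as a tuple (value at 0, value at 1); there are 4 for each player.
strategies = list(product([0, 1], repeat=2))

def p_succ(f, g):
    # Success condition of the game: a XOR b must equal x*y.
    wins = sum((f[x] ^ g[y]) == x * y for x in (0, 1) for y in (0, 1))
    return wins / 4

best = max(p_succ(f, g) for f in strategies for g in strategies)
print(best)  # 0.75: no pair of deterministic strategies beats 3/4
```

Every deterministic strategy wins at most three of the four input pairs, which is exactly the $3/4$ bound.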

On the other hand, according to quantum mechanics it is possible to obtain
\[p_\text{succ} = \frac{2 + \sqrt2}{4}\]
and a violation of the bound $3/4$ was observed experimentally, so at least one of the three assumptions behind the theorem must be false. Which one?
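As an aside, the quantum value above can be verified numerically. The state and measurements below are the standard optimal (Tsirelson) strategy, which is not spelled out in the text: Alice and Bob share $\ket{\Phi^+} = (\ket{00}+\ket{11})/\sqrt2$, Alice measures $Z$ or $X$, and Bob measures $(Z\pm X)/\sqrt2$.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]])
X = np.array([[0, 1], [1, 0]])
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)  # |phi+> = (|00> + |11>)/sqrt(2)

A = [Z, X]                                   # Alice's observables for x = 0, 1
B = [(Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)]  # Bob's for y = 0, 1

def projector(obs, outcome):
    # Eigenprojector of obs: outcome 0 <-> eigenvalue +1, outcome 1 <-> -1.
    vals, vecs = np.linalg.eigh(obs)
    cols = vecs[:, np.isclose(vals, 1 if outcome == 0 else -1)]
    return cols @ cols.T.conj()

# p_succ = (1/4) * sum over winning events (a XOR b == x*y) of p(ab|xy).
p_succ = sum(
    0.25 * (phi @ np.kron(projector(A[x], a), projector(B[y], b)) @ phi)
    for x in (0, 1) for y in (0, 1)
    for a in (0, 1) for b in (0, 1)
    if (a ^ b) == x * y
)
print(p_succ)  # ~0.8536, i.e. (2 + sqrt(2))/4
```

Each winning probability works out to $(2+\sqrt2)/4$ individually, so the average equals it too.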

If your interpretation of quantum mechanics has a single world but no collapse, you have a problem

To inaugurate this blog I want to talk about Daniela Frauchiger and Renato Renner’s polemical new paper, Single-world interpretations of quantum theory cannot be self-consistent. Since lots of people want to understand what the paper is saying, but do not want to go through its rather formal language, I thought it would be useful to present the argument here in a more friendly way.

To put the paper in context, it is better to first go through a bit of history.

Understanding unitary quantum mechanics is tough. The first serious attempt to do it only came in 1957, when Everett proposed the Many-Worlds interpretation. The mainstream position within the physics community was not to try to understand unitary quantum mechanics, but to modify it, through some ill-defined collapse rule, and some ill-defined prohibition against describing humans with quantum mechanics. But this solution has fallen out of favour nowadays, as experiments show that larger and larger physical systems do obey quantum mechanics, and very few people believe that collapse is a physical process. The most widely accepted interpretations nowadays postulate that the dynamics are fundamentally unitary, and that collapse only happens in the mind of the observer.

But this seems a weird position to be in, to assume the same dynamics as Many-Worlds, but to postulate that there is anyway a single world. You are bound to get into trouble. What sort of trouble is that? This is the question that the paper explores.

That you do get into trouble was first shown by Deutsch in his 1985 paper Quantum theory as a universal physical theory, where he presents a much improved version of Wigner’s friend gedankenexperiment (if you want to read something truly insane, take a look at Wigner’s original version). It goes like this:

Wigner is outside a perfectly isolated laboratory, and inside it there is a friend who is going to make a measurement on a qubit. Their initial state is

\[ \ket{\text{Wigner}}\ket{\text{friend}}\frac{\ket{0}+\ket{1}}{\sqrt2} \]

After the friend does his measurement, their state becomes

\[ \ket{\text{Wigner}}\frac{\ket{\text{friend}_0}\ket{0} + \ket{\text{friend}_1}\ket{1}}{\sqrt2} \]

At this point, the friend writes a note certifying that he has indeed done the measurement, but without revealing which outcome he has seen. The state becomes

\[ \ket{\text{Wigner}}\frac{\ket{\text{friend}_0}\ket{0} + \ket{\text{friend}_1}\ket{1}}{\sqrt2}\ket{\text{I did the measurement}} \]

Now Wigner undoes his friend’s measurement and applies a Hadamard to the qubit (both coherent operations on the friend and the qubit together), mapping the state to

\[ \ket{\text{Wigner}}\ket{\text{friend}}\ket{0}\ket{\text{I did the measurement}} \]

Finally, Wigner and his friend can meet and discuss what they will get if they measure the qubit in the computational basis. Believing in Many-Worlds, Wigner says that they will see the result 0 with certainty. The friend is confused. His memory was erased by Wigner, and the only thing he has is this note in his own handwriting saying that he has definitely done the measurement. Believing in a single world, he deduces he was either in the state $\ket{\text{friend}_0}\ket{0}$ or $\ket{\text{friend}_1}\ket{1}$, and therefore that the qubit, after Wigner’s manipulations, is either in the state $\frac{\ket{0}+\ket{1}}{\sqrt2}$ or $\frac{\ket{0}-\ket{1}}{\sqrt2}$, and that the result of the measurement will be either 0 or 1 with equal probability.

So we have a contradiction, but not a very satisfactory one, as there isn’t an outcome that, if obtained, falsifies the single world theory (Many-Worlds, on the other hand, is falsified if the outcome is 1). The best one can do is repeat the experiment many times and say something like: I obtained N zeroes in a row, which means that the probability that Many-Worlds is correct is $1/(1+2^{-N})$, and the probability that the single world theory is correct is $1/(1+2^{N})$.
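This Bayesian update is straightforward to sketch, assuming equal prior odds between the two theories as above:

```python
# Posterior probability of Many-Worlds after observing N zeroes in a row.
# Many-Worlds predicts 0 with certainty; the single-world theory predicts
# 0 or 1 with probability 1/2 each round.
def posterior_many_worlds(N):
    likelihood_mw = 1.0       # P(N zeroes | Many-Worlds)
    likelihood_sw = 0.5 ** N  # P(N zeroes | single world)
    return likelihood_mw / (likelihood_mw + likelihood_sw)

print(posterior_many_worlds(10))  # 1/(1 + 2**-10), about 0.999
```

After a handful of repetitions the single-world theory is all but dead, yet never strictly falsified.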

Can we strengthen this contradiction? This is one of the things Frauchiger and Renner want to do. Luckily, this strengthening can be done without going through their full argument, as a simpler scenario suffices.

Consider now two experimenters, Alice and Bob, that are perfectly isolated from each other but for a single qubit that both can access. The state of everyone starts as

\[ \ket{\text{Alice}}\frac{\ket{0}+\ket{1}}{\sqrt2}\ket{\text{Bob}} \]

and Alice makes a first measurement on the qubit, mapping the state to

\[ \frac{\ket{\text{Alice}_0}\ket{0}+\ket{\text{Alice}_1}\ket{1}}{\sqrt2}\ket{\text{Bob}} \]

Now focus on one of Alice’s copies, say Alice$_0$. If she believes in a single world, she believes that Bob will definitely see outcome 0 as well. But from Bob’s point of view both outcomes are still possible. If he goes on to do the experiment and sees outcome 1 it is over, the single world theory is falsified.

This argument has the obvious disadvantage of not being testable, as Alice$_0$ and Bob$_1$ will never meet, and therefore nobody will see the contradiction. Still, I find it an uncomfortable contradiction to have, even if hidden from view. And as far as I understand, this is all that Frauchiger and Renner have to say against Bohmian mechanics.

The full version of their argument is necessary to argue against a deeply personalistic single-world interpretation, where one would only demand a single world to exist for themselves, and allow everyone else to be in Many-Worlds. This would correspond to taking the point of view of Wigner in the first gedankenexperiment, or the point of view of Alice$_0$ in the second. As far as I’m aware nobody actually defends such an interpretation, but it does look similar to QBism to me.

To the argument, then. Their scenario is a double Wigner’s friend where we have two friends, F1 and F2, and two Wigners, A and W. The gedankenexperiment starts with a quantum coin in a biased superposition of heads and tails:

\[ \frac1{\sqrt3}\ket{h} + \sqrt{\frac23}\ket{t} \]

At time t=0:10 F1 measures the coin in the computational basis, mapping the state to

\[ \frac1{\sqrt3}\ket{h}\ket{F1_h} + \sqrt{\frac23}\ket{t}\ket{F1_t} \]

To avoid clutter, I will redefine the degrees of freedom of this coin to be part of F1’s degrees of freedom, and write simply

\[ \frac1{\sqrt3}\ket{F1_h} + \sqrt{\frac23}\ket{F1_t} \]

Now, F1 prepares a qubit in the state $\ket{0}$ if she saw heads, or the state $\ket{+}$ if she saw tails, mapping the state to

\[ \frac1{\sqrt3}\ket{F1_h}\ket{0} + \sqrt{\frac23}\ket{F1_t}\ket{+} \]

F1 sends this qubit to F2, who measures it in the computational basis at time t=0:20, mapping the state to (I’m writing the state of the qubit inside F2’s state to avoid clutter, as before)

\[ \frac1{\sqrt3}\ket{F1_h}\ket{F2_0} + \frac1{\sqrt3}\ket{F1_t}\ket{F2_0} + \frac1{\sqrt3}\ket{F1_t}\ket{F2_1}\]

Note that the term in the superposition that has $\ket{F2_1}$ has also $\ket{F1_t}$, and no other term in the superposition has $\ket{F2_1}$. Based on that, F2 reasons: If there is a copy of F2 that sees 1 at t=0:20, there must be a copy of F1 that saw tail at t=0:10.

F1, on her side, knows that this is happening, and furthermore she knows that W will at time t=0:40 measure F2 in the basis \[\{\ket{F2_+},\ket{F2_-}\} = \left\{\frac{\ket{F2_0} + \ket{F2_1}}{\sqrt2},\frac{\ket{F2_0} - \ket{F2_1}}{\sqrt2}\right\}\]
On that basis, the current state reads
\[ \frac1{\sqrt6}\ket{F1_h}\ket{F2_+} + \frac1{\sqrt6}\ket{F1_h}\ket{F2_-} + \sqrt{\frac23}\ket{F1_t}\ket{F2_+} \]
Based on this, F1 reasons: If there is a copy of F1 that sees tail at t=0:10, there must be a copy of W that sees $\color{red} F2_+$ at t=0:40.

Now it is A’s turn to act. She measures at time t=0:30 the state of F1 in the basis
\[\{\ket{F1_+},\ket{F1_-}\} = \left\{\frac{\ket{F1_h} + \ket{F1_t}}{\sqrt2},\frac{\ket{F1_h} - \ket{F1_t}}{\sqrt2}\right\}\]
In this basis (and the computational basis for F2) the state of F1 and F2 is
\[ \frac2{\sqrt6}\ket{F1_+}\ket{F2_0} + \frac1{\sqrt6}\ket{F1_+}\ket{F2_1} - \frac1{\sqrt6}\ket{F1_-}\ket{F2_1} \]
Based on this state, A reasons: If there is a copy of A that sees – at t=0:30, there must be a copy of F2 that saw 1 at t=0:20.

Finally, Wigner acts. He measures at time t=0:40 the state of A and the state of F2, both in the superposition basis. The full state is, before the measurement (with F1 included within A)

\[ \frac3{\sqrt{12}}\ket{A_+}\ket{F2_+} + \frac1{\sqrt{12}}\ket{A_+}\ket{F2_-} - \frac1{\sqrt{12}}\ket{A_-}\ket{F2_+} + \frac1{\sqrt{12}}\ket{A_-}\ket{F2_-} \]
so Wigner obtains the result $A_-,F2_-$ with probability $1/12$. Let us assume he did so (otherwise we just repeat the experiment a few times, and this result will happen with probability exponentially close to one). Furthermore, we assume that Wigner is the superobserver for whom there is a single world, so there is only one copy of him. Collecting this statement together with the three conditional statements derived above, we have:

  1. The only copy of Wigner sees $A_-,F2_-$ at t=0:40.
  2. If there is a copy of A that sees – at t=0:30, there must be a copy of F2 that saw 1 at t=0:20.
  3. If there is a copy of F2 that sees 1 at t=0:20, there must be a copy of F1 that saw tail at t=0:10.
  4. If there is a copy of F1 that sees tail at t=0:10, there must be a copy of W that sees $F2_+$ at t=0:40.

Following the chain of implications, we have
The only copy of Wigner sees $A_-,F2_-$ at t=0:40.
There is a copy of A that saw – at t=0:30.
There is a copy of F2 that saw 1 at t=0:20.
There is a copy of F1 that saw tail at t=0:10.
There is a copy of W that sees $F2_+$ at t=0:40.
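Treating F1 and F2 as single qubits, the $1/12$ probability and the three vanishing amplitudes behind statements 2–4 can be checked numerically. Encoding heads/tails and the qubit outcomes as computational basis states is my own shorthand, not the paper's notation:

```python
import numpy as np

# Basis states: F1's heads/tails and F2's 0/1 as computational basis states.
h, t = np.array([1.0, 0.0]), np.array([0.0, 1.0])
zero, one = h, t
plus, minus = (zero + one) / np.sqrt(2), (zero - one) / np.sqrt(2)

# Joint state after both friends have measured: (|h,0> + |t,0> + |t,1>)/sqrt(3).
psi = (np.kron(h, zero) + np.kron(t, zero) + np.kron(t, one)) / np.sqrt(3)

def amp(u, v):
    # Amplitude of the product state |u>|v> in psi.
    return np.kron(u, v) @ psi

print(amp(minus, minus) ** 2)  # 1/12: probability of Wigner seeing A-, F2-
print(amp(minus, zero))        # 0: A_- only occurs together with F2 seeing 1
print(amp(h, one))             # 0: F2 seeing 1 only occurs with F1 seeing tails
print(amp(t, minus))           # 0: F1 seeing tails only occurs with F2_+
```

Each zero amplitude is exactly what licenses the corresponding conditional statement, and their conjunction with the $1/12$ event is the contradiction.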

What should we conclude from this? Is this kind of reasoning valid? The discussions about this paper that I have witnessed have focussed on two questions: Are the red statements even valid, in isolation? Assuming that they are valid, is it legitimate to combine them in this way?

Instead of giving my own opinion, I’d like to state what different interpretations make of this argument.

Collapse models: I told you so.

Copenhagen (old style): Results of measurements must be described classically. If you try to describe them with quantum states you get nonsense.

Copenhagen (new style): There exist no facts of the world per se, there exist facts only relative to observers. It is meaningless to compare facts relative to different observers.

QBism: A measurement result is a personal experience of the agent who made the measurement. An agent cannot use quantum mechanics to talk about another agent’s personal experience.

Bohmian mechanics: I don’t actually know what Bohmians make of this. But since Bohmians know about the surrealism of their trajectories, know that “empty” waves have an effect on the “real” waves, know that their solution to the measurement problem is no better than Many-Worlds’, and still find Bohmian mechanics compelling, I guess they will keep finding it compelling no matter what. On this point, I agree with Deutsch: pilot-wave theories are parallel-universes theories in a state of chronic denial.

What do you think?

Update: Rewrote the history paragraph, as it was just wrong. Thanks to Harvey Brown for pointing that out.
Update 2: Changed QBist statement to more accurately reflect the QBist’s point of view.

Hello, world!

Since I routinely write papers, and I have empirical evidence that they were read by people other than the authors and the referees, I conjecture that people might actually be interested in reading what I write! Therefore I’m starting this blog to post some things I wanted to write that, while scientific, are not really scientific papers. Better than using arXiv as a blog ;p

Even though I’m not a native English speaker, I’ll dare to write in Shakespeare’s language anyway. So one shouldn’t expect to find Shakespeare-worthy material here (I assure you it wouldn’t be much better if I were to write in Portuguese). I’ll do this simply because I want to write about physics, and physics is done in English nowadays.