Continuing the series on Bell’s theorem, I will now write about its most popular version, the one that people have in mind when they talk about quantum nonlocality: the version that Bell proved in his 1975 paper The theory of local beables.

But first things first: why do we even need another version of the theorem? Is there anything wrong with the simple version? Well, a problem that Bohmians have with it is that its conclusion is heavily slanted against their theory: quantum mechanics clearly respects *no conspiracy* and *no action at a distance*, but clearly does not respect *determinism*, so the most natural interpretation of the theorem is that trying to make quantum mechanics deterministic is a bad idea. The price you have to pay is having action at a distance in your theory, as Bohmian mechanics has. Because of this the Bohmians prefer to talk about another version of the theorem, that lends some support to the idea that the world is in some sense nonlocal.

There is also legitimate criticism to be made against the simple version of Bell’s theorem: namely that the assumption of *determinism* is too strong. This is easy to see, as we can cook up indeterministic correlations that are even weaker than the deterministic ones: if Alice and Bob play the CHSH game randomly they achieve $p_\text{succ} = 1/2$, well below the bound of $3/4$. This implies that just giving up on determinism does not allow you to violate Bell inequalities. You need to lose something more precious than that. What exactly?

The first attempt to answer this question was made by Clauser and Horne in 1974. Their proof goes like this: from *no conspiracy*, the probabilities decompose as

\[ p(ab|xy) = \sum_\lambda p(\lambda)p(ab|xy\lambda) \]

Then, they introduce their new assumption

- Factorisability: $p(ab|xy\lambda) = p(a|x\lambda)p(b|y\lambda)$.

which makes the probabilities reduce to

\[ p(ab|xy) = \sum_\lambda p(\lambda)p(a|x\lambda)p(b|y\lambda) \]

Noting that for any coefficients $M^{ab}_{xy}$ the Bell expression

\[ p_\text{succ} = \sum_{abxy} \sum_\lambda M^{ab}_{xy} p(\lambda)p(a|x\lambda)p(b|y\lambda) \]

is upperbounded by deterministic probability distributions $p(a|x\lambda)$ and $p(b|y\lambda)$, the rest of the proof of the simple version of Bell’s theorem applies, and we’re done.

So they can prove Bell’s theorem only from the assumptions of *no conspiracy* and *factorisability*, without assuming *determinism*. The problem is how to motivate *factorisability*. It is not a simple and intuitive condition like *determinism* or *no action a distance*, that my mum understands, but some weird technical stuff. Why would she care about probabilities factorising?

The justification that Clauser and Horne give is just that factorisability

…is a natural expression of a field-theoretical point of view, which in turn is an extrapolation from the common-sense view that there is no action at a distance.

What are they talking about? Certainly not about quantum fields, which do not factorise. Maybe about classical fields? But only those without correlations, because otherwise they don’t factorise either! Or are they thinking about deterministic fields? But then there would be no improvement with respect to the simple version of the theorem! And anyway why do they claim that it is an extrapolation of no action at a distance? They don’t have a derivation to be able to claim such a thing! It is hard for me to understand how anyone could have taken this assumption seriously. If I were allowed to just take some arbitrary technical condition as an assumption I could prove anything I wanted.

Luckily this unsatisfactory situation only lasted one year, as in 1975 Bell managed to find a proper motivation for *factorisability*, deriving it from his notion of local causality. Informally, it says that causes are close to their effects (my mum is fine with that). A bit more formally, it says that probabilities of events in a spacetime region $A$ depend only on stuff in its past light cone $\Lambda$, and not on stuff in a space-like separated region $B$ (my mum is not so fine with that). So we have

- Local causality: $p(A|\Lambda,B) = p(A|\Lambda)$.

How do we derive *factorisability* from that? Start by applying Bayes’ rule

\[p(ab|xy\lambda) = p(a|bxy\lambda)p(b|xy\lambda)\]

and consider Alice’s probability $p(a|bxy\lambda)$: obtaining an outcome $a$ certainly counts as an event in $A$, and Alice’s setting $x$ and the physical state $\lambda$ certainly count as stuff in $\Lambda$. On the other hand, $b$ and $y$ are clearly stuff in $B$. So we have

\[ p(a|bxy\lambda) = p(a|x\lambda) \]

Doing the analogous reasoning for Bob’s probability $p(b|xy\lambda)$ (and swapping $A$ with $B$ in the definition of local causality) we have

\[ p(b|xy\lambda) = p(b|y\lambda) \]

and substituting this back we get

\[p(ab|xy\lambda) = p(a|x\lambda)p(b|y\lambda)\]

which is just *factorisability*.

So there we have it, a perfectly fine derivation of Bell’s theorem, using only two simple and well-motivated assumptions: *no conspiracy* and *local causality*. There is no need for the technical assumption of *factorisability*. Because of this it annoys me to no end when people implicitly conflate *factorisability* and *local causality*, or even explicitly state that they are equivalent.

Is there any other way of motivating *factorisability*, or are we stuck with *local causality*? A popular way to do it nowadays is through Reichenbach’s principle, which states that if two events A and B are correlated, then either A influences B, B influences A, or there is a common cause C such that

\[ p(AB|C) = p(A|C)p(B|C)\]

It is easy to see that this directly implies *factorisability* for the Bell scenario.

It is often said that Reichenbach’s principle embodies the idea that correlations cry out for explanations. This is bollocks. It demands the explanation to have a very specific form, namely the factorised one. Why? Why doesn’t an entangled state, for example, count as a valid explanation? If you ask an experimentalist that just did a Bell test, I don’t think she (more precisely Marissa Giustina) will tell you that the correlations came out of nowhere. I bet she will tell you that the correlations are there because she spent years in a cold, damp, dusty basement without phone reception working on the source and the detectors to produce them. Furthermore, the idea that “if the probabilities factorise, you have found the explanation for the correlation” does not actually work.

I think the correct way to deal with Bell correlations is not to throw your hands in the air and claim that they cannot be explained, but to develop a quantum Reichenbach principle to tell which correlations have a quantum explanation and which not. This is currently a hot research topic.

But leaving those grandiose claims aside, is there a good motivation for Reichenbach’s principle? I don’t think so. Reichenbach himself motivated his principle from considerations about entropy and the arrow of time, which simply do not apply to a simple quantum state of two qubits. There may be another motivation other than his original one, but I don’t know of any.

To conclude, as far as I know *local causality* is really the only way to motivate *factorisability*. If you don’t like the simple version of Bell’s theorem, you are pretty much stuck with the nonlocal version. But does it also have its problems? Well, the sociological one is its name, which leads to the undying idea in the popular culture that quantum mechanics allows for faster than light signalling or even travelling. But the real one is that it doesn’t allow you to do quantum key distribution based on Bell’s theorem (note that the usual quantum key distribution is based on quantum mechanics itself, and only uses Bell’s theorem as a source of inspiration).

If you use the simple version of Bell’s theorem and believe in *no action at a distance*, a violation of a Bell inequality implies not only that your outcomes are correlated with Bob’s, but also that they are in principle unpredictable, so you managed to share a secret key with him, which you can use for example for a one-time pad (which raises the question of why don’t Bohmians march in the street against funding for research in QKD). But if you use the nonlocal version of Bell’s theorem and violate a Bell inequality, you only find out that your outcomes are not locally causal – they can still be deterministic and nonlocal.1

Update: Rewrote the paragraph about QKD.