# Understanding Bell’s theorem part 2: the nonlocal version

Continuing the series on Bell’s theorem, I will now write about its most popular version, the one that people have in mind when they talk about quantum nonlocality: the version that Bell proved in his 1975 paper The theory of local beables.

But first things first: why do we even need another version of the theorem? Is there anything wrong with the simple version? Well, a problem that Bohmians have with it is that its conclusion is heavily slanted against their theory: quantum mechanics clearly respects no conspiracy and no action at a distance, but clearly does not respect determinism, so the most natural interpretation of the theorem is that trying to make quantum mechanics deterministic is a bad idea. The price you have to pay is having action at a distance in your theory, as Bohmian mechanics has. Because of this the Bohmians prefer to talk about another version of the theorem, that lends some support to the idea that the world is in some sense nonlocal.

There is also legitimate criticism to be made against the simple version of Bell’s theorem: namely that the assumption of determinism is too strong. This is easy to see, as we can cook up indeterministic correlations that are even weaker than the deterministic ones: if Alice and Bob play the CHSH game randomly they achieve $p_\text{succ} = 1/2$, well below the bound of $3/4$. This implies that just giving up on determinism does not allow you to violate Bell inequalities. You need to lose something more precious than that. What exactly?
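The two numbers quoted above can be checked by brute force. The following is a minimal sketch, assuming the usual convention for the CHSH game (uniformly random inputs $x,y$ and winning condition $a \oplus b = x \cdot y$); the function names are mine and purely illustrative.

```python
from fractions import Fraction
from itertools import product


def wins(a, b, x, y):
    # Assumed CHSH winning condition: a XOR b equals x AND y.
    return (a ^ b) == (x & y)


def success_probability(p_a, p_b):
    # p_a[x][a] and p_b[y][b] are local response probabilities;
    # the inputs x, y are assumed uniformly random.
    total = Fraction(0)
    for x, y, a, b in product((0, 1), repeat=4):
        if wins(a, b, x, y):
            total += Fraction(1, 4) * p_a[x][a] * p_b[y][b]
    return total


def deterministic(out0, out1):
    # Response table that outputs out0 on input 0 and out1 on input 1.
    return [[Fraction(int(out == o)) for o in (0, 1)] for out in (out0, out1)]


half = Fraction(1, 2)
coin_flip = [[half, half], [half, half]]

# Alice and Bob both ignore their inputs and answer at random.
p_random = success_probability(coin_flip, coin_flip)

# Best of the 16 pairs of deterministic local strategies.
best_deterministic = max(
    success_probability(deterministic(a0, a1), deterministic(b0, b1))
    for a0, a1, b0, b1 in product((0, 1), repeat=4)
)

print(p_random, best_deterministic)  # 1/2 3/4
```

Exact fractions are used so that the comparison with $1/2$ and $3/4$ does not depend on floating-point rounding.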

The first attempt to answer this question was made by Clauser and Horne in 1974. Their proof goes like this: from no conspiracy, the probabilities decompose as
$p(ab|xy) = \sum_\lambda p(\lambda)p(ab|xy\lambda)$
Then, they introduce their new assumption

• Factorisability:   $p(ab|xy\lambda) = p(a|x\lambda)p(b|y\lambda)$.

which makes the probabilities reduce to
$p(ab|xy) = \sum_\lambda p(\lambda)p(a|x\lambda)p(b|y\lambda)$
Noting that for any coefficients $M^{ab}_{xy}$ the Bell expression
$p_\text{succ} = \sum_{abxy} \sum_\lambda M^{ab}_{xy} p(\lambda)p(a|x\lambda)p(b|y\lambda)$
attains its maximum at deterministic probability distributions $p(a|x\lambda)$ and $p(b|y\lambda)$, the rest of the proof of the simple version of Bell’s theorem applies, and we’re done.
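One way to spell out that step (a sketch; the shorthand $S(\lambda)$ is introduced here only for convenience): write
$S(\lambda) = \sum_{abxy} M^{ab}_{xy} p(a|x\lambda)p(b|y\lambda)$
so that
$p_\text{succ} = \sum_\lambda p(\lambda) S(\lambda) \le \max_\lambda S(\lambda)$
since the $p(\lambda)$ are nonnegative and sum to one. For a fixed $\lambda$ and fixed $p(b|y\lambda)$, $S(\lambda)$ is linear in $p(a|x\lambda)$ for each $x$, and a linear function on the set of probability distributions is maximised at a vertex, that is, at a distribution that assigns probability one to a single outcome $a$. Repeating the argument for $p(b|y\lambda)$ shows that the maximum is reached with both parties answering deterministically.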

So they can prove Bell’s theorem only from the assumptions of no conspiracy and factorisability, without assuming determinism. The problem is how to motivate factorisability. It is not a simple and intuitive condition like determinism or no action at a distance, that my mum understands, but some weird technical stuff. Why would she care about probabilities factorising?

The justification that Clauser and Horne give is just that factorisability

…is a natural expression of a field-theoretical point of view, which in turn is an extrapolation from the common-sense view that there is no action at a distance.

What are they talking about? Certainly not about quantum fields, which do not factorise. Maybe about classical fields? But only those without correlations, because otherwise they don’t factorise either! Or are they thinking about deterministic fields? But then there would be no improvement with respect to the simple version of the theorem! And anyway why do they claim that it is an extrapolation of no action at a distance? They don’t have a derivation to be able to claim such a thing! It is hard for me to understand how anyone could have taken this assumption seriously. If I were allowed to just take some arbitrary technical condition as an assumption I could prove anything I wanted.

Luckily this unsatisfactory situation only lasted one year, as in 1975 Bell managed to find a proper motivation for factorisability, deriving it from his notion of local causality. Informally, it says that causes are close to their effects (my mum is fine with that). A bit more formally, it says that probabilities of events in a spacetime region $A$ depend only on stuff in its past light cone $\Lambda$, and not on stuff in a space-like separated region $B$ (my mum is not so fine with that). So we have

• Local causality:   $p(A|\Lambda,B) = p(A|\Lambda)$.

How do we derive factorisability from that? Start by applying the product rule for conditional probabilities (the identity behind Bayes’ rule)
$p(ab|xy\lambda) = p(a|bxy\lambda)p(b|xy\lambda)$
and consider Alice’s probability $p(a|bxy\lambda)$: obtaining an outcome $a$ certainly counts as an event in $A$, and Alice’s setting $x$ and the physical state $\lambda$ certainly count as stuff in $\Lambda$. On the other hand, $b$ and $y$ are clearly stuff in $B$. So we have
$p(a|bxy\lambda) = p(a|x\lambda)$
Doing the analogous reasoning for Bob’s probability $p(b|xy\lambda)$ (and swapping $A$ with $B$ in the definition of local causality) we have
$p(b|xy\lambda) = p(b|y\lambda)$
and substituting this back we get
$p(ab|xy\lambda) = p(a|x\lambda)p(b|y\lambda)$
which is just factorisability.
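The derivation can also be checked numerically for a fixed $\lambda$. The sketch below is only an illustration: the two helper functions, the example response probabilities, and the correlated counterexample (a distribution with $p(ab|xy\lambda) = 1/2$ whenever $a \oplus b = x \cdot y$) are my own choices, not part of Bell’s argument.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)


def local_causality_holds(p):
    # p[(a, b, x, y)] stands for p(ab|xy lambda) at one fixed lambda.
    # Condition 1: p(a|b x y lambda) does not depend on b or y.
    for x in BITS:
        values = set()
        for b, y in product(BITS, repeat=2):
            p_b = p[(0, b, x, y)] + p[(1, b, x, y)]  # p(b|xy lambda)
            if p_b > 0:
                values.add(p[(0, b, x, y)] / p_b)  # p(a=0|bxy lambda)
        if len(values) > 1:
            return False
    # Condition 2: p(b|x y lambda) does not depend on x.
    for y in BITS:
        values = {p[(0, 0, x, y)] + p[(1, 0, x, y)] for x in BITS}
        if len(values) > 1:
            return False
    return True


def factorises(p):
    # Candidate local distributions, read off from the marginals.
    p_a = {(a, x): p[(a, 0, x, 0)] + p[(a, 1, x, 0)]
           for a, x in product(BITS, repeat=2)}
    p_b = {(b, y): p[(0, b, 0, y)] + p[(1, b, 0, y)]
           for b, y in product(BITS, repeat=2)}
    return all(p[(a, b, x, y)] == p_a[(a, x)] * p_b[(b, y)]
               for a, b, x, y in product(BITS, repeat=4))


# Arbitrary example response probabilities, keyed by (output, input).
alice = {(0, 0): Fraction(3, 4), (1, 0): Fraction(1, 4),
         (0, 1): Fraction(1, 3), (1, 1): Fraction(2, 3)}
bob = {(0, 0): Fraction(1, 2), (1, 0): Fraction(1, 2),
       (0, 1): Fraction(1, 5), (1, 1): Fraction(4, 5)}
product_model = {(a, b, x, y): alice[(a, x)] * bob[(b, y)]
                 for a, b, x, y in product(BITS, repeat=4)}

# Correlated counterexample: outcomes satisfy a XOR b = x AND y.
correlated_model = {(a, b, x, y): Fraction(1, 2) if (a ^ b) == (x & y) else Fraction(0)
                    for a, b, x, y in product(BITS, repeat=4)}

print(local_causality_holds(product_model), factorises(product_model))
print(local_causality_holds(correlated_model), factorises(correlated_model))
```

In both examples the two checks agree: the product model passes both, and the correlated one fails both.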

So there we have it, a perfectly fine derivation of Bell’s theorem, using only two simple and well-motivated assumptions: no conspiracy and local causality. There is no need for the technical assumption of factorisability. Because of this it annoys me to no end when people implicitly conflate factorisability and local causality, or even explicitly state that they are equivalent.

Is there any other way of motivating factorisability, or are we stuck with local causality? A popular way to do it nowadays is through Reichenbach’s principle, which states that if two events A and B are correlated, then either A influences B, B influences A, or there is a common cause C such that
$p(AB|C) = p(A|C)p(B|C)$
It is easy to see that this directly implies factorisability for the Bell scenario.

It is often said that Reichenbach’s principle embodies the idea that correlations cry out for explanations. This is bollocks. It demands the explanation to have a very specific form, namely the factorised one. Why? Why doesn’t an entangled state, for example, count as a valid explanation? If you ask an experimentalist that just did a Bell test, I don’t think she (more precisely Marissa Giustina) will tell you that the correlations came out of nowhere. I bet she will tell you that the correlations are there because she spent years in a cold, damp, dusty basement without phone reception working on the source and the detectors to produce them. Furthermore, the idea that “if the probabilities factorise, you have found the explanation for the correlation” does not actually work.

I think the correct way to deal with Bell correlations is not to throw your hands in the air and claim that they cannot be explained, but to develop a quantum Reichenbach principle to tell which correlations have a quantum explanation and which not. This is currently a hot research topic.

But leaving those grandiose claims aside, is there a good motivation for Reichenbach’s principle? I don’t think so. Reichenbach himself motivated his principle from considerations about entropy and the arrow of time, which simply do not apply to a simple quantum state of two qubits. There may be another motivation other than his original one, but I don’t know of any.

To conclude, as far as I know local causality is really the only way to motivate factorisability. If you don’t like the simple version of Bell’s theorem, you are pretty much stuck with the nonlocal version. But does it also have its problems? Well, the sociological one is its name, which leads to the undying idea in popular culture that quantum mechanics allows for faster-than-light signalling or even travelling. But the real one is that it doesn’t allow you to do quantum key distribution based on Bell’s theorem (note that the usual quantum key distribution is based on quantum mechanics itself, and only uses Bell’s theorem as a source of inspiration).

If you use the simple version of Bell’s theorem and believe in no action at a distance, a violation of a Bell inequality implies not only that your outcomes are correlated with Bob’s, but also that they are in principle unpredictable, so you managed to share a secret key with him, which you can use for example for a one-time pad (which raises the question of why Bohmians don’t march in the street against funding for research in QKD). But if you use the nonlocal version of Bell’s theorem and violate a Bell inequality, you only find out that your outcomes are not locally causal – they can still be deterministic and nonlocal.[1]

Update: Rewrote the paragraph about QKD.

## 4 thoughts on “Understanding Bell’s theorem part 2: the nonlocal version”

1. Gláucia Murta says:

Hey Mateus,
Very nice set of posts making explicit the different hypotheses in Bell’s theorem! I really enjoyed reading them.

So, in this comment I’m going to summarize our offline discussions about my objection with respect to your last paragraph.
I disagree (and I guess you also disagree now) with your conclusion that the “nonlocal version” of Bell’s theorem “doesn’t allow you to do quantum key distribution”.
My claim is that the security proofs for key distribution are consistent with both the simple version and the nonlocal version:
* The key distribution protocols whose security relies on quantum mechanics use the Bell violation to estimate the best probability of Eve guessing an outcome of Alice. The idea is to look for all possible tripartite (Alice, Bob and Eve) quantum boxes such that the marginal of Alice and Bob is consistent with the Bell violation.
The more refined protocols, with better rates, make use of the self-testing properties of the CHSH inequality, and therefore they can estimate how much information a quantum Eve could have, given a set of states consistent with the particular violation.
* There are also protocols that use the violation of a Bell inequality to establish security against a non-signalling eavesdropper. (On that subject I cannot say much.) But as far as I understand, even in this case the security still holds independently of your interpretation. Again the security is based on optimizing over all possible tripartite (Alice, Bob and Eve) no-signalling boxes such that the marginal of Alice and Bob is consistent with the Bell violation and therefore does not admit a decomposition of the “local” form.

So in summary, the security proofs only assume that, given the violation, there exists no decomposition of the form $p(ab|xy)=\sum_{\lambda} p(\lambda)p(a|x\lambda)p(b|y\lambda)$ for the probability distribution of Alice and Bob’s outcomes.

Finally, I just want to comment that even in the nonlocal version we can still conclude that the outcomes of Alice and Bob “are in principle unpredictable” when one observes a violation. That is because the fact that they are locally causally correlated would also imply that, in principle, there could exist a locally causal decomposition of your probability distribution where, given knowledge of “lambda”, the outcomes of Alice and Bob are deterministic. The violation therefore witnesses that no such decomposition exists.

Cheers!

2. Mateus Araújo says:

Thanks for the comment, Gláucia. I did have in mind QKD against no-signalling adversaries, precisely because the normal QKD against quantum adversaries simply assumes the validity of quantum mechanics, which makes Bell’s theorem just an inspiration for them, not a proof of security. But I am guilty of not making this clear =)

But I don’t quite agree with you about QKD against no-signalling adversaries. As you noticed, their proofs (e.g. in arXiv:quant-ph/0405101 or arXiv:0807.2158) only assume that the joint probability distribution of Alice, Bob, and Eve $p(abe|xyz)$ is non-signalling, and prove that if the marginal $p(ab|xy)$ violates a Bell inequality then $e$ cannot be perfectly correlated with $a$. So they don’t need to talk directly about Bell’s theorem or its assumptions. In this sense, you are right: QKD is going to work regardless of your interpretation of Bell’s theorem.

But there is still the question of explaining why QKD works. In the case where there is no violation of a Bell inequality, the situation is clear: the probabilities can be written as
$p(abe|xyz) = \sum_\lambda p(\lambda)p(a|x\lambda)p(b|y\lambda)p(e|z\lambda)$
for deterministic $p(a|x\lambda)$, $p(b|y\lambda)$, and $p(e|z\lambda)$, so Eve can know everything. She just needs to set $e=\lambda$ and wait until Alice and Bob announce their inputs $x$ and $y$ (they need to do this in order to do QKD). Since she built the boxes she knows the functions $p(a|x\lambda)$ and $p(b|y\lambda)$, so she can just calculate all $a$ and $b$ and have their key.

In the case where Alice and Bob do violate a Bell inequality, the probabilities can still be written as
$p(abe|xyz) = \sum_\lambda p(\lambda)p(a|xyz\lambda)p(b|xyz\lambda)p(e|xyz\lambda),$
for deterministic $p(a|xyz\lambda)$, $p(b|xyz\lambda)$, and $p(e|xyz\lambda)$, so a Bohmian Eve can still know everything using the same strategy as before. To rule that out, you need to assume that it is not possible that $a$ depends on $y$ and $b$ depends on $x$. The nonlocal version of Bell’s theorem does not help with that, as it can only say that these probabilities are not locally causal. The simple version, on the other hand, has the assumption of no action at a distance ready to do the job. You can insist that it is true, and in this case there will be simply no deterministic decomposition for $p(abe|xyz)$.
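To make the deterministic nonlocal decomposition concrete, here is a small sketch. The example is my choice for illustration and is an assumption, not something taken from the proofs cited above: Alice and Bob’s marginal is the distribution with $p(ab|xy) = 1/2$ whenever $a \oplus b = x \cdot y$ (often called a PR box), $\lambda$ is a uniformly random bit, and Eve’s input $z$ is dropped for brevity.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)


def target(a, b, x, y):
    # Bell-violating marginal used as the example (assumed, see lead-in).
    return Fraction(1, 2) if (a ^ b) == (x & y) else Fraction(0)


def alice_output(x, y, lam):
    # Deterministic given lambda; happens not to need y.
    return lam


def bob_output(x, y, lam):
    # Deterministic given lambda, but depends on Alice's setting x.
    return lam ^ (x & y)


def mixture(a, b, x, y):
    # p(ab|xy) obtained by averaging the deterministic responses
    # over a uniformly random bit lambda.
    total = Fraction(0)
    for lam in BITS:
        if (alice_output(x, y, lam), bob_output(x, y, lam)) == (a, b):
            total += Fraction(1, 2)
    return total


reproduces_target = all(
    mixture(a, b, x, y) == target(a, b, x, y)
    for a, b, x, y in product(BITS, repeat=4)
)

# CHSH success probability of the marginal, with uniform inputs.
chsh_success = sum(
    Fraction(1, 4) * mixture(a, b, x, y)
    for a, b, x, y in product(BITS, repeat=4)
    if (a ^ b) == (x & y)
)

print(reproduces_target, chsh_success)  # True 1
```

An Eve who holds $e=\lambda$ only has to evaluate `alice_output` and `bob_output` once $x$ and $y$ are announced. The marginal wins the CHSH game with probability 1, above the bound of $3/4$, and the price is that Bob’s outcome depends on $x$, which is exactly what no action at a distance forbids.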

In any case, I have some rewriting to do in the post. Cheers!

3. Gláucia Murta says:

Hey Mateus,

Thanks for the reply.
Oh, I see your point about “explaining why QKD works” using a Bell violation. And I agree with you that the negation of Bell’s theorem in the Bohmian interpretation and in the nonlocal version does not allow us to rule out that Eve could have information about the key.
And if for the simple version we consider violation + no action at a distance, then we could rule out the deterministic box you’ve constructed.
However, I disagree that we can take no action at a distance for granted in the simple version. The simple version shows that assuming determinism + no action at a distance implies the Bell theorem. Therefore violating a Bell inequality just tells us we cannot have these 2 assumptions at the same time. So I would say that by itself the simple version also does not lead to QKD, unless we add the extra assumption that no action at a distance (no-signalling) holds even when violating a Bell inequality. But then why not add this extra assumption also for the nonlocal version?

Cheers!

4. Mateus Araújo says:

Indeed, one does need to assume no action at a distance to have QKD in the simple version, and then one might as well also assume it in the nonlocal version. In fact, most people I know do believe that local causality is false, but no action at a distance is true.

But I don’t think this makes much sense as a theorem, as no action at a distance is a strictly weaker assumption than local causality. I have never seen a theorem being proved from assumptions A and B, where A implies B.

Also sociologically, the authors that prefer to present the nonlocal version of Bell’s theorem usually say that the conclusion is that “the world is nonlocal”. It is just weird to say that you conclude that the world is nonlocal, but actually not so nonlocal that you violate no action at a distance, so in fact the world is also not deterministic.