To continue with the series of “public service” posts, I will write the presentation of Bell’s theorem that I would like to have read when I was learning it. My reaction at the time was, I believe, similar to most students’: what the fuck am I reading? And my attempts to search the literature to understand what was going on only made my bewilderment worse, as the papers disagree about what the assumptions in Bell’s theorem are, what the assumptions are called, what conclusion we should draw from Bell’s theorem, and even what Bell’s theorem is! Given this widespread confusion, it is no wonder that so many crackpots obsess about it!

This is the first of a series of three posts about several versions of Bell’s theorem. I’m starting with what I believe is by consensus the simplest version: the one proved by Clauser, Horne, Shimony, and Holt in 1969, based on Bell’s original version from 1964.

The theorem is about explaining the statistics observed by two experimenters, Alice and Bob, that are making measurements on some physical system in a space-like separated way. The details of their experiment are not important for the theorem (of course, they are important for actually doing the experiment). What is important is that each experimenter has two possible settings, named 0 and 1, and for each setting the measurement has two possible outcomes, again named 0 and 1.

Of course it is not actually possible to have only two settings in a real experiment: usually the measurement depends on a continuous parameter, like the angle with which you set a wave plate, or the phase of the laser with which you hit an ion, and you are only able to set this continuous parameter with finite precision. But this is not a problem, as we only need to define in advance that “this angle corresponds to setting 0” and “this angle corresponds to setting 1”. If the angles are not a good approximation to the ideal settings you are just going to get bad statistics.

Analogously, it is also not actually possible to have only two outcomes for each measurement, most commonly because you lost a photon and no detector clicked, but also because you can have multiple detections, or you might be doing a measurement on a continuous variable, like position. Again, the important thing is that you define in advance which outcomes correspond to the 0 outcome, and which outcomes correspond to the 1 outcome. Indeed, this is exactly what was done in the recent loophole-free Bell tests: they defined the no-detection outcome to correspond to the outcome 1.

Having their settings and outcomes defined like this, our experimenters measure some conditional probabilities $p(ab|xy)$, where $a,b$ are Alice and Bob’s outcomes, and $x,y$ are their settings. Now they want to explain these correlations. How did they come about? Well, they obtained them by measuring some physical system $\lambda$ (that can be a quantum state, or something more exotic like a Bohmian corpuscle) that they did not have complete control over, so it is reasonable to write the probabilities as arising from an averaging over different values of $\lambda$. So they decompose the probabilities as

\[ p(ab|xy) = \sum_\lambda p(\lambda|xy)p(ab|xy\lambda) \]

Note that this is not an assumption, just a mathematical identity. If you are an experimental superhero and can really make your source emit the same quantum state in every single round of the experiment you just get a trivial decomposition with a single $\lambda$ (incidentally, by Carathéodory’s theorem one needs only 13 different $\lambda$s to write this decomposition, so the use of integrals over $\lambda$ in some proofs of Bell’s theorem is rather overkill).

The first assumption that we use in the proof is that the physical system $\lambda$ is not correlated with the settings $x$ and $y$, that is $p(\lambda|xy) = p(\lambda)$. I think this assumption is necessary to even do science, because if it were not possible to probe a physical system independently of its state, we couldn’t hope to be able to learn what its actual state is. It would be like trying to find a correlation between smoking and cancer when your sample of patients is chosen by a tobacco company. This assumption is variously called “freedom of choice”, “no superdeterminism”, or “no conspiracy”. I think “freedom of choice” is a really bad name, as in actual experiments nobody *chooses* the settings: instead they are determined by a quantum random number generator or by the bit string of “Doctor Who”. As for “no superdeterminism”, I think the name is rather confusing, as the assumption has nothing to do with determinism: it is possible to respect it in a deterministic theory, and it is possible to violate it in an indeterministic theory. Instead I’ll go with “no conspiracy”:

- No conspiracy: $p(\lambda|xy) = p(\lambda)$.

With this assumption the decomposition of the probabilities simplifies to

\[ p(ab|xy) = \sum_\lambda p(\lambda)p(ab|xy\lambda) \]

The second assumption that we’ll use is that the outcomes $a$ and $b$ are deterministic functions of the settings $x$ and $y$ and the physical system $\lambda$. This assumption is motivated by the age-old idea that the indeterminism we see in quantum mechanics is only a result of our ignorance about the physical system we are measuring, and that as soon as we have a complete specification of it (given by $\lambda$) the probabilities would disappear from consideration and a deterministic theory would be recovered. This assumption is often called “realism”. I find this name incredibly stupid. Are the authors that use it really saying that they cannot conceive of an objective reality that is not deterministic? And that a concept as complex as realism reduces to mere determinism? Furthermore, they are blissfully ignoring the existence of collapse models, which are realistic but fundamentally indeterministic. As far as I know the name realism was coined by Bernard d’Espagnat in a Scientific American article from 1979, and since then it caught on. Maybe people liked it because Einstein, Podolsky and Rosen argued that a deterministic quantity is for sure real (but they did not claim that indeterministic quantities are not real), I don’t know. But I refuse to use it; I’ll go with the very straightforward and neutral name “determinism”.

- Determinism: $p(ab|xy\lambda) \in \{0,1\}$.

An immediate consequence of this assumption is that $p(ab|xy\lambda) = p(a|xy\lambda)p(b|xy\lambda)$ and therefore that the decomposition of $p(ab|xy)$ becomes

\[ p(ab|xy) = \sum_\lambda p(\lambda)p(a|xy\lambda)p(b|xy\lambda) \]
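
To spell out the factorisation step (a short derivation added for completeness; the labels $a_0, b_0$ are introduced here only for illustration): fix $x$, $y$, and $\lambda$. Since the four probabilities $p(ab|xy\lambda)$ are each 0 or 1 and sum to one, exactly one pair of outcomes $(a_0,b_0)$, which may depend on $x,y,\lambda$, has probability one, so $p(ab|xy\lambda) = \delta_{a,a_0}\delta_{b,b_0}$. The marginals are then

\[ p(a|xy\lambda) = \sum_{b'} p(ab'|xy\lambda) = \delta_{a,a_0}, \qquad p(b|xy\lambda) = \sum_{a'} p(a'b|xy\lambda) = \delta_{b,b_0}, \]

and their product is $\delta_{a,a_0}\delta_{b,b_0} = p(ab|xy\lambda)$.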

The last assumption we’ll need is that the probabilities that Alice sees do not depend on which setting Bob used for his measurement, i.e., that $p(a|xy\lambda) = p(a|x\lambda)$. The motivation for it is that since the measurements are made in a space-like separated way, a signal would have to travel from Bob’s lab to Alice’s faster than light in order to influence her result. Relativity does not like it, but does not outright forbid it either, if you are ok with having a preferred reference frame (I’m not). Even before the discovery of relativity Newton already found such action at a distance rather distasteful:

It is inconceivable that inanimate Matter should, without the Mediation of something else, which is not material, operate upon, and affect other matter without mutual Contact… That Gravity should be innate, inherent and essential to Matter, so that one body may act upon another at a distance thro’ a Vacuum, without the Mediation of any thing else, by and through which their Action and Force may be conveyed from one to another, is to me so great an Absurdity that I believe no Man who has in philosophical Matters a competent Faculty of thinking can ever fall into it.

Without using such eloquence, my own worry is that giving up on this would put into question how we can ever isolate a system in order to do measurements on it whose result does not depend on the state of the rest of the universe.

This assumption has been called in the literature “locality”, “no signalling”, and “no action at a distance”. My only beef with “locality” is that this word is overused, so nobody really knows what it means; “no signalling”, on the other hand, is just bad, as the best example we have of a theory that violates this assumption, Bohmian mechanics, does not actually let us signal with it. I’ll go again for the more neutral name and stick with “no action at a distance”.

- No action at a distance: $p(a|xy\lambda) = p(a|x\lambda)$ and $p(b|xy\lambda) = p(b|y\lambda)$.

With this assumption we have the final decomposition of the conditional probabilities as

\[ p(ab|xy) = \sum_\lambda p(\lambda)p(a|x\lambda)p(b|y\lambda) \]
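
To make this decomposition concrete, here is a minimal Python sketch of a toy model of this form. Everything specific in it is an assumption made for illustration: the two values of $\lambda$ (`"l0"`, `"l1"`), their weights, and the response functions `alice` and `bob`. It builds $p(ab|xy)$ from the formula above and checks that the result is normalised and that Alice's marginal does not depend on Bob's setting.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: lambda takes two values with weights p(lambda).
p_lambda = {"l0": Fraction(1, 3), "l1": Fraction(2, 3)}

# Deterministic local responses: each outcome depends only on the
# local setting and on lambda (illustrative choices, not from the post).
def alice(x, lam):
    return 0 if lam == "l0" else x

def bob(y, lam):
    return y if lam == "l0" else 0

def behaviour(p_lambda, alice, bob):
    """Return p(ab|xy) = sum_lambda p(lambda) p(a|x,lambda) p(b|y,lambda).

    The result is a dict keyed by (a, b, x, y).
    """
    p = {key: Fraction(0) for key in product((0, 1), repeat=4)}
    for lam, weight in p_lambda.items():
        for x, y in product((0, 1), repeat=2):
            # Deterministic responses put all of lambda's weight on one (a, b).
            p[(alice(x, lam), bob(y, lam), x, y)] += weight
    return p

p = behaviour(p_lambda, alice, bob)

# Normalisation for every pair of settings.
for x, y in product((0, 1), repeat=2):
    assert sum(p[(a, b, x, y)] for a, b in product((0, 1), repeat=2)) == 1

# Alice's marginal is the same for both of Bob's settings.
for a, x in product((0, 1), repeat=2):
    assert sum(p[(a, b, x, 0)] for b in (0, 1)) == sum(p[(a, b, x, 1)] for b in (0, 1))
```

Exact fractions are used instead of floats so that the equality checks are not affected by rounding.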

This is what we need to prove a Bell inequality. Consider the sum of probabilities

\begin{multline*}
p_\text{succ} = \frac14\Big(p(00|00) + p(11|00) + p(00|01) + p(11|01) \\ + p(00|10) + p(11|10) + p(01|11) + p(10|11)\Big)
\end{multline*}

This can be interpreted as the probability of success in a game where Alice and Bob receive inputs $x$ and $y$ from a referee, and must return equal outputs if the inputs are 00, 01, or 10, and must return different outputs if the inputs are 11.
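
As a sketch of how this game is scored, the following Python snippet computes $p_\text{succ}$ for any table of probabilities $p(ab|xy)$. The winning condition "equal outputs unless the inputs are 11" is written as `a ^ b == x & y`. The two strategies used as examples (`always_zero` and `coin_flips`) are illustrative assumptions, not part of the argument.

```python
from fractions import Fraction
from itertools import product

def p_succ(p):
    """Success probability in the game: win when a XOR b == x AND y.

    `p` maps (a, b, x, y) to p(ab|xy); the four input pairs are equally likely.
    """
    return sum(
        Fraction(1, 4) * p[(a, b, x, y)]
        for a, b, x, y in product((0, 1), repeat=4)
        if a ^ b == x & y
    )

# Illustrative strategy 1: both players always answer 0.
always_zero = {
    (a, b, x, y): Fraction(int(a == 0 and b == 0))
    for a, b, x, y in product((0, 1), repeat=4)
}

# Illustrative strategy 2: both players answer uniformly at random.
coin_flips = {key: Fraction(1, 4) for key in product((0, 1), repeat=4)}

print(p_succ(always_zero))  # 3/4
print(p_succ(coin_flips))   # 1/2
```

Always answering 0 wins on three of the four input pairs and loses only on 11, while random answers win half of the time.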

We want to prove an upper bound to $p_\text{succ}$ from the decomposition of the conditional probabilities derived above. First we rewrite it as

\[ p_\text{succ} = \sum_{abxy} M^{ab}_{xy} p(ab|xy) = \sum_{abxy} \sum_\lambda M^{ab}_{xy} p(\lambda)p(a|x\lambda)p(b|y\lambda) \]

where $M^{ab}_{xy} = \frac14\delta_{a\oplus b,xy}$ are the coefficients defined by the above sum of probabilities. Note now that

\[ p_\text{succ} \le \max_\lambda \sum_{abxy} M^{ab}_{xy} p(a|x\lambda)p(b|y\lambda) \]

as the convex combination over $\lambda$ can only reduce the value of $p_\text{succ}$. And since the functions $p(a|x\lambda)$ and $p(b|y\lambda)$ are assumed to be deterministic, there can only be a finite number of them (in fact 4 different functions for Alice and 4 for Bob), so we can do the maximization over $\lambda$ simply by trying all 16 possibilities. Doing that, we see that

\[p_\text{succ} \le \frac34\]

for theories that obey *no conspiracy*, *determinism*, and *no action at a distance*. This is the famous CHSH inequality.
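
The maximisation over the 16 deterministic strategies is small enough to do by brute force. The sketch below is one way to carry it out in Python; the encoding of each response function as a tuple (output for setting 0, output for setting 1) is a choice made here for illustration.

```python
from fractions import Fraction
from itertools import product

# A deterministic local strategy is a pair of functions x -> a and y -> b.
# Each function is encoded as a tuple (output for setting 0, output for setting 1).
functions = list(product((0, 1), repeat=2))  # 4 functions per party

def success(f_alice, f_bob):
    """Fraction of the four input pairs (x, y) for which a XOR b == x AND y."""
    wins = sum(
        1
        for x, y in product((0, 1), repeat=2)
        if f_alice[x] ^ f_bob[y] == x & y
    )
    return Fraction(wins, 4)

# Score all 4 x 4 = 16 combinations and take the best one.
scores = {(fa, fb): success(fa, fb) for fa, fb in product(functions, repeat=2)}
best = max(scores.values())
print(len(scores), best)  # 16 3/4
```

Every deterministic strategy wins on either one or three of the four input pairs, so no strategy reaches 1, and by the convexity argument above no mixture of them does either.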

On the other hand, according to quantum mechanics it is possible to obtain

\[p_\text{succ} = \frac{2 + \sqrt2}{4}\]

and a violation of the bound $3/4$ was observed experimentally, so at least one of the three assumptions behind the theorem must be false. Which one?
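
This post does not derive the quantum value, but the number can be checked numerically. The sketch below is only such a check, under assumptions that are not taken from the post: the shared state is $(\ket{00}+\ket{11})/\sqrt2$, both parties measure in real qubit bases rotated by an angle, and the angles are $0, \pi/4$ for Alice and $\pi/8, -\pi/8$ for Bob (one standard choice).

```python
import math
from itertools import product

# Assumed state: (|00> + |11>)/sqrt(2), amplitudes indexed by (i, j).
state = {
    (0, 0): 1 / math.sqrt(2),
    (0, 1): 0.0,
    (1, 0): 0.0,
    (1, 1): 1 / math.sqrt(2),
}

def basis(theta, outcome):
    """Vector for `outcome` in the real qubit basis rotated by angle theta."""
    if outcome == 0:
        return (math.cos(theta), math.sin(theta))
    return (-math.sin(theta), math.cos(theta))

# Assumed measurement angles for settings 0 and 1 (not taken from the post).
alice_angles = {0: 0.0, 1: math.pi / 4}
bob_angles = {0: math.pi / 8, 1: -math.pi / 8}

def prob(a, b, x, y):
    """Born rule: squared overlap of the state with the product vector for (a, b)."""
    u = basis(alice_angles[x], a)
    v = basis(bob_angles[y], b)
    amplitude = sum(u[i] * v[j] * state[(i, j)] for i, j in product((0, 1), repeat=2))
    return amplitude ** 2

p_succ = sum(
    0.25 * prob(a, b, x, y)
    for a, b, x, y in product((0, 1), repeat=4)
    if a ^ b == x & y
)
print(p_succ, (2 + math.sqrt(2)) / 4)
```

With these angles each of the four input pairs is won with probability $\cos^2(\pi/8)$, which equals $(2+\sqrt2)/4$.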

Mateus, I applaud your initiative. Now if you allow me some criticism: I think that your explanation was crystal clear (for starting undergraduates, say) up to almost the very end, but things went off the rails with p_suck ;-) I think that you rushed too much. To be honest, if someone can get enough of a feeling for your last 3 equations by eye inspection to agree that p_succ must be smaller than 3/4 while in quantum mechanics you get the other result, then I don’t believe such a person would need your paused, motivating initial discussion at all.

Thanks for the criticism, masmadera. Indeed, the derivation of the bound 3/4 was too rushed, and I rewrote the end of the post to make it more clear. I had done it this way because in my experience people always have trouble with the conceptual part, not with the mathematical part. But I should just have given the complete derivation to start with.

As for the quantum mechanical part, I intentionally left it out, as the post is about Bell’s theorem, not quantum mechanics =)

Thanks for the update!

Thank you so much for this article. I am an interested amateur with an academic background in math + CS trying to learn about this topic. Yours is the most satisfying treatment I’ve read to date.

Could you explain more about why determinism (p(ab|xyλ) is 0 or 1) implies p(ab|xyλ) = p(a|xyλ)p(b|xyλ) ? For example, what if p(ab|xyλ) = 0 but p(a|xyλ) = 1, p(b|xyλ) = 1? I’m sure there is something obvious I’m missing here.

Ah, I think I understand now. Determinism implies the “factorizability” condition mentioned in http://mateusaraujo.info/2016/07/21/understanding-bells-theorem-part-2-the-nonlocal-version/.

But why do you say that “at least one of the three assumptions behind the theorem must be false”? As far as I understand, and based on your description in the second blog post:

– we can give up “no conspiracy” and this gives us a model which explains the results. The model is “when we changed the settings x and y, we changed the physical system λ, so of course we see different things for different settings”.

– we can give up “no action at a distance” and this gives us a model which explains the results. The model is “the particles communicate instantaneously — when the first is measured, it tells the other one what it was measured as.”

– but if we give up “determinism” this is actually not enough to explain the results. As you say, “This implies that just giving up on determinism does not allow you to violate Bell inequalities”.

Hi Eli,

Thanks for your comments. About determinism implying factorisability, this is a theorem. Your counterexample doesn’t work because if $p(ab|xy) = 0$ then the marginals $p(a|xy)$ and $p(b|xy)$ cannot both be one, as they are defined as $\sum_{b'} p(ab'|xy)$ and $\sum_{a'} p(a'b|xy)$. To prove it, you need to show that if $p(ab|xy)$ is always either zero or one then \[p(ab|xy) = \Big( \sum_{b'} p(ab'|xy) \Big)\Big( \sum_{a'} p(a'b|xy) \Big).\] It’s not hard.

About my sentence “at least one of the three assumptions behind the theorem must be false”, this is just a logical statement. The theorem shows that $A \land B \land C$ is false, so $\lnot(A \land B \land C) = \lnot A \lor \lnot B \lor \lnot C$ is true, that is, at least one of the three assumptions must be false.

Maybe what I wrote in the following post is confusing. One can have a model that respects *no conspiracy* and *no action at a distance* that does in fact violate Bell inequalities, so in this sense it *is* enough to give up determinism to explain the results. The problem is that this model must violate *factorisability* and therefore *local causality*, so we can’t just give up determinism and live happily ever after; we also need to give up something precious, namely *local causality*. This is what I meant with “just giving up on determinism does not allow you to violate Bell inequalities”, I hope it is clear now.

Thanks for the response! I much appreciate it.

Re: the first part — I see my trivial error now.

Re: the second — ah, that’s very helpful. Yes, that wording was confusing me. To summarize my understanding now: “determinism” is a strong condition which, in the simple version of Bell’s Theorem, is used in conjunction with “no action at a distance” to get “factorisability”. Giving it up does allow you to violate the inequality, but the resulting model violates “local causality”.

I have two further questions, if you have further time:

– Is there an intuitive description of what a model that respects only “no conspiracy” and “no action at a distance” would look like? (Or is this simply quantum mechanics, with entanglement and so on? Forgive my question: I am learning about Bell’s Theorem without knowing the rest of QM. I think the confusion may stem from the concepts of “entanglement” and “action at a distance” looking similar.)

– To my layman’s eyes, the condition of “no action at a distance” looks weaker than it needs to be. You have $p(a|xyλ) = p(a|xλ)$ and $p(b|xyλ) = p(b|yλ)$, i.e. Alice’s measurement does not depend on Bob’s settings and vice versa.

Yet if you had written the full “factorisability” condition here $p(ab|xyλ) = p(a|xλ)p(b|yλ)$ in place of what you had written for “action at a distance”, I would not have batted an eye — it seems like e.g. Newton’s paragraph above would demand that, not only does Alice’s measurement not depend on Bob’s settings and vice versa, but those measurements are independent since they are causally separated.

Why would a physicist not consider this to follow, that “no action at a distance” should demand that Alice and Bob’s measurements are independent?

1 – Is there an intuitive description of what a model that respects only “no conspiracy” and “no action at a distance” would look like?

Yes, this is simply quantum mechanics. Actually, this question is a minefield: everybody disagrees about what “quantum mechanics” is, and the answer depends on which interpretation of quantum mechanics one takes. But if you don’t wanna go into that, what most people mean by “quantum mechanics” is indeed such a model.

2 – Why would a physicist not consider this to follow, that “no action at a distance” should demand that Alice and Bob’s measurements are independent?

Well, “a physicist” might; there has long been debate about what “no action at a distance” – or locality – should mean. Bell was on your side: he thought that the best way to formalise locality was through “local causality”, which directly implies “factorisability”. Many people support “no action at a distance” not because they think it is strong enough, but rather because it is the only locality condition that can be salvaged from Bell’s theorem. Why defend “local causality” as the correct formalisation of locality, when we know it is false?

My own opinion is that “local causality” is the way to go – and we can indeed salvage it, in the Many-Worlds version of Bell’s theorem.

Thank you! I think I may have a robust understanding at last. (We’ll see after I try to explain to my friend I’m studying with.)

In particular the insight that definitions of what “no action at a distance” should mean are in dialogue with QM, not fully formed things that are contrasted against it, is helpful.

You’re welcome. You might want to take a look at Bell’s paper “La Nouvelle Cuisine”, it has an enlightening discussion of several concepts of locality.

I am an “engineering type” (not a physics guy). Thanks for attempting to explain Bell’s Theorem as “you would have liked to have heard it when you were first searching for understanding”. I was following very nicely until I got to the paragraph below:

“Having their settings and outcomes defined like this, our experimenters measure some conditional probabilities $p(ab|xy)$, where $a,b$ are Alice and Bob’s outcomes, and $x,y$ are their settings. Now they want to explain these correlations. How did they come about? Well, they obtained them by measuring some physical system $\lambda$ (that can be a quantum state, or something more exotic like a Bohmian corpuscle) that they did not have complete control over, so it is reasonable to write the probabilities as arising from an averaging over different values of $\lambda$. So they decompose the probabilities as

\[ p(ab|xy) = \sum_\lambda p(\lambda|xy)p(ab|xy\lambda) \]”

It wasn’t the conditional probabilities that threw me (ie. p(ab|xy) is meaningful to me), but rather the meaning of “p(λ|xy)”….the Probability of the “System”, given “xy” has occurred? I have no idea what this means? Any help?

Thanks Dave

Hi Dave,

The thing is that the system $\lambda$ doesn’t need to be fixed, and in practice it never is. In quantum mechanics $\lambda$ would be the quantum state produced by (e.g) a laser, but lasers fluctuate, fibers fall out of alignment, the atmosphere interferes, etc., so that in each round of the experiment a slightly different quantum state comes.

It is also conceivable that the changes of the quantum state depend on the settings $x,y$, which would give us a nontrivial $p(\lambda|xy)$. But, as I argue in the “no conspiracy” assumption, it is extremely implausible.
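
To illustrate what a nontrivial $p(\lambda|xy)$ would buy, here is a minimal Python sketch of a toy model. All of it is a hypothetical construction made up for illustration: $\lambda$ is a single bit, Alice always outputs 0, and Bob outputs $\lambda$. The responses are deterministic and local, so the only thing that changes between the two cases is whether $\lambda$ is correlated with the settings.

```python
from fractions import Fraction
from itertools import product

# Local deterministic responses (hypothetical toy model):
# Alice ignores lambda and always outputs 0, Bob outputs lambda.
def alice(x, lam):
    return 0

def bob(y, lam):
    return lam

def p_succ(p_lambda_given_xy):
    """Success probability when lambda is drawn from p(lambda|xy).

    `p_lambda_given_xy` is a function (lam, x, y) -> probability.
    The game is won when a XOR b == x AND y; inputs are equally likely.
    """
    total = Fraction(0)
    for x, y, lam in product((0, 1), repeat=3):
        if alice(x, lam) ^ bob(y, lam) == x & y:
            total += Fraction(1, 4) * p_lambda_given_xy(lam, x, y)
    return total

def independent(lam, x, y):
    """No conspiracy: lambda is a fair coin, whatever the settings are."""
    return Fraction(1, 2)

def conspiratorial(lam, x, y):
    """Conspiracy: lambda is perfectly correlated with the settings."""
    return Fraction(int(lam == x & y))

print(p_succ(independent))     # 1/2
print(p_succ(conspiratorial))  # 1
```

When $\lambda$ is allowed to depend on the settings, the same local deterministic responses win the game every time, far above the bound $3/4$.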

Mateus, I appreciate your help – thanks so much.

Am I interpreting your symbology correctly when I state:

“p(ab|xy)” = the “probability of a and b occurring jointly, given that x and y have occurred”?

Then, how might one similarly “finish the line” starting with:

“p(λ|xy)” = …….?

Dave

You’re welcome. $p(\lambda|xy)$ is the probability of the physical system being in state $\lambda$, given that the settings take the values $x$ and $y$.

For example, if you’re modelling quantum mechanics, $\lambda$ can be the state $\ket{\psi^-}$ in the ideal case, or some noisy version of it $\ket{\psi^-}'$ if some trouble happens, and you’re considering the probability that the quantum state is either of those.

OK, that helps. As I progressed down through your analytical formulation of the contents of your “Truth Table”, I hit a barrier in relating “ab” and “xy” upon the introduction of parameter “λ”. I see now that those familiar with QD formulations would likely have followed along fine.

I am trying to identify a simple elemental experiment that “raises a flag” that Entanglement could be going on.

I appreciate your help.

Dave

I’m sorry, my truth table? I don’t really understand what you are asking.

Whoops Sorry! I guess I was envisioning a Truth table in my mind as you hypothesized Alice and Bob’s findings (given their experiments’ settings x and y).

I originally was more or less in the EPR camp, and was trying to understand how the EPR argument was being refuted, all of which ended me up in Bell’s world :-)

I have also been attempting to identify a reasonably straight-forward experiment that FIRST raised flags that Entanglement could be in operation. If you can point me in a direction to identify such, I would appreciate it :-)

I apologize for all the “Neophyte Noise” I have injected in this stream.

Dave

No problem.

Long story short, EPR showed that quantum mechanics is nonlocal, and assumed that completing it with hidden variables would make it local. Bell then showed that nope, hidden variables do not help; any local hidden variable theory is in contradiction with quantum mechanics. In order to be compatible with quantum mechanics one needs to make the hidden variables nonlocal, which defeats the whole purpose of the thing.

About entanglement, I don’t know who was the first one that noticed it in an experiment; it was already implicit in the theory, and is exactly what EPR used in their argument. Right afterwards Schrödinger realised that entanglement is what made EPR’s argument tick, and invented the word. I guess the first experimental explorations of entanglement were precisely those prompted by Bell’s inequality, so the first one would be Freedman and Clauser’s experiment in 1972.

Nice summary Mateus…..thanks for indulging me.

I found it fascinating that Bell envisioned that “QD Entanglement” and “Hidden Variables” paradigms would actually manifest themselves differently in experimental results, and indeed quantified that difference analytically. I was just trying to heuristically internalize the meaning of that.

Was EPR inspired in response to the fact that Entanglement was an inherent part of QD theory, or, was it inspired by actual experimental findings?

Dave

You’re welcome.

I also find Bell’s insight fascinating, and I think the fame he got from it is well-deserved. His motivation for doing so is interesting: he was fascinated by Bohmian mechanics, the hidden-variable theory that was thought to be impossible, and was investigating the contradiction between its brutal nonlocality and EPR’s assertion that on the contrary, hidden variables should solve the nonlocality problem. So he decided to see if hidden variables could actually solve the nonlocality problem, and surprise surprise, they cannot.

About EPR: they were not directly inspired by any experiment, just by theory. Since quantum mechanics was formalised in the mid-twenties it was obvious that superpositions are essential for it, and already in the Solvay conference in 1927 Einstein presented a nonlocality argument based only on simple superpositions. It was much less dramatic than the EPR argument, though, and got much less attention.

Thanks for your insights here. I am finding that my understanding of this stuff is painfully incremental and contingent upon being exposed to different insights as well as repeated scrutiny of the same material. I will have to re-read the kickoff section of this blog many more times, as I notice that every time I go over it, some new aspect and understanding pops out. Now that’s what I call valuable!

Dave