## Implementing impossible functions via tomography

A long time ago I had a conversation with Daniel Nagaj about the following problem: given access to an unknown unitary gate $U$, is it possible to build the gate control-$U$ out of it? Nagaj argued that it was impossible, because $U$ and $e^{i\alpha}U$ are physically equivalent for any phase $\alpha$, but control-$U$ and control-$e^{i\alpha}U$ are physically distinct. This is true, but it felt to me too cheap: the global phase $\alpha$ is not even defined, so of course it is not possible to make it appear in the control. Moreover, why would anyone want to have this ghostly phase in the controlled unitary? The more meaningful question is whether is it possible to implement control-$e^{i\phi(U)}U$ for any phase $\phi(U)$.

A quick search through the literature revealed that nobody had answered this question, so I proved it was impossible and wrote a short and simple paper about it. There’s also the plot twist that implementing control-$U$ is often easy in practice, as any realistic implementation of an unknown unitary will have some extra information that makes it possible, but that’s another story.

In any case, at that time I believed that if one is given a large number of copies of $U$, one could just do process tomography on it, and thus easily implement control-$e^{i\phi(U)}U$. Luckily I didn’t write that down anywhere, because a couple of weeks ago Gavorová et al. proved that it is not possible to implement control-$e^{i\phi(U)}U$ for any number of copies of $U$, so in particular even tomography cannot do it.

Now how could this possibly be true? One can always do tomography, right? And if you did tomography, you can implement any function you want of the thing you tomographed, right? It turns out it is not so simple. If you successfully tomograph $U$, what you end up with is with a unitary $U’$ such that $\|\mathfrak{C}(U’) – \mathfrak{C}(U)\| < \epsilon$, where $\mathfrak{C}(U)$ is the Choi-Jamiołkowski representation of $U$. This complication is unfortunately necessary, as $U$ is not physically observable, only $\mathfrak{C}(U)$ is, for the very reason that $\mathfrak{C}(e^{i\alpha}U)$ does not depend on the global phase $\alpha$. It is just not true that $\|U’-U\| < \epsilon$, because we would need to tomograph the global phase to make it happen.

Ok, great, we have tomography, how about the function then? We want that $\|f\big(\mathfrak{C}(U’)\big)-f\big(\mathfrak{C}(U)\big)\| < \delta$, but this only follows from $\|\mathfrak{C}(U’) – \mathfrak{C}(U)\| < \epsilon$ if $f$ is continuous! If $f$ is not continuous then by definition there will exist a pair $U,U’$ such that $\mathfrak{C}(U’),\mathfrak{C}(U)$ are arbitrary close but $f\big(\mathfrak{C}(U’)\big),f\big(\mathfrak{C}(U)\big)$ are not. And Gavorová et al. showed precisely that if $f$ implements the controlled unitary it must be discontinuous.

The source of the discontinuity is not really the controlling, but just the killing off the global phase, so the seemingly simpler task of producing a unitary that does not depend on the global phase is equally impossible.

To be more precise, let $f$ be a function that takes $\mathfrak{C}(U)$ to a unitary $V$ such that $\mathfrak{C}(U) = \mathfrak{C}(V)$. Or equivalently, let $g$ be a function such that $g(U) = g(e^{i\alpha}U)$ for all $\alpha$ and $\mathfrak{C}\big(g(U)\big) = \mathfrak{C}(U)$. Then it is impossible to tomograph $U$ and produce $U’$ such that $\|f\big(\mathfrak{C}(U’)\big)-f\big(\mathfrak{C}(U)\big)\| < \delta$, or that $\|g(U’)-g(U)\| < \delta$.

To see why, let’s consider a specific example of such a function, $g(U) = \frac{u^*}{|u|}U$, where $u$ is the first nonzero element of the first column of $U$. Let
$U = \begin{pmatrix} \sqrt{p} & -\sqrt{1-p}\ e^{i\beta} \\ \sqrt{1-p}\ e^{i\alpha} & \sqrt{p}\ e^{i(\alpha+\beta)} \end{pmatrix}$ be an arbitrary unitary of dimension $2$ with a global phase already removed, so that for $p > 0$ we have $g(U) = U$, and for $p=0$ we have
$g(U) = \begin{pmatrix} 0 & -e^{i(\beta-\alpha)} \\ 1 & 0 \end{pmatrix}.$Suppose that $p >0$, $\alpha=\pi$, and $\beta=0$, so that
$U = \begin{pmatrix} \sqrt{p} & -\sqrt{1-p} \\ -\sqrt{1-p} & -\sqrt{p}\end{pmatrix}.$ Now if $p$ is small enough, this is very close to
$U’ = \begin{pmatrix} 0 & -1 \\ -1 & 0\end{pmatrix},$ so tomography might very well return us that instead. The problem is that
$g(U’) = \begin{pmatrix} 0 & 1 \\ 1 & 0\end{pmatrix},$ which is as far from $g(U)$ as it gets.

You might object that the probability of obtaining such $U’$ is extremely small; it is given roughly1 by the probability of measuring the state $U|0\rangle$ in the computational basis and always obtaining outcome 1, which is $(1-p)^n$. It is way too high for small $p$! In particular for fixed $n$, there is always a $p$ such that this probability is higher than any failure probability $f$ that we accept.

This breaks the fundamental promise of tomograhy: that for any error tolerance $\epsilon > 0$ and failure probability $f > 0$ you just need to measure $n$ copies of $U$ in order to obtain an estimate $U’$ that with probability $1-f$ is $\epsilon$-close to $U$. Sure, for any $U$ there will be an $n$ such that tomography works, but this doesn’t help us since we don’t know what $U$ is before we tomographed it!

Ok, ok, so tomography cannot converge to $g(U)$; it does converge instead to $g(U)$ modulo some random phase. Doesn’t seem so dramatic, does it? Since you didn’t care about the global phase to start with, why should you worry about whether it is random or a function of $U$? Well, yes, if you just do tomography it doesn’t matter, but perhaps you have some other, efficient protocol that does the control modulo a random phase. Now in every iteration of the protocol you might get a different phase, and if you just ignore this you’re going to lose coherence.

## Explicitly local quantum mechanics

I’ve just attended a nice online talk by Charles Bédard about his somewhat recent paper, where he studies the explicitly local model of quantum mechanics developed by Deutsch and Hayden, with a twist by Raymond-Robichaud. I think it’s a neat idea encumbered by horrible notation, so I’ll try to expose it in a more intelligible way here.

First of all, what is the problem? Quantum mechanics is local. Very local in fact, if you look at QFT. It’s Lorentz covariant, by construction. Non-relativistic quantum mechanics is not very local, as it is still non-relativistic, but still much more local than people usually think: the only outrageously nonlocal thing about it is the collapse of the wavefunction, but everybody knows that it is not real. Even Bell’s theorem doesn’t imply nonlocality, as long as you don’t pretend there is a wavefunction collapse happening.

Nevertheless, quantum states seem rather nonlocal. If Alice and Bob share an entangled state, and Alice acts locally on it, the quantum state changes in a seemingly global way, oblivious to the locality of her action. Furthermore, you can’t separate it in Alice’s part of the state and Bob’s part of the state, it only exists as an indivisible whole. Sure, you can define reduced density matrices for Alice and for Bob, but in this way you lose a lot of information, the quantum state is much more than the union of these parts.2

What Deutsch and Hayden showed is that it doesn’t need to be like this. It is possible to define a kind of quantum state that you can separate in Alice’s part and Bob’s part, in such a way that the global quantum state is just the union of both parts, and they both evolve in a perfectly local way: whatever Alice does cannot change Bob’s part and vice-versa. I’ll call them ontic states, because perfectly local states seem perfectly real.

The key idea is to define an ontic state based on the dynamics of quantum mechanics, since they are local: instead of focusing on a quantum state $\ket{\psi}$ we focus on the unitary $U$ such that
$\ket{\psi} = U\ket{0},$where $\ket{0}$ is a standard fixed state, like how on quantum circuits we usually assume that all qubits are initialized on the $\ket{0}$ state. The ontic state of the global system is then just a Choi-Jamiołkowski representation of $U$2
$\lambda(U) := \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger \ket{j}\bra{i} U.$So far so boring. Things get interesting when we represent the ontic states of the parts of the quantum system. First we rewrite the global ontic state explicitly as a composite system:
$\lambda^{AB}(U) := \sum_{ijkl} \ket{i}\bra{j} \otimes \ket{k}\bra{l} \otimes U^\dagger \big( \ket{j}\bra{i} \otimes \ket{l}\bra{k} \big)U,$ and we name the tensor factors as $A_1,B_1,A_2,B_2$, in that order. The ontic state of Alice’s part is then
$\lambda^{A}(U) := \tr_{B_1} \lambda^{AB}(U) = \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger \big(\ket{j}\bra{i} \otimes \id\big) U,$ and the ontic state of Bob’s part is
$\lambda^{B}(U) := \tr_{A_1} \lambda^{AB}(U) = \sum_{kl} \ket{k}\bra{l} \otimes U^\dagger \big( \id \otimes \ket{l}\bra{k}\big) U.$First we show that the ontic states of the parts do evolve locally: if Bob applies an unitary $V$ to his part of the system, the global unitary is now $U’ = (\id \otimes V) U$. Alice’s ontic state is
\begin{align*}
\lambda^{A}(U’) &= \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger (\id \otimes V^\dagger)\big(\ket{j}\bra{i} \otimes \id\big) (\id \otimes V)U \\
&= \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger \big(\ket{j}\bra{i} \otimes V^\dagger V \big) U \\
&= \sum_{ij} \ket{i}\bra{j} \otimes U^\dagger \big(\ket{j}\bra{i} \otimes \id \big) U \\
&=\lambda^{A}(U),
\end{align*} so Bob cannot change it, whatever he does.

Now we show that the ontic state of the global system is just the union of the ontic states of the parts. In fact it’s just the product of them, with some strategically placed indentities:
$\lambda^{AB}(U) = \big[\lambda^{A}(U)\otimes \id_{B_1}\big]\big[\lambda^{B}(U)\otimes \id_{A_1}\big].$ That’s it. A demonstration in a couple of lines of something that was widely regarded as impossible. Neat, no?

We should also remark that these ontic states do give you back the usual quantum states. The global quantum state is
$\rho^{AB} := \bra{0}_{A_2B_2} \lambda^{AB}(U) \ket{0}_{A_2B_2} = U\ket{0}\bra{0}U^\dagger,$ and the reduced density matrices are given by the analogous formulas:
$\rho^{A} := \bra{0}_{A_2B_2} \lambda^{A}(U) \ket{0}_{A_2B_2} = \tr_B\big(U\ket{0}\bra{0}U^\dagger\big),$ and
$\rho^{B} := \bra{0}_{A_2B_2} \lambda^{B}(U) \ket{0}_{A_2B_2} = \tr_A\big(U\ket{0}\bra{0}U^\dagger\big).$

One might wonder if it is possible to do a similar trick using just the quantum state $\ket{\psi} = U\ket{0}$ instead of the unitary $U$ that produces it. As Deutsch and Hayden argue (and Raymond-Robichaud gives a formal proof), the answer is no. The crucial problem is that $(X \otimes \id)\ket{\phi^+} = (\id \otimes X)\ket{\phi^+}$. That is, Alice can act on a quantum state in a completely local way, but you can’t tell from looking at resulting state whether it was changed by Alice or by Bob. This never happens if we look at $U$ instead: if $(S\otimes \id)U = (\id \otimes T)U$ then $S$ and $T$ must be just identity with a phase.

With the introduction done, we can speak about Bédard’s talk itself. He was interested in the number of parameters that are necessary to describe the ontic states. He showed that global ontic state of $n$ quantum systems of dimension $d$ requires $d^{2n}-1$ parameters. This might be obvious from the Choi-Jamiołkowski representation, as it is just the number of parameters of the global unitary $U$ (the global phase doesn’t count), but it was not obvious from the Deutsch-Hayden notation he was working with. More interestingly, he showed that the ontic state of a single quantum system needs $(1-d^{-2})d^{2n}$ parameters. Proving this is difficult in either notation, but intuitively we can expect this number to be true, since the ontic state of a single system does not depend on unitaries that act on the other systems. Since these have $d^{2n-2}-1$ parameters, we subtract this from the number of parameters of the global unitary and voilà!

Now, the dimension of the ontic state of a single system grows exponentially with the number of systems. This might seem strange, but Bédard had a good argument about why this is actually necessary: the quantum state of $n$ systems of dimension $d$ already needs $2d^n-2$ parameters. Since we want the local ontic states to recover at least the global quantum state, if follows that each must have at least $\approx d^n/n$ parameters, if we assume they are uniformly spread.

There was a lively discussion after the talk about Wallace and Timpson’s criticism of the Deutsch-Hayden model. They notice that for any unitary W that acts on the whole Hilbert space except $\ket{0}$, the unitary $U(1\oplus W)$ will give rise to the same quantum states, and therefore to the same observations, but in general different ontic states. They argue that this $W$ is akin to a gauge freedom, and we should identify all these different ontic states, lest we suffer from indeterminism and (ontic) state underdetermination. If we do that, however, we end up back with just regular quantum states and their nonlocality, making the Deutsch-Hayden model completely pointless.

Wallace and Timpson missed a very important point, however. Doing quantum state tomography is not the only way to determine $U$; this would be a hopelessly positivist point of view. No, determining $U$ is determining what are the dynamics of our theory. Doing this is what science is all about, and we use all tools we can to do that, more importantly assuming that the fundamental dynamics do not change with time, and that we can reproduce experiments. Once we know that fundamental dynamics of a (closed) system, we know what $U$ is, there is no underdetermination; the fact that $U(1\oplus W)$ would also give rise to the same observations is not relevant, it’s just a fun fact. The transformation from $U$ to $U(1\oplus W)$ is not a gauge transformation, it is a transformation from the real dynamics to some incorrect dynamics.

Posted in Uncategorised | 6 Comments

## Why QBism is completely empty

A good friend of mine, Jacques Pienaar, has recently converted to QBism, as often happens to people that spend too much time around Chris Fuchs. Saddened by these news, I’ve decided to write a blog post explaining why QBism doesn’t contribute anything to understanding Nature. On the contrary, it is a step backwards, as it doesn’t accept the basic premise that there is a world out there and that our job is to understand it. QBism insists that all the puzzles and paradoxes of quantum mechanics don’t actually matter because it’s all subjective anyway.

The first problem with this attitude, of course, is that it’s solipsism. QBists maintain that quantum states are subjective, merely a degree of belief, and therefore everything that is ultimately a quantum state is also subjective: measurement outcomes, Hamiltonians, unitaries, atoms, molecules, trees, capivaras, other people, planets, galaxies, etc. Not yourself, though. Despite being made of atoms, you somehow avoid this subjectiveness, and are unquestionable real. You are also definitely not a quantum state, and cannot be in a superposition. But isn’t that inconsistent with other people being subjective? No, say the QBists. Quantum theory, they maintain, is a single-user theory. You can use it, and it will work for you. Jacques can also use it, and it will work for him. But you both together? Nope, that’s forbidden. And when your predictions conflict with Jacques’? It doesn’t matter, everything is subjective anyway. Hence, solipsism.

Hopefully I don’t need to spend more than a couple of lines explaining why solipsism is such a terrible idea, because I won’t. I’ll merely remark that solipsism can explain any conceivable experience you might have, being therefore no explanation at all. Also that it is not even possible for two different people to agree that solipsism is a great idea, because they will each have their own, incompatible version of solipsism.

Solipsism has such a bad reputation that QBists bristle at this accusation and reply, offended, that of course they are not solipsists, they accept the existence of an objective reality. When pressed on what this objective reality is, then, if not quantum states, they reply that Hilbert space dimension is definitely real. And… that’s it? We just have some Hilbert spaces with some given dimensions, and we’re supposed to model the world with that? Of course not, they say, clearly Hilbert space dimension is not enough. But they can’t say what else is real, and don’t seem terribly interested in finding out. QBism is nearly two decades old, and still they only go on and on about the subjective stuff. I find this attitude incomprehensible. Surely after you identify which parts of your theory are merely subjective, you would discard them and focus on understanding the objective parts, you know, to understand the world out there?

Ok, if they can’t say what is real, can they at least say how their subjective quantum states relate to reality? Are they a probability distribution over the unknown real stuff, as in a $\psi$-epistemic model? Some sort of hidden-variable theory? No, no, no, they say, of course not, they would then be vulnerable to all sorts of no-go theorems. Their quantum states are just beliefs, and how these beliefs relate to reality can never be specified. See, you can never be wrong if you don’t say anything definite!

Let’s turn our attention to the QBist “explanation” for two quantum puzzles. First, the old classic, the nonlocal collapse of the wave function. The explanation, as you might have guessed, is that it doesn’t matter, because the wave function is just subjective anyway. Well, ok, but what happens in reality then? How are the nonlocal correlations produced? Their answer is “Who knows, who cares, and what is this reality thing anyway? And by the way we’re definitely not solipsists!” Over and over again I hear somebody repeating that this QBist “explanation” works as well as the explanation for the nonlocal collapse of classical correlations. It doesn’t. In the classical case the probability distribution that collapses is indeed subjective, arising from ignorance of the actual state of the system. And we can explain how the correlations arise, because the local hidden-variables theory is actually true! Classical relativistic mechanics is indeed local and deterministic. But to apply this explanation to the quantum regime makes no sense: the local hidden-variables theory is not true. And to simply ignore the necessity of being able to produce the correlations in a local way, as the QBists do, is just ridiculous.

The second puzzle is Wigner’s friend, the gedankenexperiment that’s all the rage in the arXiv nowadays. I have explained it in detail before, but the short version is that Wigner’s friend is stuck inside a decoherence-free box, where she makes a measurement on a qubit that’s in the state $\frac{\ket{0}+\ket{1}}{\sqrt2}$. Assuming that the measurement is unitary, this results in the usual world-splitting represented as
$\ket{\text{Wigner}}\frac{\ket{\text{friend}_0}\ket{0} + \ket{\text{friend}_1}\ket{1}}{\sqrt2}.$Wigner will make a measurement of his friend together with the qubit in the Bell basis, expecting to get result $\Phi^+$ with probability 1. If the friend is a Many-Worlder, both of her copies friend$_0$ and friend$_1$ will have no trouble agreeing with this conclusion, but if she is a QBist both of her copies will believe that her measurement caused a collapse, and therefore that Wigner will get result $\Phi^+$ or $\Phi^-$ with probability $1/2$. It takes a bit of mental gymnastics to get to this result, as QBism forbids her from assigning a quantum state to herself and to consider what Wigner will observe. In any case, her expectations are incompatible with Wigner’s, but that’s ok, as everything is subjective anyway and therefore it is not a problem if different agents assign different probabilities to the same event.

That’s what Rüdiger Schack said in a lecture in Vienna several years ago. Apparently this conclusion was too solipsistic even for the QBists, because they recently released a paper with a different take on this: now they refuse to say which probabilities the friend will assign to Wigner’s measurement, saying that it depends on too many things. They emphasize that the friend is not required to agree with Wigner, and that she can anyway learn what the relevant probabilities are if the experiment is repeated many times.

Now that’s just lame. This is a gedankenexperiment, for the love of Zeus, you’re free to specify whatever you think was not precise enough in order to get an answer. Just refusing to give one is the same trick of never being wrong by never saying anything.

That’s why I much prefer Bohmian mechanics and collapse models to QBism. At least they’re trying to do the right thing, and they’re wrong. QBism is not even trying, and is not even wrong.

EDIT: Shortly after posting this I got an angry email from Chris Fuchs. He sent a link to the post to his group, cc’ed to me, with the comment:

An actual scientist would wait until understanding what he was talking about before speaking. But the author of this surely knows his own inadequacies deep inside. My guess is they eat at him.

I think he needs a hug.

Posted in Uncategorised | 44 Comments

## Mistakes, mistakes, mitsakes

When I write someone to tell them there’s a mistake in their paper, I usually get one of these five reactions, ordered from best to worst:

• “It’s not a mistake.” (and they’re right)
• “Indeed, that’s wrong, I’ll correct it.”
• “It’s not a mistake.” (and they’re wrong)
• “Indeed, that’s wrong, but I’ll not correct it.”
• “Indeed, that’s wrong, people already told me about it, but I’ll not correct it.”

The first two need no explanation. The third one is of course frustrating to hear, but sometimes people will just disagree about what is a mistake or not, and we need to live and let live. The fourth one is really annoying. Your paper is wrong, just fix it, how can you leave something wrong on the arXiv? It’s so easy to fix! But maybe it really doesn’t make a difference, perhaps I’m the only one that even read that part of the paper and got sidetracked by it. But the fifth one really drives me crazy: if several people told you about the mistake it shows that it does matter, it is actually causing problems.

Anyway, I’m ranting about this because I’ve recently updated several of my papers on the arXiv to fix some mistakes. Maybe all my papers are wrong, but at least I correct them when I’m told about the problem. I even get happy about it, it’s a proof that somebody actually reads them. And when the mistake is in Appendix C, wow, respect, you need some determination to reach all the way down there.

Luckily in all the cases the mistakes were only in the proofs, and didn’t change any of the actual results of the papers. In two of the cases, arXiv:1401.8127 and arXiv:1611.08535, it’s easy to understand why: we just had written down the result in a particular way in the first draft, then decided to change the presentation for clarity, did a sloppy job in the rewriting, and ended up with some inconsistent equations. Another case, arXiv:1706.09854, it’s a bit mysterious: we had a completely wrong derivation leading up to a correct quantum circuit. I guess what happened is that we had found the quantum circuit in a different way, and afterwards thought about a way to derive it that didn’t involve black magic, because you really can’t write that you got the answer by sacrificing a black chicken for Exu at some crossroads. We can’t remember what was the original derivation, if it ever existed, but now we managed to find another derivation without black magic that does work.

The last case, arXiv:1903.10526, was really just blind luck. I really had the wrong thing in mind when deriving the result, which does follow from the wrong argument. It’s just that, by chance, it was correct. Zombie Schrödinger guided my hands on this one.

A colleague of mine, Simon Morelli, told me about a fascinating puzzle, the two-envelope paradox: you can take one of two envelopes with money inside them, promised to contain one $x$ and the other $2x$ for some $x > 0$3. You take the envelope and see that it has a quantity $y$ inside. Now you are given the option of switching to the other envelope. Should you?

Naïvely, you might reason that with probability 1/2 the other envelope will contain $2y$, and with probability 1/2 it will contain $y/2$, so the expectation value of switching is
$\frac122y + \frac12\frac{y}2 = \frac54y > y,$ and therefore it is always advantageous to switch, independently of the value of $y$. You don’t even need to look at it, you can just switch immediately after taking the envelope. And if offered the chance to switch again, you would. Although at this point most people would have realised that something is wrong and would stop the nonsense, only badly programmed robots would be stuck in an infinite loop.

What did go wrong, though? Checking the Wikipedia article about it won’t help, it’s a Library of Babel with incorrect solutions, correct solutions, several versions of the problem, and general babble. We have to work out the solution ourselves (and by doing that, increasing the size of the library, but hopefully increasing the proportion of correct solutions).

Well, the assertion that this probability above equals 1/2 is obviously suspect, we need to justify that somehow. It is either the probability that the envelope you chose, given $y$, is the one with the smallest amount of money, so $p(S|y)$, or the probability that it is the one with the biggest amount of money, so $p(B|y)$. The correct expectation value is then
$2y p(S|y) + \frac{y}2 p(B|y),$ and the question of whether this is always larger than $y$ is the question of whether
$p(S|y) > 1/3$ for all $y$. How do we calculate it? By Bayes’ rule we have that
$p(S|y) = \frac{ f(y|S)p(S) } { f(y|S)p(S) + f(y|B)p(B) } = \frac1{1+\frac{f(y|B)}{f(y|S)} },$ where $f(y|S)$ and $f(y|B)$ are the probability densities of having $y$ in the envelopes $S$ and $B$, respectively, and we are assuming that $p(S) = p(B)$, that is, that the probability of picking either envelope to start with is the same.

We just need to know these densities then to solve the problem. Since they’re not given we can’t actually know whether it is a good idea to switch, but we can reason about which densities give us which conclusions, and in particular which density gives us the paradoxical conclusion that switching is always advantageous.

First not that the densities are not independent; since envelope $B$ is constrained to have twice the amount of envelope $S$, it follows that2
$f(y|B) = \frac12 f(y/2|S).$ Now let
$f(y|S) = \frac1M [y \le M] \quad\text{and}\quad f(y|B) = \frac1{2M} [y \le 2M],$ that is, we assume that the amount of money in envelope $B$ is uniformly distributed from 0 to some upper bound $M$. A perfectly reasonable, physical, probability density. It results in $p(S|y) = 2/3$ for $y \le M$, and $p(S|y) = 0$ for $y > M$. Which agrees very well with intuition: if your envelope contains less than $M$, chances are you got the shorter end of the stick and you should switch, and if has more than that you certainly got lucky, and shouldn’t switch.

Now we can let $M$ go to infinity, which is unphysical, but gives us a well-defined result: $p(S|y) = 2/3$ for all $y$, and you should always switch. Which again agrees with intuition, twofold: first of all, you make a nonsensical assumption, and got a nonsensical answer. Secondly, if every positive real number is in fact equally likely, it is always more probable that you got the smallest one. What doesn’t agree with intuition is that $p(S|y)$ is $2/3$ instead of $1/2$, as the naïve calculation assumed, presumably for such a uniform distribution. Can we make it be equal to $1/2$ with some other density?

For that we need $f(y|S) = f(y|B) = \frac12 f(y/2|S)$ for all $y$. Now I’m not a mathematician, but I bet that the unique solution to this functional equation, given some regularity assumption, is $f(y|S) = k/y$ for a positive constant $k$. Which is indeed the Jeffrey’s prior for a number distributed on the positive half-line, a much more meaningful notion of uniform distribution than the previous one, which makes sense for the whole real line. This is not a normalisable density, though, which is probably for the best: it would give the paradoxical conclusion that it is always advantageous to switch envelope.

We can make this a bit more precise if we consider it as a limit of normalisable densities. Let then
$f(y|S) = \frac1{y \log(M^2)} [1/M \le y \le M]$
and
$f(y|B) = \frac1{y \log(M^2)} [2/M \le y \le 2M].$
This gives $p(S|y) = 1$ for $y \in [1/M,2/M]$, as you certainly got envelope $S$, the desired $p(S|y) = 1/2$ for $y \in [2/M,M]$, and $p(S|y) = 0$ for $y \in [M,2M]$, as you certainly got envelope $B$. If we let $M$ go to infinity again, we have indeed $p(S|y) = 1/2$ for all $y$, as our intuition would predict, but probably shouldn’t.

Another intuition says that you should always be indifferent to switching, as you have no idea about what $f(y|S)$ is, so learning $y$ doesn’t give you any information. Maybe there exists an $f(y|S)$ for which this is true? We need $2f(y|S) = f(y|B) = \frac12 f(y/2|S)$, which I bet has as unique solution $f(y|S) = k/y^2$. Again this is not normalisable, so this intuition is on rather shaky ground.

From all these examples, one might think that the paradox only arises from unnormalisable densities, but this is not the case. As shown by Broome, it also arises from the innocent-looking
$f(y|S) = \frac1{(1+y)^2}.$ It results in
$p(S|y) = \frac{y^2 + 4y + 4}{3y^2 + 8y + 6},$ which is indeed always strictly larger than $1/3$.

No, the real problem is that these densities are unphysical. Broome claims that his isn’t:

Both my examples of paradoxical distributions are decent statistical
distributions. There are processes for generating either: […] My continuous distribution can be generated as follows. Pick a number $t$ at random from a uniform distribution between 0 and 1, and let $y$ be $t/(1-t)$. Then $y$ has the required distribution.

While it is true that this $y$ would have the required distribution, there’s no conceivable physical process that could generate it. You would need $t$ to actually come from the continuum, in particular being able to get arbitrarily close to 1 so that $y$ gets arbitrarily large. The continuum is only a mathematical abstraction, though, any physical process for generating $t$ will need to discretize it; if $\epsilon$ is the smallest step you can do, then the largest $t$ strictly smaller than 1 is $1-\epsilon$, which gives you the largest $y$ as $(1-\epsilon)/\epsilon$. But having any upper bound on $y$ is enough to dispel the paradox.

Posted in Uncategorised | 2 Comments

## The problem with being your own father

is not really a moral one. In the typical time travel story you were abandoned as a child, so after you travel back in time you don’t even know who is supposed to be your mother. Now what can you do: refrain from sex completely in order to avoid possibly having sex with your mother? If you were successful you’d just erase your own existence, which is widely considered undesirable, and is anyway impossible, as it would be a grandfather paradox. Or just a father paradox, in this case. This would just make your trip in time a miserable affair. You might as well accept your inexorable destiny and enjoy your time in the past.

No, the problem is, what the hell is your DNA? It needs to respect the consistency condition that the DNA of you-father is exactly the same as the DNA of you-child. But the DNA of you-child is determined by the recombination of the DNA of you-father with the DNA of your mother. For instance, the pair of homologous chromosomes 1 of you-father, F1 and F2, recombine with the your mother’s pair, M1 and M2, giving the possible pairs of you-child as F1M1, F1M2, F2M1, and F2M2 (ignoring crossover). Now F1F2 must be equal to one of these four pairs, which means that at least one of your chromosomes must be equal to your mother’s! That’s nothing surprising if you don’t time-travel, but if you do, it means that at some point you share a chromosome with this girl you just met.

Let’s say then that F1 = M1. Then the possible pairs of you-child are M1M1, M1M2, F2M1, and F2M2. The first possibility is bound to give you some horrible genetic disease, so let’s exclude that3. The second possibility would make you an identical twin of your mother, which is fine for chromosome 1, but cannot be the case for the sex chromosomes, unless you are one of the exceedingly rare true hermaphrodites and possibly also your own mother.

This leaves us with the most prosaic alternatives F2M1 or F2M2. Which are apparently unproblematic, until you ask where does F2 come from. It doesn’t depend on anything from the past, even probabilistic. It’s completely arbitrary, its genes don’t even need to be human, apart from the constraint that they shouldn’t make reproduction with humans impossible.

If that doesn’t disturb you, consider Everett and Roman’s undeserved Fields medal paradox2: one day you wake up to find on your desk a paper, signed by you, with a proof of the Riemann hypothesis. You read the paper, check that the proof is in fact correct, and publish it. You become famous for the result and eventually receive the Fields medal. You then find a time machine, and use it to deposit the paper on your desk just before you found it. Now, who wrote this proof? Who deserves the Fields medal?

## Meaningful Bell inequalities part 2: examples

Here I’ll show examples of the meaningful Bell inequalities I talked about in the previous post, where they are rephrased as nonlocal games and the gap between the local bound and the Tsirelson bound is made as large as possible.

First we have to talk about notation. Instead of writing Bell inequalities like
$\sum_{abxy} M^{ab}_{xy}p(ab|xy) \le L(M),$I’m going to focus on the coefficients $M^{ab}_{xy}$. Since the meaningful Bell inequalities are nonlocal games, the coefficients have a very specific form,
$M^{ab}_{xy} = \mu(x,y)V(a,b,x,y),$ where $\mu(x,y)$ is the probability that the referee asks questions $x$ and $y$ to Alice and Bob, and $V(a,b,x,y)$ is the probability that the referee accepts answers $a$ and $b$ to questions $x$ and $y$. Usually $V(a,b,x,y)$ is restricted to be either 0 or 1, so that it simply determines which answers will make them win or lose the game. Unfortunately we cannot make this restriction, as it would make it impossible to transform every Bell inequality into a nonlocal game. This might seem obvious, but curiously we couldn’t find any proof in the literature, so we had to do one ourselves in our paper, and it wasn’t easy. Even in the cases where the Bell inequality can be transformed into a nonlocal game with such a deterministic predicate, it will not necessarily be optimal, as we’ll see below in the CGLMP case, so there’s really no getting around that.

Braunstein-Caves inequalities

The first example I’d like to show are the Braunstein-Caves inequalities, known to the computer scientists as the Odd Cycle game (even though it works just as well for even cycles)3. Since they can be formulated as an XOR game, which is a particular case of unique games, their optimal version is just the XOR game, as I mentioned in the previous post. The game is very simply: it has $m$ inputs and 2 outputs per party, the inputs are uniformly distributed over integers $x,y \in \{0,\ldots,m-1\}$ that are either equal or differ by one mod $n$, and Alice and Bob win if they give equal outputs for all inputs except for $x=y=0$. More formally, the game has distribution over inputs
$\mu(x,y) = \frac1{2m}([x=y]+[x=y+1 \mod m]),$ where $[\Pi]$ is 1 if the proposition $\Pi$ is true and 0 otherwise, and predicate
$V(a,b,x,y) = [a\oplus b=\delta_{x0}\delta_{y0}],$where $\delta_{ij}$ is the usual Kronecker delta.

The local bound of this nonlocal game, the maximal probability of winning it with local hidden variables, is
$\omega_\ell = 1-\frac1{2m},$and its Tsirelson bound, the supremal probability of winning it with quantum mechanics, is
$\omega_q = \cos^2\left(\frac\pi{4m}\right).$ At first sight this is very disappointing, as $\omega_q-\omega_\ell$ decreases monotonically with $m$, so it would seem that the best game is the one for $m=2$, where it reduces to the CHSH game. What matters though is not really the gap $\chi$ between the local and Tsirelson bounds, but the expected $p$-value you get after playing $n$ rounds of the nonlocal game. The expected $p$-value is upperbounded by $(1-\chi^2)^n$, which reinforces the impression that the Braunstein-Caves game sucks, but a more refined upper bound is
$p \le \left(\sqrt{\omega_\ell \omega_q}+\sqrt{(1-\omega_\ell)(1-\omega_q)}\right)^{2n},$ which is actually minimized for $m=4$. So yes, the Braunstein-Caves game is actually better than the CHSH game for rejecting local hidden variables! This is not all that matters, though; even though the expected $p$-value is smaller, the Braunstein-Caves game for larger $m$ is still useful for self-testing and proving measurement incompatibility.

$I_{mm22}$ inequalities

The second example I’d like to show are the $I_{mm22}$ inequalities. Like the Braunstein-Caves inequalities, they have 2 outputs and $m$ inputs per party, but unlike them, they are facets of the local polytope, so we could expect the $I_{mm22}$ inequalities to be better in all aspects. Well, not for the gap. We only have numerics here, as the Tsirelson bounds are not known exactly, and the $I_{mm22}$ inequalities cannot be transformed into a unique game or a CGLMP game, the only cases where we know the optimal form analytically, but the numbers are pretty disheartening:

$n$ $\omega_\ell$ $\omega_q$ $\chi$
2 0.7500 0.8536 0.1036
3 0.8000 0.8502 0.0502
4 0.8333 0.8654 0.0321
5 0.8571 0.8795 0.0224

In this case the small gap is not misleading: the expected $p$-value does go down monotonically with $m$, and thus the best $I_{mm22}$ inequality for rejecting local hidden variables is in fact the one for $m=2$, where it also reduces to the CHSH. Again, the $p$-value is not everything: the $I_{3322}$ inequality is violated by quantum states that do not violate the CHSH inequality, and its Tsirelson bound violation is probably attained only with infinite-dimensional quantum states, so it is still a pretty interesting inequality.

The example of the $I_{3322}$ inequality is also a good illustration of the arbitrariness in how you write a Bell inequality: pretty much all papers that talk about it formulate it differently, and it is rather tedious to switch from one form to another. To solve this problem we formulated it as a somewhat nice nonlocal game with the bounds mentioned above: if Alice and Bob are given inputs $(0,1)$, $(1,0)$, $(0,2)$, or $(2,0)$, which happens with probability $1/10$ each, they have to give different answers. If they are given instead inputs $(1,2)$ or $(2,1)$, which again happens with probability $1/10$, they have to give equal answers. The last case is when they get inputs $(0,0)$ or $(1,1)$, which happens with probability $1/5$ each. In this case they have to answer anything other than $(0,0)$. We couldn’t find even mildly nice nonlocal games for higher $m$, but since the solution is far from unique hopefully somebody else will manage to do it.

CGLMP inequalities

The third example I’d like to show is the CGLMP inequalities. Unlike the Braunstein-Caves and $I_{mm22}$, the gap finally gets larger than CHSH’s. Apparently it is easier to get a larger gap by increasing the number of outputs than by increasing the number of inputs. In any case, that’s what CGLMP does: it has 2 inputs and $d$ outputs per party. Using the form presented here, it’s easy to massage them into the nonlocal game with input distribution $\mu(x,y) = 1/4$ and predicate
$V(a,b,x,y) = \sum_{k=0}^{d-1} \left(1-\frac{k}{d-1}\right)\left[a-b = (-1)^{x\oplus y}k + xy \mod d\right],$which as mentioned before is already the optimal one.

Curiously, this predicate is probabilistic: if the answers satisfy the condition $\left[a-b = (-1)^{x\oplus y}k + xy \mod d\right]$ for some $k$, the referee accepts them with probability $\left(1-\frac{k}{d-1}\right)$. Even more curiously, it is always possible to make this predicate deterministic, at the cost of having a non-optimal nonlocal game. The best deterministic predicate we found2 is
$V(a,b,x,y) = \left[ (a-b)(-1)^{x\oplus y} \ge xy \right],$ which reduces the size of the gap by a factor of $(d-1)/d$.

The optimal version has a local bound of $\omega_\ell = 3/4$ for all $d$. The Tsirelson bounds are pretty much known: the lower bounds found here match the upper bounds from the NPA hierarchy within numerical precision, so with all probability they are the actual Tsirelson bounds. There’s no proof, however, except for the $d=2$ case, which is as usual just the CHSH inequality. For $d=3$ there’s an almost nice expression, $\frac{1}{24}(15+\sqrt{33})$, but for higher $d$ the lower bound is just the norm of a $d \times d$ matrix with irrational coefficients, there’s little hope for a simple form. The limit is known, though: for large $d$ the Tsirelson bound converges to 1.

Here are some value for small $d$:

$d$ $\omega_\ell$ $\omega_q$ $\chi$
2 0.7500 0.8536 0.1036
3 0.7500 0.8644 0.1144
4 0.7500 0.8716 0.1216
5 0.7500 0.8770 0.1270

CHSH-$d$ game

The fourth example I’d like to show is the CHSH-$d$ game, introduced here. It looks rather innocent: it has $d$ inputs and $d$ outputs per party, with uniform distribution over the inputs $\mu(x,y) = \frac{1}{d^2}$ and predicate
$V(a,b,x,y) = [a+b=xy \mod d],$
so it is a straightforward generalisation of the CHSH game. It is also a unique game, so this form is already the optimal one. Seems simple, no? Nothing could be further than the truth. Neither the local bound nor the Tsirelson bound is known; the only analytical result is an upper bound to the Tsirelson bound,
$\omega_q \le \frac1d + \frac{d-1}{d\sqrt{d}},$ valid only when $d$ is a prime number or a power thereof, and known not to be tight in general. It was first proved here, but I find the later proof much easier to understand. A fascinating variant of CHSH-$d$ for which the above expression is in fact the Tsirelson bound was found here.

The local bound is known only for small $d$, found here together with good lower bounds on the Tsirelson bound:

$d$ $\omega_\ell$ $\omega_q$ $\chi$
2 0.7500 0.8536 0.1036
3 0.6667 0.7124 0.0457
5 0.4800 0.5375 0.0575
7 0.3878 0.4587 0.0709
11 0.3058 0.3325 0.0267
13 0.2781 0.2987 0.0206

We see that the gap is maximal for $d=2$, with a smaller peak at $d=7$, and seems to go only down afterwards. So for all its beauty, the CHSH-$d$ game is useless for the bare purpose of rejecting local hidden variables.

Tailored inequalities

Introduced here, these are an ingenious generalisation of the Braunstein-Caves and together with a variant of the CGLMP inequalities, with $m$ inputs and $d$ outputs per party. They are tailored for the purpose of achieving maximal violation with the maximally entangled state of dimension $d$, and have the very convenient feature that the local and Tsirelson bounds are known exactly for all $m$ and $d$.

At first sight they look quite complicated, but when reformulated as a nonlocal game they are actually rather straightforward: the distribution over the inputs is the same as for the Braunstein-Caves inequalities,
$\mu(x,y) = \frac1{2m}([x=y]+[x=y+1 \mod m]),$and like in the CGLMP case the predicate is probabilistic, being given by
$V(a,b,x,y) = \sum_{k=0}^{d-1} \alpha_k [ a-b + (-1)^{\delta_{xy}+1}k + \delta_{x0}\delta_{y0} = 0 \mod d],$and the $\alpha_k$ are where all the fun is happening:
$\alpha_k = \frac{g(k)-g(d-1)}{g(0)-g(d-1)},$ where
$g(k) = \cot\left(\pi\frac{2mk+1}{2md}\right).$Luckily this game shares the key property of the CGLMP that makes it already be the optimal solution: the predicate depends only on $a-b \mod d$, so we don’t need to do any numerics.

The non-signalling bound is simply 1, unlike in the paper, where it is a horrible expression, the local bound is
$\omega_\ell = 1-\frac1{2m},$the same as for the Braunstein-Caves inequalities (and also unlike in the paper, where it is another horrible expression), and the Tsirelson bound is
$\omega_q = \frac{d g(\frac{d-1}{2m})-g(d-1)}{g(0)-g(d-1)},$which I must admit is more complicated than the expression in the paper. Well, we cannot have everything. For fixed $d$ and large $m$ the Tsirelson bound converges to 1, and for fixed $m$ and large $d$ it converges to $\frac1{2m} + \frac{2m-1}{4m^2}\pi \cot(\frac{\pi}{2m})$.

The gap here decreases monotonically as a function of both $d$ and $m$, but not the expected $p$-value: it is minimised at $d=2$ and $m=4$, when the tailored inequality reduces to Braunstein-Caves. For a fixed $m$ increasing $d$ only decreases the $p$-value, but for a fixed $d$ the minimum seems to be attained at some $m$ that increases with $d$. If one wants a larger gap one could set $\alpha_k = (1-k/(d-1))$, so that the tailored inequalities strictly generalise CGLMP, but that would defeat both points of having a maximal violation for a maximally entangled state and having a simple expression for the Tsirelson bound.

Żukowski, Gisin, and Tsirelson

These are two closely related XOR games with the curious property that the measurement setting is a continuous variable, $x,y\in[0,1]$. One might hope that one can do better than CHSH by allowing this extravagance, but alas, it is not the case. I find them interesting anyway, because these games make it impossible to make the usual mistake of doing the experiment by estimating each individual conditional probability appearing in the Bell inequality. There are infinitely many! You actually need to do the experiment by playing the game: sampling the input according the probability distribution $\mu(x,y)$, and checking whether you won that round by looking at the predicate $V(a,b,x,y)$.

The first variant was introduced by Żukowski in 1993, and the second by Gisin in 1999. Żukowski proved the local bound for his variant, but not the Tsirelson bound. Gisin proved neither, and Tsirelson proved both bounds for Gisin’s variant.

Let’s start with Gisin’s version, the simplest one. Here the probability density over the inputs is uniform, $\mu(x,y) = 1$, and the predicate is given by
$V(a,b,x,y) = [a\oplus b = [x \ge y]].$ The local bound is $\omega_\ell = 3/4$, and the Tsirelson bound is $\omega_q = 1/2+1/\pi$. The gap is thus $1/\pi-1/4 \approx 0.068$, smaller than CHSH’s.

In Żukowski’s version the probability density of the inputs is nontrivial,
$\mu(x,y) = \frac1{8\pi}|\sin(2\pi(x+y))|,$and the predicate a bit weird3,
$V(a,b,x,y) = [a\oplus b = [x+y \ge 1/2 \mod 1]],$ but that pays off with a slightly larger gap: the local bound is $\omega_\ell = 1/2+1/\pi$, and the Tsirelson bound respects $\omega_q \ge 1/2+\pi/8$, so the gap is at least $\pi/8-1/\pi \approx 0.074$. Also, this nonlocal game has a nice generalisation for multiple parties, that in some sense is stronger than the usual WWŻB inequalities4

That’s it! I plan to use this post as a catalogue of the meaningful forms of cool Bell inequalities, so if you know some that I missed please let me know. I’m specially interested in Bell inequalities for which the gap increases with increasing number of inputs; I’m a bit surprised that I couldn’t find a single example of this happening in the physics literature, as the computer science literature is full of them.

## Meaningful Bell inequalities

Since I’ve started learning about Bell inequalities, it always bothered me that they are usually present as just some linear combinations of conditional probabilities, without any physical interpretation besides the fundamental fact that the maximal value this linear combination can attain when the probabilities come from local hidden variable models is smaller than when they come from quantum mechanics5.

For example, the most famous Bell inequality, the CHSH inequality, is usually written as
$\langle A_0 B_0 \rangle + \langle A_0 B_1 \rangle + \langle A_1 B_0 \rangle – \langle A_1 B_1 \rangle \le 2,$ where
$\langle A_xB_y\rangle := p(a{=}b|xy)-p(a{\neq}b|xy).$ It looks like just an arbitrary sum of conditional probabilities; it is justified by noting that this is a facet of the local polytope2, but still this doesn’t give an interpretation of what this sum measures.

To make things worse, there’s a lot of arbitrariness in how we write a Bell inequality: we can simply divide both sides by a positive constant3, which clearly doesn’t make a difference in whether the inequality is violated or not, add a combination of probabilities that sum to 1 to the left hand side and 1 to the right hand side, which also clearly doesn’t make a difference, or add a no-signalling constraint, which I’ll talk about later.

As a consequence, the local bound of the CHSH inequality, 2, is just an arbitrary number, and likewise the Tsirelson bound $2\sqrt{2}$, as these transformations change them willy-nilly. Valerio Scarani famously has a football t-shirt with the player number $2\sqrt{2}$. But he might as well have put $\sqrt{2}$ on it, as people sometimes write the CHSH inequality divided by 2, or pretty much any real number – as long as somebody wrote a paper with the corresponding version of the inequality ;)

One can notice, though, that the CHSH expression is maximized when the correlations are such that $a=b$ when the inputs are $(0,0)$, $(0,1)$, or $(1,0)$, and $a\neq b$ when the inputs are $(1,1)$, and reformulate it as the probability that this happens. The result is the inequality
$\frac14 \left( p(a{=}b|00) + p(a{=}b|01) + p(a{=}b|10) + p(a{\neq}b|11) \right ) \le \frac34,$which now has a nice interpretation as a nonlocal game: it measures the probability that Alice and Bob win the game of outputting $(a,b)$ for input $(x,y)$ such that $a\oplus b = xy$, when the inputs are given with uniform probability. The local bound $\frac34$ is no longer arbitrary, but it is the maximal probability of winning the game using local hidden variables, and likewise the Tsirelson bound $\frac{2+\sqrt{2}}{4}$ is the maximal probability of victory with quantum mechanics. As far as I know the first person that formulated the CHSH inequality like this was Boris Tsirelson, which makes it specially appropriate to call $\frac{2+\sqrt{2}}{4}$ a Tsirelson bound.

It turns out that every Bell inequality can be rewritten as a nonlocal game – provided we allow the winning condition of the nonlocal game be probabilistic – so all of them, together with their local and Tsirelson bounds, can be made meaningful in this way. End of story? No, because there is still arbitrariness in the nonlocal game formulation.

For example, one can add the equation
$\alpha\left(p(00|00) + p(01|00) + p(10|00) + p(11|00) \right) = \alpha$ to the nonlocal game form of the CHSH inequality (after multiplying it by 4, for convenience). We then divide it by $4+\alpha$, so that it is still a valid nonlocal game, albeit a weird one: when given input $(0,0)$, which happens with probability $\frac{1+\alpha}{4+\alpha}$, Alice and Bob win the game with certainty if their outputs $(a,b)$ are equal, and win the game with probability $\frac{\alpha}{1+\alpha}$ if they are different. That is, the referee will decided probabilistically whether to accept their answers. For the other inputs the winning condition doesn’t change, but they are now distributed with probability $\frac{1}{4+\alpha}$ each.

The local bound of the nonlocal game becomes $\frac{3+\alpha}{4+\alpha}$, and the Tsirelson bound $\frac{2+\sqrt{2}+\alpha}{4+\alpha}$. This might be surprising, since adding such a normalisation condition shouldn’t change anything! Well, it doesn’t change which correlations violate the inequality or not, but it certainly changes something: since both local hidden variables and quantum mechanics will by definition produce normalized probabilities, the normalisation condition cannot discriminate between them. Hence the larger the $\alpha$, the worse the nonlocal game will be at this task. Indeed both the local and Tsirelson bounds converge to 1 for large $\alpha$, as in this limit the game reduces to only asking the players to produce normalized probabilities, which they answer with “Yes, sure, no problem. But why?”.

Now, this is obviously silly, but we have to be able to deal with this silliness in a systematic fashion in order to remove this arbitrariness from the nonlocal game formulation. The way I propose to do it, in my new paper with Flavien Hirsch and Marco Túlio, is to optimise over the form of the nonlocal game to make the gap between the local and Tsirelson bounds as large as possible. We show that the expected $p$-value you get after playing $n$ rounds of a nonlocal game with gap $\chi$ is upperbounded by $(1-\chi^2)^n$, so there is a precise sense in which maximizing the gap between the local and Tsirelson bounds increases the power of the game for rejecting local hidden variables.

It turns out that optimizing over such normalisation conditions is rather easy: we just need to use them to make the smallest coefficient of each setting $(x,y)$ equal to zero, and rescale the game such that the largest coefficient of each setting is equal to one.

What is more difficult is dealing with the no-signalling constraints: we can also add to the CHSH game an equation like
$\gamma\left(p(00|00) + p(01|00) – p(00|01) – p(01|01) \right) = 0,$which simply says that the marginal probability that Alice gets result 0 doesn’t depend on whether Bob’s input $y$ is 0 or 1. Now both local hidden variables and quantum mechanics will by definition respect this equation, so again it doesn’t change which correlations violate the inequality or not, but again it does change the power of the game. After a bit of work to remove the negative coefficients and renormalise the game, we see that the local bound becomes $\frac{3+\gamma}{4+2\gamma}$ and the Tsirelson bound $\frac{2+\sqrt2+\gamma}{4+2\gamma}$. For large $\gamma$ both bounds converge to $1/2$, which makes sense: in this limit the game reduces to asking Bob to signal to Alice, but since they cannot do it, irrespectively of using local hidden variables or quantum mechanics, they simply guess randomly, with probability of success $1/2$.

In this case it is trivial to do the optimisation, we just set $\gamma = 0$, and we have the maximal possible discriminating power, but in general this is quite hard: the best we could do is formulate the optimisation as a linear program, so at least it is easy to do it numerically. We also managed to find analytical solutions for two particular cases: when the nonlocal game can be turned into a unique game4 (as is the case for CHSH), doing that makes it optimal, and when the nonlocal game can be turned into a CGLMP game5, doing that makes it optimal.

In these two cases the solution is unique, so maximizing the gap $\chi$ gives you the optimal local bound, the optimal Tsirelson bound, and the optimal form of the Bell inequality6. But in general life is not so easy: fixing the gap $\chi$ to be the maximal one is not enough to determine the local and Tsirelson bounds. We can deal with that by maximising the Tsirelson bound as well, to make it easier to win the nonlocal game with quantum mechanics.

This is enough to get you optimal, unique, and physically meaningful local and Tsirelson bounds. Pure bliss, no? A bit awkward is the fact that this is still not enough to fix the form of the nonlocal game itself. Perhaps it doesn’t matter, and the differences between these form are as inconsequential as relabellings. Or perhaps they do matter, and somebody else will figure out how to pick up the best among them.

I’ll show the optimal forms of some Bell inequalities in a separate post, as this one is getting too long already. But there’s one more thing I’d like to say: Valerio, if you’re reading this, put $\frac{2+\sqrt{2}}{4}$ on your t-shirt.

## The Choi-Jamiołkowski isomorphism: you’re doing it right

Yes, you are. There’s nothing to worry about. I’m just writing this blog post because there are some people trying to convince you otherwise, and I don’t want you to get trolled by them.

First of all, what is the Choi-Jamiołkowski isomorphism? It is one way to write a linear map $\mathcal E$ as a matrix $\mathfrak{C}(\mathcal E)$ so that the matrix is positive semidefinite if and only the map is completely positive. This is very convenient if you want to find out whether $\mathcal E$ is completely positive, or if you’re doing semidefinite programming. If all you want is to write the linear map as a matrix, you’re better off using a representation with a more convenient formula for applying it to a quantum state.

The usual way to define it is
$\mathfrak{C}(\mathcal E) := \sum_{ij} \ket{i}\bra{j} \otimes \mathcal E(\ket{i}\bra{j}),$and I assume this is what you’re doing. Yes, it is fine really, there’s nothing wrong with it. What Leifer, Allen et al., and Hoffreumon et al. complain about it is that it’s basis dependent. Which is true: if instead of the basis $\{ \ket{i} \}$ in the above formula you use the basis $\{ U\ket{i}\}$ for some unitary $U$, the resulting matrix does depend on $U$. To see that just let $U$ apply to $\ket{i}$ a phase that depends on $i$.

But so what? It will be a different matrix, but still a perfectly valid representation for $\mathcal E$. And why are you changing the basis anyway? I don’t think anybody ever did the Choi-Jamiołkowski isomorphism in a basis other than the computational one. Nevertheless, since the choice of basis clearly doesn’t matter for the isomorphism, perhaps it’s more elegant to use an isomorphism that doesn’t depend on it.

Leifer proposes using
$\mathfrak{J}(\mathcal E) := \sum_{ij} \ket{j}\bra{i} \otimes \mathcal E(\ket{i}\bra{j})$instead. It’s easy to see that this is indeed basis-independent, but it has a fatal flaw: it doesn’t represent completely positive maps as positive semidefinite matrices. For example, the representation of the identity map will have a negative eigenvalue. Since this was the main point of using the Choi-Jamiołkowski isomorphism in the first place, no thanks, I’ll pass.

Allen et al. and Hoffreumon et al. promise to give us the best of both worlds, an isomorphism which is basis-independent and represents completely positive maps as positive semidefinite matrices, by defining instead
$\mathfrak{B}(\mathcal E) := \sum_{ij} \ket{i}_{A^*}\bra{j} \otimes \mathcal E(\ket{i}_A\bra{j}),$where $\{\ket{i}_A\}$ is a basis for the primal Hilbert space $A$, and $\{\ket{i}_{A^*}\}$ the dual basis, belonging to the dual Hilbert space $A^*$, the space of linear functionals on $A$.

Now, what is this thing? What I understand as the dual of $\ket{i}$ is simply its bra $\bra{i}$, which would make this isomorphism the same thing that Leifer proposed. After a long discussion with Marco Túlio and Oreshkov, I finally understood what they meant: one can indeed write it as Leifer’s version of the isomorphism, but the operator $\ket{j}\bra{i}$ on the first tensor subsystem has to be understood as a map taking linear functionals to linear functionals, that is, it acts on bras to the left, being defined as
$(\bra{\psi})\ket{j}\bra{i} := \langle \psi | j \rangle \bra{i},$or, to use their notation,
$\ket{i}_{A^*}\bra{j}(\ket{\psi}_{A^*}) := \langle \psi | j \rangle \ket{i}_{A^*}.$Ok, this works, but it is quite inconvenient; I don’t want to use operators that act on the left or change the rules of the bra-ket notation in order to make calculations with this thing. Can’t we redefine this isomorphism so that it works in the usual way?

There is an easy way around it: remember that $\langle \psi | j \rangle = \overline{\langle j | \psi \rangle}$, so if we define these funny vectors as the complex conjugate of the usual vectors, $\ket{\psi}_{A^*} := \overline{\ket{\psi}_A}$, then linear algebra is back to normal.

We can then redefine the isomorphism as
$\mathfrak{B}(\mathcal E) := \sum_{ij} \overline{\ket{i}\bra{j}} \otimes \mathcal E(\ket{i}\bra{j}),$and it is easy to show that it is indeed independent of the choice of basis $\{\ket{i}\}$: let $\mathcal E$ be the identity map again (by linearity it is enough to consider this case). Its representation is given by
$\mathfrak{B}(\mathcal I) = \sum_{ij} \overline{U\ket{i}\bra{j}U^\dagger} \otimes U\ket{i}\bra{j}U^\dagger.$Inserting two resolutions of the identity matrix we get
\begin{align*} \mathfrak{B}(\mathcal I) &= \sum_{ijkl} \overline{U\ket{i}\bra{j}U^\dagger} \otimes \ket{k}\bra{k}U\ket{i}\bra{j}U^\dagger\ket{l}\bra{l} \\
&= \sum_{ijkl} \overline{U\ket{i}\bra{i}U^\dagger\ket{k}\bra{l}U\ket{j}\bra{j}U^\dagger} \otimes \ket{k}\bra{l} \\
&= \sum_{kl} \overline{\ket{k}\bra{l}} \otimes \ket{k}\bra{l},
\end{align*} where we used the fact that $\bra{i}M\ket{j} = \overline{\bra{j}M^\dagger\ket{i}}$.

This isomorphism is not actually basis-independent, though, because it depends on the basis we have chosen to do the complex conjugation in. And to add insult to the injury, the operator you get by choosing some specific basis for the complex conjugation is exactly the same as the operator you get by using this basis to define the usual Choi-Jamiołkowski isomorphism.

So yes, you are doing it right.

Posted in Uncategorised | 4 Comments

## A great month for the quantum switch

If you’re a fan of the quantum switch, like me, the arXiv brought two great news this month: first, an Argentinian-Brazilian-Chilean collaboration finally managed to implement a superposition of more than two causal orders, as I had been bothering people to do since I first worked on the quantum switch. After the experiment was done they invited me, Alastair Abbott, and Cyril Branciard to help with the theory, which we gladly accepted. With their fancy new laboratory in Concepción, they managed to put 4 quantum gates in a superposition of 4 different causal orders, using the path degree of freedom (in optical fibres) of a photon as the control system, and the polarisation degree of freedom as the target system. Using the original algorithm I had developed for the quantum switch, that would have been pointless, as that algorithm required a target system of dimension at least $n!$ (and maybe even $n!^{n-1}$) to make use of a superposition of $n$ gates. To get around that they came up with a new algorithm, that seems just as hard as the old one, but only needs a target system of dimension 2 to work.

They also did an elegant refutation of a criticism that was levelled to previous versions of the experiment: people complained that depending on the state of the control system, the photons were hitting the waveplates implementing the unitaries at different points, which acted differently. In this way the experiment wasn’t actually probing the same unitaries in different orders, but rather different unitaries, which isn’t really interesting. Well here the experimentalists have shown that if you let it all four paths shine at a waveplate implement a unitary simultaneously, you get a beautiful interference pattern, showing that the paths are indistinguishable at the waveplate, and thus the same unitary is implemented, independently of the control system.

The other piece of great news appeared on the arXiv today: Barrett, Lorenz, and Oreshkov managed to show that no bipartite causally nonseparable process is purifiable. This means that if you have a process matrix, encoding the causal relationship between two parties, Alice and Bob, and this process matrix is not simply a mixture of saying that Alice acts before Bob or Bob acts before Alice, but rather encodes an indefinite causal order, then it is not possible to purify this matrix, that is, it is not possible to recover this process from a larger process, with more parties, that is compatible with the principle of conservation of information.

Now I hold the principle of conservation of information very close to my heart, and proposed a purification postulate, that if a process is not purifiable then it is unphysical. If you believe that, then the result of Barrett et al. implies that the only way to put two parties in an indefinite causal order is really just the quantum switch7. I had asked this question 3 years ago, managed only to show that a few processes are not purifiable, and speculated that none of them are. Now they came and did the hard part, solving the question completely for two parties.

UPDATE 13.03: And today the arXiv brought a paper by Yokojima et al. that proved the same result independently. They formulated it in a different way, that I find more clear: if you have a pure process, which encodes the causal relationships between Alice, Bob, a global past, and a global future, then this process is necessarily a variation of the quantum switch.

This paper also has a well-hidden gem: they showed that if you try to superpose two different causal orders, without using a control system to switch between them, then this will never be a valid process matrix. This plays wells in the theme “the quantum switch is the only thing that exists”, and allows us to be careless about talking about superpositions of causal orders. Since we can’t have superpositions without a control, then there’s no need to distinguish those with a control from those without.

Apparently this had been shown by Fabio Costa some years ago, but he kept the result a secret and got scooped.

Posted in Uncategorised | 8 Comments