A great month for the quantum switch

If you’re a fan of the quantum switch, like me, the arXiv brought two great pieces of news this month: first, an Argentinian-Brazilian-Chilean collaboration finally managed to implement a superposition of more than two causal orders, as I had been bothering people to do since I first worked on the quantum switch. After the experiment was done they invited me, Alastair Abbott, and Cyril Branciard to help with the theory, an invitation we gladly accepted. With their fancy new laboratory in Concepción, they managed to put 4 quantum gates in a superposition of 4 different causal orders, using the path degree of freedom (in optical fibres) of a photon as the control system, and the polarisation degree of freedom as the target system. With the original algorithm I had developed for the quantum switch that would have been pointless, as that algorithm required a target system of dimension at least $n!$ (and maybe even $n!^{n-1}$) to make use of a superposition of $n$ gates. To get around that they came up with a new algorithm that seems just as hard as the old one, but only needs a target system of dimension 2 to work.

They also did an elegant refutation of a criticism that was levelled at previous versions of the experiment: people complained that, depending on the state of the control system, the photons were hitting the waveplates implementing the unitaries at different points, which acted differently. In this way the experiment wasn’t actually probing the same unitaries in different orders, but rather different unitaries, which isn’t really interesting. Well, here the experimentalists have shown that if you let all four paths shine simultaneously on a waveplate implementing a unitary, you get a beautiful interference pattern, showing that the paths are indistinguishable at the waveplate, and thus that the same unitary is implemented, independently of the control system.

The other piece of great news appeared on the arXiv today: Barrett, Lorenz, and Oreshkov managed to show that no bipartite causally nonseparable process is purifiable. This means that if you have a process matrix, encoding the causal relationship between two parties, Alice and Bob, and this process matrix is not simply a mixture of saying that Alice acts before Bob or Bob acts before Alice, but rather encodes an indefinite causal order, then it is not possible to purify this matrix, that is, it is not possible to recover this process from a larger process, with more parties, that is compatible with the principle of conservation of information.

Now I hold the principle of conservation of information very close to my heart, and have proposed a purification postulate: if a process is not purifiable then it is unphysical. If you believe that, then the result of Barrett et al. implies that the only way to put two parties in an indefinite causal order is really just the quantum switch. I had asked this question 3 years ago, managed only to show that a few processes are not purifiable, and speculated that none of them are. Now they came and did the hard part, solving the question completely for two parties.

UPDATE 13.03: And today the arXiv brought a paper by Yokojima et al. that proved the same result independently. They formulated it in a different way, which I find clearer: if you have a pure process, which encodes the causal relationships between Alice, Bob, a global past, and a global future, then this process is necessarily a variation of the quantum switch.

This paper also has a well-hidden gem: they showed that if you try to superpose two different causal orders, without using a control system to switch between them, then this will never be a valid process matrix. This plays well with the theme “the quantum switch is the only thing that exists”, and allows us to be careless when talking about superpositions of causal orders. Since we can’t have superpositions without a control, there’s no need to distinguish those with a control from those without.

Apparently this had been shown by Fabio Costa some years ago, but he kept the result a secret and got scooped.

Posted in Uncategorised | 8 Comments

Boris Tsirelson 1950-2020

Boris Tsirelson died on the 21st of January 2020 in Switzerland, via assisted suicide after being diagnosed with high-grade cancer. It is with great sadness that I type this news. I never met him personally, but I appreciate his work, and had several pleasant interactions with him online. As an undergrad student I asked him some inane questions about quantum correlations, which he patiently and politely answered. I also crossed paths with him on Wikipedia, where he was an avid contributor.

He was a famous mathematician, but his work in physics was not always recognized: the groundbreaking paper where he proved Tsirelson’s bound and started the characterization of the set of quantum correlations was published in 1980, but only started to get noticed by the time of the fall of the Soviet Union. Discouraged by the lack of interest, he decided to quit the field, and upon doing so made what he called his “scandalous failure”: he asserted without proof that the set of quantum correlations generated by tensor-product algebras is equal to the set of quantum correlations generated by commuting algebras. Today this is known as Tsirelson’s problem, since asking the right question is more important than getting the right answer. The story was told much better by Tsirelson himself in a contribution to the IQOQI blog.

The solution of Tsirelson’s problem a mere week before his death was serendipitous. He was fascinated by the result, and concluded: “Envy me, I am very lucky. I am comfortably leaving during a peak of fame.”

Posted in Uncategorised | 2 Comments

Infinity in Nature?

Last week a bomb landed on the arXiv: Ji et al. posted a proof that MIP*=RE which implies, among other goodies, a solution to Tsirelson’s problem. I’m speaking here as if their proof actually holds; I can’t check that myself, as the paper is 165 pages of serious complexity theory, and I’m not a computer scientist. What I could understand made sense, though, and the authors have a good track record of actually proving what they claim, so I’ll just be positive and assume that MIP* is in fact equal to RE.

My first reaction to the result was ill-informed and overly dramatic (which, of course, is what a journalist picked up from my comment on Scott Aaronson’s blog for the Nature News article about it), but a few days later I still find the result disturbing, and perhaps you should too.

To explain why, first I need to explain what Tsirelson’s problem is: it asks whether the supremum of the violation of a Bell inequality we can achieve with tensor product strategies, that is, with observables of the form $A_i = A_i'\otimes \mathbb{I}$ and $B_j = \mathbb{I} \otimes B_j'$, is the same as the one we can achieve with commuting strategies, where we only require that $[A_i,B_j] = 0$ for all $i,j$.

It is easy to show that these two values must coincide for finite-dimensional systems, and furthermore that the tensor product value can be approximated arbitrarily well with finite-dimensional systems, which opens up a terrifying possibility: if there exists a Bell inequality for which these two values differ, it would mean that the commuting value can only be achieved by infinite-dimensional systems, and that it can’t even be approximated by finite-dimensional ones! It would make it possible for an experiment to exist that would prove the existence of literal, unapproachable infinity in Nature. Well, Ji et al. proved that there exists a Bell inequality for which the tensor product value is at most $1/2$, whereas the commuting value is $1$. Do you feel the drama now?
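For contrast with these exotic games, the familiar CHSH game already shows finite dimensions at work: a tensor product of two qubits reaches the winning probability $(2+\sqrt2)/4\approx 0.85$, which is also the commuting value. Here is a minimal numerical sketch (my own illustration, not from the paper) of the optimal qubit strategy:

```python
import numpy as np

def obs(angle):
    # ±1-valued qubit observable cos(t) Z + sin(t) X
    Z = np.array([[1, 0], [0, -1]])
    X = np.array([[0, 1], [1, 0]])
    return np.cos(angle) * Z + np.sin(angle) * X

A = [obs(0), obs(np.pi / 2)]           # Alice's two settings
B = [obs(np.pi / 4), obs(-np.pi / 4)]  # Bob's two settings

# Maximally entangled state (|00> + |11>)/sqrt(2)
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)

# CHSH game: win iff a XOR b = x AND y, uniform inputs;
# p(win|xy) = (1 + (-1)^{xy} <A_x ⊗ B_y>) / 2
p_win = 0.0
for x in range(2):
    for y in range(2):
        corr = phi @ np.kron(A[x], B[y]) @ phi
        p_win += (1 + (-1) ** (x * y) * corr) / 2 / 4

print(p_win)  # (2 + sqrt(2))/4 ≈ 0.8536, the Tsirelson bound
```

The drama in the Ji et al. game is precisely that no such finite-dimensional strategy can even approximate the commuting value.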

Now for the soothing part: Ji et al. cannot show what this Bell inequality is, they cannot even say how many inputs and outputs it would need and, more importantly, they cannot show what the commuting strategy achieving the value $1$ is. It turns out that even showing a purely mathematical commuting strategy that can do this is really hard.

We need more than that, though, if we’re talking about an actual experiment to demonstrate infinity: we need the commuting strategy to be physical. That’s the part I was ill-informed about in my initial reaction: I thought it was natural for QFTs to only commute across spacelike separated regions, and not be separated by a tensor product, but this is not the case: not a single example is known of a non-pathological QFT that is not separated by a tensor product, at least when considering bounded spacetime regions that are spacelike separated and are more than some distance apart. Even that wouldn’t be enough, as a QFT that only commutes might be residually finite, which would mean that it can only achieve the tensor product value in a Bell inequality.

So, I’m not shitting bricks anymore, but I’m still disturbed. With this result the infinity experiment went from being mathematically impossible to merely physically impossible.

UPDATE: Perhaps it would be useful to show their example of a nonlocal game that has a commuting Tsirelson bound larger than the tensor product Tsirelson bound. For that we will need the key construction in the paper, a nonlocal game $G_\mathcal{M}$ that has a tensor product Tsirelson bound of 1 if the Turing machine $\mathcal{M}$ halts, and a tensor product Tsirelson bound at most $1/2$ if the Turing machine $\mathcal{M}$ never halts.

Consider now $\mathcal{M}(G’)$ to be the Turing machine that computes the NPA hierarchy for the nonlocal game $G’$, and halts if at some level the NPA hierarchy gives an upper bound strictly less than 1 for the commuting Tsirelson bound.

The nonlocal game $G_{\mathcal{M}(G’)}$ will then have a tensor product Tsirelson bound of 1 if the nonlocal game $G’$ has a commuting Tsirelson bound strictly less than 1, and $G_{\mathcal{M}(G’)}$ will have a tensor product Tsirelson bound at most $1/2$ if the nonlocal game $G’$ has a commuting Tsirelson bound equal to 1 (as in this case the Turing machine $\mathcal{M}(G’)$ never halts).

The nonlocal game we need will then be the fixed point $G^*$ such that $G_{\mathcal{M}(G^*)} = G^*$. It cannot have a commuting Tsirelson bound strictly less than 1, because then it would need to have a tensor product Tsirelson bound equal to 1; since the tensor product bound is never larger than the commuting bound, that is a contradiction. Therefore it must have a commuting Tsirelson bound equal to 1, which also implies that it must have a tensor product Tsirelson bound at most 1/2.

Posted in Uncategorised | 6 Comments

scientists4future #unter1000

I just signed the #unter1000 pledge to not fly to destinations that are under 1,000 km away, and I encourage every reader to sign as well. We scientists need to set an example. We understand better than most the dire situation we are in, and we have more time and money than most to do something about it.

It’s a mild inconvenience for us: instead of flying from Vienna to Cologne in 3 to 5 hours, we’ll need to spend the whole day (or night) on the train, sacrificing a day of work or of the weekend. Yet our work has flexible hours, and we can be productive on the train, unlike most people. If we can’t make even this small sacrifice, what hope is there in the fight against climate change?

I don’t want to spread the misconception that individual action is the most effective way to fight climate change. It’s not. The most effective action is to vote the dinosaurs out of office. The government can make the biggest impact by cleaning up the electric grid and banning fossil cars. The second most effective action, though, is collective action, like this one. Don’t just do it yourself, but do it, tell everyone, and tell them to do it as well. The wider #flugscham campaign is having an effect. Domestic flights fell by 12% year-on-year last November in Germany, and by 14% in Sweden. Losses of this scale set a fire under the airlines’ asses to invest in technology to make flying carbon neutral. That’s the goal, because we don’t want to go back to the stone age. But while airplanes still burn fossil fuels, I won’t do it under 1,000 km.

Posted in Uncategorised | Comments Off on scientists4future #unter1000

Quantum supremacy won me a bet

About three years ago, on 21.03.2016, I made a bet with Miguel Navascués. At the time superconducting qubits were in their infancy: devices based on them had very few qubits, and quite shitty ones at that. Nevertheless, I was sure that this was the architecture of the future, as it seemed easy to scale up: there was no obvious problem with adding more qubits to a chip. Moreover, they consisted essentially of electronic circuits, which we know very well how to manufacture. Finally, Google and IBM were pouring serious amounts of money into it.

Miguel was skeptical. He had heard the same story before about ion traps, and instead of flourishing the architecture stagnated. People managed to get a dozen or so excellent qubits in a trap, but that was it. You can’t really add more without messing up the vibrational modes or squeezing the energy levels too close together. There were proposals about how to get around these problems, like linking multiple traps using photons and entanglement swapping, but they didn’t really do the trick. It will be the same with superconducting qubits, he said.

So, we made a bet. If in five years, by 21.03.2021, anybody in the world managed to achieve universal quantum control over 30 physical (not logical) superconducting qubits, Miguel would pay me the burger equivalent of 10 Big Macs in Vienna. Otherwise, I would pay him.

As the years went by, it became clear that things were going well for superconducting qubits, but we still lacked a dramatic publication that demonstrated the capability (and I wasn’t in Vienna anyway). Until this October, when Google published its demonstration of quantum supremacy with 53 qubits. I then wrote Miguel, who graciously conceded the bet (we didn’t need to use our arbiter, Flaminia Giacomini). Photographic evidence indicates that he wasn’t happy about it, though.

First order photographer: Adán Cabello. Second order photographer: Zizhu Wang.

Posted in Uncategorised | 2 Comments

Superdeterminism is unscientific

Yesterday I saw with disappointment a new paper on the arXiv by Hossenfelder and Palmer, Rethinking Superdeterminism. There they argue that physics took a wrong turn when superdeterminism was immediately dismissed; instead, they claim, it is a solution to the conundrum of nonlocality and the measurement problem.

No. It’s not. It’s a completely sterile idea. I’ll show why, by fleshing out the calculations of the smoking and cancer example they quote in the paper, and then examining the case of the Bell test.

Let’s suppose you do the appropriate randomized trial, and measure the conditional probabilities
\[ p(\text{cancer}|\text{smoke}) = 0.15\quad\text{and}\quad p(\text{cancer}|\neg\text{smoke}) = 0.01,\]a pretty damning result. A tobacco company objects to the conclusion, saying that the genome of the subjects was correlated with whether you forced them to smoke, such that you put more people predisposed to have cancer in the smoking group.

It works like this: the law of total probability says that
\[ p(a|x) = \sum_\lambda p(\lambda|x)p(a|x,\lambda),\] where in our case $a \in \{\text{cancer},\neg\text{cancer}\}$, $x \in \{\text{smoke},\neg\text{smoke}\}$, and $\lambda \in \{\text{predisposed},\neg\text{predisposed}\}$ is the hidden variable, in this case the genome determining whether the person will have cancer anyway. The tobacco company says that your results are explained by the conspiracy $p(\text{predisposed}|\text{smoke}) = 0.15$ and $p(\text{predisposed}|\neg\text{smoke}) = 0$, from which we can calculate the actual cancer rates to be
\begin{gather*}
p(\text{cancer}|\text{smoke},\neg\text{predisposed}) = 0 \\
p(\text{cancer}|\neg\text{smoke},\neg\text{predisposed}) = 0.01,
\end{gather*}so the same data indicates that smoking prevents cancer! If you assume, though, that $p(\text{predisposed}|\text{smoke}) = p(\text{predisposed}|\neg\text{smoke})$, then the absurd conclusion is impossible.
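The tobacco company’s arithmetic can be checked in a couple of lines with the law of total probability. A sketch, assuming (as the example implicitly does) that predisposed people develop cancer with certainty, $p(\text{cancer}|x,\text{predisposed}) = 1$:

```python
# Observed rates and the tobacco company's conspiratorial p(lambda|x).
p_cancer = {"smoke": 0.15, "no_smoke": 0.01}  # measured p(cancer|x)
p_pre = {"smoke": 0.15, "no_smoke": 0.0}      # alleged p(predisposed|x)

# Law of total probability:
#   p(cancer|x) = p(pre|x) * 1 + (1 - p(pre|x)) * p(cancer|x, not pre),
# solved for the "actual" cancer rate among the non-predisposed:
rates = {x: (p_cancer[x] - p_pre[x]) / (1 - p_pre[x]) for x in p_cancer}
print(rates)  # {'smoke': 0.0, 'no_smoke': 0.01}: "smoking prevents cancer"
```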

With this example I want to illustrate two points: first, that assuming $p(\lambda|x) \neq p(\lambda)$ is just a generic excuse to dismiss any experimental result that you find inconvenient, be it that smoking causes cancer or that Bell inequalities are violated. Second, that without assuming $p(\lambda|x) = p(\lambda)$ you can’t conclude anything from your data.

In their paper, Hossenfelder and Palmer dismiss this example as merely classical reasoning that is not applicable to quantum mechanics. It’s not. One can always use the law of total probability to introduce a hidden variable to explain away any correlation, whether it was observed in classical or quantum contexts. Moreover, they claim that while $p(\lambda|x) = p(\lambda)$ is plausible in classical contexts, it shouldn’t be assumed in quantum contexts. This is laughable. I find it perfectly conceivable that tobacco companies would engage in conspiracies to fake results related to smoking and cancer, but to think that Nature would engage in a conspiracy to fake the results of Bell tests? Come on.

They also propose an experiment to test their superdeterministic idea. It is nonsense, as any experiment about correlations is without the assumption that $p(\lambda|x) = p(\lambda)$. Of course, they are aware of this, and they assume that $p(\lambda|x) = p(\lambda)$ would hold for their experiment, just not for Bell tests. Superdeterminism for thee, not for me. They say that when $x$ is a measurement setting, changing it will necessarily cause a large change in the state $\lambda$, but if you don’t change the setting, the state $\lambda$ will not change much. Well, but what is a measurement setting? That’s a human category, not a fundamental one. I can just as well say that the time the experiment is made is the setting, and therefore repetitions of the experiment done at different times will probe different states $\lambda$, and again you can’t conclude anything about it.

Funnily, they say that “…one should make measurements on states prepared as identically as possible with devices as small and cool as possible in time-increments as small as possible.” Well, doesn’t this sound like a very common sort of experiment? Shouldn’t we have observed deviations from the Born rule a long time ago then?

Let’s turn to how superdeterministic models dismiss violations of Bell inequalities. They respect determinism and no action at a distance, but violate no conspiracy, as I define here. The probabilities can then be decomposed as
\[ p(ab|xy) = \sum_\lambda p(\lambda|xy)p(a|x,\lambda)p(b|y,\lambda),\]and the dependence of the distribution of $\lambda$ on the settings $x,y$ is used to violate the Bell bound. Unfortunately Hossenfelder and Palmer do not specify $p(\lambda|xy)$, so I have to make something up. It is trivial to reproduce the quantum correlations if we let $\lambda$ be a two-bit vector, $\lambda \in \{(0,0),(0,1),(1,0),(1,1)\}$, and postulate that it is distributed as
\[p((a,b)|xy) = p^Q(ab|xy),\] where $p^Q(ab|xy)$ is the correlation predicted by quantum mechanics for the specific experiment, and the functions $p(a|x,\lambda)$ and $p(b|y,\lambda)$ are given by
\[p(a|x,(a',b')) = \delta_{a,a'}\quad\text{and}\quad p(b|y,(a',b')) = \delta_{b,b'}.\] For example, if $p^Q(ab|xy)$ is the correlation maximally violating the CHSH inequality, we would need $\lambda$ to be distributed as
\[ p((a,b)|xy) = \frac14\left(1+\frac1{\sqrt2}\right)\delta_{a\oplus b,xy}+\frac14\left(1-\frac1{\sqrt2}\right)\delta_{a\oplus b,\neg(xy)}.\]The question is, why? In the quantum mechanical case, this is explained by the quantum state being used, the dynamical laws, the observable being measured, and the Born rule. In the superdeterministic theory, what? I have never seen this distribution be even mentioned, let alone justified.

More importantly, why should this distribution be such that the superdeterministic correlations reproduce the quantum ones? For example, why couldn’t $\lambda$ be distributed like
\[ p((a,b)|xy) = \frac12\delta_{a\oplus b,xy},\] violating the Tsirelson bound? Even worse, why should the superdeterministic distributions even respect no-signalling? What stops $\lambda$ being distributed like
\[ p((a,b)|xy) = \delta_{a,y}\delta_{b,x}?\]
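To see concretely what these $\lambda$-distributions do, here is a short sketch (my own illustration) that computes the CHSH winning probability for each of the three postulated distributions, and checks that the last one is signalling:

```python
import itertools
import math

def chsh_win(p_lambda):
    """Average CHSH winning probability when lambda = (a', b')
    deterministically fixes the outputs: win iff a XOR b = x AND y."""
    total = 0.0
    for x, y in itertools.product(range(2), repeat=2):
        for a, b in itertools.product(range(2), repeat=2):
            if a ^ b == x & y:
                total += p_lambda(a, b, x, y)
    return total / 4  # uniform inputs

s = 1 / math.sqrt(2)
quantum   = lambda a, b, x, y: (1 + s) / 4 if a ^ b == x & y else (1 - s) / 4
pr_box    = lambda a, b, x, y: 0.5 if a ^ b == x & y else 0.0
signaling = lambda a, b, x, y: 1.0 if (a, b) == (y, x) else 0.0

print(chsh_win(quantum))  # (2 + sqrt(2))/4 ≈ 0.854: the quantum value
print(chsh_win(pr_box))   # 1.0: violates the Tsirelson bound

# Alice's marginal p(a|xy) for the last distribution depends on y:
marg = lambda a, x, y: sum(signaling(a, b, x, y) for b in range(2))
print(marg(0, 0, 0), marg(0, 0, 1))  # 1.0 vs 0.0 — signalling
```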

In their paper, Hossenfelder and Palmer define a superdeterministic theory as a local, deterministic, reductionist theory that reproduces quantum mechanics approximately. I’m certain that such a theory will never exist. Its dynamical equations would need to correlate 97,347,490 human choices with the states of atoms and photons in 12 laboratories around the planet to reproduce the results of the BIG Bell test. Its dynamical equations would need to correlate the frequency of photons emitted by other stars in the Milky Way with the states of photons emitted by a laser in Vienna to reproduce the results of the Cosmic Bell test. Its dynamical equations would need to correlate the bits of a file of the movie “Monty Python and the Holy Grail” with the state of photons emitted by a laser in Boulder to reproduce the results of the NIST loophole-free Bell test. It cannot be done.

Posted in Uncategorised | 63 Comments

Nerd sniping

Last week a friend of mine, Felipe, left written the following equation in the blackboard of the coffee room: \[ \frac{a}{b+c} + \frac{b}{c+a} + \frac{c}{a+b} = 4,\]asking for a solution with positive integers.

It was a harmless sort of nerd sniping (as opposed to the harmful one) that he had seen as a meme online. It looks innocent, but it is a rather difficult problem. The idea was to post it to family WhatsApp groups, as revenge for the inane math problems with fruits as variables that come up there regularly. Finding the idea hilarious, I went on and posted it to my family’s WhatsApp group.

But… how does one solve it? I had inadvertently sniped myself. Trying a little bit, I managed to show that there is no integer solution for $a=b$, and therefore one couldn’t reduce the problem from a third-degree polynomial to a second-degree one, so there was no easy way out. I also realised that any integer multiple of a solution is a valid solution, so one can fix $c=1$, look for a solution with rational $a,b$, and multiply this solution by the least common denominator to solve the original problem. Fixing then $c=1$ and rewriting $a,b$ as $x,y$ for clarity, we end up with the polynomial equation
\[ x^3-3x^2-3x^2y-3x + y^3-3y^2-3y^2x-3y-5xy + 1 = 0,\]which looks pretty much hopeless. To get some intuition I plotted the curve, obtaining this:

Well great. Now I was stuck. Off to Google then. Which immediately gave me this answer, a thorough explanation of where the problem came from and how to solve it. End of story? Not really. The answer depended on two magical steps that were neither easy nor explained. No, no: I want to actually find the solution, not just reproduce it using stuff I don’t understand.

As the mathematician helpfully explains there, the core idea is the centuries-old chord and tangent technique: if one draws a line through two rational points on the curve, or the tangent at a single rational point, this line will intersect the curve again at a rational point. So if we have a single rational solution to start with, we can just iterate away and produce new rational points.

Well, this I can understand and do myself, off to work then! I did everything in the programming language that’s all the rage with the kids nowadays, Julia, on a Jupyter notebook, which you can download here. I wrote up this solution having in mind people like me, who don’t really know anything, but have a generally favourable attitude towards math.

The first step is to brute-force search for a rational solution. That’s easy enough, but gives us mostly repetitions of the points $(-1,-1)$, $(1,-1)$, and $(-1,1)$. They make some denominators of the original equation evaluate to zero, and by looking at the graph it is obvious that the point $(-1,-1)$ doesn’t work, so I excluded them and selected a nice-looking solution as the starting point:
\[P_1 = \left(-\frac5{11},\frac9{11}\right)\]Now, I needed to calculate the slope of the curve at this point. This lies firmly in straightforward-but-tedious territory, so I won’t bore you with the details. The result is that
\[ \frac{\mathrm{d}y}{\mathrm{d}x} = \frac{-3x^2+6x+6xy+3y^2+5y+3}{-3x^2-5x-6xy+3y^2-6y-3},\]and with the slope we can easily find the tangent line. Now finding the intersection of this line with the curve is even more straightforward and tedious, so I’m not going to even type out the answer (for the details check the Wikipedia page linked above). Anyway, now we have the intersection point
\[I = \left(\frac{5165}{9499},-\frac{8784}{9499}\right),\]but we have a problem: if we draw the line between $P_1$ and $I$ we won’t get a new point, as this line intersects the curve only at $P_1$ (twice) and $I$. We could just apply the same procedure that we did to $P_1$: calculate the tangent at $I$ and with that get a new intersection point. This works, but the integers in the numerator and denominator of the numbers we get by iterating this procedure grow hideously fast, and the computer dies before returning a solution with only positive rationals.

There is an easy way out, though: note that the equation is symmetric under exchange of $x$ and $y$, so we can simply take the new point as
\[P_2 = \left(-\frac{8784}{9499},\frac{5165}{9499}\right),\]and the line through $P_1$ and $P_2$ does intersect the curve at a new point. Taking $P_3$ to be again its flipped version
\[P_3 = \left(-\frac{396650011}{137430135},-\frac{934668779}{137430135}\right),\]we can go on.

A new question does arise. Do we take $P_4$ from the line through $P_2$ and $P_3$, or from the line through $P_1$ and $P_3$? It turns out that both methods work, but the number of digits grows much faster with the former method than with the latter, so it is unwise to use the former method in general. We iterate away, and find a solution with only positive coordinates at $P_{13}$, with the resulting integers having more than 150 digits.
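The chord-and-tangent steps above can be sketched in a few lines of exact rational arithmetic. This is my own illustration in Python (the post used Julia); the coefficient formulas come from substituting the line $y=mx+c$ into the cubic, whose roots then satisfy $x_1+x_2+x_3=-B/A$:

```python
from fractions import Fraction as F

# The curve x^3 - 3x^2 - 3x^2 y - 3x + y^3 - 3y^2 - 3y^2 x - 3y - 5xy + 1 = 0
def curve(x, y):
    return (x**3 - 3*x**2 - 3*x**2*y - 3*x
            + y**3 - 3*y**2 - 3*y**2*x - 3*y - 5*x*y + 1)

def slope(x, y):  # dy/dx from implicit differentiation
    return ((-3*x**2 + 6*x + 6*x*y + 3*y**2 + 5*y + 3)
            / (-3*x**2 - 5*x - 6*x*y + 3*y**2 - 6*y - 3))

def third_point(p, q):
    """Intersect the chord through p, q (or the tangent, if p == q)
    with the cubic and return the third intersection point."""
    (x1, y1), (x2, y2) = p, q
    m = slope(x1, y1) if p == q else (y2 - y1) / (x2 - x1)
    c = y1 - m * x1
    # Substituting y = m x + c gives A x^3 + B x^2 + ... = 0,
    # with roots x1, x2 and the new x3 = -B/A - x1 - x2.
    A = 1 - 3*m - 3*m**2 + m**3
    B = -3 - 5*m - 3*m**2 - 3*c - 6*m*c + 3*m**2*c
    x3 = -B / A - x1 - x2
    return (x3, m * x3 + c)

P1 = (F(-5, 11), F(9, 11))
assert curve(*P1) == 0
I = third_point(P1, P1)   # tangent at P1
print(I)                   # (5165/9499, -8784/9499)
P2 = (I[1], I[0])          # use the x <-> y symmetry of the curve
P3x, P3y = third_point(P1, P2)
print(curve(P3x, P3y))     # 0: still on the curve
```

Iterating this, with the flipping trick at each step, is all the notebook does.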

Posted in Uncategorised | Comments Off on Nerd sniping

State estimation is dead. Long live purification

I have just been to Perimeter Institute, by generous invitation of Thomas Galley. I gave a talk there about my recent-ish paper, Probability in two deterministic universes. Since I have already blogged about it here, I’m not writing about it again, but rather what I discussed with Thomas about his derivations of the Born rule.

I was interested in his most recent derivation, which besides structural assumptions about measurements and probabilities needs two substantial assumptions: no-signalling and the possibility of state estimation, or state estimation for brevity. No-signalling is well-motivated and well-understood, but I was curious about state estimation. What does it mean? What does a theory that violates it look like?

The precise definition is that state estimation holds if there is a finite set of measurement outcomes whose probabilities completely determine the quantum state. Or conversely, if state estimation fails, then for any finite set of measurement outcomes there are two different quantum states that give the same probabilities for all these outcomes. This is clearly not obeyed by quantum mechanics in the case of infinite-dimensional systems – you need to know the probability at each point in space to completely determine the wavefunction, which is an infinite set of outcomes – so the authors require it only for finite-dimensional systems.

How bad is it to violate it for finite-dimensional systems, then? What can you learn about a quantum state with a small number of measurement outcomes? Do you get a good approximation, or would you have little idea about what the quantum state is? It seems that the former is the case. To illustrate that, we came up with a rather artificial theory where the measurements allow you to deterministically read off bits from some representation of the quantum state; for the case of a qubit $\ket{\psi}=\cos\theta\ket{0}+e^{i\varphi}\sin\theta\ket{1}$ a measurement would tell you the $n$th bit of $\theta$ or $\varphi$. It is clear that this theory violates state estimation: for any finite set of measurements there will be a largest $n$ that they can reach, and therefore any pair of quantum states that differ on bits higher than $n$ will be indistinguishable for this set of measurements. It is also clear that this violation is nothing to worry about: with only $2n$ measurements we can get a $n$-bit approximation for any qubit, which is much better than what can be done in reality! In reality we need about $2^n$ measurements to estimate the probabilities, and therefore the amplitudes, with such an accuracy.
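The bit-readout theory can be made concrete with a toy sketch. The encoding convention here is my own, purely for illustration: I rescale $\theta$ to a binary fraction in $[0,1)$ and let each "measurement" return one digit, so $n$ measurements pin $\theta$ down to accuracy $(\pi/2)\,2^{-n}$:

```python
import math

def nth_bit(theta, n):
    # Toy deterministic "measurement": the n-th binary digit of
    # theta/(pi/2) in [0, 1). Illustrative encoding, not from any paper.
    frac = (theta / (math.pi / 2)) % 1
    return int(frac * 2 ** n) % 2

def estimate(theta, n):
    # Read off the first n bits and reassemble an n-bit approximation.
    frac = sum(nth_bit(theta, k) / 2 ** k for k in range(1, n + 1))
    return frac * (math.pi / 2)

theta = 0.9
err = abs(estimate(theta, 20) - theta)
print(err)  # below (pi/2) * 2**-20 ≈ 1.5e-6
```

Any pair of states differing only in bits beyond the ones read remains indistinguishable, so state estimation fails in the strict sense, yet the approximation converges exponentially fast.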

This already tells us that state estimation is too strong; it needs at least to be qualified somehow in order to exclude the deterministic theory above. What does it mean in probabilistic theories, though? An often considered toy theory is one where the structure of quantum mechanics is kept as it is, but the exponent in the Born rule is changed from $2$ to some $n$. More precisely, let the probability of obtaining outcome $i$ when measuring the state $\psi$ in the orthogonal basis $\{\ket{i}\}$ be \[ p(i|\psi) = \frac{|\langle i|\psi\rangle|^n}{\sum_{i’}|\langle {i’}|\psi\rangle|^n}. \]An interesting feature of this theory is that a finite set of measurement outcomes can distinguish all pure states (in fact the same measurements that distinguish them in quantum theory), so state estimation can only fail here for mixed states.

A nice example is the pair of ensembles
\[\omega_A = \{(p,\ket{0}),(1-p,\ket{1})\}\] and \[\omega_B = \{(1/2,p^\frac1n\ket{0}+(1-p)^\frac1n\ket{1}),(1/2,p^\frac1n\ket{0}-(1-p)^\frac1n\ket{1})\}.\] In quantum mechanics ($n=2$) they are equivalent, both being represented by the density matrix
\[ \rho = \begin{pmatrix} p & 0 \\ 0 & 1-p \end{pmatrix}. \] If $n\neq 2$, though, they are not equivalent anymore, even though they give the same probabilities for any measurements in the X, Y, and Z bases. To distinguish them we just need to measure the ensembles in the basis \[\{p^\frac1n\ket{0}+(1-p)^\frac1n\ket{1},(1-p)^\frac1n\ket{0}-p^\frac1n\ket{1}\}.\] The probability of obtaining the first outcome for ensemble $\omega_A$ is $p^2 +(1-p)^2$, and for ensemble $\omega_B$ it is some complicated expression that depends on $n$.
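This is easy to check numerically. The following sketch (my own, with numpy) implements the modified Born rule and the distinguishing measurement above, confirming that the two ensembles agree exactly at $n=2$ but differ at $n=3$:

```python
import numpy as np

def born_n(state, basis, n):
    # Modified Born rule: p(i) proportional to |<i|psi>|^n,
    # normalized over the (orthogonal, not necessarily normalized) basis.
    amps = np.abs(basis.conj() @ state) ** n
    return amps / amps.sum()

def outcome1_prob(n, p):
    r, s = p ** (1 / n), (1 - p) ** (1 / n)
    # The distinguishing basis from the text (orthogonal for every p).
    basis = np.array([[r, s], [s, -r]])
    # Ensemble A: |0> with probability p, |1> with probability 1-p.
    prob_A = (p * born_n(np.array([1.0, 0.0]), basis, n)[0]
              + (1 - p) * born_n(np.array([0.0, 1.0]), basis, n)[0])
    # Ensemble B: equal mixture of r|0> + s|1> and r|0> - s|1>.
    prob_B = (0.5 * born_n(np.array([r, s]), basis, n)[0]
              + 0.5 * born_n(np.array([r, -s]), basis, n)[0])
    return prob_A, prob_B

print(outcome1_prob(2, 0.3))  # equal: both p^2 + (1-p)^2 = 0.58
print(outcome1_prob(3, 0.3))  # different: the ensembles are distinguishable
```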

Now this is by no means a proof, but it makes me suspect that it will be rather easy to distinguish any two ensembles that are not equivalent, by making a measurement that contains one of the pure states that was mixed in to make the ensemble. Then if we divide the Bloch sphere into a number of regions, assigning a measurement to cover each region, we can estimate any state to a good enough approximation. Unlike in the deterministic theory explored above, in this toy theory it is clearly more laborious to do state estimation than in quantum mechanics, but it is still firmly within the realm of possibility.

What now, then? If the possibility of state estimation is not a good assumption from which to derive the Born rule, is there a derivation in this operational framework that follows from better assumptions? It turns out that Galley himself has such a derivation, based only on similar structural assumptions together with no-signalling and purification, with no need for state estimation. But rather ironically, here the roles flip: while I find purification an excellent axiom to use, Galley is not a fan.

Let me elaborate. Purification is the assumption that every mixed state (like the ensembles above) is obtained by ignoring part of a pure state. It implies then that there are no “external” probabilities in the theory; if you want to flip a coin in order to mix two pure states, you’d better model that coin inside the theory, and as a pure state. Now Galley doesn’t find purification so nice: for one, because classical theories fail purification, and also because it feels like postulating that your theory is universal, which is a big step to take, in particular when the theory in question is quantum mechanics.

Well, I find that classical theories failing purification is just one more example in a huge pile of examples of how classical theories are wrong. In this particular case they are wrong by being essentially deterministic, allowing for probabilities only when they are put in by hand. As for postulating the universality of the theory, that is indeed a big assumption, but so what? I don’t think good assumptions need to be self-evidently true; I just think they should be well-motivated and physically meaningful.

Addendum: A natural question to ask is whether both no-signalling and purification are necessary in such a derivation. It turns out the answer is yes: the toy theory where the exponent in the Born rule is $n$ respects purification, when extended in the obvious way for composite systems, but violates no-signalling, and Galley’s rule respects no-signalling but violates purification.

Posted in Uncategorised | 2 Comments

How to manipulate numbers and get any result you want

This post has little to do with physics, let alone quantum mechanics; I’m just writing it because I saw reports in the media about a study done by three German professors with the incredible conclusion that electric vehicles emit more CO$_2$ than diesel vehicles. I won’t focus on debunking this “study”, as it has already been thoroughly debunked in the newspaper articles that I’ve linked, but rather I’ll explain how the calculation is done, and how one would go about manipulating it to get the result one wants. These are simple mistakes that would have been caught by even cursory peer review, so maybe the lesson here is that non-peer-reviewed “studies” like this one are better ignored altogether.

So, we are interested in the total amount of CO$_2$ that a vehicle emits over its lifetime. This is the emissions caused by producing it in the first place, $P$, plus the amount of CO$_2$ it emits per km, $\eta$, times the distance $L$ it travels over its lifetime: $P + \eta L$. To get a number that is easier to relate to, we divide everything by $L$ and get the effective emissions per km, $\frac{P}{L}+\eta$. We want to compare a diesel vehicle with an electric vehicle, so we want to know whether
\[\frac{P_E}{L}+\eta_E\quad\text{or}\quad\frac{P_D}{L}+\eta_D\] is bigger.
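The comparison above can be written as a tiny function. A minimal sketch; the numbers plugged in here are made up purely for illustration:

```python
def effective_emissions(P, eta, L):
    """Effective emissions per km: production emissions P (gCO2)
    amortised over lifetime distance L (km), plus per-km emissions eta."""
    return P / L + eta

# Hypothetical round numbers, just to show the shape of the comparison:
E = effective_emissions(P=10_000_000, eta=80, L=200_000)  # electric
D = effective_emissions(P=0, eta=150, L=200_000)          # diesel
print(E, D)  # 130.0 150.0: here the electric car comes out ahead
```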

Assume that $P_E > P_D$, because of the extra emissions needed to produce the battery of the electric vehicle, and that $\eta_E < \eta_D$, as it is much more efficient to extract energy from oil in a big power plant than in an internal combustion engine.

Now, what could you do to make the electric vehicles look bad? Well, since their production causes more emissions, you want to emphasise that in the equation, and since they emit less CO$_2$ when running, you want to downplay that. How? We have three variables, so we have three ways of manipulating the numbers: we can multiply $P_E$ and $P_D$ by some large number $n_P$ (e.g. by assuming that the factories producing the cars are powered purely by oil shale), we can divide $\eta_E$ and $\eta_D$ by some large number $n_\eta$ (e.g. by assuming the cars are always run at maximal efficiency), and we can divide $L$ by some large number $n_L$ (e.g. by assuming that the car is scrapped after a few kilometres).

What is the effect of doing that? If the real numbers say that electric vehicles are better, that is, that
\[\frac{P_E}{L}+\eta_E < \frac{P_D}{L}+\eta_D,\]which is equivalent to
\[ \frac{P_E-P_D}{L(\eta_D-\eta_E)} < 1,\]then the manipulations of the previous paragraph amount to multiplying the left-hand side of this inequality by $n_Pn_Ln_\eta$; if we want to flip it we just need to make $n_P,n_L,$ and $n_\eta$ large enough so that \[ n_Pn_Ln_\eta\frac{P_E-P_D}{L(\eta_D-\eta_E)} > 1.\]
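To see the flip in action, here is a sketch: start from made-up numbers where the electric car wins, apply the three manipulations, and the inequality reverses.

```python
def ratio(P_E, P_D, eta_E, eta_D, L):
    """The electric car emits less over its lifetime exactly when this is below 1."""
    return (P_E - P_D) / (L * (eta_D - eta_E))

# Honest (made-up) numbers: the electric car wins.
r = ratio(P_E=10_000_000, P_D=0, eta_E=80, eta_D=150, L=250_000)

# Cook the books: inflate production by n_P, deflate both etas by n_eta,
# and shorten the lifetime by n_L. The ratio gets multiplied by n_P*n_eta*n_L.
n_P, n_eta, n_L = 2, 2, 2
r_cooked = ratio(P_E=n_P * 10_000_000, P_D=n_P * 0,
                 eta_E=80 / n_eta, eta_D=150 / n_eta, L=250_000 / n_L)

print(r < 1, r_cooked > 1)  # True True: the conclusion has flipped
```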

And what did the authors of the study do? All of the above. Most interestingly, they used the NEDC driving cycle to calculate $\eta_D$ and $\eta_E$, a ridiculously efficient driving cycle that has been discarded in favour of the less unrealistic WLTC. They did this because WLTC numbers hadn’t yet been released for the Tesla Model 3, the electric car they used for the comparison. They claim that this is not a problem, because the NEDC favours city driving, where electric cars excel, so if anything this incorrect assumption would be tilting the scales in favour of electric cars. As we have seen, though, this is not the case: pretending that the cars are more efficient than they are tilts the scales in favour of the diesels.

Another mistake the authors made is to assume that cars only last 10 years or 150,000 km before going to the junkyard, which is about half of the actual number. Again this tilts the scales in favour of the diesels, as the production of electric cars causes more emissions. The reason they made this mistake is that they assumed the battery of an electric car would only last this long, which is false for two reasons: first, because a Tesla battery retains more than 90% of its capacity after 250,000 km, hardly junkyard material; and second, because batteries that have in fact degraded too much to be useful in a car, say retaining only 70% of their capacity, do not go to the junkyard, but instead are reused for applications where the energy/weight ratio doesn’t matter, like grid storage.

The third mistake the authors made is exaggerating the emissions caused by production, using a discredited study that claimed that producing the lithium-ion battery causes 145 kg CO$_2$/kWh. The peer-reviewed number I could find is 97 kg CO$_2$/kWh for production in China. Even that seems too high, though, as Tesla’s batteries are produced in the Gigafactory 1 in Nevada, which has a cleaner energy mix, and should eventually be powered fully by rooftop solar. One thing that might look like a mistake but isn’t is that the authors don’t consider the emissions caused by producing the components that are common to both electrics and diesels: wheels, body, seats, etc. Of course, ignoring these means that the number you get is not the true effective emissions per km, but it doesn’t change which car is the best, as that depends only on the difference $P_E-P_D$.

With the theory done, let’s get to the numbers. The authors use $P_E = 10,875,000$ gCO$_2$, $\eta_E = 83$ gCO$_2$/km, $P_D = 0$ gCO$_2$, $\eta_D = 144$ gCO$_2$/km, and $L=150,000$ km, which results in the effective emissions
\[ E = 155\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 144\text{ gCO}_2/\text{km},\]yielding their absurd conclusion that electric vehicles emit more CO$_2$. Now what I find amazing is that this conclusion requires all three mistakes working together; correct any one of the three and it flips.
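Plugging the authors’ numbers into the formula reproduces their headline figures (a quick check; `int` truncates to whole grams, matching the figures in the text):

```python
def effective_emissions(P, eta, L):
    """Effective emissions per km, in gCO2/km."""
    return P / L + eta

L = 150_000  # km, the authors' (too short) lifetime
E = effective_emissions(P=10_875_000, eta=83, L=L)  # electric (Tesla Model 3)
D = effective_emissions(P=0, eta=144, L=L)          # diesel
print(int(E), int(D))  # 155 144: the "study"'s conclusion
```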

First we correct $\eta_E$ and $\eta_D$ using the WLTC numbers (which are still too optimistic, but are the best I’ve got), which are already available for both the Model 3 (16 kWh/100 km) and the Mercedes C 220 d (5.1 l/100 km), resulting in $\eta_E = 88$ gCO$_2$/km and $\eta_D = 163$ gCO$_2$/km, and the effective emissions
\[ E = 160\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 163\text{ gCO}_2/\text{km}.\] Next we keep the wrong $\eta_E$ and $\eta_D$ and just correct $L$, setting it to $250,000$ km, resulting in the effective emissions
\[ E = 126\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 144\text{ gCO}_2/\text{km}.\] Next we keep the wrong $\eta_E, \eta_D$, and $L$, correcting only the emissions caused by the production of the battery. Putting 97 kg CO$_2$/kWh results in $P_E = 7,275,000$ gCO$_2$ and effective emissions
\[ E = 131\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 144\text{ gCO}_2/\text{km}.\] To finalize, let’s calculate the true numbers, correcting all three mistakes at once and also taking into account the emissions caused by producing the parts common in both vehicles. I couldn’t find a good number for that, just some estimates that put it around 20 tons of CO$_2$. Using this results in $P_E = 27,275,000$ gCO$_2$, $\eta_E = 88$ gCO$_2$/km, $P_D = 20,000,000$ gCO$_2$, $\eta_D = 163$ gCO$_2$/km, and $L=250,000$ km, and effective emissions \[ E = 197\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 243\text{ gCO}_2/\text{km}.\]
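All of the corrections above can be checked in one go, using only numbers from the text (the 97 kg CO$_2$/kWh figure is applied to the 75 kWh battery implied by the authors’ own $P_E$):

```python
def eff(P, eta, L):
    """Effective emissions per km, in gCO2/km (production amortised over lifetime)."""
    return P / L + eta

# Fix only the driving cycle (WLTC consumption figures):
wltc = (eff(10_875_000, 88, 150_000), eff(0, 163, 150_000))
# Fix only the lifetime (250,000 km instead of 150,000 km):
lifetime = (eff(10_875_000, 83, 250_000), eff(0, 144, 250_000))
# Fix only the battery figure (97 kg CO2/kWh times the 75 kWh battery):
battery = (eff(7_275_000, 83, 150_000), eff(0, 144, 150_000))
# All corrections, plus roughly 20 t CO2 for the parts common to both cars:
full = (eff(27_275_000, 88, 250_000), eff(20_000_000, 163, 250_000))

for E, D in (wltc, lifetime, battery, full):
    print(int(E), int(D), "electric wins" if E < D else "diesel wins")
```

In every scenario the electric car comes out ahead; only the combination of all three mistakes flips the conclusion.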

It doesn’t look very impressive, though. Only 19% less emissions? Is all the trouble worth it? The point is that none of the emissions of electric vehicles are necessary: as the grid cleans up, both their production and operation will be CO$_2$-free. Diesels, though, will always burn diesel, so at best they will cause only the tailpipe emissions, and the ultimate numbers will be \[ E = 0\text{ gCO}_2/\text{km}\quad\text{and}\quad D = 135\text{ gCO}_2/\text{km}.\] There is no need to wait, though: electric vehicles are better for the environment than diesels. Not in the future, not depending on magical technologies, not in Norway, but right here, and right now. And this is only about CO$_2$ emissions; electric vehicles also have the undeniable benefit of not poisoning the atmosphere in densely populated cities.

Posted in Uncategorised | 10 Comments

The many-worlds interpretation of objective probability

Philosophers really like problems. The more disturbing and confusing the better. If there’s one criticism you cannot level at them, it is that they are unwilling to tackle difficult issues. I have argued with philosophers endlessly about Bell’s theorem, the trolley problem, Newcomb’s paradox, Searle’s Chinese room, the sleeping beauty problem, etc. Which made me very surprised when I asked a couple of philosophers about objective probability, and found them strangely coy about it. The argument went along the lines of “objective probability is frequentism, frequentism is nonsense, subjective probability makes perfect sense, there’s only subjective probability”.

Which is a really bizarre argument. Yes, frequentism is nonsense, and yes, subjective probability makes perfect sense. But that’s all that is true about it. No, objective probability is not the same thing as frequentism, and no, subjective probability is not the only probability that exists. Come on, that’s denying the premise! The question is interesting precisely because we strongly believe that objective probability exists; either because of quantum mechanics, or more directly from the observation of radioactive decay. Does anybody seriously believe that whether some atom decays or not depends on the opinion of an agent? There even existed natural nuclear reactors, where chain reactions occurred long before any agent was around to wonder about them.

In any case, it seems that philosophers won’t do anything about it. What can we say about objective probability, though? It is easy to come up with some desiderata: it should be objective, to start with. The probability of some radioactive atom decaying should just be a property of the atom, not a property of some agent betting on it. Agents and bets are still important, though, as it should make sense to bet according to the objective probabilities. In other words, Lewis’ Principal Principle should hold: rational agents should set their subjective probabilities equal to the objective probabilities, if the latter are known. Last but not least, objective probabilities should be connected to relative frequencies via the law of large numbers, that is, we need that
\[ \text{Pr}(|f_N-p|\ge\varepsilon) \le 2e^{-2N\varepsilon^2}, \] or, in words, the (multi-trial) probability that the frequency $f_N$ deviates more than $\varepsilon$ from the (single-trial) probability $p$ after $N$ trials goes down exponentially with $\varepsilon$ and $N$.
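This bound is easy to probe numerically: simulate $N$ flips of a coin with bias $p$ and estimate the deviation probability by Monte Carlo. A sketch; the parameters are arbitrary:

```python
import random
from math import exp

def deviation_probability(p, N, eps, runs=5_000, seed=1):
    """Monte Carlo estimate of Pr(|f_N - p| >= eps) over N biased coin flips."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        f = sum(rng.random() < p for _ in range(N)) / N  # observed frequency
        if abs(f - p) >= eps:
            hits += 1
    return hits / runs

p, N, eps = 0.3, 200, 0.1
estimate = deviation_probability(p, N, eps)
bound = 2 * exp(-2 * N * eps**2)
print(estimate <= bound)  # True: the empirical deviation probability respects the bound
```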

I think it is also easy to come up with a definition of objective probability that fulfills these desiderata, if we model objectively random processes as deterministic branching processes. Let’s say we are interested in the decay of an atom. Instead of saying that it either decays or not, we say that the world branches into several new worlds, in some of which the atom decays, and in some of which it does not. Moreover, we say that we can somehow count the worlds, that is, that we can attribute a measure $\mu(E)$ to the set of worlds where event $E$ happens and a measure $\mu(\neg E)$ to the set of worlds where event $\neg E$ happens. Then we say that the objective probability of $E$ is
\[p(E) = \frac{\mu(E)}{\mu(E)+\mu(\neg E)}.\] Now, before you shut off saying that this is nonsense, because the Many-Worlds interpretation is false, so we shouldn’t consider branching, let me introduce a toy theory where this deterministic branching is literally true by fiat. In this way we can separate the question of whether the Many-Worlds interpretation is true from the question of whether deterministic branching explains objective probability.

This toy theory was introduced by Adrian Kent to argue that probability makes no sense in the Many-Worlds interpretation. Well, I think it is a great illustration of how probability actually makes perfect sense. It goes like this: the universe is a deterministic computer simulation where some agents live. In this universe there is a wall with two lamps, and below each a display that shows a non-negative integer. This wall also has a “play” button that, when pressed, makes one of the lamps light up.

Kent's universe

The agents there can’t really predict which lamp will light up, but they have learned two things about how the wall works. The first is that if the number below a lamp is zero, that lamp never lights up. The second is that if the numbers are set to $n_L$ and $n_R$, respectively, and they press “play” multiple times, the fraction of times where the left lamp lights up is often close to $n_L/(n_L+n_R)$.

What is going on, of course, is that when “play” is pressed the whole computer simulation is deleted and $n_L+n_R$ new ones are initiated, $n_L$ with the left lamp lit, and $n_R$ with the right lamp lit. My proposal is to define the objective probability of some event as the proportion of simulations where this event happens, as this quantity fulfills all our desiderata for objective probability.
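This branching rule is simple enough to simulate literally; a sketch with made-up display values $n_L=2$ and $n_R=3$:

```python
def press_play(simulations, n_L, n_R):
    """Each simulation is deleted and replaced by n_L copies where the
    left lamp lit and n_R copies where the right lamp lit."""
    new = []
    for history in simulations:
        new += [history + "L"] * n_L
        new += [history + "R"] * n_R
    return new

n_L, n_R = 2, 3
sims = [""]
for _ in range(4):  # press "play" four times
    sims = press_play(sims, n_L, n_R)

# Objective probability of the left lamp = proportion of simulations:
p_L = sum(h[-1] == "L" for h in sims) / len(sims)
print(len(sims), p_L)  # 625 0.4: (n_L+n_R)**4 simulations, p_L = n_L/(n_L+n_R)
```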

This clearly fulfills the “objectivity” desideratum, as a proportion of simulations is a property of the world, not some agent’s opinion. It also respects the “law of large numbers” desideratum. To see that, first notice that for a single trial the proportion of simulations where the left lamp lights up is
\[p(L) = \frac{n_L}{n_L+n_R}.\] Now the number of simulations where the left lamp lights up $k$ times out of $N$ trials is
\[ {N \choose k}n_L^kn_R^{N-k},\] so if we divide by the total number of simulations, $(n_L+n_R)^N$, we see that the proportion of simulations where the left lamp lit $k$ times out of $N$ is given by \[\text{Pr}(N,k) = {N \choose k}p(L)^k(1-p(L))^{N-k}.\]Since this is formally identical to the binomial distribution, it allows us to prove a theorem formally identical to the law of large numbers:
\[ \text{Pr}(|k/N-p(L)|\ge\varepsilon) \le 2e^{-2N\varepsilon^2}, \]which says that the (multi-trial) proportion of simulations where the frequency deviates more than $\varepsilon$ from the (single-trial) proportion of simulations after $N$ trials goes down exponentially with $\varepsilon$ and $N$.
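The counting argument can be verified directly, comparing the proportion of simulations against the binomial formula (a sketch with made-up $n_L$, $n_R$, and $N$):

```python
from math import comb

def proportion_k_left(n_L, n_R, N, k):
    """Proportion of the (n_L+n_R)**N simulations in which the left lamp lit k times."""
    return comb(N, k) * n_L**k * n_R**(N - k) / (n_L + n_R)**N

n_L, n_R, N = 2, 3, 6
p = n_L / (n_L + n_R)
for k in range(N + 1):
    binomial = comb(N, k) * p**k * (1 - p)**(N - k)
    assert abs(proportion_k_left(n_L, n_R, N, k) - binomial) < 1e-12
print("world counting reproduces the binomial distribution")
```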

Last but not least, to see that it fulfills the “Principal Principle” desideratum, we need to use the decision-theoretic definition of subjective probability: the subjective probability $s(L)$ of an event $L$ is the highest price a rational agent should pay to play a game where they receive $1$€ if event $L$ happens and nothing otherwise. In the $n_L$ simulations where the left lamp lit the agent ends up with $(1-s(L))$ euros, and in the $n_R$ simulations where the right lamp lit the agent ends up with $-s(L)$ euros. If the agent cares equally about all their future selves, they should agree to pay $s(L)$ as long as \[(1-s(L))n_L-s(L)n_R \ge 0,\]which translates to \[s(L) \le \frac{n_L}{n_L+n_R},\] so indeed the agent should bet according to the objective probability if they know $n_L$ and $n_R$.
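The betting argument is just bookkeeping over future selves; a sketch with made-up display values:

```python
def net_gain(s, n_L, n_R):
    """Total gain over all future selves when paying s for a bet that pays
    1 EUR if the left lamp lights: n_L selves gain 1-s, n_R selves lose s."""
    return (1 - s) * n_L - s * n_R

n_L, n_R = 2, 3  # made-up display values
fair_price = n_L / (n_L + n_R)

print(abs(net_gain(fair_price, n_L, n_R)) < 1e-9)  # True: break-even at the objective probability
print(net_gain(0.5, n_L, n_R) < 0)                 # True: paying above 0.4 is a losing bet
print(net_gain(0.2, n_L, n_R) > 0)                 # True: paying below 0.4 is favourable
```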

And this is it. Since it fulfills all our desiderata, I claim that deterministic branching does explain objective probability. Furthermore, it is the only coherent explanation I know of. It is hard to argue that nobody will ever come up with a single-world notion of objective probability that makes sense, but at least in one point such a notion will always be unsatisfactory: why would something be in principle impossible to predict? Current answers are limited to saying that quantum mechanics say so, or that if we could predict the result of a measurement we would run into trouble with Bell’s theorem. But that’s not really an explanation, it’s just saying that there is no alternative. Deterministic branching theories do offer an explanation, though: you cannot predict which outcome will happen because all will.

Now the interesting question is whether this argument applies to the actual Many-Worlds interpretation, and we can get a coherent definition of objective probability there. The short answer is that it’s complicated. The long answer is the paper I wrote about it =)

Posted in Uncategorised | 8 Comments