Floating-point arithmetic and semidefinite programming

Another day working with MATLAB, another nasty surprise. I was solving an SDP that was already working well, made a minor change, and suddenly it started taking minutes to reach a solution when it usually took seconds.

After a long investigation, I realized that the problem was that my input was not exactly Hermitian anymore. I had switched from writing out the components of a matrix manually to defining the matrix as $A = \ket{\psi}\bra{\psi}$ for a complex vector $\ket{\psi}$. Now how could that possibly be a problem? It is a very simple theorem that the outer product of a vector with itself is always Hermitian. Well, not with MATLAB. I can’t figure out why, but MATLAB probably uses some smart algorithm for the outer product that ruins this property through floating-point arithmetic. What I could determine is that if you compute the outer product the naïve way, then even with MATLAB the result will be exactly Hermitian. Also, with Python the output is exactly Hermitian.

To solve this problem, one can simply redefine $A$ as $(A+A^\dagger)/2$, which is very fast and gives an exactly Hermitian output. I don’t like doing that, though, as $(A+A^\dagger)/2$ is always exactly Hermitian, even when $A$ is not even close to Hermitian. If I fucked up the definition of $A$ this will hide my mistake, and make debugging harder. Instead, what I did was write my own outer product function, computing it as $\ket{\psi}\otimes\bra{\phi}$. It is slower than whatever black magic MATLAB is doing, sure, but it is fast enough, and leaves my mistakes easily visible.
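Since I mentioned Python: here’s a quick NumPy sketch of the kind of check I have in mind (just an illustration, not the actual MATLAB code I use). The naïve outer product comes out bit-for-bit Hermitian, and the $(A+A^\dagger)/2$ trick is exactly Hermitian no matter what you feed it:

```python
import numpy as np

rng = np.random.default_rng()
psi = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Naive outer product |psi><psi|: A[i, j] = psi[i] * conj(psi[j])
A = np.outer(psi, psi.conj())
print(np.array_equal(A, A.conj().T))   # True: exactly Hermitian, at least here

# The symmetrization trick: exactly Hermitian by construction,
# even if A had been nowhere close to Hermitian
B = (A + A.conj().T) / 2
print(np.array_equal(B, B.conj().T))   # True
```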

It dawned on me that this subtlety was probably the source of many bugs and slowdowns in my code over the years. I decided to go hunting, and find out in which fascinating ways MATLAB’s algebra fails to preserve Hermiticity where it should. It turns out that if $A$ and $B$ are exactly Hermitian, then both $A+B$ and $A \otimes B$ are exactly Hermitian, as they should be. The problem really appears when you do matrix multiplication.

Which shouldn’t produce further problems, right? After all, $AB$ is in general not Hermitian, so we don’t have anything to worry about. Except, of course, that the Hilbert-Schmidt inner product $\operatorname{tr}(A^\dagger B)$ is real for Hermitian $A,B$, and this is not true in MATLAB. Argh! $\operatorname{tr}(A^\dagger B)$ appears very often in the objective of an SDP, and it really needs to be a real number, as you can’t minimize a complex function. It turns out that this is not MATLAB’s fault; it is a fundamental problem of floating-point arithmetic.

A well-known but often-forgotten fact is that floating-point addition is not associative. Let $a=10^{20}$ and $b=10^{-20}$. Then with floating-point numbers we see that $b+(a-a) = b$ but $(b+a)-a = 0$. This is the issue with the Hilbert-Schmidt inner product: we get stuff like $z+\bar{z}+w+\bar{w}+\ldots$, but not in that order. $z+\bar{z}$ is of course real, and $w+\bar{w}$ as well, but $z+(\bar{z}+w)+\bar{w}$? Nope, not in floating-point arithmetic.
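Here’s a minimal demonstration of both effects in Python (the matrix size below is arbitrary):

```python
import numpy as np

a, b = 1e20, 1e-20
print(b + (a - a))   # 1e-20
print((b + a) - a)   # 0.0: the tiny term was swallowed by the big one

# The same effect leaves a spurious imaginary part in tr(A^dagger B)
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50)) + 1j * rng.standard_normal((50, 50))
N = rng.standard_normal((50, 50)) + 1j * rng.standard_normal((50, 50))
A, B = M + M.conj().T, N + N.conj().T   # exactly Hermitian inputs
print(np.trace(A.conj().T @ B).imag)    # tiny, but typically not exactly 0
```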

Here I don’t think a proper solution is possible. One can of course write $\Re[\operatorname{tr}(A^\dagger B)]$, but that will hide your mistakes when $A$ and $B$ are not in fact Hermitian. A better idea is to vectorize $A$ and $B$ to avoid matrix multiplication, writing $\Re[\langle A|B\rangle]$. This only helps with speed, though, it doesn’t touch the real problem.


Relativity is your friend

I’ve just finished reading Andy Weir’s new book “Project Hail Mary”. Great story, I heartily recommend it to anyone who is into hard sci-fi. There’s a minor plot point that I found hilarious: an alien civilization managed to do interstellar travel before discovering relativity (it makes perfect sense in the story). The poor bastards were very confused: the trip was shorter and took much less fuel than they had planned for!

I was aware that relativity makes trips shorter from the point of view of the traveller, but fuel? I had never heard of relativity saving fuel as well. I just had to check it out!

First let’s look at the time, though, as it is also rather interesting. Let’s say you want to travel to a star $L$ light-years away, and you have an extremely powerful and efficient rocket, so the only limitation you have is the acceleration your body can take. What you do, then, is accelerate at a constant rate $a$ up to the halfway point, where you start decelerating at the same rate $a$, so that you actually stop at your destination. How long does that take? Luckily the Usenet Physics FAQ already has all the calculations, so I won’t do them again, I’ll just copy their results.

From the point of view of the Earth the number of years it takes is
\[t = \sqrt{L^2+4L/a}.\] You see it is always larger than $L$, which is the familiar result that you can’t travel faster than light. More interesting is the time it takes from the point of view of the traveller, the proper time:
\[\tau = \frac2a \cosh^{-1}(aL/2 +1 ).\] You’re probably not so familiar with the hyperbolic cosine, so it’s worth noting that $\cosh(x) \approx e^x/2$ for large $x$, which results in
\[\tau \approx \frac2a \log(aL+2)\] for large $L$. Yep, that’s a logarithm! It’s much, much faster from the point of view of the traveller. And those poor bastards that didn’t know relativity? They’d calculate the trip time to be
\[t_\text{Newton} = 2 \sqrt{L/a}, \]which is indeed shorter than the Earth-frame time $t$, but much longer than the time from the point of view of the traveller, which is what matters here.

For concreteness, let’s look at the time it takes to reach Tau Ceti, which is roughly 12 light-years away, at a nice comfy acceleration of one $g$. We have that $t=13.8$ years, $\tau=5.2$ years, and $t_\text{Newton} = 6.8$ years. Ok, not so brutal, but you’d definitely notice if you arrived at your destination one and a half years ahead of schedule. This is a bit misleading, though, because this distance is too small for the power of the logarithm to really kick in. If instead you go to the centre of the Milky Way, roughly $30\,000$ light-years away, you get $\tau=20$ years, $t_\text{Newton}=341$ years, and $t = 30\,002$ years. Now that’s brutal.
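If you want to play with the numbers yourself, here’s a little Python sketch of the three formulas above (units: years and light-years, with $c=1$, so one $g$ is roughly $1.03\ \mathrm{ly/yr^2}$):

```python
import numpy as np

def trip_times(L, a=1.03):
    """Travel time to a star L light-years away at constant proper
    acceleration a (in ly/yr^2), decelerating from the halfway point."""
    t = np.sqrt(L**2 + 4 * L / a)              # Earth frame
    tau = (2 / a) * np.arccosh(a * L / 2 + 1)  # traveller's proper time
    t_newton = 2 * np.sqrt(L / a)              # what the poor bastards expected
    return t, tau, t_newton

print(trip_times(12))      # Tau Ceti: roughly (13.8, 5.2, 6.8) years
print(trip_times(30_000))  # centre of the Milky Way: roughly (30002, 20, 341) years
```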

How about fuel, though? I don’t think we’ll ever do interstellar travel with rockets; much more plausible is laser-on-light-sail propulsion as investigated by Breakthrough Starshot, but it’s fun to do the calculations anyway. The Newtonian calculation is an easy application of Tsiolkovsky’s equation: for each kilogram of payload, we need
\[M_\text{Newton} = \exp\left(\frac2{v_e}\sqrt{aL}\right)-1 \]kilograms of fuel. Here $v_e$ is the effective exhaust velocity, which we take to be 1, as we are using an amazing photon rocket. The relativistic rocket equation is surprisingly simple: we just replace the time in Tsiolkovsky’s equation with the proper time. The result is that we need
\[M = \exp\left(\frac2{v_e}\cosh^{-1}(aL/2 +1 )\right)-1\] kilograms of fuel for each kilogram of payload. Using the same approximation for the hyperbolic cosine as before, we get
\[ M \approx (aL+2)^{2/v_e}.\] Now this is fucking amazing. We have an exponential in the Newtonian case, but in the relativistic case it’s just a polynomial! The exponential from Tsiolkovsky gets negated by the logarithm in the proper time. Even for Tau Ceti the difference is already extremely large: $M=204$ kg, whereas $M_\text{Newton} = 1\,138$ kg! And for the centre of the Milky Way? $M \approx 10^9$ kg, and $M_\text{Newton} \approx 10^{152}$ kg. Ok, both are impossible, but the Newtonian one is much more impossible!
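Here’s the corresponding fuel calculation as a sketch in the same units (the relativistic case is just Tsiolkovsky with the proper time plugged in, as said above):

```python
import numpy as np

def fuel_per_kg(L, a=1.03, v_e=1.0):
    """Kilograms of fuel per kilogram of payload for a photon rocket (v_e = c = 1)."""
    tau = (2 / a) * np.arccosh(a * L / 2 + 1)   # relativistic proper time
    t_newton = 2 * np.sqrt(L / a)               # Newtonian trip time
    M = np.exp(a * tau / v_e) - 1               # Tsiolkovsky with the proper time
    M_newton = np.exp(a * t_newton / v_e) - 1   # plain Tsiolkovsky
    return M, M_newton

print(fuel_per_kg(12))       # Tau Ceti: roughly (2e2, 1e3) kg
print(fuel_per_kg(30_000))   # centre of the Milky Way: roughly (1e9, 1e152) kg
```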

So don’t complain about the lightspeed barrier. Relativity actually makes life much easier for would-be interstellar travellers. A Newtonian universe would be a shit place to live.


Climbing a space elevator

I was arguing on the internet about the possibility of building a space elevator; someone was claiming that carbon nanotubes were going to make it possible and we’d build one in the near future. I’m very skeptical, first because we’re nowhere near the capability of building high-strength carbon nanotube bundles tens of thousands of kilometres long, and secondly because it needs to be cheap, and this absurdly long and absurdly strong cable seems anything but cheap. Furthermore, if we postulate that some magical technology will allow that to happen, I think it’s dishonest to not allow some magical technology to also help the competition. For instance, we could have fully reusable rockets which cost little other than fuel to launch; and this technology seems much more plausible.

But this is not what I’m going to talk about; space elevator cables have been discussed to death. What I’m interested in is a point that another poster brought up: how much does the energy needed to climb the space elevator cost? They were arguing that the proposed method of powering the climber, laser beaming, had an absurdly low efficiency of 0.005, and that paying for that energy alone already made the space elevator uneconomical; we didn’t even need to consider the astronomical costs of constructing it in the first place.

Now wait, come on, so even if we did have the magical cable for free it still wouldn’t be worth using the space elevator? That’s going too far. If laser beaming has such a low efficiency, this just means we should use some other method for powering the climber. Delivering the power via copper wire is out of the question: the wire would be extremely massive, and over such a long distance the electrical resistance would also make it uneconomical. Of course we could use a superconductor to get rid of resistance, but then we have to cool the damn thing as well, making the weight and the cost even worse. No no no, I want to find a solution for powering the climber without needing any more magical technology; the only magic I’ll allow is the space elevator cable itself.

The obvious solution is solar panels. In space they work very reliably, there’s no atmosphere to filter the sun, and even the shadow of the Earth quickly stops being a problem as you get higher. They don’t generate much power, though, so it would take a long time to climb to geostationary orbit. How long exactly?

It turns out this is an interesting physics problem. We’re not moving at a constant speed, or with a constant force, but at constant power. I didn’t study this problem in either school or university, so I wanted to find out how to do it. The principle is easy: kinetic energy is given by
\[ E = \frac12 mv^2, \] so we differentiate that and get
\[ \frac{d}{dt}E = P = m v a,\]which results in the differential equation
\[a = \frac{P}{mv},\] that even I can solve.

Life gets hard when we put that together with the forces acting on the climber: gravitational $GMm/r^2$ and centrifugal $m\omega^2 r$ (the famous “fictitious” force that hurts as much as any “real” force). The resulting acceleration is
\[ a = \frac{P}{mv} - \frac{GM}{r^2} + \omega^2 r,\]a really nasty differential equation that I have no idea how to solve. No matter, my laptop can solve it numerically without even turning on the cooling fan; I just need the values for the parameters. Most of them are fixed: $G$ is the gravitational constant, $M$ is the mass of the Earth, and $\omega$ is its angular speed. I just need to determine $P/m$, the power per mass available from the solar panels.

Annoyingly enough, I couldn’t find solid numbers anywhere, but a good estimate can be obtained by considering the solar panels of the ISS. They produce roughly 30 kW of power (before degradation) and weigh roughly 1.1 tons, giving us a power density of 27 W/kg. To compensate for degradation, they’re about to install new ones that promise a 20% weight reduction for the same performance. Again, I couldn’t find solid numbers, but given that they are going to be installed next month, and that there is a plausible method for reducing the weight, I’ll believe that 20% and put a power density of 34 W/kg solidly in the “not magic” category.

The climber is not made only of solar panels, though. Let’s reserve 80% of its mass for payload, structure, and so on, and use 20% for solar panels, which lowers the power density to 6.8 W/kg. Solving the equation numerically, letting the initial position be the equatorial radius of the Earth and the initial velocity 1 m/s, we see that 82.5 days are needed to reach geostationary orbit. But wait! At geostationary orbit the gravitational and centrifugal forces balance out, so we’ll just have the motors working with no resistance, and the speed of the climber will quickly diverge! We have to check that we’re not accidentally breaking relativity. Solving numerically for the speed, we see that at geostationary height it is roughly 300 m/s. Yeah, no danger of breaking relativity here. Still, that’s a little too fast to be plausible, so let’s apply the brakes at 100 m/s. This only lengthens the journey to 82.7 days, so it’s really not worth considering; there’s a bigger error anyway from not taking into account the time the climber spends in Earth’s shadow, which will make the journey a couple of days longer.
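For the record, here’s more or less that numerical solution as a Python sketch with scipy (the astronomical parameters are the standard values; the 6.8 W/kg is the estimate from above):

```python
import numpy as np
from scipy.integrate import solve_ivp

GM = 3.986e14       # gravitational parameter of the Earth, m^3/s^2
omega = 7.292e-5    # Earth's angular speed, rad/s
R_eq = 6.378e6      # equatorial radius of the Earth, m
r_geo = 4.216e7     # geostationary radius, m
P_over_m = 6.8      # power per unit of climber mass, W/kg

def rhs(t, y):
    r, v = y
    return [v, P_over_m / v - GM / r**2 + omega**2 * r]

def reached_geo(t, y):
    return y[0] - r_geo
reached_geo.terminal = True

# LSODA, because the slow early part of the climb is rather stiff
sol = solve_ivp(rhs, [0, 1e8], [R_eq, 1.0], events=reached_geo,
                method='LSODA', rtol=1e-8)
print(sol.t_events[0][0] / 86400)   # roughly 82.5 days
print(sol.y[1, -1])                 # speed at geostationary height, roughly 300 m/s
```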

That takes a lot of patience, but it’s really not an issue for geostationary satellites or interplanetary missions that take years anyway. It is an issue for manned missions: we really don’t want to spend 3 months loitering in space getting bathed in radiation. Can we speed it up somehow?

Well, we can send power up via a copper wire for a short distance, say 100 km. In fact we need to do this, as we can’t deploy this massive solar array inside the atmosphere. Transmission over such distances is completely routine, so I’m not going to bother calculating the specifics. Just assume that we send enough power for a constant speed of 100 m/s, and solve the differential equation numerically again with these new initial conditions. The result is 81 days. Well, yeah, we’ll leave the space elevator for cargo and send people up in rockets.


Being Referee #2

It has happened to all of us. You spend months refining your latest result, carefully writing it up as a well-polished manuscript. You argue with your coauthors about everything: from which version of the proof is more enlightening, to the order the ideas should be presented, and even whether $\epsilon$ or $\varepsilon$ has more typographical merit. Finally content with the manuscript, you submit it to the journal of your desires, and the long wait begins for the referee reports. When they finally arrive, there’s a short and shallow report from Referee #1 saying it’s fine, publish it, just add a citation to Refereey McRefereeFace. But then comes the long, thorny report of Referee #2. This shrivelled soul writes that your bombastic claims in the abstract are not supported by the results in the text. This arrogant asshole thinks the problem you solved is of no interest whatsoever. This heartless pedant says that your proof is nothing but a handwavy argument. This presumptuous prick argues that the whole research program was misguided to start with. And anyway it was already done a long time ago in Russia.

Well, I have a confession to make. I am Referee #2. Contrary to common misconception, I am not a sadist. I have myself received countless negative referee reports, and I know very well how much it hurts to see your work trashed like this. It breaks my heart to reject a paper, especially when I see it’s the first paper a student has ever written. Nevertheless, it is my duty. I do believe in peer review, and I think readers should be able to trust what they read in a reputable journal. This is especially important in our age of widespread disinformation.

The most common problem a paper has is claiming too much in the abstract, which is known as overselling the result. This is what I find most frustrating as a referee, because almost always the authors know what they wrote is bullshit; they just think it’s harmless, and that I should let it slide. The argument is that anybody who reads the paper will understand what the result is actually about and disregard the abstract, which is just colourful marketing. This is true, but it misses the point that almost nobody reads the paper; the vast majority of the readers will be lay people who know of the result through a press release that credulously repeats the claims of the abstract. Great, now you’re fooling the general public that paid for the research. Even among the physicists that read the paper, most will not be experts, and have to take your claims at face value. So no, I will not let it slide, and I will tell the authors to cut the crap without the least bit of remorse.

When there is a technical mistake in the paper there is usually very little drama; of course it’s bad news, but the mistake was already there, and I helped find it. The authors usually just fix it and thank me in the acknowledgments. Once, the authors refused to fix the mistake, the paper was rejected, and they simply resubmitted it to a different journal. Which asked me again to referee the paper. I got furious about wasting my time trying to help the authors, and simply copy-pasted my previous referee report.

But in the other cases? Yes, sometimes the problem you have worked on indeed interests nobody but you. I do feel sorry for the authors, but I will write that in the referee report if that’s my opinion. And when the result was in fact done a long time ago in Russia? It’s very frustrating, I know; I have also rediscovered stuff that was published in the Siberian Journal of Mathematics. Life sucks, but what can I do? We have to respect the priority of Soviet physicists. And when the research program was a terrible idea in the first place? Well, I wrote that in my referee report only once, and it was really heartbreaking, as it was also the first paper of a PhD student. I decided to say it anyway, because otherwise the student would probably spend the rest of their PhD working on a nonsensical idea, and their career would go down the drain. I tried to be gentle about it, but sometimes being gentle doesn’t help when the news is that your mother has died.

Perhaps you’re not convinced, and want anyway to punish Referee #2 for the negative reports. Well, I have a feeling that editors know that I’m strict with my referee reports, so they send me papers they want to be rejected. Every day I see really nice papers getting published, but somehow they never come to me for refereeing. So rest assured, being Referee #2 is its own punishment.

EDIT: And just after I posted this, I got rejected from TQC with this paper. The killing blow came in fact from Referee #2: “Overall, I enjoyed reading this work. It starts with a clear and well-motivated question and mixes up wisely several ingredients from the theory of non-locality in order to tackle it. However, I did not find the results that exciting.” Oh, that hurts!


Why I am unhappy about all derivations of the Born rule (including mine)

There are a lot of derivations of the Born rule in the literature. To cite just a few that I’m most familiar with, and illustrate different strategies: by Everett, Deutsch, Wallace, Żurek, Vaidman, Carroll and Sebens, me, and Mandolesi. I’m intentionally leaving out frequentist derivations, who should just be ashamed of themselves, and operationalist derivations, because I don’t think you can explain anything without going into ontology.

I’m not happy with any of them. Before going into detail about why, I need to say what my criteria are for a satisfactory derivation of the Born rule:

  1. It needs to use postulates that either come from the non-probabilistic part of quantum mechanics, or are themselves better motivated than the Born rule itself.
  2. The resulting Born rule must predict that if you make repeated measurements on the state $\alpha\ket{0}+\beta\ket{1}$ the fraction of outcome 0 will often be close to $|\alpha|^2/(|\alpha|^2+|\beta|^2)$.

The first criterion is hopefully uncontroversial: if the derivation of the Born rule depends on the probabilistic part of quantum mechanics it is circular, and if it uses more arcane postulates than the Born rule itself, why bother?

The second criterion tries to sidestep the eternal war about the definition of probability, focussing on the one point that everyone should agree on: probabilities must predict the frequencies. They should agree, but of course they don’t. Subjectivists insist that probabilities are the degrees of belief of a rational agent, leaving the connection with frequencies at best indirect. To the extent that this is a matter of semantics, fine, I concede the point, define probabilities to be subjective, but then I don’t care about them. I care about whatever objective property of the world causes the frequencies to come out as they do. Call it propensity, call it chance, call it objective probability, it doesn’t matter. This thing is what the Born rule is about, not what is going on in the heads of rational agents.

And there is such a thing, there is a regularity in Nature that we have to explain. If you pass 1000 photons through a polariser, then a half-wave plate oriented at $\pi/4$ radians, and then measure their polarisation, you will often find the number of photons with horizontal polarisation to be close to 500. To simulate this experiment, I generated 8000 hopefully random bits from /dev/urandom, divided them into sets of 1000, and counted the number of ones in each set, obtaining $504, 495, 517, 472, 504, 538, 498$, and $488$. We often got something close to 500. This is the regularity in Nature, this is the brute fact that a derivation of the Born rule must explain. Everything else is stamp collecting.
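A minimal sketch of that little simulation, in Python (on Linux, os.urandom draws its bytes from the same source as /dev/urandom):

```python
import os

bits = []
for byte in os.urandom(1000):            # 8000 hopefully random bits
    bits.extend((byte >> k) & 1 for k in range(8))

counts = [sum(bits[i:i + 1000]) for i in range(0, 8000, 1000)]
print(counts)   # eight numbers, each of them often close to 500
```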

I’ve emphasized the second criterion because it’s where most derivations fail. Consider the most fashionable one, by Deutsch and Wallace, which I reviewed here. They have some axioms about what it means to be a rational agent, some axioms about what is a measurement, and some axioms about how rational agents should be indifferent to some transformations done on the quantum state. From these axioms they conclude that rational agents should bet on the outcomes of a quantum experiment with probabilities given by the Born rule. Who cares? Should we really believe that the statistics of an experiment will be constrained by rationality axioms? And conversely, if you can show that the statistics of a quantum experiment follow the Born rule, doesn’t it become obvious that rational agents should bet that they do, making the whole decision-theoretic argument superfluous? It’s worth noting that this same criticism applies to my derivation, as it is just a cleaned up version of the Deutsch-Wallace argument.

Let’s move on to Vaidman, Carroll, and Sebens. Their derivations differ on several important points, but I’m interested here in their common point: they passionately argue that probability is about uncertainty, that a genuine source of uncertainty in Many-Worlds is self-locating uncertainty, and that locality implies that your self-locating uncertainty must be given by the Born rule. Arguing about whether probability is uncertainty is a waste of time, but their second point is well-taken: after a measurement has been done and before you know the outcome, you are genuinely uncertain about which branch of the wavefunction you are in. I just don’t see how this could be of fundamental relevance. I can very well do the experiment with my eyes glued to the screen of the computer, so that I’m at first aware that all possible outcomes will happen, and then aware of what the outcome in my branch is, without ever passing through a moment of uncertainty in between. Decoherence does work fast enough for that to happen. What now? No probability anymore? And then it appears when I close my eyes for a few seconds? That makes sense if probability is only in my head, but then you’re not talking about how Nature works, and I don’t care about your notion of probability.

Żurek’s turn now. His proof follows the pattern of all the ones I’ve talked about until now: it argues that probabilities should be invariant under some transformations on entangled states, and from a bunch of equalities gets the Born rule. An important difference is that he refuses to define what probabilities are, perhaps in order to avoid the intractable debate. I think it’s problematic for his argument, though. One can coherently argue about what agents can know about, or what they should care about, as the other authors did. But an undefined notion? It can do anything! Without saying what probability is, how can you say it is invariant under some transformation? Also, how can this undefined notion explain the regularity of the frequencies? Maybe it has nothing to do with frequencies, who knows?

Let’s go back to Everett’s original attempt. It is of a completely different nature: he wanted to derive a measure $\mu$ on the set of worlds, according to which most worlds would observe statistics close to what the Born rule predicted. I think this makes perfect sense, and satisfies the second criterion. The problem is the first. He assumed that $\mu$ must be additive on a superposition of orthogonal worlds, i.e., that if $\ket{\psi} = \sum_i \ket{\psi_i}$ then
\[ \mu(\ket{\psi}) = \sum_i \mu(\ket{\psi_i}). \] I have no problem with this. But then he also assumed that $\mu$ must depend only on the 2-norm of the state, i.e., that $\mu(\ket{\psi}) = f(\langle \psi|\psi\rangle)$. This implies that
\[ f\left( \sum_i \langle \psi_i|\psi_i\rangle \right) = \sum_i f(\langle \psi_i|\psi_i\rangle), \] which is just Cauchy’s equation. Assuming that $\mu$ is continuous or bounded is enough to imply that $\mu(\ket{\psi}) = \alpha \langle \psi|\psi\rangle$ for some positive constant $\alpha$, that can be taken equal to 1, and we have a measure that gives rise to the Born rule.

But why assume that $\mu$ must depend only on the 2-norm of the state? This unjustified assumption is what is doing all the work. If we assume instead that $\mu$ depends only on the $p$-norm of the state, the same argument implies that $\mu(\ket{\psi}) = \|\ket{\psi}\|_p^p$, an absolute value to the power $p$ rule instead of an absolute value squared rule. Everett would have been better off simply postulating that $\mu(\ket{\psi}) = \langle \psi|\psi\rangle$. Had he done that, it would be clear that the important thing was replacing the mysterious Born rule, with its undefined probabilities, with the crystal clear Born measure. Instead, the whole debate was about how proper his derivation of the Born measure itself was.

The last proof I want to talk about, by Mandolesi, agrees with Everett’s idea of probability as measure theory on worlds, and as such satisfies the second criterion. The difficulty is deriving the Born measure. To do that, he uses not only the quantum states $\ket{\psi}$, but the whole rays $R_\psi = \{c \ket{\psi}; c \in \mathbb C \setminus 0\}$. He then considers a subset $R_{\psi,U} = \{c \ket{\psi}; c \in U\}$ of the ray, for some measurable subset $U$ of the complex numbers, to which he assigns the Lebesgue measure $\lambda(R_{\psi,U})$. Given some orthogonal decomposition of a ray $c \ket{\psi} = \sum_i c \ket{\psi_i}$ – which is most interesting when the states $\ket{\psi_i}$ correspond to worlds – the fraction of worlds of kind $i$ is given by
\[ \frac{\lambda(R_{\psi_i,U})}{\lambda(R_{\psi,U})} = \frac{\langle \psi_i|\psi_i \rangle}{\langle \psi|\psi \rangle}, \] which does not depend on $U$, and coincides with the Born measure.

I think he’s definitely using postulates more arcane than the Born measure itself. He needs not only the rays and their subsets, but also the Lebesgue measure on the complex line, for which his only justification is that it is natural. No argument here, it is very natural, but so is the Born measure! It is just an inner product, and is there anything more natural than taking inner products in a Hilbert space?

To conclude, I’m skeptical that there can even be a satisfactory derivation of the Born rule. It obviously doesn’t follow from the non-probabilistic axioms alone; we need an additional postulate anyway, and it’s hard to think of a more natural one than $\mu(\ket{\psi}) = \langle \psi|\psi\rangle$. Perhaps we should focus instead on showing that such an axiomatization already gives us everything we need? In this vein, there’s this curious paper by Saunders, which doesn’t try to derive the Born rule, but instead carefully argues that the mod-squared amplitudes play all the roles we require from objective probabilities. I find it curious because he never mentions the interpretation of the mod-squared amplitudes as the amounts of worlds with a given outcome, which is what makes the metaphysics intelligible.


P-values and Bayes factors

I’m feeling bad about having used $p$-values in my paper about Bell inequalities, despite considering them bullshit. The reason we decided to do it is that we wanted the paper to be useful for experimentalists, and all of them use $p$-values instead of Bayes factors. We are already asking them to abandon their beloved Bell inequalities and use nonlocal games instead, and that is the main point of the paper. We thought that also asking them to give up on $p$-values would be too much cognitive dissonance, and would distract from the main issue. Good reasons, no?

Still, still. The result is that there’s yet another paper perpetuating the use of $p$-values, and moreover my colleagues might see it and think that I defend them! The horror! My reputation would never recover. To set things right, I’m writing this post here to explain why $p$-values are nonsense, and restate our results in terms of Bayes factors.

First of all, what is a $p$-value? It is the probability of observing the data you got, or something more extreme, under the assumption that the null hypothesis is true. If this probability is less than some arbitrary threshold, often set at $0.05$ or $10^{-5}$, the null hypothesis is said to be rejected.

The very definition already raises several red flags. Why are we considering the probability of “something more extreme”, instead of just the probability of the data we actually got? Also, what counts as “more extreme”? It doesn’t only sound ambiguous, it is ambiguous. And what about this arbitrary threshold? How should we set it, and why should we care about this number? More worrisome, how can we possibly regard the null hypothesis in isolation? In practice we always have another hypothesis in mind, but come on. What if we do reject the null hypothesis, but the alternative hypothesis attributes only slightly higher probability to the observed outcome? Obviously the experiment was just inconclusive, but apparently not if you take $p$-values seriously.

Perhaps these are just counterintuitive properties of a definition that has a solid basis in probability theory, and we need to just grow up and accept that life is not as simple as we would want. Well, as it turns out, there’s absolutely no basis for using $p$-values. Pearson just pulled the definition out of his ass, for use in his $\chi^2$-test. People then ran with it and started applying it everywhere.

In stark contrast, a Bayes factor is defined simply as the ratio of the probabilities of observing the data you got, given the null or the alternative hypotheses:
\[ K := \frac{p(D|H_0)}{p(D|H_1)}.\] It suffers from none of the problems that plague the definition of the $p$-value: we only consider the actually observed data, so no “more extreme” events are relevant. It explicitly depends on both hypotheses; there’s no attempt to reject a hypothesis in isolation. And there’s no arbitrary threshold to set: it is just a number with a clear operational interpretation.

More importantly, it doesn’t come out of anybody’s ass, but from Bayes’ theorem: it is exactly what we need to calculate the posterior probability $p(H_0|D)$. In the case of two hypotheses, we have that
\[ \frac{p(H_0|D)}{p(H_1|D)} = \frac{p(D|H_0)p(H_0)}{p(D|H_1)p(H_1)} = K\frac{p(H_0)}{p(H_1)}.\] How do our results look when written in terms of the Bayes factor then?

Before we see that, we have to undo another bad decision we made in our paper. Our goal was to minimize the expected $p$-value from a Bell test. But we don’t observe an expected $p$-value, we observe a $p$-value. And the $p$-value we expect to observe is the $p$-value of the expected number of victories $n\omega_q$. This is given by a hideous expression we have no hope of optimizing directly, but luckily there’s a simple and powerful upper bound, the Chernoff bound:
\[p_n \le \exp(-n D(\omega_q||\omega_\ell)),\] where $D(\omega_q||\omega_\ell)$ is the relative entropy. And the Bayes factor of the expected number of victories is
\[K_n = \exp(-n D(\omega_q||\omega_\ell)),\] exactly the same expression. Neat, no? We don’t even need to fight about $p$-values and Bayes factors, we just maximize the relative entropy. The difference is that for the $p$-value this is just an upper bound, whereas for the Bayes factor it is the whole story.
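To make this concrete, here’s a toy Python computation, with the relative entropy taken between the two winning probabilities. The CHSH numbers below are just an illustration of mine, not something from the paper:

```python
import numpy as np

def rel_entropy(p, q):
    """Binary relative entropy D(p||q), in nats."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# CHSH played as a nonlocal game, as an example
omega_q = np.cos(np.pi / 8) ** 2   # quantum winning probability, ~0.854
omega_l = 3 / 4                    # local winning probability
n = 1000                           # number of rounds

D = rel_entropy(omega_q, omega_l)
print(D)                # ~0.032 nats per round
print(np.exp(-n * D))   # Bayes factor ~1e-14 after 1000 rounds
```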

In the paper we maximized the gap of the nonlocal game $\chi := \omega_q-\omega_\ell$, because the expected $p$-value was too difficult to minimize directly, and because a large gap is a sufficient condition for having a small expected $p$-value. Now the Bayes factor is a simple expression that we could minimize directly, but at least our work was not in vain: a large gap is also a sufficient condition for a small Bayes factor, so it is still a good idea to maximize it. To see that, we only need to note that
\[D(\omega_q||\omega_\ell) \ge -\chi\log(1-\chi).\] Interestingly, having a large ratio $\omega_q/\omega_\ell$ is not sufficient for a small Bayes factor. The ratio is of widespread interest in the literature because a large ratio implies a large resistance to noise. Intuitively, this should be useless if $\omega_q$ is too small, as the favourable outcome would happen too seldom in an experiment. Indeed, in the Khot-Vishnoi game we have $\omega_q \ge 1/\log^2(d)$ and $\omega_\ell \le e^2/d$, so the ratio gets arbitrarily large with $d$, but the relative entropy gets arbitrarily close to 0, and thus the Bayes factor stops being small. It’s a near miss, though, and the intuition is wrong. If we had instead $\omega_q \ge 1/\sqrt{\log(d)}$ and the same $\omega_\ell$, the relative entropy would actually diverge with increasing $d$, even though the gap gets arbitrarily small.
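You can see both behaviours numerically by plugging the bounds in as if they were equalities (purely illustrative, of course):

```python
import numpy as np

def rel_entropy(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

for d in [1e6, 1e12, 1e24]:
    w_l = np.e**2 / d
    print(rel_entropy(1 / np.log(d)**2, w_l),        # shrinks with d
          rel_entropy(1 / np.sqrt(np.log(d)), w_l))  # grows with d
```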

It’s easy to show that
\[ D(\omega_q||\omega_\ell) \le \log\left( \frac{\omega_q}{\omega_\ell} \right), \] so having a large ratio is a necessary condition for a small Bayes factor. We have then a nice duality between the ratio and the gap: the gap is sufficient, but not necessary, and the ratio is necessary, but not sufficient.

I don’t think that beauty implies truth; otherwise neutrinos would be massless. But it’s a joy when you have an independent argument for something to be true, and it turns out to be quite beautiful as well.


Hilbert’s Hotel and Gödel’s Freezer

A friend of mine, Flavien Hirsch, invented a fascinating paradox in probability, called Hilbert’s Hotel and Gödel’s Freezer. Technically speaking it is the same thing as the Sleeping Beauty problem, which I’ve written about here, but I find it much nicer because it’s more symmetrical, you actually care about the answer, and intuition leads you astray. Without further ado, here it is:

Gödel has a Freezer which contains 4 people in a state of suspended animation, with you among them. He picks 2 of them randomly, with uniform probability, to send to Hilbert’s Hotel, a nice place which is never full. The other 2 are doomed to stay in the Freezer forever. Being a bit mad and a bit evil, Gödel decides to do the transfer in a capricious way. He sends them via two tram trips, each with 3 people: one of the chosen ones, who stays at Hilbert’s Hotel at the end of the trip, and the two doomed ones, who are taken back to the Freezer. Not being too evil, Gödel erases the memory of the doomed ones after each trip so that they won’t be tortured by their dream of freedom. You wake up in the tram, are told about the whole setup, and ask yourself: what’s the probability that I’m going to Hilbert’s Hotel?

There are two answers that seem obviously correct, but conflict with each other:

  • $1/2$ – As 2 out of 4 people are going to Hilbert’s Hotel, and by assumption they were picked randomly and uniformly, the probability of each person must be $1/2$. Since everybody will wake up with certainty and ask themselves the question, you don’t learn anything by waking up, so the probability stays $1/2$.
  • $1/3$ – Out of the 3 people in the tram only one is going to Hilbert’s Hotel. Since there’s nothing distinguishing anyone there, they must all have the same probability of being the chosen one, which must therefore be $1/3$.

Now suppose there are $n$ people instead of 4, with still only 2 doomed ones. The probability $1/2$ becomes $(n-2)/n$, while $1/3$ stays $1/3$. The probability that you will go to Hilbert’s Hotel is therefore rather high. But you wake up in the tram – which will happen in any case – and now suddenly you should be sad that your probability is only $1/3$? How can that possibly be true?

The difficulty here is that we’re trying to work with an informal, intuitive notion of probability. That’s not good enough. If we want to have actual answers we need to figure out what probability means. We should first notice that the paradox is about subjective probability. The probability that Gödel used to pick the chosen ones can be taken to be objective, but that’s in the past; now the objective probability that you’ll go to Hilbert’s Hotel is either zero or one, you just don’t know which. So even if we had an amazing and universally agreed upon definition of objective probability, that wouldn’t help us at all.

What definition of subjective probability do we have then? I’m happy with the decision-theoretical one: you say that an event $E$ happens with subjective probability $p$ if $p$ is the fair price you assign to a bet that pays 1€ if $E$ happens, and nothing otherwise. Fair here means that you don’t expect to win or lose money from this bet, so you’d be fine buying or selling the bet for this price. For example, if $E$ happens with objective probability $q$, and you pay $p$ for the bet, then with probability $q$ you win $(1-p)$€, and with probability $1-q$ you win $-p\,$€. The expected value is $q(1-p)-(1-q)p=q-p$, so you will expect to win money if $p < q$, and you will expect to lose money if $p>q$, so the fair price is $p=q$.

You wake up in the tram then, and consider: what is the fair price of a bet that pays 1€ if I end up at Hilbert’s Hotel? There are two cases to consider. If you are a chosen one, you’ll pay $p$€ and win 1€, so your net gain is $(1-p)$€. If you’re a doomed one, you’ll pay $p$€ and win nothing, with a net gain of $-p$€ in this trip. But a doomed one will certainly do another trip (or will have already done it), so you’ll make the same bet twice, for a total net gain of $-2p$€. The bet just became less attractive, and it stands to reason that you’ll be willing to pay less for it! The expectation value is $(1-p)/2 -2p/2 = (1-3p)/2$, which is zero for $p=1/3$.

But wait. Is it fair to make the doomed one pay twice for the bet? Well, the bet is imaginary in any case, but the doomed one is the one who wakes up twice and asks themselves the question twice. Paying for the bet twice is just a formalisation of being wrong twice. Intuitively, you might not want to guess that you’re a chosen one because even though the probability is $1/2$, when you’re wrong you’ll be wrong twice.
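If you don’t trust the bookkeeping, a quick Monte Carlo from the point of view of one fixed person gives the same numbers: you go to the Hotel in about half of the runs, but only a third of your awakenings happen in runs where you do:

```python
import random

runs = 100_000
hotel_runs = 0           # runs in which you (person 0) end up at the Hotel
awakenings = 0           # your awakenings in the tram, across all runs
chosen_awakenings = 0    # those awakenings in which you are in fact a chosen one

for _ in range(runs):
    chosen = random.sample(range(4), 2)
    if 0 in chosen:
        hotel_runs += 1
        awakenings += 1        # a chosen one rides the tram exactly once
        chosen_awakenings += 1
    else:
        awakenings += 2        # a doomed one rides, and wakes up, twice

print(hotel_runs / runs)                # ~1/2
print(chosen_awakenings / awakenings)   # ~1/3
```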

Maybe we shouldn’t count the mistake of the doomed one twice, because their memory will be erased between the trips? Wiseman suggested this would be natural in my original post about the Sleeping Beauty. I find it rather artificial. If we erase their memory after the second trip as well (as the statement of the paradox says), should we then never count their mistake? And thus assign subjective probability 1 to going to Hilbert’s Hotel? Or, in a completely different situation, should I assign probability 1 to winning a game of Russian roulette, as my memory will be erased if I lose? I think this is really changing the question from being wrong about your prediction to knowing that you were wrong.

It does raise an interesting question, though, of what it takes to make the subjective probability of going to Hilbert’s Hotel match the objective probability $1/2$. One could get rid of the multiple awakenings by saying that Gödel will only let the passengers of the tram wake up in the first trip. But this is not enough, because now the probability that you will wake up given that you are a chosen one is $1/2$, while the probability that you will wake up given that you are a doomed one is still 1, so the expectation value is $(1-p)/4 - p/2 = (1-3p)/4$, and the fair price is still $1/3$.

It does work if Gödel decides to wake up each of the four people exactly once: say all passengers of the tram wake up in the first trip, and the remaining chosen one wakes up alone in the second trip (and somehow doesn’t notice that the other two passengers are still asleep). Now everybody will wake up once, and only once, with probability 1, and the expectation value is $(1-p)/2 - p/2 = 1/2 - p$, and finally we have $p=1/2$. But the problem has clearly changed.


Beautiful angles with beautiful cosines

I’ve recently proved a theorem that is probably well-known to anyone that studied number theory. I didn’t know it before, though, and it was a lot of fun, so I decided to write it up here in case there are other people around that like number theory a bit but not enough to actually learn it.

The question I wanted to answer was: which beautiful angles have beautiful cosines? My completely objective definitions are that a beautiful angle is of the form $\frac{p}{q}\pi$ for small integers $p,q$, and a beautiful cosine is of the form $\frac{a}{b}+\sqrt{\frac{c}{d}}$, again for small integers. Surprisingly, the question can be answered completely, and it is not even difficult if you know number theory (I don’t).

The first thing to notice is that any algebraic number uniquely defines its minimal polynomial, which is the polynomial with rational coefficients (and leading coefficient one) of least degree that has this algebraic number as a root. So we only need to compute the minimal polynomial of our candidate cosine $\alpha = \frac{a}{b}+\sqrt{\frac{c}{d}}$ and see if it matches the minimal polynomial of $\cos(\frac{p}{q}\pi)$ for some $p,q$. Which doesn’t really help, because what the hell is the minimal polynomial of $\cos(\frac{p}{q}\pi)$?

Luckily, though, the minimal polynomial of $e^{i\pi\frac{p}{q}}$ is very well characterized: it is known as a cyclotomic polynomial, and it is given simply by $\Phi_n(x) := \prod_k (x-e^{i2\pi\frac{k}{n}})$, where the product runs over all $k$ that are coprime to $n$.

Since $\cos(\frac{p}{q}\pi) = \alpha$ iff $e^{i\frac{p}{q}\pi} = \alpha \pm i\sqrt{1-\alpha^2}$, what we do instead is compute the minimal polynomial of $\alpha + i\sqrt{1-\alpha^2}$ (which is equal to the minimal polynomial of $\alpha - i\sqrt{1-\alpha^2}$), and see if it matches some cyclotomic polynomial.

Let’s start by disregarding the trivial cases where $\frac{a}{b}+\sqrt{\frac{c}{d}}$ is $1,0,$ or $-1$. We know that $\cos(0),\cos(\pi/2),\cos(-\pi/2),$ and $\cos(\pi)$ give them; there’s no need for the big guns. We’ll assume from here on that $\frac{a}{b}+\sqrt{\frac{c}{d}}$ is never an integer.

The first interesting case is when $c=0$, so we’re just asking which beautiful angles have rational cosines. To answer that we need the minimal polynomial of $z=\frac{a}{b}+i\sqrt{1-\left(\frac{a}{b}\right)^2}$. It’s clear that $(x-z)(x-z^*)$ works, and we can’t get rid of the root $z^*$, so that’s the minimal one. Explicitly it is
\[ x^2 -2 \frac{a}{b} x +1, \]so it can only match cyclotomic polynomials of degree 2. Well, the degree of a cyclotomic polynomial $\Phi_n(x)$ is simply the number of integers $k \le n$ that are coprime to $n$, so we need to find which integers $n$ have exactly 2 smaller integers that are coprime to it.

This is kind of a strange question, but the reverse is rather straightforward: the number of integers coprime to $n$ is the definition of Euler’s totient function $\varphi(n)$. It has the simple lower bound of $\varphi(n) \ge \sqrt{n/2}$, which allows us to brute-force our problem: we just need to look at $n$ up to 8 and we’ll have all integers with exactly 2 coprimes. They are 3,4, and 6, so we just need to check whether our minimal polynomial matches $(x-e^{i2\pi\frac{1}{3}})(x-e^{i2\pi\frac{2}{3}})$, $(x-e^{i2\pi\frac{1}{4}})(x-e^{i2\pi\frac{3}{4}})$, or $(x-e^{i2\pi\frac{1}{6}})(x-e^{i2\pi\frac{5}{6}})$. Which is kind of overkill, because we already have the angles, so we can simply check instead which of them have rational cosines, which turn out to be $\cos(\frac{2}{3}\pi) = \cos(\frac{4}{3}\pi) = -\frac{1}{2}$, and $\cos(\frac{1}{3}\pi) = \cos(\frac{5}{3}\pi) = \frac{1}{2}$. And these are the only ones.

Moving on to the next interesting case, we set $a=0$, so that we are asking which beautiful angles have cosines equal to $\sqrt{\frac{c}{d}}$. We need then to figure out the minimal polynomial of $z = \sqrt{\frac{c}{d}} +i\sqrt{1-\frac{c}{d}}$. Now the polynomial $(x-z)(x-z^*)$ doesn’t work anymore, as it has irrational coefficients, but this is easy to fix by considering instead $(x-z)(x-z^*)(x+z)(x+z^*)$, which does work and is also clearly the minimal polynomial. Working it out, we get
\[x^4 + \left(2-4\frac{c}{d}\right)x^2 + 1.\] Calculating which integers have exactly four coprimes, like before, we see that they are 5, 8, 10, and 12, which is way too much work to check by hand, so we’ll do something else instead: another nice property of cyclotomic polynomials is that they only have integer coefficients, so we only need to look for $c,d$ such that $4\frac{c}{d}$ is an integer. The only nontrivial cases are $c=1,d=2$ and $c=3,d=4$, which tells us that the only beautiful cosines that are square roots of rationals are $\pm\sqrt{1/2}$ and $\pm\sqrt{3/4}$. The angles that give these cosines are the well-known $\pi/4,7\pi/4,3\pi/4,5\pi/4$ and $\pi/6,11\pi/6,5\pi/6,7\pi/6$.

To finalize let’s consider cosines of the form $\frac{a}{b}+\sqrt{\frac{c}{d}}$ for $a \neq 0$ and $c \neq 0$. We need the minimal polynomial for the hideous thing
\[z = \frac{a}{b}+\sqrt{\frac{c}{d}}+i\sqrt{1-\left(\frac{a}{b}+\sqrt{\frac{c}{d}}\right)^2}.\] The previous case does not work, but if you think about it, it is easy to see that with
\[w = \frac{a}{b}-\sqrt{\frac{c}{d}}+i\sqrt{1-\left(\frac{a}{b}-\sqrt{\frac{c}{d}}\right)^2}\] the polynomial $(x-z)(x-z^*)(x-w)(x-w^*)$ will do the job. Working it out with Mathematica, because that’s already above what I can do by hand, we get
\[x^4 - 4\frac{a}{b}x^3 + \left(2+4\frac{a^2}{b^2}-4\frac{c}{d}\right)x^2 - 4\frac{a}{b}x + 1.\] At this point it’s better to just look up the cyclotomic polynomials for $n=5,8,10,12$, which as we said before are the only ones with degree 4. They are
\begin{align*}
\Phi_5(x) &= x^4 + x^3 + x^2 + x +1 \\
\Phi_8(x) &= x^4 + 1 \\
\Phi_{10}(x) &= x^4 - x^3 + x^2 - x + 1 \\
\Phi_{12}(x) &= x^4 - x^2 + 1,
\end{align*}which is not that bad. $\Phi_8(x)$ and $\Phi_{12}(x)$ are right out because they require $a=0$, which we analysed before. To make our minimal polynomial equal to $\Phi_5(x)$ we need $a=-1$ and $b=4$, which tells us that $c/d = 5/16$. We have a match! It implies that the cosines of the angles $2\pi/5, 4\pi/5, 6\pi/5, 8\pi/5$ must be $-\frac{1}{4}+\sqrt{\frac{5}{16}}$ or $-\frac{1}{4}-\sqrt{\frac{5}{16}}$. A similar matching occurs with $\Phi_{10}(x)$; the only difference is that $a=1$ instead of $-1$. It implies that the cosines of the angles $\pi/5, 3\pi/5, 7\pi/5, 9\pi/5$ must be $\frac{1}{4}+\sqrt{\frac{5}{16}}$ or $\frac{1}{4}-\sqrt{\frac{5}{16}}$. And that’s it! There can be no more.
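If you don’t trust my Mathematica, sympy can check the key pieces in a few lines (this is just a verification sketch, not how the result was found):

```python
from sympy import (Symbol, cos, pi, sqrt, Rational, simplify,
                   cyclotomic_poly, minimal_polynomial)

x = Symbol('x')

# The two degree-4 cyclotomic polynomials that matched
print(cyclotomic_poly(5, x))    # x**4 + x**3 + x**2 + x + 1
print(cyclotomic_poly(10, x))   # x**4 - x**3 + x**2 - x + 1

# The cosines we claimed (both differences simplify to 0)
print(simplify(cos(2 * pi / 5) - (-Rational(1, 4) + sqrt(Rational(5, 16)))))
print(simplify(cos(pi / 5) - (Rational(1, 4) + sqrt(Rational(5, 16)))))

# And the minimal polynomial of cos(2*pi/5), for good measure
print(minimal_polynomial(cos(2 * pi / 5), x))   # 4*x**2 + 2*x - 1
```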

To summarize, we found out that the only possible beautiful cosines of beautiful angles are $0, \pm1, \pm\frac{1}{2},\pm\frac{1}{\sqrt{2}},\pm\frac{\sqrt{3}}{2},\frac{1\pm\sqrt{5}}{4},\frac{-1\pm\sqrt{5}}{4}$. And the beautiful angles with these cosines are all the integer multiples of $\pi,\pi/2,\pi/3,\pi/4,\pi/5,\pi/6$. But be careful, this holds really only for cosines, the sines of integer multiples of $\pi/5$ are never beautiful.

Funnily enough, these are precisely the angles and cosines we learn in school, except for $\pi/5$. Usually we only learn a couple of examples that are good enough, and leave behind an incredible wealth of interesting cases. But here no, our teachers taught us all the angles and cosines that were worth knowing! $\pi/5$ is an understandable exception: it is not as beautiful as the others, and we should anyway demand both the sines and the cosines to be beautiful.


Implementing impossible functions via tomography

A long time ago I had a conversation with Daniel Nagaj about the following problem: given access to an unknown unitary gate $U$, is it possible to build the gate control-$U$ out of it? Nagaj argued that it was impossible, because $U$ and $e^{i\alpha}U$ are physically equivalent for any phase $\alpha$, but control-$U$ and control-$e^{i\alpha}U$ are physically distinct. This is true, but it felt too cheap to me: the global phase $\alpha$ is not even defined, so of course it is not possible to make it appear in the control. Moreover, why would anyone want to have this ghostly phase in the controlled unitary? The more meaningful question is whether it is possible to implement control-$e^{i\phi(U)}U$ for any phase $\phi(U)$.

A quick search through the literature revealed that nobody had answered this question, so I proved it was impossible and wrote a short and simple paper about it. There’s also the plot twist that implementing control-$U$ is often easy in practice, as any realistic implementation of an unknown unitary will have some extra information that makes it possible, but that’s another story.

In any case, at that time I believed that if one is given a large number of copies of $U$, one could just do process tomography on it, and thus easily implement control-$e^{i\phi(U)}U$. Luckily I didn’t write that down anywhere, because a couple of weeks ago Gavorová et al. proved that it is not possible to implement control-$e^{i\phi(U)}U$ for any number of copies of $U$, so in particular even tomography cannot do it.

Now how could this possibly be true? One can always do tomography, right? And if you did tomography, you can implement any function you want of the thing you tomographed, right? It turns out it is not so simple. If you successfully tomograph $U$, what you end up with is a unitary $U'$ such that $\|\mathfrak{C}(U') - \mathfrak{C}(U)\| < \epsilon$, where $\mathfrak{C}(U)$ is the Choi-Jamiołkowski representation of $U$. This complication is unfortunately necessary, as $U$ is not physically observable, only $\mathfrak{C}(U)$ is, for the very reason that $\mathfrak{C}(e^{i\alpha}U)$ does not depend on the global phase $\alpha$. It is just not true that $\|U'-U\| < \epsilon$, because we would need to tomograph the global phase to make that happen.

Ok, great, we have tomography, how about the function then? We want that $\|f\big(\mathfrak{C}(U')\big)-f\big(\mathfrak{C}(U)\big)\| < \delta$, but this only follows from $\|\mathfrak{C}(U') - \mathfrak{C}(U)\| < \epsilon$ if $f$ is continuous! If $f$ is not continuous then by definition there will exist a pair $U,U'$ such that $\mathfrak{C}(U'),\mathfrak{C}(U)$ are arbitrarily close but $f\big(\mathfrak{C}(U')\big),f\big(\mathfrak{C}(U)\big)$ are not. And Gavorová et al. showed precisely that if $f$ implements the controlled unitary it must be discontinuous.

The source of the discontinuity is not really the controlling, but just the killing off of the global phase, so the seemingly simpler task of producing a unitary that does not depend on the global phase is equally impossible.

To be more precise, let $f$ be a function that takes $\mathfrak{C}(U)$ to a unitary $V$ such that $\mathfrak{C}(U) = \mathfrak{C}(V)$. Or equivalently, let $g$ be a function such that $g(U) = g(e^{i\alpha}U)$ for all $\alpha$ and $\mathfrak{C}\big(g(U)\big) = \mathfrak{C}(U)$. Then it is impossible to tomograph $U$ and produce $U'$ such that $\|f\big(\mathfrak{C}(U')\big)-f\big(\mathfrak{C}(U)\big)\| < \delta$, or that $\|g(U')-g(U)\| < \delta$.

To see why, let’s consider a specific example of such a function, $g(U) = \frac{u^*}{|u|}U$, where $u$ is the first nonzero element of the first column of $U$. Let
\[U = \begin{pmatrix} \sqrt{p} & -\sqrt{1-p}\ e^{i\beta} \\ \sqrt{1-p}\ e^{i\alpha} & \sqrt{p}\ e^{i(\alpha+\beta)} \end{pmatrix} \] be an arbitrary unitary of dimension $2$ with a global phase already removed, so that for $p > 0$ we have $g(U) = U$, and for $p=0$ we have
\[g(U) = \begin{pmatrix} 0 & -e^{i(\beta-\alpha)} \\ 1 & 0 \end{pmatrix}. \]Suppose that $p >0$, $\alpha=\pi$, and $\beta=0$, so that
\[U = \begin{pmatrix} \sqrt{p} & -\sqrt{1-p} \\ -\sqrt{1-p} & -\sqrt{p}\end{pmatrix}. \] Now if $p$ is small enough, this is very close to
\[U' = \begin{pmatrix} 0 & -1 \\ -1 & 0\end{pmatrix}, \] so tomography might very well return us that instead. The problem is that
\[g(U') = \begin{pmatrix} 0 & 1 \\ 1 & 0\end{pmatrix}, \] which is as far from $g(U)$ as it gets.
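Numerically the discontinuity looks like this; a small NumPy sketch, with $g$ implemented using the first numerically nonzero entry of the first column:

```python
import numpy as np

def g(U, tol=1e-12):
    """Fix the global phase using the first (numerically) nonzero entry
    of the first column, as in the example above."""
    col = U[:, 0]
    u = col[np.abs(col) > tol][0]
    return (u.conjugate() / abs(u)) * U

p = 1e-8
U = np.array([[np.sqrt(p), -np.sqrt(1 - p)],
              [-np.sqrt(1 - p), -np.sqrt(p)]])
U_prime = np.array([[0.0, -1.0],
                    [-1.0, 0.0]])

print(np.linalg.norm(U - U_prime))        # ~1e-4: the two unitaries are very close
print(np.linalg.norm(g(U) - g(U_prime)))  # ~2.8: the phase-fixed versions are not
```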

You might object that the probability of obtaining such a $U'$ is extremely small; it is given roughly by the probability of measuring the state $U|0\rangle$ in the computational basis and always obtaining outcome 1, which is $(1-p)^n$. But that is way too high for small $p$! In particular, for fixed $n$ there is always a $p$ such that this probability is higher than any failure probability $f$ that we accept.

This breaks the fundamental promise of tomography: that for any error tolerance $\epsilon > 0$ and failure probability $f > 0$ you just need to measure $n$ copies of $U$ in order to obtain an estimate $U'$ that with probability $1-f$ is $\epsilon$-close to $U$. Sure, for any $U$ there will be an $n$ such that tomography works, but this doesn't help us, since we don't know what $U$ is before we have tomographed it!

Ok, ok, so tomography cannot converge to $g(U)$; it does converge instead to $g(U)$ modulo some random phase. Doesn’t seem so dramatic, does it? Since you didn’t care about the global phase to start with, why should you worry about whether it is random or a function of $U$? Well, yes, if you just do tomography it doesn’t matter, but perhaps you have some other, efficient protocol that does the control modulo a random phase. Now in every iteration of the protocol you might get a different phase, and if you just ignore this you’re going to lose coherence.


Explicitly local quantum mechanics

I’ve just attended a nice online talk by Charles Bédard about his somewhat recent paper, where he studies the explicitly local model of quantum mechanics developed by Deutsch and Hayden, with a twist by Raymond-Robichaud. I think it’s a neat idea encumbered by horrible notation, so I’ll try to present it in a more intelligible way here.

First of all, what is the problem? Quantum mechanics is local. Very local in fact, if you look at QFT. It’s Lorentz covariant, by construction. Non-relativistic quantum mechanics is not very local, as it is still non-relativistic, but still much more local than people usually think: the only outrageously nonlocal thing about it is the collapse of the wavefunction, but everybody knows that it is not real. Even Bell’s theorem doesn’t imply nonlocality, as long as you don’t pretend there is a wavefunction collapse happening.

Nevertheless, quantum states seem rather nonlocal. If Alice and Bob share an entangled state, and Alice acts locally on it, the quantum state changes in a seemingly global way, oblivious to the locality of her action. Furthermore, you can’t separate it into Alice’s part of the state and Bob’s part of the state; it only exists as an indivisible whole. Sure, you can define reduced density matrices for Alice and for Bob, but in this way you lose a lot of information: the global quantum state is not a function of the reduced density matrices.

What Deutsch and Hayden showed is that it doesn’t need to be like this. It is possible to define a kind of quantum state that you can separate into Alice’s part and Bob’s part, in such a way that the global quantum state is a function of both parts, and they both evolve in a perfectly local way: whatever Alice does cannot change Bob’s part and vice-versa. I’ll call them ontic states, because perfectly local states seem perfectly real.

The key idea is to define an ontic state based on the dynamics of quantum mechanics, since they are local: instead of focusing on a quantum state $\ket{\psi}$ we focus on the unitary $U$ such that
\[\ket{\psi} = U\ket{0},\]where $\ket{0}$ is a standard fixed state, like how in quantum circuits we usually assume that all qubits are initialized in the $\ket{0}$ state. The ontic state of the global system is then just the Choi-Jamiołkowski representation of $U$:
\[ \lambda(U) := \sum_{ij} \ket{i}\bra{j} \otimes U \ket{i}\bra{j} U^\dagger.\]So far so boring. Things get interesting when we represent the ontic states of the parts of the quantum system. First we rewrite the global ontic state explicitly as a composite system:
\[ \lambda^{AB}(U) := \sum_{ijkl} \ket{i}\bra{j} \otimes \ket{k}\bra{l} \otimes U \big( \ket{i}\bra{j} \otimes \ket{k}\bra{l} \big)U^\dagger,\] and we name the tensor factors as $A_I,B_I,A_O,B_O$, in that order. The ontic states of Alice’s and Bob’s parts are then defined as
\[ \lambda^{A}(U) := \tr_{B_O} \lambda^{AB}(U)\quad\text{and}\quad\lambda^{B}(U) := \tr_{A_O} \lambda^{AB}(U).\]If you use the identity $(\id \otimes U)\ket{\phi^+} = (U^T \otimes \id) \ket{\phi^+}$ they get the convenient representation
\[ \lambda^{A}(U) = \sum_{ij} {U}^T \big( \ket{i}\bra{j} \otimes \id \big){U}^* \otimes \ket{i}\bra{j},\]with the analogous formula holding for Bob.

Now we deliver the goods. First we show that the ontic states of the parts do evolve locally: if Bob applies a unitary $V$ to his part of the system, the global unitary is now $(\id \otimes V) U$. Alice’s ontic state is
\begin{align*}
\lambda^{A}\big((\id \otimes V)U\big) &= \sum_{ij} {U}^T(\id \otimes V^T) \big( \ket{i}\bra{j} \otimes \id \big)(\id\otimes V^*){U}^* \otimes \ket{i}\bra{j} \\
&= \sum_{ij} {U}^T \big( \ket{i}\bra{j} \otimes \id \big){U}^* \otimes \ket{i}\bra{j},
\end{align*} so Bob cannot change it, whatever he does.
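Here’s a small numerical check of this locality property, as a NumPy sketch of my own (computing $\lambda^A$ directly as the partial trace over $B_O$):

```python
import numpy as np

dA, dB = 2, 2
rng = np.random.default_rng(1)

def haar_unitary(d):
    """Haar-random unitary from the QR decomposition of a Gaussian matrix."""
    z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def lambda_A(U):
    """Alice's ontic state lambda^A(U) = tr_{B_O} lambda^{AB}(U),
    as an operator on A_I x B_I x A_O."""
    u = U.reshape(dA, dB, dA, dB)   # output indices (A_O, B_O), input indices (A_I, B_I)
    T = np.einsum('acik,bcjl->ikajlb', u, u.conj())
    d = dA * dB * dA
    return T.reshape(d, d)

U = haar_unitary(dA * dB)
V = haar_unitary(dB)    # whatever Bob does locally...
print(np.allclose(lambda_A(U), lambda_A(np.kron(np.eye(dA), V) @ U)))   # True

W = haar_unitary(dA)    # ...while Alice's own actions do change her ontic state
print(np.allclose(lambda_A(U), lambda_A(np.kron(W, np.eye(dB)) @ U)))   # False
```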

Next we show that the ontic state of the global system is a function of the ontic states of the parts. In fact it’s just the product of them, with some strategically placed identities:
\[\lambda^{AB}(U) = \big[\lambda^{A}(U)\otimes \id_{B_O}\big]\big[\lambda^{B}(U)\otimes \id_{A_O}\big].\] That’s it. A demonstration in a couple of lines of something that was widely regarded as impossible. Neat, no?

We should also remark that these ontic states do give you back the usual quantum states. The global quantum state is
\[ \rho^{AB} := \bra{0}_{A_IB_I} \lambda^{AB}(U) \ket{0}_{A_IB_I} = U\ket{0}\bra{0}U^\dagger, \] and the reduced density matrices are given by the analogous formula:
\[ \rho^{A} := \bra{0}_{A_IB_I} \lambda^{A}(U) \ket{0}_{A_IB_I} = \tr_{B_O}\big(U\ket{0}\bra{0}U^\dagger\big).\]

One might wonder if it is possible to do a similar trick using just the quantum state $\ket{\psi} = U\ket{0}$ instead of the unitary $U$ that produces it. As Deutsch and Hayden argue (and Raymond-Robichaud proves formally), the answer is no. The crucial problem is that $(X \otimes \id)\ket{\phi^+} = (\id \otimes X)\ket{\phi^+}$. That is, Alice can act on a quantum state in a completely local way, but you can’t tell from looking at the resulting state whether it was changed by Alice or by Bob. This never happens if we look at $U$ instead: if $(S\otimes \id)U = (\id \otimes T)U$ then $S$ and $T$ must both be just the identity with a phase.

With the introduction done, we can speak about Bédard’s talk itself. He was interested in the number of parameters that are necessary to describe the ontic states. He showed that the global ontic state of $n$ quantum systems of dimension $d$ requires $d^{2n}-1$ parameters. This might be obvious from the Choi-Jamiołkowski representation, as it is just the number of parameters of the global unitary $U$ (the global phase doesn’t count), but it was not obvious from the Deutsch-Hayden notation he was working with. More interestingly, he showed that the ontic state of a single quantum system needs $(1-d^{-2})d^{2n}$ parameters. Proving this is difficult in either notation, but intuitively we can expect this number to be right, since the ontic state of a single system does not depend on unitaries that act on the other systems. Since these have $d^{2n-2}-1$ parameters, we subtract this from the number of parameters of the global unitary and voilà!

Now, the dimension of the ontic state of a single system grows exponentially with the number of systems. This might seem strange, but Bédard had a good argument about why this is actually necessary: the global quantum state of $n$ systems of dimension $d$ already needs $2d^n-2$ parameters. Since we want the local ontic states to recover at least the global quantum state, it follows that each must have at least $\sim d^n/n$ parameters, if we assume they are uniformly spread.

There was a lively discussion after the talk about Wallace and Timpson’s criticism of the Deutsch-Hayden model. They notice that for any unitary W that acts on the whole Hilbert space except $\ket{0}$, the unitary $U(1\oplus W)$ will give rise to the same quantum states, and therefore to the same observations, but in general different ontic states. They argue that this $W$ is akin to a gauge freedom, and we should identify all these different ontic states, lest we suffer from indeterminism and (ontic) state underdetermination. If we do that, however, we end up back with just regular quantum states and their nonlocality, making the Deutsch-Hayden model completely pointless.

Wallace and Timpson missed a very important point, however. Doing quantum state tomography is not the only way to determine $U$; that would be a hopelessly positivist point of view. No, determining $U$ is determining what the dynamics of our theory are. Doing this is what science is all about, and we use all the tools we can to do that, most importantly assuming that the fundamental dynamics do not change with time, and that we can reproduce experiments. Once we know the fundamental dynamics of a (closed) system, we know what $U$ is; there is no underdetermination. The fact that $U(1\oplus W)$ would also give rise to the same observations is not relevant, it’s just a fun fact. The transformation from $U$ to $U(1\oplus W)$ is not a gauge transformation, it is a transformation from the real dynamics to some incorrect dynamics.
