## The horrifying world of confidence intervals

We often see experimental results reported with some “error bars”, such as saying that the mass of the Higgs boson is $125.10 \pm 0.14\, \mathrm{GeV/c^2}$. What do these error bars means, though? I asked some people what they thought it was, and the usual answer was that the true mass was inside those error bars with high probability. A very reasonable thing to expect, but it turns out that this is not true. Usually these error bars represent a frequentist confidence interval, which has a very different definition: it says that if you repeat the experiment many times, a high proportion of the confidence intervals you generate will contain the true value.

Fair enough, one can define things like this, but I don’t care about hypothetical confidence intervals of experiments I didn’t do. Can’t we have error bars that represent what we care about, the probability that the true mass is inside that range? Of course we can, that is a Bayesian credible interval. Confusingly enough, credible intervals will coincide with confidence intervals in most cases of interest, even though they answer a different question and can be completely different in more exotic problems.

Let’s focus then on the Bayesian case: is the intuitive answer people gave correct then? Yes, it is, but it doesn’t help us define what the credible interval is, as there will be infinitely many intervals that contain the true value with probability (e.g.) 0.95. How do we pick one? A nice solution would be to demand the credible interval to be symmetric around the estimate, so that we could have the usual $a\pm b$ result. But think about the most common case of parameter estimation: we want to predict the vote share that some politician will get in an election. If the poor candidate was estimated to get 2% of the votes, we can’t have the error bars to be $\pm$4%. Even if we could do that, there’s no reason why it should be symmetric: it’s perfectly possible that a 3% vote share is more probable than a 1% vote share.

A workable, if more cumbersome, definition is the Highest Posterior Region: it is a region where all points inside it have a posterior distribution larger than the points outside it. It is well-defined, except for some pathological cases we don’t care about, and is also the smallest possible region containing the true value with a given confidence. Great, no? What could go wrong with that?

Well, for starters it’s a region, not an interval. Think of a posterior distribution that has two peaks: the highest posterior region will be two intervals, each centred around one of the peaks. It’s not beautiful, but it’s not really a problem, the credible region is accurately summarizing your posterior. Your real problem is having a posterior with two peaks. How did that even happen!?

But this shows a more serious issue: the expectation value of a two-peaked distribution might very well be in the value between the peaks, and this will be almost certainly outside the highest posterior region. Can this happen with a more well-behaved posterior, that has a single peak? It turns out it can. Consider the probability density
$p(x) = (\beta-1)x^{-\beta},$ defined for $x \ge 1$ and $\beta > 2$. To calculate the highest posterior region for some confidence $1-\alpha$, note that $p(x)$ is monotonically decreasing, so we just need to find $\gamma$ such that
$\int_1^\gamma \mathrm{d}x\, (\beta-1)x^{-\beta} = 1-\alpha.$Solving that we get $\gamma = \frac1{\sqrt[\beta-1]{\alpha}}$. As for our estimate of the (fictitious) parameter we take the mean of $p(x)$, which is $\frac{\beta-1}{\beta-2}$. For the estimate to be outside the credible interval we need than that
$\frac{\beta-1}{\beta-2} > \frac1{\sqrt[\beta-1]{\alpha}},$which is a nightmare to solve exactly, but easy enough if we realize that the mean diverges as $\beta$ gets close to 2, whereas the upper boundary of the credible interval grows to a finite value, $1/\alpha$. If we take then choose $\beta$ such that the mean is $1/\alpha$ it will always be outside the credible interval!

A possible answer is “deal with it, life sucks. I mean, there’s a major war going on in Ukraine, estimates lying outside the credible interval is the least of our problems”. Fair enough, but maybe this means we chose our estimate wrong? If we take our estimate as the mode of the posterior then by definition it will always be inside the highest posterior region. The problem is there’s no good justification for using the mode as the estimate. The mean can be justified as the estimate that minimizes the mean squared error, which is quite nice, but I know of no similar justification for the mode. Also, the mode is rather pathological: if our posterior again has two peaks, but one of them is very tall and has little probability mass, the mode will be there but will be a terrible estimate.

A better answer is that sure, life sucks, we have to deal with it, but note that the probability distribution $(\beta-1)x^{-\beta}$ is very pathological. It will not arise as a posterior density in any real inference problem. That’s fine, it just won’t help against Putin. Slava Ukraini!

Posted in Uncategorised | 5 Comments

## What happened to QIP?

QIP is one of the most famous and prestigious conference series in quantum information. It has been going on since 1998, and getting a paper accepted by them is considered very good for your career. I have often attended, and can confirm, it’s a good conference. This year I didn’t submit a paper because it will be in the United States and getting a visa is a huge pain in the ass.

Some friends did, though, and were quite shocked by the registration fees they are charging. The cheapest student fee available is 475 dollars! If you’re not a student you have to pay at least 575 dollars, and if you’re late the fee goes up to 850 dollars! This is for attending in person. If you’re a student attending online it’s free, fair enough, but if you’re not a student attending online costs 250 dollars!

In contrast, last year’s QIP was in München, completely online, and completely for free. What happened? Did QIP suddenly become a for-profit organization? They even list some deep-pocketed sponsors like Amazon, IBM, and Microsoft. Where is all this money going?

Of course an in-person conference must be more expensive than an online conference, but this is completely out of the ordinary. I’m organizing an in-person conference now, the Tsirelson Memorial Workshop, and we are not even charging a conference fee, just 50€ for dinner. We are counting with the generous support of IQOQI, but QIP apparently has much richer sponsors. Our conference is also much smaller, but the price should be a concave function of the number of participants, not convex!

EDIT: A friend of mine was at the budget talk of QIP and reported me the details. The answer is clear: QIP just doesn’t give a shit about wasting the participants’ money. The total income was 790 k\$, with 460 k\$ coming form the sponsors, and 330 k\$from registration fees. The total amount of expenses was 750 k\$, with the breakdown being 220 k\$for “Freeman”, 220 k\$ for “Convention center”, 150 k\$for the conference dinner with the rump session, 80 k\$ for “Virtual platform”, and 80 k\$for other stuff. Now this “Freeman” is a logistics company, and maybe logistics do cost a lot in California, so I’m not going to criticize that. But 220 k\$ for using the convention centre for one week? That’s insane. They booked the “Pasadena Civic Auditorium”, which is a really nice place, but come on. Couldn’t Caltech offer the Beckman Auditorium for a reasonable price? It is also gigantic and really beautiful. I paid roughly 2000€ for booking the conference venue for 80 people for one week, including the coffee break. QIP had roughly 800 participants, so 20 k\$would be reasonable for the costs, with 200 k\$ being just profit. And 150 k\$for the conference dinner? So roughly 187 \$ per person? That’s a seriously fancy dinner. Now 80 k\$for the “Virtual platform” is just stupid. They could have done like every conference in the world and used Discord, that costs peanuts. But no, they insisted in paying a huge amount of money for developing their own platform, which the participants tell me was a piece of shit. Well done. Another cost that was not detailed in the breakdown was renting the auditorium of the Hilton Hotel for streaming the Zoom talks. Sounds expensive, and a bit insane. I can afford it, I have plenty of grant money for travelling to conferences. But I’m not going to set it on fire like this. If the next QIP doesn’t come back to reality I’m not going to attend. Posted in Uncategorised | Comments Off on What happened to QIP? ## A satisfactory derivation of the Born rule Earlier this year I wrote a post complaining about all existing derivations of the Born rule. A couple of months later yet another derivation, this time by Short, appeared on the arXiv. Purely out of professional duty I went ahead and read the paper. To my surprise, it’s actually pretty nice. The axioms are clean and well-motivated, and it does make a connection with relative frequencies. I would even say it’s the best derivation so far. So, how does it go? Following Everett, Short wants to define a measure on the set of worlds, informally speaking a way of counting worlds. From that you can do everything: talk about probability, derive the law of large numbers, and so on. Let’s say your world branches into a superposition of infinitely many worlds1, indexed by the natural numbers$\mathbb{N}$: $\ket{\psi} = \sum_{i\in \mathbb{N}} \alpha_i \ket{\psi_i}.$ Then the probability of being in the world$i$is understood as the fraction of your future selves in the world$i$, the relative measure $p_{\ket{\psi}}(i) = \frac{\mu_{\ket{\psi}}(i)}{\mu_{\ket{\psi}}(\mathbb{N})}.$The law of large numbers states that most of your future selves see frequencies close to the true probability. Mathematically, it is a statement like $p_{\ket{\psi}^{\otimes n}}\left(|f_i\,-\,p_{\ket{\psi}}(i)| > \varepsilon \right) \le 2e^{-2n\varepsilon^2},$which you can prove or disprove once you have the measure2. Now, to the axioms. Besides the ones defining what a measure is, Short assumes3 that if$\alpha_i = 0$then$\mu_{\ket{\psi}}(i) = 0$, and that if a unitary$U$acts non-trivially only on a subset$S$of the worlds, then$\mu_{U\ket{\psi}}(S) = \mu_{\ket{\psi}}(S)$. The first axiom is hopefully uncontroversial, but the second one demands explanation: it means that if you mix around some subset of worlds, you just mix around their measures, but do not change their total measure. It corresponds to the experimental practice of assuming that you can always coarse-grain or fine-grain your measurements without changing their probabilities. I think it’s fine, I even used it in my own derivation of the Born rule. It is very powerful, though. It immediately implies that the total measure of any quantum state only depends on its 2-norm. To see that, consider the subset$S$to be the entire set$\mathbb{N}$; the second axiom implies that you can apply any unitary to your quantum state$\ket{\psi}$without changing its measure. Applying then$U = \ket{0}\bra{\psi}/\sqrt{\langle \psi|\psi\rangle} + \ldots$we take$\ket{\psi}$to$\sqrt{\langle \psi|\psi\rangle}\ket{0}$, so for any quantum state$\mu_{\ket{\psi}}(\mathbb{N}) = \mu_{\sqrt{\langle \psi|\psi\rangle}\ket{0}}(\mathbb{N})$. It also implies that if a unitary$U$acts trivially on a subset$S$then we also have that$\mu_{U\ket{\psi}}(S) = \mu_{\ket{\psi}}(S)$, because$U$will act non-trivially only on the complement of$S$, and $\mu_{U\ket{\psi}}(S) + \mu_{U\ket{\psi}}(\mathbb{N} \setminus S) = \mu_{U\ket{\psi}}(\mathbb{N}) = \mu_{\ket{\psi}}(\mathbb{N}) = \mu_{\ket{\psi}}(S) + \mu_{\ket{\psi}}(\mathbb{N} \setminus S).$ We can then see we don’t need to consider complex amplitudes. Consider the unitary$U_{i_0}$such that$U_{i_0}\ket{i_0} = \frac{\alpha_{i_0}^*}{|\alpha_{i_0}|}\ket{i_0}$for some$i_0$, and acts as identity on the other$\ket{i}$. The second axiom implies that it doesn’t change the measure of$i_0$. Repeating the argument for all$i_0$, we map$\ket{\psi}$to$\sum_i |\alpha_i|\ket{i}$without changing any measures. Now we shall see that to compute the measure of any world$i$in any state$\ket{\psi}$, that is,$\mu_{\ket{\psi}}(i)$, it is enough to compute$\mu_{\alpha_i\ket{0}+\beta\ket{1}}(0)$for some$\beta$. Consider the unitary $U = \Pi_i + \frac{\ket{i+1}\bra{\psi}(\id-\Pi_i)}{\sqrt{\bra{\psi}(\id-\Pi_i)\ket{\psi}}} + \ldots,$where$\Pi_i$is the projector onto world$i$. It maps any state$\ket{\psi}$into $U\ket{\psi} = \alpha_i \ket{i} + \beta\ket{i+1},$where$\beta=\sqrt{\bra{\psi}(\id-\Pi_i)\ket{\psi}}$, and we have that$\mu_{\ket{\psi}}(i) = \mu_{U\ket{\psi}}(i)$. Now consider the unitary $V = \ket{0}\bra{i} + \ket{i}\bra{0} + \ldots$ It takes$U\ket{\psi}$to $VU\ket{\psi} = \alpha_i\ket{0} + \beta\ket{i+1}$ It acts trivially on$i+1$, so$\mu_{VU\ket{\psi}}(i+1) = \mu_{U\ket{\psi}}(i+1)$. Since the total measure of$U\ket{\psi}$and$VU\ket{\psi}$are equal, we have that $\mu_{VU\ket{\psi}}(0) + \mu_{VU\ket{\psi}}(i+1) = \mu_{U\ket{\psi}}(i) + \mu_{U\ket{\psi}}(i+1),$ so$\mu_{VU\ket{\psi}}(0) = \mu_{U\ket{\psi}}(i)$. Doing the same trick again to map$i+1$to$1$we reduce the state to$\alpha_i\ket{0}+\beta\ket{1}$as we wanted. This reduction does all the heavy lifting. It implies in particular that if two worlds have the same amplitude, they must have the same measure, so if we have for example the state$\ket{\psi} = \alpha\ket{0} + \alpha\ket{1}$, then$\mu_{\ket{\psi}}(0) = \mu_{\ket{\psi}}(1)$. Since$\mu_{\ket{\psi}}(\mathbb{N}) = \mu_{\ket{\psi}}(0) + \mu_{\ket{\psi}}(1)$, we have that $p_{\ket{\psi}}(0) = p_{\ket{\psi}}(1) = \frac12.$A more interesting case is the state$\ket{\psi} = \alpha\sqrt{p}\ket{0} + \alpha\sqrt{q}\ket{1}$for positive integers$p,q$. We apply to it the unitary$U$such that $U\ket{0} = \frac1{\sqrt p}\sum_{i=0}^{p-1} \ket{i}\quad\text{and}\quad U\ket{1} = \frac1{\sqrt q}\sum_{i=p}^{p+q-1} \ket{i},$taking$\ket{\psi}$to$\alpha\sum_{i=0}^{p+q-1}\ket{i}$. Now all amplitudes are equal, and therefore all the measures are equal, call it$x$. Then the total measure is$(p+q)x$, the measure of the original world 0 is$px$, and the measure of the original world 1 is$qx$. Therefore the probability of the original world 0 is $p_{\ket{\psi}}(0) = \frac{p}{p+q}.$ Since the argument is valid for all$\alpha$, we have proven the Born rule for all worlds where the ratio between the amplitudes is the square root of a positive rational. Since such amplitudes are dense in the set of amplitudes, we only need a continuity argument to get the complete Born rule. Normally I don’t care about the continuity argument, as one usually needs a separate postulate to get it, and the continuum is just a convenient fiction anyway. Here the situation is a bit more interesting, because the axioms we have are already strong enough to get it, there’s no need for an extra continuity axiom. Unfortunately I couldn’t find an elegant proof, so I’ll refer you to the original paper for that. To conclude, I’m still skeptical about this proving the Born rule business, in the sense of replacing it with a better set of axioms to be included in the axioms of quantum mechanics. I don’t think we’ll ever get something better than simply postulating the measure of worlds to be$\mu(\ket{\psi}) = \langle\psi|\psi\rangle$. It’s a similar situation with the other axioms: there are tons of good arguments why one should use complex numbers, or tensor products, or unitary evolution. But when it comes down to writing down the axioms of quantum mechanics, nobody uses the arguments, they write the axioms directly. If what you want is an argument why we should use the Born rule, though, then this is a pretty good one. Posted in Uncategorised | 10 Comments ## Violating the Tsirelson bound I started writing the previous post as an introduction to another subject, but it got too long and I decided to publish it separately. What I actually wanted to talk about is the following question: what happens if you are doing device-independent quantum key distribution (DIQKD) experimentally, and you violate the Tsirelson bound? I don’t meant in the sense of violating quantum mechanics, but doing it in the way quantum mechanics allows. If you do the experiment perfectly, then the probability of winning each round of the CHSH game is exactly the Tsirelson bound$\omega = \frac{2+\sqrt{2}}{4}$. Then the probability of getting a number of victories$v$out of$n$rounds of the CHSH game such that $\frac{v}{n} > \frac{2+\sqrt{2}}{4}$ is given by $\sum_{v = \lceil n \omega \rceil}^n \binom{n}{v} \omega^v(1-\omega)^{n-v}.$ This is not small at all, it is equal to$\omega$for$n=1$, and goes to 1/2 for large$n$. So yeah, it’s perfectly possible to violate the Tsirelson bound, and it is not a matter of experimental error or doing too few rounds4. On the contrary, experimental error is precisely what makes it very unlikely to win the CHSH game too often. This is very unsatisfactory, though, we are relying on experimental error to sweep the problem under the rug. Clearly DIQKD must also work in the ideal case. Even if you only care about the realistic case, there’s a different scenario where this matters: as proposed by Brown and Fawzi$^{\otimes 2}$, one can use an estimate of the whole probability distribution to do DIQKD instead of only the probability of winning the CHSH game. This makes it harder for the eavesdropper to cheat, and thus gives us better key rates. The problem is that we’re now dealing with a high-dimensional object instead of a one-dimensional parameter, and we need the estimates of all the parameters to land in the quantum-allowed region. The probability that at least one falls outside is appreciable. It’s hard to give a precise statement about this, because it will depend on the quantum state and the measurements you are doing, but the fact of the matter is that experimentalists routinely get estimates outside of the quantum-allowed region2. For simplicity, though, we’ll focus on the one-dimensional case. Why wouldn’t it work, though? What is the problem with violating the Tsirelson bound? The idea of DIQKD is that Alice and Bob play the CHSH game, calculate the frequency of winning$\frac{v}n$, and do an optimization over all quantum states with winning probability equal to$\frac{v}n$, picking up the worst, that is, the one that gives Eve the most information about the key they’re trying to generate. Well, if$\frac{v}n > \frac{2+\sqrt{2}}{4}$there’s just no quantum state with this winning probability, so you can’t figure out how much information Eve can have in this way. What can we do then? One obvious solution is to say that the winning probability$p$is equal to$\frac{2+\sqrt{2}}{4}$. After all, this is the closest we can get to the frequency$\frac{v}n$while staying in the range allowed by quantum mechanics. That’s not a good idea though. We would be assuming that Eve has no information whatsoever about Alice and Bob’s key, while it is perfectly possible that$p$is slightly smaller than$\frac{2+\sqrt{2}}{4}$, which would give her a bit of information. In fact, the probability that$p$is exactly$\frac{2+\sqrt{2}}{4}$is zero, just because$p$is a continuous parameter. It is very likely, on the other hand, that$p$is close to$\frac{2+\sqrt{2}}{4}$. This is what you should assume. And this is not even related to violating the Tsirelson bound. Even if you find that$\frac{v}n = 0.8$, it would be stupid to assume that$p=0.8$, as it’s almost certainly not. Assuming a flat prior over the quantum-allowed region, the probability density of$p$is given by $f(p|v,n) = \frac{p^v(1-p)^{n-v}}{\int_{1-\omega}^{\omega}\mathrm{d}q\, q^v(1-q)^{n-v}},$ for$p \in [1-\omega,\omega]$and zero otherwise. Which finally brings us to the DIQKD papers I mentioned in the previous post. How did they deal with this problem? It turns out, they did something completely different. They set some expected winning probability$p_0$and some tolerance$\delta$, and if the measured frequency$v/n$is at least$p_0-\delta$, they assume that the actual probability$p$is also at least$p_0-\delta$. I find that very strange, they are not using the measured frequency for anything other than this test, the key rate is calculate solely based on$p_0-\delta$. This is not wrong, I must emphasize, the probability that the winning probability is at least$p_0-\delta$given that the frequency is at least$p_0-\delta$is indeed very high, and they have a proper security proof. I just find it bizarre that they are discarding valuable information, using the measured frequency can give you a much better idea on what the actual winning probability is. For example, if the measured frequency is very close to$p_0-\delta$, then the probability that the winning probability is at least$p_0-\delta$is close to 1/2. Not as high as we’d like. On the other hand, if the measured frequency is much higher than$p_0-\delta$, the winning probability is likely much higher, and you’re needlessly lowering your key rate by being so pessimistic. Posted in Uncategorised | 6 Comments ## DIQKD is here! This has been a momentous week for quantum information science. The long-awaited for experimental demonstration of device-independent quantum key distribution (DIQKD) is finally here! And not one only demonstration, but three in a row. First, the Oxford experiment came out, which motivated the München and the Hefei experiments to get their data out quickly to make it clear they did it independently. To give a bit of context, for decades the community had been trying to do a loophole-free violation of a Bell inequality. To the perennial criticism that such an experiment was pointless, because there was no plausible physical model that exploited the loopholes in order to fake a violation, people often answered that a loophole-free Bell test was technologically relevant, as it was a pre-requisite for DIQKD.3 That was finally achieved in 2015, but DIQKD had to wait until now. It’s way harder, you need less noise, higher detection efficiency, and much more data in order to generate a secure key. Without further ado, let’s look at the experimental results, summarized in the following table.$\omega$is the probability with which they win the CHSH game, distance is the distance between Alice and Bob’s stations, and key rate is the key rate they achieved. Experiment$\omega$Distance Key rate Oxford 0.835 2 m 3.4 bits/s München 0.822 700 m 0.0008 bits/s Hefei 0.756 220 m 2.6 bits/s I’ve highlight the München and the Hefei key rates in red because they didn’t actually generate secret keys, but rather estimated that this is the rate they would achieve in the asymptotic limit of infinitely many rounds. This is not really a problem for the Hefei experiment, as they were performing millions of rounds per second, and could thus easily generate a key. I suspect they simply hadn’t done the key extraction yet, and rushed to get the paper out. For the München experiment, though, it is a real problem. They were doing roughly 44 rounds per hour. At this rate it would take years to gather enough data to generate a key. Why is there such a drastic difference between Hefei and München? It boils down to the experimental technique they used to get high enough detection efficiency. Hefei used the latest technology in photon detectors, superconducting nanowire single-photon detectors,2 which allowed them to reach 87% efficiency. München, on the other hand, used a completely different technique: they did the measurement on trapped atoms, which has efficiency of essentially 100%. The difficulty is to entangle the atoms. To do that you make the atoms emit photons, and do an entangled measurement on the photons, which in turns entangles the atoms via entanglement swapping. This succeeds with very small probability, and is what makes the rate so low. What about Oxford? Their experimental setup is essentially the same as München, so how did they get the rate orders of magnitude higher? Just look at the distance: in Oxford Alice and Bob were 2 metres apart, and in München 700 metres. The photon loss grows exponentially with distance, so this explains the difference very well. That’s cheating, though. If we are two metres apart we don’t need crypto, we just talk. One can see this decay with distance very well in the Hefei paper: they did three experiments, with a separation of 20, 80, and 220 metres, and key rates of 466, 107, and 2.6 bits/s. In the table I only put the data for 220 metres separation because that’s the only relevant one. It seems that the Hefei experiment is the clear winner then, as the only experiment achieving workable keyrates over workable distances. I won’t crown them just yet, though, because they haven’t done a standard DIQKD protocol, but added something called “random post-selection”, which should be explained in a forthcoming paper and in the forthcoming Supplemental Material. Yeah, when it appears I’ll be satisfied, but not before. EDIT: In the meanwhile the Hefei group did release the Supplemental Material and the paper explaining what they’re doing. It’s pretty halal. The idea is to use the full data for the Bell test as usual, as otherwise you’d open the detection loophole and compromise your security, but for the key generation only use the data where both photons have actually been detected. Which gets you much more key, as the data where one or both photons were lost is pretty much uncorrelated. There’s an interesting subtlety that they can’t simply discard all the data where a photon has been lost, because they only have one photodetector per side; Alice (or Bob) simply assigns outcome ‘0’ to the photons that came to this photodetector, and ‘1’ to the photons that didn’t arrive there. Now if there was no loss at all, the ‘1’ outcomes would simple correspond to the photons with the other measurement result. But since there is loss, they correspond to a mixture of the other measurement result and the photons that have been lost, and there is no way to distinguish them. Still, they found it’s advantageous to discard some of the data with outcome ‘1’, as this improves the correlations. Now they don’t have a full security proof for this new protocol with random post-selection, they only examined the simplified scenario where the source emits the same state in each round and the eavesdropper makes a measurement in each round independently. I suppose this is just a matter of time, though. Extending the security proof to general case is hard, but usually boils down to proving that the eavesdropper can’t do anything better than attacking each round independently. EDIT2: It turns out the Hefei experiment didn’t actually use a random setting for each round, as is necessary in DIQKD, but just did blocks of measurements with fixed settings. It’s straightforward to change the setup to use randomized settings, the standard method is to use a Pockels cell to change the setting electronically (rather than mechanically) at a very high rate. However, Pockels cells are nasty devices, which use a lot of power and even need active cooling, and are bound to increase the noise in the setup. They also cause bigger losses than regular waveplates. It’s hard to estimate how much, but it’s safe to assume that the keyrate of the Hefei experiment will go down when they do it. Posted in Uncategorised | 15 Comments ## Floating-point arithmetic and semidefinite programming Another day working with MATLAB, another nasty surprise. I was solving a SDP that was already working well, did a minor change, and suddenly it started taking minutes to reach a solution, when usually it took seconds. After a long investigation, I realized that the problem was that my input was not exactly Hermitian anymore. I had switched from writing manually the components of a matrix to defining the matrix as$A = \ket{\psi}\bra{\psi}$for a complex vector$\ket{\psi}$. Now how could that possibly be a problem? It is a very simple theorem to prove that the outer product of a vector with itself is always Hermitian. Well, not with MATLAB. I can’t figure out why, but probably MATLAB uses some smart algorithm for the outer product that ruins this property due to floating-point arithmetic. What I could determine is that if you compute the outer product the naïve way, then even with MATLAB the result will be exactly Hermitian. Also, with Python the output is exactly Hermitian. To solve this problem, one can simply redefine$A$as$(A+A^\dagger)/2$, which is very fast and gives an exactly Hermitian output. I don’t like doing that, though, as$(A+A^\dagger)/2$is always exactly Hermitian, even when$A$is not even close to Hermitian. If I fucked up the definition of$A$this will hide my mistake, and make debugging harder. Instead, what I did was write my own outer product function, computing it as$\ket{\psi}\otimes\bra{\phi}$. It is slower than whatever black magic MATLAB is doing, sure, but it is fast enough, and leaves my mistakes easily visible. It dawned on me that this subtlety was probably the source of many bugs and slowdowns in my codes over the years. I decided to go hunting, and find out the fascinating properties of MATLAB algebra that doesn’t preserve Hermiticity where it should. It turns out that if$A$and$B$are exactly Hermitian, then both$A+B$and$A \otimes B$are exactly Hermitian, as they should. The problem is really when you do matrix multiplication. Which shouldn’t produce further problems, right? After all,$AB$is in general not Hermitian, so we don’t have anything to worry about. Except, of course, that the Hilbert-Schmidt inner product$\operatorname{tr}(A^\dagger B)$is real for Hermitian$A,B$, and this is not true in MATLAB. Argh!$\operatorname{tr}(A^\dagger B)$appears very often in the objective of a SDP, and it really needs to be a real number, as you can’t minimize a complex function. It turns out that this is not MATLAB’s fault, it is a fundamental problem of floating-point arithmetic. An well-known but often-forgotten fact is the floating-point addition is not associative. Let$a=10^{20}$and$b=10^{-20}$. Then with floating point numbers we see that$b+(a-a) = b$but$(b+a)-a = 0$. Which is the issue with the Hilbert-Schmidt inner product: we get stuff like$z+\bar{z}+w+\bar{w}+\ldots$, but not in order.$z+\bar{z}$is of course real, and$w+\bar{w}$as well, but$z+(\bar{z}+w)+\bar{w}$? Nope, not in floating-point arithmetic. Here I don’t think a proper solution is possible. One can of course write$\Re[\operatorname{tr}(A^\dagger B)]$, but that will hide your mistakes when$A$and$B$are not in fact Hermitian. A better idea is to vectorize$A$and$B$to avoid matrix multiplication, writing$\Re[\langle A|B\rangle]$. This only helps with speed, though, it doesn’t touch the real problem. Posted in Uncategorised | 6 Comments ## Relativity is your friend I’ve just finished reading Andy Weir’s new book “Project Hail Mary”. Great story, I heartily recommend it to anyone who is into hard sci-fi. There’s minor plot point that I found hilarious: an alien civilization managed to do interstellar travel before discovering relativity (it makes perfect sense in the story). The poor bastards were very confused: the trip was shorter and took much less fuel than they had planned for! I was aware that relativity make trips shorter from the point of view of the traveller, but fuel? Never heard about relativity saving fuel as well. I just had to check it out! First let’s loot at the time, though, as it is also rather interesting. Let’s say you want to travel to a star$L$light-years away, and you have an extremely powerful and efficient rocket, so the only limitation you have is the acceleration your body can take. What you do is then accelerate at a constant rate$a$up to half-way point, where you start decelerating at the same rate$a$, so that you actually stop at your destination. How long does that take? Luckily the Usenet Physics FAQ already has all the calculations, so I won’t do them again, I’ll just copy their results. From the point of view of the Earth the number of years3 it takes is $t = \sqrt{L^2+4L/a}.$ You see it is always larger than$L$, which is the familiar result that you can’t travel faster than light. More interesting is that time it takes from the point of view of the traveller, the proper time: $\tau = \frac2a \cosh^{-1}(aL/2 +1 ).$ You’re probably not so familiar with the hyperbolic cosine, so it’s worth noting that$\cosh(x) \approx e^x/2$for large$x$, which results in $\tau \approx \frac2a \log(aL+2)$ for large$L$. Yep, that’s a logarithm! It’s much, much faster from the point of view of the traveller. And those poor bastards that didn’t known relativity? They’ll calculate the trip time to be $t_\text{Newton} = 2 \sqrt{L/a},$which is indeed faster than the time from the point of view of the Earth$t$, but much slower than the time from the point of view of the traveller, which is what matters here. For concreteness, let’s look at the time it takes to reach Tau Ceti, which is roughly 12 light-years away, at a nice comfy acceleration of one$g$. We have that$t=13.8$years,$\tau=5.2$years, and$t_\text{Newton} = 6.8$. Ok, not so brutal, but you’d definitely notice if you’d arrive at your destination one and a half years before schedule. This is a bit misleading, though, because this distance too small for the power of the logarithm to really kick in. If instead you go to the centre of the Milky Way, roughly 30,000 light-years away, you get$\tau=20$years,$t_\text{Newton}=341$years, and$t = 30,002$years. Now that’s brutal. How about fuel, though? I don’t think we’ll ever do interstellar travel with rockets, much more plausible is a laser-on-light-sail propulsion as investigated by Breakthrough Starshot, but it’s fun to do the calculations anyway. The Newtonian calculation is an easy application of Tsiolkovsky’s equation: for each kilogram of payload, we need $M_\text{Newton} = \exp\left(\frac2{v_e}\sqrt{aL}\right)-1$kilograms of fuel. Here$v_e$is the effective exhaust velocity, which we take to be 1, as we are using an amazing photon rocket. The relativistic rocket equation is surprisingly simple: we just replace the time in Tsiolkovsky’s equation with the proper time. The result is that we need $M = \exp\left(\frac2{v_e}\cosh^{-1}(aL/2 +1 )\right)-1$ kilograms of fuel for each kilogram of payload. Using the same approximation for the hyperbolic cosine as before, we get $M \approx (aL+2)^{2/v_e}.$ Now this is fucking amazing. We have an exponential in the Newtonian case, but in the relativistic case it’s just a polynomial! The exponential from Tsiolkovsky gets negated by the logarithm in the proper time. Even for Tau Ceti the difference is already extremely large:$M=204$kg, whereas$M_\text{Newton} = 1,138$kg! And for the centre of the Milky Way?$M \approx 10^9$kg, and$M_\text{Newton} \approx 10^{152}$kg. Ok, both are impossible, but the Newtonian one is much more impossible! So don’t complain about the lightspeed barrier. Relativity actually makes life much easier for would-be interstellar travellers. A Newtonian universe would be a shit place to live. Posted in Uncategorised | 2 Comments ## Climbing a space elevator I was arguing on the internet about the possibility of building a space elevator; someone was claiming that carbon nanotubes were going to make it possible and we’d build one in the near future. I’m very skeptical, first because we’re nowhere near the capability of building high-strength carbon nanotube bundles tens of thousands of kilometres long, and secondly because it needs to be cheap, and this absurdly long and absurdly strong cable seems anything but cheap. Furthermore, if we postulate that some magical technology will allow that to happen, I think it’s dishonest to not allow some magical technology to also help the competition. For instance, we could have fully reusable rockets which cost little other than fuel to launch; and this technology seems much more plausible. But this is not what I’m going to talk about, space elevator cables have been discussed to death. What I’m interested in is a point that another poster brought up: how much does the energy needed to climb the space elevator costs? They were arguing that the proposed method of powering the climber, laser beaming, had an absurdly low efficiency of 0.005, and paying for that energy alone already made the space elevator uneconomical, we didn’t even need to consider the astronomical costs of constructing it in the first place. Now wait, come on, so even if we did have the magical cable for free it still wouldn’t be worth using the space elevator? That’s going too far. If laser beaming has such a low efficiency, this just means we should use some other method for powering the climber. Delivering the power via copper wire is out of question: the wire would be extremely massive, and over such a long distance the electrical resistance would also make it uneconomical. Of course we could use a superconductor to get rid of resistance, but then we have to cool the damn thing as well, making the weight and the cost even worse. No no no, I want to find a solution for powering the climber without needing any more magical technology, the only magic I’ll allow is the space elevator cable itself. The obvious solution is solar panels. In space they work very reliably, there’s no atmosphere to filter the sun, and even the shadow of the Earth quickly stops being a problem as you get higher2. They don’t generate much power, though, so it would take a long time to climb to geostationary orbit. How long exactly? It turns out this is an interesting physics problem. We’re not moving with a constant speed, or a constant force, but with a constant power. I didn’t study this problem either in school or university, so I wanted to find out how to do it. The principle is easy: kinetic energy is given by $E = \frac12 mv^2,$ so we differentiate that and get $\frac{d}{dt}E = P = m v a,$which results in the differential equation $a = \frac{P}{mv},$ that even I can solve. Life gets hard when we put that together with the forces acting on the climber: gravitational$GMm/r^2$and centrifugal$m\omega^2 r$(the famous “fictitious” force that hurts as much as any “real” force). The resulting acceleration is $a = \frac{P}{mv} – \frac{GM}{r^2} + \omega^2 r,$a really nasty differential equation that I have no idea how to solve. No matter, my laptop can solve it numerically without even turning on the cooling fan, I just need the values for the parameters. Most of them are fixed:$G$is the gravitational constant,$M$is the mass of the Earth, and$\omega$is its angular speed. I just need to determine$P/m$, the power per mass available from the solar panels. Annoyingly enough, I couldn’t find solid numbers anywhere, but a good estimate can be obtained by considering the solar panels of the ISS. They produce roughly 30 kW of power (before degradation) and weight roughly 1.1 tons, giving as a power density of 27 W/kg. To compensate for degradation, they’re about to install new ones, that promise a 20% weight reduction for the same performance. Again, I couldn’t find solid numbers, but given that they are going to be installed next month, and do have a plausible method for reducing weight, I’ll believe that 20% and put a power density of 34 W/kg solidly in the “not magic” category. The climber is not made only of solar panels, though. Let’s reserve 80% of its mass for payload, structure, and so on, and use 20% for solar panels, which lowers the power density to 6.8 W/kg. Solving the equation numerically, letting the initial position be the equatorial radius of the Earth and initial velocity 1 m/s2, we see that 82.5 days are needed to reach geostationary orbit. But wait! At geostationary orbit the gravitational and centrifugal forces balance out, so we’ll just have the motors working with no resistance, and the speed of the climber will quickly diverge! We have to check if we’re not accidentally breaking relativity. Solving numerically for speed, we see that at geostationary height it is roughly 300 m/s. Yeah, no danger of breaking relativity here. Still, that’s a little too fast to be plausible, so let’s apply the brakes at 100 m/s. This only lengthens the journey to 82.7 days, so it’s really not worth considering; there’s a bigger error anyway from not taking into account the time the climber spends in Earth’s shadow, which will make the journey a couple of days longer.3 That takes a lot of patience, but it’s really not an issue for geostationary satellites or interplanetary missions that take years anyway. It is an issue for manned missions, we really don’t want to spend 3 months loitering in space getting bathed in radiation. Can we speed it up somehow? Well, we can send power up via a copper wire for a short distance, say 100 km. In fact we need to do this, as we can’t deploy this massive solar array inside the atmosphere. Transmission over such distances is completely routine, so I’m not going to bother calculating the specifics. Just assume that we send enough power for a constant speed of 100 m/s, and solve the differential equation numerically again with these new initial conditions. The result is 81 days. Well, yeah, we’ll leave the space elevator for cargo and send people up in rockets. Posted in Uncategorised | 1 Comment ## Being Referee #2 It has happened to all of us. You spend months refining your latest result, carefully writing it up as a well-polished manuscript. You argue with your coauthors about everything: from which version of the proof is more enlightening, to the order the ideas should be presented, and even whether$\epsilon$or$\varepsilon$has more typographical merit. Finally content with the manuscript, you submit it to the journal of your desires, and the long wait begins for the referee reports. When they finally arrive, there’s a short and shallow report from Referee #1 saying it’s fine, publish it, just add a citation to Refereey McRefereeFace. But then comes the long, thorny report of Referee #2. This shrivelled soul writes that your bombastic claims in the abstract are not supported by the results in the text. This arrogant asshole thinks the problem you solved is of no interest whatsoever. This heartless pedant says that your proof is nothing but a handwavy argument. This presumptuous prick argues that the whole research program was misguided to start with. And anyway it was already done a long time ago in Russia. Well, I have a confession to make. I am Referee #2. Contrary to common misconception, I am not a sadist. I have received myself countless negative referee reports, and I know very well how much it hurts to see your work trashed like this. It breaks my heart to reject a paper, specially when I see it’s the first paper a student has ever written. Nevertheless, it is my duty. I do believe in peer-review, and I think readers should be able to trust what they read in a reputable journal. This is specially important in our age of widespread disinformation. The most common problem a paper has is claiming too much in the abstract, what is known as overselling the result4. This is what I find most frustrating as a referee because almost always the authors know what they wrote is bullshit, they just think it’s harmless, and I should just let it slide. The argument is that anybody that reads the paper will understand what the result is actually about and disregard the abstract, which is just colourful marketing. This is true, but misses the point that almost nobody reads the paper; the vast majority of the readers will be lay people that know of the result through a press release, that credulously repeats the claims of the abstract. Great, now you’re fooling the general public that paid for the research. Even among the physicists that read the paper, most of them will not be experts, and have to take your claims at face value. So no, I will not let it slide, and I will tell the authors to cut the crap without the least bit of remorse. When there is a technical mistake in the paper there is usually very little drama; of course it’s bad news, but the mistake was already there, and I helped find it. The authors usually just fix it and thank me in the acknowledgments. Once the authors refused to fix the mistake, the paper was rejected, and they simply resubmitted it to a different journal. Which asked me again to referee the paper. Then I got furious about wasting my time trying to help the authors, and simply copy-pasted my previous referee report. But in the other cases? Yes, sometimes the problem you have worked on indeed interests nobody but you. I do feel sorry for the authors, but I will write that in the referee report if that’s my opinion. And when the result was in fact done a long time ago in Russia? It’s very frustrating, I know, I have also rediscovered stuff that was published in the Siberian Journal of Mathematics. Life sucks, but what can I do? We have to respect the priority of Soviet physicists. And when the research program was a terrible idea in the first place? Well, I wrote that in my referee report only once, and it was really heartbreaking, as it was also the first paper of a PhD student. I decided to say it anyway, because otherwise the student would probably spent the rest of their PhD working on a nonsensical idea, and their career would go down the drain. I tried to be gentle about it, but sometimes being gentle doesn’t help when the news is that your mother has died. Perhaps you’re not convinced, and want anyway to punish Referee #2 for the negative reports. Well, I have a feeling that editors know that I’m strict with my referee reports, so they send me papers they want to be rejected. Every day I see really nice papers getting published, but somehow they never come to me for refereeing. So rest assured, being Referee #2 is its own punishment. EDIT: And just after I posted this, I got rejected from TQC with this paper. The killing blow came in fact from Referee #2: “Overall, I enjoyed reading this work. It starts with a clear and well-motivated question and mixes up wisely several ingredients from the theory of non-locality in order to tackle it. However, I did not find the results that exciting.” Oh, that hurts! Posted in Uncategorised | Comments Off on Being Referee #2 ## Why I am unhappy about all derivations of the Born rule (including mine) There are a lot of derivations of the Born rule in the literature. To cite just a few that I’m most familiar with, and illustrate different strategies: by Everett, Deutsch, Wallace, Żurek, Vaidman, Carroll and Sebens, me, and Mandolesi. I’m intentionally leaving out frequentist derivations, who should just be ashamed of themselves, and operationalist derivations, because I don’t think you can explain anything without going into ontology. I’m not happy with any of them. Before going into detail why, I need to say what are my criteria for a satisfactory derivation of the Born role: 1. It needs to use postulates that either come from the non-probabilistic part of quantum mechanics, or are themselves better motivated than the Born rule itself. 2. The resulting Born rule must predict that if you make repeated measurements on the state$\alpha\ket{0}+\beta\ket{1}$the fraction of outcome 0 will often be close to$|\alpha|^2/(|\alpha|^2+|\beta|^2)$. The first criterion is hopefully uncontroversial: if the derivation of the Born rule depends on the probabilistic part of quantum mechanics it is circular, and if it uses more arcane postulates than the Born rule itself, why bother? The second criterion tries to sidestep the eternal war about what is the definition of probability, focussing on the one point that everyone should agree about: probabilities must predict the frequencies. Should, but of course they don’t. Subjectivists insist that probabilities are the degrees of belief of a rational agent, leaving the connection with frequencies at best indirect. To the extent that this is a matter of semantics, fine, I concede the point, define probabilities to be subjective, but then I don’t care about them. I care about whatever objective property of the world that causes the frequencies to come out as they do. Call it propensity, call it chance, call it objective probability, doesn’t matter. This thing is what the Born rule is about, not about what is going on in the heads of rational agents. And there is such a thing, there is a regularity in Nature that we have to explain. If you pass 1000 photons through a polariser, then a half-wave plate oriented at$\pi/4$radians, and then measure their polarisation, you will often find the number of photons with horizontal polarisation to be close to 500.2 To simulate this experiment,2 I generated 8000 hopefully random bits from /dev/urandom, divided them into sets of 1000, and counted the number of ones in each set, obtaining$504, 495, 517, 472, 504, 538, 498$, and$488$. We often got something close to 500. This is the regularity in Nature, this is the brute fact that a derivation of the Born rule must explain. Everything else is stamp collection.3 I’ve emphasized the second criterion because it’s where most derivations fail. Consider the most fashionable one, by Deutsch and Wallace, which I reviewed here. They have some axioms about what it means to be a rational agent, some axioms about what is a measurement, and some axioms about how rational agents should be indifferent to some transformations done on the quantum state. From these axioms they conclude that rational agents should bet on the outcomes of a quantum experiment with probabilities given by the Born rule. Who cares? Should we really believe that the statistics of an experiment will be constrained by rationality axioms? And conversely, if you can show that the statistics of a quantum experiment follow the Born rule, doesn’t it become obvious that rational agents should bet that they do, making the whole decision-theoretic argument superfluous? It’s worth noting that this same criticism applies to my derivation, as it is just a cleaned up version of the Deutsch-Wallace argument. Let’s move on to Vaidman, Carroll, and Sebens. Their derivations differ on several important points, but I’m interested here in their common point: they passionately argue that probability is about uncertainty, that a genuine source of uncertainty in Many-Worlds is self-locating uncertainty, and that locality implies that your self-locating uncertainty must be given by the Born rule. Arguing about whether probability is uncertainty is a waste of time4, but their second point is well-taken: after a measurement has been done and before you know the outcome, you are genuinely uncertain about in which branch of the wavefunction you are. I just don’t see how could this be of fundamental relevance. I can very well do the experiment with my eyes glued to the screen of the computer, so that I’m at first aware that all possible outcomes will happen, and then aware of what the outcome in my branch is, without ever passing through a moment of uncertainty in between. Decoherence does work fast enough for that to happen.5 What now? No probability anymore? And then it appears when I close my eyes for a few seconds? That makes sense if probability is only in my head, but then you’re not talking about how Nature works, and I don’t care about your notion of probability. Żurek’s turn, now. His proof follows the pattern of all the ones I talked about until now: it argues that probabilities should be invariant under some transformations on entangled states, and from a bunch of equalities gets the Born rule. An important different is that he refuses to define what probabilities are, perhaps in order to avoid the intractable debate. I think it’s problematic for his argument, though. One can coherently argue about what agents can know about, what they should care about, as the other authors did. But an undefined notion? It can do anything! Without saying what probability is how can you say it is invariant under some transformation? Also, how can this undefined notion explain the regularity of the frequencies? Maybe it has nothing to do with frequencies, who knows? Let’s go back to Everett’s original attempt. It is of a completely different nature: he wanted to derive a measure$\mu$on the set of worlds, according to which most worlds would observe statistics close to what the Born rule predicted. I think this makes perfect sense, and satisfies the second criterion. The problem is the first. He assumed that$\mu$must be additive on a superposition of orthogonal worlds, i.e., that if$\ket{\psi} = \sum_i \ket{\psi_i}$then $\mu(\ket{\psi}) = \sum_i \mu(\ket{\psi_i}).$ I have no problem with this. But then he also assumed that$\mu$must depend only on the 2-norm of the state, i.e., that$\mu(\ket{\psi}) = f(\langle \psi|\psi\rangle)$. This implies that $f\left( \sum_i \langle \psi_i|\psi_i\rangle \right) = \sum_i f(\langle \psi_i|\psi_i\rangle),$ which is just Cauchy’s equation. Assuming that$\mu$is continuous or bounded is enough to imply that$\mu(\ket{\psi}) = \alpha \langle \psi|\psi\rangle$for some positive constant$\alpha$, that can be taken equal to 1, and we have a measure that gives rise to the Born rule. But why assume that$\mu$must depend only on the 2-norm of the state? This unjustified assumption is what is doing all the work. If we assume instead that$\mu$depends only on the$p$-norm of the state, the same argument implies that$\mu(\ket{\psi}) = \|\ket{\psi}\|_p^p$, an absolute value to the$p$rule, instead of an absolute value squared rule. Everett would be better off simply postulating that$\mu(\ket{\psi}) = \langle \psi|\psi\rangle$. Had he done that, it would be clear that the important thing was replacing the mysterious the Born rule, with its undefined probabilities, with the crystal clear Born measure. Instead, the whole debate was about how proper was his derivation of the Born measure itself. The last proof I want to talk about, by Mandolesi, agrees with Everett’s idea of probability as measure theory on worlds, and as such satisfies the second criterion. The difficulty is deriving the Born measure. To do that, he uses not only the quantum states$\ket{\psi}$, but the whole rays$R_\psi = \{c \ket{\psi}; c \in \mathbb C \setminus 0\}$. He then considers a subset$R_{\psi,U} = \{c \ket{\psi}; c \in U\}$of the ray, for some measurable subset$U$of the complex numbers, to which he assigns the Lebesgue measure$\lambda(R_{\psi,U})$. Given some orthogonal decomposition of a ray$c \ket{\psi} = \sum_i c \ket{\psi_i}$– which is most interesting when the states$\ket{\psi_i}$correspond to worlds – the fraction of worlds of kind$i$is given by $\frac{\lambda(R_{\psi_i,U})}{\lambda(R_{\psi,U})} = \frac{\langle \psi_i|\psi_i \rangle}{\langle \psi|\psi \rangle},$ which does not depend on$U$, and coincides with the Born measure. I think he’s definitely using postulates more arcane than the Born measure itself. He needs not only the rays and their subsets, by also the Lebesgue measure on the complex line, for which his only justification is that it is natural. No argument here, it is very natural, but so is the Born measure! It is just an inner product, and is there anything more natural than taking inner products in a Hilbert space? To conclude, I’m skeptical that there can even be a satisfactory derivation of the Born rule. It obviously doesn’t follow from the non-probabilistic axioms alone; we need anyway an additional postulate, and it’s hard to think of a more natural one than$\mu(\ket{\psi}) = \langle \psi|\psi\rangle\$. Perhaps we should focus instead in showing that such an axiomatization already gives us everything we need? In this vein, there’s this curious paper by Saunders, which doesn’t try to derive the Born rule, but instead carefully argues that the mod-squared amplitudes play all the roles we require from objective probabilities. I find it curious because he never mentions the interpretation of the mod-squared amplitudes as the amounts of worlds with a given outcome, which is what makes the metaphysics intelligible.

Posted in Uncategorised | 15 Comments