First Valladolid paper is out!

A couple of days ago I finally released the first Julia project I had alluded to, a technique to compute key rates in QKD using proper conic methods. The paper is out, and the GitHub repository is now public. It’s the first paper from my new research group in Valladolid, and I’m very happy about it: first because of the paper itself, and secondly because now I have students to do the hard work for me.

The inspiration for this paper came from the Prado museum in Madrid. I was forced to go there as part of a group retreat (at the time I was in Miguel Navascués’ group in Vienna), and I was bored out of my mind looking at painting after painting. I then went to the museum cafe and started reading some papers on conic optimization to pass the time. To my great surprise, I found out that there was an algorithm capable of handling the relative entropy cone, and moreover it had already been implemented in the solver Hypatia, which to top it off was written in Julia! Sounded like Christmas had come early. ¿Or maybe I had a jamón overdose?

Life wasn’t so easy, though: the relative entropy cone was implemented only for real matrices, and the complex case is the only one that matters. I thought: no problem, I can just do the generalization myself. Then I opened the source code, and I changed my mind. This cone is a really nasty beast. The PSD cone is a child’s birthday party in comparison. I was too busy with other projects at the time to dedicate myself seriously to it, so I wrote to the developers of Hypatia, Chris Coey and Lea Kapelevich, asking whether they were interested in doing the complex case. And they were! I just helped a little bit with testing and benchmarking.

Now I can’t really publish a paper based only on this, but luckily the problem turned out to be much more difficult: I realized that the relative entropy cone couldn’t actually be used to compute key rates. The reason is somewhat technical: to solve the problem reliably one cannot have singular matrices, so it needs to be formulated in terms of their supports only (the technical details are in the paper). But if one reformulates the problem in terms of the supports of the matrices, it’s no longer possible to write it in terms of the relative entropy cone.
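Why singular matrices are a nuisance is easier to see on the quantum relative entropy $D(\rho\|\sigma) = \operatorname{tr}[\rho(\log\rho - \log\sigma)]$ itself: a naive matrix logarithm of a singular matrix produces infinities. The following is a minimal numpy sketch, not the paper’s method (which is a new cone implemented from scratch), that evaluates $D$ by restricting to the supports: zero eigenvalues of $\rho$ are dropped, and the result is $+\infty$ when the support of $\rho$ is not contained in that of $\sigma$. The function name and the tolerance `tol` are illustrative choices.

```python
import numpy as np


def relative_entropy(rho, sigma, tol=1e-12):
    """D(rho||sigma) = tr[rho (log rho - log sigma)] (natural log), computed on the supports.

    Eigenvalues below `tol` are treated as zero: they are dropped from rho
    (0 log 0 = 0), and if rho has weight outside the support of sigma the
    result is +inf.
    """
    p, U = np.linalg.eigh(rho)
    q, V = np.linalg.eigh(sigma)
    p, U = p[p > tol], U[:, p > tol]
    q, V = q[q > tol], V[:, q > tol]
    # overlap[i, j] = |<u_i|v_j>|^2 between eigenvectors restricted to the supports.
    overlap = np.abs(U.conj().T @ V) ** 2
    # Each u_i lies in supp(sigma) iff its overlaps with the v_j sum to 1.
    if np.any(overlap.sum(axis=1) < 1 - 1e-9):
        return np.inf
    return float(np.sum(p * np.log(p)) - p @ overlap @ np.log(q))


rho = np.diag([0.5, 0.5, 0.0])        # singular: a naive log(rho) would contain -inf
sigma = np.diag([0.25, 0.25, 0.5])
print(relative_entropy(rho, sigma))   # equals log 2 for these diagonal matrices
```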

I had to come up with a new cone, and implement it from scratch. Now that’s enough material for a paper. To make things better, by this time I was already in Valladolid, so my students could do the hard work. Now it’s done. ¡Thanks Andrés, thanks Pablo, thanks Miguel!

Posted in Uncategorised | Leave a comment

I got a Ramón y Cajal!

I’m quite happy: this is pretty much the best grant available in Spain. It gives me a lot of money for 5 years, including a PhD student and a postdoc. But the reason I’m posting about it here is to share some information about the grant system that I believe is not widely known.

My grant proposal was evaluated with 98.73 points out of 100. Sounds very high, until you learn that the cutoff was 97.27. I sincerely believe that my grant proposal was excellent and deserved to be funded, as self-serving as this belief may be, but I can’t believe there was a meaningful difference between my proposal and one that got 97 points. There were clearly too many good proposals, and the reviewers had to somehow divide a limited budget between them. I think it’s unavoidable that the result is somewhat random.

I have been on the other side before: I’ve had grant proposals that were highly evaluated and nevertheless rejected. I think now I can say that it was just bad luck. I have also been on the reviewing side: twice I received excellent proposals to evaluate, and gave them very positive evaluations, sure that they would be funded. They weren’t.

Everyone who has applied for a grant knows how much work it is, and how frustrating it is to be rejected after all that. Still, one should keep in mind that rejection doesn’t mean you are a bad researcher. It is the norm; there’s just way too little money available to fund everyone who deserves it.


MATLAB is dead, long live Julia!

Since I first used MATLAB I have dreamt of finding a replacement for it. Not only is it expensive, proprietary software, it is also a terrible programming language. Don’t get me wrong, I’m sure it was amazing when it was invented, but this was in the 70s. We know better now. I’ve had to deal with so many fascinating bugs due to its poor design decisions!

Most recently, I had code that was failing because 'asdf' and "asdf" are very similar, but not exactly the same. The former is a character vector, and the latter is a string. Almost always you can use them interchangeably, but as it turns out, not always. Another insane design decision is that you don’t need to declare variables to work on them. I declared a matrix called constraints, worked on it a bit, and then made an assignment with a typo contraints(:,1) = v. Instead of throwing an error like any sane programming language, MATLAB just silently created a new variable contraints. Perhaps more seriously, MATLAB does not support namespaces. If you are using two packages that both define a function called square, you have to be careful about the order in which they appear in the MATLAB path to get the correct one. If you need both versions? You’re just out of luck.
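For contrast, here is what namespaces buy you, sketched in Python since it comes up below as a pleasant language: two modules can both define `square`, and you simply use both through their qualified names. The modules `geometry` and `linalg` are hypothetical, built on the fly with `types.ModuleType` so the snippet is self-contained; in real code they would be separate files or packages.

```python
import types


def _area_of_square(side):
    return side * side


def _matrix_square(m):
    # Matrix product m @ m for a list-of-lists matrix.
    cols = list(zip(*m))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in m]


# Two hypothetical packages that both export a function called `square`.
geometry = types.ModuleType("geometry")
geometry.square = _area_of_square
linalg = types.ModuleType("linalg")
linalg.square = _matrix_square

# No path-order games: each function is qualified by its module.
area = geometry.square(3)
mat = linalg.square([[1, 1], [0, 1]])
print(area, mat)
```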

Perhaps I should stop ranting at this point, but I just can’t. Another thing that drives me mad is that loop indices are always global, so you must be very careful about reusing index names. This interacts greatly with another “feature” of MATLAB, that i is both the imaginary unit and a valid variable name. If you have for i=1:3, i, end followed by a = 2+3*i you’re not getting a complex number, you’re getting 11. The parser is downright stone age: it can’t handle simple operators like +=, or double indexing like a(2)(4). To vectorize a matrix there’s no function, just the operator :, so if you want to vectorize a(2) you have to either call reshape(a(2),[],1), or define x = a(2) and then do x(:). Which of course leads to everyone and their dog defining a function vec() for convenience, which then all conflict with each other because of the lack of namespaces.
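Both pitfalls can be mimicked in Python, as a minimal sketch. Python spells the imaginary unit as the literal `1j`, which cannot be overwritten, so the first part deliberately binds an ordinary variable `i` to `1j` to imitate MATLAB; as in MATLAB, a Python loop variable survives the loop. The hypothetical helper `vec` reproduces the column-major order of MATLAB’s `x(:)`.

```python
# Imitate MATLAB, where `i` starts out as the imaginary unit but can be overwritten.
i = 1j
before = 2 + 3 * i      # (2+3j), as intended

for i in range(1, 4):   # like `for i=1:3`; the loop variable outlives the loop
    pass
after = 2 + 3 * i       # i is now 3, so this is the integer 11


def vec(rows):
    """Stack the columns of a list-of-lists matrix, as MATLAB's x(:) does."""
    n_rows, n_cols = len(rows), len(rows[0])
    return [rows[r][c] for c in range(n_cols) for r in range(n_rows)]


print(before, after, vec([[1, 2], [3, 4]]))
```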

I wouldn’t be surprised to find out that the person who designed the function length() tortured little animals as a child. If you call it on a vector, it works as expected. But if you call it on an $m \times n$ matrix, what should it do? I think the most sensible option is to give the number of elements, $mn$, but it’s also defensible to give $m$ or $n$. MATLAB of course takes the fourth option, $\max(m,n)$. I could also mention the lack of support for types, the Kafkaesque support for optional function arguments, the mixing of row and column vectors… It would keep me ranting forever. But enough about MATLAB. What are the alternatives?
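The candidate answers are easy to state in code. This is a sketch of what MATLAB’s `length` computes according to its documentation (the largest dimension, or zero for an empty array), next to the number of elements; the functions take a shape tuple and are named for illustration only.

```python
import math


def matlab_length(shape):
    """Mimic MATLAB's length(): 0 for an empty array, otherwise the largest dimension."""
    return 0 if 0 in shape else max(shape)


def numel(shape):
    """Number of elements, m*n for an m x n matrix."""
    return math.prod(shape)


shape = (3, 5)   # an m x n matrix with m = 3, n = 5
print(matlab_length(shape), numel(shape), shape[0], shape[1])
```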

The first one I looked at is Octave. It is open source, great, but its fundamental goal is to be compatible with MATLAB, so it cannot fix its uncountable design flaws. Furthermore, it isn’t 100% compatible with MATLAB, so almost always when I have to use MATLAB because of a library, this library doesn’t work with Octave. If I give up on compatibility, then I can use the Octave extensions that make the programming language more tolerable. But it’s still a terrible programming language, and is even slower than MATLAB, so there isn’t much point.

Then came Python. No hope of compatibility here, but I accept that, no pain no gain. The language is a joy to program with, but I absolutely need some optimization libraries (that I’m not going to write myself). There are two available, CVXPY and PICOS. Back when I first looked at them, about a decade ago, neither supported complex numbers, so Python was immediately discarded. In the meantime they have both added support, so a few years ago I gave it a shot. It turns out both are unbearably slow. CVXPY gets an extra negative point for demanding its own version of partial trace and partial transposition, but that’s beside the point: I can’t use them for any serious problem anyway. I did end up publishing a paper using Python code, but only because the optimization problem I was solving was so simple that performance wasn’t an issue.
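For the record, partial transposition needs no special support from an optimization library when you work with plain numerical matrices: it is a reshape and an index swap. A minimal numpy sketch, assuming a bipartite operator on $\mathbb{C}^{d_A}\otimes\mathbb{C}^{d_B}$ in the usual Kronecker ordering; `partial_transpose` is a hypothetical helper, not the API of CVXPY, PICOS, or any Julia package.

```python
import numpy as np


def partial_transpose(rho, dims, sys=1):
    """Partial transpose of an operator on C^dA ⊗ C^dB (sys=0: first factor, sys=1: second)."""
    dA, dB = dims
    t = rho.reshape(dA, dB, dA, dB)   # t[a, b, a', b'] = <a b| rho |a' b'>
    if sys == 0:
        t = t.transpose(2, 1, 0, 3)   # swap a and a'
    else:
        t = t.transpose(0, 3, 2, 1)   # swap b and b'
    return t.reshape(dA * dB, dA * dB)


# Example: the maximally entangled two-qubit state has a negative partial transpose.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(phi, phi)
print(np.linalg.eigvalsh(partial_transpose(rho, (2, 2))))
```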

After that I gave up for several years, resigned to my fate of programming with MATLAB until it drove me to suicide. But then Sébastien Designolle came to visit Vienna, and told me of a programming language called Julia that was even nicer to program than Python, almost as fast as C++, and had an optimization library supporting every solver under the Sun, JuMP. I couldn’t believe my ears. Had the promised land been there all along? After all, I knew Julia, it just had never occurred to me that I could do optimization with it.

I immediately asked Sébastien if it supported complex numbers, and if it needed funny business to accept partial transpositions. Yes, and no, respectively. Amazing! To my relief JuMP had just added support for complex numbers, so I hadn’t suffered all these years for nothing. I started testing the support for complex numbers, and it turned out to be rather buggy. However, the developers Benoît Legat and Oscar Dowson fixed the bugs as fast as I could report them, so now it’s rock solid. Dowson in particular seemed to never sleep, but as it turned out he just lives in New Zealand.

Since then I have been learning Julia and writing serious code with it, and I can confirm: the language is all that Sébastien promised and more. Another big advantage is the extensive package ecosystem, where apparently half of academia has been busy solving the problems I need. The packages can be easily installed from within Julia itself and have proper support for versions and dependencies. Also worth mentioning is the powerful type system, which makes it easy to write functions that work differently for different types, and to switch at runtime between floats and complex floats and double floats and quadruple floats and arbitrary precision floats. This makes it easy to do optimization with arbitrary precision, as JuMP in fact allows for the solvers that support it (as far as I know they are Hypatia, COSMO, and Clarabel). As you might know, this is a nightmare in MATLAB.
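Python has no multiple dispatch in Julia’s sense, but a loose analogue of the arbitrary-precision point can be sketched with duck typing: one generic routine run with machine floats and with the standard library’s `decimal` type at 50 significant digits. The Newton iteration for $\sqrt{x}$ is only an illustration, not anything JuMP or the solvers do.

```python
import math
from decimal import Decimal, getcontext


def newton_sqrt(x, iterations=60):
    """Newton's iteration for sqrt(x), x > 0, carried out in whatever number type x has."""
    guess = x
    for _ in range(iterations):
        guess = (guess + x / guess) / 2
    return guess


getcontext().prec = 50                    # 50 significant digits for Decimal
sqrt2_float = newton_sqrt(2.0)            # machine precision
sqrt2_decimal = newton_sqrt(Decimal(2))   # same code, arbitrary precision
float_error = abs(sqrt2_float - math.sqrt(2))
print(sqrt2_float, float_error)
print(sqrt2_decimal)
```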

Now Julia is not perfect. It has some design flaws. Some are because it wants to be familiar to MATLAB users, such as having 1-indexed arrays and column-major order. Some are incomprehensible (why is Real not a subtype of Complex? Why is M[:,1] a copy instead of a view?). It’s not an ideal language, it’s merely the best there is. Maybe in a couple of decades someone will release a 0-indexed version called Giulia and we’ll finally have flying cars and world peace.
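For comparison, Python happens to make the opposite choices on both points. The standard `numbers` module puts `Real` below `Complex` in its numeric tower, and basic slicing in numpy returns a view (numpy is also 0-indexed and row-major by default). A small sketch:

```python
import numbers

import numpy as np

# In Python's numeric tower, every Real is also a Complex.
real_is_complex = issubclass(numbers.Real, numbers.Complex)

M = np.zeros((2, 2))
first_col = M[:, 0]   # basic slicing: a view into M, not a copy
first_col[:] = 1.0    # writing through the view changes M itself
print(real_is_complex)
print(M)
```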

It’s a bit ironic to write this blog post after I have released a paper that is based on a major MATLAB library that I wrote together with Andy, Moment. In my defence, Andy wrote almost all of the code, the vast majority is in C++, and we started it before Sébastien’s visit. And it demonstrated beyond any doubt that MATLAB is completely unsuitable for any serious programming. I promise that when I have time (haha) I’ll rewrite it in Julia.

But the time for irony is over. My new projects are all in Julia, and I’ll start releasing them very soon. In the meantime, I wrote a tutorial to help refugees from MATLAB settle in the promised land.


The smallest uninteresting number is 198

A well-known joke/theorem is that all natural numbers are interesting. The proof goes as follows: assume that there exists a non-empty set of uninteresting natural numbers. Then this set has a smallest element. But that makes it interesting, so we have a contradiction. Incidentally, this proof applies to the integers and, with a bit of a stretch, to the rationals. It definitely does not apply to the reals, though, no matter how hard you believe in the axiom of choice.

I was wondering, though, what is the smallest uninteresting number. It must exist, because we fallible humans are undeterred by the mathematical impossibility and simply do not find most natural numbers interesting.

Luckily, there is an objective criterion to determine whether a natural number is interesting: is there a Wikipedia article written about it? I then went through the Wikipedia articles about numbers, and found the first gap at 198. But now that this number has become interesting, surely we should write a Wikipedia article about it?
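The search itself is the well-ordering argument in miniature: scan upwards from the smallest candidate until the first number that is not covered. A sketch with a hypothetical, made-up set standing in for the numbers that have articles (the real list is not reproduced here):

```python
def first_gap(covered):
    """Smallest natural number (starting from 0) not in `covered`.

    Terminates for any finite set, since a finite set cannot contain every natural number.
    """
    n = 0
    while n in covered:
        n += 1
    return n


toy_articles = {0, 1, 2, 4, 5}   # hypothetical data, not the actual Wikipedia list
print(first_gap(toy_articles))
```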

This gives rise to another paradox: if we do write a Wikipedia article about 198 it will cease to be interesting, and of course we should delete the Wikipedia article about it. But this will make it interesting again, and we should again write the article.

You can see this paradox playing out in the revision history of the Wikipedia page: the article is indeed being repeatedly created and deleted.


SDPs with complex numbers

For mysterious reasons, some time ago I found myself reading SeDuMi’s manual. To my surprise, it claimed to support SDPs with complex numbers. More specifically, it could handle positive semidefiniteness constraints on complex Hermitian matrices, instead of only on real symmetric matrices like all other solvers.

I was very excited, because this promised a massive increase in performance for such problems, and in my latest paper I’m solving a massive SDP with complex Hermitian matrices.

The usual way to handle complex problems is to map them into real ones via the transformation
\[ f(M) = \begin{pmatrix} \Re(M) & \Im(M) \\ -\Im(M) & \Re(M) \end{pmatrix}. \]For Hermitian $M$ we have $-\Im(M) = \Im(M)^T$, so $f(M)$ is real symmetric. The spectrum of $f(M)$ consists of two copies of the spectrum of $M$, and $f(MN) = f(M)f(N)$, so the mapping is exact. The problem is that the matrix is now twice as big: the number of real parameters it needs is roughly twice what was needed for the original complex matrix, so this wastes a bit of memory. More problematic, the interior-point algorithm needs to calculate the Cholesky decomposition, which has complexity $O(d^3)$, so doubling the dimension slows the algorithm down by a factor of 8!
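A quick numerical check of these claims, as a numpy sketch. It uses the block form $\begin{pmatrix} \Re(M) & \Im(M) \\ -\Im(M) & \Re(M) \end{pmatrix}$. For Hermitian $M$ this agrees with writing $\Im(M)^T$ in the lower-left block, since then $\Im(M)^T = -\Im(M)$; the $-\Im(M)$ form has the advantage of being multiplicative for arbitrary complex matrices, which is what $f(MN) = f(M)f(N)$ needs when $MN$ is not Hermitian. The name `real_embedding`, the dimension and the random seed are arbitrary choices.

```python
import numpy as np


def real_embedding(M):
    """Map a complex d x d matrix to the real 2d x 2d matrix [[Re M, Im M], [-Im M, Re M]]."""
    A, B = M.real, M.imag
    return np.block([[A, B], [-B, A]])


rng = np.random.default_rng(0)
d = 3
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = G + G.conj().T                                          # random Hermitian matrix
N = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))  # arbitrary complex matrix

f_H = real_embedding(H)
eig_H = np.linalg.eigvalsh(H)
eig_f = np.linalg.eigvalsh(f_H)
symmetric = np.allclose(f_H, f_H.T)
doubled_spectrum = np.allclose(np.sort(np.repeat(eig_H, 2)), eig_f)
multiplicative = np.allclose(real_embedding(H @ N), f_H @ real_embedding(N))
print(symmetric, doubled_spectrum, multiplicative)
```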

I then wrote a trivial SDP to test SeDuMi, and of course it failed. A more careful reading of the documentation showed that I was formatting the input incorrectly, so I fixed that, and it failed again. Reading the documentation again and again convinced me that the input was now correct: it must have been a bug in SeDuMi itself.

Lured by the promise of an 8 times speedup, I decided to brave the dragon and looked into the source code of SeDuMi. It was written more than 20 years ago, and the original developer is dead, so you might understand why I was afraid. Luckily the code had comments; otherwise how could I figure out what it was supposed to do when it wasn’t doing it?

It turned out to be a simple fix, the real challenge was only understanding what was going on. And the original developer wasn’t to blame, the bug had been introduced by another person in 2017.

Now with SeDuMi working, I proceeded to benchmarking. To my despair, the promised land wasn’t there: there was no difference at all in speed between the complex version and the real version. I was at the point of giving up when Johan Löfberg, the developer of YALMIP, kindly pointed out to me that SeDuMi also needs to do a Cholesky decomposition of the Hessian, an $m \times m$ matrix where $m$ is the number of constraints. The complexity of SeDuMi is then roughly $O(m^3 + d^3)$ using complex numbers, and $O(m^3 + 8d^3)$ when solving the equivalent real version. In my test problem I had $m=d^2$ constraints, so the $m^3 = d^6$ term dominated, and no wonder I couldn’t see any speedup.
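Taking that rough cost model literally, and ignoring constants and everything else the solver does, the expected gain from avoiding the real embedding is the ratio below. The function is a back-of-the-envelope sketch, not a model of SeDuMi’s actual running time.

```python
def predicted_speedup(d, m):
    """Ratio of the rough costs m^3 + 8 d^3 (real embedding) and m^3 + d^3 (complex)."""
    return (m**3 + 8 * d**3) / (m**3 + d**3)


d = 100
print(predicted_speedup(d, m=d**2))   # m = d^2: the m^3 term swamps everything, ratio ~ 1
print(predicted_speedup(d, m=1))      # a single constraint: ratio close to 8
```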

I then wrote another test SDP, this time with a single constraint, and voilà! There was a speedup of roughly 4 times! Not 8, probably because computing the Cholesky decomposition of a complex matrix is harder than that of a real matrix, and there is plenty of other stuff going on, but no matter: a 4 times speedup is nothing to sneer at.

The problem now was that this only worked when calling SeDuMi directly, which requires writing the SDP in canonical form. I wasn’t going to do that for any nontrivial problem. It’s not hard per se, but it requires the patience of a monk. This is why we have preprocessors like YALMIP.

To take advantage of the speedup, I had to adapt YALMIP to handle complex problems. Löfberg is very much alive, which makes things much easier.

As it turned out, YALMIP already supported complex numbers but had it disabled, presumably because of the bug in SeDuMi. What was missing was support for dualization of complex problems, which is important because sometimes the dualized version is much more efficient than the primal one. I went to work on that.

Today Löfberg accepted the pull request, so right now you can enjoy the speedup if you use the latest git versions of SeDuMi and YALMIP. If that’s useful to you, please test it and report any bugs.

What about my original problem? I benchmarked it, and using the complex version of SeDuMi did give me a speedup of roughly 30%. Not so impressive, but definitely welcome. The problem is that SeDuMi is rather slow, and even with the real mapping MOSEK solves my problem faster.

I don’t think it was pointless going through all that, though. First because there are plenty of people who use SeDuMi, as it’s open source, unlike MOSEK. Second because now the groundwork has been laid, and if another solver appears that can handle complex problems, we will be able to use that capability just by flipping a switch.


SDPs are not cheat codes

I usually say the opposite to my students: that SDPs are the cheat codes of quantum information. That if you can formulate your problem as an SDP you’re in heaven: there will be an efficient algorithm for finding numerical solutions, and duality theory will often allow you to find analytical solutions. Indeed in the 00s and early 10s one problem after the other was solved via this technique, and a lot of people got good papers out of it. Now the low-hanging fruit has been picked, but SDPs remain a powerful tool that is routinely used.

I’m just afraid that people have started to believe this literally, and use SDPs blindly. But they don’t always work; you need to be careful about their limitations. It’s hard to blame them, though, as the textbooks don’t help. Watrous’s usually careful The Theory of Quantum Information is silent on the subject. It simply states Slater’s condition, which is bound to mislead students into believing that if Slater’s condition is satisfied the SDP will work. The standard textbook, Boyd and Vandenberghe’s Convex Optimization, is much worse. It explicitly states

Another advantage of primal-dual algorithms over the barrier method is that they can work when the problem is feasible, but not strictly feasible (although we will not pursue this).

Which is outright false. I contacted Boyd about it, and he insisted that it was true. I then gave him examples of problems where primal-dual algorithms fail, and he replied “that’s simply a case of a poorly specified problem”. Now that made me angry. First of all because it amounted to admitting that his book is incorrect, as it has no such qualification about “poorly specified problems”, and secondly because “poorly specified problems” is rather poorly specified. I think it’s really important to tell the students for which problems SDPs will fail.

One problem I told Boyd about was to minimize $x$ under the constraint that
\[ \begin{pmatrix} x & 1 \\ 1 & t \end{pmatrix} \ge 0.\]Now this problem satisfies Slater’s condition. The primal and dual objectives are bounded, and the problem is strictly feasible, i.e., there are values for $x,t$ such that the matrix is positive definite (e.g. $x=t=2$). Still, numerical solvers cannot handle it. Nothing wrong with Slater: he just claimed that if this holds then we have strong duality, that is, the primal and dual optimal values will match. And they do.

The issue is very simple: the optimal value is 0, but there is no $x,t$ where it is attained, you only get it in the limit of $x\to 0$ with $t=1/x$. And no numerical solver will be able to handle infinity.
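The limiting behaviour is easy to make explicit: the matrix is positive semidefinite exactly when $x \ge 0$, $t \ge 0$ and $xt - 1 \ge 0$, so for $x > 0$ the smallest feasible $t$ is $1/x$, while for $x = 0$ no $t$ works. A small sketch tabulating this (the function names are illustrative):

```python
def determinant(x, t):
    """Determinant of [[x, 1], [1, t]]; PSD needs x >= 0, t >= 0 and this >= 0."""
    return x * t - 1


def min_feasible_t(x):
    """Smallest t making [[x, 1], [1, t]] PSD, which exists only for x > 0."""
    if x <= 0:
        raise ValueError("infeasible: for x = 0 the determinant is -1, and x < 0 is a negative diagonal entry")
    return 1 / x


# The objective x approaches its infimum 0 only while t runs off to infinity.
for x in (1.0, 1e-2, 1e-4, 1e-8):
    print(f"x = {x:g}  ->  t >= {min_feasible_t(x):g}")
```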

Now this problem is so simple that the failure is not very dramatic. SeDuMi gives something around $10^{-4}$ as an answer. Clearly wrong, as usually it gets within $10^{-8}$ of the right answer, but still, that’s an engineer’s zero.

One can get a much nastier failure with a slightly more complicated problem (from here): let $X$ be a $3\times 3$ matrix, and minimize $X_{22}$ under the constraints that $X \ge 0$, $X_{33} = 0$, and $X_{22} + 2X_{13} = 1$. It’s easy enough to solve it by hand: the constraint $X_{33} = 0$ implies that the entire column $(X_{13},X_{23},X_{33})$ must be equal to zero, otherwise $X$ cannot be positive semidefinite. In turn this implies that $X_{22} = 1$, and we’re done. There’s nothing to optimize. If you give this to SeDuMi it goes crazy, and gives 0.1319 as an answer, together with the message that it had numerical problems.
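The step “$X_{33} = 0$ forces the rest of the column to vanish” comes from the $2\times 2$ principal submatrix on rows and columns 1 and 3, which must itself be positive semidefinite. Its eigenvalues are $\big(X_{11} \pm \sqrt{X_{11}^2 + 4X_{13}^2}\big)/2$, so the smaller one is negative as soon as $X_{13} \neq 0$ (the same argument with rows 2 and 3 handles $X_{23}$). A tiny sketch of that computation:

```python
import math


def min_eigenvalue(x11, x13):
    """Smaller eigenvalue of the principal submatrix [[x11, x13], [x13, 0]] (i.e. X33 = 0)."""
    return (x11 - math.sqrt(x11 * x11 + 4 * x13 * x13)) / 2


for x11, x13 in [(1.0, 0.0), (1.0, 1e-3), (100.0, 1e-3)]:
    print(x11, x13, min_eigenvalue(x11, x13))

# With X13 forced to 0, the constraint X22 + 2*X13 = 1 leaves X22 = 1.
x22 = 1 - 2 * 0.0
```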

Now my point is not that SeDuMi should be able to solve nasty problems like this. It’s that we should teach the students to identify this nastiness so they don’t get bitten in the ass when it’s not so obvious.

And they are being bitten in the ass. I’m writing about this because I just posted a comment on the arXiv, correcting a paper that had mistakenly believed that when you add constraints to the NPA hierarchy the answers are still trustworthy. Don’t worry, it’s still possible to solve the constrained NPA hierarchy, you just need to be careful. To learn how, read the comment. Here I want to talk about how to identify nasty problems.

One might think that looking at the manual of a specific solver would help. After all, who could better tell which problems can’t be solved than the people who actually implemented the algorithm? Indeed it does help a bit. In the MOSEK Cookbook they give several examples of nasty problems it cannot handle. At least this dispels Boyd’s naïveté that everything can be solved. But they are rather vague: there’s no characterization of nasty or well-behaved problems.

The best I could find was a theorem in Nesterov and Nemirovskii’s ancient book “Interior-Point Polynomial Algorithms in Convex Programming”, which says that if the primal is strictly feasible and its feasible region is bounded, or if both the primal and the dual are strictly feasible, then there will exist primal and dual solutions that reproduce the optimal value (i.e., the optimum will not be reached only in the limit). Barring the usual limitations of floating point numbers, this should indeed be a sufficient condition for the SDP to be well-behaved. Hopefully.

It’s not a necessary condition, though. To see that, consider a primal-dual pair in standard form
\begin{equation*}
\begin{aligned}
\min_X \quad & \langle C,X \rangle \\
\text{s.t.} \quad & \langle \Gamma_i, X \rangle = -b_i \quad \forall i,\\
& X \ge 0
\end{aligned}
\end{equation*}\begin{equation*}
\begin{aligned}
\max_{y} \quad & \langle b, y \rangle \\
\text{s.t.} \quad & C + \sum_i y_i \Gamma_i \ge 0
\end{aligned}
\end{equation*}
and assume that they are both strictly feasible, so that there exist primal and dual optimal solutions $X^*,y^*$ such that $\langle C,X^* \rangle = \langle b, y^* \rangle$. We can then define a new SDP by redefining $C' = C \oplus \mathbf{0}$ and $\Gamma_i' = \Gamma_i \oplus \mathbf{0}$, where $\oplus$ is the direct sum, and $\mathbf{0}$ is an all-zeros matrix of any size you want. Now the dual SDP is not strictly feasible anymore, since $C' + \sum_i y_i \Gamma_i'$ always has a zero block, but it remains as well-behaved as before: the optimal dual solution doesn’t change, and an optimal primal solution is simply $X^* \oplus \mathbf{0}$. We can also do a change of basis to mix this all-zero subspace around, so the cases where the condition fails but the SDP is still well-behaved need not be so obvious.
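A toy numerical instance of the padding construction, as a numpy sketch. The SDP is made up for illustration: $C = \operatorname{diag}(1,2)$ and a single constraint with $\Gamma_1 = I$, $b_1 = -1$ (so $\operatorname{tr} X = 1$), whose primal and dual optima are both 1, attained at $X^* = \operatorname{diag}(1,0)$ and $y^* = -1$; both problems are strictly feasible (take $X = I/2$ and $y = 0$). The helper `direct_sum` is hypothetical.

```python
import numpy as np


def direct_sum(A, k):
    """Return A ⊕ 0, padding the square matrix A with a k x k block of zeros."""
    n = A.shape[0]
    out = np.zeros((n + k, n + k))
    out[:n, :n] = A
    return out


C, Gamma, b = np.diag([1.0, 2.0]), np.eye(2), -1.0
X_opt, y_opt = np.diag([1.0, 0.0]), -1.0       # optimal pair, both objectives equal 1

k = 3                                          # size of the zero block, arbitrary
C_pad, Gamma_pad, X_pad = direct_sum(C, k), direct_sum(Gamma, k), direct_sum(X_opt, k)

primal_value = np.trace(C_pad @ X_pad)         # <C', X* ⊕ 0>, still 1
constraint_lhs = np.trace(Gamma_pad @ X_pad)   # <Gamma', X* ⊕ 0>, still equal to -b = 1
dual_value = b * y_opt                         # <b, y*>, unchanged at 1


def padded_dual_min_eig(y):
    """Smallest eigenvalue of C' + y Gamma'; the zero block pins it at or below 0."""
    return np.linalg.eigvalsh(C_pad + y * Gamma_pad).min()


print(primal_value, dual_value, [padded_dual_min_eig(y) for y in (-1.0, 0.0, 10.0)])
```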

Still, I like this condition. It’s rather useful, and simple enough to teach. So kids, eat your vegetables, and check whether your primal and dual SDPs are strictly feasible.


Redefining classicality

I’m in a terrible mood. Maybe it’s just the relentless blackness of Austrian winter, but I do have rational reasons to be annoyed. First is the firehose of nonsense coming from the wormhole-in-a-quantum-computer people, which I wrote about in my previous post. Second are two talks that I attended here in Vienna in the past couple of weeks: one by Spekkens, claiming that he can explain interference phenomena classically, and another by Perche, claiming that a classical field can transmit entanglement, and therefore that the proposed Bose-Marletto-Vedral experiment wouldn’t demonstrate that the gravitational field must be quantized.

These talks were about very different subjects, but both were based on redefining “classical” to be something completely divorced from our understanding of classicality in order to reach their absurd conclusions. One might object that this is just semantics, you can define “classical” to be whatever you want, but I’d like to emphasize that semantics was the whole point of these talks. They were not trying to propose a physically plausible model, they only wanted to claim that some effect previously understood as quantum was actually classical.

The problem is that “classical” is not well-defined, so each author has a lot of freedom in adapting the notion to their purposes. One could define “classical” to strictly mean classical physics, in the sense of Newtonian mechanics, Maxwell’s equations, or general relativity. That’s not an interesting definition, though, first because you can’t explain even a rock with classical physics, and secondly because the context of these discussions is whether one could explain some specific physical effect with a new, classical-like theory, not whether current classical physics explains it (as the answer is always no).

One then needs to choose the features one wishes this classical-like theory to have. Popular choices are to have local dynamics, deterministic evolution, and trivial measurements (i.e., you can just read off the entire state without complications).

Spekkens’s “classical” theory violates two of these desiderata: it’s not local, and you can’t measure the state. The entire theory is based on an “epistemic restriction”, that you have some incompatible variables that by fiat you can’t measure simultaneously. For me that already kills the motivation for studying such a theory: you’re copying the least appealing feature of quantum mechanics! And quantum mechanics at least has an elegant theory of measurement to determine what you can or can’t measure simultaneously; here you have just a bare postulate. But what makes the whole thing farcical is the nonlocality of the theory. In the model of the Mach-Zehnder interferometer, the “classical” state must pick up the phase of the right arm of the interferometer even if it actually went through the left arm. This makes the cure worse than the disease: quantum mechanics is local, and if the particle went through the left it won’t pick up any phase from the right.

When I complained to Spekkens about this, he replied that one couldn’t interpret the vacuum state as implying that the particle was not there, that we should interpret the occupation number as just an abstract degree of freedom without consequence to whether the mode is occupied or not. Yeah, you can do that, but can you seriously call that classical? And again, this makes the theory stranger than quantum mechanics.

Let’s turn to Perche’s theory now. Here the situation is more subtle: we’re not trying to define what a classical theory is, but what a hybrid quantum-classical theory is. In a nutshell, the Bose-Marletto-Vedral proposal is that if we entangle two particles via the gravitational interaction, this implies that the gravitational field must be quantized, because classical fields cannot transmit entanglement.

The difficulty with this argument is that there’s no such thing as a hybrid quantum-classical theory where everything is quantum but the gravitational field is classical (except in the case of a fixed background gravitational field). Some such Frankensteins have been proposed, but always as strawmen that fail spectacularly. To get around this, what people always do is abstract away from the physics and examine the scenario with quantum information theory. Then it’s easy to prove that it’s not possible to create entanglement with local operations and classical communication (LOCC). The classical gravitational field plays the role of classical communication, and we’re done.

Perche wanted to do a theory with more meat, including all the physical degrees of freedom and their dynamics. A commendable goal. What he did was to calculate the Green function from the classical gravitational interaction (which subsumes the fields), and postulate that it should also be the Green function when everything else is quantum. The problem is that you don’t have a gravitational field anymore, and no direct way to determine whether it is quantum or classical. The result he got, however, was that this classical Green function was better at producing entanglement than the quantum one. I think that’s a dead giveaway that his (implicit) field was not classical.

The audience would have none of that, and complained several times that his classical field was anything but. Perche would agree that “quantum-controlled classical” would better describe his gravitational field, but would nevertheless defend calling it just a “classical field” as an informal description.

If you want a theory with more meat, my humble proposal is to not treat classical systems as fundamentally classical, but accept reality: the world is quantum, and “classical” systems are quantum systems that are in a state that is invariant under decoherence. And to make them invariant under decoherence we simply decohere them. In this way we can start with a well-motivated and non-pathological quantum theory for the whole system, and simply decohere the “classical” subsystems as often as needed.

It’s easy to prove that the classical subsystems cannot transmit entanglement in such a theory. Let’s say you have a quantum system $|\psi\rangle$ and a classical mediator $|C\rangle$. After letting them interact via any unitary whatsoever, you end up in the state
\[ \sum_{ij} \alpha_{ij}|\psi_i\rangle|C_j\rangle. \] Now we decohere the classical subsystem (in the $\{|C_j\rangle\}$ basis, without loss of generality), obtaining
\[ \sum_{ijk} \alpha_{ij}\alpha_{kj}^*|\psi_i\rangle\langle\psi_k|\otimes|C_j\rangle\langle C_j|. \] This is equal to
\[ \sum_j p_j \rho_j \otimes |C_j\rangle\langle C_j|,\] where $p_j := \sum_i |\alpha_{ij}|^2$ and $\rho_j := \frac1{p_j}\sum_{ik} \alpha_{ij}\alpha_{kj}^*|\psi_i\rangle\langle\psi_k|$. This is an explicitly separable state, so it has no entanglement to transmit to anyone.
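The algebra can be checked numerically. A minimal numpy sketch, assuming orthonormal bases $\{|\psi_i\rangle\}$ and $\{|C_j\rangle\}$ (the computational bases here), random amplitudes $\alpha_{ij}$, and arbitrarily chosen dimensions: it decoheres the mediator by deleting the off-diagonal blocks in $j$, and rebuilds the result as $\sum_j p_j \rho_j \otimes |C_j\rangle\langle C_j|$ with $p_j$ and $\rho_j$ as defined above.

```python
import numpy as np

rng = np.random.default_rng(1)
dS, dC = 2, 3                     # dimensions of the quantum system and the mediator
alpha = rng.normal(size=(dS, dC)) + 1j * rng.normal(size=(dS, dC))
alpha /= np.linalg.norm(alpha)    # normalized: sum_ij |alpha_ij|^2 = 1

# Global pure state sum_ij alpha_ij |psi_i>|C_j>, stored with index i*dC + j.
state = alpha.reshape(dS * dC)
rho = np.outer(state, state.conj())

# Decohere the mediator in the {|C_j>} basis: keep only terms with j == j'.
blocks = rho.reshape(dS, dC, dS, dC)
keep = np.eye(dC)[None, :, None, :]
rho_dec = (blocks * keep).reshape(dS * dC, dS * dC)

# Rebuild it as sum_j p_j rho_j ⊗ |C_j><C_j|.
p = np.sum(np.abs(alpha) ** 2, axis=0)
rebuilt = np.zeros_like(rho_dec)
for j in range(dC):
    rho_j = np.outer(alpha[:, j], alpha[:, j].conj()) / p[j]
    proj_j = np.zeros((dC, dC))
    proj_j[j, j] = 1.0
    rebuilt += p[j] * np.kron(rho_j, proj_j)

print(np.allclose(rebuilt, rho_dec))
```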


The death of Quanta Magazine

Yesterday Quanta Magazine published an article written by Natalie Wolchover, Physicists Create a Wormhole Using a Quantum Computer. I’m shocked and disappointed. I thought Quanta Magazine was the most respectable source of science news; they have published several quality, in-depth articles on difficult topics. But this? It falls so far below any journalistic standard that the magazine is dead to me. The problem is, if they write such bullshit about topics that I do understand, how can I trust their reporting on topics that I do not?

Let’s start with the title. No, scientists haven’t created a wormhole using a quantum computer. They haven’t even simulated one. They simulated some aspects of wormhole dynamics under the crucial assumption that the holographic correspondence of the Sachdev–Ye–Kitaev model holds. Without this assumption they just have a bunch of qubits being entangled, no relation to wormholes.

The article just takes this assumption for granted, and cavalierly goes on to say nonsense like “by manipulating the qubits, the physicists then sent information through the wormhole”. Shortly afterwards, though, it claims that “the experiment can be seen as evidence for the holographic principle”. But didn’t you just assume it was true? And how on Earth can this test the holographic principle? It’s not as if we can do experiments with actual wormholes in order to check if their dynamics match the holographic description.

The deeper problem, though, is that the article never mentions that this simulation can easily be done on a classical computer. Much better, in fact, than on a quantum computer. The scientific content of the paper is not about creating wormholes or investigating the holographic principle, but about getting the quantum computer to work.

As bizarre and over-the-top as the article is, it is downright sober compared to the cringeworthy video they released. While the article correctly points out that one needs negative energy to make a wormhole traversable, that negative energy does not exist, and that the experiment merely simulated a negative energy pulse, the video has no such qualms. It directly states that the experiment created a negative energy shockwave and used it to transmit qubits through the wormhole.

For me the worst part of the video was at 11:53, where they showed a graph with a bright point labelled “negative energy peak” on it. The problem is that this is not a plot of data, it’s just a drawing, with no connection to the experiment. Lay people will think they are seeing actual data, so this is straightforward disinformation.

Now how did this happen? It seems that Wolchover just published uncritically whatever bullshit Spiropulu told her. Instead of, you know, checking with other people whether it made sense? The article does quote two critics, Woit and Loll. Woit mentions that the holographic correspondence simulates an anti-de Sitter space, whereas our universe is a de Sitter space. Loll mentions that the experiment simulates 2d spacetime, whereas our universe is 4d. Both criticisms are true, of course, but they don’t touch the reason why the Quanta article is nonsense.

EDIT: Quanta has since changed the title of the article to add the qualification that the wormhole is holographic, and deleted the tweet that said “Physicists have built a wormhole and successfully sent information from one end to the other”. I commend them for taking a step in the right direction, but they haven’t addressed the main problem, which is the content of the article and the video, so this is not enough to get back on my list of reliable sources. Wolchover herself is unrepentant, explicitly denying that she was fooled by the scientists behind the research. Well, the bullshit is her fault then.

Posted in Uncategorised | 13 Comments

Doing induction like a physicist

If something is true for dimension 2, it doesn’t mean much. We know that 2 is very special. The set of valid quantum states is a ball (the Bloch ball), the space of operators has a basis of matrices that are simultaneously unitary and Hermitian, extremal quantum correlations can always be produced by projective measurements, you can have a noncontextual hidden variable model, and so on. None of that holds in general for larger dimensions.
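
To make a couple of these qubit peculiarities concrete, here is a small NumPy check (an illustration, not taken from any source). It verifies that the identity and the Pauli matrices are each Hermitian and unitary, that they form an orthogonal basis of the $2\times2$ matrices under the Hilbert-Schmidt inner product, and that $\rho = (I + \vec r\cdot\vec\sigma)/2$ is a valid state exactly when $|\vec r| \le 1$, so the qubit states fill the Bloch ball with the pure states on its surface. The helper `is_state` is a hypothetical name.

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
basis = [I, X, Y, Z]

# Every element is both Hermitian and unitary
hermitian = all(np.allclose(M, M.conj().T) for M in basis)
unitary = all(np.allclose(M @ M.conj().T, I) for M in basis)

# Hilbert-Schmidt Gram matrix tr(A^† B) = 2 δ_AB: an orthogonal basis of 2x2 matrices
gram = np.array([[np.trace(A.conj().T @ B) for B in basis] for A in basis])

def is_state(r):
    """rho = (I + r·sigma)/2 has eigenvalues (1 ± |r|)/2, so it is a state iff |r| <= 1."""
    rho = (I + r[0] * X + r[1] * Y + r[2] * Z) / 2
    return np.linalg.eigvalsh(rho).min() >= -1e-12
```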

If something is true for dimensions 2 and 3, that’s already much better, but by no means conclusive. There exists a simple formula for SIC-POVMs in these dimensions that doesn’t work from 4 onwards. If we go beyond the dimension of a single system to the number of parties or measurements, there are more interesting examples: for 2 qubits, there exists a single SLOCC class of entangled states, namely the $|00\rangle+|11\rangle$ class. For 3 qubits, there are two classes, namely the $|000\rangle + |111\rangle$ and $|001\rangle + |010\rangle + |100\rangle$ classes. One could hope that for 4 qubits we would have three classes, but no, there are infinitely many. The same non-pattern happens for Bell inequalities. For bipartite inequalities with two outcomes per party, if each party has 2 settings then there exists only one nontrivial facet inequality up to symmetries, CHSH. If each party has 3 settings, then there are two, CHSH and I3322. If they have 4 settings, though, there are 175 inequivalent facets. Ditto if you fix the number of settings to be 2 and increase the number of outcomes: for 2 outcomes, again only CHSH; for 3 outcomes you have CHSH and CGLMP; and for 4 outcomes at least 34 facets.
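
For the simplest Bell scenario mentioned above, a sketch that brute-forces the 16 local deterministic strategies to recover the classical CHSH bound of 2, and evaluates the standard quantum strategy on the maximally entangled state to reach Tsirelson’s bound $2\sqrt2$. The measurement angles are the usual textbook choice, picked here for illustration; the code only illustrates the CHSH facet and does not enumerate the facets of the local polytope.

```python
import itertools
import numpy as np

def chsh(E):
    """CHSH expression for correlators E[x][y] = <A_x B_y>."""
    return E[0][0] + E[0][1] + E[1][0] - E[1][1]

# Local deterministic strategies: each party fixes a ±1 outcome for each of its 2 settings
classical_max = max(
    chsh([[a[x] * b[y] for y in range(2)] for x in range(2)])
    for a in itertools.product([1, -1], repeat=2)
    for b in itertools.product([1, -1], repeat=2)
)

# Quantum: state (|00>+|11>)/√2 with observables cos(θ) Z + sin(θ) X on each side;
# for this state the correlator is <A ⊗ B> = cos(θ_A - θ_B)
theta_A = [0, np.pi / 2]
theta_B = [np.pi / 4, -np.pi / 4]
quantum_value = chsh([[np.cos(ta - tb) for tb in theta_B] for ta in theta_A])
```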

If something is true for dimensions 2, 3, and 4, then it will also be true for dimension 5, so we skip this one.

If something is true for dimensions 2, 3, 4, and 5, it is very good evidence that it will be true for all dimensions, but it is still not enough for a proof by physicist induction. We have an even prime, odd primes, and a prime power, but no non-trivial composite numbers. MUBs are a good example: complete sets exist for dimensions 2, 3, 4, and 5, but (as far as anyone can tell) not for 6.
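
A sketch of why prime dimensions are easy for MUBs: for an odd prime $d$, the computational basis together with the $d$ bases of vectors $v_{k,m}(j) = \omega^{kj^2 + mj}/\sqrt d$, with $\omega = e^{2\pi i/d}$, gives $d+1$ mutually unbiased bases (the unbiasedness comes from quadratic Gauss sums having modulus $\sqrt d$). The function names are hypothetical, and the construction only covers odd primes; dimension 2 and prime powers need a different one, and no complete set is known for dimension 6.

```python
import numpy as np

def mubs_odd_prime(d):
    """d+1 bases (as matrices whose columns are the basis vectors) for odd prime d."""
    omega = np.exp(2j * np.pi / d)
    j = np.arange(d)
    bases = [np.eye(d, dtype=complex)]  # computational basis
    for k in range(d):
        columns = [omega ** ((k * j**2 + m * j) % d) for m in range(d)]
        bases.append(np.column_stack(columns) / np.sqrt(d))
    return bases

def are_mub(bases, tol=1e-10):
    """Check orthonormality of each basis and |<a|b>|^2 = 1/d across different bases."""
    d = bases[0].shape[0]
    for idx, A in enumerate(bases):
        if not np.allclose(A.conj().T @ A, np.eye(d), atol=tol):
            return False
        for B in bases[idx + 1:]:
            if not np.allclose(np.abs(A.conj().T @ B) ** 2, 1 / d, atol=tol):
                return False
    return True
```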

If something is true for dimensions 2, 3, 4, 5, and 6, that’s it. It will be true for all dimensions. SIC-POVMs are a good example. It is not too hard to construct analytical examples for dimensions 2, 3, 4, 5, and 6, and from that we know that they always exist2.
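
For the smallest case, the SIC-POVM in dimension 2 is a regular tetrahedron inscribed in the Bloch sphere. The sketch below (the specific Bloch vectors are one choice among infinitely many rotated copies) checks the defining properties: $\operatorname{tr}(P_iP_j) = 1/(d+1)$ for $i \neq j$, and $\frac1d\sum_i P_i = I$.

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Bloch vectors of a regular tetrahedron (unit length, pairwise dot product -1/3, summing to zero)
r = np.array([
    [0, 0, 1],
    [2 * np.sqrt(2) / 3, 0, -1 / 3],
    [-np.sqrt(2) / 3, np.sqrt(2 / 3), -1 / 3],
    [-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3],
])
projectors = [(I + v[0] * X + v[1] * Y + v[2] * Z) / 2 for v in r]

d = 2
overlaps = [[np.trace(P @ Q).real for Q in projectors] for P in projectors]
povm_sum = sum(P / d for P in projectors)  # should be the identity
```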

This is of course not true in mathematics, which is a demanding and capricious mistress. The most horrifying example I know is the logarithmic integral: $\mathrm{li}(x)$ is larger than the number of primes up to $x$ for every $x$ anyone has ever checked, and yet Littlewood proved that this inequality is violated infinitely often. Quantum mechanics, on the other hand, is a mother. She will not humiliate you, she will not lead you astray. She only wants you to do a bit of honest work with small dimensions, and she will reward you with the truth.
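
The logarithmic integral example is the comparison of $\mathrm{li}(x)$ with the prime-counting function $\pi(x)$: every direct check finds $\mathrm{li}(x) > \pi(x)$, yet Littlewood proved in 1914 that the difference changes sign infinitely often, and the bounds on where the first sign change happens are the origin of Skewes’ number. A small standard-library sketch of the misleading “evidence” follows; the sieve, the Simpson’s-rule integration, and the hard-coded value of $\mathrm{li}(2)$ are illustrative choices.

```python
import math

LI2 = 1.045163780117492784  # li(2)

def prime_count(n):
    """pi(n), the number of primes <= n (for n >= 2), via the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)

def li(x, steps=20_000):
    """li(x) = li(2) + integral_2^x dt/ln t, via Simpson's rule after substituting
    t = e^u (so dt/ln t = e^u/u du). steps must be even."""
    def f(u):
        return math.exp(u) / u
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps
    total = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, steps))
    return LI2 + total * h / 3

for k in range(1, 7):
    x = 10**k
    print(x, prime_count(x), round(li(x), 1))
```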

The only potential counterexample I know is the Tsirelson bound of the I3322 inequality, which is supposed to be 0.85 for dimensions 2 to 8, and from dimension 9 onwards starts increasing. I don’t count it as an actual counterexample because nobody has managed to prove that the Tsirelson bound is 0.85 for dimensions 2 to 6; there is just numerical evidence. And I do demand a proof for this part of physicist induction; the reasoning is already flimsy enough as it is.

Posted in Uncategorised | Comments Off on Doing induction like a physicist

Do not project your relative frequencies onto the non-signalling subspace

It happens all the time. You do an experiment on nonlocality or steering, and you want to test whether the data you collected is compatible with hidden variables. You plug it into the computer and the answer is no, it is not. You examine it a bit more closely, and you see that it is also incompatible with quantum mechanics, because it is signalling. After a bit of cold sweating, you realize that it is very close to non-signalling, and all the trouble comes from the computer needing it to be exactly non-signalling. You then relax, project it onto the non-signalling subspace, and call it a day.

Never do this. Experimental data is sacred. You can’t arbitrarily chop it off to fit your Procrustean bed.

First of all, remember that even if your probabilities are strictly non-signalling, the probability of obtaining relative frequencies that respect the no-signalling equations exactly is effectively zero. There’s nothing wrong with “signalling” frequencies. On the contrary, if some experimentalist reported relative frequencies that were exactly non-signalling I’d be very suspicious. What you should get in a real experiment are frequencies that are very close to non-signalling, but not exactly3.
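
To see this in action, here is a sketch that samples a finite number of rounds from an exactly non-signalling distribution, the Tsirelson-optimal CHSH correlations $p(ab|xy) = \frac14\left(1 + (-1)^{a\oplus b\oplus xy}/\sqrt2\right)$ (chosen here for illustration), and measures how much each party’s observed marginals depend on the other party’s setting. The sample size and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # assumed number of rounds per setting pair

def p_quantum(a, b, x, y):
    """Exactly non-signalling: every marginal equals 1/2."""
    return (1 + (-1) ** (a ^ b ^ (x * y)) / np.sqrt(2)) / 4

freqs = np.zeros((2, 2, 2, 2))  # freqs[a, b, x, y] = relative frequency of (a, b) given (x, y)
for x in range(2):
    for y in range(2):
        probs = [p_quantum(a, b, x, y) for a in range(2) for b in range(2)]
        counts = rng.multinomial(N, probs)
        freqs[:, :, x, y] = counts.reshape(2, 2) / N

# Largest change in Alice's marginal f(a|x) when Bob switches y, and vice versa
alice_gap = np.abs(freqs.sum(axis=1)[:, :, 0] - freqs.sum(axis=1)[:, :, 1]).max()
bob_gap = np.abs(freqs.sum(axis=0)[:, 0, :] - freqs.sum(axis=0)[:, 1, :]).max()
print(alice_gap, bob_gap)
```

The gaps come out small, of order $1/\sqrt N$, but essentially never exactly zero.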

“That doesn’t help me”, you reply. “I can accept signalling frequencies all day long, but the computer still needs them to be non-signalling in order to test hidden variable models.”

Sure, but what the computer needs are non-signalling probabilities, which you should infer from the signalling frequencies.

“Exactly, and to infer non-signalling probabilities I just project the frequencies onto the non-signalling subspace.”

No! Inferring probabilities from frequencies is the oldest problem in statistics. People have studied this problem to death and have come up with several respectable methods. There’s no point in reinventing the wheel. And if you do insist on reinventing the wheel, you’d better be damn sure that it’s round.

To make it clear that this projection technique is a square wheel, I’ll examine in detail a toy version of the problem of getting non-signalling probabilities. The simplest case of the real problem involves getting from a 12-dimensional space of frequencies to an 8-dimensional non-signalling subspace, which is too much to do by hand for even the most dedicated PhD students2. Instead I’ll go for the minimal scenario, a 2-dimensional space of frequencies that goes down to a 1-dimensional subspace.
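
The dimension count for the simplest real scenario (two parties, two settings, two outcomes) can be checked with a rank computation. This sketch, with illustrative variable names, writes the 16 numbers $p(ab|xy)$ as a vector and computes the dimension of the affine subspace cut out by normalization alone, and by normalization plus no-signalling.

```python
import itertools
import numpy as np

# Coordinates p(ab|xy), a, b, x, y in {0, 1}: a 16-dimensional vector
index = {abxy: i for i, abxy in enumerate(itertools.product(range(2), repeat=4))}

def row(terms):
    """Linear functional sum of coeff * p(abxy) over the given (abxy, coeff) pairs."""
    v = np.zeros(16)
    for abxy, coeff in terms:
        v[index[abxy]] += coeff
    return v

# Normalization: sum_ab p(ab|xy) = 1 for each (x, y)
norm = [row([((a, b, x, y), 1) for a in range(2) for b in range(2)])
        for x in range(2) for y in range(2)]

# No-signalling: Alice's marginals do not depend on y, Bob's do not depend on x
ns = [row([((a, b, x, 0), 1) for b in range(2)] + [((a, b, x, 1), -1) for b in range(2)])
      for a in range(2) for x in range(2)]
ns += [row([((a, b, 0, y), 1) for a in range(2)] + [((a, b, 1, y), -1) for a in range(2)])
       for b in range(2) for y in range(2)]

dim_frequencies = 16 - np.linalg.matrix_rank(np.array(norm))
dim_non_signalling = 16 - np.linalg.matrix_rank(np.array(norm + ns))
print(dim_frequencies, dim_non_signalling)
```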

Consider then an experiment with 3 possible outcomes, 0, 1, and 2, where our analogue of the no-signalling assumption is that $p_1 = 2p_0$. The possible relative frequencies we can observe lie in the triangle bounded by $p_0 \ge 0$, $p_1 \ge 0$, and $p_0 + p_1 \le 1$. The possible probabilities are just the segment of the line $p_1 = 2p_0$ inside this triangle. Again, if we generate data according to these probabilities, the frequencies will almost surely not fall on the $p_1 = 2p_0$ line. Let’s say we observed $n_0$ outcomes 0, $n_1$ outcomes 1, and $n_2$ outcomes 2, with $n = n_0 + n_1 + n_2$ in total. What is the probability $p_0$ we should infer from this data?

Let’s start with the projection technique. Compute the relative frequencies $f_0 = n_0/n$ and $f_1 = n_1/n$, and project the point $(f_0,f_1)$ onto the line $p_1 = 2p_0$. Which projection, though? There are infinitely many. The most natural one is an orthogonal projection, but that already weirds me out. Why on Earth are we talking about angles between probability distributions? They are vectors of real numbers, sure, we can compute angles, but we shouldn’t expect them to mean anything. Doing it anyway, we get that
\[ p_0 = \frac15(f_0 + 2f_1)\quad\text{and}\quad p_1 = \frac25(f_0 + 2f_1),\]which do not respect positivity: if $f_0=0$ and $f_1=1$ we have that $p_0+p_1 = 6/5$, which implies that $p_2 = -1/5$.3 What now? Arbitrarily make the probabilities positive? Invent some other method, such as minimizing the distance from the point $(f_0,f_1)$ to the line $p_1 = 2p_0$? Which distance then? Euclidean? Total variation? No, it’s time to admit that it was a bad idea to start with and open a statistics textbook.
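
The failure of positivity can be reproduced in a few lines with exact rational arithmetic. Here `project` is a hypothetical helper implementing the orthogonal projection onto the line $p_1 = 2p_0$, whose direction is $(1,2)/\sqrt5$.

```python
from fractions import Fraction

def project(f0, f1):
    """Orthogonal projection of (f0, f1) onto the line p1 = 2 p0."""
    t = (f0 + 2 * f1) / 5  # component along (1, 2), divided by |(1, 2)|^2 = 5
    return t, 2 * t

p0, p1 = project(Fraction(0), Fraction(1))
p2 = 1 - p0 - p1
print(p0, p1, p2)
```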

You’ll find there a very popular method, maximum likelihood. We write the likelihood function
\[L(p_0) = p_0^{n_0} (2 p_0)^{n_1} (1-3p_0)^{n_2},\]which is just the probability of the data given the parameter $p_0$, and maximize it, finding
\[p_0 = \frac13(f_0 + f_1)\quad\text{and}\quad p_1 = \frac23(f_0+f_1).\]Now maximum likelihood is probably the shittiest statistical method one can use, but at least the answer makes sense. The resulting probabilities are normalized, and they mean something: they are those which assign the highest probability to the observed data. My point is that even the worst statistical method is better than arbitrarily chopping off your data. Moreover, it’s very easy to do, so there’s no excuse.
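
A quick numerical sanity check of the maximum-likelihood formula: maximize the log-likelihood on a fine grid for some made-up counts (hypothetical numbers, chosen so that $f_1 \neq 2f_0$) and compare with $p_0 = (f_0 + f_1)/3$.

```python
import numpy as np

def log_likelihood(p0, n0, n1, n2):
    """log L(p0) = n0 log p0 + n1 log(2 p0) + n2 log(1 - 3 p0), for 0 < p0 < 1/3."""
    return n0 * np.log(p0) + n1 * np.log(2 * p0) + n2 * np.log(1 - 3 * p0)

n0, n1, n2 = 30, 75, 45  # hypothetical counts
n = n0 + n1 + n2

grid = np.linspace(1e-6, 1 / 3 - 1e-6, 200_001)
p0_numeric = grid[np.argmax(log_likelihood(grid, n0, n1, n2))]
p0_closed_form = (n0 + n1) / (3 * n)  # (f0 + f1) / 3
print(p0_numeric, p0_closed_form)
```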

If you want to do things properly, though, you have to do Bayesian inference. You have to multiply the likelihood function by the prior, normalize that, and compute the expected $p_0$ from the posterior in order to obtain a point estimate. It’s a bit more work, but in this case it is still easy: for a flat prior, the posterior of $q = 3p_0$ is a Beta$(n_0+n_1+1,\, n_2+1)$ distribution, whose mean gives
\[p_0 = \frac13\frac{n_0 + n_1+1}{n+2}\quad\text{and}\quad p_1 = \frac23\frac{n_0 + n_1+1}{n+2}.\]Besides getting a more sensible answer and the ability to change the prior, the key advantage of Bayesian inference is that it gives you the whole posterior distribution. It naturally provides you with a credible region around your estimate, the beloved error bars any experimental paper must include. It’s harder to do, sure, but none of you got into physics because it was easy, right?
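
A sketch of the Bayesian computation done numerically, assuming a flat prior on $p_0 \in [0, 1/3]$ and hypothetical counts. It compares the posterior mean with the closed form $(n_0+n_1+1)/(3(n+2))$, which follows from the posterior of $q = 3p_0$ being Beta$(n_0+n_1+1, n_2+1)$, and reads off a 95% equal-tailed credible interval as the error bar.

```python
import numpy as np

n0, n1, n2 = 30, 75, 45  # hypothetical counts
n = n0 + n1 + n2

# Posterior ∝ p0^(n0+n1) (1 - 3 p0)^n2; the constant 2^n1 from the likelihood cancels
p0 = np.linspace(0, 1 / 3, 1_000_001)
weights = p0 ** (n0 + n1) * np.clip(1 - 3 * p0, 0, None) ** n2

# The integrand vanishes at both ends of the uniform grid, so plain sums equal trapezoidal integrals
p0_mean = np.sum(p0 * weights) / np.sum(weights)

# 95% equal-tailed credible interval from the cumulative posterior
cdf = np.cumsum(weights) / np.sum(weights)
lower = p0[np.searchsorted(cdf, 0.025)]
upper = p0[np.searchsorted(cdf, 0.975)]

closed_form = (n0 + n1 + 1) / (3 * (n + 2))
print(p0_mean, closed_form, (lower, upper))
```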

Posted in Uncategorised | Comments Off on Do not project your relative frequencies onto the non-signalling subspace