Recovering solutions from non-commutative polynomial optimization problems

If you have used the NPA hierarchy to upper bound a Tsirelson bound, and want to recover a state and projectors that reproduce the computed expectation values, life is easy. The authors provide a practical method to do so: just compute a projector onto the span of the appropriate vectors. If instead you’re using its generalization, the PNA hierarchy, and want to recover a state and operators, you’re out of luck. The authors only cared about the case where full convergence had been achieved, i.e., when the operator constraints they wanted to impose, like $A \ge 0$ or $[A,B] = 0$, were respected. They didn’t use a nice little result by Helton, which implies that as long as you’re using full levels of the hierarchy you can always recover a solution respecting all the moment constraints, which are things like $\mean{A} \ge 0$ or $\mean{A^2 B} = 2$. This in turn implies that if you only have moment constraints, and no operator constraints, then the hierarchy always converges at a finite level! This is the only sense in which non-commutative polynomial optimization is simpler than the commutative case, so it is a pity to lose it.

In any case, going from Helton’s proof to a practical method to recover a solution requires shaving a few yaks. I therefore decided to write it up in a blog post, to help those in a similar predicament, who most likely include future me after I forget how to do it.

For concreteness, suppose we have a non-commutative polynomial optimization problem with a single Hermitian operator $A$. Suppose we did a complete level 2, constructing the moment matrix associated with the sequences $(\id, A, A^2)$, with whatever constraints you want (they don’t matter), and solved the SDP, obtaining a 3×3 Hermitian matrix
\[ M = \begin{pmatrix} \mean{\id} & \mean{A} & \mean{A^2} \\
& \mean{A^2} & \mean{A^3} \\
& & \mean{A^4} \end{pmatrix} \] Now we want to recover the solution, i.e., we want to reconstruct a state $\ket{\psi}$ and an operator $A$ such that e.g. $\bra{\psi}A^3\ket{\psi} = \mean{A^3}$.

The first step is to construct a matrix $K$ such that $M = K^\dagger K$. This can always be done since $M$ is positive semidefinite; one can, for example, take the square root of $M$. That’s a terrible idea, though, because $M$ is usually rank-deficient, and using the square root will generate solutions with unnecessarily large dimension. To get the smallest possible solution we compute $M$’s eigendecomposition $M = \sum_i \lambda_i \ketbra{m_i}{m_i}$, and define $K = \sum_{i;\lambda_i > 0} \sqrt{\lambda_i}\ketbra{i}{m_i}$. Of course, numerically speaking $\lambda_i > 0$ is nonsense; you’ll have to choose a threshold for zero that is appropriate for your numerical precision.
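For those who prefer code to prose, here is a minimal Julia sketch of this step (the helper name and the default threshold are my own choices, not from any package; tune the threshold to your solver’s precision):

```julia
using LinearAlgebra

# Given a PSD moment matrix M, return K such that M ≈ K' * K, with one row of K
# per eigenvalue above the threshold, so the recovered dimension is minimal.
function gram_factor(M::AbstractMatrix; tol = 1e-7)
    vals, vecs = eigen(Hermitian(M))
    keep = findall(>(tol), vals)                         # the "nonzero" eigenvalues
    return Diagonal(sqrt.(vals[keep])) * vecs[:, keep]'  # K = Σᵢ √λᵢ |i⟩⟨mᵢ|
end
```

A quick sanity check is that K' * K reproduces M up to numerical noise.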

If we label each column of $K$ with an operator sequence, i.e., $K = (\ket{\id}\ \ket{A}\ \ket{A^2})$, then the inner products of these columns match the elements of the moment matrix, e.g., $\langle \id | A^2 \rangle = \langle A | A \rangle = \mean{A^2}$. This means that we can take $\ket{\psi} = \ket{\id}$, and our task reduces to constructing an operator $A$ such that
\begin{gather*}
A\ket{\id} = \ket{A} \\
A\ket{A} = \ket{A^2} \\
A = A^\dagger
\end{gather*} This is clearly a linear system, but not in a convenient form. The most irritating part is the last line, which doesn’t even look linear. We can get rid of it by substituting it into the first two lines and taking the adjoint, which gives us
\begin{gather*}
A\ket{\id} = \ket{A} \\
A\ket{A} = \ket{A^2} \\
\bra{\id}A = \bra{A} \\
\bra{A}A = \bra{A^2}
\end{gather*} To simplify things further, we define the matrices $S_A = (\ket{\id}\ \ket{A})$ and $L_A = (\ket{A}\ \ket{A^2})$ to get
\begin{gather*}
A S_A = L_A \\
S_A^\dagger A = L_A^\dagger
\end{gather*} which is nicer but still not quite solvable. To turn this into a single equation I used my favourite isomorphism, the Choi-Jamiołkowski. Usually it’s used to represent superoperators as matrices, but it can also be used one level down to represent matrices as vectors. If $X$ is an $m \times n$ matrix, its Choi representation is
\[ |X\rangle\rangle = \id_n \otimes X |\id_n\rangle\rangle,\] where
\[ |\id_n\rangle\rangle = \sum_{i=0}^{n-1}|ii\rangle. \] This is the same thing as the vec function in col-major programming languages. We also need the identity
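A two-line numerical check of this correspondence in Julia (purely illustrative):

```julia
using LinearAlgebra

X = rand(ComplexF64, 3, 2)                  # an m × n matrix, here m = 3, n = 2
ket_id = vec(Matrix{ComplexF64}(I, 2, 2))   # |𝟙_n⟩⟩ = Σᵢ |ii⟩
@assert kron(I(2), X) * ket_id ≈ vec(X)     # |X⟩⟩ is just vec(X) in column-major order
```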
\[ |XY\rangle\rangle = \id \otimes X |Y\rangle\rangle = Y^T \otimes \id |X\rangle\rangle \] with which we can turn our equations into
\[ \begin{pmatrix} S_A^T \otimes \id_m \\
\id_m \otimes S_A^\dagger \end{pmatrix} |A\rangle\rangle = \begin{pmatrix} |L_A\rangle\rangle \\
|L_A^\dagger\rangle\rangle \end{pmatrix} \] where $m$ is the number of rows of $S_A$. Now the linear system has the familiar form $Ax = b$ that any programming language can handle. In Julia, for instance, you just do A \ b.
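To make this concrete, here is a minimal Julia sketch of the whole Hermitian recovery, assuming `moments` holds the numerical 3×3 moment matrix returned by the solver and `gram_factor` is the helper sketched above:

```julia
using LinearAlgebra

K = gram_factor(moments)           # columns labelled (|𝟙⟩ |A⟩ |A²⟩)
m = size(K, 1)                     # dimension of the recovered Hilbert space
ψ = K[:, 1]                        # the state |ψ⟩ = |𝟙⟩
S = K[:, 1:2]                      # S_A = (|𝟙⟩ |A⟩)
L = K[:, 2:3]                      # L_A = (|A⟩ |A²⟩)

# vectorized system: [Sᵀ ⊗ 𝟙 ; 𝟙 ⊗ S†] |A⟩⟩ = [|L⟩⟩ ; |L†⟩⟩]
coeff = [kron(transpose(S), I(m)); kron(I(m), S')]
rhs = [vec(L); vec(Matrix(L'))]
A = reshape(coeff \ rhs, m, m)     # un-vectorize the solution
A = (A + A') / 2                   # A† satisfies the same equations, so symmetrizing is harmless

@show ψ' * A^3 * ψ                 # should match ⟨A³⟩, i.e. moments[2, 3]
```

Since the system is consistent (that’s exactly what Helton’s result guarantees for full levels), the least-squares solution returned by \ solves it exactly, up to numerical noise.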

It’s important to emphasize that a solution is only guaranteed to exist if the vectors come from a moment matrix of a full level of the hierarchy. And indeed we can find a counterexample when this is not the case. If we were dealing instead with
\[ M = \begin{pmatrix} \mean{\id} & \mean{A} & \mean{B} & \mean{BA} \\
& \mean{A^2} & \mean{AB} & \mean{ABA} \\
& & \mean{B^2} & \mean{B^2A} \\
& & & \mean{AB^2A} \end{pmatrix} \] then a possible numerical solution for it is
\[ M = \begin{pmatrix} 1 & 1 & 0 & 0 \\
& 1 & 0 & 0 \\
& & 1 & 0 \\
& & & 1 \end{pmatrix} \] This solution implies that $\ket{\id} = \ket{A}$, but if we apply $B$ to both sides of the equation we get $\ket{B} = \ket{BA}$, which is a contradiction, as the solution also implies that $\ket{B}$ and $\ket{BA}$ are orthogonal.

I also want to show how to do it for the case of variables that are not necessarily Hermitian. In this case a full level of the hierarchy needs all the operators and their conjugates, so even level 2 is annoying to write down. I’ll do level 1 instead:
\[ M = \begin{pmatrix} \mean{\id} & \mean{A} & \mean{A^\dagger} \\
& \mean{A^\dagger A} & \mean{{A^\dagger}^2} \\
& & \mean{AA^\dagger} \end{pmatrix} \] The fact that this is a moment matrix implies that $\mean{A} = \overline{\mean{A^\dagger}}$, which is crucial for a solution to exist. As before we construct $K$ such that $M = K^\dagger K$, and label its columns with the operator sequences $K = (\ket{\id}\ \ket{A}\ \ket{A^\dagger})$. The equations we need to respect are
\begin{gather*}
A\ket{\id} = \ket{A} \\
A^\dagger \ket{\id} = \ket{A^\dagger}
\end{gather*} or more conveniently
\begin{gather*}
A\ket{\id} = \ket{A} \\
\bra{\id} A = \bra{A^\dagger}
\end{gather*} We use again the Choi-Jamiołkowski isomorphism to turn this into a single equation
\[ \begin{pmatrix} \ket{\id}^T \otimes \id_m \\
\id_m \otimes \bra{\id} \end{pmatrix} |A\rangle\rangle = \begin{pmatrix} \ket{A} \\
\overline{\ket{A^\dagger}} \end{pmatrix} \] and we’re done.
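And the corresponding sketch in Julia, again with `moments` and `gram_factor` as placeholders:

```julia
using LinearAlgebra

K = gram_factor(moments)                    # columns labelled (|𝟙⟩ |A⟩ |A†⟩)
m = size(K, 1)
ψ, ketA, ketAdag = K[:, 1], K[:, 2], K[:, 3]

# vectorized system: [|𝟙⟩ᵀ ⊗ 𝟙 ; 𝟙 ⊗ ⟨𝟙|] |A⟩⟩ = [|A⟩ ; conj(|A†⟩)]
coeff = [kron(transpose(ψ), I(m)); kron(I(m), ψ')]
A = reshape(coeff \ [ketA; conj(ketAdag)], m, m)

@show ψ' * (A' * A) * ψ                     # should match ⟨A†A⟩, i.e. moments[2, 2]
```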


Announcing Ket

I’m happy to announce the 0.3 release of Ket.jl. This is a library to do quantum information and optimization in Julia; it is the second project I alluded to in my Julia post. The goal is to free the new generations from the tyranny of MATLAB by providing the means of production in a completely open-source ecosystem.

You might have noticed that this is version 0.3, and correctly assumed that this is not the first release. It has been publicly accessible, yet unannounced, for 6 months already, due to a fundamental problem: it still needed MATLAB for the crucial task of upper bounding Tsirelson bounds. This has now been solved by replacing the dependency Moment with Erik Woodhead’s package QuantumNPA. Another problem that has been solved is making the entire package capable of handling arbitrary precision when desired. This was not possible with Moment, due to MATLAB being an antediluvian abomination, but was rather easy with QuantumNPA.

With these problems solved, Ket is still far from ready, but it is in a good state to be built upon. Therefore I think this is a good moment to announce it publicly, to attract users and collaborators. I want to emphasize that this is a collaborative effort; I haven’t programmed even half of the library. It wouldn’t have happened without the invaluable help of Sébastien Designolle, Carlos de Gois, Lucas Porto, and Peter Brown.

We have written Ket to use it for our own research, but we’d like it to be useful for the wider community. Just not so wide that we don’t understand the code anymore. Also, we’d like to keep our focus on quantum information and optimization; we’re intentionally not including anything to do with quantum simulation, since this job is already well taken care of by QuantumOptics.jl and QuantumToolbox.jl. We’re also not going to do quantum computing; that’s Yao.jl’s job.

Now it’s time to hear from you: found a bug? Want a feature? Please open an issue on GitHub. Want to contribute some code? Please also open an issue to discuss it before opening a pull request.


Writing off APS

My review about semidefinite programming was accepted by Reviews of Modern Physics. Great! After acceptance, nothing happened for three months. Then, the tragedy: we got the proofs. Now I understand what took them so long: it takes time to thoroughly destroy the text, the equations, and the references.

First, the text. They changed everything to US spelling, which is understandable. What is less understandable is that they eliminated any adjectives such as “unfortunate”, “convenient”, “brief”, “important”, “independent”, “natural”, “interesting”, etc. Apparently the text was not dry enough; they had to scrub it clean of any hint that this review was written by human beings who might have opinions about the subject. For some mysterious reason, “above” is a forbidden word; only “aforementioned” is good enough. They also made a point of changing absolutely everything to passive voice, sentence structure and understandability be damned. This is just the tip of the iceberg: they have a huge list of arcane style rules that include not writing “we” in abstracts, not writing “e.g.” anywhere, replacing slashes with “and/or”, and some bizarre rule about hyphens that I don’t understand but which ruined half of the hyphenated expressions. The result is that the text now reads as if it were written by a robot having an epileptic attack.

The worst part, though, is that they wantonly rewrote sentences in the middle of proofs of theorems, presumably because they felt their formulation was more elegant. The only disadvantage is that it made the proofs wrong. I would have thought it obvious that you shouldn’t rewrite text that you don’t understand, but ignoring this is at least consistent with their pattern of breathtaking incompetence.

Second, the equations. LaTeX is not good enough for them. No, they use some proprietary abomination to typeset the paper for printing, and have some conversion script to map LaTeX into their format. Which will randomize the alignment of the equations, and rewrite every inline fraction $\frac{a}{b}$ as $a/b$. Which wouldn’t be so bad, if it didn’t change $\frac{a}{b}c$ to $a/b\,c$. But hey, what’s a little ambiguity next to conforming to the style rules?

Third, the bibliography. The pricks have some really strange rules about linking to the published versions only via DOIs, which somehow involve randomly removing some of the DOIs we had included, and removing links that are not DOIs. Such as the links to the solvers and libraries the readers can use to implement the algorithms we describe. Who would care about that, right? Certainly not the people who would read a review about SDPs.

As a bonus point, these morons still haven’t figured out Unicode in bloody 2024. Apparently é is their favourite glyph, so I work at the department of “Fésica”, Antonio Acín is sometimes named Acén, Máté Farkas is Mété Farkas, García-Sáez became Garcéa-Séez, Károly Pál is Kéroly Pél, and so on, and so on, and so on.

So no, I give up. I have neither the time nor the will to go through this huge review again and correct everything they fucked up. My intention was to just let it stay wrong, but thankfully I have a young and energetic co-author, Alex, who was determined to go through the review word-by-word and fix all the errors they introduced. The text can’t be fixed, though, as the mutilation there was intentional. So I’m officially writing off the APS version. The “published” version on the APS website will be the pile of shit that they wrote. The carefully written and typeset version that we wrote is the one on the arXiv.

In the future, I hope to never publish with APS again. My dream typesetting is the one done by Quantum, which is none at all. I don’t need to pay some ignoramus to butcher my paper, nor do I need to waste my time putting Humpty Dumpty together again.


Sharing the refereeing burden

I’ve just finished writing yet another referee report. It’s not fun. It’s duty. Which got me wondering: am I doing my part, or am I a parasite? I get many more referee requests than I have time to handle, and I always feel a bit guilty when I decline one. So the question has a practical implication: can I decline with a clear conscience, or should I grit my teeth and try to get more refereeing done?

To answer that, first I have to find out how many papers I have refereed. That’s impossible, I’m not German. My records are spotty and chaotic. After a couple of hours of searching, I managed to find 77 papers. These are certainly not all, but I can’t be missing much, so let’s stick with 77.

Now, I need to compute the refereeing burden I have generated. I have submitted 33 papers for publication, and each paper usually gets 2 or 3 referees. Let’s call it 2.5. Then the burden is 82.5, right? Well, not so fast, because my coauthors share the responsibility for generating this refereeing burden. Should I divide by the average number of coauthors then? Again, not so fast, because I can’t put this responsibility on the shoulders of coauthors who are not yet experienced enough to referee. By the same token, I should exclude from my own burden the papers I published back when I was too inexperienced to referee myself. I therefore exclude 3 papers. From the remaining 30, I count 130 experienced coauthors, making my burden $30 \times 2.5/(130/30) \approx 17.3$.

Wow. That’s quite the discrepancy. I feel like a fool. I’m doing more than 4 times my fair share. Now I’m curious: am I the only one with such an imbalance, or does the physics community consist of 20% suckers and 80% parasites?

More importantly, is there anything that can be done about it? This was one of the questions discussed in a session about publishing at the last Benasque conference, but we couldn’t find a practicable solution. Even from the point of view of a journal it’s very hard to know who the parasites are, because people usually publish with several different journals, and the number of papers in any given journal is too small for proper statistics.

For example, let’s say you published 3 papers in Quantum, with 4 (experienced) coauthors on average, and each paper got 2 referee reports. This makes your refereeing burden 1.5. Now let’s imagine that during this time the editors of Quantum asked you to referee 2 papers. You declined them both, claiming once that you were too busy, and another time that it was out of your area of expertise. Does this make you a parasite? Only you know.

Let’s imagine then an egregious case, of someone who published 10 papers with Quantum, got 20 requests for refereeing from them, and declined every single one. That’s a $5\sigma$ parasite. What do you do about it? Desk reject their next submission, on the grounds of parasitism? But what about their coauthors? Maybe they are doing their duty; why should they be punished as well? Perhaps one should compute a global parasitism score from the entire set of authors, and desk reject the paper if it is above a certain threshold? It sounds like a lot of work for something that would rarely happen.


A superposition is not a valid reference frame

I’ve just been to the amazing Quantum Redemption conference in Sweden, organized by my friend Armin Tavakoli. I had a great time, attended plenty of interesting talks, and had plenty of productive discussions outside the talks as well. I’m not going to write about any of that, though. Having a relentlessly negative personality, I’m going to write about the talk that I didn’t like. Or rather, about its background. The talk was presenting some developing ideas and preliminary results, and was explicitly not ready for publication, so I’m not going to publish it here. But the talk didn’t make sense because its background doesn’t make sense, and that is well published, so it’s fair game.

I’m talking about the paper Quantum mechanics and the covariance of physical laws in quantum reference frames by my friends Flaminia, Esteban, and Časlav. The basic idea is that if you can describe a particle in a superposition from the laboratory’s reference frame, you can just as well jump to the particle’s reference frame, from which the particle is well-localized and the laboratory is in a superposition. The motivations for doing this are impeccable: the universality of quantum mechanics, and the idea that reference frames must be embodied in physical systems. The problem is that you can’t really attribute a single point of view to a superposition.

By linearity, the members of a superposition will evolve independently, so why would they have a joint identity? In general you can affect some members of a superposition without affecting the others; there is no mechanism transmitting information across the superposition so that a common point of view could be achieved. The only sort of “interaction” possible is interference, and that necessitates erasing all information that differentiates the members of the superposition, so it’s rather unsatisfactory.

In any case, any reference frame worthy of the name will be a complex quantum system, composed of a huge number of atoms. It will decohere very, very quickly, so any talk of interfering a superposition of reference frames is science fiction. Such gedankenexperimente can nevertheless be rather illuminating, so I’d be curious about how they would describe a Wigner’s friend scenario, as there the friend is commonly described as splitting in two, and I don’t see a sensible way of attributing a single point of view to the two versions. Alas, as far as I understand their quantum reference frames formalism was not meant to describe such scenarios, and as far as I can tell they have never done so.

This is all about interpretations, of course. Flaminia, Esteban, and Časlav are all devout single-worlders, and pursue with religious zeal the idea of folding back the superpositions into a single narrative. I, on the other hand, pray at the Church of the Larger Hilbert Space, so I find it heresy to see these highly-decohered independently-evolving members of a superposition as anything other than many worlds.

People often complain that all this interpretations talk has no consequences whatsoever. Well, here is a case where it unquestionably does: the choice of interpretation was crucial to their approach to quantum reference frames, which is crucial to their ultimate goal of tackling quantum gravity. Good ideas tend to be fruitful, and bad ideas sterile, so whether this research direction ultimately succeeds is an indirect test of the underlying interpretation.

You might complain that this is still on the metatheoretical level, and is anyway just a weak test. It is a weak test indeed: the Big Bang theory was famously created by a Catholic priest, presumably looking for a fiat lux moment. Notwithstanding its success, I’m still an atheist. Nevertheless, weak evidence is still evidence, and hey, if you don’t like metaphysics, interpretations are really not for you. If you do like metaphysics, however, you might also be interested in metatheory ;)


First Valladolid paper is out!

A couple of days ago I finally released the first Julia project I had alluded to: a technique to compute key rates in QKD using proper conic methods. The paper is out, and the GitHub repository is now public. It’s the first paper from my new research group in Valladolid, and I’m very happy about it. First because of the paper itself, and second because now I have students to do the hard work for me.

The inspiration for this paper came from the Prado museum in Madrid. I was forced to go there as part of a group retreat (at the time I was part of Miguel Navascués’ group in Vienna), and I was bored out of my mind looking at painting after painting. I then went to the museum café and started reading some papers on conic optimization to pass the time. To my great surprise, I found out that there was an algorithm capable of handling the relative entropy cone, and moreover it had already been implemented in the solver Hypatia, which to top it off was written in Julia! Sounded like Christmas had come early. ¿Or maybe I had a jamón overdose?

Life wasn’t so easy, though: the relative entropy cone was implemented only for real matrices, and the complex case is the only one that matters. I thought: no problem, I can just do the generalization myself. Then I opened the source code, and I changed my mind. This cone is a really nasty beast; the PSD cone is a child’s birthday party in comparison. I was too busy with other projects at the time to seriously dedicate myself to it, so I wrote to the developers of Hypatia, Chris Coey and Lea Kapelevich, asking whether they were interested in doing the complex case. And they were! I just helped a little bit with testing and benchmarking.

Now I can’t really publish a paper based only on doing this, but luckily the problem turned out to be much more difficult: I realized that the relative entropy cone couldn’t actually be used to compute key rates. The reason is somewhat technical: in order to solve the problem reliably one cannot have singular matrices; it needs to be formulated in terms of their support only (the technical details are in the paper). But if one reformulates the problem in terms of the support of the matrices, it’s no longer possible to write it in terms of the relative entropy cone.

I had to come up with a new cone, and implement it from scratch. Now that’s enough material for a paper. To make things better, by this time I was already in Valladolid, so my students could do the hard work. Now it’s done. ¡Thanks Andrés, thanks Pablo, thanks Miguel!


I got a Ramón y Cajal!

I’m quite happy: this is pretty much the best grant available in Spain, and it gives me a lot of money for 5 years, including a PhD student and a postdoc. But the reason I’m posting about it here is to share some information about the grant system that I believe is not widely known.

My grant proposal was evaluated at 98.73 points out of 100. Sounds very high, until you learn that the cutoff was 97.27. I sincerely believe that my grant proposal was excellent and deserved to be funded, as self-serving as this belief may be, but I can’t believe there was a meaningful difference between my proposal and one that got 97 points. There were clearly too many good proposals, and the reviewers had to somehow divide a bounded budget between them. I think it’s unavoidable that the result is somewhat random.

I have been on the other side before: I’ve had grants that were highly evaluated and nevertheless rejected. I think now I can say that it was just bad luck. I have also been on the reviewing side: twice I received some excellent grants to evaluate, and gave them very positive evaluations, sure that they would be funded. They weren’t.

Everyone who has applied for a grant knows how much work it is, and how frustrating it is to be rejected after all that. Still, one should keep in mind that rejection doesn’t mean you are a bad researcher. It is the norm; there’s just way too little money available to fund everyone who deserves it.


MATLAB is dead, long live Julia!

Since I first used MATLAB I have dreamt of finding a replacement for it. Not only is it expensive, proprietary software, but it is also a terrible programming language. Don’t get me wrong, I’m sure it was amazing when it was invented, but that was in the 70s. We know better now. I’ve had to deal with so many fascinating bugs due to its poor design decisions!

Most recently, I had code that was failing because 'asdf' and "asdf" are very similar, but not exactly the same. The former is a character vector, and the latter is a string. Almost always you can use them interchangeably, but as it turns out, not always. Another insane design decision is that you don’t need to declare variables to work on them. I created a matrix called constraints, worked on it a bit, and then made an assignment with a typo: contraints(:,1) = v. Instead of throwing an error like any sane programming language, MATLAB just silently created a new variable contraints. Perhaps more seriously, MATLAB does not support namespaces. If you are using two packages that both define a function called square, you have to be careful about the order in which they appear in the MATLAB path to get the correct one. And if you need both versions? You’re just out of luck.

Perhaps I should stop ranting at this point, but I just can’t. Another thing that drives me mad is that loop indices are always global, so you must be very careful about reusing index names. This interacts greatly with another “feature” of MATLAB: that i is both the imaginary unit and a valid variable name. If you have for i=1:3, i, end followed by a = 2+3*i you’re not getting a complex number, you’re getting 11. The parser is downright stone age: it can’t handle simple operators like +=, or double indexing like a(2)(4). To vectorize a matrix there’s no function, just the operator :, so if you want to vectorize a(2) you have to either call reshape(a(2),[],1), or define x = a(2) and then do x(:). Which of course leads to everyone and their dog defining a function vec() for convenience, all of which then conflict with each other because of the lack of namespaces.

I wouldn’t be surprised to find out that the person who designed the function length() tortured little animals as a child. If you call it on a vector, it works as expected. But if you call it on an $m \times n$ matrix, what should it do? I think the most sensible option is to give the number of elements, $mn$, but it’s also defensible to give $m$ or $n$. MATLAB of course takes the fourth option: $\max(m,n)$. I could also mention the lack of support for types, the Kafkaesque support for optional function arguments, the mixing of row and column vectors… It would keep me ranting forever. But enough about MATLAB. What are the alternatives?

The first one I looked at was Octave. It is open source, great, but its fundamental goal is to be compatible with MATLAB, so it cannot fix MATLAB’s uncountable design flaws. Furthermore, it isn’t 100% compatible with MATLAB, so almost always when I have to use MATLAB because of a library, that library doesn’t work with Octave. If I give up on compatibility, then I can use the Octave extensions that make the language more tolerable. But it’s still a terrible programming language, and it’s even slower than MATLAB, so there isn’t much point.

Then came Python. No hope of compatibility here, but I accept that; no pain, no gain. The language is a joy to program with, but I absolutely need some optimization libraries (which I’m not going to write myself). There are two available, CVXPY and PICOS. Back when I first looked at them, about a decade ago, neither supported complex numbers, so Python was immediately discarded. In the meanwhile they have both added support, so a few years ago I gave it a shot. It turns out both are unbearably slow. CVXPY gets an extra negative point for demanding its own version of the partial trace and partial transposition, but that’s beside the point; I can’t use them for any serious problem anyway. I did end up publishing a paper using Python code, but only because the optimization problem I was solving was so simple that performance wasn’t an issue.

After that I gave up for several years, resigned to my fate of programming in MATLAB until it drove me to suicide. But then Sébastien Designolle came to visit Vienna, and told me of a programming language called Julia that was even nicer to program in than Python, almost as fast as C++, and had an optimization library supporting every solver under the Sun, JuMP. I couldn’t believe my ears. Had the promised land been there all along? After all, I knew Julia; it just had never occurred to me that I could do optimization with it.

I immediately asked Sébastien if it supported complex numbers, and if it needed funny business to accept partial transpositions. Yes, and no, respectively. Amazing! To my relief JuMP had just added support for complex numbers, so I hadn’t suffered all these years for nothing. I started testing the support for complex numbers, and it turned out to be rather buggy. However, the developers Benoît Legat and Oscar Dowson fixed the bugs as fast as I could report them, so now it’s rock solid. Dowson in particular seemed to never sleep, but as it turned out he just lives in New Zealand.

Since then I have been learning Julia and writing serious code with it, and I can confirm: the language is all that Sébastien promised and more. Another big advantage is the extensive package ecosystem, where apparently half of academia has been busy solving the problems I need solved. The packages can be easily installed from within Julia itself and have proper support for versions and dependencies. Also worth mentioning is the powerful type system, which makes it easy to write functions that work differently for different types, and to switch at runtime between floats and complex floats and double floats and quadruple floats and arbitrary precision floats. This makes it easy to do optimization with arbitrary precision, which JuMP in fact allows for the solvers that support it (as far as I know they are Hypatia, COSMO, and Clarabel). As you might know, this is a nightmare in MATLAB.
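To give a flavour of what I mean, here’s a toy example (nothing to do with Ket or JuMP): the same function runs unchanged in double precision or in 256-bit arithmetic.

```julia
using LinearAlgebra

# purity of a density matrix, written once for any element type
purity(ρ::AbstractMatrix) = real(tr(ρ * ρ))

ψ = normalize!(randn(ComplexF64, 4))                 # double precision
@show purity(ψ * ψ')                                 # ≈ 1.0

setprecision(256) do
    φ = normalize!(Complex{BigFloat}[1, im, 2, -3])  # the same code, 256-bit floats
    @show purity(φ * φ')                             # ≈ 1 to about 77 digits
end
```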

Now Julia is not perfect. It has some design flaws. Some are because it wants to be familiar to MATLAB users, such as having 1-indexed arrays and col-major orientation. Some are incomprehensible (why is Real not a subtype of Complex? Why is M[:,1] a copy instead of a view?). It’s not an ideal language, it’s merely the best that exists. Maybe in a couple of decades someone will release a 0-indexed version called Giulia and we’ll finally have flying cars and world peace.

It’s a bit ironic to write this blog post right after releasing a paper that is based on a major MATLAB library I wrote together with Andy, Moment. In my defence, Andy wrote almost all of the code, the vast majority of it is in C++, and we started it before Sébastien’s visit. And it demonstrated beyond any doubt that MATLAB is completely unsuitable for any serious programming. I promise that when I have time (haha) I’ll rewrite it in Julia.

But the time for irony is over. My new projects are all in Julia, and I’ll start releasing them very soon. In the meanwhile, I wrote a tutorial to help refugees from MATLAB settle in the promised land.

EDIT: Because of Moment I had to program in MATLAB again now, 7 months after this post. A nightmare. It took me a bloody week to find a bug in a large program. As it turns out, MATLAB gets very creative when you divide an integer by a float. I think every other programming language in the world gives you a float as an answer. MATLAB gives an integer. Which integer, though? Not the floor or the ceiling of the division, no, that would be too simple and predictable. MATLAB rounds to the nearest integer. The result of these two fascinating design decisions is that floor(5/2) is 3 when 5 is encoded as an integer. I can hear the madman who designed this cackling hysterically alone in the night thinking of the curse he laid down for generations of programmers.


The smallest uninteresting number is 198

A well-known joke/theorem is that all natural numbers are interesting. The proof goes as follows: assume that there exists a non-empty set of uninteresting natural numbers. Then this set has a smallest element. But that makes this element interesting, so we have a contradiction. Incidentally, this proof applies to the integers and, with a bit of a stretch, to the rationals. It definitely does not apply to the reals, though, no matter how hard you believe in the axiom of choice.

I was wondering, though, what is the smallest uninteresting number. It must exist, because we fallible humans are undeterred by the mathematical impossibility and simply do not find most natural numbers interesting.

Luckily, there is an objective criterion to determine whether a natural number is interesting: is there a Wikipedia article written about it? I then went through the Wikipedia articles about numbers, and found the first gap at 198. But now that this number has become interesting, surely we should write a Wikipedia article about it?

This gives rise to another paradox: if we do write a Wikipedia article about 198 it will cease to be interesting, and of course we should delete the Wikipedia article about it. But this will make it interesting again, and we should again write the article.

You can see this paradox playing out in the revision history of the Wikipedia page: the article is indeed being repeatedly created and deleted.


SDPs with complex numbers

For mysterious reasons, some time ago I found myself reading SeDuMi’s manual. To my surprise, it claimed to support SDPs with complex numbers. More specifically, it could handle positive semidefiniteness constraints on complex Hermitian matrices, instead of only on real symmetric matrices, as all other solvers do.

I was very excited, because this promised a massive increase in performance for such problems, and in my latest paper I’m solving a massive SDP with complex Hermitian matrices.

The usual way to handle complex problems is to map them into real ones via the transformation
\[ f(M) = \begin{pmatrix} \Re(M) & \Im(M) \\ \Im(M)^T & \Re(M) \end{pmatrix}. \] The spectrum of $f(M)$ consists of two copies of the spectrum of $M$, and $f(MN) = f(M)f(N)$, so you can see that one can do an exact mapping. The problem is that the matrix is now twice as big: the number of parameters it needs is roughly twice what was needed for the original complex matrix, so this wastes a bit of memory. More problematically, the interior-point algorithm needs to calculate the Cholesky decomposition, which has complexity $O(d^3)$, so we are slowing the algorithm down by a factor of 8!
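If you want to see the doubling with your own eyes, here’s a quick Julia check (the function name is mine):

```julia
using LinearAlgebra

# the real embedding from the formula above, for Hermitian M
embed(M) = [real(M) imag(M); transpose(imag(M)) real(M)]

X = rand(ComplexF64, 4, 4)
M = X + X'                             # a random complex Hermitian matrix
@show eigvals(Hermitian(M))            # d eigenvalues
@show eigvals(Symmetric(embed(M)))     # the same ones, each appearing twice
```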

I then wrote a trivial SDP to test SeDuMi, and of course it failed. A more careful reading of the documentation showed that I was formatting the input incorrectly, so I fixed that, and it failed again. Reading the documentation again and again convinced me that the input was now correct: it must have been a bug in SeDuMi itself.

Lured by the promise of an 8 times speedup, I decided to dare the dragon, and looked into the source code of SeDuMi. It was written more than 20 years ago, and the original developer is dead, so you might understand why I was afraid. Luckily the code had comments; otherwise how could I figure out what it was supposed to do when it wasn’t doing it?

It turned out to be a simple fix; the real challenge was only understanding what was going on. And the original developer wasn’t to blame: the bug had been introduced by another person in 2017.

Now with SeDuMi working, I proceeded to benchmarking. To my despair, the promised land wasn’t there: there was no difference at all in speed between the complex version and the real version. I was at the point of giving up when Johan Löfberg, the developer of YALMIP, kindly pointed out to me that SeDuMi also needs to do a Cholesky decomposition of the Hessian, an $m \times m$ matrix where $m$ is the number of constraints. The complexity of SeDuMi is then roughly $O(m^3 + d^3)$ when using complex numbers, and $O(m^3 + 8d^3)$ when solving the equivalent real version. In my test problem I had $m=d^2$ constraints, so no wonder I couldn’t see any speedup.

I then wrote another test SDP, this time with a single constraint, and voilà! There was a speedup of roughly 4 times! Not 8, probably because computing the Cholesky decomposition of a complex matrix is harder than that of a real matrix, and there is plenty of other stuff going on. But no matter, a 4 times speedup is nothing to sneer at.

The problem now was that this only worked when calling SeDuMi directly, which requires writing the SDP in canonical form. I wasn’t going to do that for any nontrivial problem. It’s not hard per se, but it requires the patience of a monk. This is why we have preprocessors like YALMIP.

To take advantage of the speedup, I had to adapt YALMIP to handle complex problems. Löfberg is very much alive, which makes things much easier.

As it turned out, YALMIP already supported complex numbers, but had that support disabled, presumably because of the bug in SeDuMi. What was missing was support for dualization of complex problems, which is important because sometimes the dualized version is much more efficient than the primal one. So I went to work on that.

Today Löfberg accepted the pull request, so right now you can enjoy the speedup if you use the latest git versions of SeDuMi and YALMIP. If that’s useful to you, please test and report any bugs.

What about my original problem? I benchmarked it, and using the complex version of SeDuMi did give me a speedup of roughly 30%. Not so impressive, but definitely welcome. The problem is that SeDuMi is rather slow, and even using the real mapping MOSEK can solve my problem faster than SeDuMi can.

I don’t think it was pointless going through all that, though. First because there are plenty of people who use SeDuMi, as it’s open source, unlike MOSEK. Second because the groundwork has now been laid, and if another solver appears that can handle complex problems, we will be able to use that capability just by flipping a switch.
