Another day working with MATLAB, another nasty surprise. I was solving an SDP that was already working well, made a minor change, and suddenly it started taking minutes to reach a solution, when it usually took seconds.
After a long investigation, I realized that the problem was that my input was not exactly Hermitian anymore. I had switched from writing the components of a matrix manually to defining the matrix as $A = \ket{\psi}\bra{\psi}$ for a complex vector $\ket{\psi}$. Now how could that possibly be a problem? It is a very simple theorem that the outer product of a vector with itself is always Hermitian. Well, not with MATLAB. I can’t figure out why, but probably MATLAB uses some smart algorithm for the outer product that ruins this property due to floating-point arithmetic. What I could determine is that if you compute the outer product the naïve way, then even with MATLAB the result will be exactly Hermitian. Also, with Python the output is exactly Hermitian.
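Here is a minimal sketch of the experiment, in case you want to try it yourself; whether the first check fails will of course depend on your MATLAB version:

```matlab
% Outer product via the built-in matrix product: this is the one that,
% as described above, can fail the exact Hermiticity check.
psi = randn(4,1) + 1i*randn(4,1);   % some complex vector (size is arbitrary)
A = psi * psi';
ishermitian(A)                      % may come out false

% The naive outer product, entry by entry: each entry is a single scalar
% product, so the result is exactly Hermitian.
B = zeros(4);
for j = 1:4
    for k = 1:4
        B(j,k) = psi(j) * conj(psi(k));
    end
end
ishermitian(B)                      % true
```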
To solve this problem, one can simply redefine $A$ as $(A+A^\dagger)/2$, which is very fast and gives an exactly Hermitian output. I don’t like doing that, though, as $(A+A^\dagger)/2$ is always exactly Hermitian, even when $A$ is not even close to Hermitian. If I fucked up the definition of $A$, this will hide my mistake and make debugging harder. Instead, what I did was write my own outer product function, computing it as $\ket{\psi}\otimes\bra{\phi}$. It is slower than whatever black magic MATLAB is doing, sure, but it is fast enough, and it leaves my mistakes easily visible.
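For concreteness, a sketch of such a function (the name is mine; the point is only that kron multiplies the entries one by one):

```matlab
function A = outer_product(psi, phi)
    % |psi><phi|, computed as the Kronecker product of a column vector with
    % a row vector. Each entry is a single scalar product psi(j)*conj(phi(k)),
    % so no Hermiticity is lost when phi equals psi, and if I pass in the
    % wrong vectors the result is visibly non-Hermitian, as it should be.
    A = kron(psi, phi');
end
```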
It dawned on me that this subtlety was probably the source of many bugs and slowdowns in my codes over the years. I decided to go hunting, and find out in which ways MATLAB’s algebra fails to preserve Hermiticity where it should. It turns out that if $A$ and $B$ are exactly Hermitian, then both $A+B$ and $A \otimes B$ are exactly Hermitian, as they should be. The problem really arises when you do matrix multiplication.
Which shouldn’t produce further problems, right? After all, $AB$ is in general not Hermitian, so we don’t have anything to worry about. Except, of course, that the Hilbert-Schmidt inner product $\operatorname{tr}(A^\dagger B)$ is real for Hermitian $A,B$, and this is not true in MATLAB. Argh! $\operatorname{tr}(A^\dagger B)$ appears very often in the objective of an SDP, and it really needs to be a real number, as you can’t minimize a complex function. It turns out that this is not MATLAB’s fault; it is a fundamental problem of floating-point arithmetic.
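You can already see this with a couple of random Hermitian matrices (how I build them below is just an example):

```matlab
% Random exactly Hermitian matrices: as noted above, adding a matrix to its
% adjoint preserves Hermiticity exactly.
X = randn(4) + 1i*randn(4);  A = X + X';
Y = randn(4) + 1i*randn(4);  B = Y + Y';
imag(trace(A' * B))   % tiny, but generally not exactly zero
```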
A well-known but often-forgotten fact is that floating-point addition is not associative. Let $a=10^{20}$ and $b=10^{-20}$. Then with floating-point numbers we see that $b+(a-a) = b$ but $(b+a)-a = 0$. This is the issue with the Hilbert-Schmidt inner product: we get sums like $z+\bar{z}+w+\bar{w}+\ldots$, but not in that order. $z+\bar{z}$ is of course real, and $w+\bar{w}$ as well, but $z+(\bar{z}+w)+\bar{w}$? Nope, not in floating-point arithmetic.
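Both the toy example and its complex-number version look like this in MATLAB (the particular values of $z$ and $w$ are just chosen to trigger the effect):

```matlab
a = 1e20;  b = 1e-20;
b + (a - a)   % 1e-20
(b + a) - a   % 0: b is swallowed when added to a

% The same absorption in a sum of conjugate pairs: mathematically real,
% but not exactly real when summed out of order.
z = 1 + 1i;  w = 1 + 1e-20i;
imag(z + (conj(z) + w) + conj(w))   % -1e-20 instead of 0
```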
Here I don’t think a proper solution is possible. One can of course write $\Re[\operatorname{tr}(A^\dagger B)]$, but that will hide your mistakes when $A$ and $B$ are not in fact Hermitian. A better idea is to vectorize $A$ and $B$ to avoid the matrix multiplication, writing $\Re[\langle A|B\rangle]$ instead. This only helps with speed, though; it doesn’t touch the real problem.
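In MATLAB the vectorized version is just a dot product of the column-stacked matrices, something along these lines:

```matlab
% tr(A'*B) equals <vec(A)|vec(B)>, so the full matrix product is never formed;
% the real part still has to be taken by hand, which is the part I dislike.
hs = real(A(:)' * B(:));
```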
What do you mean by floating-point arithmetic in the context of complex numbers? I haven’t used MATLAB, but in every language I know, if $z$ has a complex type then $z+\bar{z}$ also has a complex type, even though its imaginary part is 0.
It’s a problem with types: a computer program can’t automatically convert the type of a number to a different type based on its value. You always have to do it explicitly. In other words, the number $1$ as a real number and the number $1$ as a complex number are two different things in programming.
BTW, have you tried the Julia programming language? It’s really good for mathematical computations. It’s a bit harder than Python, but much more powerful.
I mean the usual implementation, where a complex number $a + ib$ is represented as two floating-point numbers $a$ and $b$. I’m not talking about types here, though; I’m talking about the imaginary part being exactly zero. For example, try implementing the Hilbert-Schmidt inner product in a strongly typed language, like C++. Give it two matrices that are exactly Hermitian. The result will have complex type, of course, but the problem is that its imaginary part should be exactly zero, and it’s not.
I’m familiar with Julia; I even posted some Julia code here. I find it easier than Python, to be honest: the interface is pretty neat and the type system easy to use.
Ah, you’re talking about exact values of floats. In my experience it’s better to never think about exact floats. You just don’t :) Write 0.1 * 0.2 in your browser console and see the result. Why even try? :)
If you expect a result to be Hermitian then, for example, you can compare the norm of $(A-A^\dagger)$ with some small threshold.
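Something like this, for instance (the threshold, and making it relative, are arbitrary choices):

```matlab
if norm(A - A', 'fro') > 1e-12 * norm(A, 'fro')
    error('Input is not Hermitian up to the tolerance');
end
```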
Sure, small errors in large computations can result in a big error in the end. The best way to fix this is probably to split the computation into parts and use some error correction in intermediate steps.
BTW, Julia just takes the upper triangle of a matrix when converting to Hermitian https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.Hermitian
I’m not doing it for shits and giggles, I’m doing it because I need to. You do need a matrix to be exactly Hermitian (well, normal, to be precise) for the spectral theorem to hold. What happens when it’s not will depend on the specific implementation of the specific algorithm you’re working with, but it’s nothing good. For example, if you ask MATLAB for the eigenvalues of a non-Hermitian matrix, it will give them to you, but the algorithm it uses is much slower, because it can’t rely on the spectral theorem. Also, if you demand a non-Hermitian matrix to be positive semidefinite, as is usual in semidefinite programming, the solver will either just give up, because you’re asking for something impossible, or reinterpret your wish as something else. CVX, for example, will interpret it as asking for all elements of the matrix to be non-negative, which is a completely different thing and will ruin your program.
When your programming language has a Hermitian data type, like Julia does, life is much easier, because it’s then impossible for the matrix not to be exactly Hermitian.
The progression of ideas “Just take the Hermitian part. Ah crap but debugging will get cloudier.” could be restated as “snap back to reality, oh! there goes gravity.” What’s mom’s spaghetti in MATLAB though? :D
Bullshittery aside, I didn’t get why a function “Hermitize”, which takes the Hermitian part of the matrix and raises a flag if the anti-Hermitian part is too large, wouldn’t solve both problems at once.
Sure, that works; I’m just annoyed with MATLAB that such a thing is needed to start with. In this case, though, it was easier to compute the outer product as $\ket{\psi}\otimes\bra{\phi}$. In general, when the computation gives no guarantee that the matrix comes out exactly Hermitian, Hermitize is the way to go.
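For reference, a sketch of what such a Hermitize could look like (name, threshold, and error message are all placeholders):

```matlab
function H = hermitize(A, tol)
    % Take the Hermitian part of A, but complain loudly if A was not close
    % to Hermitian in the first place, so that genuine mistakes stay visible.
    if nargin < 2
        tol = 1e-12;
    end
    if norm(A - A', 'fro') > tol * max(norm(A, 'fro'), 1)
        error('hermitize:notHermitian', 'anti-Hermitian part is too large');
    end
    H = (A + A') / 2;   % exactly Hermitian, as discussed in the post
end
```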