Recently two nice papers appeared on the arXiv, the more recent by Galley and Masanes, and the older by López Grande et al. They are both – although a bit indirectly – about the age-old question of the equivalence between proper and improper mixtures.

A proper mixture is when you prepare the states $\ket{0}$ and $\ket{1}$ with probability $p$ and $1-p$, obtaining the density matrix

\[ \rho_\text{proper} = p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}.\] An improper mixture is when you prepare the entangled state $\sqrt{p}\ket{0}\ket{0} + \sqrt{1-p}\ket{1}\ket{1}$ and discard the second subsystem, obtaining the density matrix \[ \rho_\text{improper} = p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1}.\] The question is then why these different preparation procedures give rise to the same statistics (which is what makes it legitimate to represent them with the same density matrix).

Well, do they? I’m not so sure about that! The procedure to prepare the proper mixture is rather vague, so we can’t really answer whether it is appropriate to represent it via the density matrix $\rho_\text{proper}$. To remove the vagueness, I asked an experimentalist how she prepared the state $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$ that was necessary for an experiment. “Easy”, she told me, “I prepared $n$ copies of $\ket{0}$, $n$ copies of $\ket{1}$, and then combined the statistics.”

This sounds like preparing the state $\ket{0}^{\otimes n} \otimes \ket{1}^{\otimes n}$, not like preparing $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$. Do they give the same statistics? Well, if I measure all states in the $Z$ basis, exactly $\frac12$ of the results will be $0$. But if I measure $\frac12(\ket{0}\bra{0}+\ket{1}\bra{1})$ in the $Z$ basis $2n$ times, the probability that $\frac12$ of the results are $0$ is

\[ \frac{1}{2^{2n}} {2n \choose n} \approx \frac{1}{\sqrt{n\pi}},\] so just by looking at this statistic I can guess with high probability which was the preparation. It is even easier to do that if I disregard her instructions and look at the order of the results: getting $n$ zeroes followed by $n$ ones is a dead giveaway.
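This is easy to check numerically; here is a minimal sketch in Python (the value of `n` is an arbitrary choice):

```python
from math import comb, pi, sqrt
import random

n = 100  # number of copies of each state (arbitrary choice)

# Exact probability that a fair-coin sequence of length 2n has exactly n zeroes
p_exact = comb(2 * n, n) / 2 ** (2 * n)
print(p_exact, 1 / sqrt(n * pi))  # the two agree to leading order in n

# Simulate measuring the mixed state 2n times in the Z basis:
# each outcome is an independent fair coin flip.
trials = 10_000
hits = sum(
    sum(random.random() < 0.5 for _ in range(2 * n)) == n
    for _ in range(trials)
)
print(hits / trials)  # close to p_exact
```

So for a few hundred measurements the “exactly half zeroes” event happens only a few percent of the time under the mixed state, while it happens with certainty for the experimentalist’s preparation.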

Maybe one should prepare these states using a random number generator instead? If one uses the function `rand()` from MATLAB to decide whether to prepare $\ket{0}$ or $\ket{1}$ at each round, one can easily pass the two randomness tests I mentioned above. Maybe it can even pass all common randomness tests available in the literature; I don’t know how good `rand()` is. It cannot, however, pass *all* randomness tests, as `rand()` is a deterministic algorithm using a finite seed, and is therefore restricted to outputting computable sequences of bits. One can, in fact, attack it, and this is the core of the paper of López Grande et al., which shows how one can distinguish a sequence of bits that came from `rand()` from a truly random one. More generally, even the best pseudorandom number generators we have are designed to be indistinguishable from truly random sources only by polynomial-time tests, and fail against exponential-time algorithms.

Clearly pseudorandomness is not enough to generate proper mixtures; how about true randomness instead? Just use a quantum random number generator to prepare bits with probabilities $p$ and $1-p$, and use these bits to prepare $\ket{0}$ or $\ket{1}$. Indeed, this is what people do when they are serious about preparing mixed states, and the statistics really are indistinguishable from those of improper mixtures. But why? To answer that, we need to model the quantum random number generator physically. We start by preparing a “quantum coin” in the state

\[ \sqrt{p}\ket{H}+\sqrt{1-p}\ket{T},\] which we should measure in the $\{\ket{H},\ket{T}\}$ basis to generate the random bits. Going to the Church of the Larger Hilbert Space, we model the measurement as

\[ \sqrt{p}\ket{H}\ket{M_H}+\sqrt{1-p}\ket{T}\ket{M_T},\] and conditioned on the measurement we prepare $\ket{0}$ or $\ket{1}$, obtaining the state

\[ \sqrt{p}\ket{H}\ket{M_H}\ket{0}+\sqrt{1-p}\ket{T}\ket{M_T}\ket{1}.\] We then discard the quantum coin and the measurement result, obtaining finally

\[ p\ket{0}\bra{0} + (1-p)\ket{1}\bra{1},\] which is just the desired state, but now it is an improper mixture. So, at least in the Many-Worlds interpretation, there is no mystery about why proper and improper mixtures are equivalent: they are physically the same thing!
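This partial trace can be verified directly; here is a sketch with NumPy, encoding $\ket{H}$ and $\ket{M_H}$ as $\ket{0}$, and $\ket{T}$ and $\ket{M_T}$ as $\ket{1}$:

```python
import numpy as np

p = 0.3  # arbitrary mixing probability
ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# |psi> = sqrt(p)|H>|M_H>|0> + sqrt(1-p)|T>|M_T>|1>
psi = (np.sqrt(p) * np.kron(np.kron(ket0, ket0), ket0)
       + np.sqrt(1 - p) * np.kron(np.kron(ket1, ket1), ket1))

rho = np.outer(psi, psi)  # 8x8 density matrix of coin + pointer + system

# Trace out the coin and the measurement record (the first two qubits):
# sum over matching indices a (coin) and i (pointer), keep the system.
rho_sys = np.einsum('aibaic->bc', rho.reshape(2, 2, 2, 2, 2, 2))

expected = p * np.outer(ket0, ket0) + (1 - p) * np.outer(ket1, ket1)
print(np.allclose(rho_sys, expected))  # True
```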

(A closely related question, which has a closely related answer, is why it is equivalent to prepare the states $\ket{0}$ or $\ket{1}$ with probability $\frac12$ each, or the states $\ket{+}$ or $\ket{-}$, again with probability $\frac12$ each? The equivalence fails for pseudorandomness, as shown by López Grande et al.; if we use true randomness instead, we are preparing the states

\[ \frac1{\sqrt{2}}(\ket{H}\ket{0}+\ket{T}\ket{1})\quad\text{or}\quad\frac1{\sqrt{2}}(\ket{H}\ket{+}+\ket{T}\ket{-})\] and discarding the coin. But note that if one applies a Hadamard to the coin of the first state one obtains the second, so the difference between them is just a unitary on a system that is discarded anyway; no wonder we can’t tell the difference! More generally, any two purifications of the same density matrix must be related by a unitary on the purifying system.)
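Both claims – the Hadamard relation and the equality of the reduced states – are easy to check numerically; a sketch with NumPy (real amplitudes, so no complex conjugation is needed in the partial trace):

```python
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
ketp, ketm = (ket0 + ket1) / np.sqrt(2), (ket0 - ket1) / np.sqrt(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

# (|H>|0> + |T>|1>)/sqrt(2), with the coin as the first qubit
psi1 = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
# (|H>|+> + |T>|->)/sqrt(2)
psi2 = (np.kron(ket0, ketp) + np.kron(ket1, ketm)) / np.sqrt(2)

# A Hadamard on the coin alone maps one purification to the other...
print(np.allclose(np.kron(H, np.eye(2)) @ psi1, psi2))  # True

# ...so both reduce to the same (maximally mixed) state on the system.
tr_coin = lambda v: np.einsum('ab,ac->bc', v.reshape(2, 2), v.reshape(2, 2))
print(np.allclose(tr_coin(psi1), np.eye(2) / 2),
      np.allclose(tr_coin(psi2), np.eye(2) / 2))  # True True
```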

Galley and Masanes want to invert the question, and ask for *which* quantum-like theories proper and improper mixtures are equivalent. To be able to tackle this question, we need to define what improper mixtures even are in a quantum-like theory. They proceed by analogy with quantum mechanics: if one has a bipartite state $\ket{\psi}$, and is doing measurements $E_i$ only on the first system, the probabilities one obtains are given by

\[ p(i) = \operatorname{tr}( (E_i \otimes \mathbb I) \ket{\psi}\bra{\psi} ),\] and the improper mixture is defined as the operator $\rho_\text{improper}$ for which

\[ p(i) = \operatorname{tr}( E_i \rho_\text{improper})\] for all measurements $E_i$.

In their case, they are considering a quantum-like theory that is still based on quantum states, but whose probabilities are not given by the Born rule $p(i) = \operatorname{tr}(E_i \ket{\phi}\bra{\phi})$, but by some more general function $p(i) = F_i (\ket{\phi})$. One can then define the probabilities obtained by local measurements on a bipartite state as

\[ p(i) = (F_i \star \mathbb I) (\ket{\psi}),\] for some composition rule $\star$ and trivial measurement $\mathbb I$, and from that an improper mixture as the operator $\omega_\text{improper}$ such that

\[ p(i) = F_i (\omega_\text{improper})\] for all measurements $F_i$.

Defining proper mixtures, on the other hand, is easy: if one can prepare the states $\ket{0}$ or $\ket{1}$ with probabilities $p$ and $1-p$, their proper mixture is the operator $\omega_\text{proper}$ such that for all measurements $F_i$

\[ p(i) = F_i(\omega_\text{proper}) = p F_i(\ket{0}) + (1-p) F_i(\ket{1}).\] That is, easy if one can generate true randomness that is not reducible to quantum-like randomness. I don’t think this makes sense, as one would have to consider a world where reductionism fails, or at least one where quantum-like mechanics is not the fundamental theory. Such non-reducible probabilities are uncritically assumed to exist anyway by people working on generalized probabilistic theories (GPTs) all the time¹.

Now with both proper and improper mixtures properly defined, one can answer the question of whether they are equivalent: the answer is a surprising no, for any alternative probability rule that respects some basic consistency conditions. This has the intriguing consequence that if we were to modify the Born rule while keeping the rest of quantum mechanics intact, a wedge would be driven between the probabilities that come from the fundamental theory and some “external” probabilities coming from elsewhere. This would put the Many-Worlds interpretation under intolerable strain.

But such an abstract “no” result is not very interesting; I find it much more satisfactory to exhibit a concrete alternative to the Born rule where the equivalence fails. Galley and Masanes propose the function

\[ F_i(\ket{\psi}) = \operatorname{tr}(\hat F_i (\ket{\psi}\bra{\psi})^{\otimes 2})\] for some positive matrices $\hat F_i$ restricted by their consistency conditions. It is easy to see that the proper mixture of $\ket{0}$ and $\ket{1}$ described above is given by²

\[ \omega_\text{proper} = p \ket{00}\bra{00} + (1-p)\ket{11}\bra{11}.\] In quantum mechanics one would try to make it by discarding half of the state $\sqrt{p}\ket{0}\ket{0} + \sqrt{1-p}\ket{1}\ket{1}$. Here it doesn’t work – by their no-go result nothing does – but I want to know what it gives us anyway. It is not easy to see that the improper mixture is given by the weirdo

\begin{multline} \omega_\text{improper} = (p^2 + \frac{p(1-p)}{3})\ket{00}\bra{00} + \\ \frac{2p(1-p)}{3} (\ket{01}+\ket{10})(\bra{01}+\bra{10}) + ((1-p)^2 + \frac{p(1-p)}{3})\ket{11}\bra{11}.\end{multline}
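One can at least verify that both operators have unit trace and yet differ for every $p$ strictly between $0$ and $1$, so the equivalence really does fail in this theory; a sketch with NumPy (`p` is an arbitrary choice):

```python
import numpy as np

p = 0.3  # arbitrary mixing probability
# Two-qubit basis states |00>, |01>, |10>, |11> as vectors in C^4
k00, k01, k10, k11 = np.eye(4)
proj = lambda v: np.outer(v, v)

omega_proper = p * proj(k00) + (1 - p) * proj(k11)

mid = k01 + k10  # the unnormalized |01> + |10> appearing in the formula
omega_improper = ((p**2 + p * (1 - p) / 3) * proj(k00)
                  + (2 * p * (1 - p) / 3) * proj(mid)
                  + ((1 - p)**2 + p * (1 - p) / 3) * proj(k11))

print(np.trace(omega_proper), np.trace(omega_improper))  # both equal 1
print(np.allclose(omega_proper, omega_improper))         # False
```

The middle term is what gives the improper mixture away: it has support on the symmetric combination of $\ket{01}$ and $\ket{10}$, which the proper mixture simply cannot have.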

Hey hey!

I would say it’s also valid to view proper mixtures as something epistemic: maybe there is no physical process which leads to a proper maximally mixed state, but this just encodes that you don’t know anything about what the (say) spin is up to.

With this point of view, the non-purifiability result seems (correct me in the probable case I’m wrong) to imply a cute corollary.

Consider two scenarios: in one you have a perfectly isolated quantum system, but you’re unsure what its initial state is, so you describe it by a proper mixed state. In the other scenario, you have a quantum system that is heavily interacting with degrees of freedom you cannot observe, so you partial-trace over them and obtain an improper mixed state. Let’s just say that in both cases you have zero knowledge about the state, so you describe it by the maximally mixed state.

In these scenarios you have ignorance of two types: in the first scenario you have ignorance about the system itself and in the second you have ignorance about how it interacts with other systems. Then from the result of Galley and Masanes it follows that one could distinguish between these two types of ignorance statistically, if we replaced the Born rule with something else.

I think we can already distinguish between these two types of ignorance statistically: the epistemic probabilities are rather weak. If the state is either $\ket{0}$ or $\ket{1}$ but I don’t know which, testing a few copies will be enough to distinguish this from the maximally mixed state obtained by tracing out half of an entangled state.
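The distinguishing test I have in mind is just measuring every copy in the $Z$ basis and checking whether the outcomes all agree; a simulated sketch:

```python
import random

def epistemic_copies(n):
    """n copies of the *same* unknown state, |0> or |1>: Z outcomes all agree."""
    b = random.randint(0, 1)
    return [b] * n

def improper_copies(n):
    """n halves of fresh entangled pairs: each Z outcome is an independent coin."""
    return [random.randint(0, 1) for _ in range(n)]

def guess(outcomes):
    """Guess 'epistemic' iff all outcomes agree."""
    return 'epistemic' if len(set(outcomes)) == 1 else 'improper'

n, trials = 10, 1000
correct = sum(guess(epistemic_copies(n)) == 'epistemic' for _ in range(trials))
correct += sum(guess(improper_copies(n)) == 'improper' for _ in range(trials))
print(correct / (2 * trials))  # close to 1; the test errs with prob 2^-(n-1)
```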

This is why I didn’t even mention this kind of probability. If you are serious about statistical indistinguishability, you need to tell me which process you use to generate several copies of the state. What would be your proposal for an epistemic proper mixture that is statistically indistinguishable from an improper mixture?

Let me be honest: I am posting here because of your discussion with two philosophers on Scott Aaronson’s blog. The philosophers were tiring and wordy, and said at least one or two blatantly wrong things. Those blatantly wrong things were clarified by you (and others), and I think the philosophers more or less accepted that they had been wrong. But you got tired of them and replied with “discussion killers,” which showed you in an unnecessarily bad light. That is sad, because you have read and thought much about quantum mechanics and probability, and you deserved a better outcome from the discussion.

In the epilogue at the end of a very long essay on the History of Computation, Logic and Algebra, the question “Do aliens have LISP or Scheme?” is raised, and a clear answer is given:

If we look from such a concrete perspective on probability and logic, we get similar surprises: The interesting thing is that repetition of nearly identical experiments is very common in both dead and living nature. That seems to indicate that probabilistic reasoning is very appropriate for nature, whereas strict logic has a lesser role to play. But where does strict logic occur in nature? It is the logic of language, and language is the means for communication. You need conventions for language to work. Bohr compared classical (non-quantum) physics to language, it is what you can communicate such that somebody else can try to reproduce your experiment.

You might wonder what this has to do with your post, and whether I even read your post. Yes, I have read your post, and it is nice to see how internally consistent the description of probability by quantum mechanics turns out to be. But still, there is a huge number of identical electrons and atoms and experiments which can be repeated under nearly identical conditions nearly infinitely often, and I think this plays a role why probability is so appropriate, and why it allows such a high precision. A paper by Ole Peters and Murray Gell-Mann made me wonder whether probability will always allow such a high precision. Of course, I read the paper because I admire Murray Gell-Mann, but I quote it because I was shocked by a negative public review of it on reddit, which even cited this comic. I am an amateur philosopher, who leans towards continental philosophy, and values the connection of abstract theories to the physical world as it appears to us.

Hi Mateus,

This is probably too stupid a question, as I am not sophisticated enough in QM, but I don’t see how you get your statistics from measuring the proper density matrix $2n$ times in the $Z$ basis. You prepared an entangled state and are ready to do a sequence of $2n$ local measurements. The first measurement collapses the wave function. The $2n-1$ remaining measurements should give the same eigenvalue. So one half of the time you get $0,0,\dots,0$ and one half you should get $1,1,\dots,1$. Disregarding noise, what am I overlooking here?

Hi Max,

This is what you would get if you made repeated measurements on the same particle, but I’m talking about preparing many copies of the entangled state and doing one measurement on each copy.