I have just been to Perimeter Institute, by generous invitation of Thomas Galley. I gave a talk there about my recent-ish paper, Probability in two deterministic universes. Since I have already blogged about it here, I’m not writing about it again, but rather about what I discussed with Thomas regarding his derivations of the Born rule.

I was interested in his most recent derivation, which, besides structural assumptions about measurements and probabilities, needs two substantial assumptions: no-signalling and the possibility of state estimation, or *state estimation* for brevity. No-signalling is well-motivated and well-understood, but I was curious about *state estimation*. What does it mean? What does a theory that violates it look like?

The precise definition is that *state estimation* holds if there is a finite set of measurement outcomes whose probabilities completely determine the quantum state. Conversely, if *state estimation* fails, then for any finite set of measurement outcomes there are two different quantum states that give the same probabilities for all these outcomes. This is clearly not obeyed by quantum mechanics in the case of infinite-dimensional systems (you need to know the probability at each point in space to completely determine the wavefunction, which is an infinite set of outcomes), so the authors require it only for finite-dimensional systems.

How bad is it to violate it for finite-dimensional systems, then? What can you learn about the quantum state with a reasonably small number of measurement outcomes? A good approximation, or would you have little idea about what the quantum state is? It seems that the former is the case. To illustrate that, we came up with a rather artificial theory where the measurements allow you to deterministically read off bits from some representation of the quantum state; for the case of a qubit $\ket{\psi}=\cos\theta\ket{0}+e^{i\varphi}\sin\theta\ket{1}$ a measurement would tell you the $n$th bit of $\theta$ or $\varphi$. It is clear that this theory violates *state estimation*: for any finite set of measurements there will be a largest $n$ that they can reach, and therefore any pair of quantum states that differ only on bits higher than $n$ will be indistinguishable for this set of measurements. It is also clear that this violation is harmless: with only $2n$ measurements we can get an $n$-bit approximation for any qubit, which is much better than what can be done in reality! In reality we need about $2^n$ measurements to estimate the probabilities, and therefore the amplitudes, with such an accuracy.
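To make the bit-reading theory concrete, here is a minimal sketch in Python. The representation is my own assumption: each angle is rescaled to a number in $[0,1)$ and written in binary, and the names `read_bit` and `estimate` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction

def read_bit(x, k):
    """Hypothetical measurement: the k-th binary digit (k = 1, 2, ...) of x in [0, 1)."""
    return int(x * 2**k) % 2

def estimate(x, n):
    """n-bit approximation of x, rebuilt from n bit-reading measurements."""
    return sum(Fraction(read_bit(x, k), 2**k) for k in range(1, n + 1))

# Assumed rescaling of the angles to [0, 1): t stands for theta / (pi/2), f for phi / (2*pi).
t, f = Fraction(5, 16), Fraction(11, 16)  # 0.0101 and 0.1011 in binary
n = 3

# 2n = 6 measurements in total give an n-bit approximation of both angles.
t_hat, f_hat = estimate(t, n), estimate(f, n)
print(t_hat, f_hat)

# A state that differs from t only beyond bit n gives identical outcomes,
# so no set of measurements reaching at most bit n can tell them apart.
t_other = Fraction(4, 16)  # 0.0100 in binary
print(all(read_bit(t, k) == read_bit(t_other, k) for k in range(1, n + 1)))
```

Exact fractions are used instead of floats so that the binary digits are unambiguous; the error of each estimate is below $2^{-n}$ by construction.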

This already tells us that *state estimation* is too strong; it needs at least to be qualified somehow in order to exclude the deterministic theory above. What does it mean in probabilistic theories, though? An often considered toy theory is one where the structure of quantum mechanics is kept as it is, but the exponent in the Born rule is changed from $2$ to some $n$. More precisely, let the probability of obtaining outcome $i$ when measuring the state $\psi$ in the orthogonal basis $\{\ket{e_i}\}$ be \[ p(i|\psi) = \frac{|\langle e_i|\psi\rangle|^n}{\sum_{i'}|\langle e_{i'}|\psi\rangle|^n}. \] An interesting feature of this theory is that a finite set of measurement outcomes can distinguish all pure states (in fact the same measurements that distinguish them in quantum theory), so *state estimation* can only fail here for mixed states.
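The modified rule is easy to play with numerically. The following is only a sketch, assuming NumPy and a hypothetical helper `born_n` that takes the basis vectors as the rows of an array, all with the same norm.

```python
import numpy as np

def born_n(psi, basis, n=2):
    """Outcome probabilities for the state psi measured in an orthogonal basis
    (one basis vector per row), with the Born exponent 2 replaced by n."""
    psi = np.asarray(psi, dtype=complex)
    basis = np.asarray(basis, dtype=complex)
    # |<e_i|psi>|^n for each basis vector e_i, then normalise over outcomes.
    weights = np.abs(basis.conj() @ psi) ** n
    return weights / weights.sum()

theta = np.pi / 6
psi = [np.cos(theta), np.sin(theta)]  # amplitudes sqrt(3)/2 and 1/2
Z = [[1, 0], [0, 1]]

print(born_n(psi, Z, n=2))  # the usual Born rule
print(born_n(psi, Z, n=4))  # the larger amplitude is favoured more strongly
```

For $n=2$ this reduces to the usual Born rule, since the denominator is then just the squared norm of the state.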

A nice example is the pair of ensembles

\[\omega_A = \{(p,\ket{0}),(1-p,\ket{1})\}\] and \[\omega_B = \{(1/2,p^\frac1n\ket{0}+(1-p)^\frac1n\ket{1}),(1/2,p^\frac1n\ket{0}-(1-p)^\frac1n\ket{1})\}.\] In quantum mechanics ($n=2$) they are equivalent, both being represented by the density matrix

\[ \rho = \begin{pmatrix} p & 0 \\ 0 & 1-p \end{pmatrix}. \] If $n\neq 2$, though, they are not equivalent anymore, even though they give the same probabilities for measurements in the X, Y, and Z bases. To distinguish them we just need to measure the ensembles in the basis \[\{p^\frac1n\ket{0}+(1-p)^\frac1n\ket{1},(1-p)^\frac1n\ket{0}-p^\frac1n\ket{1}\}.\] The probability of obtaining the first outcome for ensemble $\omega_A$ is $p^2 +(1-p)^2$, and for ensemble $\omega_B$ it is some complicated expression that depends on $n$.
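These claims can be checked numerically. The sketch below assumes NumPy, assumes that the probability for an ensemble is the weighted average of the probabilities of its pure states, and uses hypothetical helper names (`born_n`, `ensemble_probs`, `compare`); the values $p=0.3$ and $n=3$ are arbitrary choices of mine.

```python
import numpy as np

def born_n(psi, basis, n):
    """Modified Born rule with exponent n; basis vectors are the rows of `basis`."""
    amps = np.asarray(basis, dtype=complex).conj() @ np.asarray(psi, dtype=complex)
    w = np.abs(amps) ** n
    return w / w.sum()

def ensemble_probs(ensemble, basis, n):
    """Average the pure-state probabilities over (weight, state) pairs (an assumption)."""
    return sum(w * born_n(psi, basis, n) for w, psi in ensemble)

def compare(p, n):
    a, b = p ** (1 / n), (1 - p) ** (1 / n)
    omega_A = [(p, [1, 0]), (1 - p, [0, 1])]
    omega_B = [(0.5, [a, b]), (0.5, [a, -b])]
    # Unnormalised bases; the common norm of the two vectors cancels in the ratio.
    X, Y, Z = [[1, 1], [1, -1]], [[1, 1j], [1, -1j]], [[1, 0], [0, 1]]
    special = [[a, b], [b, -a]]
    agree_xyz = all(
        np.allclose(ensemble_probs(omega_A, B, n), ensemble_probs(omega_B, B, n))
        for B in (X, Y, Z)
    )
    p_A = ensemble_probs(omega_A, special, n)[0]
    p_B = ensemble_probs(omega_B, special, n)[0]
    return agree_xyz, p_A, p_B

print(compare(0.3, 3))  # X, Y, Z agree, but the special basis separates the ensembles
print(compare(0.3, 2))  # in quantum mechanics the two ensembles are equivalent
```

Working it out by hand under the same averaging assumption, the expression for $\omega_B$ comes out as $\frac12 + \frac12\, d^n/\big(d^n + 2^n p(1-p)\big)$ with $d = |p^{2/n}-(1-p)^{2/n}|$, which reduces to $p^2+(1-p)^2$ when $n=2$.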

Now this is by no means a proof, but it makes me suspect that it will be rather easy to distinguish any two ensembles that are not equivalent, by making a measurement that contains one of the pure states that was mixed in to make the ensemble. Then if we divide the Bloch sphere into a number of regions, assigning a measurement to cover each such region, we can do that to a good enough approximation. Unlike the deterministic theory explored above, in this toy theory it is clearly more laborious to do state estimation than in quantum mechanics, but it is still firmly within the realm of possibility.

What now, then? If the possibility of state estimation is not a good assumption from which to derive the Born rule, is there a derivation in this operational framework that follows from better assumptions? It turns out that Galley himself has such a derivation, based only on similar structural assumptions together with no-signalling and purification, with no need for *state estimation*. But rather ironically, here the roles flip: while I find purification an excellent axiom to use, Galley is not a fan.

Let me elaborate. Purification is the assumption that every mixed state (like the ensembles above) is obtained by ignoring part of a pure state. It implies then that there are no “external” probabilities in the theory; if you want to flip a coin in order to mix two pure states, you had better model that coin inside the theory, and as a pure state. Now Galley doesn’t find purification so nice: for one, because classical theories fail purification, and also because it feels like postulating that your theory is universal, which is a big step to take, in particular when the theory in question is quantum mechanics.

Well, I find that classical theories failing purification is just one more example in a huge pile of examples of how classical theories are wrong. In this particular case they are wrong by being essentially deterministic, and only allowing for probabilities when they are put there by hand. About postulating the universality of the theory, indeed that is a big assumption, but so what? I don’t think good assumptions need to be self-evidently true, I just think they should be well-motivated and physically meaningful.

*Addendum*: A natural question to ask is whether both no-signalling and purification are necessary in such a derivation. It turns out that the answer is yes: the toy theory where the exponent in the Born rule is $n$ respects purification, when extended in the obvious way to composite systems, but violates no-signalling, and Galley’s rule respects no-signalling but violates purification.