# Pure quantum operations

Everybody knows how to derive the most general operations one can apply to a quantum state. You just need to assume that a quantum operation

1. Is linear.
2. Maps quantum states to quantum states.
3. Still maps quantum states to quantum states when applied to a part of a quantum system.

And you can prove that such quantum operations are the well-known completely positive and trace-preserving maps, which can be conveniently represented using Kraus operators or the Choi-Jamiołkowski isomorphism.

But what if one does not want general quantum operations, but wants to single out pure quantum operations? Can one have such an axiomatic description, a derivation from intuitive[1] assumptions?

Well, the usual argument one sees in textbooks to show that the evolution of quantum states must be given by a unitary assumes that the evolution

1. Is linear.
2. Maps pure quantum states to pure quantum states.

From this, you get that a quantum state $\ket{\psi}$ is mapped to a quantum state $U\ket\psi$ for a linear operator $U$, and furthermore since by definition quantum states have 2-norm equal to 1, we need the inner product $\bra\psi U^\dagger U \ket\psi$ to be 1 for all $\ket\psi$, which implies that $U$ must be a unitary matrix.

The only problem with this argument is that it is false, as the map
$\mathcal E(\rho) = \ket\psi\bra\psi \operatorname{tr} \rho,$
which simply discards the input $\rho$ and prepares the fixed state $\ket\psi$ instead, is linear, maps pure states to pure states, and is not unitary. The textbooks are fine, as they usually go through this argument before density matrices are introduced, and either implicitly or explicitly state that the evolution takes state vectors to state vectors. But this is not good enough for us, as this restriction to state vectors is both unjustified and fails to satisfy our requirement of being an “intuitive assumption”.
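For a concrete check of this counterexample, here is a small Python/NumPy snippet (an illustration of my own, not from any textbook) showing that $\mathcal E$ takes pure states to pure states, yet cannot be a unitary conjugation: unitaries preserve eigenvalues, while $\mathcal E$ maps the maximally mixed state to a pure state.

```python
import numpy as np

d = 2
psi = np.array([1.0, 0.0])   # the fixed state |psi> that E prepares

def E(rho):
    """The discard-and-prepare map E(rho) = |psi><psi| tr(rho)."""
    return np.outer(psi, psi.conj()) * np.trace(rho)

# E of a pure state is pure: the output sigma has purity tr(sigma^2) = 1.
phi = np.array([1.0, 1.0]) / np.sqrt(2)
sigma = E(np.outer(phi, phi.conj()))
purity = np.trace(sigma @ sigma).real

# But E is no unitary conjugation: it maps the maximally mixed state
# (eigenvalues 1/2, 1/2) to a state with eigenvalues 0 and 1.
eigs = np.linalg.eigvalsh(E(np.eye(d) / d))
```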

Luckily, the fix is easy: we just need to add the analogue of the third assumption used in the derivation of general quantum operations. If we assume that a pure quantum operation

1. Is linear.
2. Maps pure quantum states to pure quantum states.
3. Still maps pure quantum states to pure quantum states when applied to a part of a quantum system.

then we can prove that pure quantum operations are just unitaries[2]. Since the proof is simple, I’m going to show it in full.

Let $\mathcal F$ be the pure quantum operation we are interested in. If we apply it to the second subsystem of a maximally entangled state, $\ket{\phi^+} = \frac1{\sqrt d}\sum_{i=1}^d \ket{ii}$, by assumption 3 the result will be a pure state, which we call $\ket{\varphi}$. In symbols, we have
$\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+}) = \ket{\varphi}\bra{\varphi},$
where $\mathcal I$ represents doing nothing to the first subsystem. Now the beautiful thing about the maximally entangled state is that if $\mathcal F$ is a linear map then $\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})$ contains all the information about $\mathcal F$. In fact, if we know $\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})$ we know how $\mathcal F$ acts on any matrix $\rho$ via the identity
$\mathcal F (\rho) = d \operatorname{tr}_\text{in} [(\rho^T \otimes \mathbb I)\, \mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+})],$
where the factor of $d$ compensates for the normalization of $\ket{\phi^+}$.
This is the famous Choi-Jamiołkowski isomorphism[3]. Now let’s use the fact that the result $\ket{\varphi}\bra{\varphi}$ is a pure state. If we write it down in the computational basis
$\ket\varphi = \sum_{i,j=1}^d \varphi_{ij} \ket{i j},$
we see that if we define a matrix $\Phi$ with elements $\Phi_{ij} = \varphi_{ji} \sqrt d$ then $\ket\varphi = \mathbb I \otimes \Phi \ket{\phi^+}$[4], so
$\mathcal I \otimes \mathcal F (\ket{\phi^+}\bra{\phi^+}) = (\mathbb I \otimes \Phi) \ket{\phi^+}\bra{\phi^+} (\mathbb I \otimes \Phi^\dagger).$
Using the identity above we have that
$\mathcal F(\rho) = \Phi \rho \Phi^\dagger,$
and since $\operatorname{tr}(\mathcal F(\rho)) = 1$ for every $\rho$ we have that $\Phi^\dagger\Phi = \mathbb I$, so $\Phi$ is an isometry. If in addition we demand that $\mathcal F(\rho)$ has the same dimension as $\rho$, then $\Phi$ must be a square matrix, and its left inverse $\Phi^\dagger$ is then also a right inverse, so $\Phi$ is a unitary.
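Since the proof is constructive, it can be checked numerically. Here is a sanity check in Python/NumPy (my own illustration): take $\mathcal F(\rho) = U\rho U^\dagger$ for a random unitary $U$, build the Choi state, recover $\mathcal F(\rho)$ from it (with the factor $d$ that comes from the normalization of $\ket{\phi^+}$), and extract $\Phi$ from $\ket\varphi$, which indeed comes out unitary.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# A random unitary, playing the role of the pure operation F(rho) = U rho U^dag.
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

# Normalized maximally entangled state |phi+> = (1/sqrt d) sum_i |ii>.
phi_plus = np.eye(d).reshape(d * d) / np.sqrt(d)

# Apply F to the second subsystem: |varphi> = (I ⊗ U)|phi+>.
varphi = np.kron(np.eye(d), U) @ phi_plus
choi = np.outer(varphi, varphi.conj())

# Recover F(rho) from the Choi state; the factor d compensates for the
# normalization of |phi+>.
rho = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = rho @ rho.conj().T
rho /= np.trace(rho)
M = (np.kron(rho.T, np.eye(d)) @ choi).reshape(d, d, d, d)
F_rho = d * M.trace(axis1=0, axis2=2)   # partial trace over the input subsystem

# Extract Phi via Phi_{ij} = varphi_{ji} sqrt(d); here it should be U itself.
Phi = varphi.reshape(d, d).T * np.sqrt(d)
```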

This result is so amazing, so difficult, and so ground-breaking that the referees allowed me to include it as a footnote in my most recent paper without bothering to ask for a proof or a reference. But joking aside, I’d be curious to know if somebody has already written this down, as a quick search through the textbooks revealed nothing.

But what about Wigner’s theorem, I hear you scream. Well, Wigner was not concerned with deriving the allowed quantum operations, but the symmetry transformations one can apply to quantum states. Because of this he did not assume linearity, which was not relevant for him (and in fact would make his theorem false, as one can have perfectly good anti-linear symmetries, such as time reversal). Also, he assumed that symmetry transformations preserve inner products, which is too technical an assumption for my purposes.

# What is the probability of an infinite sequence of coin tosses?

It’s 0, except in the trivial cases where it is 1.

But clearly this is the wrong way to formulate the question, as there are interesting things to be said about the probabilities of infinite sequences of coin tosses. The situation is analogous to uniformly sampling real numbers from the $[0,1]$ interval: the probability of obtaining any specific number is just 0. The solution, however, is simple: we ask instead what is the probability of obtaining a real number in a given subinterval. The analogous solution works for the case of coin tosses: instead of asking the probability of a single infinite sequence, one can ask the probability of obtaining an infinite sequence that starts with a given finite sequence.

To be more concrete, let’s say that the probability of obtaining Heads in a single coin toss is $p$, and for brevity let’s denote the outcome Heads by 1 and Tails by 0. Then the probability of obtaining the sequence 010 is $p(1-p)^2$, which is the same as the probability of obtaining the sequence 0100 or the sequence 0101, which is the same as the probability of obtaining a sequence in the set {01000, 01001, 01010, 01011}, which is the same as the probability of obtaining an infinite sequence that starts with 010.
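As a toy illustration of these prefix probabilities (in Python, with exact arithmetic via `fractions`; the function name is my own), extending a prefix splits its probability among the extensions:

```python
from fractions import Fraction

def prefix_probability(prefix, p):
    """Probability that an infinite sequence of tosses starts with `prefix`,
    writing Heads as '1' (probability p) and Tails as '0' (probability 1 - p)."""
    prob = Fraction(1)
    for bit in prefix:
        prob *= p if bit == "1" else 1 - p
    return prob

p = Fraction(1, 4)   # an arbitrary bias, just for illustration
p010 = prefix_probability("010", p)
# The one-bit extensions 0100 and 0101 together carry the same probability.
extensions = [prefix_probability("010" + b, p) for b in ("0", "1")]
```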

There is nothing better to do with infinite sequences of zeroes and ones than mapping them into real numbers in the interval $[0,1]$, so we shall do that. The set of infinite sequences that start with 010 is then very conveniently represented by the interval $[0.010,0.010\bar1]$, also known as $[0.010,0.011]$ for those who do not like infinite strings of ones, or $[0.25,0.375]$ for those who do not like binary. Saying then that the probability of obtaining a sequence in $[0.010,0.010\bar{1}]$ is $p(1-p)^2$ is assigning a measure to this interval, which we write as
$\rho([0.010,0.010\bar{1}]) = p(1-p)^2$
Now if we can assign a sensible probability to every interval contained in $[0,1]$ we can actually extend it into a proper probability measure over the set of infinite sequences of coin tosses using standard measure-theoretic arguments. For me this is the right answer to the question posed in the title of this post.

So, how do we go about assigning a sensible probability to every interval contained in $[0,1]$? Well, the argument of the previous paragraph can clearly be extended to any interval of the form $[k/2^n, (k+1)/2^n]$. We just need to write $k$ in binary, padded with zeroes on the left until it reaches $n$ binary digits, and count the number of 0s and 1s. In symbols:
$\rho\left(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right]\right) = p^{n_1(k,n)}(1-p)^{n_0(k,n)}$
The extension to any interval whose endpoints are binary fractions is straightforward. We just break such an interval into intervals where the numerators differ by one and apply the previous rule. In symbols:
$\rho\left(\left[\frac{k}{2^n}, \frac{l+1}{2^n}\right]\right) = \sum_{i=k}^{l} p^{n_1(i,n)}(1-p)^{n_0(i,n)}$
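Both rules can be transcribed directly into Python (an illustration; the function name is mine):

```python
def rho_dyadic(k, l, n, p):
    """Measure of the interval [k/2^n, (l+1)/2^n]: sum over i = k, ..., l of
    p^{n1(i,n)} (1-p)^{n0(i,n)}, where n1/n0 count the ones/zeros in the
    n-bit binary expansion of i."""
    total = 0.0
    for i in range(k, l + 1):
        n1 = bin(i).count("1")   # padding zeros on the left add no ones
        total += p ** n1 * (1 - p) ** (n - n1)
    return total
```

For instance, `rho_dyadic(2, 2, 3, p)` is the measure of $[0.010,0.010\bar1]$, which indeed comes out as $p(1-p)^2$, and the measure of the full interval $[0,1]$ is 1.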
We are essentially done, since we can approximate any real number as well as we want by binary fractions[5]. But life is more than just binary fractions, so I’ll show explicitly how to deal with the interval
$[0,1/3] = [0,0.\bar{01}]$

The key thing is to choose a nice sequence of binary fractions $a_n$ that converges to $1/3$. It is convenient to use a monotonically increasing sequence, because then we don’t need to worry about minus signs. If furthermore the sequence starts with $0$, then $[0,1/3] = \bigcup_{n\in \mathbb N} [a_n,a_{n+1}]$ and
$\rho([0,1/3]) = \sum_{n\in \mathbb N} \rho([a_n,a_{n+1}])$
An easy sequence that does the job is $(0,0.01,0.0101,0.010101,\ldots)$. It lets us write the interval as
$[0,1/3] = [0.00, 0.00\bar{1}] \cup [0.0100, 0.0100\bar{1}] \cup [0.010100, 0.010100\bar{1}] \cup \ldots$
which gives us a simple interpretation of $\rho([0,1/3])$: it is the probability of obtaining a sequence of outcomes starting with 00, or 0100, or 010100, etc. The formula for the measure of $[a_n,a_{n+1}]$ is also particularly simple:
$\rho([a_n,a_{n+1}]) = p^{n-1}(1-p)^{n+1}$
so the measure of the whole interval is just a geometric series:
$\rho([0,1/3]) = (1-p)^2\sum_{n\in\mathbb N} \big(p(1-p)\big)^{n-1} = \frac{(1-p)^2}{1-p(1-p)}$
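A quick numerical check of this geometric series (illustrative Python, my own naming): the partial sums converge to the closed form, and for $p = 1/2$ the closed form gives $1/3$, as it should, since a fair coin reproduces the Lebesgue measure.

```python
def rho_third_partial(p, terms):
    """Partial sum of rho([a_n, a_{n+1}]) = p^(n-1) (1-p)^(n+1) over n = 1..terms."""
    return sum(p ** (n - 1) * (1 - p) ** (n + 1) for n in range(1, terms + 1))

def rho_third_closed(p):
    """Closed form (1-p)^2 / (1 - p(1-p)) for rho([0, 1/3])."""
    return (1 - p) ** 2 / (1 - p * (1 - p))
```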

It might feel like something is missing because we haven’t examined irrational numbers. Well, not really, because the technique used to do $1/3$ clearly applies to them, as we only need a binary expansion of the desired irrational. But still, this is not quite satisfactory, because the irrationals that we know and love like $1/e$ or $\frac{2+\sqrt2}4$ have a rather complicated and as far as I know patternless binary expansion, so we will not be able to get any nice formula for them. On the other hand, one can construct some silly irrationals like the binary Liouville constant
$\ell = \sum_{n\in\mathbb N} 2^{-n!} \approx 0.110001000000000000000001,$
whose binary expansion is indeed very simple: every $n!$th binary digit is a one, and the rest are zeroes. The measure of the $[0,\ell]$ interval is then
$\rho([0,\ell]) = \sum_{n\in \mathbb N} \left(\frac{p}{1-p}\right)^{n-1} (1-p)^{n!},$
which I have no idea how to sum (except for the case $p=1/2$ ;)
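The sum is at least easy to evaluate numerically, and for $p = 1/2$ it reduces to $\sum_n 2^{-n!} = \ell$ itself, as it must, since the fair coin gives the Lebesgue measure. An illustrative Python sketch, truncating both sums at the same number of terms:

```python
from math import factorial

def liouville(terms):
    """Truncation of the binary Liouville constant sum_n 2^(-n!)."""
    return sum(2.0 ** -factorial(n) for n in range(1, terms + 1))

def rho_liouville(p, terms):
    """Truncation of rho([0, l]) = sum_n (p/(1-p))^(n-1) (1-p)^(n!)."""
    return sum((p / (1 - p)) ** (n - 1) * (1 - p) ** factorial(n)
               for n in range(1, terms + 1))
```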

But I feel that something different is still missing. We have constructed a probability measure over the set of coin tosses, but what I’m used to thinking of as “the probability” for uncountable sets is the probability density, and likewise I’m used to visualizing a probability measure by plotting its density. Maybe one can “derive” the measure $\rho$ to obtain a probability density over the set of coin tosses? After all, the density is a simple derivative for well-behaved measures, or the Radon-Nikodym derivative for more naughty ones. As it turns out, $\rho$ is too nasty for that. The only condition a probability measure needs to satisfy in order to have a probability density is that it must attribute measure zero to every set of Lebesgue measure zero, and $\rho$ fails this condition. To show that, we shall construct a set $E$ such that its Lebesgue measure $\lambda(E)$ is zero, but $\rho(E)=1$.

Let $E_n$ be the set of infinite sequences that start with an $n$-bit sequence containing at most $k$ ones[6]. Then
$\rho(E_n) = \sum_{i=0}^k \binom ni p^i (1-p)^{n-i}$
and
$\lambda(E_n) = 2^{-n} \sum_{i=0}^k \binom ni$
These formulas might look nasty if you haven’t fiddled with entropies for some time, but they actually have rather convenient bounds, valid for $p < k/n < 1/2$:
$\rho(E_n) \ge 1 - 2^{-n D\left( \frac kn || p\right)}$
and
$\lambda(E_n) \le 2^{-n D\left( \frac kn || \frac 12\right)},$
where $D(p||q)$ is the relative entropy of $p$ with respect to $q$. They show that if $k/n$ is smaller than $1/2$ then $\lambda(E_n)$ is rather small (loosely speaking, the number of sequences whose fraction of ones is strictly less than $1/2$ is rather small), and that if $k/n$ is larger than $p$ then $\rho(E_n)$ is rather close to one. So, loosely speaking, what this measure does is weight the counting of sequences towards $p$ instead of $1/2$: if $k/n$ were smaller than $p$ then $\rho(E_n)$ would also be rather small.
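These are just the standard Chernoff-style relative-entropy bounds on binomial tails, and they are easy to check numerically (illustrative Python, with $D$ in bits and my own function names):

```python
from math import comb, log2

def D(a, b):
    """Binary relative entropy D(a || b), in bits."""
    return a * log2(a / b) + (1 - a) * log2((1 - a) / (1 - b))

def rho_En(n, k, p):
    """rho(E_n): probability that an n-bit prefix has at most k ones."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def lam_En(n, k):
    """Lebesgue measure of E_n: fraction of n-bit strings with at most k ones."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n
```

For instance, with $n=100$, $k=37$, $p=1/4$ (so $p < k/n < 1/2$), both bounds hold comfortably.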

If we now fix $k/n$ in this sweet range (e.g. by setting $k = \lfloor n(p + 0.5)/2\rfloor$)[7] then
$E = \bigcap_{i \in \mathbb N} \bigcup_{n \ge i} E_n,$
is the set we want, some weird kind of limit of the $E_n$: the set of sequences that belong to infinitely many $E_n$. Then I claim, skipping the boring proof, that
$\rho(E) = 1$
and
$\lambda(E) = 0$

But don’t panic. Even without a probability density, we can still visualize a probability measure by plotting its cumulative distribution function
$f(x) = \rho([0,x]),$
which for $p = 1/4$ is this cloud-like fractal:
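In case you want to reproduce such a plot, $f$ is easy to compute digit by digit: each binary digit 1 of $x$ contributes the measure of the dyadic interval that $[0,x]$ fully covers at that position. An illustrative Python sketch (my own naming, truncating at $n$ digits):

```python
def f(x, p, n=50):
    """Approximate the CDF f(x) = rho([0, x]) from the first n binary digits of x.
    Whenever a digit of x is 1, the set of sequences that agree with x up to that
    point but have a 0 there lies entirely inside [0, x], and contributes
    p^(#ones so far) * (1-p)^(#zeros so far + 1) to the measure."""
    total, ones, zeros = 0.0, 0, 0
    for _ in range(n):
        x *= 2
        if x >= 1:      # this binary digit of x is 1
            x -= 1
            total += p ** ones * (1 - p) ** (zeros + 1)
            ones += 1
        else:           # this binary digit of x is 0
            zeros += 1
    return total
```

Sanity checks: $f(1/2) = 1-p$, the measure of sequences starting with 0, and $f(1/3)$ matches the geometric series above. Evaluating $f$ on a fine grid for $p=1/4$ reproduces the fractal.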