I started writing the previous post as an introduction to another subject, but it got too long and I decided to publish it separately. What I actually wanted to talk about is the following question: what happens if you are doing device-independent quantum key distribution (DIQKD) experimentally, and you violate the Tsirelson bound?
I don’t mean in the sense of violating quantum mechanics, but doing it in the way quantum mechanics allows. If you do the experiment perfectly, then the probability of winning each round of the CHSH game is exactly the Tsirelson bound $\omega = \frac{2+\sqrt{2}}{4}$. Then the probability of getting a number of victories $v$ out of $n$ rounds of the CHSH game such that
\[\frac{v}{n} > \frac{2+\sqrt{2}}{4}\] is given by
\[ \sum_{v = \lceil n \omega \rceil}^n \binom{n}{v} \omega^v(1-\omega)^{n-v}. \] This is not small at all: it is equal to $\omega$ for $n=1$, and goes to 1/2 for large $n$.
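If you want to check this numerically, here’s a minimal sketch (Python with scipy; the round numbers are just for illustration):

```python
# P(v/n > omega) for v ~ Binomial(n, omega): a perfect device "violating"
# the Tsirelson bound by chance.
from math import ceil, sqrt
from scipy.stats import binom

omega = (2 + sqrt(2)) / 4  # Tsirelson bound, ~0.8536

for n in [1, 10, 100, 10_000, 1_000_000]:
    # n*omega is never an integer, so v/n > omega iff v >= ceil(n*omega)
    print(n, binom.sf(ceil(n * omega) - 1, n, omega))
# n = 1 gives omega itself, and the probability tends to 1/2 as n grows.
```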
So yeah, it’s perfectly possible to violate the Tsirelson bound, and it is not a matter of experimental error or doing too few rounds. On the contrary, experimental error is precisely what makes it very unlikely to win the CHSH game too often. This is very unsatisfactory, though: we are relying on experimental error to sweep the problem under the rug. Clearly DIQKD must also work in the ideal case.
Even if you only care about the realistic case, there’s a different scenario where this matters: as proposed by Brown and Fawzi$^{\otimes 2}$, one can use an estimate of the whole probability distribution to do DIQKD instead of only the probability of winning the CHSH game. This makes it harder for the eavesdropper to cheat, and thus gives us better key rates. The problem is that we’re now dealing with a high-dimensional object instead of a one-dimensional parameter, and we need the estimates of all the parameters to land in the quantum-allowed region. The probability that at least one falls outside is appreciable. It’s hard to give a precise statement about this, because it will depend on the quantum state and the measurements you are doing, but the fact of the matter is that experimentalists routinely get estimates outside of the quantum-allowed region. For simplicity, though, we’ll focus on the one-dimensional case.
Why wouldn’t it work, though? What is the problem with violating the Tsirelson bound? The idea of DIQKD is that Alice and Bob play the CHSH game, calculate the frequency of winning $\frac{v}{n}$, and do an optimization over all quantum states with winning probability equal to $\frac{v}{n}$, picking the worst one, that is, the one that gives Eve the most information about the key they’re trying to generate. Well, if $\frac{v}{n} > \frac{2+\sqrt{2}}{4}$ there’s just no quantum state with this winning probability, so you can’t figure out how much information Eve can have in this way.
What can we do then? One obvious solution is to say that the winning probability $p$ is equal to $\frac{2+\sqrt{2}}{4}$. After all, this is the closest we can get to the frequency $\frac{v}{n}$ while staying in the range allowed by quantum mechanics. That’s not a good idea, though. We would be assuming that Eve has no information whatsoever about Alice and Bob’s key, while it is perfectly possible that $p$ is slightly smaller than $\frac{2+\sqrt{2}}{4}$, which would give her a bit of information. In fact, the probability that $p$ is exactly $\frac{2+\sqrt{2}}{4}$ is zero, just because $p$ is a continuous parameter. It is very likely, on the other hand, that $p$ is close to $\frac{2+\sqrt{2}}{4}$. This is what you should assume. And this is not even related to violating the Tsirelson bound. Even if you find that $\frac{v}{n} = 0.8$, it would be stupid to assume that $p=0.8$, as it’s almost certainly not. Assuming a flat prior over the quantum-allowed region, the probability density of $p$ is given by
\[f(p|v,n) = \frac{p^v(1-p)^{n-v}}{\int_{1-\omega}^{\omega}\mathrm{d}q\, q^v(1-q)^{n-v}},\] for $p \in [1-\omega,\omega]$ and zero otherwise.
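As a sanity check, here’s a small sketch of this posterior (Python with scipy, under the same flat-prior assumption; the values of $v$ and $n$ are made up):

```python
# Posterior f(p|v,n): a Beta(v+1, n-v+1) density truncated to [1-omega, omega].
import numpy as np
from math import sqrt
from scipy.stats import beta
from scipy.special import betainc  # regularized incomplete beta I_x(a, b)

omega = (2 + sqrt(2)) / 4

def posterior_density(p, v, n):
    a, b = v + 1, n - v + 1
    # normalization over the quantum region; the Beta-function factors cancel
    norm = betainc(a, b, omega) - betainc(a, b, 1 - omega)
    return np.where((p >= 1 - omega) & (p <= omega), beta.pdf(p, a, b) / norm, 0.0)

# Even for a frequency above the Tsirelson bound, v/n = 0.86 > omega, the
# posterior doesn't vanish: it just piles up against omega from below.
p_grid = np.linspace(1 - omega, omega, 1001)
print(p_grid[np.argmax(posterior_density(p_grid, 86, 100))])  # ~omega
```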
Which finally brings us to the DIQKD papers I mentioned in the previous post. How did they deal with this problem? It turns out, they did something completely different. They set some expected winning probability $p_0$ and some tolerance $\delta$, and if the measured frequency $v/n$ is at least $p_0-\delta$, they assume that the actual probability $p$ is also at least $p_0-\delta$.
I find that very strange: they are not using the measured frequency for anything other than this test, and the key rate is calculated solely based on $p_0-\delta$. This is not wrong, I must emphasize: the probability that the winning probability is at least $p_0-\delta$ given that the frequency is at least $p_0-\delta$ is indeed very high, and they have a proper security proof. I just find it bizarre that they are discarding valuable information, as using the measured frequency can give you a much better idea of what the actual winning probability is. For example, if the measured frequency is very close to $p_0-\delta$, then the probability that the winning probability is at least $p_0-\delta$ is close to 1/2. Not as high as we’d like. On the other hand, if the measured frequency is much higher than $p_0-\delta$, the winning probability is likely much higher, and you’re needlessly lowering your key rate by being so pessimistic.
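To put numbers on this (a sketch under the same flat prior over the quantum region; the values of $p_0$, $\delta$ and $n$ are made up):

```python
# P(p >= p0 - delta | v, n) under a flat prior over [1-omega, omega].
from math import sqrt
from scipy.special import betainc  # regularized incomplete beta I_x(a, b)

omega = (2 + sqrt(2)) / 4

def posterior_tail(t, v, n):
    a, b = v + 1, n - v + 1
    hi, lo = betainc(a, b, omega), betainc(a, b, 1 - omega)
    return (hi - betainc(a, b, t)) / (hi - lo)

p0, delta, n = 0.85, 0.05, 10_000
t = p0 - delta
print(posterior_tail(t, int(t * n), n))     # frequency right at the threshold: ~0.5
print(posterior_tail(t, int(0.84 * n), n))  # frequency well above it: ~1.0
```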
Just to throw in my perspective, as someone working on security proofs. :) Let’s focus on the IID case so that the parameter “the true winning probability” is actually a well-defined value in the first place. The basic problem we face is that from a frequentist perspective, the question “what is the probability that the true parameter value is at least q, conditioned on the observed frequency being at least p?” is simply not well-posed, for the usual frequentist reason that the parameter is not a random variable. On the other hand, a Bayesian perspective would require a prior on the distribution of the “true winning probability”, but this is not something we have in an adversarial context.
So to work around this, we instead ask the valid frequentist question of “what is the probability that the observed frequency is at least p, conditioned on the true parameter value being at most q?” (the reversal of the bound direction in the second half of the statement is intentional; essentially, the case being analyzed here is the abort-with-high-probability case), and construct a security proof based on that. However, it’s not completely trivial to construct an argument using only this direction of the conditional probabilities, and life is much easier if (in this statement) p and q are fixed constants rather than functions of the observed frequency (which could introduce an unpleasant risk of “cyclic” dependencies in the argument).
Off the top of my head, I believe that trying to get bounds that depend on the observed frequencies would be possible in principle, but one would face the unpleasant task of handling an entire family of such statements parametrized by carefully chosen values for p and q, and generally I try to avoid such headaches. :)
Thanks for chiming in! I think this is yet another example in a long pile of examples about how frequentism doesn’t make any sense. What you actually want to know, and what actually matters for security, is the conditional probability $P(\omega \ge q| f \ge p)$. You can’t base your security on the “frequentist question” $P(f \ge p|\omega \le q)$. The parameter that enters your optimization problem (minimize the key rate over the quantum states compatible with your statistics) is the true probability, and that’s what you need to estimate. Yes, you do require a prior distribution on $\omega$ in order to estimate it based on $f$. But that’s the point: you do require it, there’s no way around that. Why do you claim we don’t have that in an adversarial context? Of course we do, and we use a probability distribution precisely because we don’t know whether an adversary is listening or not. If we are pretty sure an adversary is listening, then we will require much better statistics to believe we have a secure key. On the other hand, if we are pretty sure there’s no adversary, a few rounds suffice. There is no way to get a secure key without making any assumptions about the adversary, because it’s always possible to get any statistics at all just by chance. When you pretend that this is possible by doing the calculations the frequentist way, you are just making an arbitrary implicit assumption about what the prior is. I think it’s much better to make your assumptions explicit.
Funny you mention the non-IID case, as it is something that I’m working on right now. Conceptually it’s easy: you just redefine the $n$ rounds of the CHSH game as a single nonlocal game. You win the combined game if you win more than $(3/4+\delta)n$ rounds of the individual games ($\delta$ is a fixed parameter). Then this winning probability is well-defined, even in the non-IID case, and if $(3/4+\delta)$ is close to the Tsirelson bound, a victory in the combined game does imply that you have a very entangled state and therefore Eve has very little information. Technically, though, it is a nightmare to transform this into a proper security proof.
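In symbols: writing $w_i \in \{0,1\}$ for whether round $i$ was won, the combined game is won exactly when
\[\sum_{i=1}^n w_i > \left(\frac{3}{4}+\delta\right) n,\] and this event has a well-defined probability for any strategy, IID or not.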
Ah, I don’t agree with the claim that we “are just making an arbitrary implicit assumption about what the prior is”. To clarify, my preference is to work within things like the Abstract Cryptography framework of Maurer/Renner/Portmann etc, for perhaps obvious reasons. :) In this framework, one proves security statements that really do hold for all adversarial strategies, genuinely without a prior (apart from the required “background” (DI)QKD assumptions like Alice and Bob having separate labs, etc – maybe one could consider those part of a prior, but I usually would not). It is just that the statements are, by necessity, not of the form “conditioned on observing this result, the probability that [bad event happened] is at most x”. Of course, one could ask what value there is in proving the security statements used in that framework – but I would say that the statements are indeed valuable, because they can be operationally interpreted as implying that the maximum probability of any “bad” event (which can be entirely arbitrary, as long as it is a well-defined event!) increases by at most some small epsilon when replacing an idealized key generation functionality with the actual (DI)QKD protocol in some larger protocol.
In the Bayesian version, I don’t see a comfortable way to address the question of how to handle the prior being “Eve always implements a completely insecure classical behaviour in the devices”, because in that case, as you said, conditioning on the protocol accepting still does not let one bound the probability of basically any “bad” event as less than 1. One could claim that this is a stupid prior, and…well, it is “obviously” stupid, but really, why? It seems hard to give a clear reason why it should be ignored, because it is something Eve could really choose to do. In contrast, the trace-distance security property under the Abstract Cryptography framework is really proven to hold for all strategies by Eve, or in this language, for all priors on Eve’s behaviour – we genuinely do not choose a prior when proving this property holds. (To discuss how it handles the specific “Eve always implements an entirely classical behaviour” prior/strategy might be a bit too much of a diversion, but basically the high abort probability ensures that it remains sufficiently indistinguishable from the ideal functionality. One could perhaps incorporate similar ideas into a Bayesian approach to handle the “stupid prior”, but if we’re at the point where we’re proving statements that hold for various choices of prior, we might as well go all the way and prove statements that hold for all choices of prior.)
And as you observe, I feel the situation regarding the priors gets worse when you consider the non-IID case, because the set of strategies increases dramatically. Indeed one could partition the set into families parametrized by a smaller number of parameters, such as what you described, but again it doesn’t seem particularly clear to me why a particular parametrization might be more “natural” or what prior one should take on the parameters. (I also agree that the security proofs here would be much more of a pain…to my knowledge, any self-testing results that might be candidates for what you described are not very robust. Although perhaps that’s not your goal.)
Indeed, I don’t take these security claims seriously. The framework is claiming to prove things that are logically impossible. It’s just using bad assumptions.
I’m not sure I understood what you meant by “Eve always implements a completely insecure classical behaviour in the devices”. Are you saying that Alice and Bob’s devices screw up the privacy amplification part of the protocol, thus making it insecure? Well you just need to assume that they don’t, otherwise the protocol is insecure period, independent of any security proofs you’re using.
Let me give you a concrete example about why a prior is necessary. Suppose you are given a coin with bias towards heads promised to be either $L=3/4$ or $Q=(2+\sqrt{2})/4$, and you want to know which is the case. You then flip the coin 100 times, and observe 85 heads. It must have bias $(2+\sqrt{2})/4$, right? Well, no, it depends on the prior. If the prior probability of having bias $3/4$ is $p$, then
\[\frac{p(L)}{p(Q)} = \frac{L^{85}(1-L)^{15}}{Q^{85}(1-Q)^{15}}\frac{p}{1-p},\] which will be larger than 1 if $p \ge 0.96$.
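If you want to see the numbers, a quick sketch (Python, just reproducing the arithmetic above):

```python
# Posterior odds of bias L versus bias Q after observing 85 heads in 100 flips.
from math import sqrt

L, Q = 3 / 4, (2 + sqrt(2)) / 4
heads, tails = 85, 15

likelihood_ratio = (L / Q) ** heads * ((1 - L) / (1 - Q)) ** tails
print(likelihood_ratio)  # ~0.051: the data alone favour Q by a factor of ~20

for p in [0.5, 0.9, 0.96]:
    print(p, likelihood_ratio * p / (1 - p))  # exceeds 1 only for p above ~0.95
```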
How would you solve this the “frequentist” way? Simply taking the likelihood ratio and pretending there is no prior? Well, this is just arbitrarily and implicitly assuming that the prior is flat. Which I’m sure is what the Abstract Cryptography framework is doing. Not because I’ve looked at it, but because every time I’ve seen a frequentist claiming that they are doing away with the prior they were simply assuming it is flat.
Let’s change the description of the problem. Instead of simply being promised that the bias is either $L$ or $Q$, we’re also promised that it was chosen by a genuine QRNG with bias $p$ towards $L$. Would you then accept using the prior?
Oh, to rephrase the prior I was describing, I meant the prior of “the devices’ winning probability is always w=0.75, and Eve gets a perfect copy of all the outputs” (this is possible because for w=0.75 the devices could just be classical). Then even conditioned on accepting, Eve still knows everything about the key.
And ah, I am very familiar with the default limitations of the frequentist framework, and I certainly understand that it may not give good answers to the question you picked – but this doesn’t mean it can’t give good answers to different questions. I suppose this is not the easiest format for discussing how Abstract Cryptography avoids these issues, so maybe we can just discuss it if we have a chance to meet at a workshop sometime (…eventually? hopefully?). :) But to take a very brief shot at it, the security definitions we use specifically avoid answering anything like the question of “what strategy is the adversary using?”, because that runs into the default issues you mentioned. Instead, we prove statements of the form (informally) “for any strategy by Eve, the actual state produced in the protocol will be close in trace distance to a state that has the same probability of producing the abort outcome, but otherwise produces a perfect secret key” – the formal expression is basically just page 20 of https://arxiv.org/pdf/1409.3525.pdf (specifically Eq. (12), though the definitions of the individual terms are on the rest of the page).
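Schematically (my paraphrase, not the paper’s exact notation), such a statement has the form
\[\frac{1}{2}\left\| \rho_{KE} - \Big( p_\Omega\, \tau_K \otimes \rho_{E|\Omega} + (1-p_\Omega)\, |{\perp}\rangle\langle{\perp}|_K \otimes \rho_{E|\neg\Omega} \Big) \right\|_1 \le \varepsilon,\] where $\Omega$ is the event that the protocol accepts, $p_\Omega$ its probability (which depends on Eve’s strategy), $\tau_K$ the uniformly distributed key, and $|\perp\rangle$ a flag marking an abort.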
Hopefully, it at least looks possible to you that such a statement could be rigorously proven over all strategies (or in other words, any prior! I must insist we are not using a flat prior!) by Eve – the trick here is that the abort probability is a function of the strategy/prior, and when this probability is high, the abort event completely dominates the state and so the trace distance discussed above can still be small. (In particular, this is why it holds even for the “stupid prior” of Eve always setting w=0.75.) But certainly, one can question whether such a trace-distance bound is practically useful – that would be the much longer discussion, though I have to say I am myself surprised by how powerful a statement one can eventually extract from such a bound. :)
Ok, now I understand what you are talking about: this is a prior distribution over the true probability of winning. You are considering the case where the prior is $\delta(\omega-\frac34)$. In this case Alice and Bob will always conclude that the probability is $\frac34$, regardless of the statistics they observe, and thus will always abort. It doesn’t compromise security, but it does make the protocol fail completeness. A more insidious prior would be $\delta(\omega-\frac{2+\sqrt2}4)$. Now Alice and Bob will always believe they have a perfect key, regardless of their statistics, and thus the protocol fails soundness. There’s a general prohibition against using these inflexible priors in Bayesian statistics, because they amount to simply ignoring your statistics.
I’ve taken a look at the paper, and the explanation for our disagreement is very simple: you’re not doing statistics at all, you’re only working in the asymptotic limit. Well, then of course the prior does not matter: you are always assuming that you simply know the exact probability. It is not about Bayesian versus frequentist statistics.
The prior matters when we’re talking about a realistic scenario, where you made a finite number of measurements, and you want to make a conclusion about your security from that. This blog post is about the finite case.