# The horrifying world of confidence intervals

We often see experimental results reported with some “error bars”, such as saying that the mass of the Higgs boson is $125.10 \pm 0.14\, \mathrm{GeV/c^2}$. What do these error bars mean, though? I asked some people what they thought they were, and the usual answer was that the true mass is inside those error bars with high probability. A very reasonable thing to expect, but it turns out that this is not true. Usually these error bars represent a frequentist confidence interval, which has a very different definition: it says that if you repeat the experiment many times, a high proportion of the confidence intervals you generate will contain the true value.
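To make the frequentist definition concrete, here is a minimal simulation sketch. It assumes a toy experiment that is not from any real measurement: Gaussian noise with a known standard deviation, and a textbook 95% interval around the sample mean. The function name `coverage` and all the numbers are hypothetical, chosen only for illustration.

```python
import math
import random


def coverage(true_value=10.0, sigma=2.0, n=25, trials=2000, z=1.96, seed=0):
    """Fraction of repeated experiments whose interval contains true_value.

    Assumes Gaussian noise with known sigma; z = 1.96 is the two-sided
    95% quantile of the standard normal distribution.
    """
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # One "experiment": n noisy measurements, summarised by their mean.
        sample_mean = sum(rng.gauss(true_value, sigma) for _ in range(n)) / n
        # The interval is sample_mean +/- half_width; does it cover the truth?
        if abs(sample_mean - true_value) <= half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The printed fraction should come out close to 0.95. Note that this is a statement about the procedure over many repetitions, not about any single interval.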

Fair enough, one can define things like this, but I don’t care about hypothetical confidence intervals of experiments I didn’t do. Can’t we have error bars that represent what we care about, the probability that the true mass is inside that range? Of course we can, that is a Bayesian credible interval. Confusingly enough, credible intervals will coincide with confidence intervals in most cases of interest, even though they answer a different question and can be completely different in more exotic problems.

Let’s focus then on the Bayesian case: is the intuitive answer people gave correct then? Yes, it is, but it doesn’t help us define what the credible interval is, as there will be infinitely many intervals that contain the true value with probability (e.g.) 0.95. How do we pick one? A nice solution would be to demand that the credible interval be symmetric around the estimate, so that we could have the usual $a\pm b$ result. But think about the most common case of parameter estimation: we want to predict the vote share that some politician will get in an election. If the poor candidate was estimated to get 2% of the votes, we can’t have error bars of $\pm$4%. Even if we could do that, there’s no reason why the interval should be symmetric: it’s perfectly possible that a 3% vote share is more probable than a 1% vote share.

A workable, if more cumbersome, definition is the Highest Posterior Region: it is a region where all points inside it have a higher posterior density than the points outside it. It is well-defined, except for some pathological cases we don’t care about, and is also the smallest possible region containing the true value with a given probability. Great, no? What could go wrong with that?

Well, for starters it’s a region, not an interval. Think of a posterior distribution that has two peaks: the highest posterior region will be two intervals, each centred around one of the peaks. It’s not beautiful, but it’s not really a problem, the credible region is accurately summarizing your posterior. Your real problem is having a posterior with two peaks. How did that even happen!?
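As a rough illustration, the sketch below computes a highest posterior region on a grid. The two-peaked posterior is a made-up equal mixture of two Gaussians, and the helper names (`posterior`, `highest_posterior_region`) are mine, not from any library. The idea is simply to add grid points in order of decreasing density until the requested probability mass is reached.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior(x):
    # Hypothetical two-peaked posterior: equal mixture of two Gaussians.
    return 0.5 * normal_pdf(x, -2.0, 0.5) + 0.5 * normal_pdf(x, 2.0, 0.5)


def highest_posterior_region(pdf, lo, hi, mass=0.95, n=20001):
    """Return the region as a list of (start, end) intervals on a grid."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    ps = [pdf(x) for x in xs]
    # Visit grid points from highest to lowest density.
    order = sorted(range(n), key=lambda i: ps[i], reverse=True)
    inside = [False] * n
    total = 0.0
    for i in order:
        if total >= mass:
            break
        inside[i] = True
        total += ps[i] * dx
    # Merge consecutive selected grid points into intervals.
    intervals = []
    start = None
    for i in range(n):
        if inside[i] and start is None:
            start = xs[i]
        if start is not None and (not inside[i] or i == n - 1):
            end = xs[i] if inside[i] else xs[i - 1]
            intervals.append((start, end))
            start = None
    return intervals


region = highest_posterior_region(posterior, -6.0, 6.0)
print(region)
```

For this posterior the region comes out as two disjoint intervals, one around each peak, and the point halfway between the peaks is in neither of them.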

But this shows a more serious issue: the expectation value of a two-peaked distribution may well lie between the peaks, and so will almost certainly be outside the highest posterior region. Can this happen with a more well-behaved posterior, one that has a single peak? It turns out it can. Consider the probability density
$p(x) = (\beta-1)x^{-\beta},$
defined for $x \ge 1$ and $\beta > 2$. To calculate the highest posterior region for some confidence $1-\alpha$, note that $p(x)$ is monotonically decreasing, so the region is an interval $[1,\gamma]$ and we just need to find $\gamma$ such that
$\int_1^\gamma \mathrm{d}x\, (\beta-1)x^{-\beta} = 1-\alpha.$
The integral evaluates to $1-\gamma^{-(\beta-1)}$, so solving we get $\gamma = \frac1{\sqrt[\beta-1]{\alpha}}$. As our estimate of the (fictitious) parameter we take the mean of $p(x)$, which is $\frac{\beta-1}{\beta-2}$. For the estimate to be outside the credible interval we then need that
$\frac{\beta-1}{\beta-2} > \frac1{\sqrt[\beta-1]{\alpha}},$
which is a nightmare to solve exactly, but easy enough if we realize that the mean diverges as $\beta$ gets close to 2, whereas the upper boundary of the credible interval tends to a finite value, $1/\alpha$, and stays below it for every $\beta > 2$. If we then choose $\beta$ such that the mean is $1/\alpha$, it will always be outside the credible interval!
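A quick numerical sanity check of this construction, as a sketch: the function names are mine, and the only extra step is solving $\frac{\beta-1}{\beta-2} = m$ for $\beta$, which gives $\beta = \frac{2m-1}{m-1}$.

```python
def hpd_upper(beta, alpha):
    # gamma = alpha^(-1/(beta-1)), the upper end of the region [1, gamma].
    return alpha ** (-1.0 / (beta - 1.0))


def mean(beta):
    # Mean of p(x) = (beta-1) x^(-beta) on x >= 1, finite only for beta > 2.
    return (beta - 1.0) / (beta - 2.0)


def cdf(x, beta):
    # Integral of p from 1 to x.
    return 1.0 - x ** (-(beta - 1.0))


def beta_with_mean(m):
    # Solve (beta-1)/(beta-2) = m for beta; requires m > 1.
    return (2.0 * m - 1.0) / (m - 1.0)


for alpha in (0.32, 0.05, 0.01):
    beta = beta_with_mean(1.0 / alpha)
    gamma = hpd_upper(beta, alpha)
    print(alpha, beta, gamma, mean(beta), mean(beta) > gamma)
```

For each $\alpha$ the last column should be `True`: the mean sits at $1/\alpha$, strictly above $\gamma$.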

A possible answer is “deal with it, life sucks. I mean, there’s a major war going on in Ukraine, estimates lying outside the credible interval is the least of our problems”. Fair enough, but maybe this means we chose our estimate wrong? If we take our estimate as the mode of the posterior then by definition it will always be inside the highest posterior region. The problem is there’s no good justification for using the mode as the estimate. The mean can be justified as the estimate that minimizes the mean squared error, which is quite nice, but I know of no similar justification for the mode. Also, the mode is rather pathological: if our posterior again has two peaks, but one of them is very tall and has little probability mass, the mode will be there but will be a terrible estimate.

A better answer is that sure, life sucks, we have to deal with it, but note that the probability distribution $(\beta-1)x^{-\beta}$ is very pathological. It will not arise as a posterior density in any real inference problem. That’s fine, it just won’t help against Putin. Slava Ukraini!


### 5 Responses to The horrifying world of confidence intervals

1. Danylo says:

I think the straightforward answer is to keep track of the whole posterior distribution instead of just the credible interval in complex cases (that have two peaks, for example).

And thank you for supporting Ukraine.
Heroyam slava!

2. Hi Mateus, in principle the mode of the posterior can be justified as the estimate that minimises the mean abs-value |…| rather than the mean squared error (…)^2. However, for your pathological pdf one would need to go through the derivation, as non-regularity at x->0 may be a problem. Maybe we can do it quickly in Vienna for fun, if Miguel allows me to come to the Tsirelson workshop.

Slava Ukraini!

Janek

P.S. Danylo, if you or any other researcher from Ukraine working on quantum info/quantum physics needs a funded research visit/full-time contract in Poland, feel free to let me know.

3. Mateus Araújo says:

Hi Jan,

The mode of the posterior does not minimise the mean abs-value, even in non-pathological cases. Take for example the posterior to be the well-behaved $12 p^2(1-p)$, for $p\in[0,1]$. The mode is 2/3, the mean is 3/5, but the value that minimises the mean abs-value is some horrible constant approximately equal to 0.61.
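These three numbers are easy to check numerically. The sketch below uses the closed-form CDF $4p^3 - 3p^4$ of this posterior and a plain midpoint rule for the expected absolute error; the function names are only illustrative.

```python
def pdf(p):
    return 12.0 * p * p * (1.0 - p)


def cdf(p):
    # Integral of 12 p^2 (1 - p) from 0 to p.
    return 4.0 * p ** 3 - 3.0 * p ** 4


def median(tol=1e-12):
    # Bisection on cdf(p) = 1/2; the CDF is increasing on [0, 1].
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def expected_abs_error(a, n=20000):
    # Midpoint-rule approximation of the integral of |p - a| pdf(p) over [0, 1].
    dp = 1.0 / n
    return sum(abs((i + 0.5) * dp - a) * pdf((i + 0.5) * dp) for i in range(n)) * dp


for name, a in (("mode", 2.0 / 3.0), ("mean", 3.0 / 5.0), ("median", median())):
    print(name, a, expected_abs_error(a))
```

Of the three candidates, the median should show the smallest expected absolute error.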

4. Mea culpa. I mixed up the cost functions for Bayesian estimators. What you got is actually the median (just checked, your 0.61 is a solution to a 4th-order equation), i.e. the integral from 0 to 0.61 of your posterior is equal to the integral from 0.61 to 1. In order to arrive at the mode (maximum) of the posterior as the optimal estimator, one must choose instead the “hit-or-miss” error, i.e. $C(p-p_\text{true})=0$ if $|p-p_\text{true}| < \delta$ for some small $\delta > 0$, and 1 otherwise. This will correctly give you the mode (maximum), as its minimisation corresponds to maximising an integral of the posterior over an interval of fixed width that you can shift left or right. In summary, (…)^2 gives the mean, |…| gives the median, and “hit-or-miss” gives the mode (maximum), at least for non-pathological distributions. I should have known this correctly by heart. Apologies. The question is under what circumstances you would interpret the “hit-or-miss” error as well-motivated in inference problems.

5. Mateus Araújo says:

Ok, that works, in the limit $\delta \to 0$. Thanks, I didn’t know that. Seems like a terrible cost function, though: you only care about getting the estimate exactly right, and don’t care about how wrong you are in the case you miss. I’ll stay with the mean.
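For the curious, the $\delta \to 0$ limit can be seen numerically on the same posterior $12 p^2(1-p)$. This is only a sketch with a brute-force grid search, and the function names are made up for illustration. Minimising the expected hit-or-miss cost is the same as maximising the posterior mass in the window $[a-\delta, a+\delta]$.

```python
def cdf(p):
    # CDF of the posterior 12 p^2 (1 - p), clipped to [0, 1].
    p = min(max(p, 0.0), 1.0)
    return 4.0 * p ** 3 - 3.0 * p ** 4


def best_hit_or_miss_estimate(delta, n=20000):
    """Grid search for the a maximising the mass in [a - delta, a + delta]."""
    best_a, best_mass = 0.0, -1.0
    for i in range(n + 1):
        a = i / n
        mass = cdf(a + delta) - cdf(a - delta)
        if mass > best_mass:
            best_a, best_mass = a, mass
    return best_a


for delta in (0.2, 0.1, 0.01):
    print(delta, best_hit_or_miss_estimate(delta))
```

As $\delta$ shrinks, the optimal estimate should move towards the mode at 2/3.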