We often see experimental results reported with some “error bars”, such as saying that the mass of the Higgs boson is $125.10 \pm 0.14\, \mathrm{GeV/c^2}$. What do these error bars mean, though? I asked some people what they thought, and the usual answer was that the true mass is inside those error bars with high probability. A very reasonable thing to expect, but it turns out that this is not true. Usually these error bars represent a frequentist confidence interval, which has a very different definition: it says that if you repeat the experiment many times, a high proportion of the confidence intervals you generate will contain the true value.
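To see the frequentist definition in action, here is a minimal simulation sketch (the numbers are just for illustration; the real measurement is of course far more involved): each simulated experiment reports a 95% interval, and about 95% of those intervals end up covering the assumed true value.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mass = 125.10   # pretend this is the true value (purely hypothetical)
sigma = 0.14         # assumed known measurement error
n_experiments = 100_000

# Each "experiment" measures the mass once with Gaussian noise,
# then reports the 95% confidence interval: estimate +/- 1.96 sigma.
measurements = rng.normal(true_mass, sigma, n_experiments)
covered = np.abs(measurements - true_mass) < 1.96 * sigma
print(covered.mean())  # ~0.95: the frequentist coverage guarantee
```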
Fair enough, one can define things this way, but I don’t care about hypothetical confidence intervals for experiments I didn’t do. Can’t we have error bars that represent what we actually care about, the probability that the true mass is inside that range? Of course we can: that is a Bayesian credible interval. Confusingly enough, credible intervals coincide with confidence intervals in most cases of interest, even though they answer a different question and can be completely different in more exotic problems.
Let’s focus then on the Bayesian case: is the intuitive answer people gave correct there? Yes, it is, but it doesn’t help us define what the credible interval is, as there are infinitely many intervals that contain the true value with probability (e.g.) 0.95. How do we pick one? A nice solution would be to demand that the credible interval be symmetric around the estimate, so that we could have the usual $a\pm b$ result. But think about the most common case of parameter estimation: we want to predict the vote share that some politician will get in an election. If the poor candidate is estimated to get 2% of the votes, we can’t have the error bars be $\pm$4%, as that would include negative vote shares. Even if we could do that, there’s no reason why the interval should be symmetric: it’s perfectly possible that a 3% vote share is more probable than a 1% vote share.
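To make the asymmetry concrete, here is a toy poll (the numbers are invented): with a uniform prior and a binomial likelihood the posterior for the vote share is a Beta distribution, and its equal-tailed 95% interval is visibly lopsided around the mean.

```python
from scipy.stats import beta

# Hypothetical poll: 20 of 1000 respondents support the candidate.
# With a uniform prior the posterior is Beta(1 + 20, 1 + 980).
posterior = beta(1 + 20, 1 + 980)

mean = posterior.mean()                  # ~0.021
lo, hi = posterior.ppf([0.025, 0.975])   # equal-tailed 95% interval, ~(0.013, 0.031)
print(mean - lo, hi - mean)              # the two arms have different lengths
```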
A workable, if more cumbersome, definition is the Highest Posterior Region: it is a region where every point inside it has a higher posterior density than any point outside it. It is well-defined, except for some pathological cases we don’t care about, and is also the smallest possible region containing the true value with a given probability. Great, no? What could go wrong with that?
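One way to compute it numerically, as a sketch under the assumption that the posterior is available on an evenly spaced grid: sort the grid points by density and keep the highest ones until they accumulate the desired probability mass. (The function name here is mine, not standard.)

```python
import numpy as np

def highest_posterior_region(x, density, mass=0.95):
    """Boolean mask of grid points inside the highest posterior region.

    x must be an evenly spaced grid and density the normalized
    posterior evaluated on it; we keep the highest-density points
    until they accumulate `mass` probability.
    """
    dx = x[1] - x[0]
    order = np.argsort(density)[::-1]           # highest density first
    cumulative = np.cumsum(density[order]) * dx
    n_keep = np.searchsorted(cumulative, mass) + 1
    mask = np.zeros(len(x), dtype=bool)
    mask[order[:n_keep]] = True
    return mask
```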
Well, for starters it’s a region, not an interval. Think of a posterior distribution that has two peaks: the highest posterior region will be two intervals, each centred around one of the peaks. It’s not beautiful, but it’s not really a problem: the credible region is accurately summarizing your posterior. Your real problem is having a posterior with two peaks. How did that even happen!?
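Feeding a made-up two-peaked posterior (an equal mixture of two Gaussians) to the sketch above shows the region splitting into two disjoint intervals, and, anticipating the next point, the posterior mean landing in the excluded valley between them.

```python
import numpy as np
from scipy.stats import norm

# Reusing highest_posterior_region from the sketch above.
x = np.linspace(-6, 6, 10_001)
density = 0.5 * norm.pdf(x, -2, 0.5) + 0.5 * norm.pdf(x, 2, 0.5)

mask = highest_posterior_region(x, density, mass=0.95)
print(x[mask].min(), x[mask].max())        # region spans roughly -3 to 3...
print(mask[np.abs(x) < 1].any())           # ...but False: the valley is excluded

dx = x[1] - x[0]
mean = np.sum(x * density) * dx            # ~0 by symmetry
print(mask[np.argmin(np.abs(x - mean))])   # False: the mean is outside the region
```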
But this shows a more serious issue: the expectation value of a two-peaked distribution might very well lie in the valley between the peaks, and this will almost certainly be outside the highest posterior region. Can this happen with a better-behaved posterior, one that has a single peak? It turns out it can. Consider the probability density
\[p(x) = (\beta-1)x^{-\beta},\] defined for $x \ge 1$ and $\beta > 2$ (the latter condition ensures the mean exists). To calculate the highest posterior region for some confidence $1-\alpha$, note that $p(x)$ is monotonically decreasing, so we just need to find $\gamma$ such that
\[\int_1^\gamma \mathrm{d}x\, (\beta-1)x^{-\beta} = 1-\alpha.\]Solving that we get $\gamma = \frac1{\sqrt[\beta-1]{\alpha}}$, so the credible interval is $[1,\gamma]$. As for our estimate of the (fictitious) parameter we take the mean of $p(x)$, which is $\frac{\beta-1}{\beta-2}$. For the estimate to be outside the credible interval we then need that
\[\frac{\beta-1}{\beta-2} > \frac1{\sqrt[\beta-1]{\alpha}},\]which is a nightmare to solve exactly, but easy enough if we realize that the mean diverges as $\beta$ gets close to 2, whereas the upper boundary of the credible interval only grows to a finite value, $1/\alpha$. If we then choose $\beta$ such that the mean is $1/\alpha$, it will always be outside the credible interval!
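A quick numeric sanity check of that argument, using the closed forms above: solving $\frac{\beta-1}{\beta-2} = \frac1\alpha$ gives $\beta = \frac{2-\alpha}{1-\alpha}$, and for $\alpha = 0.05$ the mean indeed lands outside the interval.

```python
alpha = 0.05
beta_ = (2 - alpha) / (1 - alpha)       # solves (beta-1)/(beta-2) = 1/alpha
mean = (beta_ - 1) / (beta_ - 2)        # = 1/alpha = 20.0
gamma = alpha ** (-1 / (beta_ - 1))     # upper end of the 95% credible interval [1, gamma]
print(mean, gamma, mean > gamma)        # 20.0, ~17.2, True
```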
A possible answer is “deal with it, life sucks. I mean, there’s a major war going on in Ukraine, estimates lying outside the credible interval is the least of our problems”. Fair enough, but maybe this means we chose our estimate wrong? If we take our estimate to be the mode of the posterior, then by definition it will always be inside the highest posterior region. The problem is that there’s no good justification for using the mode as the estimate. The mean can be justified as the estimate that minimizes the mean squared error, which is quite nice, but I know of no similar justification for the mode. Also, the mode is rather pathological: if our posterior again has two peaks, but one of them is very tall and narrow, carrying little probability mass, the mode will sit there and yet be a terrible estimate.
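A toy illustration of that pathology, with invented numbers: mix a broad bump carrying 99% of the mass with a tall, extremely narrow spike carrying 1%; the mode sits on the spike while the mean stays where the mass actually is.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 15, 200_001)
# 99% of the mass is a broad bump around 10; 1% is a needle at 0.
density = 0.99 * norm.pdf(x, 10, 1) + 0.01 * norm.pdf(x, 0, 0.001)

mode = x[np.argmax(density)]
dx = x[1] - x[0]
mean = np.sum(x * density) * dx
print(mode, mean)   # mode ~0 (on the needle), mean ~9.9 (where the mass is)
```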
A better answer is that sure, life sucks, we have to deal with it, but note that the probability distribution $(\beta-1)x^{-\beta}$ is very pathological. It will not arise as a posterior density in any real inference problem. That’s fine, it just won’t help against Putin. Slava Ukraini!