The fallacy of confidence in confidence intervals

The counter-intuitiveness of statistical theory

Richard D. Morey (Twitter @richarddmorey)
(ESRC) Bayesian Data Analysis in the Social Sciences Curriculum // September 2017

Neyman (1937) and theory of confidence intervals

What is a confidence procedure?

  1. Sample data from a population.
  2. Compute two numbers from the data $(L, U)$ using some procedure.
  3. Say "The parameter is inside the interval $(L, U)$."

If the procedure is such that the statement in (3) is true \(X\%\) of the time in repeated samples, the procedure is an \(X\%\) confidence procedure.
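
The definition can be checked by simulation. The following is a minimal sketch, assuming a normal population with known standard deviation and the usual \(z\) interval as the procedure; the function names `z_interval` and `coverage`, the seed, and the population values are illustrative choices, not part of the original slides.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, level):
    """Return (L, U) for a z interval with known sigma (illustrative procedure)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

def coverage(mu, sigma, n, level, reps, rng):
    """Fraction of repeated samples whose interval contains the true mu."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma, level)
        hits += lower < mu < upper
    return hits / reps

rng = random.Random(2017)  # arbitrary seed, for reproducibility only
rate = coverage(mu=100, sigma=15, n=10, level=0.95, reps=20000, rng=rng)
print(f"long-run coverage: {rate:.3f}")
```

The coverage is a property of the procedure over repeated samples; the simulation says nothing about any single interval it produced.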

What is a confidence interval?

An \(X\%\) confidence interval is an interval between two numbers, resulting from applying a \(X\%\) confidence procedure.

What, again, is a confidence interval?

  • Does a confidence interval indicate precision of an estimate of a parameter?

  • Does a confidence interval contain "likely" values of a parameter?

  • Does a confidence interval have an \(X\%\) probability of containing the true value?

Properties of CI?

  • Cumming (2014): "[l]ong confidence intervals (CIs) will soon let us know if our experiment is weak and can give only imprecise estimates"

  • Cumming (2014): "[w]e can be 95% confident that our interval includes [the parameter] and can think of the lower and upper limits as likely lower and upper bounds for [the parameter]."

  • Young and Lewis (1997): "[t]he width of the CI gives us information on the precision of the point estimate."

  • Masson and Loftus (2003): "[t]he interpretation of the confidence interval constructed around that specific mean would be that there is a 95% probability that the interval is one of the 95% of all possible confidence intervals that includes the population mean. Put more simply, in the absence of any other information, there is a 95% probability that the obtained confidence interval includes the population mean."

  • Kalinowski (2010): "A good way to think about a CI is as a range of plausible values for the population mean (or another population parameter such as a correlation)."

The fallacies of confidence intervals

The precision fallacy

"The width of a CI indicates the precision of the estimate of the parameter. Narrow CIs indicate precise knowledge, while wide CIs indicate imprecise knowledge."

The likelihood/plausibility fallacy

"The values inside a CI are plausible/likely, or at least more plausible/likely than those outside the CI."

The fundamental confidence fallacy

"If a CI was computed from an \(X\%\) confidence procedure, then the probability that the CI contains the true value is \(X\%\)."

Neyman (1937) and theory of confidence intervals

The Fundamental Confidence Fallacy

"Consider now the case when a sample ... is already drawn and the [confidence interval] given...Can we say that in this particular case the probability of the true value of [the parameter] falling between [$L$ and $U$] is equal to [$1-\alpha$]?

The answer is obviously in the negative."

Neyman (1937), p. 349

Neyman (1937) and theory of confidence intervals

Don't feel guilty: a seminar with Jerzy Neyman (1937).

Young Milton Friedman: "[Professor Neyman,] [y]our statement of probability that he will be correct in 99 per cent of the cases is also equivalent to the statement, is it not, that the probability is 99 out of 100 that [the true parameter] lies between the limits [$L$] and [$U$]?"

Jerzy Neyman: "No. This is the point I tried to emphasize in my first two lectures both in theoretical discussions and in examples."

A simple example from first year statistics

Suppose we draw \(N=10\) normal observations with known variance. \[ y_i \sim \mbox{Normal}(\mu, 15^2) \]

  • \(CI_1\): \(\bar{y}\pm0.674\frac{15}{\sqrt{10}}\) (50% \(z\) interval)
  • \(CI_2\): \(\bar{y}\pm0.703\frac{s}{\sqrt{10}}\) (50% \(t\) interval, 9 degrees of freedom)

Both have the same long-run coverage probability!

They cannot both have the same probability of containing the true value.
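
One way to see this is to simulate both procedures and then look at the \(t\) interval separately for samples with small and large \(s\). This is a sketch under stated assumptions: the true mean of 100, the seed, and the split at \(s < 15\) are illustrative choices, and 0.703 is the table value for the 75th percentile of Student's \(t\) with 9 degrees of freedom.

```python
import random
from statistics import mean, stdev

MU, SIGMA, N = 100.0, 15.0, 10   # MU is an arbitrary illustrative value
Z_50 = 0.674   # 75th percentile of the standard normal
T_50 = 0.703   # 75th percentile of Student's t with N - 1 = 9 df (table value)

rng = random.Random(1937)   # arbitrary seed
reps = 20000
z_hits = t_hits = 0
small = {"n": 0, "hits": 0}   # samples with s < SIGMA (narrow t interval)
large = {"n": 0, "hits": 0}   # samples with s >= SIGMA (wide t interval)

for _ in range(reps):
    y = [rng.gauss(MU, SIGMA) for _ in range(N)]
    error = abs(mean(y) - MU)
    s = stdev(y)
    z_hits += error < Z_50 * SIGMA / N ** 0.5
    t_hit = error < T_50 * s / N ** 0.5
    t_hits += t_hit
    bucket = small if s < SIGMA else large
    bucket["n"] += 1
    bucket["hits"] += t_hit

z_cover = z_hits / reps
t_cover = t_hits / reps
t_cover_small_s = small["hits"] / small["n"]
t_cover_large_s = large["hits"] / large["n"]
print(f"z interval coverage:            {z_cover:.3f}")
print(f"t interval coverage:            {t_cover:.3f}")
print(f"t coverage when s <  sigma:     {t_cover_small_s:.3f}")
print(f"t coverage when s >= sigma:     {t_cover_large_s:.3f}")
```

Both procedures cover about half the time overall, but because \(\sigma\) is known, the observed \(s\) tells us whether the \(t\) interval in hand is narrower or wider than the \(z\) interval, and its hit rate differs accordingly.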

CI on ANOVA effect size

3 groups, \(N = 10\) in each group

True effect size \(\omega^2\)

Consider the "variance" of the standardized true means, \(V\):

\[ V = \frac{\sum_{j=1}^J\left(\frac{\mu_j - \mu_{grand}}{\sigma_\epsilon}\right)^2}{J} \]

Then \(\omega^2\) is

\[ \omega^2 = \frac{V}{1 + V} \]

  • If no effects, \(\omega^2=0\)
  • \(0 \leq\omega^2\leq 1\)
  • \(F\) statistics are larger on average if \(\omega^2\) is larger
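
The two formulas translate directly into a few lines of code. This is a minimal sketch: the function name `omega_squared` and the example means are hypothetical, and \(\mu_{grand}\) is taken to be the unweighted mean of the group means, which assumes equal group sizes.

```python
def omega_squared(means, sigma_eps):
    """Return (V, omega^2) for true group means and error SD sigma_eps.

    Assumes equal group sizes, so the grand mean is the plain average
    of the group means.
    """
    grand = sum(means) / len(means)
    v = sum(((m - grand) / sigma_eps) ** 2 for m in means) / len(means)
    return v, v / (1 + v)

# No effects: all true means equal, so V = 0 and omega^2 = 0.
v_null, w_null = omega_squared([100, 100, 100], sigma_eps=15)

# Hypothetical means one third of an SD apart: V = 2/27, omega^2 = 2/29.
v, w = omega_squared([95, 100, 105], sigma_eps=15)
print(v_null, w_null)
print(round(v, 4), round(w, 4))
```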

Building a 50% CI

Strategy: Two one-sided significance tests at \(\alpha=.25\)

  • Reject \(\omega^2\) values for which the observed \(F\) is "surprisingly large"
  • Reject \(\omega^2\) values for which the observed \(F\) is "surprisingly small"
  • Each one sided test has Type I error rate .25
  • Total error rate .5, for 50% CI.

See Steiger (2004) for details about this CI.

Finding the lower bound

Finding the upper bound

How does the CI work?

Two one-sided significance tests.

  • The observed \(F=5\) is surprisingly large (\(\alpha=0.25\)) when \(\omega^2<0.139\).
  • The observed \(F=5\) is surprisingly small (\(\alpha=0.25\)) when \(\omega^2>0.316\).
  • Every \(\omega^2\) outside \((0.139,0.316)\) is rejected by one of the one-sided tests
  • The 50% confidence interval is all values not rejected at \(\alpha=0.25\).
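
The inversion of the two tests can be sketched in code. Several things here are assumptions rather than part of the slides: the balanced-design mapping \(\lambda = N_{total}\,\omega^2/(1-\omega^2)\) between \(\omega^2\) and the noncentrality parameter, the function names, and the restriction to 2 numerator degrees of freedom (3 groups), which lets the noncentral \(F\) distribution be computed with the standard library alone. The search also assumes the bounds lie below \(\omega^2 = 0.9\).

```python
import math

def ncf_cdf_df1_2(f, df2, lam, terms=500):
    """P(F' <= f) for a noncentral F with df1 = 2, df2, noncentrality lam.

    Poisson mixture of central F CDFs; with df1 = 2 every central term
    has a finite closed form, so no special functions are needed.
    """
    b = df2 / 2
    x = 2 * f / (2 * f + df2)
    tail = (1 - x) ** b
    weight = math.exp(-lam / 2)   # Poisson(lam / 2) probability of j = 0
    coef, partial, total = 1.0, 0.0, 0.0
    for j in range(terms):
        partial += coef
        total += weight * (1 - tail * partial)
        coef *= (b + j) / (j + 1) * x
        weight *= (lam / 2) / (j + 1)
    return total

def solve_omega2(f, df2, n_total, target_cdf, lo=0.0, hi=0.9, iters=60):
    """Find omega^2 with P(F' <= f | omega^2) = target_cdf; 0 if none exists."""
    def cdf_at(w2):
        lam = n_total * w2 / (1 - w2)   # assumed balanced-design mapping
        return ncf_cdf_df1_2(f, df2, lam)
    if cdf_at(lo) <= target_cdf:
        return 0.0   # no omega^2 reaches the target: set the limit to 0
    for _ in range(iters):   # bisection; the CDF decreases as omega^2 grows
        mid = (lo + hi) / 2
        if cdf_at(mid) > target_cdf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def omega2_ci(f, df2, n_total, level=0.5):
    """CI for omega^2 from two one-sided tests, each at (1 - level) / 2."""
    alpha_each = (1 - level) / 2
    # Lower limit: omega^2 at which P(F' >= f) = alpha_each.
    lower = solve_omega2(f, df2, n_total, target_cdf=1 - alpha_each)
    # Upper limit: omega^2 at which P(F' <= f) = alpha_each.
    upper = solve_omega2(f, df2, n_total, target_cdf=alpha_each)
    return lower, upper

lower, upper = omega2_ci(f=5.0, df2=27, n_total=30)
print(f"50% CI for omega^2: ({lower:.3f}, {upper:.3f})")
```

If the assumed mapping matches the one used for the slides, the printed limits should land close to the \((0.139, 0.316)\) quoted above.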

CI on \(\omega^2\)

What happened?

The confidence interval has "disappeared"!

"In some cases...the value of the observed statistic is so low that it is not possible to find a [parameter value] that places it at the required percentage point. Standard procedure in this case is to arbitrarily set the confidence limit at 0..." (Steiger & Fouladi, 1997)

"In extreme cases, a confidence interval might actually have 0 as both endpoints. This zero-width confidence interval obviously does not imply that effect size was determined with perfect precision." (Steiger, 2004)

If CI is \((0,0)\), the \(F\) statistic is "surprisingly low" under all possible \(\omega^2\) values!
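
For the design used here (3 groups of 10, so 2 and 27 degrees of freedom), the \(F\) value below which this happens can be worked out directly. The sketch below assumes the upper limit is set to 0 whenever even \(\omega^2=0\) makes the observed \(F\) "surprisingly small"; since larger \(\omega^2\) only makes small \(F\) values less probable, \(\omega^2=0\) is the most favorable case. With 2 numerator degrees of freedom the central \(F\) distribution function has a closed form:

\[ P(F_{2,27} \leq f \mid \omega^2 = 0) = 1 - \left(1 + \frac{2f}{27}\right)^{-27/2} \]

The 50% interval is \((0,0)\) when this probability is below \(0.25\):

\[ 1 - \left(1 + \frac{2f}{27}\right)^{-27/2} < 0.25 \iff f < \frac{27}{2}\left(0.75^{-2/27} - 1\right) \approx 0.29 \]

Repeating the calculation for a 95% interval (each one-sided test at \(0.025\)) gives

\[ f < \frac{27}{2}\left(0.975^{-2/27} - 1\right) \approx 0.025 \]

so the same small \(F\) can yield a collapsed 50% interval but a non-degenerate 95% interval.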

CI advocacy confusion

CIs are not "plausibility" or "precision" intervals!

  • Very low \(F\) signals a possible data problem: between-group variance too small
  • BUT: whether CI is empty depends on confidence level!
  • You cannot understand the CI unless you understand significance tests!

\(p\) value gives interpretable, continuous information

Not a trivial problem

This CI is often used, and potentially problematic CIs are reported without note!

e.g., Cumming, Sherar, Gammon, Standage, & Malina, 2012; Gilroy & Pearce, 2014; Hamerman & Morewedge, 2015; Lahiri, Maloney, Rogers, & Ge, 2013; Todd, Vurbic, & Bouton, 2014; Winter et al., 2014

Do users know how to interpret it?

A Bayesian prior/posterior

Features of the procedures

\(p\) value

  • Tells you when an observed value is high or low relative to a sampling distribution
  • Can warn you of model fit issues
  • Can be interpreted in a continuous manner; doesn't depend on \(\alpha\)
  • Has monotone relationship with the \(F\) statistic

Bayesian posterior

  • Conditions on data and model to obtain reasonable parameter estimates
  • Allows inclusion of prior information
  • Provides for computation of mutually consistent credible intervals
  • ...and all other Bayesian benefits...

Confidence intervals

  • Provide dichotomous, difficult-to-interpret inference
  • Not unique (may be good, may be bad)
  • Cannot be interpreted as "plausibility" or "precision" intervals
  • Hidden relationship with \(F\) statistic
  • Hidden relationship with underlying significance tests

CIs lack the desired properties of \(p\) values and posteriors

There is no free lunch!

Advocates of CIs:

  • Want to replace significance tests with CIs
  • Don't want to interpret CIs as significance tests
  • Essentially promote CIs as Bayesian without priors

But you cannot understand CIs without significance tests!

Neyman (1957) and frequentist theory

Statistics is not about reasonable belief!

"[Statistical inferences are] certainly not any sort of 'reasoning', at least not in the sense in which this word is used in other instances; they are acts of will." (1957)

"it is not suggested that we can 'conclude' that [the interval contains $\theta$], nor that we should 'believe' that [the interval contains $\theta$]...[we] decide to behave as if we actually knew that the true value [is in the interval]. This is done as a result of our decision and has nothing to do with 'reasoning' or 'conclusion'. The reasoning ended when the [CI procedure was derived]. The above process [of using CIs] is also devoid of any 'belief' concerning the value [] of [$\theta$]." (1941)

Neyman on statistical philosophy and CIs.

Two theories, very different logic.

  • Frequentist CI theory
    • Gives you a single dichotomous decision with fixed error rate.
    • MIGHT give you good average rates of rejecting false values
    • ...and nothing more.
  • Bayesian theory
    • MUST respect the precision in the data (likelihood)
    • CAN be interpreted in terms of rational belief (with limits)
    • Does NOT impose dichotomous actions on its users (unless they want them)

Treating frequentist intervals as Bayesian intervals ignores limitations of both.

When does it matter?

Often confidence intervals and Bayesian credible intervals are similar. When they differ:

  • Non-regular problems
    • Data bounds depend on parameters
  • Bounded parameter space
  • When conditional and unconditional analyses disagree

But there is no general theory (see Casella, 1992)


What does this mean for teaching statistics?

  • If you want Bayes, teach Bayes!
  • If you want to teach CIs, ground them in significance tests
  • Avoid any CI-centric approach.

Confidence intervals cannot replace Bayes or significance tests.