
Visualizing statistical distributions with JavaScript

For the past few years, I’ve been developing a library that lets me easily generate visualizations of statistical distributions for teaching. One specifies a set of distributions, each with a parametrization, and the library generates a table of those distributions with links to interactive plots, where anyone can see how changing the parameters affects the distribution. Clicking on a plot finds areas under the distribution, and users can switch between PDF and CDF views. I’ve now open-sourced the code on GitHub.
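To give a sense of what the click-to-find-areas feature computes, here is a minimal sketch in plain JavaScript. This is not the library’s code, just the standard normal-distribution math behind such a plot, with the CDF approximated by the Abramowitz and Stegun erf formula:

```js
// PDF of Normal(mu, sigma)
function normalPdf(x, mu, sigma) {
  const z = (x - mu) / sigma;
  return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2 * Math.PI));
}

// CDF of Normal(mu, sigma), using the Abramowitz & Stegun 7.1.26
// approximation to erf (accurate to about 1.5e-7)
function normalCdf(x, mu, sigma) {
  const z = (x - mu) / (sigma * Math.SQRT2);
  const t = 1 / (1 + 0.3275911 * Math.abs(z));
  const poly = t * (0.254829592 + t * (-0.284496736 +
    t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-z * z);
  return 0.5 * (1 + Math.sign(z) * erf);
}

// Area under the PDF between lo and hi -- the quantity a
// click-to-shade feature would report
function areaBetween(lo, hi, mu, sigma) {
  return normalCdf(hi, mu, sigma) - normalCdf(lo, mu, sigma);
}

// e.g., P(85 < X < 115) for X ~ Normal(100, 15): about 0.683
console.log(areaBetween(85, 115, 100, 15).toFixed(3));
```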

You can also link directly to a visualization using URL parameters. For instance:

http://learnbayes.org/demo/stat-distributions-js/distributionDisplay.html?dist=normal&ptzn=2&plotxrng=50,150&rangesLo=50,3&rangesHi=150,45&starts=100,15
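If you want to build such links programmatically, a sketch like the following works. The parameter meanings in the comments are my inferences from the names in the URL above, not documented behavior:

```js
const settings = {
  dist: "normal",       // which distribution to display
  ptzn: "2",            // which parametrization of it (inferred from the name)
  plotxrng: "50,150",   // x-axis range of the plot (assumption)
  rangesLo: "50,3",     // lower bounds of the parameter sliders (assumption)
  rangesHi: "150,45",   // upper bounds of the parameter sliders (assumption)
  starts: "100,15"      // starting values of the parameters (assumption)
};

// Join the key=value pairs into a query string; this reproduces
// the link above exactly.
const url = "http://learnbayes.org/demo/stat-distributions-js/distributionDisplay.html?" +
  Object.entries(settings).map(([k, v]) => `${k}=${v}`).join("&");

console.log(url);
```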

See the live demo and the GitHub repository for more details.


The fallacy of placing confidence in confidence intervals (version 2)

My coauthors and I have submitted a new draft of our paper “The fallacy of placing confidence in confidence intervals”. The paper is substantially revised from its previous incarnation. Here is the main argument:

“[C]onfidence intervals may not be used as suggested by modern proponents because this usage is not justified by confidence interval theory. If used in the way CI proponents suggest, some CIs will provide severely misleading inferences for the given data; other CIs will not. Because such considerations are outside of CI theory, developers of CIs do not test them, and it is therefore often not known whether a given CI yields a reasonable inference or not. For this reason, we believe that appeal to CI theory is redundant in the best cases, when inferences can be justified outside CI theory, and unwise in the worst cases, when they cannot.”

The document, source code, and all supplementary material are available on GitHub.
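To make the quoted argument concrete, here is a toy simulation of a classic two-observation uniform example (my construction for this post, not code from the paper). The interval [min(X), max(X)] is a perfectly valid 50% confidence interval, yet once the data are in hand, its width tells you nearly everything about whether it contains the true value:

```js
// Two observations from Uniform(theta - 0.5, theta + 0.5); the interval
// [min(X), max(X)] covers theta exactly when the observations straddle it,
// which happens with probability 1/2 before the data are seen.
const N = 1e6;
const theta = 0; // true value (unknown to the analyst in practice)
let hitAll = 0, nNarrow = 0, hitNarrow = 0, nWide = 0, hitWide = 0;

for (let i = 0; i < N; i++) {
  const x1 = theta + Math.random() - 0.5;
  const x2 = theta + Math.random() - 0.5;
  const lo = Math.min(x1, x2), hi = Math.max(x1, x2);
  const hit = (lo <= theta && theta <= hi) ? 1 : 0;
  hitAll += hit;
  const width = hi - lo;
  if (width < 0.2) { nNarrow++; hitNarrow += hit; } // narrow intervals
  if (width > 0.8) { nWide++; hitWide += hit; }     // wide intervals
}

console.log("overall coverage:          ", (hitAll / N).toFixed(3));          // ~0.500
console.log("coverage given width < 0.2:", (hitNarrow / nNarrow).toFixed(3)); // ~0.111
console.log("coverage given width > 0.8:", (hitWide / nWide).toFixed(3));     // ~1.000
```

The procedure keeps its advertised 50% coverage on average, but for any particular data set the pre-data 50% can be severely misleading: narrow intervals almost never contain the true value, and wide ones essentially always do.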

Guidelines for reporting confidence intervals

I’m working on a manuscript on confidence intervals, and I thought I’d share a draft section on the reporting of confidence intervals. The paper includes several demonstrations of how CIs may or may not support quality inferences, and how they can differ markedly from credible intervals, even credible intervals built from so-called “non-informative” priors.

Guidelines for reporting confidence intervals

Report credible intervals instead. We believe any author who chooses to use confidence intervals should ensure that the intervals correspond numerically with credible intervals under some reasonable prior. Many confidence intervals cannot be so interpreted, but if the authors know that theirs can be, the intervals should be called “credible intervals”. This signals to readers that they can interpret the interval as they have been (incorrectly) told they can interpret confidence intervals. Of course, the corresponding prior must also be reported. This is not to say that one can’t also call such intervals confidence intervals if indeed they are; however, readers who want to draw substantive conclusions from the interval are likely more interested in the post-data properties of the procedure than in its coverage.
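As an illustration of the correspondence (a textbook case, not an example from the manuscript): for a normal mean with known standard deviation, the usual z interval is numerically identical to the central 95% credible interval under a flat prior on the mean, because the posterior is then Normal(x̄, σ²/n):

```js
const z95 = 1.959964; // standard normal quantile for a central 95% interval

// Frequentist 95% confidence interval for the mean, sigma known
function confidenceInterval(xbar, sigma, n) {
  const se = sigma / Math.sqrt(n);
  return [xbar - z95 * se, xbar + z95 * se];
}

// Bayesian 95% credible interval under a flat prior on mu: the
// posterior is Normal(xbar, sigma^2 / n), so the central interval
// reduces to the identical expression.
function credibleInterval(xbar, sigma, n) {
  const postSd = sigma / Math.sqrt(n);
  return [xbar - z95 * postSd, xbar + z95 * postSd];
}

console.log(confidenceInterval(100, 15, 25)); // [ 94.12..., 105.87... ]
console.log(credibleInterval(100, 15, 25));   // identical
```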

Do not use procedures whose Bayesian properties are not known. As Casella (1992) pointed out, the post-data properties of a procedure are necessary for understanding what can be inferred from an interval. A procedure whose Bayesian properties have not been explored may be unsuitable for post-data inference, and procedures whose properties have not been adequately studied are inappropriate for general use.

Warn readers if the confidence procedure does not correspond to a Bayesian procedure. If it is known that a confidence interval does not correspond to a Bayesian procedure, warn readers that the interval cannot be interpreted as having an X% probability of containing the parameter, that it cannot be interpreted in terms of the precision of measurement, and that it cannot be said to contain the values that should be taken seriously: it is merely an interval that, prior to sampling, had an X% probability of containing the true value. Authors who choose to use confidence intervals have a responsibility to keep their readers from drawing invalid inferences, and without such a warning readers will almost surely misinterpret the intervals (Hoekstra et al., 2014).

Never report a confidence interval without noting the procedure and the corresponding statistics. As we have described, there are many different ways to construct confidence intervals, and they have different properties. Some have better frequentist properties than others; some correspond to credible intervals, and others do not. It is unfortunately common for authors to report confidence intervals without noting how they were constructed. As the examples we’ve presented show, this is a terrible practice: without knowing which confidence procedure was used, it is unclear what can be inferred. A narrow interval could correspond to very precise information or to very imprecise information, depending on the procedure, and not knowing which procedure was used could lead to very poor inferences. In addition, enough information should be presented that any reader can compute a different confidence interval or a credible interval. In most cases, this is covered by standard reporting practices, but in other cases more information may need to be given.

Consider reporting likelihoods or posteriors instead. An interval provides fairly impoverished information. Just as proponents of confidence intervals argue that CIs provide more information than a significance test (although this is debatable for many CIs), a likelihood or a posterior provides much more information than an interval. Recently, Cumming (2014) has proposed so-called “cat’s eye” intervals, which are either fiducial distributions or Bayesian posteriors under a “non-informative” prior (the shape is the likelihood, but he interprets the area, so it must be a posterior or a fiducial distribution). Given how easy modern scientific graphics are to create, and the fact that likelihoods are often approximately normal, we see no reason why likelihoods and posteriors cannot replace intervals in most circumstances. With a likelihood or a posterior, the arbitrariness of the confidence or credibility coefficient is avoided altogether.
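As a rough sketch of what reporting a curve rather than an interval might look like (my construction, not Cumming’s graphics), the following computes the normal likelihood for a mean over a grid, which under a flat prior is proportional to the posterior, and prints it as a crude text plot:

```js
// Relative likelihood of mu given an observed mean xbar, with
// standard error sigma / sqrt(n); proportional to the flat-prior
// posterior for mu.
function likelihoodCurve(xbar, sigma, n, from, to, step) {
  const se = sigma / Math.sqrt(n);
  const pts = [];
  for (let mu = from; mu <= to; mu += step) {
    const z = (xbar - mu) / se;
    pts.push({ mu, like: Math.exp(-0.5 * z * z) }); // scaled so max = 1
  }
  return pts;
}

// A crude text version of a "cat's eye": bar length shows the
// relative likelihood of each candidate value of mu.
for (const { mu, like } of likelihoodCurve(100, 15, 25, 92, 108, 1)) {
  console.log(String(mu).padStart(3), "#".repeat(Math.round(40 * like)));
}
```

The whole curve shows at a glance which values are well supported and how quickly support falls off, with no arbitrary cutoff at 95% or any other level.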