

On radical manuscript openness

One of my papers that has attracted a lot of attention lately is “The Fallacy of Placing Confidence in Confidence Intervals,” in which we describe some of the fallacies held by proponents and users of confidence intervals. The paper has been discussed on Twitter and Reddit, on blogs (e.g., here and here), and via email with people who found it in various places. A person unknown to me has used the article as the basis for edits to the Wikipedia article on confidence intervals. I have been told that several papers currently under review cite it. Perhaps this is a small sign that traditional publishers should be worried: this paper has not been “officially” published yet.


I am currently wrapping up the final revisions on the paper, which has been accepted pending minor revisions at Psychonomic Bulletin & Review. The paper has benefited from an extremely public revision process. When I had a new major version to submit, I published the text and all code on github and shared it via social media. Some of the resulting discussions have been positive, others negative; some useful and enlightening, others not useful and frustrating. Most scientific publications almost exclusively reflect input from the coauthors, editors, and reviewers. This manuscript, in contrast, has been influenced by scores of people I’ve never met, and I think the paper is better for it.

This is all the result of my exploring ways to make my writing process more open, which led to the idea of releasing successive major versions of the text and R code on github with DOIs. But what about after it is published? How can manuscript openness continue after the magic moment of publication?

One of the downsides of the traditional scientific publishing model is that once the work is put into a “final” state, it becomes static. The PDF file format in which articles find their final form, and in which they are exchanged and read, enforces a certain rigidity, a rigor mortis. The document is dead and placed behind glass for the occasional passerby to view. It is of course good to have a citable version of record; we would not, after all, want a document to be a moving target, constantly changing on the whim of the authors. But it seems we can do better than the current idea of a static, final document, and I’d like to try.

I have created a website for the paper that, on publication, will contain the text of the paper in its entirety, free to read for anyone. It also contains extra material, such as teaching ideas and interactive apps to assist in understanding the material in the paper. The version of the website corresponding to the “published” version of the paper will be versioned on github, along with the paper. But unlike the paper at the journal, a website is flexible, and I intend to take advantage of this in several ways.

First, I have enabled hypothes.is annotation across the entire text. If you open part of the text and look in the upper right hand corner, you will see three icons that can be used to annotate the text:

The hypothes.is annotation tools.

Moreover, highlighting a bit of text will open up further annotation tools:

Highlighting the text brings up more annotation tools.

Anyone can annotate the document, and others can see the annotations you make. Am I worried that on the Internet, some people might not add the highest-quality annotations? A bit. But my curiosity to see how this will be used, and the potential benefits, outweigh my trepidation.

Second, I will update the site with new information, resources, and corrections. These changes will be versioned on github, so that anyone can see what the changes were. Because the journal will have the version of record, there is no possibility of “hiding” changes to the website. So I get the best of both worlds: the trust that comes with a clear record of the process, along with the ability to change the document as the need arises. And the entire process can be open, through the magic of github.
Third, I have enabled together.js across every page of the manuscript. together.js allows collaboration between people looking at the same website. Unlike hypothes.is, together.js is meant for small groups to privately discuss the content, not for public annotation. This is mostly to explore its possibilities for teaching and discussion, but I also imagine it holds promise for post-publication review and drafting critiques of the manuscript.

The together.js collaboration tools allow making your mouse movements and clicks visible to others, text chat, and voice chat.

Critics could discuss the manuscript using together.js, chatting about its content. The communication in together.js is peer-to-peer, ensuring privacy; nothing is actually managed by the website itself, except for making the collaboration tools available.

The best part of this is that it requires no action or support from the publisher. This is essentially a sophisticated version of a pre-print, which I would release anyway. We don’t have to wait for the publishers to adopt policies and technologies friendly for post-publication peer review; we can do it ourselves. All of these tools are freely available, and anyone can use them. If you have any more ideas for tools that would be useful for me to add, let me know; the experiment hasn’t even started yet!

Check out “The Fallacy of Placing Confidence in Confidence Intervals,” play around with the tools, and let me know what you think.

The fallacy of placing confidence in confidence intervals (version 2)

My coauthors and I have submitted a new draft of our paper “The fallacy of placing confidence in confidence intervals”. This paper is substantially modified from its previous incarnation. Here is the main argument:

“[C]onfidence intervals may not be used as suggested by modern proponents because this usage is not justified by confidence interval theory. If used in the way CI proponents suggest, some CIs will provide severely misleading inferences for the given data; other CIs will not. Because such considerations are outside of CI theory, developers of CIs do not test them, and it is therefore often not known whether a given CI yields a reasonable inference or not. For this reason, we believe that appeal to CI theory is redundant in the best cases, when inferences can be justified outside CI theory, and unwise in the worst cases, when they cannot.”
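To make this concrete, here is a small R sketch of a textbook-style illustration in the same spirit as the examples in the paper (the code and numbers below are mine, not taken from the manuscript): two observations from a uniform distribution, with the interval between them used as a 50% confidence procedure. The procedure has exactly the advertised coverage, yet whether a particular interval warrants any confidence depends entirely on the observed data.

    ## Two observations from Uniform(theta - 0.5, theta + 0.5); the interval
    ## [min(y), max(y)] is a valid 50% confidence procedure for theta.
    set.seed(1)
    theta <- 10                        # true value, unknown in practice
    y1 <- runif(1e5, theta - 0.5, theta + 0.5)
    y2 <- runif(1e5, theta - 0.5, theta + 0.5)

    lower <- pmin(y1, y2); upper <- pmax(y1, y2)
    covered <- lower <= theta & theta <= upper
    width <- upper - lower

    mean(covered)                      # about 0.50: the advertised coverage
    mean(covered[width > 0.5])         # exactly 1: wide intervals must contain theta
    mean(covered[width < 0.1])         # far below 0.5: narrow intervals rarely do

Averaged over repeated sampling the procedure behaves exactly as confidence interval theory promises; for a given data set, however, the interval may be certain to contain the parameter or very unlikely to, and the coverage number alone says nothing about which situation we are in.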

The document, source code, and all supplementary material are available here on github.

Guidelines for reporting confidence intervals

I’m working on a manuscript on confidence intervals, and I thought I’d share a draft section on the reporting of confidence intervals. The paper has several demonstrations of how CIs may, or may not, offer quality inferences, and how they can differ markedly from credible intervals, even ones with so-called “non-informative” priors.

Guidelines for reporting confidence intervals

Report credible intervals instead. We believe any author who chooses to use confidence intervals should ensure that the intervals correspond numerically with credible intervals under some reasonable prior. Many confidence intervals cannot be so interpreted, but if the authors know theirs can be, the intervals should be called “credible intervals”. This signals to readers that they can interpret the interval as they have been (incorrectly) told they can interpret confidence intervals. Of course, the corresponding prior must also be reported. This is not to say that one can’t also call them confidence intervals if indeed they are; however, readers interested in arriving at substantive conclusions from the interval are likely more interested in the post-data properties of the procedure than in the coverage.
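As an illustration of the kind of correspondence meant here (my example, not drawn from the manuscript): for a normal mean with unknown variance, the standard t interval coincides numerically with the central credible interval under the Jeffreys prior proportional to 1/sigma^2, so it could be reported as a credible interval with that prior stated. A short R sketch, using made-up data:

    set.seed(123)
    y <- rnorm(20, mean = 100, sd = 15)
    n <- length(y)

    ## Classical 95% t confidence interval
    t.test(y)$conf.int

    ## 95% central credible interval under the Jeffreys prior: the posterior of mu
    ## is mean(y) + (sd(y)/sqrt(n)) * t with n - 1 degrees of freedom
    mean(y) + qt(c(0.025, 0.975), df = n - 1) * sd(y) / sqrt(n)

The two computations return the same interval; a confidence interval whose endpoints cannot be matched by any such posterior computation under a reasonable prior admits no similar reading.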

Do not use procedures whose Bayesian properties are not known. As Casella (1992) pointed out, the post-data properties of a procedure are necessary for understanding what can be inferred from an interval. Any procedure whose Bayesian properties have not been explored can have properties that make it unsuitable for post-data inference. Procedures whose properties have not been adequately studied are inappropriate for general use.

Warn readers if the confidence procedure does not correspond to a Bayesian procedure. If it is known that a confidence interval does not correspond to a Bayesian procedure, warn readers that the confidence interval cannot be interpreted as having an X% probability of containing the parameter, that it cannot be interpreted in terms of the precision of measurement, and that it cannot be said to contain the values that should be taken seriously: the interval is merely an interval that, prior to sampling, had an X% probability of containing the true value. Authors who choose to use confidence intervals have a responsibility to keep their readers from drawing invalid inferences, and it is almost certain that readers will misinterpret the intervals without a warning (Hoekstra et al., 2014).

Never report a confidence interval without noting the procedure and the corresponding statistics. As we have described, there are many different ways to construct confidence intervals, and they will have different properties. Some will have better frequentist properties than others; some will correspond to credible intervals, and others will not. It is unfortunately common for authors to report confidence intervals without noting how they were constructed. As can be seen from the examples we’ve presented, this is a terrible practice: without knowing which confidence procedure was used, it is unclear what can be inferred. A narrow interval could correspond to very precise information or very imprecise information depending on which procedure was used. Not knowing which procedure was used could lead to very poor inferences. In addition, enough information should be presented so that any reader can compute a different confidence interval or credible interval. In most cases, this is covered by standard reporting practices, but in other cases more information may need to be given.
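For instance (an illustration of mine, with made-up data), three common 95% procedures for a binomial proportion give noticeably different intervals for the same seven successes in ten trials:

    x <- 7; n <- 10                     # hypothetical data: 7 successes in 10 trials

    binom.test(x, n)$conf.int           # Clopper-Pearson ("exact") interval
    prop.test(x, n)$conf.int            # Wilson score interval with continuity correction
    p_hat <- x / n                      # Wald interval, computed by hand
    p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

A reader told only that “the 95% CI was [a, b]” cannot know which of these procedures, each with different properties, produced the interval.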

Consider reporting likelihoods or posteriors instead. An interval provides fairly impoverished information. Just as proponents of confidence intervals argue that CIs provide more information than a significance test (although this is debatable for many CIs), a likelihood or a posterior provides much more information than an interval. Recently, Cumming (2014) [see also here] has proposed so-called “cat’s eye” intervals, which are either fiducial distributions or Bayesian posteriors under a “non-informative” prior (the shape is the likelihood, but he interprets the area, so it must be a posterior or a fiducial distribution). With modern scientific graphics so easy to create, and with likelihoods often approximately normal, we see no reason why likelihoods and posteriors cannot replace intervals in most circumstances. With a likelihood or a posterior, the arbitrariness of the confidence or credibility coefficient is avoided altogether.
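As a sketch of what this might look like (my example, not drawn from Cumming), the full posterior for a binomial proportion under a uniform prior, which here is also the normalized likelihood, can be plotted in a few lines of R:

    x <- 7; n <- 10                             # hypothetical data
    p <- seq(0, 1, length.out = 500)

    ## Under a uniform prior, the posterior is Beta(x + 1, n - x + 1);
    ## this curve is also the normalized likelihood for these data.
    plot(p, dbeta(p, x + 1, n - x + 1), type = "l",
         xlab = "Proportion", ylab = "Density")
    abline(v = qbeta(c(0.025, 0.975), x + 1, n - x + 1), lty = 2)   # 95% central credible bounds

The dashed lines mark a 95% central credible interval; the curve itself conveys everything the interval does and more, without committing to an arbitrary coefficient.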