There is a currently fashionable way of describing Bayes factors that resonates with experimental psychologists. I hear it often, particularly as a way to describe a particular use of Bayes factors. For example, one might say, “I needed to prove the null, so I used a Bayes factor,” or “Bayes factors are great because with them, you can prove the null.” I understand the motivation behind this sort of language but please: stop saying one can “prove the null” with Bayes factors.
I also often hear other people say “but the null is never true.” I’d like to explain why we should avoid saying both of these things.
Null hypotheses are tired of your jibber jabber
Why you shouldn’t say “prove the null”
Statistics is complicated. People often come up with colloquial ways of describing what a particular method is doing: for instance, one might say that a significance test gives us “evidence against the null”; that a “confidence interval tells us the 95% most plausible values”; or that a Bayes factor helps us “prove the null.” Bayesians are often quick to correct misconceptions that people use to justify their use of classical or frequentist methods. It is just as important to correct misconceptions about Bayesian methods.
In order to understand why we shouldn’t say “prove the null”, consider the following situation: You have a friend who claims that they can affect the moon with their mind. You, of course, think this is preposterous. Your friend looks up at the moon and says “See, I’m using my abilities right now!” You check the time.
You then decide to head to the local lunar seismologist, who keeps good records of subtle moon tremors. You ask her what happened at the time your friend was looking at the moon, and she reports back that lunar activity at that time was stronger than it is 95% of the time (and thus passes the bar for “statistical significance”).
Does this mean that there is evidence for your friend’s assertion? The answer is “no.” Your friend made no statement about what one would expect from the seismic data. In fact, your friend’s statement is completely unfalsifiable (as is the typical “alternative” in a significance test, \(\mu \neq 0\)).
But consider the following alternative statements your friend could have made: “I will destroy the moon with my mind”; “I will make very large tremors (with magnitude \(Y\))”; “I will make small tremors (with magnitude \(X\)).” How do we now regard your friend’s claims in light of what happened?
- “I will destroy the moon with my mind” is clearly inconsistent with the data. You (the null) are supported by an infinite amount, because you have completely falsified your friend’s statement that they would destroy the moon (the alternative).
- “I will make very large tremors (with magnitude \(Y\))” is also inconsistent with the data, but if we allow a range of uncertainty around the claim, it may not be completely falsified. Thus you (the null) are supported, but not by as much as in the first situation.
- “I will make small tremors (with magnitude \(X\))” may support you (the null) or your friend (the alternative), depending on how the predicted and observed magnitudes compare.
Here we can see that the support for the null depends on the alternative at hand. This is, of course, as it must be. Scientific evidence is relative. We can never “prove the null”: we can only “find evidence for a specified null hypothesis against a reasonable, well-specified alternative”. That’s quite a mouthful, it’s true, but “prove the null” creates misunderstandings about Bayesian statistics, and makes it appear that it is doing something it cannot do.
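To make the moon example concrete, here is a minimal sketch in Python. Everything numeric in it is my own assumption for illustration: a single observed tremor reading of 2.0 (in units of the noise standard deviation), normally distributed measurement noise, a null that predicts a mean of 0, and three hypothetical point alternatives standing in for the friend's three claims. The function name `log_bf01_point` is likewise made up for this sketch.

```python
def log_bf01_point(y, mu_alt, sd):
    """Log Bayes factor for the null (mean 0) against a point alternative (mean mu_alt).

    Assumes one observation y with normal noise of known standard deviation sd.
    The normalizing constants are identical under both models, so they cancel.
    """
    log_lik_null = -((y - 0.0) ** 2) / (2 * sd ** 2)
    log_lik_alt = -((y - mu_alt) ** 2) / (2 * sd ** 2)
    return log_lik_null - log_lik_alt


# Hypothetical numbers: the observation and the three claimed magnitudes are illustrative only.
observed = 2.0
noise_sd = 1.0
claims = {
    "small tremors (X = 2)": 2.0,
    "very large tremors (Y = 10)": 10.0,
    "destroy the moon (1e6)": 1e6,
}

for label, mu_alt in claims.items():
    log_bf = log_bf01_point(observed, mu_alt, noise_sd)
    print(f"{label}: log BF01 = {log_bf:.1f}")
```

Under these assumptions the same observation gives a negative log Bayes factor against the "small tremors" claim (the data favor your friend), a large positive one against the "very large tremors" claim, and an astronomically large one against "destroy the moon." In this toy model the last value is finite only because I gave the claim a finite magnitude. The data never changed; only the alternative did.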
In a Bayesian setup, the null and alternative are both models and the relative evidence between them will change based on how we specify them. If we specify them in a reasonable manner, such that the null and alternative correspond to relevant theoretical viewpoints or encode information about the question at hand, the relative statistical evidence will be informative for our research ends. If we don’t specify reasonable models, then the relative evidence between the models may be correct, but useless.
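A rough sketch of how the specification of the alternative drives the result, again under assumptions of my own: one observation `y` with known noise standard deviation `sd`, a null that fixes the effect at exactly 0, and an alternative that places a Normal(0, tau^2) prior on the effect. The prior scale `tau` and the function names are hypothetical choices for illustration, not a recommended default analysis.

```python
import math


def log_normal_pdf(y, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)


def log_bf01_normal_prior(y, sd, tau):
    """Log Bayes factor: null (effect = 0) vs. alternative (effect ~ Normal(0, tau^2)).

    Under the alternative, integrating out the effect gives y ~ Normal(0, sd^2 + tau^2).
    """
    log_marginal_null = log_normal_pdf(y, 0.0, sd ** 2)
    log_marginal_alt = log_normal_pdf(y, 0.0, sd ** 2 + tau ** 2)
    return log_marginal_null - log_marginal_alt


# Same hypothetical data each time; only the width of the alternative changes.
for tau in (1.0, 10.0, 100.0, 1000.0):
    bf01 = math.exp(log_bf01_normal_prior(2.0, 1.0, tau))
    print(f"prior scale tau = {tau:g}: BF01 = {bf01:.2f}")
```

With these numbers, a modest prior scale yields a Bayes factor below 1 (mildly favoring the alternative), while ever more diffuse alternatives push the Bayes factor further toward the null. Each value is a correct comparison of the two models as specified; whether it is useful depends on whether that alternative represents a position anyone actually holds.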
We never “prove the null” or “compute the probability of the null hypothesis”. We can only compare a null model to an alternative model, and determine the relative evidence.
[See also Gelman and Shalizi (2013) and Morey, Romeijn and Rouder (2013)]
Why you shouldn’t say “the null is never true”
A common retort to tests including a point null (often called a ‘null’ hypothesis) is that “the null is never true.” This is backed up by four sorts of “evidence”:
- A quote from an authority: “Tukey or Cohen said so!” (Tukey was smart, but this is not an argument.)
- Common knowledge / “experience”: “We all know the null is impossible.” (This was Tukey’s “argument.”)
- Circular: “The area under a point in a density curve is 0.” (Of course if your model doesn’t have a point null, the point null will be impossible.)
- All models are “false.” (Even if this were true, and I think it is actually a category error, it would apply equally to all alternatives as well.)
The most attractive seems to be the second, but it should be noted that people almost never use techniques that allow finding evidence for null hypotheses. Under these conditions, how is one determining that the null is never true? If a null were ever true, we would not be able to accumulate evidence for it, so the second argument definitely has a hint of circularity as well.
When someone says “The null hypothesis is impossible/implausible/irrelevant”, what they are saying in reality is “I don’t believe the null hypothesis can possibly be true.” This is a totally fine statement, as long as we recognize it for what it is: an a priori commitment. We should not pretend that it is anything else; I cannot see any way that one can find universal evidence for the statement “the null is impossible”.
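One way to see why this is a prior commitment rather than a finding is the standard identity relating posterior odds to prior odds. The notation here is mine, not part of the argument above: \(H_0\) is the null model, \(H_1\) a specified alternative, \(y\) the data, and \(B_{01}\) the Bayes factor between them.

```latex
\[
\frac{P(H_0 \mid y)}{P(H_1 \mid y)}
  \;=\;
  \underbrace{\frac{p(y \mid H_0)}{p(y \mid H_1)}}_{B_{01}}
  \times
  \frac{P(H_0)}{P(H_1)},
\qquad
P(H_0) = 0 \;\Longrightarrow\; \frac{P(H_0 \mid y)}{P(H_1 \mid y)} = 0
\quad \text{for every finite } B_{01}.
\]
```

If the null is given zero prior mass, as it is under any purely continuous prior, no data set can move it. The conclusion “the null is impossible” was put in at the start, not learned from the data.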
If you find the null hypothesis implausible, that’s OK. Others might not find it implausible. It is ultimately up to substantive experts to decide what hypotheses they want to consider in their data analysis, and not up to methodologists or statisticians to tell experts what to think.
Any automatic behavior, whether automatically rejecting all null hypotheses or automatically testing null hypotheses, is bad. Hypothesis testing and estimation should be considered and deliberate. Luckily, Bayesian statistics allows both to be done in a principled, coherent manner, so informed choices can be made by the analyst and not by the restrictions of the method.