# Some thoughts on replication

In a recent blog post, Simine Vazire discusses the problem with the logic of requiring replicators to explain why they reach different conclusions from the original authors. She frames it, correctly, as asking people to over-interpret random noise. Vazire identifies the issue as a problem with our thinking: that we underestimate randomness. I’d like to explore other ways in which our biases interfere with clear thinking about replication, and perhaps suggest some ways we can clarify it.

I suggest two ways in which we fool ourselves when thinking about replication: the concept of “replication” is unnecessarily asymmetric, an example of overly-linear thinking; and a lack of distinction in practice causes a lack of distinction in theory.

### Fooled by language: the asymmetry of “replication”

Imagine that a celebrated scientist, Dr. Smith, dies, and within her notes is discovered a half-written paper. Building on her previous work, this paper clearly lays out a creative experiment to test a theory. To avoid any complications such as post hoc theorising, assume the link between the theory and experiment is clear and follows from her previous work. On Dr. Smith’s computer, along with the paper, is found a data set. Dr. Smith’s colleagues decide to finish the paper and publish it in her honor.
Given the strange circumstances of this particular paper’s history, another scientist, Dr. Jones, decides to replicate the study. Dr. Jones does his best to match the methods described in the paper, but obtains a different result. Dr. Jones tries to publish, but editors and reviewers demand an explanation: why is the replication different? Dr. Jones’ result is doubted until he can explain the difference.
Now suppose — unbeknownst to everyone — that the first experiment was never done. Dr. Smith simulated the data set as a pedagogical exercise to learn a new analysis technique. She never told anyone because she did not anticipate dying, of course, but everyone assumed the data was real. The second experiment is no replication at all; it is the first experiment done.
Does this change the evidential value of Dr. Jones’ experiment at all? Of course not. The fact that Dr. Smith’s experiment was never done is irrelevant to the evidence in Dr. Jones’ experiment. The evidence contained in a first experiment is the same, regardless of whether a second experiment is done (assuming, of course, that the methods are all sound). “Replication” is a useless label.
Calling Dr. Jones’ experiment a “replication” focuses our attention on the wrong relationship. One replicates an actual experiment that was done. However, the evidence that an experiment provides for a theory depends not on the relationship between the experiment’s methods and an experiment that was done in the past. Rather, the evidence depends on the relationship between the experiment’s methods and a hypothetical experiment that is designed to test the theory. One cannot replicate a hypothetical experiment, of course, because hypothetical experiments cannot be performed. Instead, one realizes a hypothetical experiment, and there may be several realizations of the same hypothetical experiment.
Thinking in this manner eliminates the asymmetric relationship between the two experiments. If both experiments can be realizations of the same hypothetical experiment designed to test a theory, which one came first is immaterial.* The burden is no longer on the second experimenter to explain why the results are different; the burden is on the advocates of the theory to explain the extant data, which now includes two differing results. (Vazire’s caution about random noise still applies here, as we still don’t want to over-explain differences; it is assumed that any post hoc explanation will be tested.)
*Figure: Three hypothetical experiments that are tests of the same theory, along with five actually-run experiments. Hypothetical experiments B and C may be so-called “conceptual replications” of A, or tests of other aspects of the theory.*
The conceptual distinction between a hypothetical experiment — that is, the experiment that is planned — and the actual experiment is critical. That hypothetical experiment can be realized in many ways: different times, different labs, different participants, even different stimuli, if these are randomly generated or are selected from a large collection of interchangeable stimuli. Importantly, when the first realization of the hypothetical experiment is done, it does not get methodological priority. It is temporally first, but is simply one way in which the experiment could have been realized.
Conceptualizing the scientific process in this way prevents researchers who did an experiment first from claiming that their experiment takes priority. If you are “replicating” their actual experiment, then it makes sense that your results will get compared to theirs, in the same way a “copy” might be compared to the “original”. But conceptually, the two are siblings, not parent and child.

### Lack of distinction in practice vs. theory

The critical distinction above is the one between a hypothetical experiment and an actual one. I think this is an instance where modern scientific practice causes problems. Although the idea of a hypothetical experiment arises in any experimental planning process, consider the typical scientific paper, which has an introduction, then a brief (maybe even just a few sentences!) segue describing the logic of the experiment, into the methods of an actually-performed experiment.
This structure means that the hypothetical experiment and the actual experiment are impossible to disentangle. This is one of the reasons, I think, why we talk about “replication” so much, rather than performing another realization of the hypothetical experiment. We have no hypothetical experiment to work from, because it is almost completely conflated with the actual experiment.
One initiative that will help with this problem is public pre-registration. A hypothetical experiment is laid out in a pre-registration document. Note that from a pre-registration document, the structure in the figure becomes clear. If someone posts a public pre-registration document, why does it matter who does the experiment first (aside from the ethical issue of “scooping”, etc)? No one is “replicating” anyone else; they are each separately realizing the hypothetical experiment that was planned.
But in current practice, which does not typically distinguish a hypothetical experiment and an actual one, the only way to add to the scientific literature about hypothetical experiment A is to try to “redo” one of its realizations. Any subsequent experiment is then logically dependent on the first actually performed experiment, and the unhelpful asymmetry crops up again.
I think it would be useful to have a word other than “replication”, because the connotation of the word “replication”, as a facsimile or a copy of something already existing, focuses our attention in unhelpful ways.
* Although logically which came first is immaterial, there may be statistical considerations to keep in mind, like the “statistical significance filter” that is more likely to affect a first study than a second. Also, as Vazire points out in the comments, the second study has fewer researcher degrees of freedom.
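As an aside, the “statistical significance filter” is easy to see in a quick simulation (my own sketch, not from the post; the effect size and sample size are made-up illustrative values): when power is low, the studies that happen to cross p < .05 necessarily overestimate the true effect, which is one reason a first published study differs statistically from a pre-registered second realization.

```python
import math
import random

random.seed(1)

TRUE_EFFECT = 0.2   # assumed true standardized effect (illustrative)
N = 20              # per-study sample size (deliberately underpowered)
SE = 1 / math.sqrt(N)
CRIT = 1.96         # two-sided z criterion, alpha = .05

# Simulate many independent realizations of the same hypothetical experiment;
# each yields an effect-size estimate drawn around the true effect.
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(50_000)]

# The "filter": only studies whose |z| exceeds the criterion get published.
significant = [d for d in estimates if abs(d / SE) > CRIT]

mean_all = sum(estimates) / len(estimates)
mean_sig = sum(significant) / len(significant)

print(f"true effect:                {TRUE_EFFECT}")
print(f"mean estimate, all studies: {mean_all:.2f}")   # close to 0.2
print(f"mean estimate, significant: {mean_sig:.2f}")   # inflated well above 0.2
```

The unfiltered average recovers the true effect, while the average among significant studies is inflated by a factor of two or more, with no questionable practices required.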

## 5 thoughts on “Some thoughts on replication”

1. That's a very refreshing take on the problem and it is actually far closer to my views than some people seem to realise. I don't think though that replication is a bad term. I think it is central to good science to repeat experiments, both fairly directly to test specific parameters and conceptually to test theories.

But you are completely right that the dichotomy hurts. As I said on my blog, there should be no "replicators" and whatever-the-others-are-called. All scientists should be doing replication and novel (insofar that exists) research as part of their daily routine. And it is *precisely* my point that we should put theories to the test. This can mean doing the same experiment as a previous study but more often than not it means doing a different, better experiment. To me the whole many labs thing is mostly cargo cult. It quite literally is with the exception that most "replicators" seem to not believe that the cargo planes will actually return…

I also completely agree with you that temporal precedent isn't critical – but you wouldn't know it from how this issue is debated. People talk about direct replication as some sort of holy grail which it really isn't. People should do the best experiment they can do under their circumstances.

And then you get strawman arguments like the one that inspired your post. I for one never said that replicators need to prove why they failed. That would be circular reasoning. (I think Jason Mitchell said something along those lines although I'm willing to give him the benefit of the doubt that this might have come out differently to how he intended it – but maybe not).

Replicators however do have to demonstrate that they can do the experiment properly. Now, since we established that I don't actually believe "replicators" exist, this really means *everyone* who publishes a study should provide evidence that the experiment is solid. It doesn't matter if it is a replication or not.

2. Why not use "realizations" rather than replications? It seems as if you naturally used realizations within this post anyways. And you are correct that realization avoids some of the undesirable implications of the word replication.

3. I think in the context of Richard's argument it would make most sense to just call them "experiments to test a hypothesis"?

4. Aha, I now know why you said "you agree with me of course" when I said something similar at the APS session last week on reproducibility. (I took you to mean that of course I would say sensible things!) Popper said something similar in his 1959 Logic of Scientific Discovery (p 66) – a "fact" or "result" is actually a low level theory of the conditions under which a regularity holds; he called this the "falsifying hypothesis." It needs to be tested and corroborated before being accepted. Thus, "direct replication" is actually theory testing.
