# The p curve is not what you think it is

## What is a $$p$$ curve analysis?

“[The] p-curve [is] a way to distinguish between selective reporting and truth. P-curve is the distribution of statistically significant p values for a set of independent findings. Its shape is diagnostic of the evidential value of that set of findings. We say that a set of significant findings contains evidential value when we can rule out selective reporting as the sole explanation of those findings.” (Simonsohn et al., 2014, p. 534)

## What is a $$p$$ curve analysis?

• Input: test statistics ($$z$$, $$t$$, $$F$$, etc.)
• Output: Histogram and two (main) tests
• Heuristic: if significant findings are too close to .05, something is wrong.

## Two main critiques

• Tests are constructed incorrectly
  • Result: incorrect assessment of evidence
• No justification for meta-analytic grouping
  • Result: debates over “proper” groupings are undecidable
• (…but, other critiques too)

## Building a significance test

• Test statistic: contains evidence relevant to parameter of interest
• Sampling distribution: Distribution of test statistic under hypotheses
• p value: probability of obtaining at least as much evidence against a hypothesis as was observed, assuming the hypothesis is true

## Two $$z$$ tests

Test 1: “evidential value”

• Is the test statistic surprisingly large among significant test statistics if $$\delta=0$$? $$\rightarrow$$ $$\delta$$ is larger than 0

Test 2: “Lack of evidential value”

• Is the test statistic surprisingly small among significant test statistics if $$\delta^2\geq 2.34/N$$? $$\rightarrow$$ $$\delta^2$$ is smaller than $$2.34/N$$
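
As a rough sketch of how these two conditional p values could be computed for a single $$z$$ statistic, the Python below conditions on significance at two-sided $$\alpha=.05$$ and takes the noncentrality $$\sqrt{2.34}$$ from the bound above. The function names, and the choice to ignore results that are significant in the wrong direction, are assumptions made for illustration; this is not the p-curve app's own code.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)   # two-sided alpha = .05, about 1.96
NCP_BOUND = 2.34 ** 0.5              # sqrt(N) * delta at the bound delta^2 = 2.34/N


def pp_evidential(z):
    """Test 1: P(Z >= z | delta = 0, Z > Z_CRIT)."""
    return (1 - STD_NORMAL.cdf(z)) / (1 - STD_NORMAL.cdf(Z_CRIT))


def pp_lack(z, ncp=NCP_BOUND):
    """Test 2: P(Z <= z | noncentrality ncp, Z > Z_CRIT)."""
    shifted = NormalDist(mu=ncp)
    power = 1 - shifted.cdf(Z_CRIT)   # roughly 1/3 at the bound
    return (shifted.cdf(z) - shifted.cdf(Z_CRIT)) / power


print(round(pp_evidential(3.1), 3), round(pp_lack(3.1), 3))
```

For $$z=3.1$$ this gives approximately 0.039 and 0.826, in line with the app values reported for the single-study example in these slides.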

## Combining p values

Test 1: “evidential value”

1. Compute all right-tailed p values
2. Transform to $$\chi^2$$/normal deviates (under null)
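3. Combine: sum the $$\chi^2$$ values, or sum the normal deviates and divide by $$\sqrt{k}$$ for $$k$$ studies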
4. Compute overall one-tailed p value

Test 2: “Lack of evidential value”

1. Compute all left-tailed p values
2. Transform to $$\chi^2$$/normal deviates (under null)
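3. Combine: sum the $$\chi^2$$ values, or sum the normal deviates and divide by $$\sqrt{k}$$ for $$k$$ studies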
4. Compute overall one-tailed p value
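
The four steps can be sketched in Python for the normal-deviate variant. This assumes the combination step is Stouffer's method (sum of deviates divided by $$\sqrt{k}$$), that significance means two-sided $$\alpha=.05$$ in the predicted direction, and that test 2 uses noncentrality $$\sqrt{2.34}$$. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)     # two-sided alpha = .05
NCP_BOUND = sqrt(2.34)                 # assumed noncentrality for test 2


def right_tail_pp(z):
    """Step 1, test 1: P(Z >= z | null of no effect, Z > Z_CRIT)."""
    return (1 - STD_NORMAL.cdf(z)) / (1 - STD_NORMAL.cdf(Z_CRIT))


def left_tail_pp(z, ncp=NCP_BOUND):
    """Step 1, test 2: P(Z <= z | noncentrality ncp, Z > Z_CRIT)."""
    shifted = NormalDist(mu=ncp)
    return (shifted.cdf(z) - shifted.cdf(Z_CRIT)) / (1 - shifted.cdf(Z_CRIT))


def stouffer(pp_values):
    """Steps 2-4: normal deviates, scaled sum, one-tailed overall p value."""
    deviates = [STD_NORMAL.inv_cdf(pp) for pp in pp_values]   # step 2
    combined = sum(deviates) / sqrt(len(deviates))            # step 3
    return STD_NORMAL.cdf(combined)                           # step 4


z_values = [3.1, 2.0]   # the two z statistics used as examples in these slides
p_evidential = stouffer([right_tail_pp(z) for z in z_values])
p_lack = stouffer([left_tail_pp(z) for z in z_values])
print(round(p_evidential, 3), round(p_lack, 3))
```

With these two inputs the sketch returns roughly 0.38 and 0.29, close to the app values reported for the two-study example. A single study passes through unchanged, since `stouffer([pp])` is just `pp`.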

## Statistical problems

• Problem: Failure to respect evidential asymmetry of p values
  • Result: Over-sensitivity to values near $$\alpha$$
• Problem: Failure to use same test statistic for both tests
  • Result: Evidence in data is not respected

## What is a “set”?

“If a set of studies can be meaningfully partitioned into subsets, it is the job of the individual who is p curving to determine if such partitioning should be performed, in much the same way that it is the job of the person analyzing experimental results to decide if a given effect should be tested on all observations combined or if a moderating factor is worth exploring. Heterogeneity, then, poses a challenge of interpretation, not of statistical inference.” (Simonsohn et al., 2014, p. 536)

## What is a “set”?

But what is the statistical inference?

• Study 1: Gravitational waves, $$z=3.1, p<.0025$$

| p curve | “Evidential value” | “Lack of evidential value” |
|---|---|---|
| Their app (v. 4.052) | p=0.039 | p=0.826 |
| LR test | p=0.039 | p=0.826 |

## What is a “set”?

But what is the statistical inference?

• Study 1: Gravitational waves, $$z=3.1, p<.0025$$
• Study 2: Power posing, $$z=2, p=.045$$

| p curve | “Evidential value” | “Lack of evidential value” |
|---|---|---|
| Their app (v. 4.052) | p=0.382 | p=0.292 |
| LR test | p=0.175 | p=0.486 |

## What is a “set”?

You have to justify the joining, not the splitting!

But without a process, all sets have equal claim. (Reference class problem; Venn, 1888)