One of the most frequently-asked questions about the BayesFactor package is how to do multiple comparisons; that is, given that some effect exists across factor levels or means, how can we test whether two specific effects are unequal. In the next two posts, I’ll explain how this can be done in two cases: in Part 1, I’ll cover tests for equality, and in Part 2 I’ll cover tests for specific order-restrictions.

Before we start, I will note that these methods are only meant to be used for pre-planned comparisons. They should not be used for post hoc comparisons.

## An Example

Suppose we are interested in the basis for feelings of moral disgust. One prominent theory, from the embodied cognition point of view, holds that feelings of moral disgust are extensions of more basic feelings of disgust: disgust for physical things, such as rotting meat, excrement, etc (Schnall et al, 2008; but see also Johnson et al., 2014 and Landy & Goodwin, in press). Under this theory, moral disgust is not only metaphorically related to physical disgust, but may share physiological responses with physical disgust.

Suppose we wish to experimentally test this theory, which predicts that feelings of physical disgust can be “transferred” to possible objects of moral disgust. We ask 150 participants to fill out a questionnaire that measures the harshness of their judgments of undocumented migrants. Participants are randomly assigned to one of three conditions, differing by the odor present in the room: a pleasant scent associated with cleanliness (lemon), a disgusting scent (sulfur), and a control condition in which no unusual odor is present. The dependent variable is the score on the questionnaire, which ranges from 0 to 50 with higher scores representing harsher moral judgment.

Hypothetical data, simulated for the sake of example, can be read into R using the `url()`

function:

# Read in the data from the learnbayes.org

disgust_data = read.table(url('http://www.learnbayes.org/disgust_example.txt'),header=TRUE)

A boxplot and means/standard errors reveal that the effects appear to be in the predicted direction:

(note that the axes are different in the two plots, so that the standard errors can be seen)

And we can perform a classical ANOVA on these data:

# ANOVA

summary(aov(score ~ condition, data = disgust_data))

`## Df Sum Sq Mean Sq F value Pr(>F) `

## condition 2 263 131.4 2.91 0.058 .

## Residuals 147 6635 45.1

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The classical test of the null hypothesis that all means are equal just fails to reach significance at (alpha=0.05).

## A Bayes factor analysis

We can easily perform a Bayes factor test of the null hypothesis using the `BayesFactor`

package. This assumes that the prior settings are acceptable; because this post is about multiple comparisons, we will not explore prior settings here. See `?anovaBF`

for more information.

The `anovaBF`

is a convenience function to perform Bayes factor ANOVA-like analyses. The code for the Bayes factor analysis is almost identical to the code for the classical test:

library(BayesFactor)

bf1 = anovaBF(score ~ condition, data = disgust_data)

bf1

`## Bayes factor analysis`

## --------------

## [1] condition : 0.7738 ±0.01%

##

## Against denominator:

## Intercept only

## ---

## Bayes factor type: BFlinearModel, JZS

The Bayes factor in favor of a condition effect is about `0.774`

, or `1/0.774 = 1.3`

in favor of the null (the “Intercept only” model). This is not strong evidence for either the null or the alternative, which given the moderate p value is perhaps not surprising. It should be noted here that even if the p value had just crept in under 0.05, the Bayes factor would not be appreciably different, which shows the inherent arbitrariness of significance testing.

## Many possible hypotheses?

This analysis is not the end of the story, however. The hypothesis tested above — that all means are different, but with no further specificity — was not really the hypothesis of interest. The hypothesis of interest was more specific. We might consider an entire spectrum of hypotheses, listed in increasing order of constraint:

- (most constrained) The null hypothesis (control = lemon = sulfur)
- (somewhat constrained) Unexpected scents cause the same effect, regardless of type (lemon = sulfur ≠ control; this might occur, for instance, both “clean” and “disgusting” scents prime the same underlying concepts)
- (somewhat constrained) Only disgusting scents have an effect (control = lemon ≠ sulfur)
- (somewhat constrained) Only pleasant scents have an effect (control = sulfur ≠ lemon)
- (unconstrained) All scents have unique effects (control ≠ sulfur ≠ lemon)

The above are all *equality* constraints. We can also specify *order* constraints, such as lemon < control < sulfur. The unconstrained model tested above (control ≠ sulfur ≠ lemon) does not give full credit to this ordering prediction. In the next section, I will show how to test equality constraints. In Part 2 of this post, I will show how to test order constraints.

## Testing equality constraints

To test equality constraints, we must first consider what an equality constraint means. Claiming that an equality constraint holds is the same as saying that your predictions for data would not change if the two conditions are supposed to be the same *had exactly the same label*. If want to to impose the constraint that lemon = sulfur ≠ control, we merely have to **give lemon and sulfur the same label**.

In practice, this means making a new column in the data frame with the required change:

# Copy the condition column that we will change

# We use 'as.character' to avoid using the same factor levels

disgust_data$lemon.eq.sulfur = as.character(disgust_data$condition)

# Change all 'lemon' to 'lemon/sulfur'

disgust_data$lemon.eq.sulfur[ disgust_data$condition == "lemon" ] = 'lemon/sulfur'

# Change all 'sulfur' to 'lemon/sulfur'

disgust_data$lemon.eq.sulfur[ disgust_data$condition == "sulfur" ] = 'lemon/sulfur'

# finally, make the column a factor

disgust_data$lemon.eq.sulfur = factor(disgust_data$lemon.eq.sulfur)

We now have a data column, called `lemon.eq.sulfur`

, that labels the data so that lemon and sulfur have the same labels. We can use this in Bayes factor test:

bf2 = anovaBF(score ~ lemon.eq.sulfur, data = disgust_data)

bf2

`## Bayes factor analysis`

## --------------

## [1] lemon.eq.sulfur : 0.1921 ±0%

##

## Against denominator:

## Intercept only

## ---

## Bayes factor type: BFlinearModel, JZS

The null hypothesis is now preferred by a factor of `1/0.192 = 5.2`

, which is expected given that lemon and sulfur were the *least* similar pair of three means. The null hypothesis accounts for the data better than this constraint.

One of the conveniences of using Bayes factors is if we have two hypotheses that are both tested against the same third hypothesis, we can test the two hypotheses against one another. The `BayesFactor`

package makes this easy; any two `BayesFactor`

objects compared against the same denominator — in this case, the intercept-only null hypothesis — can be combined together:

bf_both_tests = c(bf1, bf2)

bf_both_tests

`## Bayes factor analysis`

## --------------

## [1] condition : 0.7738 ±0.01%

## [2] lemon.eq.sulfur : 0.1921 ±0%

##

## Against denominator:

## Intercept only

## ---

## Bayes factor type: BFlinearModel, JZS

We could, for instance, put all equality-constraint tests into the same object, and then compare them like so:

bf_both_tests[1] / bf_both_tests[2]

`## Bayes factor analysis`

## --------------

## [1] condition : 4.029 ±0.01%

##

## Against denominator:

## score ~ lemon.eq.sulfur

## ---

## Bayes factor type: BFlinearModel, JZS

The fully unconstrained hypothesis, represented by `condition`

, is preferred to the lemon = sulfur ≠ control hypothesis by a factor of about 4.

In the next post, we will use the `posterior()`

function to draw from the posterior of the unconstrained model, which will allow us to test ordering constraints.