Open science for the busy researcher

Part 2: Tools

Richard D. Morey

Psychonomic Society Meeting 2017
Vancouver, BC

What we want


We want tools…

  • …that are useful,
  • …that have more than the gee-whiz factor,
  • …that are actively supported,
  • …that increase research robustness.

Four points

  • Laying an open foundation: stubs/project websites
  • Using scripting/reproducible workflows
  • Version control, and not shooting yourself in the foot
  • Hiding in plain sight

Key idea

Project information should be…

  • findable
  • comprehensible
  • useful
  • persistent

…to all those who should have access.

Stubs/project websites

Sometimes you’re not quite ready to share everything.

  • You’re still learning
  • A reviewer just asked about it
  • …but you intend to!

Get a project site!

  • Link it on your poster
  • Link it in your manuscript
  • Put it on social media
  • Give it to colleagues

It doesn’t have to be complete (yet) to be useful.

What it should contain

  • Description of the project
  • Contact information
  • Project metadata
    • Description of the data set
    • Description of materials
  • Poster, abstract, talk, manuscript draft
  • Stub: a promise of the data

Why a stub?

A stub is a promise.

  • If you can’t share now…
  • …prepare a place to share later…
  • …link it in the paper…
  • …then share when you can…
  • …or when the first person asks.

Blog post: https://goo.gl/ix1jY1

Where to make a stub?

  • GitHub Pages
    • Free website, wiki, storage
    • Linked to GitHub repository
    • Version controlled
  • Open Science Framework
    • Free wiki, storage
    • Version controlled

Data shaping/cleaning

Scripts are the only way to reliably, transparently, reproducibly clean/shape data.

R scripts

  • Read in data from internet
  • Clean data
  • Shape data
  • Plots/tables/analyses

…all automatically.
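
As a minimal sketch of the four steps above, the following base R script reads, cleans, shapes, and summarizes a small data set. The data, column names (subject, condition, rt), and cleaning rule are hypothetical and stand in for a real file read from a URL.

```r
## Hypothetical raw data (tab-separated). In practice this would come
## from the internet, e.g. read.delim("https://...").
raw_text <- "subject\tcondition\trt
s01\tA\t512
s01\tB\t634
s02\tA\t-1
s02\tB\t701
s03\tA\t488
s03\tB\tNA"

## Read in data
raw <- read.delim(text = raw_text, stringsAsFactors = FALSE)

## Clean data: drop missing and impossible (non-positive) response times
clean <- raw[!is.na(raw$rt) & raw$rt > 0, ]

## Shape data: one row per condition, with the mean response time
cond_means <- aggregate(rt ~ condition, data = clean, FUN = mean)

## Tables/analyses: print the summary table
print(cond_means)
```

Rerunning the script on an updated file repeats every step in the same way, with no manual editing of the data.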

But what about time?

Benefits of scripts

  • More data; clean/shape again?
  • Demands of reviewers
  • New data; same script!


To gain major benefits, learn some tidyverse!
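
For a flavor of the tidyverse style, here is a small sketch using dplyr, assuming the dplyr package is installed. The data frame and its column names (subject, condition, rt) are hypothetical.

```r
library(dplyr)

## Hypothetical raw data with one missing and one impossible value
raw <- data.frame(
  subject   = c("s01", "s01", "s02", "s02", "s03", "s03"),
  condition = c("A", "B", "A", "B", "A", "B"),
  rt        = c(512, 634, -1, 701, 488, NA)
)

## Clean, then shape: each step reads as one line of the pipeline
cond_means <- raw %>%
  filter(!is.na(rt), rt > 0) %>%              # drop missing/impossible values
  group_by(condition) %>%                     # one group per condition
  summarise(mean_rt = mean(rt), n = n())      # mean and count per group

print(cond_means)
```

Each step is named by a verb, so the script doubles as a readable record of what was done to the data.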

Rmarkdown

“How’d I make that figure again…?”

Keep your data safe…

Don’t be this person.

Repositories and version control

Online repositories

  • Data safe from you
  • Versioned
    • Simple: OSF
    • Complex: GitHub
  • Public or private

Example files

https://osf.io/n33es/

Reading data from OSF

## Read data into variable ansur_men

ansur_men = read.delim("https://osf.io/ejg4n/download?version=1")
nrow(ansur_men)
[1] 1774
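
The download URL above includes a version number, which presumably pins the script to one specific version of the file even if the file is updated later. A small helper can make that explicit; osf_download_url is a hypothetical function written for this sketch, not part of any OSF package.

```r
## Hypothetical helper: build an OSF download URL, optionally pinned
## to a specific file version
osf_download_url <- function(file_id, version = NULL) {
  url <- paste0("https://osf.io/", file_id, "/download")
  if (!is.null(version)) {
    url <- paste0(url, "?version=", version)
  }
  url
}

## Same URL as in the example above
data_url <- osf_download_url("ejg4n", version = 1)
print(data_url)

## The data could then be read with (requires internet access):
## ansur_men = read.delim(data_url)
```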

Reading scripts from OSF

## Clean data

source("https://osf.io/9dahe/download?version=1")

## Analyze it

summary(ansur_men$HEAD_BRTH)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  128.0   148.0   152.0   151.7   155.0   173.0 

Transparency without openness

Research can be transparent without data being open.

Hiding in plain sight

Publicly encrypt data; share scripts!

Blog post: https://goo.gl/chaEKX

Hiding in plain sight

Full example available on OSF.

https://osf.io/vqfw8/

(also on GitHub.)

Encryption

Hiding in plain sight

## Define the (wrong) pass key
key_string = "This is the wrong pass key"

## Script reads the encrypted data into variable 'exp1'
source("https://osf.io/73thx/download?version=3")
Error in data_decrypt(cipher, key, nonce): Failed to decrypt

Hiding in plain sight

## Define the right pass key
key_string = "This is a secret passphrase"

## Script reads the encrypted data into variable 'exp1'
source("https://osf.io/73thx/download?version=3")

head(exp1)
      V1   V2     V3    V4   V5     V6    V7 V8   V9
1 sub010 blk0 trl000 cond2 set1 stim10  rsp3  0 9761
2 sub010 blk0 trl001 cond2 set1  stim6  rsp3  0 6420
3 sub010 blk0 trl002 cond2 set1  stim1  rsp1  1 3711
4 sub010 blk0 trl003 cond2 set1  stim3  rsp5  0 4082
5 sub010 blk0 trl004 cond2 set1 stim10 rsp11  0 2109
6 sub010 blk0 trl005 cond2 set1 stim11 rsp11  1 1667

Then: clean it, etc.
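
The script at the OSF link is not reproduced here. As a rough sketch of the idea, the following encrypts and decrypts a data frame with the sodium package. That sodium is the package behind the example above is an assumption, based on the data_decrypt error message. The data frame exp1_demo is hypothetical.

```r
library(sodium)

## Hypothetical data standing in for a real data set
exp1_demo <- data.frame(sub = c("sub010", "sub011"), rt = c(9761, 6420))

## Derive a 32-byte key from the pass key
key_string <- "This is a secret passphrase"
key <- sha256(charToRaw(key_string))

## Encrypt: the cipher text and nonce can be posted publicly
nonce <- random(24)
cipher <- data_encrypt(serialize(exp1_demo, NULL), key, nonce)

## Decrypt with the right pass key
decrypted <- unserialize(data_decrypt(cipher, key, nonce))

## Decrypting with the wrong pass key raises an error, caught here
wrong_key <- sha256(charToRaw("This is the wrong pass key"))
result <- tryCatch(
  data_decrypt(cipher, wrong_key, nonce),
  error = function(e) conditionMessage(e)
)
print(result)
```

Only the pass key needs to stay private. The encrypted file, the nonce, and the analysis scripts can all sit in a public repository.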

Transparency without openness

Now you can…

  • Put data online early (not on your hard drive!)
  • Share data/code selectively with collaborators
  • Be transparent about analyses without sharing data
  • Share with everyone by releasing pass key

Wrapping up

  • Robust practices save time
  • Robust practices grant confidence
  • Robust practices project confidence
  • If you organize your project well, sharing is trivial.