Monthly Archives: September 2014

Embedding RData files in Rmarkdown files for more reproducible analyses

For those of us interested in reproducible analysis, Rmarkdown is a great way of communicating our code to other researchers. RStudio, in particular, makes it very easy to create attractive HTML documents containing text, code, and figures, which can then be sent to colleagues or put on the internet for anyone to see. If you aren't using Rmarkdown for your statistical analyses, I recommend you start; you'll never go back to simple script files again (and your colleagues won't want you to).

In this post, I describe how to improve your Rmarkdown documents by embedding data that anyone viewing the document in a modern browser with JavaScript enabled can download. For a quick look, see the example Rmd file and the resulting HTML file.

One of the drawbacks of Rmarkdown, from a reproducible analysis perspective, is that the data are not part of the document itself. Typically, an Rmarkdown file uses R code to load a file from your disk, and when you send the resulting HTML file to a colleague or put it on the internet, the data file stays behind. It must be sent separately in an email or placed on a server to be downloaded.

This raises the possibility that the data could get separated from the code, and I think this is a terrible thing for reproducible analysis. In my mind, the data and the document should travel together as a single file. What we would like is a method of encoding R data into the HTML file such that any user who has the HTML file can download the data, without even needing access to the internet.

As it turns out, files can be embedded in an HTML document via the data URI scheme. All we need is an R function that encodes the data and produces a link for downloading it.
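
To make the idea concrete, here is a minimal sketch of what such a URI looks like, using dataURI() and base64decode() from the base64enc package (the same package the function below relies on). The five-byte payload "hello" is an arbitrary illustration. The URI is just the MIME type followed by the base64-encoded bytes, and decoding the part after the comma recovers the original content.

```r
# Assumes the base64enc package is installed
library(base64enc)

# A few bytes of content standing in for a data file
payload <- charToRaw("hello")

# A data URI has the form "data:<mime type>;base64,<encoded bytes>"
uri <- dataURI(data = payload, mime = "text/plain")
print(uri)

# Everything after the first comma is the base64 payload ("aGVsbG8=" for "hello")
encoded <- sub("^data:[^,]*,", "", uri)

# Decoding it gives back exactly the bytes we started with
decoded <- base64decode(encoded)
stopifnot(identical(decoded, payload))
```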

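setDownloadURI = function(list, filename = stop("'filename' must be specified"), textHTML = "Click here to download the data.", fileext = "RData", envir = parent.frame()){
  # dataURI() from the base64enc package does the base64 encoding
  if (!requireNamespace("base64enc", quietly = TRUE)) stop("package 'base64enc' is required")
  # random id for the placeholder element, so several links can share one page
  divname = paste(sample(LETTERS), collapse = "")
  # save the named variables (looked up in envir) to a temporary RData file
  tf = tempfile(pattern = filename, fileext = paste0(".", fileext))
  save(list = list, file = tf, envir = envir)
  filenameWithExt = paste(filename, fileext, sep = ".")

  # base64-encode the whole file into a data: URI
  uri = base64enc::dataURI(file = tf, mime = "application/octet-stream", encoding = "base64")
  # print a placeholder element plus a script that builds the download link;
  # call from a chunk with results = 'asis' so the HTML passes through unchanged
  cat("<a style='text-decoration: none' id='", divname, "'></a>
<script>
var a = document.createElement('a');
var div = document.getElementById('", divname, "');
div.appendChild(a);
a.setAttribute('href', '", uri, "');
a.innerHTML = '", textHTML, "' + ' (", filenameWithExt, ")';
if (typeof a.download != 'undefined') {
  a.setAttribute('download', '", filenameWithExt, "');
} else {
  a.setAttribute('onclick', 'confirm(\"Your browser does not support the download HTML5 attribute. You must rename the file to ", filenameWithExt, " after downloading it (or use Chrome/Firefox/Opera).\")');
}
</script>
", sep = "")
}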

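The first argument of the function, list, is a character vector containing the names of the variables to save in the RData file; these are looked up in envir, which defaults to the environment the function is called from. The required filename argument sets the name of the downloaded file, without its extension.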

Once this function is defined, all we need to do is call it in our Rmd file. If the R code chunk uses the option results = 'asis', the HTML that the function prints is passed through unchanged into the compiled document. The result is a link that lets anyone with the HTML file download the embedded data as an RData file.
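
To check that nothing is lost along the way, here is a minimal round-trip sketch, again assuming the base64enc package. It mimics the save-and-encode steps that setDownloadURI performs, then decodes the URI the way a browser effectively does on download, and loads the result. The objects dat and labels, and the filename "mydata" in the comment, are hypothetical stand-ins for real data.

```r
# Assumes the base64enc package is installed
library(base64enc)

# Hypothetical example objects standing in for an analysis's data.
# In an Rmd chunk with results = 'asis', the corresponding call would be:
#   setDownloadURI(c("dat", "labels"), filename = "mydata")
dat <- data.frame(id = 1:5, score = c(2.1, 3.4, 1.9, 4.0, 2.8))
labels <- c("control", "treatment")

# Step 1: save the named objects to an RData file (as setDownloadURI does)
tf <- tempfile(fileext = ".RData")
save(list = c("dat", "labels"), file = tf, envir = environment())

# Step 2: base64-encode the file into a data URI, as embedded in the link
uri <- dataURI(file = tf, mime = "application/octet-stream", encoding = "base64")

# Step 3: strip the "data:...;base64," header and write the decoded bytes
# to a new file, which is what the reader ends up with after downloading
downloaded <- tempfile(fileext = ".RData")
writeBin(base64decode(sub("^data:[^,]*,", "", uri)), downloaded)

# Step 4: load the downloaded file into a fresh environment and compare
recovered <- new.env()
load(downloaded, envir = recovered)
stopifnot(identical(recovered$dat, dat), identical(recovered$labels, labels))
print(ls(recovered))
```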

Unfortunately, Blogger will not allow me to embed the data into a post; therefore, a complete, self-contained example Rmd file can be found here, and the resulting HTML file can be found here.

Keep in mind, however, that the data file is actually embedded in the HTML file. This means that the resulting HTML file can be very large if your data file is large. Also consider that the data are encoded in base64, which increases the size by about a third over the equivalent binary RData file. For very large data sets, one might consider hosting them outside of the HTML file; but for many purposes, the technique I describe will improve the ease with which you can share reproducible analyses.
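
The one-third figure follows from how base64 works: every 3 bytes of input become 4 output characters, so n bytes encode to 4 * ceiling(n / 3) characters. Below is a quick check, assuming the base64enc package and a synthetic 300,000-byte payload in place of a real RData file. Note that save() compresses RData files by default, so the overhead applies to the already-compressed size.

```r
# Assumes the base64enc package is installed
library(base64enc)

# Expected base64 length for n input bytes (no line wrapping, '=' padding):
# each group of 3 bytes becomes 4 characters
base64_size <- function(n) 4 * ceiling(n / 3)

# Synthetic 300,000-byte binary payload standing in for an RData file
bytes <- as.raw(rep(0:255, length.out = 3e5))

# base64encode() does not wrap lines unless a linewidth is given
encoded <- base64encode(bytes)
stopifnot(nchar(encoded) == base64_size(length(bytes)))

# Relative growth over the binary size: 4/3 - 1, i.e. about 33%
overhead <- nchar(encoded) / length(bytes) - 1
print(overhead)
```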

BayesFactor version 0.9.8 released to CRAN

BayesFactor version 0.9.8 has been released on CRAN! This is both a bug-fix and a feature update. From the NEWS:

  •  Fixed bugs in model enumeration code
  •  Fixed bug leading to wrong computation of number of covariates when interactions between continuous variables were included
  •  Corrected typos/old information in the documentation 
  •  Fixed a memory allocation bug that affected computing Bayes factors with lots of data 
  •  Added meta-analytic Bayes factor for t tests (see meta.ttestBF)
  •  Fixed bug in ttestBF that yielded Bayes factor of NaN for very extreme posterior interval probabilities
  •  Fixed several bugs causing infinite integrals; generally improved integration
  •  Added check to ensure no missing data before analyses
  •  Added callbacks for access by third-party interfaces

See also the new entry in the manual for meta-analytic t tests. In addition to these changes, most of the code for contingency table analysis has been added; these functions will be released in the next update.