Tuesday, October 6, 2015

Producing Reproducible R Code

A tip in the Google+ Statistics and R community led me to the reprex package for R. Quoting the author (Professor Jennifer Bryan, University of British Columbia), the purpose of reprex is to
[r]ender reproducible example code to Markdown suitable for use in code-oriented websites, such as StackOverflow.com or GitHub.
Much has been written about the virtues of, and need for, reproducible research. Another key need for reproducibility, one at which this package aims, is when posting questions about code or bug reports. Viewers of those posts need to know exactly what you did and exactly what resulted. The readme text on the package's GitHub home page gives a work flow description and some prescriptive advice, which I think is well worth reading.

I'm all for complete and cogent bug reports/code questions and reproducible research, but I was interested in reprex for another reason: formatting R code for blog posts (such as this one). To date I've been using a third-party web site (the Pretty R syntax highlighter) to generate HTML from R code, and I've been quite happy with the results. A simpler process would be nice, though. Additionally, while the aforementioned site works great with the code, I'm sometimes not sure how I should format the output.

So I decided to take reprex for a test drive using code from an old post here (Tabulating Prediction Intervals in R). I used just the code from the first part of the post (definition of the model.ctable() function and one invocation of it), a total of 17 lines of source code (including Roxygen comments for the function) leading to a single output table. Using RStudio, my work flow was as follows.
  1. Open a new script file and type/paste the code into it.
  2. Source the file to confirm it works as expected.
  3. Copy the code to the clipboard.
  4. In the console window, run the following two lines.
    This runs the code in the clipboard, so be careful not to do anything to modify the clipboard contents between the previous step and this one.
  5. Examine the results in the viewer pane (which automatically opens) to confirm that they are as expected.
  6. Open a new R Markdown file, delete the boilerplate RStudio inserts, and paste the contents of the clipboard into it. Along with displaying results in the viewer, the reprex() function also places the R Markdown code for it in the clipboard. Again, be careful not to modify the clipboard contents between step 4 and this one.
  7. Click the "Knit HTML" button and provide a destination file for the HTML output. This opens an HTML source file in RStudio.
  8. Copy the contents of the body tag (excluding the opening and closing body tags and ignoring the pile of stuff in the header) and paste into an HTML document. (Depending on the width of the output, you might want to surround it with a scrolling DIV tag, or hack the CSS you just pasted in to make it scrollable and/or give it a border.)
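Step 4 can be sketched as below. This assumes the reprex package is already installed, and that reprex() is called with no arguments so that it takes its input from the clipboard. The requireNamespace() and interactive() guard and the have_reprex variable are additions for safety, not part of the work flow above.

```r
# Assumption: the reprex package is already installed. The guard below is an
# addition so the sketch does nothing outside an interactive session.
have_reprex <- requireNamespace("reprex", quietly = TRUE)

if (have_reprex && interactive()) {
  library(reprex)
  # With no arguments, reprex() runs the code currently on the clipboard,
  # previews the result, and puts the rendered Markdown back on the clipboard.
  reprex()
}
```

Because both the input and the output travel through the clipboard, copying anything else between steps 3 and 6 will silently change what gets run or pasted.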
For this post, I added the following properties to the CSS .main-container style defined by reprex:

  overflow: scroll;
  border-style: groove;
  border-width: 5px;
  padding: 10px;

That created a border and a bit of padding, and told the browser to add scroll bars if needed. Here is how my example turned out:

Summarize a fitted linear model, displaying both coefficient significance and confidence intervals.
@param model an instance of class lm @param level the confidence level (default 0.95)
@return a matrix combining the coefficient summary and confidence intervals
model.ctable <- function(model, level = 0.95) {
  cbind(summary(model)$coefficients, confint(model, level = level))
}
x <- rnorm(20)
y <- rnorm(20)
z <- 6 + 3 * x - 5 * y + rnorm(20)
m <- lm(z ~ x + y)
model.ctable(m, level = 0.9)
#>              Estimate Std. Error   t value     Pr(>|t|)       5 %
#> (Intercept)  6.271961  0.2462757  25.46724 5.584261e-15  5.843539
#> x            2.974000  0.2571237  11.56642 1.763158e-09  2.526706
#> y           -4.951286  0.3260552 -15.18542 2.547338e-11 -5.518494
#>                  95 %
#> (Intercept)  6.700384
#> x            3.421294
#> y           -4.384079

You can see the comments, the code and, at the end, the output (formatted as R comments). It's not perfect. In particular, it would be nice if the Roxygen comments looked like comments and not like text. There's also no syntax highlighting (which is to be expected in an R Markdown document). Still, it's not bad for a blog post, and it confirms the package works (and is easy to use).
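For comparison, here is a sketch of how the script would look with the Roxygen lines written as #' comments, which is the form that gets rendered as plain text above. The comment wording and the code are taken from that rendering; the line breaks inside the comment block, the set.seed() call and the tab variable are assumptions added so the sketch runs repeatably.

```r
#' Summarize a fitted linear model, displaying both coefficient
#' significance and confidence intervals.
#'
#' @param model an instance of class lm
#' @param level the confidence level (default 0.95)
#'
#' @return a matrix combining the coefficient summary and confidence intervals
model.ctable <- function(model, level = 0.95) {
  cbind(summary(model)$coefficients, confint(model, level = level))
}

set.seed(123)  # assumption: added for repeatability, not in the code above
x <- rnorm(20)
y <- rnorm(20)
z <- 6 + 3 * x - 5 * y + rnorm(20)
m <- lm(z ~ x + y)

# Three coefficient rows; four summary columns plus the two interval bounds
tab <- model.ctable(m, level = 0.9)
print(tab)
```

With level = 0.9 the two interval columns are labeled "5 %" and "95 %", matching the table above.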

I'll close by pointing out that I'm going "off label" by using the package this way. In particular, I'm getting no value from one of the prime virtues of R Markdown: the ability to embed code in a text document such that the code can be easily read but can also be executed by "compiling" the document (not true of an HTML document like this post). For posting code to a forum, though, this looks like a definite keeper.