Thursday, July 2, 2015

Tabulating Prediction Intervals in R

I just wrapped up (knock on wood!) a coding project using R and Shiny. (Shiny, while way cool, is incidental to this post.) It was a favor for a friend, something she intends to use teaching an online course. Two of the tasks, while fairly mundane, generated code that was just barely obscure enough to be possibly worth sharing. It's straight R code, so you need not use (or have installed) Shiny to use it.

The first task was to output, in tabular form, the coefficients of a linear regression model, along with their respective confidence intervals. The second task was to output, again in tabular form, the fitted values, confidence intervals and prediction intervals for the same model. Here is the function I wrote to do the first task (with Roxygen comments):

#'
#' Summarize a fitted linear model, displaying both coefficient significance
#' and confidence intervals.
#'
#' @param model an instance of class lm
#' @param level the confidence level (default 0.95)
#'
#' @return a matrix combining the coefficient summary and confidence intervals
#'
model.ctable <- function(model, level = 0.95) {
  cbind(summary(model)$coefficients, confint(model, level = level))
}

To demonstrate its operation, I'll generate a small sample with random data and run a linear regression on it.
x <- rnorm(20)
y <- rnorm(20)
z <- 6 + 3 * x - 5 * y + rnorm(20)
m <- lm(z ~ x + y)
I'll generate the coefficient table using confidence level 0.9, rather than the default 0.95, for the coefficients.
model.ctable(m, level = 0.9)
The output is as follows:

             Estimate Std. Error   t value     Pr(>|t|)       5 %      95 %
(Intercept)  6.039951  0.2285568  26.42648 3.022477e-15  5.642352  6.437550
x            3.615331  0.2532292  14.27691 6.763279e-11  3.174812  4.055850
y           -5.442428  0.3072587 -17.71285 2.156161e-12 -5.976937 -4.907918

The code for the second table (fits, confidence intervals and prediction intervals) is a bit longer:

#'
#' Compute a table of fitted values, confidence intervals and
#' prediction intervals from a regression model.
#'
#' @param model a fitted regression model
#' @param level the desired confidence level (default 0.95)
#' @param names the names to assign to the columns (after
#' resequencing if necessry)
#' @param order the order in which to list the columns
#' (1 = fitted, 2 = lower c.i. limit, 3 = upper c.i. limit,
#' 4 = lower p.i. limit, 5 = upper p.i. limit)
#'
#' @return a matrix with one row per observation and five
#' columns (fitted value, lower/upper c.i. bounds, lower/upper
#' p.i. bounds) in the order specified by the user
#'
intervals <- function(model,
                      level = 0.95,
                      names = c("Fitted", "CI Low", "CI High",
                                "PI Low", "PI High"),
                      order = c(4, 2, 1, 3, 5)) {
  # generate fits and confidence intervals
  temp <- predict(model,
                  interval = "confidence",
                  level = level)
  # generate fits and prediciton intervals (suppressing
  # the warning about predicting past values)
  temp2 <- suppressWarnings(
    predict(model,
            interval = "prediction",
            level = level)
  )
  # drop the redundant fit column
  temp2 <- temp2[,2:3]
  # merge the tables and reorder the columns
  temp <- cbind(temp, temp2)[, order]
  # rename the columns
  colnames(temp) <- names[order]
  temp
}

Here is the call with default arguments (using head() to limit the amount of output):
The output is this:
      PI Low     CI Low     Fitted   CI High   PI High
1 -0.7928115 0.65769280  1.5196870  2.381681  3.832185
2  7.9056270 9.40123642 10.1928094 10.984382 12.479992
3  4.9125024 6.61897662  7.1149000  7.610823  9.317298
4  7.3386447 8.66123993  9.7406923 10.820145 12.142740
5 -1.4295587 0.05464529  0.8637503  1.672855  3.157059
6  4.1962493 5.84893725  6.4156619  6.982387  8.635074
Finally, I'll run it again, changing the confidence level to 0.9, tweaking the column headings a bit, and reordering them:

head(intervals(m,
               level = 0.9,
               names = c("Fit", "CI_l", "CI_u", "PI_l", "PI_u"),
               order = 1:5
               )
     )

The output is:
         Fit      CI_l      CI_u       PI_l      PI_u
1  1.5196870 0.8089467  2.230427 -0.3870379  3.426412
2 10.1928094 9.5401335 10.845485  8.3069584 12.078660
3  7.1149000 6.7059962  7.523804  5.2989566  8.930843
4  9.7406923 8.8506512 10.630733  7.7601314 11.721253
5  0.8637503 0.1966188  1.530882 -1.0271523  2.754653
6  6.4156619 5.9483803  6.882943  4.5856891  8.245635
By the way, all syntax highlighting was Created by Pretty R at inside-R.org.