---
title: Mixed models with ciTools
author: Matthew Avery
date: August 17, 2017
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Mixed models with ciTools}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, include = F}
library(dplyr)
library(ggplot2)
library(ciTools)
library(lme4)
set.seed(20170627)

x_gen_mermod <- function(ng = 8, nw = 5){
  n <- ng * nw
  x1 <- base::sample(letters[1:2], n, replace = T)
  x2 <- runif(n)
  group <- rep(as.character(1:ng), each = nw)
  return(data.frame(x1 = x1,
                    x2 = x2,
                    group = group))
}

mm_pipe <- function(df, ...){
  model.matrix(data = df, ...)
}

get_validation_set <- function(df, sigma, sigmaG, beta, includeRanef, groupIntercepts){
  vm <- sample_n(df, 5, replace = F)[rep(1:5, each = 100), ]
  vf <- bind_rows(vm, df) %>%
    select(-group) %>%
    mm_pipe(~.*.)
  vf <- vf[1:500, ]
  vGroups <- if(!includeRanef) rnorm(500, 0, sigmaG) else groupIntercepts[as.numeric(vm$group)]
  vm[["y"]] <- vf %*% beta + vGroups + rnorm(500, mean = 0, sd = sigma)
  vm
}

y_gen_mermod <- function(df, sigma = 1, sigmaG = 1, delta = 1, includeRanef = FALSE, validationPoints = FALSE){
  groupIntercepts <- rnorm(length(unique(df$group)), 0, sigmaG)
  tf <- df %>%
    dplyr::select(-group) %>%
    mm_pipe(~.*.)
  beta <- rep(delta, ncol(tf))
  if(validationPoints)  {
    vm <- get_validation_set(df, sigma, sigmaG, beta, includeRanef, groupIntercepts)
  }
  df[["y"]] <- tf %*% beta + groupIntercepts[as.numeric(df$group)] + rnorm(nrow(df), mean = 0, sd = sigma)
  df[["truth"]] <- tf %*% beta + groupIntercepts[as.numeric(df$group)] * (includeRanef)
  if(validationPoints) return(list(df = df, vm = vm)) else return(df)
}

```

## Introduction

Obtaining uncertainty estimates for prediction generated by mixed
models (such as those fit by `lme4`) in **R** has been a challenge for
users for many years. Unlike `lm` objections, there are no easy ways
to get confidence intervals for predictions based on `merMod` objects,
let alone prediction intervals. Options that did exist were based on
simulation or bootstrapping and thus require considerable computation
time. This tutorial describes the approach used to derive parametric
intervals for random intercept models, how to generate them quickly and
easily using `ciTools`, and provides discussion about how to
appropriately interpret and use these intervals. A brief discussion of
simulation results is provided at the end to show some properties of
these intervals, including comparisons to bootstrap methods.

## Writing down our model

First, we'll start by writing down our model and defining our terms so
that we're all on the same page. To an extent, I'll follow the
notation used by the author of `lme4` (Doug Bates) in his 2010 book on
the package.

$$\textbf{Y}|\textbf{G} = \textbf{g} \sim N(X\beta + Z\textbf{g}, \sigma^2I)$$
$$\textbf{G} \sim N(0, \sigma_G^2I)$$

In the above, $\textbf{Y}$ is our response vector for a particular
group, $X$ is the design matrix containing fixed effects, $\beta$ is
the vector of fixed effects parameters, $Z$ is the random effects
matrix, $\textbf{G}$ is the vector of random effects with realizations
$\textbf{G}$, $\sigma^2$ is the within-group variance, and
$\sigma_G^2$ is the between-group variance, and $I$ is the identity
matrix.

When considering uncertainty estimates for fitted values from this
model, the following may be important sources of uncertainty:

1. $\sigma^2$
2. $SE(\hat{\beta})$
3. $\sigma_G^2$
4. $SE(\hat{g})$

Some of these are easier to estimate than others, and all of these can
drive uncertainty when estimating both confidence intervals and
prediction intervals in mixed models. We'll refer to these sources of
uncertainty below by the numbers associated with them in the above
list.

## Types of intervals

### Conditional vs. unconditional confidence intervals

`ciTools` offers two approaches for estimating confidence and
prediction intervals for mixed models, and they differ based on
whether we want to condition on the random effects or not. We'll look
at the difference between conditional and unconditional intervals in
terms of confidence intervals first. The "conditional" confidence
interval is an interval for the expected value of an observation made
under a particular set of covariate values *conditional* on that
observation coming from a known group. In other words, this is a
confidence interval for $E(Y_{ij}|G_j = g_j)$ for a particular value
of $g_j$. (Note that while it would be more precise to state these
expectations conditionally on a vector of covariates, $x_{ij}$ as well
as $g_j$, we omit the covariate vector from the expression for
simplicity.)  When we estimate this conditional expectation, we have

$$E(Y_{ij}|G_j=g_j) = E(X_{ij}\beta) + E(g_j) + E(\epsilon_{ij}) = X_{ij}\beta + g_j$$

The standard error for our model estaimte is thus $SE(\hat{Y}_{ij}) =
SE(X_{ij}\hat{\beta}) + SE(\hat{g}_j)$. Thus, a confidence interval
for the group average will have a width determined by 2 and 4 from the
above list.

Alternatively, we can consider an "unconditional" confidence interval
for $E(Y)$. In this case, we are interested in the global average
without conditioning on a particular group. Since $E(G_j) = 0$, we
have $\hat{Y}_{ij} = X_{ij}\hat{\beta}$, and $SE(\hat{Y}_{ij}) =
SE(X_{ij}\hat{\beta})$, which only includes 2 from the above list.

### Prediction intervals

Similarly, we can consider both conditional and unconditional
prediction intervals. A $1 - \alpha$-level prediction interval for the
$i$ observation from the $j$th group can be defined thus:

$$(l, u) s.t. P(Y_{ij} \in (l, u)|G_j = g_j) = 1 - \alpha$$

We can re-write this expression using the CDF for a Normal
distribution,

$$(l, u) s.t. \Phi(\frac{u-E(Y_{ij}|G_j = g_j)}{\sigma}) - \Phi(\frac{l-E(Y_{ij}|G_j = g_j)}{\sigma}) = 1- \alpha.$$

Plugging in our estimated model parameters, we can find $l$ and $u$ by
solving

$$\Phi(\frac{\hat{u}- (X_{ij}\hat{\beta} + \hat{g_j})}{\hat{\sigma}}) = 1-\frac{\alpha}{2}.$$

and

$$\Phi(\frac{\hat{l}- (X_{ij}\hat{\beta} + \hat{g_j})}{\hat{\sigma}}) = \frac{\alpha}{2}.$$

In this framework, 1, 2, and 4 are relevant sources of
variance. Alternatively, consider the unconditional prediction
interval:

$$(l, u) s.t. P(Y \in (l, u)) = 1 - \alpha.$$

Here, we are instead solving

$$\Phi(\frac{\hat{u}- X_{ij}\hat{\beta}}{\hat{\sigma}_C}) = 1-\frac{\alpha}{2}$$

and

$$\Phi(\frac{\hat{l}- X_{ij}\hat{\beta}}{\hat{\sigma}_C}) = \frac{\alpha}{2},$$

where $\hat{\sigma}_C = \sqrt{\hat{\sigma}^2 + \hat{\sigma}^2_G}$. For
this unconditional prediction interval, 1, 2, and 3 determine our
interval's width.

## Interval demonstration

### Data

We'll look at each of these intervals and compare their uses below. To
do that, we need a data set. The code below generates a data set
consisting of $n_{groups} = 10$ groups with $n_{size} = 20$
observations per group. The data set includes a two-level continous
covariate and a continous covariate. The function `x_gen_mermod`
generates the X-matrix, and `y_gen_mermod` takes this matrix and
simualtes responses for given values of $\sigma$, $\sigma_G$, and
$\Delta$, where $\Delta$ is the effect size of the factors. All of
these values are set to a default of 1.


```{r}
set.seed(20170812)
df <- x_gen_mermod(10, 20) %>%
  y_gen_mermod()
```

Using this data set, we can fit a model using `lme4::lmer` to create
an `lmerMod` object, which we will use for generating confidence and
prediction intervals.


```{r}
fit2 <- lmer(y ~ x1 * x2 + (1|group) , data = df)
fit2
```


### Confidence intervals

The first interval we'll look at is the conditional confidence
interval. This is generated using `add_ci` and using the option
`includeRanef = TRUE`. This option specifies that we want a
conditional interval rather than unconditional. The default is
`includeRanef = FALSE`, which will produce an unconditional
interval. Recall from above that the conditional interval's width is
driven by both $SE(\hat{\beta})$ and $SE(\hat{g})$.



```{r}
df %>% add_ci(fit2, type = "parametric", includeRanef = T) %>% head()
```

Plotting these results lets us visualize the uncertainty around our
expected values. The figure below shows the parametric conditional
confidence intervals described above for the first three groups from
our example data set:


```{r, fig.width = 8, fig.heither = 5}
df %>%
  filter(group %in% c(1:3)) %>%
  add_ci(fit2, type = "parametric", includeRanef = T, names = c("LCB", "UCB")) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = LCB, ymax = UCB), colour = "black", fill = "red1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "red1", size = 2) +
  facet_grid(x1 ~ group)
```



For comparison, we consider bootstrap confidence intervals obtained by
simulating from the fitted model using `lme4::bootMer`. For this
function, the `re.form = ` option is used to fit conditional (`re.form
= NULL`) or unconditional (`re.form = NA`) intervals. (See
`?lme4::bootMer` for further information.) In `ciTools`, these
intervals can be obtained by specifying `type = boot`.


```{r}
set.seed(20170628)
df %>%
  add_ci(fit2, includeRanef = T, type = "parametric", names = c("Lpar", "Upar")) %>%
  add_ci(fit2, includeRanef = T, type = "boot", names = c("Lboot", "Uboot")) %>% head()
```

We can plot our intervals side by side for comparison. The plot below
shows both sets of conditional confidence intervals, with the
intervals from `simulate.merMod` shown in red and the invervals from
`add_ci` shown in blue.


```{r, fig.width = 8, fig.heither = 5}
set.seed(20170628)
df %>%
  filter(group %in% c(1:3)) %>%
  add_ci(fit2, includeRanef = T, type = "parametric", names = c("Lpar", "Upar")) %>%
  add_ci(fit2, includeRanef = T, type = "boot", names = c("Lboot", "Uboot")) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = Lpar, ymax = Upar), colour = "black", fill = "blue1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "blue1", size = 2) +
  geom_ribbon(aes(ymin = Lboot, ymax = Uboot), colour = "black", fill = "red1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "red1", size = 2) +
  facet_grid(x1 ~ group)
```


Visually, these intervals look similar, with the parametric interval
being wider than the bootstrap interval everywhere. The narrower width
for the bootstrap approach comes at the expense of coverage
probability, though as the number of groups and size of groups
increase, these differences move towards zero. A later section will
explore simulation results comparing these two approaches.

We can see that the results for unconditional intervals are also
similar between the parametric and bootstrap methods. Unconditional
intervals are generated `add_ci` by using the `includeRanef = F`
option. Note that this is the default setting for `includeRanef`.



```{r}
df %>% add_ci(fit2, includeRanef = F, type = "parametric") %>% head()
```


To illustrate the differences fitting the unconditional intervals vice
the conditional intervals, the chart below shows the parametric
unconditional intervals for a single group from our data set in blue
while the conditional intervals as plotten in red.


```{r, fig.width = 8, fig.heither = 5}
set.seed(20170628)
df %>%
  filter(group %in% c(1:3)) %>%
  add_ci(fit2, includeRanef = F, type = "parametric", names = c("LU", "UU")) %>%
  add_ci(fit2, includeRanef = T, type = "parametric", names = c("LC", "UC")) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = LU, ymax = UU), colour = "black", fill = "blue1", alpha = 0.2) +
  geom_ribbon(aes(ymin = LC, ymax = UC), colour = "black", fill = "red1", alpha = 0.2) +
  facet_grid(x1 ~ group)
```


The difference is readily apparent. For each group, the red intervals
are shifted along the y-axis by the estimated group mean. Also note
that the conditional intervals (red) are slightly wider than the
unconditional intervals. This is due to the inclusion of the standard
error for the group mean, which is unnecessary when estimating the
unconditional group mean.

It's also worth noting that the unconditional intervals for parametric
and bootstrap methods are virtually identical, unlike the conditional
intervals:


```{r, fig.width = 8, fig.heither = 5}
set.seed(20170628)
df %>%
  filter(group %in% c(1:3)) %>%
  add_ci(fit2, includeRanef = F, type = "parametric", names = c("Lpar", "Upar")) %>%
  add_ci(fit2, includeRanef = F, type = "boot", names = c("Lboot", "Uboot")) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = Lpar, ymax = Upar), colour = "black", fill = "blue1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "blue1", size = 2) +
  geom_ribbon(aes(ymin = Lboot, ymax = Uboot), colour = "black", fill = "red1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "red1", size = 2) +
  facet_grid(x1 ~ group)
```


Taking a step back, it's not clear what exactly these interval
estimates are useful for. The conditional estimates reflect fitted
values for data that we do not expect to observe again. When we choose
to treat a factor as random, it is because we are interested more in
the magnitude of its variability rather than the specific values it
takes within our sample. Therefore, the conditional mean estimates and
their associated confidence intervals are not necessarily important
One thing the conditional means may be useful for is visually
demonstrating model fit. While not very interesting for future
prediction, we can show visually the degree of model fit achieved by
plotting the fitted values against these conditional mean estimates:



```{r, fig.width = 8, fig.heither = 5}
df %>%
  add_ci(fit2, type = "parametric", includeRanef = T, names = c("LCB", "UCB")) %>%
  filter(group %in% c(1:3)) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = LCB, ymax = UCB), colour = "black", fill = "red1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "red1", size = 2) +
  geom_point(aes(y = y), size = 2) +
  facet_grid(x1 ~ group)
```

Note that each "group" should be plotted seperately, even if multiple
groups were observed under the same set of conditions. This is because
the conditional interval estimates are conditional on a *single*
group. The plot above shows data from groups 1, 2, and 3 by include
`group` as a facet in `facet_grid`.

By way of contrast, attempting to plot fitted values against
*unconditional* mean estimates could end up looking rather bad:



```{r, fig.width = 8, fig.heither = 5}
df %>%
  add_ci(fit2, type = "parametric", includeRanef = F, names = c("LCB", "UCB")) %>%
  filter(group %in% c(1:3)) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = LCB, ymax = UCB), colour = "black", fill = "red1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "red1", size = 2) +
  geom_point(aes(y = y), size = 2) +
  facet_grid(x1 ~ group)
```

The fact that these estimates fit poorly to the data is not a
reflection of poor model fit but rather the construction of the
unconditional intervals. These intervals are estimates for the
expected value of each observation, unconditional on the group,
explaining their divergence from the observed data. This is aside from
the point that confidence intervals, such as those in the above plot,
are measures of uncertainty about the estimate of the *mean* rather
than measures of variance of the observed data around the mean.

A better way to use the unconditional confidence intervals would be to
compare the response variable against some threshold value. If our
response variable was some measure of performance, we may be
interested in the conditions in which average system performance was
better than some threshold value. The unconditional confidence
intervals are useful when making such comparisons:



```{r, fig.width = 6, fig.heither = 4}
df %>%
  add_ci(fit2, type = "parametric", includeRanef = F, names = c("LCB", "UCB")) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = LCB, ymax = UCB), colour = "black", fill = "red1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "blue", size = 2) +
  facet_grid( ~ x1) +
  geom_hline(yintercept = 1.5, colour = "red1", size = 2, linetype = 2, alpha = .5)
```


From this plot, we can see the factor combinations where average
system performance is better than the threshold value as well as those
where it may not be.

It is inadviseable to include observed data on these plots, since the
confidence bounds are estimates of the average and do not reflect
either within-group or between-group variability.

### Prediction Intervals

Next, we will consider the conditional prediction interval. In
addition to the parametric intervals described above an alternative
approach is available using the `type = "boot"` option, which will
estimate a prediction interval by simulating data from the fitted
model using `simulate.merMod`.

Graphically, the prediction intervals should look like the conditional
confidence intervals only wider:


```{r}
df %>% add_pi(fit2, includeRanef = T, type = "parametric") %>% head()
```

```{r, fig.width = 8, fig.heither = 5}
df %>%
  filter(group %in% c(1:3)) %>%
  add_ci(fit2, type = "parametric", includeRanef = T, names = c("LCB", "UCB")) %>%
  add_pi(fit2, type = "parametric", includeRanef = T, names = c("LPB", "UPB")) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = LCB, ymax = UCB), colour = "black", fill = "red1", alpha = 0.2) +
  geom_ribbon(aes(ymin = LPB, ymax = UPB), colour = "black", fill = "blue1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "red1", size = 2) +
  facet_grid(x1 ~ group)
```


Like the conditional confidence intervals, the best application for
the conditional prediction intervals is showing model fit:


As above, each group should be plotted seperately, since these
intervals do not account for between-group variance. Since we are now
plotting prediction intervals, there should be no reservations about
including the observed data on this plot.

Finally, the unconditional prediction interval:


```{r}
df %>% add_pi(fit2, type = "parametric",includeRanef = F) %>% head()
```

In this case, there are a number of applications. Rather than
comparing a mean value to a threshold, we are more likely to be
interested in getting a visual indication of the likelihood that a
single observation can meet or exceed a threshold:


```{r, fig.width = 6, fig.heither = 4}
df %>%
  add_pi(fit2, type = "parametric", includeRanef = F, names = c("LPB", "UPB")) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = LPB, ymax = UPB), colour = "black", fill = "blue1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "red1", size = 2) +
  geom_point(aes(y = y), size = 2) +
  facet_grid( ~x1 )+
  geom_hline(yintercept = 0, colour = "red1", size = 2, linetype = 2)
```


In this case, there is no reason to include the groups as a
facet. Since our estimates are not conditional on group, the estimates
would be the same for each group. Therefore, we plot the full data set
conditional on only the two factors, `x1` and `x2`. Note that plotting
fitted values may make sense in this case, as the estimated
uncertainty bounds include both the between-group and within-group
variability. This means that the intervals should contain the
advertised percentage of the fitted values.

### Two intervals at once!

Finally, one potentially interesting visual could be to plot an
overall prediction interval along with a single conditional confidence
interval. This single plot would show both the overall variation for
individual observations while also showing how a single group mean
could drive variation:



```{r, fig.width = 8, fig.heither = 5}
df %>%
  filter(group %in% c(1:3)) %>%
  add_ci(fit2, type = "parametric", includeRanef = T, names = c("LCB", "UCB")) %>%
  add_pi(fit2, type = "parametric", includeRanef = F, names = c("LPB", "UPB")) %>%
  ggplot(aes(x = x2)) +
  geom_ribbon(aes(ymin = LCB, ymax = UCB), colour = "black", fill = "red1", alpha = 0.2) +
  geom_ribbon(aes(ymin = LPB, ymax = UPB), colour = "black", fill = "blue1", alpha = 0.2) +
  geom_line(aes(y = pred), colour = "red1", size = 2) +
  geom_point(aes(y = y), size = 2) +
  facet_grid(x1 ~ group)
```


This plot shows the unconditional prediction interval in blue along
with confidence intervals conditional on group in red. The shifts in
means between groups 1, 2, and 3 is clearly illustrated, with gorup 3
in particular having a large change in intercept due to group.

### `add_probs` and `add_quantile`

The other two primary functions in `ciTools` also have methods
available for `merMod` objects. Mathematically, they follow the same
arguemnts as the probability interval descritions given above, which
we won't repeat here. They follow the same conventions for
conditioning (or not) on the random effects, and like the intervals
discussed above, the choice of whether to condition or not should be
made carefully and should depend on the intended use case. For
population-level inference, unconditional probabilities and quantiles
should be used, and these seem like the most common use cases.

## Simulation

To compare coverage probability and interval widths for intervals
generated using the parametric and bootstrap methods described above,
1,000 data sets were generated for each combination of group size
($n_{size} = 5, 10, 20, 50$) and number of groups ($n_{groups} = 5, 10
,20, 50$). Both methods were used to generate 80 percent confidence
intervals for each data set. Random points within the prediction space
were selected for each simulated data set to compare to the intervals
for determining coverage probabilities.

### Unconditional confidence intervals

First, we consider the coverage probability of the unconditional
parametric and bootstrap intervals. The blue line represnts the
parametric intervals and hte red line the bootstrap intervals from
`ciTools`. The x-axis denotes group size and the facet the number of
groups in the data set.

```{r, include = F}
pm <- read.csv("lmer_unconditional_coverage.csv")
wm <- read.csv("lmer_unconditional_width.csv")
```

```{r, echo = F, fig.width = 8, fig.heither = 4}
pm %>%
  ggplot(aes(x = groupSize, y = CoverageProb, colour = Method)) +
  geom_line(size = 2) + facet_grid(~numberOfGroups) +
  scale_y_continuous(breaks = seq(0, 1, length.out = 6), limits = c(0, 1)) +
  scale_x_log10(breaks = c(5, 10, 20, 50)) +
  ggtitle("Coverage probabilities for unconditional confidence intervals") +
  geom_hline(yintercept = 0.8)

```

We can see that these two approaches produce very similar results,
with the parametric interval producing slightly higher coverage
probabilities but both methods approaching their nominal level as the
data set increases in size.

```{r, echo = F, fig.width = 8, fig.heither = 4}
pm %>%
  ggplot(aes(x = groupSize, y = CoverageProb, colour = Method)) +
  geom_line(size = 2) + facet_grid(~numberOfGroups) +
  scale_y_continuous(breaks = seq(0, 1, length.out = 6), limits = c(0, 1)) +
  scale_x_log10(breaks = c(5, 10, 20, 50)) +
  ggtitle("Coverage probabilities for unconditional confidence intervals") +
  geom_hline(yintercept = 0.8)

```

Next we can consider the interval width. Once again, the two methods
produce similar results. These results indicate that the unconditional
partametric and bootstrap approaches perform similarly. The bootstrap
is likely more robust to model misspecification (though this
simulation did not explore this) but also takes considerably longer,
particularly if you need to perform prediction across the complete
design space. The parametric approach on the other hand takes
virtually no computation time.

### Conditional confidence intervals

```{r, include = F}
pm <- read.csv("lmer_conditional_coverage.csv")
wm <- read.csv("lmer_conditional_width.csv")
```

Unlike the unconditional intervals, the characteristics of conditional
confidence intervals is not similar for bootstrap and parametric
approaches. The first plot below reports average coverage probability
on the y-axis, with the x-axis showing group size. The facet shows
number of groups.

```{r, echo = F, fig.width = 8, fig.heither = 4}
pm %>%
  ggplot(aes(x = groupSize, y = CoverageProb, colour = Method)) +
  geom_line(size = 2) + facet_grid(~numberOfGroups) +
  scale_y_continuous(breaks = seq(0, 1, length.out = 6), limits = c(0, 1)) +
  scale_x_log10(breaks = c(5, 10, 20, 50)) +
  ggtitle("Coverage probabilities for unconditional confidence intervals") +
  geom_hline(yintercept = 0.8)
```

The bootstrap method shows coverage probabilities below the nominal
alpha level. As the number of groups increases, the coverage
probability converges towards 80 percent, though increasing the group
size does not appear to matter for coverage probability. In contrast,
the parametric method shows coverage probabilities that exceed the
nominal level. As group size increases, coverage probability coverages
towards the nominal level, but when the number of groups increases,
coverage probability increases towards 1. The former is an expected
result of the standard errors for the $X\beta$ and the group means
decreasing when the group sizes increase. The increase in coverage
probability when group size increases is less easily
explained. Particularly for cases when there are a large number of
small groups, the parametric method produces extremely conservative
interval estimates.


```{r, echo = F, fig.width = 8, fig.heither = 4}
wm %>%
  ggplot(aes(x = groupSize, y = IntWidth, colour = Method)) +
  scale_x_log10(breaks = c(5, 10, 20, 50)) +
  ggtitle("Interval widths for conditional confidence intervals") +
  geom_line(size = 2) + facet_grid(~numberOfGroups)
```

These trends are also observable when we consider interval widths. The
parametric intervals are substantially wider than the bootstrap
intervals, though widths converge when the number of groups and group
sizes increase.

In general, the more conservative parametric interval is recommended
when there are few groups or at least the group size is large relative
to the number of groups. When the group size is small relative to the
number of groups, the parametric method available in `ciTools` is
overly-conservative, and the bootstrap approach will provide more
accurate coverage probabilities, although these will sitll not reach
the nominal level.

### Unconditional prediction intervals

The approach for validating prediction intervals was slightly
different than the approach for confidence intervals. The same sample
sizes were used, but for determining coverage probability, one hundred
test points were generated at five points in the design space for each
randomly generated data set. Coverage probability was determined by
calculating the proportion of these test points that were within the
estimated prediction interval for both parametric and bootstrap
methods.

Unconditional prediction intervals showed similar performance to
unconditional confidence intervals in simulation.

```{r, echo = F, fig.width = 8, fig.heither = 4}
pm %>%
  ggplot(aes(x = groupSize, y = CoverageProb, colour = Method)) +
  geom_line(size = 2) + facet_grid(~numberOfGroups) +
  scale_y_continuous(breaks = seq(0, 1, length.out = 6), limits = c(0, 1)) +
  scale_x_log10(breaks = c(5, 10, 20, 50)) +
  ggtitle("Coverage probabilities for unconditional prediction intervals") +
  geom_hline(yintercept = 0.8)

```

As with the confidence intervals, coverage probability was slightly
higher for the parametric method than the bootstrap, with the
parametric method hewing closer to the nominal level, and the
bootstrap interval falling just short. Neither interval's coverage
probabiltiy appears to be effected by the number of groups in the data
set, though for small groups sizes, the parametric interval appears
too wide for the nominal level.

```{r, echo = F, fig.width = 8, fig.heither = 4}
wm %>%
  ggplot(aes(x = groupSize, y = IntWidth, colour = Method)) +
  scale_x_log10(breaks = c(5, 10, 20, 50)) +
  ggtitle("Interval widths for unconditional prediction intervals") +
  geom_line(size = 2) + facet_grid(~numberOfGroups)
```

As expected, the parametric intervals are wider than the bootstrap
intervals. While the parametric interval width decreases with both the
group size and number of groups, the bootstrap interval width
increases as group size increases.

### Conditional prediction intervals

```{r, include = F}
pm <- read.csv("lmer_conditional_pi_coverage.csv")
wm <- read.csv("lmer_conditional_pi_width.csv")
```

The performance of the bootstrap and parametric approaches is similar
for conditional prediction intervals as it was for unconditional
prediction intervals.

```{r, echo = F, fig.width = 8, fig.heither = 4}
pm %>%
  ggplot(aes(x = groupSize, y = CoverageProb, colour = Method)) +
  geom_line(size = 2) + facet_grid(~numberOfGroups) +
  scale_y_continuous(breaks = seq(0, 1, length.out = 6), limits = c(0, 1)) +
  scale_x_log10(breaks = c(5, 10, 20, 50)) +
  ggtitle("Coverage probabilities for conditional prediction intervals") +
  geom_hline(yintercept = 0.8)
```

Coverage probability for both intervals is near the nominal level,
with the parametric approach producing wider intervals that are closer
(though above) the nominal level. The bootstrap approach failes to
reach the nominal coverage probability even for large sample sizes.

```{r, echo = F, fig.width = 8, fig.heither = 4}
wm %>%
  ggplot(aes(x = groupSize, y = IntWidth, colour = Method)) +
  scale_x_log10(breaks = c(5, 10, 20, 50)) +
  ggtitle("Interval widths for conditional prediction intervals") +
  geom_line(size = 2) +
  facet_grid(~numberOfGroups)
```

Interval widths also show a similar pattern to the one shown for
unconditional prediction intervals, with the parametric interval's
width converging towards the bootstrap interval's width as group size
and number of groups increases.
