---
title: "slim: Singular Linear Models"
author: "Daniel Farewell"
date: "`r Sys.Date()`"
output: rmarkdown::pdf_document
vignette: >
  %\VignetteIndexEntry{slim: Singular Linear Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
references:
- id: farewell2010
  title: Marginal analyses of longitudinal data with an informative pattern of observations
  author:
  - family: Farewell
    given: D.M.
  container-title: Biometrika
  volume: 97
  page: 65-78
  issued:
    year: 2010
---

# Introduction

The `slim` package fits singular linear models to longitudinal data. In
particular, for given sets $\{x\}$ ($m \times p$ design matrices, where $m$ is
the number of observations for a particular subject), $\{V_{0}\}$ ($m \times m$
possibly *singular* covariance matrices), $\{V_{*}\}$ ($m \times m$
positive-definite covariance matrices), and $\{y\}$ (length $m$ vectors
containing the longitudinal data for each subject), `slim` computes
$\hat{\beta} = \lim_{\omega \to 0} \hat{\beta}(\omega)$ where
$\hat{\beta}(\omega)$ satisfies $$ \sum x^{\top} V(\omega)^{-1} \{y
- x \hat{\beta}(\omega)\} = 0 $$ and $V(\omega) = V_{*} \omega + V_{0} (1
- \omega)$.

For a few more details you can consult @farewell2010. But for now let's load up
the package and see how it works.

```{r}
library(slim)
```

You'll see that `data.table` is a required package. That's because the main
`slim` modelling function expects to receive the data in the form of a keyed
`data.table`, not just an ordinary `data.frame`. The dialysis data, included
with the package, has two keys: `id` (the individual or subject identifier) and
`month` (the time variable). 

```{r}
data(dialysis)
str(dialysis)
```

The `slim` function makes use of these two keys in a couple different ways.
First, by having these rather fundamental aspects of longitudinal data
specified within the data, we can avoid having to ask for them each time we
want to fit a model. Second, it ensures that the data are sorted in
a consistent way (by individual, then time), so that associated quantities
derived from calls to other modelling packages can be reliably re-aligned with
the right subject and time.

Let's look at the data.

```{r}
library(ggplot2)

print(p <- ggplot(dialysis, aes(x = month, y = renalfn, group = id)) +
        geom_line(alpha = 0.1) + facet_grid(~ group))
```

Most individuals have a generally downward trajectory, towards poorer renal
function. However, if we overlay the average renal function at each measurement
occasion, we see the opposite pattern:

```{r}
naive_means <- dialysis[, list(id = group, renalfn = mean(renalfn)),
        by = list(group, month)]

print(p <- p + geom_point(data = naive_means))
```

A strong possibility, then, is that the individuals with poorest renal function
are dropping out of the study earlier. This is the kind of effect `slim` models
are designed to adjust for.

```{r}
slim_basic_fit <- slim(renalfn ~ group + month, dialysis)

summary(slim_basic_fit)
```

Notice that the sign of the estimated time trend (the coefficient associated
with `month`) now matches the predominant individual pattern. Let's overlay
this simple fit on the data:

```{r, results = "hide"}
# identify one fully observed individual in each group
dialysis[, fully_observed := (length(month) == 5), by = id]
group_reps <- dialysis[(fully_observed), list(id = min(id)), by = group]  
setkey(group_reps, id) 

dialysis[, slim_basic := fitted(slim_basic_fit)]
```

```{r}
print(p <- p + geom_line(aes(y = slim_basic), data = dialysis[group_reps]))
```

The `slim` function uses two "working" covariance matrices for each individual:
the initial, positive definite, matrix $V_{*}$ and the limiting, possibly
singular, matrix $V_{0}$. The default value of $V_{0}$ is the singular matrix
of ones, but we can change this by passing a formula to the `limit` argument:

```{r, results = "hide"}
slim_intercept_slope_fit <- slim(renalfn ~ group + month, dialysis,
        limit = ~ 1 + month) 

dialysis[, slim_intercept_slope := fitted(slim_intercept_slope_fit)]
```

```{r}
print(p <- p + geom_line(aes(y = slim_intercept_slope),
                data = dialysis[group_reps], linetype = "dashed"))
```

For each individual, the matrix $V_{0}$ is constructed as the product of the
columns of the model matrix generated by the `limit` formula with the transpose
of the same model matrix. For example, for an individual with observations at
months 0, 3, and 18,

```{r}
small_example <- data.table(month = c(0, 3, 18))
z <- model.matrix(~ 1, small_example)
z %*% t(z)
z <- model.matrix(~ 1 + month, small_example)
z %*% t(z)
```

There are several ways to specify $V_{*}$. The default (`covariance
= "randomwalk"`) is to use the covariance matrix of a random walk with
unit-variance increments, that is $\mathrm{cov}(y_{j}, y_{k}) = \min(j, k)$
, but but other options include the identity matrix (`covariance
= "identity"`), the covariance of Brownian motion ($\mathrm{cov}(y_{j}, y_{k})
= \min(t_{j}, t_{k})$, `covariance = "brownian"`) or the symmetric Pascal
matrix given by $$\mathrm{cov}(y_{j}, y_{k}) = \left(\begin{array}{c}j + k - 2 \\
j - 1\end{array}\right)$$and specified by `covariance = "pascal"`. All these
matrices are employed for their *structure* and make no reference to the actual
variance of the data. When using such working covariance structures, therefore,
it is important to use empirical standard errors to obtain reasonable estimates
of parameter uncertainty:

```{r}
vcov(slim_basic_fit)
vcov(slim_basic_fit, empirical = FALSE) # a bad idea for unmodelled covariances!
summary(slim_basic_fit)
summary(slim_basic_fit, empirical = FALSE) # still a bad idea
```

Use of `empirical = FALSE` is intended for cases when the covariance $V_{*}$
has itself been modelled from the same data. This can be accomplished by
passing another model fit to the `covariance` argument. Methods exist to handle
fits from `lmer` (package `lme4`, class `lmerMod`) and `jmcm` (package `jmcm`,
class `jmcmMod`). For example, using a mixed effects model fit by `lmer` with
a random intercept and slope, we can pass the estimated covariance matrices to
`slim` via the `covariance` argument: 

```{r}
library(lme4)

lmer_fit <- lmer(renalfn ~ group + month + (1 + month | id), dialysis)
slim_lmer_fit <- slim(renalfn ~ group + month, dialysis, covariance = lmer_fit)

summary(slim_lmer_fit)
summary(slim_lmer_fit, empirical = FALSE) # this now makes sense
```

The `jmcm` function uses a more comprehensive formula object to specify the
response, subject identifier, time variable and mean and covariance structures.
These models provide even more flexible ways to estimate covariances:

```{r}
library(jmcm)

jmcm_fit <- jmcm(renalfn | id | month ~ group | 1, data = dialysis,
        triple = rep(2L, 3), cov.method = "mcd")
slim_jmcm_fit <- slim(renalfn ~ group + month, dialysis, covariance = jmcm_fit)

summary(slim_jmcm_fit)
summary(slim_jmcm_fit, empirical = FALSE)
```
We can add these fits to our plots:

```{r, results = "hide"}
dialysis[, slim_lmer := fitted(slim_lmer_fit)]
dialysis[, slim_jmcm := fitted(slim_jmcm_fit)]
```

```{r}
print(p <- p + geom_line(aes(y = slim_lmer), data = dialysis[group_reps],
                colour = "darkblue") +
        geom_line(aes(y = slim_jmcm), data = dialysis[group_reps],
                colour = "darkred"))
```

There is reasonable consistency across all these `slim` model fits. It is
possible to demonstrate sensitivity to choice of covariance using more extreme
structures (the Pascal matrix is very extreme):

```{r, results = "hide"}
slim_pascal_fit <- slim(renalfn ~ group + month, dialysis,
        covariance = "pascal")

dialysis[, slim_pascal := fitted(slim_pascal_fit)]
```

```{r}
print(p <- p + geom_line(aes(y = slim_pascal), data = dialysis[group_reps],
                colour = "cyan"))
```
Notice the greatly increased standard error associated with the time trend in this fit:
```{r}
extract_se <- function(fit) sqrt(diag(vcov(fit)))
sapply(list(se_basic = slim_basic_fit, se_pascal = slim_pascal_fit), extract_se)
```

This sensitivity is indicative of the bias-variance trade-off associated with
choice of covariance structure. Roughly, the Pascal structure allows the
timings of observations to depend on a random intercept, random slope, random
curvature and so on to higher derivatives as far as the number of observations
on an individual allow. This consistency across a wider class of models comes
at the price of greatly increased variance, although observe that the
'baseline' parameters of the three groups are basically unaffected, relating as
they do to the time when all individuals remain in the study.

We end on this cautionary note: it does matter what covariance matrices you
choose! Our (limited) experience suggests that random walk or well-modelled
covariance structures lead to "sensible" answers, but we would greatly value
hearing your insights or experience, too. 

# References