---
title: "Quick Usage Guide to the 'fastglm' Package"
author: "Jared Huling"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quick Usage Guide to the 'fastglm' Package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  echo = TRUE,
  eval = TRUE
)
```

The *fastglm* package is intended to be a **fast** *and* **stable** alternative to the `glm()` and `glm2::glm2()` functions for fitting generalized linear models. *fastglm* is compatible with `R`'s `family` objects (see `?family`). The package can be installed via

```{r, eval = FALSE, echo=TRUE}
pak::pak("fastglm")
# pak::pak("jaredhuling/fastglm") #development version
```

and loaded via:

```{r, echo=TRUE}
library(fastglm)
```

## Example

The main package function is `fastglm()`. Currently, `fastglm()` does not allow for formula-based data input and is restricted to matrices. We use the following example to demonstrate the usage of `fastglm()`:

```{r, echo = TRUE}
data(esoph)
x <- model.matrix(~ agegp + unclass(tobgp) + unclass(alcgp),
                  data = esoph)

y <- cbind(esoph$ncases, esoph$ncontrols)

gfit1 <- fastglm(x = x, y = y, family = binomial(link = "cloglog"))

summary(gfit1)
```

Alternatively, to use the built-in function of `glm()` for processing formulas and data frames, we can use `glm()` with `method = fastglm_fit` to call the *fastglm* fitting functionality:

```{r}
gfit2 <- glm(cbind(ncases, ncontrols) ~ agegp + unclass(tobgp) +
                 unclass(alcgp), data = esoph,
             family = binomial(link = "cloglog"),
             method = fastglm_fit)

summary(gfit2, refit = FALSE)
```

Setting `refit = FALSE` in `summary()` uses the standard errors computed by the internal fitting functions of *fastglm*. The default, `refit = TRUE`, re-fits the model using `glm()` with `method = "glm.fit"` and one IRLS iteration with the coefficients computed previously as the starting values.

## Computational stability 

*fastglm* does not compromise computational stability for speed. In fact, for many situations where `glm()` and even `glm2::glm2()` do not converge, `fastglm()` does converge.

As an example, consider the following data scenario, where the response distribution is (mildly) misspecified, but the link function is quite badly misspecified. In such scenarios, the standard IRLS algorithm tends to have convergence issues. The *glm2* package was designed to handle such cases, however, it still can have convergence issues. *fastglm* uses a similar step-halving technique as `glm2::glm2()`, but it starts at better initialized values and thus tends to have better convergence properties in practice. 

```{r, fig.show='hold'}
set.seed(1)
x <- matrix(rnorm(10000 * 100), ncol = 100)
y <- exp(0.25 * x[,1] - 0.25 * x[,3] + 0.5 * x[,4] - 0.5 * x[,5] + rnorm(10000)) + 0.1


# fastglm
system.time(gfit1 <- glm(y ~ x, family = Gamma(link = "sqrt"),
                         method = fastglm_fit))

# glm
system.time(gfit2 <- glm(y ~ x, family = Gamma(link = "sqrt")))

# glm2
system.time(gfit3 <- glm(y ~ x, family = Gamma(link = "sqrt"),
                         method = glm2::glm.fit2))

## Note that fastglm() returns estimates with the
## largest likelihood
logLik(gfit1)
logLik(gfit2)
logLik(gfit3)

coef(gfit1)[1:5]
coef(gfit2)[1:5]
coef(gfit3)[1:5]

## check convergence of fastglm
gfit1$converged
## number of IRLS iterations
gfit1$iter

## now check convergence for glm()
gfit2$converged
gfit2$iter

## check convergence for glm2()
gfit3$converged
gfit3$iter


## increasing number of IRLS iterations for glm() does not help that much
system.time(gfit2 <- glm(y ~ x, family = Gamma(link = "sqrt"),
                         control = list(maxit = 100)))

gfit2$converged
gfit2$iter

logLik(gfit1)
logLik(gfit2)

```