---
title: "Dynamic failure rate model"
output:
    rmarkdown::html_vignette:
        toc: true
vignette: >
  %\VignetteIndexEntry{Dynamic failure rate model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>")
```

`algebraic.mle` provides an algebra over maximum likelihood estimators (MLEs).
Under regularity conditions, any MLE is asymptotically normal with variance
given by the inverse Fisher information. The package exploits this structure
with operations for composition (`joint`, `combine`), transformation (`rmap`),
and bridging to distribution algebra (`as_dist`).

## Installation

```r
# From CRAN
install.packages("algebraic.mle")

# Development version from GitHub
devtools::install_github("queelius/algebraic.mle")
```

## Purpose

The primary focus of this package is the likelihood and the log-likelihood,
fundamental statistical concepts for a parametric model. They are integral
to both Bayesian and frequentist statistics, as well as for those who
prioritize likelihood. The algebraic.mle package enables easy handling of
MLEs, which, under certain conditions and assumptions (such as independence
and identical distribution (iid) for a sample), present numerous advantages,
including asymptotic normality and being the uniformly minimum variance
unbiased estimator of theta.

## API Overview

The main object in the `algebraic.mle` package is the `mle_fit` object, which
represents a fitted model. The package provides a number of generic methods
designed for `mle_fit` objects. A comprehensive list of functions is available
in the [function reference](https://queelius.github.io/algebraic.mle/reference/index.html)
for `algebraic.mle`.


## Fitting exponential models
Here is an example of fitting a conditional exponential model to some data
using `algebraic.mle`.
The true DGP is given by `Y | x ~ X(x) + W$` where `X(x) ~ EXP(rate(x))`,
`W ~ N(0, 1e-3)`, and `rate(x) = exp(b0 + b1 * x)`.

In this analysis, we do not care how `x` is distributed, and we take it to
be an observable exogenous variable. We are interested in the conditional
distribution of `Y | x`.

Let's fit a conditional exponential model to some data from this DGP.
While the true DGP is a bit more complicated, the most salient part is the
exponential component, and the gaussian term may be thought of as added
noise, say, from imprecise measurement. Of course, the true DGP is unknown
in practice, so arriving at an conditional exponential model is a matter
of judgement and domain knowledge.

In this model, `Y | x ~ EXP(rate(x))` where `rate(x) = exp(b0 + b1*x)`.
First, let's define the DGP (data generating process):

```{r dgp-1}
library(algebraic.dist)
b0 <- -.1
b1 <- 0.5

dgp <- function(n, x) {
    rate <- exp(b0 + b1 * x)
    X <- rexp(n, rate)
    W <- rnorm(n, 0, 1e-3)
    return(X + W)
}
```

Let's generate some data:
```{r data-1}
n <- 75 # number of observations
set.seed(1231) # for reproducibility
df <- data.frame(x = rep(NA, n), y = rep(NA, n))
for (i in 1:n) {
    x <- runif(1, -10, 10)
    y <- dgp(n = 1, x = x)
    df[i, ] <- c(x, y)
}
```

Now, we define two functions, `resp`, `rate`, and `loglik` function which will 
be used to define the model.
```{r model-1}
resp <- function(df) df$y
rate <- function(df, beta) exp(beta[1] + beta[2] * df$x)
loglik <- function(df, resp, rate) {
  function(beta) sum(dexp(x = resp(df), rate = rate(df, beta), log = TRUE))
}
```

Let's fit the model. We'll use the `optim` function in `stats` to fit the model
and then wrap it into an `mle_fit` object using `mle_numerical`.

```{r mle-1}
library(algebraic.mle)

# initial guess for the parameters
par0 <- c(0, 0)
names(par0) <- c("b0", "b1")

sol <- mle_numerical(optim(par = par0,
    fn = loglik(df, resp, rate),
    control = list(fnscale = -1),
    hessian = TRUE))
summary(sol)
```

Let's plot it:
```{r plot-model-1, fig.width = 6, fig.height = 6, fig.align = "center"}
# plot the x-y points from the data frame
plot(df$x,df$y)

# now overlay a plot of the conditional mean
x <- seq(-10, 10, .1)
b0.hat <- params(sol)[1]
b1.hat <- params(sol)[2]
y.hat <- 1/exp(b0.hat + b1.hat*x)
y <- 1/exp(b0 + b1*x)
lines(x, y, col = "green", lwd = 10)
lines(x, y.hat, col = "blue", lwd = 10)
```


### Hypothesis test and model selection

Let's test the hypothesis that `b0 = 0` using a likelihood ratio test.
We can use the LRT because this null model is a special case (nested) of the
full model. The null model is `Y | x ~ EXP(rate(x))` where
`rate(x) = exp(b1*x)`, while the full model is `Y | x ~ EXP(rate(x))` where
`rate(x) = exp(b0 + b1*x)`.

```{r null-1, warning = FALSE, message = FALSE}
# construct null model where b1 = 0
rate_b0_zero <- function(df, b1) exp(b1 * df$x)

# initial guess for the parameters
# fit the model under the null hypothesis
sol2 <- mle_numerical(optim(par = 0,
    fn = loglik(df, resp, rate_b0_zero),
    control = list(fnscale = -1),
    hessian = TRUE,
    method = "BFGS"))
summary(sol2)
```

Let's compute the likelihood ratio test statistic and p-value:
```{r}
(lrt.sol2 <- -2 * (as.numeric(logLik(sol2)) - as.numeric(logLik(sol))))
pchisq(lrt.sol2, df = 1, lower.tail = FALSE) # compute the p-value
```

We see that the `p < 0.05`, but just barely, so we say the data is not 
compatible with the null hypothesis `b0 = 0`.

If we wanted to do model selection, we could use the AIC:
```{r}
AIC(sol)
AIC(sol2)
```

By the AIC measure, since the full model has an AIC less than the null model,
we would choose the full model. We actually know the DGP and both models are reasonable approximations, but the full model is a closer approximation.

"All models are wrong, but some are useful." - George Box

Eventually, if we have a sufficiently large sample, any model that is not
the DGP can be discarded, but reality is so complex that we will never have
a large enough sample and we will never be able to come up with a model that
is exactly the DGP.

Let's do another test, `b1 = 0`, i.e., it's an unconditional exponential model,
or just a standard exponential distribution.
```{r null-b1-1, warning = FALSE, message = FALSE}
rate_b1_zero <- function(df, b0) exp(b0)
# fit the model under the null hypothesis
sol3 <- mle_numerical(optim(par = 0,
    fn = loglik(df, resp, rate_b1_zero),
    control = list(fnscale = -1),
    hessian = TRUE,
    method = "BFGS"))
(lrt.sol3 <- -2 * (as.numeric(logLik(sol2)) - as.numeric(logLik(sol))))
pchisq(lrt.sol3, df = 1, lower.tail = FALSE) # compute the p-value
```

This has a `p`-value of essentially zero, so we reject the null hypothesis
that `b1 = 0`.

Let's compare the confidence intervals for each of these models.

```{r}
CI <- confint(sol)
# print the confidence interval
print(CI)
```

We see that the 95% confidence interval for `b0` does not include zero, so we
reject the null hypothesis that `b0 = 0`. The 95% confidence interval for `b1`
does not include zero, so we reject the null hypothesis that `b1 = 0`.


You can see tutorials for more examples of using the package in the
[vignettes](https://queelius.github.io/algebraic.mle/articles/index.html).