---
title: "Enriching `family` objects: exponential family of distributions"
author: "[Ioannis Kosmidis](https://www.ikosmidis.com)"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: enrichwith.bib
vignette: >
  %\VignetteIndexEntry{enriching family objects}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

# Introduction

The [**enrichwith**](https://github.com/ikosmidis/enrichwith) R
package provides the `enrich` method to enrich list-like R objects
with new, relevant components. The resulting objects preserve their
class, so all methods associated with them still apply.

This vignette is a demo of the available enrichment options for
`family` objects, focusing on objects that correspond to members of
the exponential family of distributions.

# Exponential family and `family` objects

`family` objects specify characteristics of the models used by
functions such as `glm`. The families implemented in the `stats`
package include `binomial`, `gaussian`, `Gamma`, `inverse.gaussian`,
and `poisson`, which obvious corresponding distributions. These
distributions are all special cases of the exponential family of
distributions with probability mass or density function of the
form
$$
f_{Y}(y) = \exp\left\{\frac{y \theta - b(\theta) - c_1(y)}{\phi/m} - \frac{1}{2}a\left(-\frac{m}{\phi}\right) + c_2(y) \right\}
$$
for some sufficiently smooth functions $b(.)$, $c_1(.)$, $a(.)$ and $c_2(.)$, and a fixed weight $m$. The expected value and the variance of $Y$ are then
\begin{align*}
      E(Y) & = \mu =  b'(\theta) \\
      Var(Y) & = \frac{\phi}{m}b''(\theta) = \frac{\phi}{m}V(\mu)
\end{align*}
where $V(\mu)$ and $\phi$ are the variance function and the dispersion parameter, respectively. Below we list the characteristics of the distributions supported by `family` objects.

#### [Normal](https://en.wikipedia.org/wiki/Normal_distribution) with mean $\mu$ and variance $\phi/m$
$\theta = \mu$, $\displaystyle b(\theta) = \frac{\theta^2}{2}$, $\displaystyle  c_1(y) = \frac{y^2}{2}$,
$\displaystyle a(\zeta) = -\log(-\zeta)$, $\displaystyle c_2(y) = -\frac{1}{2}\log(2\pi)$

#### [Binomial](https://en.wikipedia.org/wiki/Binomial_distribution) with index $m$ and probability $\mu$
$\displaystyle \theta = \log\frac{\mu}{1- \mu}$, $\displaystyle b(\theta) = \log(1 + e^\theta)$,
$\displaystyle \phi = 1$, $\displaystyle c_1(y) = 0$, $\displaystyle a(\zeta) = 0$, $\displaystyle c_2(y) = \log{m\choose{my}}$

#### [Poisson](https://en.wikipedia.org/wiki/Poisson_distribution) with mean $\mu$
$\displaystyle \theta = \log\mu$,
$\displaystyle b(\theta) = e^\theta$,
$\displaystyle \phi = 1$,
$\displaystyle c_1(y) = 0$,
$\displaystyle a(\zeta) = 0$,
$\displaystyle c_2(y) = -\log\Gamma(y + 1)$

#### [Gamma](https://en.wikipedia.org/wiki/Gamma_distribution) with mean $\mu$ and shape $1/\phi$
$\displaystyle \theta = -\frac{1}{\mu}$,
$\displaystyle b(\theta) = -\log(-\theta)$,
$\displaystyle c_1(y) = -\log y$,
$\displaystyle a(\zeta) = 2 \log \Gamma(-\zeta) + 2 \zeta \log\left(-\zeta\right)$,
$\displaystyle c_2(y) = -\log y$

#### [Inverse Gaussian](https://en.wikipedia.org/wiki/Inverse_Gaussian_distribution) with mean $\mu$ and variance $\phi\mu^3$
$\displaystyle \theta = -\frac{1}{2\mu^2}$,
$\displaystyle b(\theta) = -\sqrt{-2\theta}$,
$\displaystyle c_1(y) = \frac{1}{2y}$,
$\displaystyle a(\zeta) = -\log(-\zeta)$,
$\displaystyle c_2(y) = -\frac{1}{2}\log\left(\pi y^3\right)$


# Components in `family` objects
`family` objects provide functions for the variance function (`variance`), a specification of deviance residuals (`dev.resids`) and the Akaike information criterion (`aic`). For example
```{r, echo = TRUE, eval = TRUE}
inverse.gaussian()$dev.resids
inverse.gaussian()$variance
inverse.gaussian()$aic
```

# Enrichment options for `family` objects
The **enrichwith** R package provides methods for the enrichment of
`family` objects with a function that links the natural parameter
$\theta$ with $\mu$, the function $b(\theta)$, the first two
derivatives of $V(\mu)$, $a(\zeta)$ and its first four derivatives,
and $c_1(y)$ and $c_2(y)$. To illustrate, let's write a function that
reconstructs the densities and probability mass functions from the
components that result from enrichment
```{r, echo = TRUE, eval = TRUE}
library("enrichwith")
dens <- function(y, m = 1, mu, phi, family) {
    object <- enrich(family)
    with(object, {
        c2 <- if (family == "binomial") c2fun(y, m) else c2fun(y)
        exp(m * (y * theta(mu) - bfun(theta(mu)) - c1fun(y))/phi -
            0.5 * afun(-m/phi) + c2)
    })
}
```
The following chunks test `dens` for a few distributions against the
standard `d*` functions
```{r, echo = TRUE, eval = TRUE}
## Normal
all.equal(dens(y = 0.2, m = 3, mu = 1, phi = 3.22, gaussian()),
          dnorm(x = 0.2, mean = 1, sd = sqrt(3.22/3)))

## Gamma
all.equal(dens(y = 3, m = 1.44, mu = 2.3, phi = 1.3, Gamma()),
          dgamma(x = 3, shape = 1.44/1.3, 1.44/(1.3 * 2.3)))

## Inverse gaussian
all.equal(dens(y = 0.2, m = 7.23, mu = 1, phi = 3.22, inverse.gaussian()),
          SuppDists::dinvGauss(0.2, nu = 1, lambda = 7.23/3.22))

## Binomial
all.equal(dens(y = 0.34, m = 100, mu = 0.32, phi = 1, binomial()),
          dbinom(x = 34, size = 100, prob = 0.32))

```