---
title: "hmclearn:  Poisson Regression Example"
author: "Samuel Thomas"
date: "2020-10-04"
output:
  rmarkdown::html_vignette:
    fig_caption: false
vignette: >
  %\VignetteIndexEntry{poisson_regression_hmclearn}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---



## Introduction

This vignette demonstrates fitting a Poisson regression model via Hamiltonian Monte Carlo (HMC) using the **hmclearn** package.


```r
library(hmclearn)
```

For a count response, we let

$$
p(y | \mu) = \frac{e^{-\mu}\mu^y}{y!},
$$

with a log-link function

$$
\begin{aligned}
\boldsymbol\mu &:= E(\mathbf{y} | \mathbf{X}) = e^{\mathbf{X}\boldsymbol\beta}, \\
\log \boldsymbol\mu &= \mathbf{X}\boldsymbol\beta.
\end{aligned}
$$
The vector of responses is  $\mathbf{y} = (y_1, ..., y_n)^T$. The covariate values for subject $i$ are $\mathbf{x}_i^T = (x_{i0}, ..., x_{iq})$ for $q$ covariates plus an intercept. We write the full design matrix as $\mathbf{X} = (\mathbf{x}_1^T, ..., \mathbf{x}_n^T)^T \in \mathbb{R}^{n\times(q+1)}$ for $n$ observations. The regression coefficients are a vector of length $q + 1$, $\boldsymbol\beta = (\beta_0, ..., \beta_q)^T$.

## Derive log posterior and gradient for HMC

We develop the likelihood

$$
\begin{aligned}
f(\mathbf{y} | \boldsymbol\mu) &= \prod_{i=1}^n \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \\
f(\mathbf{y} | \mathbf{X}, \boldsymbol\beta) &= \prod_{i=1}^n \frac{e^{-e^{\mathbf{x}_i^T\boldsymbol\beta}}e^{y_i\mathbf{x}_i^T\boldsymbol\beta}}{y_i!}, \\
\end{aligned}
$$

and log-likelihood

$$
\begin{aligned}
f(\mathbf{y} | \mathbf{X}, \boldsymbol\beta) &= \sum_{i=1}^n -e^{\mathbf{x}_i^T\boldsymbol\beta} + y_i \mathbf{x}_i^T \boldsymbol\beta - \log y_i!,  \\
&\propto \sum_{i=1}^n -e^{\mathbf{x}_i^T\boldsymbol\beta} + y_i \mathbf{x}_i^T \boldsymbol\beta.
\end{aligned}
$$

We set a multivariate normal prior for $\boldsymbol\beta$

$$
\begin{aligned}
\boldsymbol\beta &\sim N(0, \sigma_\beta^2 \mathbf{I}),
\end{aligned}
$$

with pdf, omitting constants

$$
\begin{aligned}
\pi(\boldsymbol\beta | \sigma_\beta^2) &= \frac{1}{\sqrt{\lvert 2\pi \sigma_\beta^2 \rvert }}e^{-\frac{1}{2}\boldsymbol\beta^T \boldsymbol\beta / \sigma_\beta^2},  \\
\log \pi(\boldsymbol\beta | \sigma_\beta^2) &= -\frac{1}{2}\log(2\pi \sigma_\beta^2) - \frac{1}{2}\boldsymbol\beta^T \boldsymbol\beta / \sigma_\beta^2, \\
&\propto -\frac{1}{2}\log \sigma_\beta^2 - \frac{\boldsymbol\beta^T\boldsymbol\beta}{2\sigma_\beta^2}.
\end{aligned}
$$

Next, we derive the log posterior, omitting constants,

$$
\begin{aligned}
f(\boldsymbol\beta | \mathbf{X}, \mathbf{y}, \sigma_\beta^2) &\propto f(\mathbf{y} | \mathbf{X}, \boldsymbol\beta) \pi(\boldsymbol\beta | \sigma_\beta^2), \\
\log f(\boldsymbol\beta | \mathbf{X}, \mathbf{y}, \sigma_\beta^2) & \propto \log f(\mathbf{y} | \mathbf{X}, \boldsymbol\beta) + \log \pi(\beta | \sigma_\beta^2), \\
&\propto \sum_{i=1}^n \left( -e^{\mathbf{x}_i^T\boldsymbol\beta} + y_i \mathbf{x}_i^T \boldsymbol\beta\right) - \frac{1}{2}\beta^T\beta/\sigma_\beta^2, \\
&\propto \sum_{i=1}^n \left( -e^{\mathbf{x}_i^T\boldsymbol\beta} + y_i \mathbf{x}_i^T \boldsymbol\beta\right) - \frac{\boldsymbol\beta^T\boldsymbol\beta}{2\sigma_\beta^2}, \\
&\propto \mathbf{y}^T\mathbf{X}\boldsymbol\beta - \mathbf{1}_n^T e^{\mathbf{X}\boldsymbol\beta} - \frac{\boldsymbol\beta^T\boldsymbol\beta}{2\sigma_\beta^2}.
\end{aligned}
$$

We need to derive the gradient of the log posterior for the leapfrog function in HMC.

$$
\begin{aligned}
\nabla_{\boldsymbol\beta}\log f(\boldsymbol\beta|\mathbf{X}, \mathbf{y}, \sigma_\beta^2) &\propto \sum_{i=1}^n\left( -e^{\mathbf{x}_i^T\boldsymbol\beta}\mathbf{x}_i + y_i\mathbf{x}_i\right) - \boldsymbol\beta / \sigma_\beta^2, \\
&\propto \sum_{i=1}^n \mathbf{x}_i\left( -e^{\mathbf{x}_i^T\boldsymbol\beta} + y_i\right) - \boldsymbol\beta / \sigma_\beta^2, \\
&\propto \mathbf{X}^T (\mathbf{y} - e^{\mathbf{X}\boldsymbol\beta}) - \boldsymbol\beta/ \sigma_\beta^2.
\end{aligned}
$$

## Poisson Regression Example Data

The user must define provide the design matrix directly for use in **hmclearn**.  Our first step is to load the data and store the design matrix $X$ and dependent variable vector $y$.

We load drug usage data and create the design matrix $X$ and dependent vector $y$.  This example also appears in Agresti (2015), and we compare results to his.


```r
data(Drugs)

# design matrix
X <- model.matrix(count ~ A + C + M + A:C + A:M + C:M , data=Drugs)
X <- X[, 1:ncol(X)]

# independent variable is count data
y <- Drugs$count
```

## Comparison model - Frequentist

To compare results, we first fit a standard linear model using the frequentist function *glm*.


```r
# matrix representation
f <- glm(y ~ X-1, family=poisson(link=log))
summary(f)
#> 
#> Call:
#> glm(formula = y ~ X - 1, family = poisson(link = log))
#> 
#> Deviance Residuals: 
#>        1         2         3         4         5         6  
#>  0.02044  -0.02658  -0.09256   0.02890  -0.33428   0.09452  
#>        7         8  
#>  0.49134  -0.03690  
#> 
#> Coefficients:
#>              Estimate Std. Error z value Pr(>|z|)    
#> X(Intercept)  5.63342    0.05970  94.361  < 2e-16 ***
#> XAyes         0.48772    0.07577   6.437 1.22e-10 ***
#> XCyes        -1.88667    0.16270 -11.596  < 2e-16 ***
#> XMyes        -5.30904    0.47520 -11.172  < 2e-16 ***
#> XAyes:Cyes    2.05453    0.17406  11.803  < 2e-16 ***
#> XAyes:Myes    2.98601    0.46468   6.426 1.31e-10 ***
#> XCyes:Myes    2.84789    0.16384  17.382  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for poisson family taken to be 1)
#> 
#>     Null deviance: 2.4038e+04  on 8  degrees of freedom
#> Residual deviance: 3.7399e-01  on 1  degrees of freedom
#> AIC: 63.417
#> 
#> Number of Fisher Scoring iterations: 4
```

## Fit model using *hmc*

Next, we fit the poisson regression model using HMC.  A vector of $\epsilon$ values are specified to align with the data.


```r
N <- 1e4

eps_vals <- c(rep(5e-4, 2), 1e-3, 2e-3, 1e-3, 2e-3, 5e-4)

set.seed(412)
t1.hmc <- Sys.time()
 f_hmc <- hmc(N = N, theta.init = rep(0, 7),
                    epsilon = eps_vals, L = 50,
                    logPOSTERIOR = poisson_posterior,
                    glogPOSTERIOR = g_poisson_posterior,
                    varnames = colnames(X),
                    parallel=FALSE, chains=2,
                    param=list(y=y, X=X))
t2.hmc <- Sys.time()
t2.hmc - t1.hmc
#> Time difference of 51.99019 secs
```


## MCMC summary and diagnostics

The acceptance ratio for each of the HMC chains is sufficiently high for an efficient simulation.


```r
f_hmc$accept/N
#> [1] 0.9804 0.9829
```

The posterior quantiles are summarized after removing an initial *burnin* period. The $\hat{R}$ statistics provide an indication of convergence. Values close to one indicate that the multiple MCMC chains coverged to the same distribution, while values above 1.1 indicate possible convergence problems. All $\hat{R}$ values in our example are close to one.



```r
summary(f_hmc, burnin=3000, probs=c(0.025, 0.5, 0.975))
#> Summary of MCMC simulation
#>                   2.5%        50%      97.5%     rhat
#> (Intercept)  5.5238199  5.6327435  5.7521131 1.001527
#> Ayes         0.3379506  0.4875867  0.6251244 1.006951
#> Cyes        -2.2176007 -1.8992300 -1.6078705 1.001525
#> Myes        -6.3840770 -5.3643129 -4.4945189 1.007716
#> Ayes:Cyes    1.7495447  2.0670083  2.4061751 1.005151
#> Ayes:Myes    2.1572253  3.0200944  4.1225859 1.018774
#> Cyes:Myes    2.5174705  2.8574315  3.1983475 1.026709
```

Trace plots provide a visual indication of stationarity.  These plots indicate that the MCMC chains are reasonably stationary.


```r
mcmc_trace(f_hmc, burnin=3000)
```

![plot of chunk fig1poisson](fig1poisson-1.png)

Histograms of the posterior distribution show that Bayesian parameter estimates align with Frequentist estimates.


```r
beta.freq <- coef(f)
diagplots(f_hmc, burnin=3000, comparison.theta=beta.freq)
#> $histogram
```

![plot of chunk fit2poisson](fit2poisson-1.png)

## Source

Data originally provided by Harry Khamis, Wright State University

## References

Agresti, A. (2015). *Foundations of linear and generalized linear models*. John Wiley & Sons.  ISBN: 978-1-118-73003-4

Thomas, Samuel, and Wanzhu Tu. "Learning Hamiltonian Monte Carlo in R." arXiv preprint arXiv:2006.16194 (2020).

