---
output:
  pdf_document:
    citation_package: natbib
    latex_engine: pdflatex
    toc: false
    keep_tex: true
title: " Robust Empirical Bayes Confidence Intervals"
author: "Michal Kolesár"
date: "`r format(Sys.time(), '%B %d, %Y')`"
geometry: margin=1in
fontfamily: mathpazo
bibliography: library.bib
fontsize: 11pt
vignette: >
  %\VignetteIndexEntry{ebci}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE, cache=FALSE}
library("knitr")
knitr::opts_knit$set(self.contained = FALSE)
knitr::opts_chunk$set(tidy = TRUE, collapse=TRUE, comment = "#>",
                      tidy.opts=list(blank=FALSE, width.cutoff=55))
```

The package `ebci` implements robust empirical Bayes confidence intervals
(EBCIs) proposed by @akp20 for inference in a normal means model $Y_i\sim
N(\theta_i, \sigma^2_i)$, $i=1, \dotsc, n$.

# Setup

Suppose we use an empirical Bayes estimator of $\theta_i$ that shrinks toward
the predictor based on the regression of $\theta_i$ onto $X_i$ (equivalently,
regression of $Y_i$ onto $X_i$),
\begin{equation}\label{Bayes-estimator}
\hat{\theta}_{i} = X_i'\delta+w_{EB, i}(Y_i-X_i'\delta),
\end{equation}

where $\delta=E[X_i X_i']^{-1}E[X_i\theta_i]$, $w_{EB,
i}=\frac{\mu_{2}}{\mu_{2}+\sigma_{i}^{2}}$ is the degree of shrinkage, and
\begin{equation}\label{mu2-independence}
\mu_{2}=E[(\theta_i-X_i'\delta)^{2} \mid X_{i}, \sigma_i].
\end{equation}
is the second moment of the regression residual. We assume that $\mu_2$ doesn't
depend on $\sigma_i$. Under this setup, @morris83 proposes to use the *parametric EBCI*
\begin{equation*} \hat{\theta}_{i} \pm
\frac{z_{1-\alpha/2}}{\sqrt{w_{EB, i}}}w_{EB, i}\sigma_i.
\end{equation*}

The critical value $z_{1-\alpha/2}/\sqrt{w_{EB, i}}$ is larger than the usual
critical value $z_{1-\alpha/2}=$`qnorm(1-alpha/2)` if the estimator was unbiased
conditional on $\theta_i$. This CI is justified if we strengthen the assumption
(\ref{mu2-independence}) by making the normality assumption $\theta_i\mid X_{i},
\sigma_{i} \sim N(X_i'\delta, \mu_2)$. However, if the residual
$\theta_i-X_i'\delta$ is not normally distributed, this CI will undercover.
@akp20 derive a *robust EBCI* that is only uses (\ref{mu2-independence}) and not
the normality assumption. The EBCI takes the form
\begin{equation}\label{robust-ebci}
\hat{\theta}_{i} \pm cva_{\alpha}(m_{2i}, \infty) w_{EB, i}\sigma_i,
\,\, m_{2i}=(1-1/w_{EB, i})^2\mu_2/\sigma^2_i=\sigma^{2}_{i}/\mu_{2},
\end{equation}
where the critical value $cva_{\alpha}$ is derived in @akp20. Here $m_{2i}$ is the
second moment of the conditional bias of $\hat{\theta}_i$, scaled by the
standard error (normalized bias, henceforth). This critical value imposes a
constraint (\ref{mu2-independence}) on the second moment of $\theta_i$, but no
constraints on higher moments. We can make the critical value smaller by also
imposing a constraint on the kurtosis of $\theta_i$ (or equivalently, the
kurtosis of the normalized bias)
\begin{equation}\label{kappa-independence}
 \kappa=E[(\theta_i-X_i'\delta)^{4}
\mid X_{i}, \sigma_i]/\mu_{2}^2.
\end{equation}
In analogy to (\ref{mu2-independence}), we assume here that the conditional fourth
moment of $\theta_i-X_i'\delta$ doesn't depend on $(X_i,\sigma_i)$. In this
case, the robust EBCI takes the form
\begin{equation*}
\hat{\theta}_{i} \pm cva_{\alpha}(m_{2i},\kappa) w_{EB, i}\sigma_i.
\end{equation*}

These critical values are implemented in the package by the `cva` function:

```{r}
library("ebci")
## If m_2=0, then we get the usual critical value
cva(m2=0, kappa=Inf, alpha=0.05)$cv
## Otherwise the critical value is larger:
cva(m2=4, kappa=Inf, alpha=0.05)$cv
## Imposing a constraint on kurtosis tightens it
cva(m2=4, kappa=3, alpha=0.05)$cv
```

In practice, the parameters $\delta, \mu_2$, and $\kappa$ are unknown. To
implement the EBCI, the package replaces them with consistent estimates,
following the baseline implementation in @akp20. We illustrate this in the next
section.

# Example

Here we illustrate the use of the package using a dataset from @ChHe18ii (CH
hereafter). The dataset is included in the package as the list `cz`. Run `?cz`
for a full description of the dataset. As in @ChHe18ii, we use precision weights
proportional to the inverse of the squared standard error to compute
$(\delta,\mu_2,\kappa)$.


```{r}
## As Y_i, use fixed effect estimate theta25 of causal effect of neighborhood
## for children with parents at the 25th percentile of income distribution. The
## standard error for this estimate is se25. As predictors use average outcome
## for permanent residents (stayers), stayer25. Let us use 90% CIs, as in
## Armstrong et al
r <- ebci(formula=theta25~stayer25, data=cz, se=se25, weights=1/se25^2,
          alpha=0.1)
```

For shrinkage toward the grand mean, or toward zero, use the specification
`theta25 ~ 1`, and `theta25 ~ 0`, respectively, in the `formula` argument of
`ebci`.

The return value contains (see `?ebci` for full description)

1. The least squares estimate of $\delta$:
   ```{r}
   r$delta
   ```
2. Estimates of $\mu_2$ and $\kappa$. The estimate used for EBCI calculations
   (`estimate`) is obtained by applying a finite-sample correction to an initial
   method of moments estimate (`uncorrected_estimate`). This correction ensures
   that we don't shrink all the way to zero (or past zero) if the
   method-of-moments estimate of $\mu_2$ is negative (see @akp20 for details):
   ```{r}
   c(r$mu2, r$kappa)
   ```
3. The parameter $\alpha$ determining the confidence level, `r$alpha`, and the
   matrix of regressors, `r$X`.
4. A data frame with columns:
   ```{r}
   names(r$df)
   ```

The columns of the data frame refer to:

- `w_eb` Empirical Bayes shrinkage factor $w_{EB, i}=\mu_2/(\mu_2+\sigma_i^2)$.
- `th_eb` Empirical Bayes estimator $\hat{\theta_i}$ given in (\ref{Bayes-estimator})
- `len_eb` Half-length $cva_{\alpha}(m_2, \kappa)w_i\sigma_i$ of the robust
  EBCI, so that the lower endpoint of the EBCIs are given by `th_eb-len_eb`, and
  the upper endpoint by `th_eb+len_eb`. Let us verify this for the first observation:
  ```{r}
  cva(r$df$se[1]^2/r$mu2[1], r$kappa[1], alpha=0.1)$cv*r$df$w_eb[1]*r$df$se[1]
  r$df$len_eb[1]
  ```
- `len_pa` Half-length $z_{1-\alpha/2}\sqrt{w_i}\sigma_i$ of the parametric EBCI.
- `w_opt` Shrinkage factor that optimizes the length of the resulting confidence
  interval. In other words, instead of using $w_{EB, i}$ in (\ref{robust-ebci}),
  we use shrinkage $w_i$ that minimizes $cva_{\alpha}((1-1/w_{EB,
  i})^2\mu_2/\sigma^2_i, \kappa) w_{i}\sigma_i$. See @akp20 for details. The
  vector is missing here since the default option, `wopt=FALSE`, is to skip
  computation of the length-optimal CIs to speed up the calculations.
- `th_op` Estimator based on the length-optimal shrinkage factor `w_opt`
  (missing here since the default is `wopt=FALSE`)
- `len_op` Half-length $cva_{\alpha}((1-1/w_{EB, i})^2\mu_2/\sigma^2_i, \kappa)
  w_{i}\sigma_i$ of the length-optimal EBCI (missing here since we specified
  `wopt=FALSE`).
- `th_us` The unshrunk estimate $Y_i$, as specified in the `formula` argument of
  the function `ebci`.
- `len_us` Half-length $z_{1-\alpha/2}\sigma_i$ of the CI based on the unshrunk
  estimate
- `se` The standard error $\sigma_i$, as specified by the argument `se` of the
  `ebci` function.
- `ncov_pa` maximal non-coverage of the parametric EBCI.

Using the data frame, we can give a table summarizing the results. Let us show
the results for the CZ in New York:
```{r}
df <- (cbind(cz[!is.na(cz$se25), ], r$df))
df <- df[df$state=="NY", ]

knitr::kable(data.frame(cz=df$czname, unshrunk_estimate=df$theta25,
             estimate=df$th_eb,
             lower_ci=df$th_eb-df$len_eb, upper_ci=df$th_eb+df$len_eb),
             digits=3)
```
@akp20 present the same information as a figure.

Finally, let us compute some summary statistics as in Table 3 in @akp20. Average
half-length of the robust, parametric, and unshrunk CI:

```{r}
mean(r$df$len_eb)
mean(r$df$len_pa)
mean(r$df$len_us)
```
The efficiency of the parametric and unshrunk CI relative to the robust EBCI is given by
```{r}
mean(r$df$len_eb)/mean(r$df$len_pa)
mean(r$df$len_eb)/mean(r$df$len_us)
```

While the parametric EBCI is shorter on average, it yields CIs that may violate the
90% coverage requirement. In particular, the average maximal non-coverage probability at the
estimated value of $(\mu_{2},\kappa)$ is given by
```{r}
mean(r$df$ncov_pa)
```

# References