---
title: "Frequentist Likelihood-Based Inference for Time Series Extremes"
author: "Paul Northrop"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Frequentist Likelihood-Based Inference for Time Series Extremes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: lite.bib
csl: taylor-and-francis-chicago-author-date.csl
---

```{r, include = FALSE}
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)

required <- c("exdex")

if (!all(unlist(lapply(required, function(pkg) requireNamespace(pkg, quietly = TRUE)))))
  knitr::opts_chunk$set(eval = FALSE)
```

```{r, include = FALSE}
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
```

The **lite** package performs likelihood-based inference for stationary time series extremes. This vignette concentrates on frequentist inference, using maximum likelihood estimation.  See the vignette [Bayesian Likelihood-Based Inference for Time Series Extremes](lite-2-bayesian.html) for information about Bayesian inference.

The general approach follows @FW2012. There are 3 independent parts to the inference, all performed using maximum likelihood estimation.

1. A Bernoulli$(p_u)$ model for whether a given observation exceeds the threshold $u$.
2. A generalised Pareto, GP$(\sigma_u, \xi)$, model for the marginal distribution of threshold excesses.
3. The $K$-gaps model for the extremal index $\theta$, based on inter-exceedance times.

For parts 1 and 2, inferences based on a mis-specified independence log-likelihood are adjusted to account for clustering in the data. The adjustment does not affect the point estimates of $(p_u, \sigma_u, \xi)$ but it ensures that estimates of uncertainty associated with these estimates account for dependence in the data.  Otherwise, uncertainty in these parameters will tend to be underestimated. 

Here, we follow @CB2007 to estimate adjusted log-likelihood functions for $p_u$ and for ($\sigma_u, \xi$), using the **chandwich** package [@chandwich].  For details of the log-likelihood adjustments see the [introductory vignette for the chandwich package](https://cran.r-project.org/package=chandwich).  We will see below that although there is more than one way to perform this adjustment there is a strong argument in favour of the so-called **vertical** adjustment.  A vertical adjustment adjusts on the log-likelihood scale, thereby respecting constraints on the parameter values, whereas the **horizontal** adjustment adjusts on the parameter scale.

For part 3, the methodology described in @SD2010 is used, implemented by the function `kgaps` in the **exdex** package [@exdex].  The $K$-gaps model involves a tuning parameter $K$ that defines how distant in time threshold exceedances need to be before they are considered to be from separate clusters of threshold exceedances. In addition to providing an estimator of $\theta$, @SD2010 provides a diagnostic to inform the choice of $K$ and the threshold $u$.  We use this diagnostic below, but it would also be advisable also to examine graphically the effect of $(u, K)$ on extremes value inferences, by plotting point estimates and confidence intervals for key quantities such as the generalised Pareto shape parameter $\xi$ or return levels of interest.

The (adjusted) log-likelihoods produced in parts 1, 2 and 3 are combined to form a log-likelihood for $(p_u, \sigma_u, \xi, \theta)$ from which extreme value inferences can be made.

## Cheeseboro wind gust data

We use an example dataset to illustrate the use of the two main functions in **lite**, that is, `flite` and `returnLevel`.  The 744 by 10 matrix `cheeseboro`, available from the **exdex** package, contains hourly maximum wind gusts (in miles per hour) recorded at the Cheeseboro weather station near Thousand Oaks, Southern California, USA during the month of January over the period 2000-2009.  Column $i$ contains the hourly values for year $1999+i$.  These data are sourced from the [Cheeseboro page](https://raws.dri.edu/cgi-bin/rawMAIN.pl?caCCHB) of the Remote Automated Weather Stations USA Climate Archive.  The following plot shows the general behaviour of these data. 

```{r cheesy, echo = FALSE, fig.width = 7, fig.height = 5}
oldpar <- par(mar = c(4, 4, 1, 1))
cmat <- exdex::cheeseboro
NAmat <- matrix(NA, ncol = 10, nrow = 250)
cmat <- rbind(cmat, NAmat)
cvec <- c(cmat)
plot(cvec, xlab = "year", ylab = "hourly wind gusts, mph", axes = FALSE,
     ylim = c(0, max(cvec, na.rm = TRUE)), type = "l")
axis(1, at = c(-1000, seq(from = 372, by = 744 + 250, length = 10), 100000), 
     labels = c(NA, 2000:2009, NA))
axis(2)
par(oldpar)
```

For estimating $(p_u, \sigma_u, \xi)$ we treat observations from different years as being in separate clusters.  In the code below we do not need to set the argument `cluster` explicitly because when the data are supplied as a matrix the function `flite` sets `cluster` automatically to achieve this.  A total of 42 observations are missing in `cheeseboro`. This causes no problems for inferences about $(p_u, \sigma_u, \xi)$.  For making inferences about $\theta$ using the $K$-gaps model the `kgaps` function splits the data further into sequences of non-missing values and constructs the sample $K$-gaps separately for each sequence.

## Inferences for model parameters

We need to choose values for the tuning parameters threshold $u$ and $K$-gaps run parameter $K$.  There is no definitive way to do this but the information matrix test developed in @SD2010 can help to make this choice.  Based on the $K$-gaps section of the [introductory vignette for the exdex package](https://cran.r-project.org/package=exdex) we choose $u = 45$mph and $K = 3$.

```{r cheeseboro}
library(lite)
cdata <- exdex::cheeseboro
# Each column of the matrix cdata corresponds to data from a different year
# flite() sets cluster automatically to correspond to column (year)
cfit <- flite(cdata, u = 45, k = 3)
summary(cfit)
summary(cfit, adjust = FALSE)
```

The effect of the adjustment of the log-likelihood for clustering is to increase the estimated standard errors for the parameters $p_u$ and $(\sigma_u, \xi)$ in comparison to those obtained with no adjustment.

A plot method for objects inheriting from class `"flite"` enables us to inspect the log-likelihood functions for $p_u$, $(\sigma_u, \xi)$ and $\theta$.  This first set of plots use the default vertical adjustment to the log-likelihoods for $p_u$ and $(\sigma_u, \xi)$.

```{r vertical, fig.width = 7, fig.height = 5, message = FALSE}
plot(cfit)
```

The line superimposed on the on the plot on the top right shows the boundary of the log-likelihood imposed by the constraint $\xi > -\sigma_u / x_{(n)}$, where $x_{(n)}$ is the largest threshold excess.  Using the vertical adjustment ensures that this constraint is respected after adjustment.  However, if we use one of the horizontal adjustments, **cholesky** or **spectral**, then the adjusted log-likelihood has non-zero density values outside of the correct support. This is why the vertical adjustment is the default.

```{r spectral, fig.width = 7, fig.height = 5, message = FALSE}
plot(cfit, which = "gp", adj_type = "spectral")
```

## Inferences for return levels

The $m$-year return level is the value that is exceeded with probability $1/m$ in any given year.  To infer this from time series data we need to take account of the (intended) sampling frequency of the data.  In the context of the Cheeseboro data, a year is effectively one month because we are making inferences about extremal behaviour in January.  If there are no missing values then there are $31 \times 24 = 744$ hourly values per January.  We denote this by $n_y$ (number of observations per year).  

The $m$-year return level is given by 
$$x_m = u+\frac{\sigma_u}{\xi} \left[ p_u^\xi \left\{ 1-\left( 1-\displaystyle \frac1m\right)^{1/n_{y}\theta} \right\}^{-\xi} - 1 \right].$$

If $m n_y \theta$ is large then 
$$x_m \approx u+\frac{\sigma_u}{\xi} \left[ (m \, n_{y}\, \theta \, p_u)^\xi  -  1 \right].$$
The `returnLevel` function performs inferences about return levels.  It uses the former expression for $x_m$.  By default it calculates two types of (95\%) confidence intervals: (a) intervals based on approximate large-sample normality of the maximum likelihood estimator for return level, which are symmetric about the point estimate, and (b) profile likelihood-based intervals based on an (adjusted) log-likelihood.

```{r returnlevels, fig.width = 7, fig.height = 5}
rl <- returnLevel(cfit, m = 100, level = 0.95, ny = 31 * 24)
summary(rl)
rl
plot(rl)
```

## References

<script type="text/x-mathjax-config">
   MathJax.Hub.Config({  "HTML-CSS": { minScaleAdjust: 125, availableFonts: [] }  });
</script>
