---
title: "Tweedie likelihood computations"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tweedie likelihood computations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Tweedie distributions

Tweedie distributions are exponential dispersion models, with a mean $\mu$ and a variance $\phi \mu^\xi$, for some dispersion parameter $\phi > 0$ and a variance-power index $\xi$ [@tweedie1984], that is sometimes called $p$, that uniquely defines the distribution within the Tweedie family (for all real values of $\xi$ **not** between 0 and 1).

Special cases of the Tweedie distributions are:

* the *normal* distribution, with $\xi = 0$ (i.e., the variance is $\phi$ and not related to the mean);
* the *Poisson* distribution, with $\xi = 1$ and $\phi = 1$ (i.e., the variance is the same as the mean);
* the *gamma* distribution, with $\xi = 2$; and
* the *inverse Gaussian* distribution, with $\xi = 3$.

For all other values of $\xi$, the probability functions and distribution functions have no closed forms.


| Variance power, $\xi$       | Distribution     | Support                       |
|:---------------------------:|:----------------:|:-----------------------------:|
| $\xi = 0$                   | Normal           | $(-\infty, \infty)$           |
| $0 < \xi < 1$               | Do not exist     |                               |
| $\xi = 1$ (with $\phi = 1)$ | Poisson          | Discrete: $(0, 1, 2, \dots)$  |
| $1 < \xi < 2$               | Poisson--gamma   | $[ 0, \infty)$                |
| $\xi = 2$                   | Gamma            | $(0, \infty)$                 |
| $\xi > 2$                   | Skewed right     | $(0, \infty$)                 |
| $\xi = 3$                   | inverse Gaussian | $(0, \infty)$                 |

For $\xi < 1$, applications are limited (non-existent so far?), but have support on the entire real line and $\mu > 0$.

For $1 < \xi < 2$, Tweedie distributions can be represented as a Poisson sum of gamma distributions. 
These distributions are continuous for $Y > 0$ but have a discrete mass at $Y = 0$.


## Use cases

Likelihood computations are not need to fit a Tweedie generalized linear model, using the `tweedie()` family from the [**statmod** package](https://CRAN.R-project.org/package=statmod) (@smythStatmodStatisticalModeling2025, @dunnGeneralizedLinearModels2018):

```{r TWexample}
library(tweedie)
library(statmod)   # For the tweedie() family function, for use in glm()

set.seed(96)
N <- 25
# Mean of the Poisson (lambda) and Gamma (shape/scale)
lambda <- 1.5
# Generating Compound Poisson-Gamma data manually
y <- replicate(N, {
  n_events <- rpois(1, lambda = lambda)
  if (n_events == 0) 0 else sum(rgamma(n_events, shape = 2, scale = 1))
})

mod.tw <- glm(y ~ 1, 
              family = statmod::tweedie(var.power = 1.5, link.power = 0) )
              # link.power = 0  means the log-link
```

However, likelihood computations are necessary in other situations.
Since likelihoods cannot be computed analytically apart from special cases, the **tweedie** package computes Tweedie densities  using numerical methods (@dunnSeriesEvaluationTweedie2005, @dunnEvaluationTweedieExponential2008).

### Generating random numbers

Random numbers from a Tweedie distribution are generated using **rtweedie()**:

```{r TWrandom}
tweedie::rtweedie(10, xi = 1.1, mu = 2, phi = 1)
```

### Plotting density and probability functions

Accurate density functions are generated using **dtweedie()**:


```{r TWplotsPDF}
y <- seq(0, 2, length = 100)
xi <- 1.1
mu <- 0.5
phi <- 0.4

twden <- tweedie::dtweedie(y, xi = xi, mu = mu, phi = phi)  
twdtn <- tweedie::ptweedie(y, xi = xi, mu = mu, phi = phi)

plot( twden[y > 0] ~ y[y > 0], 
      xlab = expression(italic(y)),
      ylab = "Density function",
      type ="l",
      lwd = 2,
      las = 1)
points(twden[y==0] ~ y[y == 0],
      lwd = 2,
      pch = 19)
```

Accurate probability functions are generated using **ptweedie()**:

```{r TWplotsCDF}
plot(twdtn ~ y,
     xlab = expression(italic(y)),
     ylab = "Distribution function",
     type = "l",
     las = 1,
     lwd = 2,
     las = 1,
     ylim = c(0, 1) )
points(twdtn[y==0] ~ y[y == 0],
      lwd = 2,
      pch = 19)
```

However, the function **tweedie_plot()** is sometimes more convenient (especially for $1 < \xi \ < 2$, when a probability mass at $Y = 0$ is present):

```{r TWplotsPDF2}
tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi,
                      ylab = "Density function",
                      xlab = expression(italic(y)),
                      las = 1,
                      lwd = 2)
```

```{r TWplotsCDF2}
tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi, 
                      ylab = "Distribution function",
                      xlab = expression(italic(y)),
                      las = 1, 
                      lwd = 2, 
                      ylim = c(0, 1), 
                      type = "cdf")
```

### Computing the quantile residuals

Quantile residuals [@dunnRandomizedQuantileResiduals1996] can be computed to assess the fit of a glm.
The function **qresid()** is in the [**statmod** package](https://CRAN.R-project.org/package=statmod) (@smythStatmodStatisticalModeling2025, @dunnGeneralizedLinearModels2018):

```{r TWqqplot}
library(tweedie)

qqnorm( qr <- statmod::qresid(mod.tw),
        las = 1)
qqline(qr,
       col = "grey")
```

The quantile residuals suggest the model may not be adequate.


### Estimating $\xi$

To use a Tweedie distribution in a glm, the value of $\xi$ is needed.
To find the most suitable value of $\xi$, **tweedie_profile()** can be used [@dunnGeneralizedLinearModels2018]:

```{r TWprofile}
out <- tweedie::tweedie_profile(y ~ 1, 
                                xi.vec = seq(1.2, 1.8, by = 0.05), 
                                do.plot = TRUE)

# The estimated power index:
out$xi.max
```


## Example

Consider the `poison` data in the `GLMsData` package [@dunnGLMsData], which records the survival times of animals under various treatments and poisons (used in @dunnGeneralizedLinearModels2018, Sect. 12.4.2). 
For convenience, a copy is included with this package:

```{r PSNLoad}
poison <- read.csv(system.file("extdata", "poison.csv", package = "tweedie"))

# Convert to factors:
poison$Psn <- factor(poison$Psn)
poison$Trmt <- factor(poison$Trmt)

head(poison)
```

The survival time is likely related to the poison type and type of treatment:

```{r PSNPlotPsn}
plot(Time ~ Psn,
     data = poison,
     xlab = "Poison type",
     ylab = "Survival time (tens of hours)",
     las = 1)
```
```{r PSNPlotTmt}
plot(Time ~ Trmt,
     data = poison,
     xlab = "Treatment type",
     ylab = "Survival time (tens of hours)",
     las = 1)
```

In both cases, the variance of the survival times seems to increase with the mean survival time.
Perhaps a Tweedie glm would be appropriate.
To fit a Tweedie glm, the value of $\xi$ is needed; since no exact zeros are present (or possible) for the survival times, very likely $\xi \ge 2$:


```{r PSNProfile}
PSNPrf <- tweedie_profile(Time ~ Trmt + Psn, data = poison, 
                          do.plot = TRUE, xi.vec = seq(2.5, 5.5, length = 11) )
```

Then, $\xi\approx 4$:

```{r PSNXi}
PSNPrf$xi.max
```


So then the Tweedie glm can be fitted:

```{r PSNModel}
PSNMod <- glm(Time ~ Trmt + Psn, 
              data = poison,
              family=statmod::tweedie(var = 4, link.power = 0))
anova(PSNMod, test = "F")
```

Quantile residuals can be used to assess the model:

```{r PSNRes}
qr <- statmod::qresid(PSNMod)
qqnorm(qr,
       las = 1)
qqline(qr,
       col = "grey")
```

The quantile residuals suggest that this model looks fine.



## References
