---
title: "Owen cumulative functions"
author: "Stéphane Laurent"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    fig_caption: no
    number_sections: no
    toc: yes
vignette: >
  %\VignetteIndexEntry{Owen cumulative functions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
library(OwenQ)
knitr::opts_chunk$set(collapse=TRUE)
```


# The Owen distribution

Let $Z \sim {\cal N}(0,1)$ and $X \sim \chi^2_\nu$ be two independent random variables.
For real numbers $\delta_1$ and $\delta_2$, define the two random variables
$$
T_1 = \frac{Z+\delta_1}{\sqrt{X/\nu}}
\quad \text{and} \quad\;
T_2 = \frac{Z+\delta_2}{\sqrt{X/\nu}}.
$$

Both $T_1$ and $T_2$ follow a non-central Student distribution. 
The number of degrees of freedom is $\nu$ for each of them, and their respective non-centrality parameters are $\delta_1$ and $\delta_2$ respectively. 

Owen (1965) studied the distribution of the pair $(T_1, T_2)$.

The four Owen cumulative functions are
$$
\begin{align}
O_1(\nu, t_1, t_2, \delta_1, \delta_2) & = \Pr(T_1 \leq t_1, T_2 \leq t_2), \\
O_2(\nu, t_1, t_2, \delta_1, \delta_2) & = \Pr(T_1 \leq t_1, T_2 \geq t_2), \\
O_3(\nu, t_1, t_2, \delta_1, \delta_2) & = \Pr(T_1 \geq t_1, T_2 \geq t_2), \\
O_4(\nu, t_1, t_2, \delta_1, \delta_2) & = \Pr(T_1 \geq t_1, T_2 \leq t_2).
\end{align}
$$

Owen provided an efficient way to evaluate these functions *when $\nu$ is an integer number*. Owen's algorithms are implemented in the `OwenQ` package. 

For $\delta_1 > \delta_2$, these four functions are implemented in the `OwenQ` package under the respective names `powen1`, `powen2`, `powen3` and `powen4`. 
For general values of $\delta_1$ and $\delta_2$, they are implemented under the respective names `psbt1`, `psbt2`, `psbt3` and `psbt4`. 


# Non-central Student distribution 

Owen (1965) also provided an algorithm to evaluate the cumulative distribution function of a univariate non-central Student distribution with an integer number of degrees of freedom. 
This evaluation is performed by the function `ptOwen` of the `OwenQ` package.

```{r}
ptOwen(q=1, nu=3, delta=2)
pt(q=1, df=3, ncp=2)
```

It is known that the `pt` function is not reliable when the non-centrality parameter `ncp` is large. 
Below we compare the values given by `ptOwen` and `pt` to the value given by Wolfram|Alpha (returned by the command `N[CDF[NoncentralStudentTDistribution[4,70],80]]`):

```{r}
p1 <- pt(q=80, df=4, ncp=70)
p2 <- ptOwen(q=80, nu=4, delta=70)
wolfram <- 0.54742763380700947685
p1 - wolfram
p2 - wolfram
```


# Limitations

When `q`$=$`delta`, the value of `ptOwen(q, nu, delta)` should go to `0.5` as `nu` increases to infinity. The examples below show the failure of this expectation when `nu` is too large.

```{r ptOwen_fails}
ptOwen(q=50, nu=3500, delta=50)
ptOwen(q=50, nu=3600, delta=50)
ptOwen(q=50, nu=3650, delta=50)
ptOwen(q=50, nu=3660, delta=50)
ptOwen(q=50, nu=3670, delta=50)
ptOwen(q=50, nu=3680, delta=50)
```

Since all Owen's algorithms are somehow similar to the algorithm evaluating `ptOwen`, we can suspect that the other ones also suffer from certain limitations. 
In the other vignette, we investigate the reliability and the limitations of `OwenQ`. 

In order to do some comparisons, and thanks to the `BH` package, we have ported the `boost` implementation of the cumulative Student distribution to `OwenQ`. It is evaluated by the internal function `pt_boost`. We concluded that this function is highly reliable. In particular it does not suffer from the failures of `ptOwen` we have just shown:

```{r pt_boost}
OwenQ:::pt_boost(q=50, nu=3500, delta=50)
OwenQ:::pt_boost(q=50, nu=3600, delta=50)
OwenQ:::pt_boost(q=50, nu=3650, delta=50)
OwenQ:::pt_boost(q=50, nu=3660, delta=50)
OwenQ:::pt_boost(q=50, nu=3670, delta=50)
OwenQ:::pt_boost(q=50, nu=3680, delta=50)
```


# Application to equivalence testing 

The Owen distribution intervenes in the calculation of the power of equivalence tests. 

## One sample 

Assume a statistical model given by a sample 
$y_i \sim_{\text{iid}} {\cal N}(\mu, \sigma^2)$ for $i=1, \ldots, n$. 
We want to demonstrate that, up to a given confidence level, the mean $\mu$ belongs to a certain interval $[\Delta_1, \Delta_2]$. 
In other words, we are interested in the alternative hypothesis 
$H_1\colon\{\Delta_1 \leq \mu \leq \Delta_2\}$. 

Consider the usual $100(1-2\alpha)\%$-confidence interval about $\mu$:
$$
\left[\bar y - t^\ast_{n-1}(\alpha)\frac{\hat\sigma}{\sqrt{n}}, \, 
\bar y + t^\ast_{n-1}(\alpha)\frac{\hat\sigma}{\sqrt{n}} \right].
$$

The $H_1$ hypothesis is accepted at level $\alpha$ when this interval falls into the interval $[\Delta_1, \Delta_2]$.

This can be written as follows:
$$
T_1 := 
\frac{\bar y - \Delta_1}{\hat\sigma/\sqrt{n}} 
\geq t^\ast_{n-1}(\alpha)
\quad \text{and} \quad 
T_2 := 
\frac{\bar y - \Delta_2}{\hat\sigma/\sqrt{n}} 
\leq - t^\ast_{n-1}(\alpha).
$$

Observe that 
$$
T_1 = \frac{z - \delta_1}{\dfrac{\sqrt{n-1}\hat\sigma/\sigma}{\sqrt{n-1}}}
$$
where
$$
z = \frac{\sqrt{n}}{\sigma}(\bar y - \mu) \sim {\cal N}(0,1)
\quad \text{and} \quad 
\delta_1 = \frac{\mu - \Delta_1}{\sigma/\sqrt{n}}.
$$

By reasoning in the same way for $T_2$, we find that the pair $(T_1, T_2)$ follows the 
Owen distribution with degrees of freedom $\nu = n-1$, and non-centrality 
parameters $\delta_1$ given above and 
$\delta_2 = \tfrac{\mu - \Delta_2}{\sigma/\sqrt{n}}$.

Therefore the power of the test - *i.e.* the probability to accept $H_1$ - is given by 
$$
O_4\bigl(n-1, t^\ast_{n-1}(\alpha), -t^\ast_{n-1}(\alpha), \delta_1, \delta_2\bigr),
$$
and then can be evaluated thanks to `powen4`. 

## Inconclusive equivalence test

The result of the equivalence test is said to be *inconclusive* when only one of the bounds of the confidence interval falls into the interval $[\Delta_1, \Delta_2]$. 

The probability to get an inconclusive result can be obtained with the `OwenQ` package. We show and check that below with the help of simulations.

```{r}
Delta1 <- -2; Delta2 <- 2 
mu <- 1; sigma <- 6; n <- 30L
alpha <- 0.05
nsims <- 1e6L
equivalence <- inconclusive <- numeric(nsims)
for (i in 1L:nsims) {
  y <- rnorm(n, mu, sigma)
  CI <- t.test(x = y, conf.level = 1-2*alpha)$conf.int
  equivalence[i] <- (CI[1] > Delta1) && (CI[2] < Delta2)
  inconclusive[i] <- ((CI[1] < Delta1) && (CI[2] > Delta1)) ||
    ((CI[1] < Delta2) && (CI[2] > Delta2))
}
```

```{r}
dof <- n-1
q <- qt(1-alpha, dof)
se <- sqrt(1/n)*sigma
delta1 <- (mu-Delta1)/se; delta2 <- (mu-Delta2)/se
# probability to get equivalence
mean(equivalence)
powen4(dof, q, -q, delta1, delta2)
# probability to get inconclusive
mean(inconclusive)
ptOwen(q, dof, delta2) - ptOwen(-q, dof, delta1) - powen4(dof, q, -q, delta1, delta2)
```

## Parallel design 

Now consider two independent samples 
$x_i \sim_{\text{iid}} {\cal N}(\mu, \sigma^2)$ for $i=1, \ldots, n_1$. 
and 
$y_i \sim_{\text{iid}} {\cal N}(\nu, \sigma^2)$ for $i=1, \ldots, n_2$ 
and the alternative hypothesis 
$H_1\colon\bigl\{|\mu-\nu| \leq \Delta\bigr\}$. 
The power of this test is evaluated by the function `powerTOST` below. 
The parameter `delta0` corresponds to the difference $\mu-\nu$. 

```{r}
powerTOST <- function(alpha, delta0, Delta, sigma, n1, n2) {
  se <- sqrt(1/n1 + 1/n2) * sigma
  delta1 <- (delta0 + Delta) / se
  delta2 <- (delta0 - Delta) / se
  dof <- n1 + n2 - 2
  q <- qt(1 - alpha, dof)
  powen4(dof, q, -q, delta1, delta2)
}
```



# The Owen $T$-function

The `OwenQ` package also provides an implementation of the Owen $T$-function, 
under the name `OwenT`. 
This is a port of the function `owens_t` of `boost`, the peer-reviewed collection of C++ libraries.

```{r, echo=FALSE, fig.width=5, fig.height=5}
h <- seq(-3, 3, length.out=30)
a <- seq(-5, 5, length.out=30)
z <- outer(h, a, Vectorize(OwenT))
oldpar <- par(mar=c(0,2,0,0))
persp(h, a, z, theta=30, phi=30, expand=0.5, zlab="Owen T", ticktype = "detailed", col = "lightblue")
par(oldpar)
```


# The Owen $Q$-functions

The Owen cumulative functions are based on the two Owen $Q$-functions
$$
Q_1(\nu, t, \delta, R) = \frac{1}{\Gamma\left(\frac{\nu}{2}\right)2^{\frac12(\nu-2)}}
\int_0^R \Phi\left(\frac{tx}{\sqrt{\nu}}-\delta\right)
x^{\nu-1} e^{-\frac{x^2}{2}} \mathrm{d}x
$$
and
$$
Q_2(\nu, t, \delta, R) = \frac{1}{\Gamma\left(\frac{\nu}{2}\right)2^{\frac12(\nu-2)}}
\int_R^\infty \Phi\left(\frac{tx}{\sqrt{\nu}}-\delta\right)
x^{\nu-1} e^{-\frac{x^2}{2}} \mathrm{d}x.
$$

They are implemented in the `OwenQ` package under the respective names `OwenQ1` and `OwenQ2` (for integer values of $\nu$). 


# Application: Equal-tailed tolerance interval

Consider the statistical model given by a sample 
$y_i \sim_{\text{iid}} {\cal N}(\mu, \sigma^2)$ for $i=1, \ldots, n$. 

An equal-tailed $(p,\alpha)$-tolerance interval is an interval containing at least $100p\%$ of the "center data" with $100(1-\alpha)\%$ confidence. 

The natural choice for such an interval is, as shown by Owen (1965),
$$
\bigl[\bar y - k_e \hat\sigma, \bar y + k_e \hat\sigma]
$$
where $k_e$ satisfies 
$$
O_2(n-1, k_e\sqrt{n}, -k_e\sqrt{n}, \delta, -\delta) = 1-\alpha
$$
where $\delta = \sqrt{n}z_{(1+p)/2}$. 

Therefore, the tolerance factor $k_e$ can be determined with the help of the `powen2` function of the `OwenQ` package. 
But it is more efficient to use the function `spowen2` for this purpose; 
`spowen2(nu, t, delta)` has the same value as `powen2(nu, t, -t, delta, -delta)` but it is evaluated more efficiently.  

```{r}
p <- 0.9; alpha <- 0.05
n <- 100
delta <- sqrt(n)*qnorm((1+p)/2)
uniroot(function(ke) spowen2(n-1, ke*sqrt(n), delta) - (1-alpha), 
        lower=qnorm(1-alpha), upper=5, extendInt = "upX", tol=1e-9)$root
```

The $k_e$ factors are tabulated in Krishnamoorthy & Mathew's book.


# Internal functions: numerical integration

The `OwenQ` package provides three internal functions, `iOwenQ1`, `iOwenQ2` and `ipowen4`, which respectively perform the evaluation of $Q_1$, $Q_2$ and $O_4$ by numerical integration using the `RcppNumerical` package. 
They can be called with a positive non-integer value of $\nu$. 
The evaluation of $O_4$ with `ipowen4` is correct only for $t_1 > t_2$ and $\delta_1 > \delta_2$. 


# References

- Owen, D. B. (1965). *A special case of a bivariate noncentral t-distribution.* Biometrika 52, 437-446.

- Krishnamoorthy & Mathew (2009). *Statistical Tolerance Regions*. Wiley. 
