---
title: "km.support"
author: "Yuxin Qin <yqin08@wm.edu>, Lawrence Leemis <leemis@math.wm.edu>, Heather Sasinowska <hdsasinowska@wm.edu>"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{km.support}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8} 
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The `km.support` function is part of the [conf](https://CRAN.R-project.org/package=conf) package. The function calculates the support values for Kaplan and Meier's product-limit estimator[^1]. The Kaplan-Meier product-limit estimator (KMPLE) is used to estimate the survival function for a data set of positive values in the presence of right censoring, and the support values are all possible values of the KMPLE for a specific sample size.

The `km.support` function finds the support values of the KMPLE for a particular sample size $n$ (the number of items on test) using an induction algorithm[^2]. The support values are returned as a list with two components: numerators and denominators. This allows the user to generate exact fractions.



[^1]: Kaplan, E. L., and Meier, P. (1958), “Nonparametric Estimation from Incomplete Observations,” Journal of the American Statistical Association, 53, 457–481.

[^2]: Qin Y., Sasinowska H. D., Leemis L. M. (2023), “The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator,” The American Statistician, 77 (1), 102–110.



## Installation Instructions

The `km.support` function is accessible following installation of the `conf` package:

```r
install.packages("conf")
library(conf)
```

## Details
The KMPLE is a nonparametric estimate of the survival function
from a data set of lifetimes that includes right-censored 
observations and is used in a variety of application areas. 
For simplicity, we will refer to the object of interest
generically as the item and the event of interest
as the failure.

Let $n$ denote the number of items on test.
The KMPLE of the survival function $S(t)$ is given by
$$
\hat{S}(t) =
\prod\limits_{i:t_i \leq t}\left( 1 - \frac{d_i}{n_i}\right),
$$
for $t \ge 0$,
where $t_1, \, t_2, \, \ldots, \, t_k$ are the times when 
at least one failure is observed ($k$, the number of distinct
failure times in the data set, is an integer between 1 and $n$),
$d_1, \, d_2, \, \ldots, \, d_k$ are the
numbers of failures observed at times
$t_1, \, t_2, \, \ldots, \, t_k$, 
and 
$n_1, \, n_2, \, \ldots, \, n_k$ are
the numbers of items at risk just prior to times 
$t_1, \, t_2, \, \ldots, \, t_k$.
It is common practice to have the KMPLE "cut off" after
the largest time recorded if it corresponds to a 
right-censored observation[^3]. The KMPLE drops to zero after the
largest time recorded if it is a failure;
the KMPLE is undefined, however, after the
largest time recorded if it is a right-censored
observation.

The support values returned by `km.support` are the values that $\hat{S}(t)$ can take at any $t \ge 0$ 
over all possible outcomes of an experiment with $n$ items on test. The sample size $n$ is the function's only argument.

[^3]: Kalbfleisch, J. D., and Prentice, R. L. (2002),
The Statistical Analysis of Failure Time Data (2nd ed.),
Hoboken, NJ: Wiley. 

## Example

To illustrate a simple case, consider the KMPLE for one particular experiment when there are 
$n = 4$ items on test, failures occur at times $t = 1$ and $t = 3$,
and right censorings occur at times $t = 2$ and $t = 4$. In
this setting, the KMPLE is 

\begin{equation*}
  \hat{S}(t) =
  \begin{cases}
    1 & \qquad 0 \le t < 1 \\
    \left(1 - \frac{1}{4}\right) =
    \frac{3}{4} & \qquad 1 \leq t < 3 \\
    \left(1 - \frac{1}{4}\right)
    \left(1 - \frac{1}{2}\right) =
    \frac{3}{8} & \qquad 3 \leq t < 4 \\
    \text{NA} & \qquad t \geq 4,
  \end{cases}
\end{equation*}

where NA indicates that the KMPLE is undefined.
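
The step heights in the display above can be reproduced directly from the product formula in the Details section. The chunk below is a minimal base R sketch under stated assumptions: the helper name `kmple` and its arguments `time` and `status` are illustrative and are not part of the conf package.

```{r}
# Illustrative helper, not part of conf: KMPLE values at the distinct failure times.
#   time:   observed times
#   status: 1 for a failure, 0 for a right-censored observation
kmple <- function(time, status) {
  tk <- sort(unique(time[status == 1]))      # distinct failure times t_1, ..., t_k
  # d_i: number of failures observed at each t_i
  d <- vapply(tk, function(t) sum(time == t & status == 1), numeric(1))
  # n_i: number of items at risk just prior to each t_i
  r <- vapply(tk, function(t) sum(time >= t), numeric(1))
  data.frame(time = tk, surv = cumprod(1 - d / r))
}

# failures at t = 1 and t = 3, right censorings at t = 2 and t = 4
kmple(time = c(1, 2, 3, 4), status = c(1, 0, 1, 0))
```

The helper returns only the values of the KMPLE at the failure times. It does not represent the undefined (NA) region that follows a final right-censored observation.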


The KMPLE in this experiment takes 3 of the 8 support values produced by `km.support` (1, 3/4, and 3/8), as can be seen in the output below, and is undefined (NA) for $t \geq 4$. NA values are not displayed in the output.

```{r, fig.width = 4, fig.height = 4, fig.show = 'hold'}
library(conf)
#  display unsorted numerators and denominators of support values for n = 4
n <- 4
s <- km.support(n)
s
#  display sorted support values for n = 4 as decimals
sort(s$num / s$den)
#  display sorted support values for n = 4 as exact fractions
i <- order(s$num / s$den)
m <- length(s$num)
f <- ""
for (j in i[2:(m - 1)]) f <- paste(f, s$num[j], "/", s$den[j], ", ", sep = "")
cat(paste("The ", m, " support values for n = ", n, " are: 0, ", f, "1.\n", sep = ""))

``` 

Consider another KMPLE for a different outcome of the same experiment with $n = 4$ items on test. This time we observe 4 failures at times $t = 1, 2, 3, 4$. 
In this setting, the KMPLE is 

\begin{equation*}
  \hat{S}(t) =
  \begin{cases}
    1 & \qquad 0 \le t < 1 \\
    \left(1 - \frac{1}{4}\right) =
    \frac{3}{4} & \qquad 1 \leq t < 2 \\
    \left(1 - \frac{1}{4}\right)
    \left(1 - \frac{1}{3}\right) =
    \frac{1}{2} & \qquad 2 \leq t < 3 \\
    \left(1 - \frac{1}{4}\right)
    \left(1 - \frac{1}{3}\right)
    \left(1 - \frac{1}{2}\right) =
    \frac{1}{4} & \qquad 3 \leq t < 4 \\
    0 & \qquad t \geq 4.
  \end{cases}
\end{equation*}


The KMPLE in this experiment takes 5 of the 8 support values produced by `km.support` (3 of them new: 1/2, 1/4, and 0), as can be seen in the output above. Enumerating all possible outcomes of the experiment yields the remaining support values, 1/3 and 2/3.
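
For a small $n$, the full set of support values can be cross-checked by brute force. The sketch below is not the induction algorithm used by `km.support`. It is an illustrative base R check that assumes no tied failure times, so that a failure with $j$ items at risk contributes the factor $(j - 1)/j$ to the product. The names `gcd` and `brute_support` are hypothetical helpers introduced only for this example.

```{r}
# Illustrative brute-force check, not the induction algorithm used by km.support.
# Assumes no tied failure times: a failure with j items at risk contributes
# the factor (j - 1) / j to the KMPLE.
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

brute_support <- function(n) {
  num <- 1
  den <- 1                                  # empty product: no failures yet
  for (j in seq_len(n)[-1]) {               # risk-set sizes 2, ..., n
    a <- num * (j - 1)                      # multiply every value found so far
    b <- den * j                            #   by the factor (j - 1) / j
    g <- mapply(gcd, a, b)                  # reduce each fraction to lowest terms
    num <- c(num, a / g)
    den <- c(den, b / g)
    keep <- !duplicated(paste(num, den))    # drop repeated fractions
    num <- num[keep]
    den <- den[keep]
  }
  # a failure with one item at risk contributes the factor 0
  list(num = c(0, num), den = c(1, den))
}

bs <- brute_support(4)
# sorted support values for n = 4 as fractions
paste(bs$num, bs$den, sep = "/")[order(bs$num / bs$den)]
```

For $n = 4$ this enumeration gives eight distinct fractions, including the values 1/3 and 2/3 that neither of the two experiments above produced. Because the number of candidate products grows quickly with $n$, this approach is only practical as a check for small sample sizes.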


## Package Notes

The `km.support` function is also called by the `km.pmf` and `km.surv` functions, which are part of the same [conf](https://CRAN.R-project.org/package=conf) package.