---
title: "Evaluate a Distribution"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Evaluate a Distribution}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(distionary)
```

This vignette covers the second goal of `distionary`: to evaluate probability distributions, even when that property is not specified in the distribution's definition.

## Distributional Representations

A _distributional representation_ is a mathematical function that completely defines a probability distribution. Unlike a simple property (such as the mean or variance), a representation contains enough information that any other property or representation can be calculated from it.

The key innovation in `distionary` is that these representations are interconnected through a network of relationships, allowing you to specify a distribution using any available representation and automatically derive others as needed. For example, if you specify only a CDF, `distionary` can compute the quantile function, mean, variance, and other properties.

Here is a list of representations recognised by `distionary`, and the functions for accessing them. 

| Representation                   | `distionary` Functions                  |
|----------------------------------|-----------------------------------------|
| Cumulative Distribution Function | `eval_cdf()`,      `enframe_cdf()`      |
| Survival Function                | `eval_survival()`, `enframe_survival()` |
| Quantile Function                | `eval_quantile()`, `enframe_quantile()` |
| Hazard Function                  | `eval_hazard()`,   `enframe_hazard()`   |
| Cumulative Hazard Function       | `eval_chf()`,      `enframe_chf()`      |
| Probability density Function     | `eval_density()`,  `enframe_density()`  |
| Probability mass Function (PMF)  | `eval_pmf()`,      `enframe_pmf()`      |
| Odds Function                    | `eval_odds()`,     `enframe_odds()`     |
| Return Level Function            | `eval_return()`,   `enframe_return()`   |

All representations can either be accessed by the `eval_*()` family of functions, providing a vector of the evaluated representation.

```{r}
d1 <- dst_geom(0.6)
eval_pmf(d1, at = 0:5)
```

Alternatively, the `enframe_*()` family of functions provides the results in a tibble or data frame paired with the inputs, useful in a data wrangling workflow.

```{r}
enframe_pmf(d1, at = 0:5)
```

The `enframe_*()` functions allow for insertion of multiple distributions, placing a column for each distribution. The column names can be changed in three ways:

1. The input column `.arg` can be renamed with the `arg_name` argument.
2. The `pmf` prefix on the evaluation columns can be changed with the `fn_prefix` argument.
3. The distribution names can be changed by assigning name-value pairs for the input distributions.

Let's practice this with the addition of a second distribution.

```{r}
d2 <- dst_geom(0.4)
enframe_pmf(
  model1 = d1, model2 = d2, at = 0:5,
  arg_name = "num_failures", fn_prefix = "probability"
)
```

## Drawing a random sample

To draw a random sample from a distribution, use the `realise()` or `realize()` function:

```{r}
set.seed(42)
realise(d1, n = 5)
```

You can read this call as "realise distribution `d` five times". By default, `n` is set to 1, so that realising converts a distribution to a numeric draw:

```{r}
realise(d1)
```

While random sampling falls into the same family as the `p*/d*/q*/r*` functions from the `stats` package (e.g., `rnorm()`), this function is not a distributional representation, hence does not have a `eval_*()` or `enframe_*()` counterpart. This is because it's impossible to perfectly describe a distribution based on a sample. 

## Properties of Distributions

`distionary` distinguishes between _distributional representations_ (which fully define a distribution) and _distributional properties_ (which are characteristics that can be computed from representations). 

A distribution _property_ is any measurable characteristic that can be calculated from a distribution's representation. Unlike representations, properties do not contain enough information to fully reconstruct the distribution. For example, knowing the mean and variance of a distribution doesn't tell you whether it's a Normal, Gamma, or some other distribution family. Properties include statistical moments and other summary measures.

Below is a table of the properties incorporated in `distionary`, and the corresponding functions for accessing them.

| Property | `distionary` Function |
|----------|---------------------|
| Mean                       | `mean()` |
| Median                     | `median()` |
| Variance                   | `variance()` |
| Standard Deviation         | `sd()` |
| Skewness                   | `skewness()` |
| Excess Kurtosis            | `kurtosis_exc()` |
| Kurtosis                   | `kurtosis()` |
| Range                      | `range()` |

Here's the mean and variance of our original distribution.

```{r}
mean(d1)
variance(d1)
```
