---
title: "Details of the incidence class"
author: "Thibaut Jombart, Zhian N. Kamvar"
date: "`r Sys.Date()`"
output:
   rmarkdown::html_vignette:
     toc: true
     toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Incidence class}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---




```{r, options, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  fig.width=7, 
  fig.height=5 
)

```

This vignette details the structure of *incidence* objects, as produced by the
`incidence` function.

<br>

# Structure of an *incidence* object.

We generate a toy dataset of dates to examine the content of *incidence* objects.

```{r, data}
library(incidence)
set.seed(1)
dat <- sample(1:50, 200, replace = TRUE, prob = 1 + exp(1:50 * 0.1))
sex <- sample(c("female", "male"), 200, replace = TRUE)
```

The incidence by 48h period is computed as:

```{r, i}
i <- incidence(dat, interval = 2)
i
plot(i)
```

We also compute incidence by gender:

```{r, sex}
i.sex <- incidence(dat, interval = 2, group = sex)
i.sex
plot(i.sex)
```


The object `i` is a `list` with the class *incidence*:

```{r, names}
class(i)
is.list(i)
names(i)
```

Items in `i` can be accessed using the same indexing as any lists, but it's
safer to use the accessors for each item:

```{r, access}
## use name
head(i$dates)

head(get_dates(i))
```

In the following sections, we examine each of the components of the object.


## `$dates`

The `$dates` component contains a vector for all the dates for which incidence
have been computed, in the format of the input dataset (e.g. `Date`, `numeric`,
`integer`).

```{r, dates1}
date_bins <- get_dates(i)
class(date_bins)
class(dat)

date_bins
```

The dates correspond to the lower bounds of the time intervals used as bins for
the incidence. Bins always include the lower bound and exclude the upper bound.
In the example provided above, this means that the first bin counts events that
happened at day 5-6, the second bin counts events from 7-8, etc.

Note that if we had actual `Date`-class dates, they would be returned as dates

```{r date-dates1}
dat_Date <- as.Date("2018-10-31") + dat
head(dat_Date)
i.date <- incidence(dat_Date, interval = 2, group = sex)
i.date
get_dates(i.date)
class(get_dates(i.date))
```

These can be converted to integers, counting the number of days from the first
date.

```{r get-dates-integer}
get_dates(i.date, count_days = TRUE)
get_dates(i, count_days = TRUE)
```

To facilitate modelling, it's also possible to get the center of the interval by
using the `position = "center"` argument:

```{r get-dates-center}
get_dates(i.date, position = "center")
get_dates(i.date, position = "center", count_days = TRUE)
```


## `$counts`

The `$counts` component contains the actual incidence, i.e. counts of events
for the defined bins. It is a `matrix` of `integers` where rows correspond to
time intervals, with one column for each group for which incidence is computed
(a single, unnamed column if no groups were provided). If groups were provided,
columns are named after the groups. We illustrate the difference comparing the
two objects `i` and `i.sex`:

```{r, counts1}
counts <- get_counts(i)
class(counts)
storage.mode(counts)

counts
get_counts(i.sex)
```

You can see the dimensions of the incidence object by using `dim()`, `ncol()`,
and `nrow()`, which returns the dimensions of the counts matrix:

```{r counts1.1}
dim(get_counts(i.sex))
dim(i.sex)
nrow(i.sex) # number of date bins
ncol(i.sex) # number of groups
```


There are also accessors for handling groups:

```{r groups}
# Number of groups
ncol(i.sex)
ncol(i)

# Names of groups
group_names(i.sex)
group_names(i)

# You can also rename the groups
group_names(i.sex) <- c("F", "M")
group_names(i.sex)
```


Note that a `data.frame` containing *dates* and *counts* can be obtained using
`as.data.frame`:

```{r, as.data.frame}
## basic conversion
as.data.frame(i)
as.data.frame(i.sex)

## long format for ggplot2
as.data.frame(i.sex, long = TRUE)
```

Note that `incidence` has an argument called `na_as_group` which is `TRUE` by
default, which will pool all missing groups into a separate group, in which
case it will be a separate column in `$counts`.

## `$timespan`

The `$timespan` component stores the length of the time period covered by the
object:

```{r, timespan}
get_timespan(i)
print(date_range <- range(get_dates(i)))
diff(date_range) + 1
```

## `$interval`

The `$interval` component contains the length of the time interval for the bins:

```{r, interval}
get_interval(i)
diff(get_dates(i))
```

## `$n`

The `$n` component stores the total number of events in the data:

```{r, n}
get_n(i)
```

Note that to obtain the number of cases by groups, one can use:

```{r, n2}
colSums(get_counts(i.sex))
```

## `$weeks`

The `$weeks` component is optional, and used to store
[aweek](https://www.repidemicsconsortium.org/aweek/) objects whenever they have
been used. Weeks are used by default when weekly incidence is computed from
dates (see argument `standard` in `?incidence`).

```{r, isoweek}
library(outbreaks)
dat <- ebola_sim$linelist$date_of_onset
i.7 <- incidence(dat, "1 epiweek", standard = TRUE)
i.7
i.7$weeks
```

Because `$weeks` is an optional element, it does not have a dedicated
accessor. If the element is not present, attempting to access it will result in
a `NULL`:

```{r isoweek-null}
i$weeks
```

Both dates and weeks are returned when converting an `incidence` object to
`data.frame`:

```{r, isoweek3}
head(as.data.frame(i.7))
```

