---
title: "`epicontacts`: Manipulation, Visualisation and Analysis of Epidemiological Contact Data "
date: "`r Sys.Date()`"
output: 
    rmarkdown::html_vignette:
        toc: true
toc_depth: 4
vignette: >
    %\VignetteIndexEntry{epicontacts overview}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

```{r init, include=F}
library(knitr)
opts_chunk$set(message=FALSE, warning=FALSE, eval=TRUE, echo=TRUE)
```

## Introduction   

`epicontacts` aims to facilitate manipulation, visualisation and analysis of
epidemiological contact data. Such datasets inherently have network components,
in which nodes are typically cases and reported contacts or exposures are
(directed or undirected) edges. This package provides a convenient data
structure as well as functionality specific to handle these data.

```{r}
library(outbreaks)
library(epicontacts)
```

## Loading Data

`epicontacts` provides a convenient structure to store heterogeneous
epidemiological contact network data (i.e. nodes and edges) in a single
object. The `epicontacts` class must contain two components: a line list and a
contact dataset.

Each row of the line list should represent unique observations of cases, and
each row of the contact list should represent unique pairs of contacts. Each can
include arbitrary features, but both datasets should share an identification
scheme.

### Example Dataset

The example that follows will use the `mers_korea_2015`, which is a dataset (in
*list* format) distributed in the *outbreaks* package.

```{r}
str(mers_korea_2015)
```

What features are in the line list?

```{r}
colnames((mers_korea_2015$linelist))
```

What about the contact dataset?

```{r}
colnames((mers_korea_2015$contacts))
```



### Creating `epicontacts` Object

In order to create the `epicontacts` object, both the line list and contact data
frames must be passed to `make_epicontacts()`. This function accommodates
instances when the respective identifiers are not the first columns of these
data frames (see the "id", "from" and "to" arguments). `make_epicontacts()` can
also account for contact networks that have a direction (see "directed"
argument).

```{r}
merskor15 <- make_epicontacts(linelist = mers_korea_2015$linelist,
                              contacts = mers_korea_2015$contacts, 
                              directed = FALSE)
class(merskor15)
summary(merskor15)
```

## Data Manipulation

### Access Unique Identifiers

The `summary()` method above provided counts for the number unique cases in both
the contact and line list. The `get_id()` function retrieves similar information
but as vectors of identifiers. This can be parameterized as follows:

- **linelist**: IDs in the line list data
- **contacts**: IDs in the contact dataset ("from" and "to" combined)
- **from**: IDs in the "from" column of contact datset
- **to** IDs in the "to" column of contact dataset
- **all**: IDs that appear anywhere in either dataset
- **common**: IDs that appear in both contacts dataset and line list
    
What are the first ten IDs in the contacts dataset?
```{r}  
contacts_ids <- get_id(merskor15, "contacts")
head(contacts_ids, n = 10)
```

How many IDs are common to both?

```{r}
length(get_id(merskor15, "common"))
```

### Subsetting

The `subset()` method for `epicontacts` objects allows for, among other things,
pruning of networks based on values of node and edge attributes. These values
must be passed as named lists to the respective argument.

```{r}
subset(merskor15, node_attribute = list("outcome" = "Dead", "sex" = "M"), 
       edge_attribute = list("exposure" = "Emergency room"))
```

In addition to subsetting by node and edge attributes, networks can be pruned to
only include components that are connected to certain nodes. The "id" argument
takes a vector of nodes and returns the line list of individuals that "touch"
those IDs.

```{r}
nodes <- c("SK_14","SK_145")                  
subset(merskor15, cluster_id = nodes)
```

The `subset()` method for `epicontacts` objects also accepts cluster size
parameters (see "cs", "cs_min" and "cs_max" arguments).
    
```{r}   
subset(merskor15, cs = 3)
subset(merskor15, cs_min = 10, cs_max = 100)
```

## Visualisation

### Default Plotting Method

One of the main features of `epicontacts` is its visualisation capabilities. As
a default, the package uses interactive plotting based on the `visNetwork`
package[^2]. This interactivity is particularly useful for visualising large
datasets.

```{r}
plot(merskor15) 
```

The above is a generic method based on the `vis_epicontacts()` and accepts a
number of arguments to customize the plot appearance and functionality. For a
full list of options use `?vis_epicontacts()`. For instance, one can customize
nodes using colors and icons:
```{r}
plot(merskor15, "place_infect", node_shape = "sex",
     shapes = c(M = "male", F = "female")) 
```
See `codeawesome` to see available shapes.


Alternatively, the `method` used
for plotting can be `graph3D`, in which case a 3-dimensional graph will be used
(see below).


### 3D plots

`epicontacts` loads the `threejs` package to enable 3D visualisation tools with
the `graph3D()` function[^3].

```{r}
graph3D(merskor15, node_color = "sex", g_title = "MERS Korea 2014")
```

To interact with the plot:

- **zoom**: scrollwheel
- **rotate**: left-mouse button + move
- **pan**: right-mouse button + move
- **identify node by label**: mouse over 





## Analysis

### Extract Characteristics of Pairwise Nodes

The `get_pairwise()` function allows processing of variable(s) in the line list
according to each pair in the contact dataset. For the following example, date
of onset of disease is extracted from the line list in order to compute the
difference between disease date of onset for each pair. The value that is
produced from this comparison represents the **serial interval (si)**.

```{r} 
si <- get_pairwise(merskor15, "dt_onset")   
summary(si)
hist(si, col="grey", border="white", xlab="Days after symptoms",
     main="MERS Korea 2014\nSerial Interval")
```

The `get_pairwise()` will interpret the class of the column being used for
comparison, and will adjust its method of comparing the values accordingly. For
numbers and dates (like the **si** example above), the function will subtract
the values. When applied to columns that are characters or categorical,
`get_pairwise()` will paste values together. Because the function also allows
for arbitrary processing (see "f" argument), these discrete combinations can be
easily tabulated and analyzed.
    
```{r} 
head(get_pairwise(merskor15, "sex"), n = 10)
get_pairwise(merskor15, "sex", f=table)
fisher.test(get_pairwise(merskor15, "sex", f=table)) 
```

### Identify Clusters

The `get_clusters()` function can be used for to identify connected components
in an `epicontacts` object. Here, we illustrate its use to study contact
patterns in a simulated Ebola outbreak. First, we use it to retrieve
`data.frame` containing the cluster information:

```{r, get_clusters}
x <- make_epicontacts(ebola_sim$linelist, ebola_sim$contacts,
                      id = "case_id", to = "case_id", from = "infector",
                      directed = TRUE)
x

clust <- get_clusters(x, output = "data.frame")
class(clust)
dim(clust)
table(clust$cluster_size)
barplot(table(clust$cluster_size),
        main = "Cluster size distribution",
	xlab = "Cluster size",
	ylab = "Frequency")

```

Let us look at the largest clusters. For this, we add cluster information to the
`epicontacts` object, and then subset it:

```{r, get_clusters2}

x <- get_clusters(x)
x_14 <- subset(x, cs = 14)
plot(x_14, "cluster_member")

```




### Calculate Degree

The degree of a node corresponds to its number of edges or connections to other
nodes. `get_degree()` provides an easy method for calculating this value for
`epicontacts` networks. A high degree in this context indicates an individual
who was in contact with many others.

**nb** use of "type" argument depends on whether or not the network is directed. 

```{r}  
deg_both <- get_degree(merskor15, "both", only_linelist = TRUE)
```

Which individuals have the ten most contacts?

```{r}
head(sort(deg_both, decreasing = TRUE), 10)
```

What is the mean number of contacts?

```{r}
mean(deg_both)
```

## References

[^1]: https://github.com/reconhub/epicontacts
[^2]: https://cran.r-project.org/package=visNetwork
[^3]: http://bwlewis.github.io/rthreejs/graphjs.html
