---
title: "Manipulate Competition Results"
author: "Evgeni Chasnovski"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Manipulate Competition Results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette will describe `comperes` functionality for manipulating (summarising and transforming) competition results (hereafter - results):

- Computation of item summaries.
- Computation of Head-to-Head values and conversion between its formats.
- Creating pairgaimes.

We will need the following packages:

```{r library, warning = FALSE}
library(comperes)
library(dplyr)
library(rlang)
```

Example results in long format:

```{r cr_long}
cr_long <- tibble(
  game   = c("a1", "a1", "a1", "a2", "a2", "b1", "b1", "b2"),
  player = c(1, NA, NA, 1, 2, 2, 1, 2),
  score  = 1:8,
  season = c(rep("A", 5), rep("B", 3))
) %>%
  as_longcr()
```

Functions discussed in these topics leverage `dplyr`'s grammar of data manipulation. Only basic knowledge is enough to use them. Also a knowledge of `rlang`'s quotation mechanism is preferred.

## Item summaries

Item summary is understand as some summary measurements (of arbitrary nature) of item (one or more columns) present in data. To compute them, `comperes` offers `summarise_*()` family of functions in which summary functions should be provided as in `dplyr::summarise()`. Basically, they are wrappers for grouped summarise with forced ungrouping, conversion to `tibble` and possible adding prefix to summaries. **Note** that if one of columns in item is a factor with implicit `NA`s (present in vector but not in levels), there will be a warning suggesting to add `NA` to levels. This is due to `group_by()` functionality in `dplyr` after 0.8.0 version.

Couple of examples:

```{r item-summary}
cr_long %>% summarise_player(mean_score = mean(score))

cr_long %>% summarise_game(min_score = min(score), max_score = max(score))

cr_long %>% summarise_item("season", sd_score = sd(score))
```

For convenient transformation of results there are `join_*_summary()` family of functions, which compute respective summaries and join them to original data:

```{r item-summary-join}
cr_long %>%
  join_item_summary("season", season_mean_score = mean(score)) %>%
  mutate(score = score - season_mean_score)
```

For common summary functions `comperes` has a list `summary_funs` with `r length(summary_funs)` quoted expressions to be used with `rlang`'s unquoting mechanism:

```{r summary_funs}
# Use .prefix to add prefix to summary columns
cr_long %>%
  join_player_summary(!!!summary_funs[1:2], .prefix = "player_") %>%
  join_item_summary("season", !!!summary_funs[1:2], .prefix = "season_")
```

## Head-to-Head values

Head-to-Head value is a summary statistic of direct confrontation between two players. It is assumed that this value can be computed based only on the players' __matchups__, data of actual participation for ordered pair of players in one game.

To compute matchups, `comperes` has `get_matchups()`, which returns a `widecr` object with all matchups actually present in results (including matchups of players with themselves). __Note__ that missing values in `player` column are treated as separate players. It allows operating with games where multiple players' identifiers are not known. However, when computing Head-to-Head values they treated as single player. Example:

```{r matchups}
get_matchups(cr_long)
```

Head-to-Head values can be stored in two ways:

- __Long__, a `tibble` with columns `player1` and `player2` which identify ordered pair of players, and columns corresponding to Head-to-Head values. Computation is done with `h2h_long()` which returns an object of class `h2h_long`. Head-to-Head functions are specified as in `dplyr`'s grammar __for results matchups__:

```{r h2h_long}
cr_long %>%
  h2h_long(
    abs_diff = mean(abs(score1 - score2)),
    num_wins = sum(score1 > score2)
  )
```

- __Matrix__, a matrix where rows and columns describe ordered pair of players and entries - Head-to-Head values. This allows convenient storage of only one Head-to-Head value. Computation is done with `h2h_mat()` which returns an object of class `h2h_mat`. Head-to-Head functions are specified as in `h2h_long()`:

```{r h2h_mat}
cr_long %>% h2h_mat(sum_score = sum(score1 + score2))
```

`comperes` also offers a list `h2h_funs` of `r length(h2h_funs)` common Head-to-Head functions as quoted expressions to be used with `rlang`'s unquoting mechanism:

```{r h2h_funs}
cr_long %>% h2h_long(!!!h2h_funs)
```

To compute Head-to-Head for only subset of players or include values for players that are not in the results, use factor `player` column. __Notes__:
 
- You can use `fill` argument to replace `NA`s in certain columns after computing Head-to-Head values.
- As Head-to-Head functions use `summarise_item()`, there will be a warning in case of implicit `NA`s in factor columns.

```{r h2h-factors}
cr_long_fac <- cr_long %>%
  mutate(player = factor(player, levels = c(1, 2, 3)))

cr_long_fac %>%
  h2h_long(abs_diff = mean(abs(score1 - score2)),
           fill = list(abs_diff = -100))

cr_long_fac %>%
  h2h_mat(mean(abs(score1 - score2)),
          fill = -100)
```

### Conversion

To convert between long and matrix formats of Head-to-Head values, `comperes` has `to_h2h_long()` and `to_h2h_mat()` which convert from matrix to long and from long to matrix respectively. __Note__ that output of `to_h2h_long()` has `player1` and `player2` columns as characters. Examples:

```{r h2h-conversion}
cr_long %>% h2h_mat(mean(score1)) %>% to_h2h_long()

cr_long %>%
  h2h_long(mean_score1 = mean(score1), mean_score2 = mean(score2)) %>%
  to_h2h_mat()
```

All this functionality is powered by useful outside of `comperes` functions `long_to_mat()` and `mat_to_long()`. They convert general pair-value data between long and matrix format:

```{r convert-pair-value}
pair_value_long <- tibble(
  key_1 = c(1, 1, 2),
  key_2 = c(2, 3, 3),
  val = 1:3
)

pair_value_mat <- pair_value_long %>%
  long_to_mat(row_key = "key_1", col_key = "key_2", value = "val")
pair_value_mat

pair_value_mat %>%
  mat_to_long(
    row_key = "key_1", col_key = "key_2", value = "val",
    drop = TRUE
  )
```


## Pairgames

For some ranking algorithms it crucial that games should only be between two players. `comperes` has function `to_pairgames()` for this. It removes games with one player. Games with three and more players `to_pairgames()` splits into __separate games__ between unordered pairs of different players without specific order. __Note__ that game identifiers are changed to integers but order of initial games is preserved. Example:

```{r pairgames}
to_pairgames(cr_long)
```
