---
title: "Tutorial for the R package gpyramid"
output: rmarkdown::html_vignette
author: Shoji Taniguchi
vignette: >
  %\VignetteIndexEntry{Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## 1. Set up

```{r setup}
library(gpyramid)
library(ape)
library(dplyr)
```

## 2. Prepare data

### 2.1 Gene data

```{r r1}
line_df <- data.frame(line = c("x1", "x2", "x3", "x4", "x5", "x6"),
                      gene1 = c("H", "H", "A", "B", "B", "B"),
                      gene2 = c("H", "H", "A", "B", "B", "A"),
                      gene3 = c("H", "H", "B", "A", "A", "A"),
                      gene4 = c("H", "B", "B", "B", "A", "B"),
                      gene5 = c("H", "H", "B", "A", "B", "B"),
                      gene6 = c("B", "A", "B", "A", "B", "B"),
                      gene7 = c("B", "B", "B", "B", "B", "A"))

line_df
```

### 2.2 Position data

```{r}
position_df <- data.frame(Gene = c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6", "gene7"),
                          Chr = c("1", "2", "3", "4", "5", "6", "7"),
                          cM = c(20, 0, 40, 20, 10, 0, 0))
position_df
```


### 2.3 Preprocessing

#### Generate haplotype dataframe from raw data

```{r r3}
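# Convert the genotype codes ("A", "B", "H") into haplotype data.
gene_dat <- util_haplo(line_df, target = "A", non_target = "B", hetero = "H", line_cul = "line")

# The result is a list: two haplotype data frames and the line identifiers.
gene_df1 <- gene_dat[[1]]
gene_df2 <- gene_dat[[2]]
line_id <- gene_dat[[3]]

# Label the columns of both data frames with the line identifiers.
colnames(gene_df1) <- line_id
colnames(gene_df2) <- line_id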

gene_df1
gene_df2
```
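
To make the idea of a haplotype table concrete, the following base R sketch splits genotype codes into two 0/1 haplotype vectors. The function `encode_haplotypes` and the encoding itself (1 = target allele, with a heterozygote carrying the target allele on the first haplotype only) are assumptions made for illustration; the internal representation used by `util_haplo` may differ.

```r
# Hypothetical helper, not part of gpyramid: split genotype codes into
# two haplotypes, coding the target allele as 1 and the other allele as 0.
encode_haplotypes <- function(codes, target = "A", non_target = "B", hetero = "H") {
  stopifnot(all(codes %in% c(target, non_target, hetero)))
  # Assumption: a heterozygote carries the target allele on haplotype 1 only.
  hap1 <- as.integer(codes %in% c(target, hetero))
  hap2 <- as.integer(codes == target)
  list(hap1 = hap1, hap2 = hap2)
}

# Genotypes of line x2 from the gene data above (gene1 to gene7).
x2_codes <- c("H", "H", "H", "B", "H", "A", "B")
x2_haps <- encode_haplotypes(x2_codes)
x2_haps$hap1
x2_haps$hap2
```
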

#### Generate recombination probability matrix from raw data

```{r}
recom_mat <- util_recom_mat(position_df, "cM")
recom_mat
```
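
As a rough illustration of how map positions can be turned into recombination probabilities, the sketch below applies Haldane's map function to genes on the same chromosome and uses 0.5 for genes on different chromosomes. The function `haldane_recom_mat` and the toy positions are hypothetical, and the choice of Haldane's function is an assumption; `util_recom_mat` may use a different mapping.

```r
# Hypothetical helper, not part of gpyramid.
# pos: data frame with columns Gene, Chr and cM, like position_df above.
haldane_recom_mat <- function(pos) {
  n <- nrow(pos)
  # Start from free recombination (0.5) between every pair of genes.
  r <- matrix(0.5, n, n, dimnames = list(pos$Gene, pos$Gene))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (pos$Chr[i] == pos$Chr[j]) {
        d <- abs(pos$cM[i] - pos$cM[j]) / 100   # distance in Morgans
        r[i, j] <- 0.5 * (1 - exp(-2 * d))      # Haldane's map function
      }
    }
  }
  r
}

# Toy positions: g1 and g2 are linked, g3 lies on another chromosome.
toy_pos <- data.frame(Gene = c("g1", "g2", "g3"),
                      Chr = c("1", "1", "2"),
                      cM = c(0, 10, 5))
toy_recom <- haldane_recom_mat(toy_pos)
round(toy_recom, 3)
```

Under this assumption, genes that are all on different chromosomes, as in `position_df`, would have a recombination probability of 0.5 for every pair.
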

## 3. Find parent sets from candidate lines (cultivars)

Given the candidate lines, the `findPset` function returns the parent sets that can be used for gene pyramiding.

In this example, there are four parent sets.

```{r}
line_comb_lis <- findPset(gene_df1, gene_df2, line_id)
line_comb_lis
```
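
One way to think about a parent set is as a group of lines that together carry the target allele at every gene, with no line that could be dropped. The brute-force sketch below enumerates such sets for the gene data above using base R only. It is an illustration under that assumed definition, not the algorithm implemented in `findPset`.

```r
# Genotypes from line_df above: rows are lines, columns are gene1 to gene7.
geno <- rbind(
  x1 = c("H", "H", "H", "H", "H", "B", "B"),
  x2 = c("H", "H", "H", "B", "H", "A", "B"),
  x3 = c("A", "A", "B", "B", "B", "B", "B"),
  x4 = c("B", "B", "A", "B", "A", "A", "B"),
  x5 = c("B", "B", "A", "A", "B", "B", "B"),
  x6 = c("B", "A", "A", "B", "B", "B", "A")
)
# A line carries the target allele if it is "A" or "H" at that gene.
has_target <- geno != "B"

# TRUE if the lines in idx jointly carry the target allele at every gene.
covers_all <- function(idx) {
  all(colSums(has_target[idx, , drop = FALSE]) > 0)
}

n <- nrow(has_target)
found <- list()
for (k in seq_len(n)) {                        # smallest sets first
  for (idx in combn(n, k, simplify = FALSE)) {
    if (!covers_all(idx)) next
    # Skip sets that contain a smaller covering set found earlier.
    redundant <- any(vapply(found, function(s) all(s %in% idx), logical(1)))
    if (!redundant) found[[length(found) + 1]] <- idx
  }
}
parent_sets <- lapply(found, function(s) rownames(has_target)[s])
parent_sets
```

For this data the sketch yields four sets: three with three lines each and one with four lines.
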

## 4. Calculate the number of necessary individuals and generations

### 4.1 Calculate cost of all the crossing schemes

Given the parent sets for gene pyramiding, the `calcCostAll` function simulates all the crossing schemes and calculates the number of necessary individuals and generations as the cost of each scheme.

`calcCostAll` returns a `gpyramid_all` object, which contains information on all the crossing schemes. The `getFromAll` function, used in Section 4.3, extracts a single crossing scheme from a `gpyramid_all` object.
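
To get a feel for how quickly the number of crossing schemes grows, the sketch below counts rooted binary trees whose leaves are the parents, which is (2n - 3)!! for n parents. Treating each crossing scheme as one such tree is an assumption made for illustration, and `n_rooted_binary_trees` is a hypothetical helper, not a gpyramid function.

```r
# Number of rooted binary tree shapes with n labelled leaves: (2n - 3)!!
n_rooted_binary_trees <- function(n_parents) {
  stopifnot(n_parents >= 2)
  prod(seq(1, 2 * n_parents - 3, by = 2))
}

# Counts for 2 to 6 parents.
tree_counts <- sapply(2:6, n_rooted_binary_trees)
tree_counts
```

With three parents there are 3 such trees, and with four parents there are 15.
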


```{r}
cost_rslt <- calcCostAll(line_comb_lis, gene_df1, gene_df2, recom_mat, prob_total = 0.99)
cost_rslt
```
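
The role of a success probability such as `prob_total = 0.99` can be illustrated with a standard calculation: if each progeny has the desired genotype with probability `q`, then at least one of `N` progeny has it with probability `1 - (1 - q)^N`. The helper `min_individuals` below is hypothetical, and whether `calcCostAll` uses exactly this formula, or how it splits the total probability across generations, is an assumption.

```r
# Smallest N such that 1 - (1 - q)^N >= p.
# q: probability that a single progeny has the desired genotype.
# p: required probability of obtaining at least one such progeny.
min_individuals <- function(q, p = 0.99) {
  stopifnot(q > 0, q < 1, p > 0, p < 1)
  ceiling(log(1 - p) / log(1 - q))
}

min_individuals(1 / 2)    # 7
min_individuals(1 / 4)    # 17
min_individuals(1 / 16)   # 72
```

The required population size rises steeply as the desired genotype becomes rarer, which is why crossing schemes can differ so much in cost.
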

### 4.2 Plot cost of each crossing scheme

The output of the `calcCostAll` function contains the cost of each crossing scheme.

Here, we plot the number of necessary individuals against the number of generations.

```{r}
cost_all <- cost_rslt$cost_all
plot(x = cost_all$n_parent, y = cost_all$N_total, 
     log = "y", xlab = "Number of generations", ylab = "Number of individuals")
```

### 4.3 Select the most cost-effective crossing strategy

```{r}
# List the crossing schemes that need fewer than 200 individuals in total.
cost_all[cost_all$N_total < 200,]
```
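
A fixed threshold is one way to shortlist schemes. Another is to keep only the schemes that no other scheme beats on both generations and individuals. The sketch below does this on a toy table; the values and the column names `generations` and `individuals` are made up for illustration and are not output of `calcCostAll`.

```r
# Toy cost table (made-up values, hypothetical column names).
toy <- data.frame(
  cross_id    = 1:6,
  generations = c(3, 3, 4, 4, 5, 5),
  individuals = c(900, 450, 300, 180, 180, 120)
)

# Scheme i is dominated if another scheme is at least as good on both
# criteria and strictly better on at least one of them.
is_dominated <- function(i, df) {
  any(df$generations <= df$generations[i] &
        df$individuals <= df$individuals[i] &
        (df$generations < df$generations[i] |
           df$individuals < df$individuals[i]))
}

keep <- !vapply(seq_len(nrow(toy)), is_dominated, logical(1), df = toy)
pareto <- toy[keep, ]
pareto
```

In this toy table, schemes 2, 4 and 6 remain: each trades more generations for fewer individuals.
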

```{r}
# Extract crossing scheme 13, then show its summary and its crossing tree.
rslt_one <- getFromAll(cost_rslt, cross_id = 13)
summary(rslt_one)
plot(rslt_one$topolo)
nodelabels()
```
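
The calls to `plot()` and `nodelabels()` above suggest that `topolo` is a tree object handled by the ape package, though that is an inference from the code rather than something stated here. As a reading aid, the sketch below builds a toy three-parent tree from a Newick string and plots it the same way. The scheme itself is hypothetical.

```r
library(ape)

# Hypothetical scheme: cross x1 with x2 first, then cross the progeny with x6.
toy_scheme <- read.tree(text = "((x1,x2),x6);")

plot(toy_scheme)
nodelabels()        # number the internal nodes, i.e. the crosses

Ntip(toy_scheme)    # number of parents (tips)
Nnode(toy_scheme)   # number of crosses (internal nodes)
```

In ape, tips are numbered first and the root comes next, so in this toy tree node 4 is the final cross and node 5 is the first cross between x1 and x2.
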

### 4.4 Another example

```{r}
# Extract crossing scheme 7 for comparison.
rslt_one <- getFromAll(cost_rslt, cross_id = 7)
summary(rslt_one)
plot(rslt_one$topolo)
nodelabels()
```