---
title: "Introduction"
output: rmarkdown::html_vignette
author: Shoji Taniguchi
vignette: >
  %\VignetteIndexEntry{Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## 0. Introduction to the gpyramid package

The R package `gpyramid` is designed for gene pyramiding in plant breeding.

Gene pyramiding was formulated by Servin et al. (2004) ([doi:10.1534/genetics.103.023358](https://doi.org/10.1534/genetics.103.023358)).

This document shows how to reproduce the calculations of Servin et al. (2004) in R.

## 1. Set up

```{r setup}
library(gpyramid)
library(ape)
library(dplyr)
```

## 2. Prepare data

### 2.1 Gene data

```{r}
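# Four candidate lines; each carries the target allele ("A") at exactly one gene
# and the non-target allele ("B") at the other three.
line_df <- data.frame(line = c("x1", "x2", "x3", "x4"),
                      gene1 = c("A", "B", "B", "B"),
                      gene2 = c("B", "A", "B", "B"),
                      gene3 = c("B", "B", "A", "B"),
                      gene4 = c("B", "B", "B", "A"))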

line_df
```

### 2.2 Position data

```{r}
# Map positions of the four genes: all on chromosome 1A, spaced 20 cM apart.
position_df <- data.frame(Gene = c("g1", "g2", "g3", "g4"),
                          Chr = c("1A", "1A", "1A", "1A"),
                          cM = c(0, 20, 40, 60))
position_df
```

### 2.3 Preprocessing

#### Generate haplotype data frames from raw data

```{r}
# Genotype codes: "A" = target allele, "B" = non-target allele, "H" = heterozygous.
gene_dat <- util_haplo(line_df, target = "A", non_target = "B",
                       hetero = "H", line_cul = "line")

gene_df1 <- gene_dat[[1]]
gene_df2 <- gene_dat[[2]]
line_id <- gene_dat[[3]]

colnames(gene_df1) <- line_id
colnames(gene_df2) <- line_id

gene_df1
gene_df2
```

#### Generate recombination probability matrix from raw data

```{r}
recom_mat <- util_recom_mat(position_df, "cM")
recom_mat
```
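
For intuition, the conversion from map distance to recombination probability can be sketched in base R. The sketch below assumes Haldane's map function, $r = \frac{1}{2}(1 - e^{-2d})$ with $d$ in Morgans. Whether `util_recom_mat` uses exactly this function internally is an assumption, and `haldane_r`, `cM_sketch`, `dist_sketch`, and `recom_sketch` are illustrative names that are not part of `gpyramid`.

```{r}
# Illustrative sketch only: assumes Haldane's map function, which may differ
# from what util_recom_mat() does internally.
haldane_r <- function(d_cM) {
  d <- d_cM / 100            # centimorgans -> Morgans
  0.5 * (1 - exp(-2 * d))    # recombination probability
}

# Same map positions as in position_df, repeated here so the chunk is self-contained
cM_sketch <- c(g1 = 0, g2 = 20, g3 = 40, g4 = 60)

# Pairwise map distances between loci on the same chromosome
dist_sketch <- abs(outer(cM_sketch, cM_sketch, "-"))

# Pairwise recombination probabilities under the assumed map function
recom_sketch <- haldane_r(dist_sketch)
round(recom_sketch, 3)
```

Under this assumption the diagonal is zero and the probability approaches, but never reaches, 0.5 as loci get farther apart.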

## 3. Find parent sets from candidate lines (cultivars)

From the candidate lines, the `findPset` function returns the parent sets for gene pyramiding.

In this example, only one parent set is returned.

```{r}
line_comb_lis <- findPset(gene_df1, gene_df2, line_id)
line_comb_lis
```

## 4. Calculate the number of necessary individuals and generations

Given the parent sets for gene pyramiding, the `calcCostAll` function simulates all the crossing schemes and calculates, for each scheme, the number of necessary individuals and generations as the cost of gene pyramiding.
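
To get a feel for how many schemes "all the crossing schemes" can mean, the sketch below counts rooted binary tree topologies with $n$ labelled leaves, which is $(2n - 3)!!$. Treating each crossing scheme as one such tree, with the parents as leaves, is an assumption made here for illustration, and `n_topologies` is a hypothetical helper that is not part of `gpyramid`.

```{r}
# Illustrative sketch: number of rooted binary tree topologies with n labelled
# leaves, (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3).
# That gpyramid enumerates exactly this set of schemes is an assumption.
n_topologies <- function(n_parents) {
  stopifnot(n_parents >= 2)
  prod(seq(1, 2 * n_parents - 3, by = 2))
}

n_topologies(4)
```

For the four parents used in this document the count is 15, which is consistent with the largest `cross_id` used in the examples that follow. The count grows quickly with the number of parents.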

The `calcCostAll` function returns a `gpyramid_all` object, which contains information on all the crossing schemes.

The `getFromAll` function extracts a single crossing scheme from a `gpyramid_all` object; it is used in the subsections that follow.

```{r}
rslt <- calcCostAll(line_comb_lis, gene_df1, gene_df2, recom_mat,
                    prob_total = 0.99, last_cross = TRUE, last_selfing = TRUE)
rslt$cost_all
```
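
As a rough illustration of how a success probability translates into a number of individuals, consider a single selection step. If each offspring has the desired genotype with probability $p$, the smallest population size that yields at least one such offspring with probability $\gamma$ is $N = \lceil \ln(1 - \gamma) / \ln(1 - p) \rceil$. The sketch below implements only this single-step relation. How `calcCostAll` distributes `prob_total` across the steps of a scheme is not shown here, and `min_pop_size` is a hypothetical helper that is not part of `gpyramid`.

```{r}
# Illustrative sketch: minimum number of individuals needed so that at least
# one carries the desired genotype with probability gamma, when each individual
# carries it independently with probability p.
min_pop_size <- function(p, gamma = 0.99) {
  stopifnot(p > 0, p < 1, gamma > 0, gamma < 1)
  ceiling(log(1 - gamma) / log(1 - p))
}

# One locus: a specific homozygote from selfing a heterozygote (p = 1/4)
min_pop_size(p = 1 / 4)

# Two unlinked loci, both homozygous for the target allele (p = 1/16)
min_pop_size(p = 1 / 16)
```

The required population size rises sharply as the per-individual probability falls, which is why the order and grouping of crosses affect the total cost.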

### 4.1 Fig. 4a (Servin et al., 2004)

Fig. 4a in Servin et al. (2004) corresponds to `cross_id = 15` in the `gpyramid_all` object above.

```{r fig.width=4, fig.height=4}
rslt_one <- getFromAll(rslt, cross_id = 15)
summary(rslt_one)
plot(rslt_one$topolo)
nodelabels()
```

### 4.2 Fig. 4b (Servin et al., 2004)

Fig. 4b in Servin et al. (2004) corresponds to `cross_id = 6` in the `gpyramid_all` object above.

```{r fig.width=4, fig.height=4}
rslt_one <- getFromAll(rslt, cross_id = 6)
summary(rslt_one)
plot(rslt_one$topolo)
nodelabels()
```

### 4.3 Fig. 4c (Servin et al., 2004)

Fig. 4c in Servin et al. (2004) corresponds to `cross_id = 13` in the `gpyramid_all` object above.

```{r fig.width=4, fig.height=4}
rslt_one <- getFromAll(rslt, cross_id = 13)
summary(rslt_one)
plot(rslt_one$topolo)
nodelabels()
```

