---
title: "How to use the geneticae package"
subtitle: "Statistical Tools for the Analysis of Multi-Environment Agronomic Trials"
author: |
    | Julia Angelini
    | Marcos Prunello
    | Gerardo Cervigni
    |
    | Centro de Estudios Fotosintéticos y Bioquímicos
    | Universidad Nacional de Rosario
    | Rosario, Argentina
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{How to use the geneticae package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(warning = F, message = F, out.width = "60%")
library(dplyr)
```

<style type="text/css">

body{
text-align: justify}
</style>
  
  
<!-- devtools::build_vignettes() -->
# Getting Started

__Installing the package.__ To install the released version of `geneticae` from [CRAN](https://CRAN.R-project.org):

```{r, eval=F}
install.packages("geneticae")
```

You can install the development version from our [GitHub repo](https://github.com/jangelini/geneticae) with:

``` {r, eval=F}
# install.packages("devtools")
devtools::install_github("jangelini/geneticae")
```



__Loading the package.__ Once the `geneticae` package is installed, it needs to be loaded by:

```{r}
library(geneticae)
```

__Help files.__  Detailed information on `geneticae` package functions can be obtained from  help files using `help(package="geneticae")`.
The help file for a function, for example `imputation` can be obtained using `?imputation` or `help(imputation)`.


# Introduction

Understanding the relationship between crops performance and environment is a
key problem for plant breeders and geneticists. In advanced stages of breeding
programs, in which few genotypes are evaluated, multi-environment trials (MET)
are one of the most used experiments. Such studies test a number of genotypes in
multiple environments in order to identify the superior genotypes according to
their performance. In these experiments, crop performance is modeled as a
function of genotype (G), environment (E) and genotype-environment interaction
(GEI). The presence of GEI generates differential genotypic responses in the
different environments (Angelini et al., 2019; Crossa, 1990; Kang and Magari,
1996). Therefore appropriate statistical methods should be used to obtain an
adequate GEI analysis, which is essential for plant breeders (Giauffret et al.,
2000).

The average performance of genotypes through different environments can only be
considered in the absence of GEI (Yan and Kang, 2003). However, GEI is almost
always present and the comparison of the mean performance between genotypes is
not enough. The most widely used methods to analyze MET data are based on
regression models, analysis of variance (ANOVA) and multivariate techniques. In
particular, two statistical models are widely used among plant breeders as they
provide useful graphical tools for the study of GEI: the Additive Main effects
and Multiplicative Interaction model (AMMI) (Kempton, 1984; Gauch, 1988) and the
Site Regression Model (SREG) (Cornelius et al., 1996; Gauch and Zobel, 1997).
However, these models are not always efficient enough to analyze MET data
structure of plant breeding programs. They present serious limitations in the
presence of atypical observations and missing values, which occur very
frequently. To overcome this, several imputation alternatives, a robust AMMI 
(Rodrigues et al., 2016) and SREG alternative (Angelini et al., 2022) were recently
proposed in literature.

The `geneticae` package was created to gather in one place the most useful
functions for this type of analysis and it also implements new methodology which
can be found in recent literature. More importantly, `geneticae` is the first
package to implement the robust AMMI models proposed by Rodrigues et al. (2016),
the robusts SREG proposed by Angelini et al. (2022) and new imputation methods
proposed by Angelini et al. (2024). In addition, there is no need to preprocess
the data to use the `geneticae` package, as it the case of some previous
packages which require a data frame or matrix containing genotype by environment
means  with  the genotypes in rows and the environments in columns. In this
package, data in long format is required. There is no restriction on columns
names of genotypes, environments, repetitions (if any) and phenotypic traits of
interest. Also, extra information that will not be used in the analysis may be
present in the dataset. Finally, `geneticae` offers a wide variety of options to
customize the biplots, which are part of the graphical output of these methods.


# Datasets

The `geneticae` package utilizes two datasets to illustrate the methodology
included to analyse MET data.

* `yan.winterwheat` dataset: yield of 18 winter wheat varieties grown in nine
environments in Ontario at 1993. Although four blocks or replicas in each
environment were performed in the experiment, only yield mean for each variety
and environment combination was available in the dataset obtained from the
[agridat](https://CRAN.R-project.org/package=agridat) package (Wright, 2020).


```{r}
library(agridat)
data(yan.winterwheat)
head(yan.winterwheat)
```

* `plrv` dataset: resistance study to PLRV (Patato Leaf Roll Virus) causing leaf
curl. 28 genotypes were experimented at 6 locations in Peru. Each clone was
evaluated three times in each environment, and yield, plant weight and plot were
registered. This dataset is available in `geneticae` and was obtained from the
[agricolae](https://CRAN.R-project.org/package=agricolae) package (de Mendiburu,
2020).


```{r}
data(plrv)
head(plrv)
```


# Statistical models for multi-environment trials

## AMMI model

The *AMMI model* (Gauch, 1988) is widely used to analyse the effect of GEI.
This model includes two stages. First, an ANOVA is performed to obtain estimates
for the additive main effects of environments and genotypes. Secondly, the
residuals from the ANOVA are arranged in a matrix with genotypes in the rows and
environments in the columns and a singular value decomposition (SVD) is applied
in order to explore patterns related to GEI, still present in the residuals. The
result of the first two multiplicative terms of the SVD is often presented in a
biplot called GE and represents a two-rank approximation of GEI effects.

The `rAMMI()` function returns the GE biplot. Data in long format is required by
this function, i.e. each row corresponds to one observation and each column to
one variable (genotype, environment, repetition (if any) and the observed
phenotype). If each genotype has been evaluated more than once at each
environment, the phenotypic mean for each combination of genotype and
environment is internally calculated and then the model is estimated. Extra
variables that will not be used in the analysis may be present in the dataset.
Missing values are not allowed (but can be imputated, see below).

The GE biplot for `yan.winterwheat` dataset is shown in Figure 1 along with the
sentence used to obtain it. The first argument is the input dataset, then the
names of the columns in which the necessary information to apply the technique
is found and also the model to be obtained are indicated.  Optionally, the
percentage of GEI explained by the GE biplot can be added as a footnote with
`footnote = T`, as well as a tittle with `titles = T`. In this example, BH93,
KE93 and OA93 are the environments that contribute the most to the interaction
as their vectors are the longest ones. The cultivars _m12_ and _Kat_ present
similar interaction patterns (their markers are close to each other in the biplot)
and they are very different from _Ann_ and _Aug_, for example. 
The closeness between the cultivar _Dia_  and the environment BH93 indicates a 
strong positive association between them, which means that BH93 is a 
extremely favorable environment for that genotype. As OA93 and _Luc_ markers are
opposite, this environment is considerably unfavorable for that genotype.
Finally, _Cas_ and _Reb_ are close to the origin, which means that they adapt
equally to all environments.

```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 1: GE biplot based on yield data of 1993 Ontario winter wheat performance trials.  The 71.66% of GE variability is explained by the first two multiplicative terms. Cultivars are shown in lowercase and environments in uppercase.'}

rAMMI_clasic <- rAMMIModel(yan.winterwheat, genotype = "gen", environment = "env", 
      response = "yield", type = "AMMI")

rAMMIPlot(rAMMI_clasic, titles = TRUE, footnote = TRUE)

```

The AMMI model, in its standard form, assumes that no outliers are present in
the data. To overcome the problem of data contamination with outlying
observations, Rodrigues et al. (2016) proposed five robust AMMI models, which
can be obtained in two stages: (i) fitting a robust regression model with an
M-Huber estimator (Huber, 1981) to replace the ANOVA model; and (ii) using a
robust SVD or principal components analysis (PCA) procedure to replace the
standard SVD. Until now, robust AMMI models  were not available in any R
package. All robust biplots proposed by Rodrigues et al. (2016) can be obtained
using `rAMMI()`. The argument `type` can be used to specify the type of
model to be fitted (`"rAMMI"`, `"hAMMI"`, `"gAMMI"`, `"lAMMI"` or `"ppAMMI"`).0
Since the sample `yan.winterwheat` dataset does not present outliers, the
conclusions obtained with robust biplots will not differ from those made with
the classic biplot (Rodrigues et al., 2016). Thus, no interpretation of the
robust biplots is presented in this tutorial.


## Site Regression model

The *Site Regression model* (SREG, also called *genotype plus
genotype-by-environment model* or *GGE model*) is another powerful tool for the
analysis and interpretation of MET data in breeding programs. In this case, an
ANOVA is performed to obtain estimates for the additive main effects of
environments and a SVD is performed on the residuals matrix in order to explore
patterns related to genotype (G) and GEI. However, ANOVA and SVD are sensitive to
atypical observations, which are common in MET. To overcome
this problem, three robust models were proposed by Angelini et al. (2022) to obtain valid
results even in the presence of outliers. The current package implements these 
methods, allowing users to perform robust SREG analysis easily. As `yan.winterwheat` 
dataset does not present outliers, the conclusions obtained with robust biplots 
will not differ from those made with the classic biplot (Angelini et al., 2022). Thus,
no interpretation of the robust biplots is presented in this tutorial.

As `rAMMI()` function, `GGEmodel()` data needs to be presented in a long format
and repetitions or extra variables in the dataset are allowed. All the
combinations between genotypes and environments must be present.

```{r}
GGE1 <- rSREGModel(yan.winterwheat, genotype = "gen", environment = "env", 
                 response = "yield")
```

The output from `GGEmodel()` is a list with the following elements:

* `model`: method for fitting the SREG model: `"SREG"`,`"CovSREG"`,`"hSREG"`or `"ppSREG"`.
* `coordgenotype`: plot coordinates for all genotypes in each component. 
* `coordenviroment`: plot coordinates for all environments in each component.
* `eigenvalues`: vector of eigenvalues for each component.
* `vartotal`: overall variance.
* `varexpl`: percentage of variance explained by each component. 
* `labelgen`: genotype names.
* `labelenv`: environment names.
* `axes`: axis labels.
* `Data`: centered input data.
* `SVP`: SVP method.

The result of the first two multiplicative terms of the SVD is often presented
in a GGE biplot (Yan et al., 2000), which represents a rank-two approximation of
the G + GEI effects. Plant breeders have found GGE biplots as an useful tools
for the analysis of mega-environment (Yan et al., 2001; Yan and Rajcan, 2002)
and genotype and environment evaluation (Bhan et al., 2005; Kang et al., 2006;
Yan et al., 2007). The GGE biplot addresses many issues relative to genotype and
test environment evaluation. Considering the average performance of each
genotype, this plot can be used to evaluate specific and general adaptation. In
addition, environments can be visually grouped according to their ability to
discriminate among genotypes and their representativeness of other test
environments. GGE biplot reveals the *which-won-where* pattern and allows to
recommend specific genotypes for each environment (Yan and Tinker, 2005).

Using the output from `GGEmodel()`, `GGEPlot()` builds several GGE biplots
views, in which cultivars are shown in lowercase and environments in uppercase.
The plot also displays the methods used for centering, scaling and SVD.
Optionally, the percentage of G + GEI explained by the two axes can be added as
a footnote with `footnote = T`, as well as a tittle with `titles = T`.

A basic biplot is produced with the option `type="Biplot"` (Figure 2). In this
example the 78% of G + GE variability is explained by the fist two
multiplicative terms. The angles between genotypes markers and environments
vectors are considered to understand this plot. Thus, for example, _Kat_
performs below the average in all environments, as it has an angle greater than
90$°$ with all environments. On the other hand, _Fun_ presents an above-average
performance in all locations except OA93 and KE93, as indicated by the acute angles.
The length of the environment vectors is a measure of the environment's ability to
discriminate between crops.

```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 2: GGE biplot based on yield data of 1993 Ontario winter wheat performance trials. The scaling method used is symmetrical singular value partitioning (by default). The 78% of G + GE variability is explained by the first two multiplicative terms. Cultivars are shown in lowercase and environments in uppercase.'}
rSREGPlot(GGE1, type = "Biplot", footnote = F, titles = F)
```

Breeders usually want to identify the most suitable cultivars for a particular
environment of interest, i.e., OA93. To do this with GGE biplots, Yan and Kang
(2003) suggest drawing a line that passes through the environment marker and the
biplot origin, which may be referred to as the OA93 axis. The performance of the
cultivars in this particular environment can be ranked projecting them onto this
axis.
This can be done by setting `type = "Selected Environment"` and providing 
the name of the environment (OA93) in
`selectedE` (Figure 3). Thus, at OA93, the highest-yielding cultivar was _Zav_,
and the lowest-yielding cultivar was _Luc_. The line that passes through the
biplot origin and is perpendicular to the OA93 axis separates genotypes that
yielded above and below the mean in this environment.

```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 3: comparison of cultivar performance in a selected environment (OA93). The scaling method used is symmetrical singular value partitioning (by default). The 78% of G + GE variability is explained by the first two multiplicative terms.'}
rSREGPlot(GGE1, type = "Selected Environment", selectedE = "OA93", 
        footnote = F, titles = F)
```

Another goal of plant breeders is to determine which is the most suitable
environment for a genotype. Yan and Kang (2003) suggest plotting a line that
passes through the origin and a cultivar marker, i.e., _Kat_. To obtain this GGE
biplots view the argument  `type = "Selected Genotype"` and `selectedG = "Kat"`
must be indicated (Figure 4). Environments are classified along the genotype
axis in the direction indicated by the arrow. The perpendicular axis separates
the environments in which the cultivar presented a performance below or above
the average. In this example, _Kat_ presented a performance below the average in
all the environments studied.

```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 4: comparison of the performance of cultivar Luc in different environments. The scaling method used is symmetrical singular value partitioning (by default). The 78% of G + GE variability is explained by the first two multiplicative terms. '}
rSREGPlot(GGE1, type = "Selected Genotype", selectedG = "Kat", 
        footnote = F, titles = F)
```

It is also possible to compare two cultivars, i.e. _Kat_ and _Cas_, linking them
with a line and a segment perpendicular to it. To obtain this GGE biplots view
the argument  `type = "Comparison of Genotype"` and the genotypes to be compared
`selectedG1 = "Kat"` and `selectedG2 = "Cas"` must be indicated (Figure 5).
_Cas_ was more yielding than _Kat_ in all environments as they all are in the 
same side of the perpendicular line as _Cas_.

```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 5: comparison of the cultivars _Kat_ and _Cas_. The scaling method used is symmetrical singular value partitioning (by default). The 78% of G + GE variability is explained by the first two multiplicative terms. Cultivars are shown in lowercase and environments in uppercase.'}
rSREGPlot(GGE1, type = "Comparison of Genotype", 
        selectedG1 = "Kat", selectedG2 = "Cas", 
        footnote = F, titles = F, axis_expand = 1.5)
```

The polygonal view of the GGE biplots provides an effective way to visualize the
*which-won-where* pattern of MET data (Figure 6). Cultivars in the vertices of
the polygon (_Fun_,_Zav_, _Ena_, _Kat_ and _Luc_) are those with the longest
vectors, in their respective directions, which is a measure of the ability to
respond to environments. The vertex cultivars are, therefore, among the most
responsive cultivars; all other cultivars are less responsive in their
respective directions. 

The dotted lines are perpendicular to the polygon sides and divide the biplot
into mega-environments, each of which has a vertex cultivar, which is the one
with the highest yield (phenotype) in all environments found in it. OA93 and
KE93 are in the same sector, separated from the rest of the biplot by two
perpendicular lines, and _Zav_ is the highest-yielding cultivar in this sector.
_Fun_ is the highest-yielding cultivar in its sector, which contains seven
environments, namely, EA93, BH93, HW93, ID93, WP93, NN93, and RN93. No
environments fell in the sectors with _Ena_, _Kat_, and _Luc_ as vertex
cultivars. This indicates that these vertex cultivars were not the best in any
of the test environments. Moreover, these cultivars were the poorest in some or
all of the environments.

```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 6: polygon view of the GGE biplot, showing which cultivars presented highest yield in each environment. The scaling method used is symmetrical singular value partitioning (by default). The 78% of G + GE variability is explained by the first two multiplicative terms. Cultivars are shown in lowercase and environments in uppercase.'}
rSREGPlot(GGE1, type = "Which Won Where/What", footnote = F,
        titles = F, axis_expand = 1.5)

```

Selecting cultivars within each mega-environments is an issue among plant
breeders. Figure 6 clearly suggests that _Zav_ is the best cultivar for OA93
and KE93, and _Fun_ is the best cultivar for the other locations. However,
breeders do not select a single cultivar in each megaenvironment. Instead, they
evaluate all cultivars in order to get an idea of their performance (yield and
stability). 

In the GGE biplot it is also possible to visualize mean yield and stability of
genotypes in yield units _per se_ (Figure 7 and 8). The GGE biplot based on
*genotype-focused scaling*, obtained indicating `SVP = "row"` in `GGEmodel()`,
provides an useful way to visualize both mean performance and stability of
the tested genotypes. This is because the unit of both axes for the genotypes is
the original unit of the data.

Visualization of the mean and stability of genotypes is achieved by drawing an
average environment coordinate (AEC). For example, Figure 7 shows the AEC for
the mega-environment composed of he environments BH93, EA93, HW93, ID93, NN93,
RN93, WP93. The abscissa represents the G effect, thus, the cultivars are ranked
along the AEC abscissa. Cultivar _Fun_ was clearly the highest-yielding
cultivar, on average, in this mega-environment, followed by _Cas_ and _Har_,and
_Kat_ was the poorest. The AEC ordinate approximate the GEI associated with each
genotype, which is a measure of the variability or instability of the genotype.
_Rub_ and _Dia_ are more variable and less stable than other cultivars, by the
contrary, _Cas_, _Zav_, _Reb_, _Del_, _Ari_, and _Kar_, were more stable.


```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 7: average environment view of the GGE biplot based on genotype-focused scaling, showing mean yield and stability of genotypes. '}
data <- yan.winterwheat[yan.winterwheat$env %in% c("BH93", "EA93","HW93", "ID93",
                                                   "NN93", "RN93", "WP93"), ]
data <- droplevels(data)
GGE2 <- rSREGModel(data, genotype = "gen", environment = "env", 
                 response = "yield", SVP = "row")

rSREGPlot(GGE2, type = "Mean vs. Stability", footnote = F, titles = F, sizeEnv = 0)

```


Figure 8 compares the cultivars to the "ideal" one with the highest
yield and absolute stability. This ideal cultivar is represented by a small
circle and is used as a reference, as it rarely exists. The distance between
cultivars and the ideal one can be used as a measure of convenience. Concentric
circles help to visualize these distances. In the example, _Fun_ is the closest
one to the ideal crop, and therefore the most desirable one, followed by _Cas_
and _Hay_, which in turn are followed by _Rum_, _Ham_, _Rub_, _Zav_, _Del_ and
_Reb_, etc.


```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 8: Classification of genotypes with respect to the ideal genotype. Genotype-focused scaling is used.', warning=FALSE}
rSREGPlot(GGE2, type = "Ranking Genotypes", footnote = F, titles = F, sizeEnv = 0)
```

Although METs are performed to study cultivars, they are equally useful for the
analysis of the environments. This includes several aspects: (i) evaluating
whether the target region belongs to one or more megaenvironments; (ii)
identifying better test environments; (iii) detecting redundant environments
that do not provide additional information on cultivars; and (iv) determining
environments that can be used for indirect selection. To obtain GGE biplots for
comparing environments the *environment-focused scaling* should be used as is
most informative of interrelationships among them (Figure 9 and 10). This is
obtained indicating `SVP = "column"` in `GGEmodel()`.

In Figure 9 environments are connected to the origin through vectors, allowing
us to understand the interrelationships between them. The coefficient of correlation
between two environments it is approximated by the cosine of the angle formed by
the respective vectors. In this example the relation between the environments
for the mega-environment with BH93, EA93, HW93, ID93, NN93, RN93 and WP93 is
considered. The angle between the vectors for the environments NN93 and WP93 is
approximately 10$º$; therefore, they are closely related; while RN93 and BH93
present a weak negative correlation since the angle is slightly greater than
90$º$. The cosine of the angles does not translate precisely into coefficients
of correlation, since the biplot does not explain all the variability in the
dataset. However, they are informative enough to understand the
interrelationship between test environments.

```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 9: Relationship between environments. Environment-focused scaling is used.'}

GGE3 <- rSREGModel(data, genotype = "gen", environment = "env", 
                 response = "yield", SVP = "column")
rSREGPlot(GGE3, type = "Relationship Among Environments", footnote = F, titles = F)
```


Discrimination ability as well as representativeness with respect to the target
environment are fundamental measures for an environment. An ideal test
environment should be both discriminating and representative. If it does not
have the ability to discriminate, it does not provide information on cultivars
and is therefore of no use. At the same time, if it is not representative, not
only does it lack usefulness but it can also provide biased information on the
evaluated cultivars.

To visualize these measurements, an average environment coordinate is defined
and the center of a set of concentric circles represents the ideal environment.
Figure 10 shows the GGE biplots view for the mega-environment with BH93, EA93,
HW93, ID93, NN93, RN93 and WP93. The angle between the vector of an environment
and the AEC provides a measure of representativeness. Therefore, EA93 and ID93
are the most representative, while RN93 and BH93 are the least representative of
the average environment, when the mega-environment is analyzed. On the other
hand, an environment to be discriminative  must be close to the ideal
environment. HW93 is the closest to ideal environment and therefore the most
desirable of the mega-environment, followed by EA93 and ID93. By the contrary,
RN93 and BH93 were the least desirable test environments of this
mega-environment.


```{r, fig.width=6.5, fig.height=6, fig.align='center', dpi=300, out.width="60%", fig.cap='Figure 10: classification of environments with respect to the ideal environment. Environment-focused scaling is used.'}
rSREGPlot(GGE3, type = "Ranking Environments", footnote = F, titles = F)
```


## Robust Site Regression model
While the previous example used a dataset without contamination, MET data often
contain atypical observations due to environmental stress, pests, or measurement
errors. To illustrate the robust capabilities of the package, we refer to the
study by Angelini et al. (2022) using the `lavoranti.eucalyptus` dataset,
available in the agridat package (Wright 2020). In this case, a
distance-distance plot identified five atypical genotypes (1, 2, 21, 22, and
24).
As noted by Angelini et al. (2022), when outliers are present, the first PC
eigenvalues in the classic SREG model are often overestimated. This occurs
because standard SREG uses measures of variability that are highly influenced by
extreme values; consequently, outliers in the direction of the first principal
axes inflate the corresponding variances and artificially increase their
proportion of explained variability. In this context, the high percentage of
variability explained by a classic biplot can be spurious, reflecting the
presence of outliers rather than the actual main structure of the data.
The package allows fitting robust alternatives to overcome this inflation by
simply changing the method argument. The following code compares the classic
SREG with the robust approaches:


```{r, fig.width=12, fig.height=10, fig.align='center', dpi=300, out.width="100%", fig.cap='Figure 11: GGE biplot based on the height of 25 progenies of Eucalyptus grandis from trials carried out in seven environments in the southern and southeastern regions of Brazil. The scaling method used is symmetrical singular value partitioning. SREG (A), hSREG (B), CovSREG (C), and ppSREG (D).'}

library(patchwork)
library(ggplot2)
data("lavoranti.eucalyptus")

# Bellthorpe
lavoranti_Bellthorpe <- droplevels(subset(lavoranti.eucalyptus, origin=="Bellthorpe"))

Nenv <-length(levels(lavoranti_Bellthorpe$loc))
Ngen <- length(levels(lavoranti_Bellthorpe$gen))

lavoranti_Bellthorpe$genot <- rep(1:Ngen, Nenv)
lavoranti_Bellthorpe$genot <- as.factor(lavoranti_Bellthorpe$genot)


# SREG Classic
SREG_classic <- rSREGModel(lavoranti_Bellthorpe, genotype = "genot", 
                          environment = "loc", response = "height", 
                          model = "SREG")

# hSREG
hSREG <- rSREGModel(lavoranti_Bellthorpe, genotype = "genot", 
                    environment = "loc", response = "height", 
                    model = "hSREG")

# CovSREG
CovSREG <- rSREGModel(lavoranti_Bellthorpe, genotype = "genot", 
                    environment = "loc", response = "height", 
                    model = "hSREG")



# ppSREG
ppSREG <- rSREGModel(lavoranti_Bellthorpe, genotype = "genot", 
                    environment = "loc", response = "height", 
                    model = "ppSREG")

# Graphs
p1 <- rSREGPlot(SREG_classic, footnote = F, titles = T)
p2 <- hSREG_plot  <- rSREGPlot(hSREG, footnote = F, titles = T)
p3 <- CovSREG_plot <- rSREGPlot(CovSREG, footnote = F, titles = T)
p4 <- ppSREG_plot  <- rSREGPlot(ppSREG, footnote = F, titles = T)

grafico_robusto_SREG <- (p1 + p2) / (p3 + p4) + 
  plot_annotation(tag_levels = 'A') & 
  theme(plot.tag = element_text(size = 18, face = "bold"))

grafico_robusto_SREG
```

The display of genotypes and environments shows a different behavior for
the classic SREG biplot. The standard SREG biplot shows an overlap of
genotype markers in the direction of the first PC due to the dominant effect
of observations 2, 21, 22 and 24, which difficult the interpretation and
analysis of MET data (Figure 11 A). For example, as the length of the environ­
mental vectors is a measure of the environment’s ability to discriminate
between cultivars, we can conclude that environment 2 was one of the
most discriminating when the classical biplot is analyzed. By the contrary,
it is near the biplot origin when the robust ones were interpreted, indicating
the lack of discriminating ability; that is, it does not provide information
about the cultivars, and therefore, the test environment is useless (Figure 11
B–D). This contradiction occurs because of the influence of genotypes 2, 21,
22 and 24 which have the poorest performance in that environment (while all
the genotypes rendering above 22.60 in that environment give 17.79, 19.77,
20.45 and 20.36, respectively). Outliers concentrated in a single environment
may be the result of a particular climatic event or some pests/diseases that
have influenced the expected performance.
When considering the robust alternatives, the impact of the outlying
observations is reduced leading to an easier interpretation, with genotype
markers being more dispersed. Therefore, the use of robust alternatives
allowed the inclusion of those genotypes in the analysis, reducing their
influence, and consequently not distorting the results.

# Imputation methods

One major limitation of the AMMI and SREG models is that they require a complete
two-way data table. Although METs are designed so that all genotypes are
evaluated in all environments, missing values are very common due to measurement
errors or destruction of plants by animals, floods or harvest problems. In
addition, genotypes might be incorporated or discarded during the study because
of their promising or poor performance. The `imputation()` function includes
several methods to overcome the problem of missing data, some of which  have
been recently published and were not available in any R package until now, such
as the recently proposed EM-SREG and EM-bSREG methods (Angelini et al., 2024).
To present an example, some observations from the complete `yan.winterwheat` are
deleted:

```{r}
# Generating missing data
yan.winterwheat[1,3] <- NA
yan.winterwheat[3,3] <- NA
yan.winterwheat[2,3] <- NA
```


To impute missing values using the methods proposed by Angelini et al. (2024) `"EM-SREG"` and `"EM-bSREG"`, use the following code:

```{r}
imputation(yan.winterwheat, genotype = "gen", environment = "env", 
           response = "yield", type = "EM-SREG")


imputation(yan.winterwheat, genotype = "gen", environment = "env", 
           response = "yield", type = "EM-bSREG")
```

The other methods available in `geneticae` are: `"EM-SREG"`, `"EM-bSREG"`, `"EM-SVD"`, `"Gabriel"`, `"WGabriel"` and `"EM-PCA"`.
 


# References

Angelini, J., Faviere, G. S., Bortolotto, E. B., Arroyo, L., Valentini, G. H.,
and Domingo Lucio Cervigni, G. 2019. Biplot pattern interaction analysis and
statistical test for crossover and non-crossover genotype-by-environment
interaction in peach. Scientia Horticulturae, 252, 298–309. \n

Angelini, J., Faviere, G. S., Bortolotto, E. B., Domingo Lucio Cervigni, G, and
Quaglino, M. B. 2022. Handling outliers in multi-environment trial data analysis: 
in the direction of robust SREG model. Journal of Crop Improvement. \n

Angelini, J., Domingo Lucio Cervigni, G, and Quaglino, M. B. 2024. New imputation 
methodologies for genotype‐by‐environment data: an extensive study of properties 
of estimators. Euphytica. \n

Bhan, M.K., Pal, S., Rao, B.L., Dhar, A.K., and Kang, M.S. 2005. GGE biplot
analysis of oil yield in lemongrass. Journal of New Seeds, 7, 127–139.  \n

Cornelius, P.L., J. Crossa, and M.S. Seyedsadr. 1996. Statistical tests and
estimates of multiplicative models for GE interaction, p. 199–234. In: M.S. Kang
and H.G. Gauch, Jr. (Eds.), Genotype-by-environment interaction, CRC Press, Boca
Raton, FL. \n

Crossa, J. 1990. Statistical Analyses of Multilocation Trials. Advances in
Agronomy, 55–85. \doi{10.1016/s0065-2113(08)60818-4} \n

Dumble, S. 2017. GGEBiplots: GGE Biplots with 'ggplot2'. R package version
0.1.1.  \n

de Mendiburu, F. 2020. agricolae: Statistical Procedures for Agricultural
Research. R package version 1.3-2. \n

Gauch, H.G., Jr. 1988. Model selection and validation for yield trials with
interaction, Biometrics, 44, 705–715. \n

Gauch H.G. and R.W. Zobel. 1997. Identifying mega-environments and targeting
genotypes. Crop Science, 37, 311–326.  \n

Giauffret, C., Lothrop, J., Dorvillez, D., Gouesnard, B., and Derieux, M., 2000.
Genotype x environment interactions in maize hybrids from temperate or highland
tropical origin. Crop Science, 40, 1004-1012. \n

Huber, P.J. 1981. Robust Statistics. Wiley, New York. \n

Kang, M.S., Aggarwal, V.D., and Chirwa, R.M. 2006. Adaptability and stability of
bean cultivars as determined via yield-stability statistic and GGE biplot
analysis. Journal of Crop Improvement, 15, 97–120. \n

Kang, M.S. and Magari, R. 1996. New developments in selecting for phenotypic
stability in crop breeding, p. 1–14. In: M.S. Kang and H.G. Gauch, Jr. (Eds.),
Genotype-by-environment interaction, CRC Press, Boca Raton, FL. \n

Kempton, R.A. 1984. The use of biplots in interpreting variety by environment
interactions. The Journal of Agricultural Science, 103, 123–135.  \n

Rodrigues,  P.C.,  Monteiro,  A., and Lourenço,  V.M. 2016.  A  robust  AMMI
model  for  the  analysis  of  genotype-by-environment  data. Bioinformatics,32,
58–66.  \n

Wright, K. 2020. agridat: Agricultural Datasets. R package version 1.17. \n

Yan, W., Cornelius, P.L., Crossa, J., and Hunt, L.A. 2001. Two types of GGE
biplots for analyzing multi-environment trial data. Crop Science, 41, 656–663. \n

Yan, W., Hunt, L.A., Sheng, Q., and Szlavnics, Z. 2000. Cultivar evaluation and
mega-environment investigation based on the GGE biplot. Crop Science, 40,
597–605.\n

Yan, W. and Kang, M.S. 2003. GGE Biplot Analysis: A Graphical Tool for Breeders,
Geneticists, and Agronomists. CRC Press, Boca Raton, FL. \n

Yan, W., Kang, M.S., Ma, B., Woods, S., and Cornelius, P.L. 2007. GGE Biplot vs.
AMMI analysis of genotype-by-environment data. Crop Science, 7, 641–653. \n

Yan, W. and Rajcan, I. 2002. Biplot analysis of sites and trait relations of
soybean in Ontario, Crop Science, 42, 11–20. \n

Yan, W. and Tinker, N.A. 2005. An integrated biplot system for displaying,
interpreting, and exploring genotype 9 environment interaction. Crop Science,
45, 1004–1016.  \n


# Session Info

```{r, echo = FALSE}
sessionInfo()
```
