---
title: "Modified ALCC"
author: Adrian Correndo 
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Modified ALCC}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=6, 
  fig.height=4
)
```

<img src="../man/figures/soiltestcorr_logo.png" align="right" height="200" style="float:right; height:200px;"/>

## Description

This tutorial is intended to show how to deploy the `mod_alcc()` function for estimating critical soil test values using the Modified Arcsine-Log Calibration Curve, originally introduced by Dyson & Conyers (2013) and modified by Correndo et al. (2017). This function produces the estimation of critical soil test values (CSTV) for a target relative yield (`ry`) with confidence intervals at adjustable confidence levels. Behind the scenes, the `mod_alcc()` is based on the standardized major axis (SMA), a bivariate regression model that assumes that both axis are random variables (Warton et al., 2006).

## General Instructions

1.  Load your dataframe with soil test value (`stv`) and relative yield (`ry`) data.
2.  Specify the following arguments into the function `mod_alcc()`
    1.  `data` (optional)

    2.  `stv` (soil test value) and `ry` (relative yield) columns or vectors,

    3.  `target` of relative yield (e.g. 90%),

    4.  desired `confidence` level (e.g. 0.95 for 1 - alpha(0.05)).

        -   Used for the estimation of critical soil test value (CSTV) lower and upper limits.

    5.  `tidy` TRUE (produces a data.frame with results) or FALSE-default- (store results as list),

    6.  `plot` TRUE (produces a ggplot as main output) or FALSE (default; no plot, only results as data.frame or list)
3.  Run and check results.
4.  Check residuals plot, and warnings related to potential leverage points.
5.  Adjust curve plots as desired with additional `ggplot2` functions.

# Tutorial

```{r setup}
library(soiltestcorr)
```

Suggested packages

```{r warning=FALSE, message=FALSE}
# Install if needed 
library(ggplot2) # Plots
library(dplyr) # Data wrangling
library(tidyr) # Data wrangling
library(purrr) # Mapping
```

This is a basic example using three different datasets:

## Load datasets

```{r}
# Native fake dataset from soiltestcorr package
corr_df <- soiltestcorr::freitas1966
```

# Fit `mod_alcc()`

## 1. Individual fits

RY target = 90%, confidence level = 0.95, replace with your desired values

### 1.1. `tidy` = FALSE

It returns a LIST (may be more efficient for multiple fits at once)

```{r warning=TRUE, message=TRUE}
# Using dataframe argument, tidy = FALSE -> return a LIST
mod_alcc(data = corr_df, ry = RY, stv = STK, target=90, confidence = 0.95,
         tidy = TRUE)

```

### 1.2. `tidy` = TRUE

It returns a data.frame (more organized results)

```{r warning=TRUE, message=TRUE}
# Using dataframe argument, tidy = FALSE -> return a LIST
mod_alcc(data = corr_df, ry = RY, stv = STK, target=90, confidence = 0.95, tidy = TRUE)

```

### 1.3. Alternative using the vectors

You can call `stv` and `ry` vectors using the `$`.

The `tidy` argument still applies for controlling the output type

```{r warning=TRUE, message=TRUE}
fit_vectors_tidy <- mod_alcc(ry = corr_df$RY,
                             stv = corr_df$STK,
                             target = 90,
                             confidence = 0.95)

fit_vectors_list <- mod_alcc(ry = corr_df$RY,
                             stv = corr_df$STK,
                             target = 90,
                             confidence = 0.95,
                             tidy = FALSE)
```

## 2. Multiple fits at once

```{r warning=T, message=F}
# Example 1. Fake dataset manually created
data_1 <- data.frame("RY"  = c(65,80,85,88,90,94,93,96,97,95,98,100,99,99,100),
                     "STV" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
  
# Example 2. Native fake dataset from soiltestcorr package
data_2 <- soiltestcorr::data_test


# Example 3. Native dataset from soiltestcorr package, Freitas et al.  (1966), used by Cate & Nelson (1971)
data_3 <- soiltestcorr::freitas1966 %>% 
  rename(STV = STK)

data.all <- bind_rows(data_1, data_2, data_3, .id = "id")
```

Note: the `stv` column needs to have the same name for all datasets if binding rows.

### 2.1. Using `map()`

```{r warning=T, message=F}

# Run multiple examples at once with purrr::map()
data.all %>%
  nest(data = c("STV", "RY")) %>% 
  mutate(model = map(data, ~ mod_alcc(stv = .$STV, ry = .$RY, target = 90))) %>%
  unnest(model)
```

Note: the output table is still generated despite warnings regarding confidence level and leverage points.

### 2.2. Using `group_modify()`

Alternatively, with `group_modify`, nested data is not required. However, it still requires a grouping variable (in this case, `id`) to identify each dataset. `group_map()` may also be used, though `list_rbind()` is required to return a tidy data frame of the model results instead of a list.

```{r warning=T, message=F}

data.all %>% 
  group_by(id) %>% 
  group_modify(~ mod_alcc(data = ., STV, RY, target = 90, confidence = 0.95))

```

## 3. Bootstrapping

Bootstrapping is a suitable method for obtaining confidence intervals for parameters or derived quantities. Bootstrapping is a resampling technique (with replacement) that draws samples from the original data with the same size. If you have groups within your data, you can specify grouping variables as arguments in order to maintain, within each resample, the same proportion of observations than in the original dataset.

This function returns a table with as many rows as the resampling size (n) containing the results for each resample.

```{r warning=T, message=F}
set.seed(123)
boot_alcc <- boot_mod_alcc(data = corr_df,
                           stv = STK, ry = RY,
                           target = 90, n = 500)

boot_alcc %>% head(n = 5)

# CSTV Confidence Interval
quantile(boot_alcc$CSTV, probs = c(0.025, 0.5, 0.975))

# Plot
boot_alcc %>% 
  ggplot2::ggplot(aes(x = CSTV))+
  geom_histogram(color = "grey25", fill = "steelblue", bins = 10)
```

## 4. Plots

### 4.1. Calibration Curve

We can generate a ggplot with the same `mod_alcc()` function. We just need to specify the argument `plot = TRUE`.

```{r warning=F, message=F}
plt_alcc <- mod_alcc(data = corr_df,
                     ry = RY, 
                     stv = STK, 
                     target = 95,
                     plot = TRUE)

plt_alcc
```

### 4.2. Fine-tune the plots

As ggplot object, plots can be adjusted in several ways, such as modifying titles and axis scales.

```{r warning=F, message=F}
plt_alcc +
  # Main title
  ggtitle("My own plot title")+
  # Axis titles
  labs(x = "Soil Test K (ppm)",
       y = "Cotton RY(%)") +
  # Axis scales
  scale_x_continuous(limits = c(20,220),
                     breaks = seq(0,220, by = 25)) +
  # Axis limits
  scale_y_continuous(limits = c(30,100),
                     breaks = seq(20,100, by = 20))
```

### 3.2. SMA regression

Behind the scenes, the `mod_alcc()` runs a Standardized Major Axis regression (SMA).

We can extract the SMA fit and also check the residuals of this model out as follows:

```{r warning=F, message=F}
fit_3 <- mod_alcc(data = corr_df, ry = RY, stv = STK, target = 90)
# Extract SMA regression fit and residuals from fit_3 (data_3, (Freitas et al., 1966))
SMA_freitas_1966 <- fit_3$SMA %>% as.data.frame()
 
SMA_freitas_1966 %>% 
  ggplot(aes(x = arc_RY, y = ln_STV))+
  ggtitle("SMA Regression. Dataset 3")+
  geom_point(shape=21, fill = "orange", size = 3, alpha = 0.75)+
  #SMA Line
  geom_path(aes(x=arc_RY, y = SMA_line, linetype = "SMA_fit"), linewidth = 1.5, col = "grey25")+
  scale_linetype_manual(name="", values = c("solid"))+
  #Critical value
  geom_vline(xintercept = 0, col = "grey10", size = 1.25, linetype = "dashed")+
  theme_bw()+
  # Axis titles
  labs(y = "ln_STV", y = "asin(sqrt(RY))-centered")
```

### 3.3. SMA residuals

```{r warning=F, message=F}

# Residuals plot
SMA_freitas_1966 %>% 
  ggplot(aes(x = fitted_axis, y = residuals))+
  ggtitle("Residuals SMA. Dataset 3")+
  geom_point(shape=21, fill = "orange", size = 3, alpha = 0.75)+
  geom_hline(yintercept = 0, col = "grey10", linewidth = 1.25, linetype = "dashed")+
  theme_bw()+
  # Axis titles
  labs(x = "Fitted Axis -SMA- (see Warton et al. 2006)", y = "Residuals (STV units)")

```

### References

*Correndo, A.A., Salvagiotti, F., García, F.O. and Gutiérrez-Boem, F.H., 2017. A modification of the arcsine--log calibration curve for analysing soil test value--relative yield relationships. Crop and Pasture Science, 68(3), pp.297-304. 10.1071/CP16444*

*Dyson, C.B., Conyers, M.K., 2013. Methodology for online biometric analysis of soil test-crop response datasets. Crop & Pasture Science 64: 435--441. 10.1071/CP13009*

*Warton, D.I., Wright, I.J., Falster, D.S., Westoby, M., 2006. Bivariate line-fitting methods for allometry. Biol. Rev. Camb. Philos. Soc. 81, 259--291. 10.1017/S1464793106007007*
