---
title: "Linear-plateau response"
author: Adrian Correndo & Austin Pearce 
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Linear-plateau response}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=6, 
  fig.height=4
)
```

<img src="../man/figures/soiltestcorr_logo.png" align="right" height="200" style="float:right; height:200px;"/>

## Description

This tutorial demonstrates the `linear_plateau()` function for fitting a continuous response model and estimating a critical soil test value (CSTV). This function fits a segmented regression model that follows two phases: a positive linear response and a flat plateau. The join point or break point is often interpreted as the CSTV. See Anderson and Nelson (1975) for examples.

$$
\begin{cases}
x < j,\ y = a + bx \\
x > j,\ y = a + bj
\end{cases}
$$

where\
`y` represents the fitted crop relative yield,\
`x` the soil test value,\
`a` the intercept (`ry` when `stv` = 0),\
`b` the slope (as the change in RY per unit of soil nutrient supply),\
`j` the join point (a.k.a, break point) when the plateau phase starts (i.e., the CSTV).

The `linear_plateau()` function works automatically with self-starting initial values to facilitate the model's convergence. The parameters of this regression model have simple interpretations.

Some disadvantages are that:

-   the crop relative yield response may be more curvilinear, such that a sharp break from linear response to plateau is unreasonable.

-   the default CSTV confidence interval (based on symmetric Wald's intervals) is generally unreliable. We recommend the user try the `boot_linear_plateau()` function for a reliable confidence interval estimation of parameters via bootstrapping (resampling with replacement).

## General Instructions

1.  Load your data frame with soil test value and relative yield data.

2.  Specify the following arguments in `linear_plateau()`:

    1.  `data` (optional)

    2.  `stv` (soil test value)

    3.  `ry` (relative yield) columns or vectors

    4.  `target` (optional) for calculating the soil test value at some RY level along the slope segment.

    5.  `tidy` `TRUE` (produces a data.frame with results) or `FALSE` (store results as list)

    6.  `plot` `TRUE` (produces a ggplot as main output) or `FALSE` (no plot, only results as data.frame)

    7.  `resid` `TRUE` (produces plots with residuals analysis) or `FALSE` (no plot),

3.  Run and check results.

4.  Check residuals plot, and warnings related to potential limitations of this model.

5.  Adjust curve plots as desired with additional `ggplot2` functions.

# Tutorial

```{r setup}
library(soiltestcorr)
```

Suggested packages

```{r warning=FALSE, message=FALSE}
# Install if needed 
library(ggplot2) # Plots
library(dplyr) # Data wrangling
library(tidyr) # Data wrangling
library(purrr) # Mapping

```

## Load dataset

```{r}
# Native fake dataset from soiltestcorr package
corr_df <- soiltestcorr::data_test
```

# Fit `linear_plateau()`

## 1. Individual fits

### 1.1. `tidy` = TRUE (default)

It returns a tidy data frame (more organized results)

```{r warning=TRUE, message=TRUE}

linear_plateau(corr_df, STV, RY, tidy = TRUE)
```

### 1.2. `tidy` = FALSE

It returns a LIST (may be more efficient for multiple fits at once)

```{r warning=TRUE, message=TRUE}

linear_plateau(corr_df, STV, RY, tidy = FALSE)
```

### 1.3. Alternative using the vectors

You can use the `stv` and `ry` vectors from the data frame using the `$`.

```{r warning=TRUE, message=TRUE}

fit_vectors_tidy <- linear_plateau(stv = corr_df$STV, ry = corr_df$RY)

fit_vectors_list <- linear_plateau(stv = corr_df$STV, ry = corr_df$RY, tidy = FALSE)
```

## 2. Multiple fits at once

```{r warning=T, message=F}
# Example 1. Fake dataset manually created
data_1 <- data.frame("RY"  = c(65,80,85,88,90,94,93,96,97,95,98,100,99,99,100),
                     "STV" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))
  
# Example 2. Native fake dataset from soiltestcorr package
data_2 <- soiltestcorr::data_test


# Example 3. Native dataset from soiltestcorr package, Freitas et al.  (1966), used by Cate & Nelson (1971)
data_3 <- soiltestcorr::freitas1966 %>% 
  rename(STV = STK)

data.all <- bind_rows(data_1, data_2, data_3, .id = "id")
```

Note: the `stv` column needs to have the same name for all datasets if binding rows.

### 2.1. Using `map()`

```{r warning=T, message=F}

# Run multiple examples at once with purrr::map()
data.all %>%
  nest(data = c("STV", "RY")) %>% 
  mutate(model = map(data, ~ linear_plateau(stv = .$STV, ry = .$RY))) %>%
  unnest(model)

```

### 2.2. Using `group_modify()`

Alternatively, with `group_modify`, nested data is not required. However, it still requires a grouping variable (in this case, `id`) to identify each dataset. `group_map()` may also be used, though `list_rbind()` is required to return a tidy data frame of the model results instead of a list.

```{r warning=T, message=F}

data.all %>% 
  group_by(id) %>% 
  group_modify(~ linear_plateau(data = ., STV, RY))

```

## 3. Bootstrapping

Bootstrapping is a suitable method for obtaining confidence intervals for parameters or derived quantities. Bootstrapping is a resampling technique (with replacement) that draws samples from the original data with the same size. If you have groups within your data, you can specify grouping variables as arguments in order to maintain, within each resample, the same proportion of observations than in the original dataset.

This function returns a table with as many rows as the resampling size (n) containing the results for each resample.

```{r}
set.seed(123)

boot_lp <- boot_linear_plateau(corr_df, STV, RY, n = 500) # only 500 for sake of speed

boot_lp %>% head(n = 5)

# CSTV Confidence Interval
quantile(boot_lp$CSTV, probs = c(0.025, 0.5, 0.975))

# Plot
boot_lp %>% 
  ggplot(aes(x = CSTV))+
  geom_histogram(color = "grey25", fill = "#9de0bf", bins = 10)
```

## 4. Plots

### 4.1. Correlation Curve

We can generate a ggplot with the same `linear_plateau()` function. We just need to specify the argument `plot = TRUE`.

```{r warning=F, message=F}
data_3 <- soiltestcorr::freitas1966

plot_lp <- linear_plateau(data = data_3, STK, RY, plot = TRUE)

plot_lp
```

### 4.2. Fine-tune the plots

As ggplot object, plots can be adjusted in several ways, such as modifying titles and axis scales.

```{r warning=F, message=F}
plot_lp +
  # Main title
  ggtitle("My own plot title")+
  # Axis titles
  labs(x = "Soil Test K (ppm)",
       y = "Cotton RY(%)") +
  # Axis scales
  scale_x_continuous(limits = c(20,220),
                     breaks = seq(0,220, by = 10)) +
  # Axis limits
  scale_y_continuous(limits = c(30,100),
                     breaks = seq(30,100, by = 10))
```

### 4.3. Residuals

Set argument `resid = TRUE`.

```{r warning=F, message=F}

# Residuals plot

linear_plateau(data_3, STK, RY, resid = TRUE)

```

#### References

*Anderson, R. L., and Nelson, L. A. (1975). A Family of Models Involving Intersecting Straight Lines and Concomitant Experimental Designs Useful in Evaluating Response to Fertilizer Nutrients. Biometrics, 31(2), 303--318. 10.2307/2529422*
