---
title: "fullRankMatrix - Comparison to other packages"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{fullrankmat-comparison}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


## Other available packages that detect linear dependent columns
There are already a few other packages out there that offer functions to detect linear dependent columns. Here are the ones we are aware of:

```{r}
library(fullRankMatrix)

# let's say we have 10 fruit salads and indicate which ingredients are present in each salad
strawberry <- c(1,1,1,1,0,0,0,0,0,0)
poppyseed <- c(0,0,0,0,1,1,1,0,0,0)
orange <- c(1,1,1,1,1,1,1,0,0,0)
pear <- c(0,0,0,1,0,0,0,1,1,1)
mint <- c(1,1,0,0,0,0,0,0,0,0)
apple <- c(0,0,0,0,0,0,1,1,1,1)

# let's pretend we know how each fruit influences the sweetness of a fruit salad
# in this case we say that strawberries and oranges have the biggest influence on sweetness
set.seed(30)
strawberry_sweet <- strawberry * rnorm(10, 4)
poppyseed_sweet <- poppyseed * rnorm(10, 0.1)
orange_sweet <- orange * rnorm(10, 5)
pear_sweet <- pear * rnorm(10, 0.5)
mint_sweet <- mint * rnorm(10, 1)
apple_sweet <- apple * rnorm(10, 2)

sweetness <- strawberry_sweet + poppyseed_sweet+ orange_sweet + pear_sweet +
  mint_sweet + apple_sweet 

mat <- cbind(strawberry,poppyseed,orange,pear,mint,apple)

```



**`caret::findLinearCombos()`**: https://rdrr.io/cran/caret/man/findLinearCombos.html

This function identifies which columns are linearly dependent and suggests which columns to remove. But it doesn't provide appropriate naming for the remaining columns to indicate that any significant associations with the remaining columns are actually associations with the space spanned by the originally linearly dependent columns. Just removing the  and then fitting the linear model would lead to erroneous interpretation.
```{r}
caret_result <- caret::findLinearCombos(mat)
```
Fitting a linear model with the `orange` column removed would lead to the erroneous interpretation that `strawberry` and `poppyseed` have the biggest influence on the fruit salad `sweetness`, but we know it is actually `strawberry` and `orange`.
```{r}
mat_caret <- mat[, -caret_result$remove]
fit <- lm(sweetness ~ mat_caret + 0)
print(summary(fit))
```


**`WeightIt::make_full_rank()`**: https://rdrr.io/cran/WeightIt/man/make_full_rank.html

This function removes some of the linearly dependent columns to create a full rank matrix, but doesn't rename the remaining columns accordingly. For the user it isn't clear which columns were linearly dependent and they can't choose which column will be removed.
```{r}
mat_weightit <- WeightIt::make_full_rank(mat, with.intercept = FALSE)
mat_weightit
```
As above fitting a linear model with this full rank matrix would lead to erroneous interpretation that `strawberry` and `poppyseed` influence the `sweetness`, but we know it is actually `strawberry` and `orange`.

```{r}
fit <- lm(sweetness ~ mat_weightit + 0)
print(summary(fit))
```


**`plm::detect.lindep()`:** https://rdrr.io/cran/plm/man/detect.lindep.html

The function returns which columns are potentially linearly dependent.
```{r}
plm::detect.lindep(mat)
```

However it doesn't capture all cases. For example here `plm::detect.lindep()` says there are no dependent columns, while there are several:
```{r}
c1 <- rbinom(10, 1, .4)
c2 <- 1-c1
c3 <- integer(10)
c4 <- c1
c5 <- 2*c2
c6 <- rbinom(10, 1, .8)
c7 <- c5+c6
mat_test <- as.matrix(data.frame(c1,c2,c3,c4,c5,c6,c7))

plm::detect.lindep(mat_test)
```

`fullRankMatrix` captures these cases:
```{r}
result <- make_full_rank_matrix(mat_test)
result$matrix
```
**`Smisc::findDepMat()`**: https://rdrr.io/cran/Smisc/man/findDepMat.html

**NOTE**: this package was removed from CRAN as of 2020-01-26 (https://CRAN.R-project.org/package=Smisc) due to failing checks.

This function indicates linearly dependent rows/columns, but it doesn't state which rows/columns are linearly dependent with each other.

However, this function seems to not work well for one-hot encoded matrices and the package doesn't seem to be updated anymore (s. this issue: https://github.com/pnnl/Smisc/issues/24).
```
# example provided by Smisc documentation
Y <- matrix(c(1, 3, 4,
              2, 6, 8,
              7, 2, 9,
              4, 1, 7,
              3.5, 1, 4.5), byrow = TRUE, ncol = 3)
Smisc::findDepMat(t(Y), rows = FALSE)
```

Trying with the model matrix from our example above:
```
Smisc::findDepMat(mat, rows=FALSE)
#> Error in if (!depends[j]) { : missing value where TRUE/FALSE needed
```
