---
title: "Alternative Inputs for Population Structure"
author: "Sahir Rai Bhatnagar"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float: true
    #code_folding: hide
    fig_retina: null
vignette: >
  %\VignetteIndexEntry{Alternative Inputs for Population Structure}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

There are three different ways that the `ggmix` function can take the population structure information:  
1) `kinship`: A $n \times n$ kinship matrix  
2) `K`: A $n \times k$ matrix of SNPs used to determine the relatedness between subjects  
3) `U`: the eigenvectors of the kinship matrix or the left singular vectors of `K`, and `D`: the eigenvalues of the kinship matrix or the square of the singular values of `K`  

We provide this flexibility to allow for a low rank estimation procedure (not yet implemented), and in case the user has already calculated this information elsewhere.  


## The kinship argument

We have already seen how to use the `kinship` argument:  


```{r}
library(ggmix)
data("admixed")

# supply p.fac to the penalty.factor argument
res <- ggmix(x = admixed$xtrain, y = admixed$ytrain, kinship = admixed$kin_train)
plot(res)
coef(res, s = 0.059, type = "nonzero")
```



## The U and D arguments

Alternatively we can directly supply the eigenvectors and eigenvalues of the kinship matrix:

```{r}
eigKinship <- eigen(admixed$kin_train)

resUD <- ggmix(x = admixed$xtrain, y = admixed$ytrain, 
               U = eigKinship$vectors[,which(eigKinship$values>0)],
               D = eigKinship$values[which(eigKinship$values>0)])
plot(resUD)
coef(resUD, s = 0.059, type = "nonzero")
```

We can also run a singular value decomposition (SVD) on the matrix of SNPs used to construct the kinship matrix. This approach is computationally less expensive since it bypasses explicit computation of the kinship matrix.

```{r}
svdXkinship <- svd(admixed$Xkinship)
resUDsvd <- ggmix(x = admixed$xtrain, y = admixed$ytrain, U = svdXkinship$u,
                  D = svdXkinship$d ^ 2)
plot(resUDsvd)
coef(resUDsvd, s = 0.059, type = "nonzero")
```

The results differ from the one using the kinship matrix above because this simulated data comes from an admixed population, and therefore, using a matrix of SNPs in this case may not be appropriate. The kinship matrix in the `admixed` data was calculated using the `popkin` function from the `popkin` package.  

## The K argument

We can directly supply the matrix of SNPs used to calculated the kinship matrix also to the `K` argument:

```{r}
resK <- ggmix(x = admixed$xtrain, y = admixed$ytrain, K = admixed$kin_train)
plot(resK)
coef(resK, s = 0.059, type = "nonzero")
```

