---
title: "eimpute: Efficiently IMPUTE Large Scale Incomplete Matrix"
output: 
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{eimpute: Efficiently IMPUTE Large Scale Incomplete Matrix}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE, eval=TRUE}
knitr::opts_chunk$set(comment = "#>", warning = FALSE, eval = TRUE, message = FALSE, collapse = TRUE)
library(eimpute)
```

## Introduction
Matrix completion is a procedure for imputing the missing elements of a matrix using the information in its observed elements. This procedure can be visualized as:

![](./matrixcom.jpg)

Matrix completion has attracted a lot of attention and is widely applied in:

- tabular data imputation: recover the missing elements in a data table;
- recommender systems: estimate users' potential preferences for items they have not yet purchased;
- image inpainting: fill in the missing elements of digital images.

**eimpute** is a computationally efficient R package for matrix completion.
In **eimpute**, the matrix completion problem is solved by iteratively performing low-rank approximation and data calibration. This approach has two advantages:

- unbiased low-rank approximation for incomplete matrices
- lower time cost via truncated SVD
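
The iteration described above can be sketched in base R. This is a minimal illustration under assumed details, not the internal implementation of **eimpute**: the function name `sketch_impute`, the zero initialization, the stopping rule, and the use of a plain `svd()` call in place of a truncated SVD are all assumptions made for clarity.

```{r}
# Illustrative sketch only; sketch_impute is a hypothetical helper,
# not a function exported by eimpute.
sketch_impute <- function(x, rank, max_iter = 100, tol = 1e-6) {
  observed <- !is.na(x)
  z <- x
  z[!observed] <- 0                      # assumption: start from zeros
  for (iter in seq_len(max_iter)) {
    # Low-rank approximation: keep the leading `rank` singular triplets.
    s <- svd(z, nu = rank, nv = rank)
    low_rank <- s$u %*% (s$d[seq_len(rank)] * t(s$v))
    # Data calibration: restore the observed entries.
    z_new <- low_rank
    z_new[observed] <- x[observed]
    # Stop when the relative change between iterations is small.
    change <- sum((z_new - z)^2) / max(sum(z^2), .Machine$double.eps)
    z <- z_new
    if (change < tol) break
  }
  z
}

# Usage on a small rank-2 matrix with ten entries removed.
sketch_truth <- matrix(rnorm(8 * 2), 8, 2) %*% matrix(rnorm(2 * 6), 2, 6)
sketch_x <- sketch_truth
sketch_x[sample(length(sketch_x), 10)] <- NA
sketch_fit <- sketch_impute(sketch_x, rank = 2)
anyNA(sketch_fit)
```

The returned matrix keeps every observed entry unchanged and fills only the missing positions with values from the low-rank fit.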

We compare **eimpute** and **softImpute** on synthetic datasets $X_{m \times m}$ in which a proportion $p$ of the observations is missing.
The square matrix $X_{m \times m}$ is generated by $X = UV + \epsilon$, where $U$ and $V$ are $m \times r$ and $r \times m$ matrices whose entries are sampled i.i.d. from the standard normal distribution, and the entries of $\epsilon$ follow $N(0, r/3)$.

- $m$ is chosen as 1000, 2000, 3000, or 4000;
- $p$ is chosen as 0.1, 0.5, or 0.9.
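
A dataset of this form can be generated with a few lines of base R. The sketch below is illustrative rather than the exact benchmark script: `simulate_incomplete` is a hypothetical helper, $r/3$ is read as the noise variance, the missing positions are assumed to be drawn uniformly at random, and the rank used in the call is only a placeholder because the text does not state it.

```{r}
# Hypothetical helper (not part of eimpute) mirroring the setup above.
simulate_incomplete <- function(m, r, p) {
  u <- matrix(rnorm(m * r), m, r)        # m x r, standard normal entries
  v <- matrix(rnorm(r * m), r, m)        # r x m, standard normal entries
  # Assumption: N(0, r/3) is parameterized by its variance.
  noise <- matrix(rnorm(m * m, sd = sqrt(r / 3)), m, m)
  x <- u %*% v + noise
  # Assumption: missing positions are chosen uniformly at random.
  missing_idx <- sample(m * m, size = round(p * m * m))
  x_obs <- x
  x_obs[missing_idx] <- NA
  list(x = x, x_obs = x_obs)
}

# A much smaller case than the benchmark, to keep the example fast.
sim <- simulate_incomplete(m = 50, r = 3, p = 0.5)
mean(is.na(sim$x_obs))
```

Keeping the complete matrix `x` alongside the masked copy `x_obs` makes it possible to measure the error on the entries that were removed.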

<img src="./time3.png" width="680" height="280" />
<img src="./error3.png" width="680" height="280" />

In the high-dimensional case, the als method in **softImpute** is slightly faster than **eimpute** when the proportion of missing observations is low. As the proportion of missing observations increases, the rsvd method in **eimpute** outperforms **softImpute** in both time cost and test error. Of the two methods in **eimpute**, rsvd has a lower time cost than tsvd.

## Installation

Install the stable version from CRAN:        
```{r, eval=FALSE}
install.packages("eimpute")
```

Install the development version from GitHub:
```{r, eval=FALSE}
library(devtools)
install_github("Mamba413/eimpute", build_vignettes = TRUE)
```

## Quick Example

We start with a toy example. Let us generate a small matrix with some missing values using the **incomplete.generator** function.

```{r}
m <- 6  # number of rows
n <- 5  # number of columns
r <- 3  # rank of the underlying matrix
x_na <- incomplete.generator(m, n, r)
x_na
```

Use the **eimpute** function to impute the missing values.

```{r}
x_impute <- eimpute(x_na, r)
x_impute[["x.imp"]]
```


