---
title: "Cleansing the dataset"
author: "Choonghyun Ryu"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cleansing the dataset}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r environment, echo = FALSE, message = FALSE, warning=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "")
```

## Preface
If you created a dataset to create a classification model, you must perform cleansing of the data.
After you create the dataset, you should do the following:

* **Cleansing the dataset**
    + **Optional removal of variables including missing values**
    + **Remove a variable with one unique number**
    + **Remove categorical variables with a large number of levels**
    + **Convert a character variable to a categorical variable**    
* Split the data into a train set and a test set
* Modeling and Evaluate, Predict

The alookr package makes these steps fast and easy:


## How to perform cleansing the dataset

For information on how to perform cleansing the dataset, refer to the following website.

- [`Cleansing the dataset`](https://choonghyunryu.github.io/alookr_vignette/cleansing.html)



