---
title: "Continuous Data"
author: "Aravind Hebbali"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Continuous Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, echo=FALSE, message=FALSE}
library(descriptr)
library(dplyr)
```

## Introduction

This document introduces you to a basic set of functions that describe
continuous data. The other two vignettes introduce you to functions that
describe categorical data and visualization options.

## Data

We have modified the `mtcars` data to create a new data set `mtcarz`. The only
difference between the two data sets is in the variable types.

```{r egdata}
str(mtcarz)
```

## Data Screening

The `ds_screener()` function screens a data set and returns the following:

- Column/Variable Names
- Data Type
- Levels (in case of categorical data)
- Number of missing observations
- % of missing observations

```{r screener}
ds_screener(mtcarz)
```
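
To make the screening output easier to interpret, the chunk below is a minimal base R sketch of the same kind of summary. It is not how `ds_screener()` is implemented; it assumes the built-in `mtcars` data (the untyped source of `mtcarz`), and the object name `screen` and its column names are illustrative only.

```{r screener-base}
# Rough base R equivalent of a screening table (illustrative, not descriptr's code)
screen <- data.frame(
  type        = vapply(mtcars, function(x) class(x)[1], character(1)),
  missing     = colSums(is.na(mtcars)),                  # count of NA per column
  missing_pct = round(100 * colMeans(is.na(mtcars)), 2)  # share of NA per column
)
screen
```

Every column of `mtcars` is numeric, which is why `mtcarz` converts some of them to factors before screening.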

## Summary Statistics

The `ds_summary_stats()` function returns a comprehensive set of statistics,
including measures of location, variation, symmetry and extreme observations.

```{r summary}
ds_summary_stats(mtcarz, mpg)
```

You can pass multiple variables as shown below:

```{r summary2}
ds_summary_stats(mtcarz, mpg, disp)
```

If you do not specify any variables, `ds_summary_stats()` detects all the
continuous variables in the data set and returns summary statistics for each
of them.
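
As a cross-check on the location and variation figures, here is a hedged base R sketch for `mpg`. It assumes the built-in `mtcars` data, whose `mpg` values match `mtcarz`; the 5% trim and the names `location` and `variation` are choices made for this illustration and may not match what `ds_summary_stats()` reports.

```{r summary-base}
x <- mtcars$mpg

# Measures of location
location <- c(
  mean         = mean(x),
  median       = median(x),
  trimmed_mean = mean(x, trim = 0.05)  # trim fraction is an assumption
)

# Measures of variation
variation <- c(
  range           = diff(range(x)),
  iqr             = IQR(x),
  variance        = var(x),
  std_dev         = sd(x),
  coeff_variation = 100 * sd(x) / mean(x)
)

round(location, 2)
round(variation, 2)
```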

## Frequency Distribution

The `ds_freq_table()` function creates frequency tables for continuous
variables. The default number of intervals is 5; the example below uses 4.

```{r fcont}
ds_freq_table(mtcarz, mpg, 4)
```
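
The idea behind a frequency table for a continuous variable can be sketched in base R with `cut()` and `table()`. This is an approximation under stated assumptions, not the package's method: `cut()` pads the range slightly when given a number of breaks, so the interval limits may differ from those printed by `ds_freq_table()`. The name `freq_table` is illustrative.

```{r fcont-base}
x <- mtcars$mpg

# Split the range of mpg into 4 equal-width intervals
bins <- cut(x, breaks = 4, include.lowest = TRUE)
freq <- as.vector(table(bins))

freq_table <- data.frame(
  interval   = levels(bins),
  frequency  = freq,
  cumulative = cumsum(freq),
  percent    = round(100 * freq / length(x), 2)
)
freq_table
```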

### Histogram

A `plot()` method is defined for the object returned by `ds_freq_table()`; it
generates a histogram.

```{r fcont_hist, fig.width=7, fig.height=7, fig.align='center'}
k <- ds_freq_table(mtcarz, mpg, 4)
plot(k)
```

## Auto Summary

If you want to view summary statistics and frequency tables for all or a subset
of the variables in a data set, use `ds_auto_summary_stats()`.

```{r auto-summary}
ds_auto_summary_stats(mtcarz, disp, mpg)
```

## Group Summary

The `ds_group_summary()` function returns descriptive statistics of a continuous
variable for the different levels of a categorical variable.

```{r gsummary}
k <- ds_group_summary(mtcarz, cyl, mpg)
k
```

The object returned by `ds_group_summary()` includes a tibble, `tidy_stats`,
which can be used for further analysis.

```{r gsummary_tibble}
k$tidy_stats
```
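
For comparison, a grouped summary of this kind can be approximated in base R with `aggregate()`. The sketch below assumes the built-in `mtcars` data and picks four statistics for illustration; the set of statistics and the column names in `tidy_stats` may differ.

```{r gsummary-base}
# Summary statistics of mpg for each level of cyl (illustrative selection)
group_stats <- aggregate(
  mpg ~ cyl,
  data = mtcars,
  FUN  = function(x) c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
)

# aggregate() stores the statistics in a matrix column; flatten it
group_stats <- do.call(data.frame, group_stats)
group_stats
```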

### Box Plot

A `plot()` method is defined for the object returned by `ds_group_summary()`;
it draws box plots for comparing distributions across groups.

```{r gsum_boxplot, fig.width=7, fig.height=7, fig.align='center'}
k <- ds_group_summary(mtcarz, cyl, mpg)
plot(k)
```

### Multiple Variables

If you want grouped summary statistics for multiple variables in a data set, use
`ds_auto_group_summary()`.

```{r auto-group-summary}
ds_auto_group_summary(mtcarz, cyl, gear, mpg)
```

### Combination of Categories

To look at the descriptive statistics of a continuous variable for different 
combinations of levels of two or more categorical variables, use 
`ds_group_summary_interact()`.

```{r interact-summary}
ds_group_summary_interact(mtcarz, mpg, cyl, gear)
```
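
The notion of "combinations of levels" can be illustrated in base R by grouping on two variables at once. This sketch assumes the built-in `mtcars` data and reports only the count and mean; it is not the implementation of `ds_group_summary_interact()`.

```{r interact-base}
# Count and mean of mpg for each observed combination of cyl and gear
interact_stats <- aggregate(
  mpg ~ cyl + gear,
  data = mtcars,
  FUN  = function(x) c(n = length(x), mean = mean(x))
)
interact_stats <- do.call(data.frame, interact_stats)
interact_stats
```

Combinations with no observations (for example, 8 cylinders with 4 gears) do not appear in the `aggregate()` result.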

## Multiple Variable Statistics 

The `ds_tidy_stats()` function returns summary/descriptive statistics for 
variables in a data frame/tibble.

```{r multistats}
ds_tidy_stats(mtcarz, mpg, disp, hp)
```

## Measures

If you want to view the measures of location, variation, symmetry, percentiles
and extreme observations as tibbles, use the functions below. All of them,
except for `ds_extreme_obs()`, work with single or multiple variables. If
you do not specify the variables, they return the results for all the
continuous variables in the data set.

#### Measures of Location

```{r mloc}
ds_measures_location(mtcarz)
```

#### Measures of Variation

```{r mvar}
ds_measures_variation(mtcarz)
```

#### Measures of Symmetry

```{r msym}
ds_measures_symmetry(mtcarz)
```
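
Skewness and kurtosis have several competing definitions. The sketch below uses the simple moment-based versions on `mtcars$mpg` as an assumption for illustration; `ds_measures_symmetry()` may apply a different estimator (for example, a bias-corrected one), so the numbers need not agree exactly.

```{r msym-base}
x  <- mtcars$mpg
m2 <- mean((x - mean(x))^2)  # second central moment
m3 <- mean((x - mean(x))^3)  # third central moment
m4 <- mean((x - mean(x))^4)  # fourth central moment

skewness <- m3 / m2^1.5  # > 0 indicates a longer right tail
kurtosis <- m4 / m2^2    # equals 3 for a normal distribution

c(skewness = skewness, kurtosis = kurtosis)
```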

#### Percentiles

```{r mperc}
ds_percentiles(mtcarz)
```
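
Percentiles can also be computed with base R's `quantile()`. The probabilities chosen here are illustrative, and `quantile()` defaults to its type 7 interpolation, which is an assumption that may not match the method used by `ds_percentiles()`.

```{r mperc-base}
x <- mtcars$mpg

# Selected percentiles of mpg (default interpolation, type = 7)
pct <- quantile(x, probs = c(0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99))
round(pct, 2)
```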

#### Extreme Observations

```{r mextreme}
ds_extreme_obs(mtcarz, mpg)
```
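
Extreme observations are the smallest and largest values of a variable. A minimal base R sketch, assuming the built-in `mtcars` data and an arbitrary choice of five values at each end:

```{r mextreme-base}
x <- mtcars$mpg
sorted <- sort(x)

lowest  <- head(sorted, 5)       # five smallest values, ascending
highest <- rev(tail(sorted, 5))  # five largest values, descending

# Row positions of the extremes in the original data
lowest_rows  <- order(x)[1:5]
highest_rows <- order(x, decreasing = TRUE)[1:5]

data.frame(lowest, lowest_rows, highest, highest_rows)
```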




