---
title: "Hypothesis test for a difference between means"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Hypothesis test for the difference between means}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,comment=NA,fig.width=7,fig.height=5)
library(interpretCI)
library(glue)
```


```{r,echo=FALSE,message=FALSE}
x=meanCI(n1=100,n2=100,m1=200,s1=40,m2=190,s2=20,mu=7,alpha=0.05,alternative="greater")
two.sided<-greater<-less<-FALSE
if(x$result$alternative=="two.sided") two.sided=TRUE
if(x$result$alternative=="less") less=TRUE
if(x$result$alternative=="greater") greater=TRUE

twoS="The null hypothesis will be rejected if the difference between sample means is too big or if it is too small."
lessS="The null hypothesis will be rejected if the difference between sample means is too small."
greaterS="The null hypothesis will be rejected if the difference between sample means is too big."
```


This document is prepared automatically using the following R command.

```{r,echo=FALSE}
call=paste0(deparse(x$call),collapse="")
x1=paste0("library(interpretCI)\nx=",call,"\ninterpret(x)")
textBox(x1,italic=TRUE,bg="grey95",lcolor="grey50")
```



```{r,echo=FALSE}
library(glue)
library(interpretCI)
if(two.sided){
if(x$result$mu==0) {
     claim= "Test the hypothesis that men and women spend equally on refreshment"
} else {
     claim= paste0("The team owner claims that the difference in spending money between men and women is equal to ", x$result$mu) 
}
} else{
    if(x$result$mu==0) {
     claim= paste0("Test the hypothesis that men spend", ifelse(greater,"more","less")," than women on refreshment")
   } else {
       claim= paste0("The team owner claims that men spend at least $", x$result$mu, 
                   ifelse(greater," more"," less")," than women") 
   }
}
```

## Given Problem : `r ifelse(two.sided,"Two","One")`-Tailed Test

```{r,echo=FALSE}
string=glue("The local baseball team conducts a study to find the amount spent on refreshments at the ball park. Over the course of the season they gather simple random samples of {x$result$n1} men and {x$result$n2} women. For men, the average expenditure was ${x$result$m1}, with a standard deviation of ${x$result$s1}. For women, it was ${x$result$m2}, with a standard deviation of ${x$result$s2}.

{claim}. Assume that the two populations are independent and normally distributed.")

textBox(string)
```

## Hypothesis test

This lesson explains how to conduct a hypothesis test for the difference between two means. The test procedure, called the two-sample t-test, is appropriate when the following conditions are met:

- The sampling method for each sample is simple random sampling.

- The samples are independent.

- Each population is at least 20 times larger than its respective sample.

- The sampling distribution is approximately normal, which is generally the case if any of the following conditions apply.

     + The population distribution is normal.
     
     + The population data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
     
     + The population data are slightly skewed, unimodal, without outliers, and the sample size is 16 to 40.
     
     + The sample size is greater than 40, without outliers.


### This approach consists of four steps: 

1. state the hypotheses

2. formulate an analysis plan

3. analyze sample data  

4. interpret results.


### 1. State the hypotheses 

The first step is to state the null hypothesis and an alternative hypothesis.

$$Null\ hypothesis(H_0): \mu_1-\mu_2 `r ifelse(two.sided,"=",ifelse(less,">=","<="))` `r x$result$mu`$$
$$Alternative\ hypothesis(H_1): \mu_1-\mu_2 `r ifelse(two.sided, "\\neq" ,ifelse(less,"<",">"))` `r x$result$mu`$$

Note that these hypotheses constitute a `r ifelse(two.sided,"two","one")`-tailed test. `r ifelse(two.sided,twoS,ifelse(less,lessS,greaterS))`.


### 2. Formulate an analysis plan. 

For this analysis, the significance level is `r (1-x$result$alpha)*100`%. Using sample data, we will conduct a **two-sample t-test** of the null hypothesis.

### 3. Analyze sample data

Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t statistic test statistic (t).


$$SE=\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}$$
$$SE=\sqrt{\frac{`r x$result$s1`^2}{`r x$result$n1`}+\frac{`r x$result$s2`^2}{`r x$result$n2`}}$$
$$SE=`r round(x$result$se,3)`$$


 $$DF=\frac{(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2})^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}$$
 
$$DF=\frac{(\frac{`r x$result$s1`^2}{`r x$result$n1`}+\frac{`r x$result$s2`^2}{`r x$result$n2`})^2}{\frac{(`r x$result$s1`^2/`r x$result$n1`)^2}{`r x$result$n1`-1}+\frac{(`r x$result$s2`^2/`r x$result$n2`)^2}{`r x$result$n2`-1}}$$
$$DF=`r round(x$result$DF,2)`$$

$$t=\frac{(\bar{x_1}-\bar{x_2})-d}{SE} = \frac{(`r x$result$m1` -`r x$result$m2`)-`r x$result$mu`}{`r round(x$result$se,3)`}=`r round(x$result$t,3)`$$

where $s_1$ is the **standard deviation** of sample 1, $s_2$ is the standard deviation of sample 2, $n_1$ is the size of sample 1, $n_2$ is the size of sample 2, $\bar{x_1}$ is the mean of sample 1, $\bar{x_2}$ is the mean of sample 2, d is the hypothesized difference between population means, and SE is the standard error.

We can plot the mean difference.
```{r}
plot(x)
```

Since we have a `r ifelse(two.sided,"two","one")`-tailed test, the P-value is the probability that the t statistic having `r round(x$result$DF,2)` degrees of freedom is `r if(!greater) "less than"` `r if(!greater) round(-abs(x$result$t),2)` `r if(!less) "or greater than "` `r if(!less) round(abs(x$result$t),2)`.

We use the t Distribution curve to find p value.

```{r}
draw_t(DF=round(x$result$DF,2),t=x$result$t,alternative=x$result$alternative)
```

### 4. Interpret results. 

Since the P-value (`r round(x$result$p,3)`) is `r ifelse(x$result$p>x$result$alpha,"greater","less")` than the significance level (`r x$result$alpha`), we can`r if(x$result$p>x$result$alpha) "not"` reject the null hypothesis.

### Result of meanCI()

```{r,echo=FALSE}
print(x)
```


### Reference

The contents of this document are modified from StatTrek.com.
Berman H.B., "AP Statistics Tutorial", [online] Available at:  https://stattrek.com/hypothesis-test/difference-in-means.aspx?tutorial=AP URL[Accessed Data: 1/23/2022]. 
