---
title: "`bluebike`: A Data Package for Bluebike users"
author: "Ziyue Yang and Tianshu Zhang"
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{`bluebike`: A Data Package for Bluebike users}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
tags:
  - R
  - Rstats
  - tidyverse
  - leaflet
  - bluebike
authors:
  - name: Ziyue Yang
    orcid: 0000-0002-9299-8327
  - name: Tianshu Zhang
    orcid: 0000-0002-3004-4472
output:
  rmarkdown::html_vignette:
    df_print: default
    number_sections: no
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message = FALSE}
# needed packages in vignette
library(bluebike)
library(dplyr)
library(leaflet)
```

## Summary

Our package provides Boston's Blue Bikes trip history data, acquired from the [Blue Bikes System Data](https://www.bluebikes.com/system-data) page. Users can import all monthly trip history data from 2020 to 2022 as a cleaned data set that is ready for analysis.

The package also includes a sample data set of 1,000 trips sampled from the February 2022 trip history, and a full data set with information about all available stations.

The package provides the following functions:
  
- `import_month_data`: takes numeric year and month values and imports the trip history for that month

- `station_distance`: given the user's current location, returns stations sorted by distance in ascending order

- `station_radius`: plots the stations within a given radius, such as walking distance (500 m), on a leaflet map and presents basic information about them

- `trip_distance`: computes the geographical distance between the start and end stations of each trip


The package is a useful tool for Blue Bikes operators who want to analyze trip data and improve the bike-share service based on how riders use it. It is also an easy-to-use tool for data analysis and visualization for anyone interested in Blue Bikes trip data.

## Data Sets Included

- `trip_history_sample`: a sample of 1,000 trip records from February 2022.
- `station_data`: identification, position, and other basic information about Blue Bikes stations.

## Basic Usage

```{r, message=FALSE, warning=FALSE}
library(bluebike)
library(dplyr)
```

### Retrieve data online

`import_month_data` retrieves monthly trip data directly from the Blue Bikes System Data website. For example, to download the trips from January 2015:

```{r}
jan2015 <- import_month_data(2015, 1)
```
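The vignette does not document how `import_month_data` turns its numeric arguments into a download. The monthly files on the System Data page are named starting with the year and month (for example `202201`), so a function like this plausibly needs a validated `YYYYMM` key. The following is a minimal sketch under that assumption; `month_key()` is a hypothetical helper, not part of `bluebike`.

```r
# Hypothetical helper (not part of bluebike): check that year and month are
# single numbers with a valid month, then build a "YYYYMM" key.
month_key <- function(year, month) {
  stopifnot(is.numeric(year), is.numeric(month),
            length(year) == 1, length(month) == 1,
            month >= 1, month <= 12, month == round(month))
  # %04d pads the year, %02d zero-pads the month (1 -> "01")
  sprintf("%04d%02d", as.integer(year), as.integer(month))
}

month_key(2015, 1)
```

Validating the input before any network request means a typo such as `month = 13` fails immediately with a clear error instead of a failed download.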

### Data Wrangling

- Using the cleaned data set `trip_history_sample` included in the package, the user can easily find the most popular start stations among the sampled February 2022 trips:

```{r example, message=FALSE, warning=FALSE}
stations <- trip_history_sample %>% 
  group_by(start_station_name) %>% 
  summarize(trips_from = n()) %>% 
  # sort so the busiest start stations appear first
  arrange(desc(trips_from))
head(stations)
```
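The same ranking can be written more compactly with `dplyr::count()`, which groups, tallies rows, and (with `sort = TRUE`) orders the result in one call. The sketch below runs on a small hypothetical data frame that stands in for `trip_history_sample`; only the `start_station_name` column is assumed.

```r
library(dplyr)

# Hypothetical toy data standing in for trip_history_sample
toy_trips <- tibble(
  start_station_name = c("Station A", "Station B", "Station A",
                         "Station C", "Station A", "Station B")
)

# count() tallies trips per station; sort = TRUE puts the busiest first,
# and name = sets the column name used for the tally
popular <- toy_trips %>%
  count(start_station_name, sort = TRUE, name = "trips_from")

popular
```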


- With `trip_distance`, the user can compute the average trip distance in January 2015, estimated here from a random sample of 1,000 trips:

```{r}
set.seed(2015)  # make the random sample reproducible
jan_distance <- jan2015 %>% 
  slice_sample(n = 1000) %>% 
  trip_distance()
mean_jan_distance <- mean(jan_distance$distance, na.rm = TRUE)

mean_jan_distance
```
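The vignette does not say how `trip_distance` computes distance. One common way to get the geographical distance between two coordinate pairs is the haversine great-circle formula. The sketch below shows that technique with a hypothetical `haversine_m()` helper and assumed column names (`start_lon`, `start_lat`, `end_lon`, `end_lat`); it is not the package's implementation.

```r
# Hypothetical helper: great-circle distance in meters between
# (lon1, lat1) and (lon2, lat2), in decimal degrees, via the haversine formula.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Hypothetical trips; the second one starts and ends at the same point
trips <- data.frame(
  start_lon = c(-71.13, -71.06),
  start_lat = c(42.36, 42.35),
  end_lon   = c(-71.10, -71.06),
  end_lat   = c(42.37, 42.35)
)

# The helper is vectorized, so whole columns can be passed at once
trips$distance <- haversine_m(trips$start_lon, trips$start_lat,
                              trips$end_lon, trips$end_lat)
mean(trips$distance)
```

Note that this measures the straight-line distance between stations, not the route a rider actually took, so it is a lower bound on distance traveled.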


- The function `station_distance()` helps users find the nearest stations. Given a longitude and a latitude, it returns stations sorted from closest to farthest:

```{r}
top_5_station <- station_distance(-71.13, 42.36) %>%
  head(5)

top_5_station
```
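Conceptually, ranking stations by distance amounts to computing each station's distance from the user's location and ordering by it. The sketch below illustrates that idea on a hypothetical three-station table; the `longitude` and `latitude` columns follow the `station_data` columns used in the leaflet example, while `name`, `haversine_m()`, and `nearest_stations()` are assumptions for illustration only.

```r
# Hypothetical helper: haversine distance in meters (decimal-degree inputs)
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical station table
toy_stations <- data.frame(
  name      = c("Far", "Near", "Middle"),
  longitude = c(-71.06, -71.129, -71.12),
  latitude  = c(42.35, 42.361, 42.365),
  stringsAsFactors = FALSE
)

# Add a distance column and sort rows from closest to farthest
nearest_stations <- function(stations, lon, lat) {
  stations$distance <- haversine_m(lon, lat,
                                   stations$longitude, stations$latitude)
  stations[order(stations$distance), ]
}

nearest_stations(toy_stations, -71.13, 42.36)
```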

### Data Visualization via Leaflet

- Combined with the interactive mapping package `leaflet`, the station positions can be displayed on a map:

```{r}
library(leaflet)

BostonMap <- leaflet(data = station_data) %>% 
  addTiles() %>% 
  # ~longitude and ~latitude refer to columns of station_data
  addCircleMarkers(lng = ~longitude, 
                   lat = ~latitude, 
                   radius = 0.1, 
                   color = "blue")

BostonMap
```


- The function `station_radius()` plots the stations within a user-defined radius `r` (in meters) of a given location and displays basic information about each station.

```{r}
station_500 <- station_radius(-71.13, 42.36, r = 500)

station_500
```
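Besides drawing the map, `station_radius()` has to decide which stations fall inside the radius. The sketch below shows only that filtering step, on a hypothetical station table, using the haversine distance as an assumed distance measure; `haversine_m()`, `stations_within()`, and the `name` column are illustrative, not the package's code.

```r
# Hypothetical helper: haversine distance in meters (decimal-degree inputs)
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

toy_stations <- data.frame(
  name      = c("Far", "Near", "Middle"),
  longitude = c(-71.06, -71.129, -71.12),
  latitude  = c(42.35, 42.361, 42.365),
  stringsAsFactors = FALSE
)

# Keep stations no more than r meters away, closest first
stations_within <- function(stations, lon, lat, r = 500) {
  d <- haversine_m(lon, lat, stations$longitude, stations$latitude)
  keep <- d <= r
  out <- stations[keep, ]
  out$distance <- d[keep]
  out[order(out$distance), ]
}

within_500 <- stations_within(toy_stations, -71.13, 42.36, r = 500)
within_500
```

The filtered rows could then be passed to `leaflet()` as in the map above; how `station_radius()` builds its popups is not covered by this sketch.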


## Contributors

-   [Ziyue Yang](https://github.com/zyang2k)
-   [Tianshu Zhang](https://github.com/tianshu-zhang)
