---
title: "Getting started with workflowr"
subtitle: "workflowr version `r utils::packageVersion('workflowr')`"
author: "John Blischak"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Getting started with workflowr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r decide-to-execute, cache=FALSE, echo=FALSE}
library("knitr")

# The code in this vignette requires a functional Git setup. If a workflowr user
# has a .git directory upstream of R's temporary diretory, then wflow_start will
# fail. If this situation is detected, the code is not evaluated.
if (git2r::in_repository(tempdir())) {
  opts_chunk$set(eval = FALSE)
  warning(workflowr:::wrap(
    "Because you have a Git repository upstream of R's temporary directory,
    none of the code below was executed. Please refer to the online
    documentation to see the output:
    https://workflowr.github.io/workflowr/articles/wflow-01-getting-started.html
    \n\nYou should consider removing the directory since it was likely created
    in error: ",
    workflowr:::git2r_slot(git2r::repository(tempdir(), discover = TRUE), "path")))
}

# The code in this vignette requires pandoc. Not every CRAN server has pandoc
# installed.
if (!rmarkdown::pandoc_available()) {
  opts_chunk$set(eval = FALSE)
  message(workflowr:::wrap(
    "The code chunks below were not executed because this machine does not
    have pandoc installed."
  ))
}
```

```{r chunk-options, cache=FALSE, include=FALSE}
.tmp <- tempfile("wflow-01-getting-started-")
.tmp <- workflowr:::absolute(.tmp)
.project <- file.path(.tmp, "myproject")
fs::dir_create(.project)
opts_knit$set(root.dir = .project)
opts_chunk$set(collapse = TRUE)
```

The workflowr R package helps scientists organize their research in a way that
promotes effective project management, reproducibility, collaboration, and
sharing of results. Workflowr combines literate programming (knitr and
rmarkdown) and version control (Git, via git2r) to generate a website containing
time-stamped, versioned, and documented results. Any R user can quickly and
easily adopt workflowr.

This tutorial assumes you have already followed the [installation
instructions](https://workflowr.github.io/workflowr/index.html#installation).
Specifically, you need to have R, pandoc (or RStudio), and workflowr installed
on your computer. Furthermore, you need an account on [GitHub][gh] or
[GitLab][gl].

[gh]: https://github.com
[gl]: https://about.gitlab.com/

## Overview

A workflowr project has two key components:

1. An R Markdown-based website. This consists of a configuration file
(`_site.yml`), a collection of R Markdown files, and their
corresponding HTML files.

2. A Git repository. Git is a [version control system][vcs] that helps track
code development^[There are many ways to use Git: in the Terminal, in the RStudio
Git pane, or another Git graphical user interface (GUI) (see
[here](https://git-scm.com/download/gui/linux) for GUI options).]. Workflowr is
able to run the basic Git commands, so there is no need to install Git prior to
using workflowr.

One of the main goals of workflowr is to help make your research more
transparent and reproducible. This is achieved by displaying multiple
"reproducibility checks" at the top of each analysis, including the unique
identifier that Git assigns a snapshot of your code (or "commit" as Git calls
it), so you always know which version of the code produced the results.

[vcs]: https://en.wikipedia.org/wiki/Version_control

## Start the project

To start a new project, open R (or RStudio) and load the workflowr package (note
that all the code in this vignette should be run directly in the R console, i.e.
do **not** try to run workflowr functions inside of R Markdown documents).

```{r load-workflowr}
library("workflowr")
```

If you have never created a Git repository on your computer before, you need to
run the following command to tell Git your name and email. Git uses this
information to assign the changes you make to the code to you (analogous to how
Track Changes in a Microsoft Office Word document assigns your changes to you).
You do not need to use the exact same name and email as you used for your
account on GitHub or GitLab. Also, you only need to run this command once per
computer, and all subsequent workflowr projects will use this information (you
can also update it at any time by re-running the command with different input).

```{r wflow-git-config, eval=FALSE}
# Replace the example text with your information
wflow_git_config(user.name = "Your Name", user.email = "email@domain")
```

Now you are ready to start your first workflowr project!
`wflow_start("myproject")` creates a directory called `myproject/` that contains
all the files to get started. It also changes the working directory to
`myproject/`^[If you're using RStudio, you can alternatively create a new
workflowr project using the RStudio project template. Go to `File` -> `New
Project...` and select `workflowr project` from the list of project types. In
the future you can return to your project by choosing `Open Project...` and
selecting the file `myproject.Rproj`. This will set the correct working
directory in the R console, switch the file navigator to the project, and
configure the Git pane.] and initializes a Git repository with the initial
commit already made.

```{r wflow-start, eval=FALSE}
wflow_start("myproject")
```

```{r wflow-start-hidden, echo=FALSE}
setwd(.tmp)
unlink(.project, recursive = TRUE)
wflow_start("myproject", user.name = "Your Name", user.email = "email@domain")
```

`wflow_start()` created the following directory structure in `myproject/`:

```
myproject/
├── .gitignore
├── .Rprofile
├── _workflowr.yml
├── analysis/
│   ├── about.Rmd
│   ├── index.Rmd
│   ├── license.Rmd
│   └── _site.yml
├── code/
│   ├── README.md
├── data/
│   └── README.md
├── docs/
├── myproject.Rproj
├── output/
│   └── README.md
└── README.md
```

At this point, you have a minimal but complete workflowr project; that is, you 
have all the files needed to use the main workflowr commands and publish a 
research website. Later on, as you get more comfortable with the basic setup, 
you can modify and add to the initial file structure. The overall rationale for
this setup is to help organize the files that will be commonly included in a
data analysis project. However, not all of these files are required to use
workflowr.

The two **required** subdirectories are `analysis/` and `docs/`. These
directories should never be removed from the workflowr project.

* `analysis/`: This directory contains all the source R Markdown files for
implementing the data analyses for your project. It also contains a special R
Markdown file, `index.Rmd`, that does not contain any R code, but will be used
to generate `index.html`, the homepage for your website. In addition, this
directory contains the important configuration file `_site.yml`, which you can
use to edit the theme, navigation bar, and other website aesthetics (for more
details see the documentation on [R Markdown websites][rmd-website]). Do not
delete `index.Rmd` or `_site.yml`.

[rmd-website]: https://bookdown.org/yihui/rmarkdown/rmarkdown-site.html

* `docs/`: This directory contains all the HTML files for your
website. The HTML files are built from the R Markdown files in
`analysis/`. Furthermore, any figures created by the R Markdown files
are saved here. Each of these figures is saved according to the
following pattern: `docs/figure/<insert Rmd filename>/<insert chunk
name>-#.png`, where `#` corresponds to which of the plots the chunk
generated (since one chunk can produce an arbitrary number of plots)^[Because of
this requirement, you can't customize the knitr option `fig.path` (which
controls where figure files are saved) in any R Markdown file that is part of a
workflowr project. If you do set it, it will be ignored and workflowr will
insert a warning into the HTML file to alert you.].

The workflowr-specific configuration file is `_workflowr.yml`. It will apply the
workflowr reproducibility checks consistently across all your R Markdown files.
The most critical setting is `knit_root_dir`, which determines the directory
where the files in `analysis/` will be executed. The default is to execute the
code in the root of the project where `_workflowr.yml` is located (i.e. `"."`).
To instead execute the code from `analysis/`, change the setting to
`knit_root_dir: "analysis"`. See `?wflow_html` for more details.

Also required is the RStudio project file, in this example `myproject.Rproj`. 
Even if you are not using RStudio, do not delete this file because the workflowr
functions rely on it to determine the root directory of the project. 

The **optional** directories are `data/`, `code/`, and `output/`.
These directories are suggestions for organizing your data analysis
project, but can be removed if you do not find them useful.

* `data/`: This directory is for raw data files.

* `code/`: This directory is for code that might not be appropriate to include
in R Markdown format (e.g. for pre-processing the data, or for long-running
code).

* `output/`: This directory is for processed data files and other
outputs generated from the code and data. For example, scripts in
`code/` that pre-process raw data files from `data/` should save the
processed data files in `output/`.

The `.Rprofile` file is a regular R script that is run once when the project is
opened. It contains the call `library("workflowr")`, ensuring that workflowr is
loaded automatically each time a workflowr-project is opened.

## Build the website

You will notice that the `docs/` directory is currently empty. That is
because we have not yet generated the website from the `analysis/`
files. This is what we will do next.

To build the website, run the function `wflow_build()` in the R
console:

```{r wflow-build, eval=FALSE}
wflow_build()
```

```{r wflow-build-hidden, echo=FALSE}
# Don't want to actually open the website when building the vignette
wflow_build(view = FALSE)
```

This command builds all the R Markdown files in `analysis/` and saves
the corresponding HTML files in `docs/`. It sets the same seed before
running every file so that any function that generates random data
(e.g. permutations) is reproducible. Furthermore, each file is built
in its own external R session to avoid any potential conflicts between
analyses (e.g. accidentally sharing a variable with the same name across files).
Lastly, it displays the website in the RStudio Viewer or default web browser.

The default action of `wflow_build()` is to behave similar to a
[Makefile](https://swcarpentry.github.io/make-novice/) (`make = TRUE` is the
default when no input files are provided), i.e. it only builds R Markdown files
that have been modified more recently than their corresponding HTML files. Thus
if you run it again, no files are built (and no files are displayed).

```{r wflow-build-no-action}
wflow_build()
```

To view the site without first building any files, run `wflow_view()`, which by
default displays the file `docs/index.html`:

```{r wflow-view, eval=FALSE}
wflow_view()
```

This is how you can view your site right on your local machine. Go ahead and
edit the files `index.Rmd`, `about.Rmd`, and `license.Rmd` to describe your
project. Then run `wflow_build()` to re-build the HTML files and display them in
the RStudio Viewer or your browser.

```{r edit-files, include=FALSE}
for (f in file.path("analysis", c("index.Rmd", "about.Rmd", "license.Rmd"))) {
  cat("\nedit\n", file = f, append = TRUE)
}
```

## Publish the website

workflowr makes an important distinction between R Markdown files that are
published versus unpublished. A published file is included in the website
online; whereas, the HTML file of an unpublished R Markdown file is only able to
be viewed on the local computer. Since the project was just started, there are
no published files. To view the status of the workflowr project, run
`wflow_status()`.

```{r wflow-status}
wflow_status()
```

This alerts us that our project has 3 R Markdown files, and they are all
unpublished ("Unp"). Furthermore, it instructs how to publish them: use
`wflow_publish()`. The first argument to `wflow_publish()` is a character vector
of the R Markdown files to publish ^[Instead of listing each file individually,
you can also pass [file globs](https://en.wikipedia.org/wiki/Glob_(programming))
as input to any workflowr function, e.g. `wflow_publish("analysis/*Rmd",
"Publish the initial files for myproject")`.]. The second is a message that will
be recorded by the version control system Git when it commits (i.e. saves a
snapshot of) these files. The more informative the commit message the better (so
that future you knows what you were trying to accomplish).

```{r wflow-publish, eval=FALSE}
wflow_publish(c("analysis/index.Rmd", "analysis/about.Rmd", "analysis/license.Rmd"),
              "Publish the initial files for myproject")
```

```{r wflow-publish-hidden, echo=FALSE}
# Don't want to actually open the website when building the vignette
wflow_publish(c("analysis/index.Rmd", "analysis/about.Rmd", "analysis/license.Rmd"),
              "Publish the initial files for myproject",
              view = FALSE)
```

`wflow_publish()` reports the 3 steps it took:

* **Step 1:** Commits the 3 R Markdown files using the custom commit message

* **Step 2:** Builds the HTML files using `wflow_build()`

* **Step 3:** Commits the 3 HTML files plus the files that specify the style of
the website (e.g. CSS and JavaScript files)

Performing these 3 steps ensures that the HTML files are always in sync with the
latest versions of the R Markdown files. Performing these steps manually would
be tedious and error-prone (e.g. an HTML file may have been built with an
outdated version of an R Markdown file). However, `wflow_publish()` makes it
easy to keep the pages of your site in sync.

Now when you run `wflow_status()`, it reports that all the files are published
and up-to-date.

```{r wflow-status-post-publish}
wflow_status()
```

## Deploy the website

At this point you have built a version-controlled website that exists on your
local computer. The next step is to put your code on GitHub so that it can serve
your website online. If you are using GitLab, switch to the vignette [Hosting
workflowr websites using GitLab](wflow-06-gitlab.html) and then continue with
the next section.

All the required setup can be performed by the workflowr function
`wflow_use_github()`. The only required argument is your GitHub username^[The
default is to name the GitHub repository using the same name as the directory
that contains the workflowr project. This is likely what you used with
`wflow_start()`, which in this case was `"myproject"`. If you'd prefer the
GitHub repository to have a different name, or if you've already created a
GitHub repo with a different name, you can pass the argument `repository =
"other-name"`.]:

```{r wflow-use-github, eval=FALSE}
wflow_use_github("myname")
```

```{r wflow-use-github-hidden, echo=FALSE}
# Don't want to try to authenticate on GitHub
wflow_use_github("myname", create_on_github = FALSE)
```

This has two main effects on your local machine: 1) it configures Git to
communicate with your future GitHub repository, and 2) it inserts a link to your
future GitHub repository into the navigation bar (you'll need to run
`wflow_build()` or `wflow_publish()` to observe this change). Furthermore,
`wflow_use_github()` will prompt you to ask if you'd like to authorize workflowr
to automatically create the repository on GitHub. If you agree, a browser tab
will open, and you will need to authenticate with your username and password,
and then give permission to the "workflowr-oauth-app" to access your
account^[This sounds scarier than it actually is. The "workflowr-oauth-app" is
simply a formality for GitHub to grant authorization. The "app" itself is the R
code running on your local machine. Once `wflow_use_github()` finishes, the
authorization is deleted, and nothing (and no one) can access your account].

If you decline the offer from `wflow_use_github()` to automatically create the
GitHub repository, you need to manually create it. To do this, login to your
account on GitHub and create a new repository following these
[instructions][new-repo]. The screenshot below shows the menu in the topright of
the webpage.

<img src="img/github-new-repo.png" alt="Create a new repository on GitHub."
  style="display: block; margin: auto; border: black 1px solid">

<p class="caption" style="text-align: center;">
Create a new repository on GitHub.
</p>

Note that in this tutorial the GitHub repository also has the name
"myproject." This isn't strictly necessary (you can name your GitHub repository
whatever you like), but it's generally good organizational practice to use the
same name for both your GitHub repository and the local directory on your
computer.

Next, you need to send your files to GitHub. Push your files to GitHub with the
function `wflow_git_push()`:^[Unfortunately this can fail for many different
reasons. If you already regularly use `git push` in the Terminal, you will
probably want to continue using this. If you don't have Git installed on your
computer and thus must use `wflow_git_push()`, you can search the [git2r
Issues](https://github.com/ropensci/git2r/issues) for troubleshooting ideas.]

```{r wflow-git-push}
wflow_git_push(dry_run = TRUE)
```

Using `dry_run = TRUE` previews what the function will do. Remove this argument
to actually push to GitHub. You will be prompted to enter your GitHub username
and password for authentication^[If you'd prefer to use SSH keys for
authentication, please see the section [Setup SSH
keys](wflow-02-customization.html#setup-ssh-keys).]. Each time you make changes
to your project, e.g. run `wflow_publish()`, you will need to run
`wflow_git_push()` to send the changes to GitHub.

Lastly, now that your code is on GitHub, you need to tell GitHub that you want
the files in `docs/` to be published as a website. Go to Settings -> GitHub
Pages and choose "master branch docs/ folder" as the Source
([instructions][publish-docs]). Using the hypothetical names above, the
repository would be hosted at the URL `myname.github.io/myproject/`^[It may take
a few minutes for the site to be rendered.]. If you scroll back down to the
GitHub Pages section of the Settings page, you can click on the URL there.

[new-repo]: https://docs.github.com/articles/creating-a-new-repository
[publish-docs]: https://docs.github.com/articles/configuring-a-publishing-source-for-github-pages

## Add a new analysis file

Now that you have a functioning website, the next step is to start analyzing
data! Create a new R Markdown file, save it as `analysis/first-analysis.Rmd`,
and open it in your preferred text editor (e.g. RStudio). Alternatively, you can
use the convenience function `wflow_open()`, which will create the file (and
open it if you are using RStudio):

```{r create-file, eval=FALSE}
wflow_open("analysis/first-analysis.Rmd")
```

```{r create-file-hidden, echo=FALSE}
# Don't want to actually open the file when building the vignette in RStudio
wflow_open("analysis/first-analysis.Rmd", edit_in_rstudio = FALSE)
```

Now you are ready to start writing! Go ahead and add some example code. If you
are using RStudio, press the Knit button to build the file and see a preview in
the Viewer pane. Alternatively from the R console, you can run `wflow_build()`
again (this function can be run from the base directory of your project or any
subdirectory).

Check out your new file `first-analysis.html`. Near the top you will see the
workflowr reproducibility report. If you click on the button, the full menu will
drop down. Click around to learn more about the reproducibility safety checks,
why their important, and whether or not the file passed or failed each one.
You'll notice that the first check failed because the R Markdown file had
uncommitted changes. This is OK now since the file is a draft. Once you are
ready to publish it to share with others, you can use `wflow_publish()` to
ensure that any changes to the R Markdown file are committed to the Git
repository prior to generating the results.

In order to make it easier to navigate to your new file, you can include a link
to it on the main index page. First open `analysis/index.Rmd` (optionally using
`wflow_open()`). Second paste the following line into `index.Rmd`:

```
Click on this [link](first-analysis.html) to see my results.
```

```{r edit-index, include=FALSE}
cat("\nClick on this [link](first-analysis.html) to see my results.\n",
    file = "analysis/index.Rmd", append = TRUE)
```

This uses the Markdown syntax for creating a hyperlink (for a quick reference
guide in RStudio click "Help" -> "Markdown Quick Reference"). You specify the
HTML version of the file since this is what comprises the website. Click Knit
(or run `wflow_build()` again) to check that the link works.

Now run `wflow_status()` again. As expected, two files need attention.
`index.Rmd` has status "Mod" for modified. This means it is a published file
that has subsequently been modified. `first-analysis.Rmd` has status "Scr" for
Scratch. This means not only is the HTML not published, but the R Markdown file
is not yet being tracked by Git.

```{r wflow-status-newfile}
wflow_status()
```

To publish the new analysis and the updated index page, again use
`wflow_publish()`:

```{r wflow-publish-newfile, eval=FALSE}
wflow_publish(c("analysis/index.Rmd", "analysis/first-analysis.Rmd"),
              "Add my first analysis")
```

```{r wflow-publish-newfile-hidden, echo=FALSE}
# Don't want to actually open the website when building the vignette
wflow_publish(c("analysis/index.Rmd", "analysis/first-analysis.Rmd"),
              "Add my first analysis", view = FALSE)
```

Lastly, push the changes to GitHub or GitLab with
`wflow_git_push()`^[Alternatively you can run `git push` in the Terminal or use
the RStudio Git Pane.] to deploy these latest changes to the website.

## The workflow

This is the general workflow:^[Note that some workflowr functions are also
available as [RStudio Addins][rstudio-addins]. You may prefer these compared to
running the commands in the R console, especially since you can [bind the addins
to keyboard shortcuts][rstudio-addins-shortcuts].]

[rstudio-addins]: https://rstudio.github.io/rstudioaddins/
[rstudio-addins-shortcuts]: https://rstudio.github.io/rstudioaddins/#keyboard-shorcuts

1. Open a new or existing R Markdown file in `analysis/` (optionally using
`wflow_open()`)

1. Perform your analysis in the R Markdown file (For RStudio users: to quickly
develop the code I recommend executing the code in the R console via Ctrl-Enter
to send one line or Ctrl-Alt-C to execute the entire code chunk)

1. Run `wflow_build()` to view the results as they will
appear on the website (alternatively press the Knit button in RStudio)

1. Go back to step 2 until you are satisfied with the result

1. Run `wflow_publish()` to commit the source files (R Markdown files or other
files in `code/`, `data/`, and `output/`), build the HTML files, and commit the
HTML files

1. Push the changes to GitHub or GitLab with `wflow_git_push()` (or `git push`
in the Terminal)

This ensures that the code version recorded at the top of an HTML file
corresponds to the state of the Git repository at the time it was built.

The only exception to this workflow is if you are updating the aesthetics of
your website (e.g. anytime you make edits to `analysis/_site.yml`). In this case
you'll want to update all the published HTML files, regardless of whether or not
their corresponding R Markdown files have been updated. To republish every HTML
page, run `wflow_publish()` with `republish = TRUE`. This behavior is only
previewed below by specifying `dry_run = TRUE`.

```{r republish}
wflow_publish("analysis/_site.yml", republish = TRUE, dry_run = TRUE)
```

## Next steps

To learn more about workflowr, you can read the following vignettes:

* [Customize your research website](wflow-02-customization.html)
* [Migrating an existing project to use workflowr](wflow-03-migrating.html)
* [How the workflowr package works](wflow-04-how-it-works.html)
* [Frequently asked questions](wflow-05-faq.html)
* [Hosting workflowr websites using GitLab](wflow-06-gitlab.html)
* [Sharing common code across analyses](wflow-07-common-code.html)
* [Alternative strategies for deploying workflowr websites](wflow-08-deploy.html)
* [Reproducible research with workflowr (workshop)](wflow-09-workshop.html)
* [Using large data files with workflowr](wflow-10-data.html)

## Further reading

* For advice on using R Markdown files to organize your analysis, read the
chapter [R Markdown workflow](https://r4ds.had.co.nz/r-markdown-workflow.html) in
the book [R for Data Science](https://r4ds.had.co.nz/) by Garrett Grolemund and
Hadley Wickham
