---
title: "Getting started"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(astgrepr)
```

This vignette covers the basics you need to start using `astgrepr`. If you want
to know more about advanced rules and other topics, I invite you to read the
docs of the Rust crate
[`ast-grep`](https://ast-grep.github.io/guide/pattern-syntax.html), on which
this package is built.


## First steps

My main incentive for building this package was to provide a faster linter for
R code. [`lintr`](https://lintr.r-lib.org/) is a great tool, but I'm jealous of 
the Python ecosystem that has a lightning-fast linter 
([`Ruff`](https://docs.astral.sh/ruff/)).

Therefore, as a motivation for this vignette, let's say we have the following 
code and that we want to find bad patterns:

```{r}
src <- "x <- rnorm(100, mean = 2)
any(is.na(y))
plot(x)
any(is.na(x))
any(duplicated(variable))"
```

I can already see two of them:

* `any(is.na())` is slower than `anyNA()` ([`lintr`](https://lintr.r-lib.org/reference/any_is_na_linter.html))
* `any(duplicated())` is slower than `anyDuplicated() > 0` ([`lintr`](https://lintr.r-lib.org/reference/any_duplicated_linter.html))

Let's start by building the abstract syntax tree (AST) corresponding to this
code. This has to be the first step because all other functions depend on this
tree:

```{r}
root <- src |>
  tree_new() |>
  tree_root()

root
```


## Rules and nodes

"Rules" are one of the key elements of `astgrepr`. They basically define what
we are looking for in the code. One can build a simple rule with `ast_rule()`:

```{r}
ast_rule(id = "any_na", pattern = "any(is.na($VAR))")
```

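In this pattern, `$VAR` is a metavariable: it matches any single node, so the
rule matches `any(is.na(x))` as well as `any(is.na(y))`. There are many
arguments in `ast_rule()`, and one can also include `pattern_rule()` and
`relational_rule()`, but we keep it simple for now. Once a rule is created, it
can be applied to a node: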

```{r}
root |> 
  node_find(
    ast_rule(id = "any_na", pattern = "any(is.na($VAR))"),
    ast_rule(id = "any_dup", pattern = "any(duplicated($VAR))")
  )
```

We can see that most `astgrepr` functions return a nested list. Lists are
nested on two levels: rules and nodes. For each rule, there is a specific number
of nodes that were matched.

Here, `node_find()` returned a list of two rules, and each of them contains a
single node. This is expected: `node_find()` stops after the first node that
matches a rule. If we want all the nodes that match each rule, we can use
`node_find_all()`:

```{r}
found_nodes <- root |> 
  node_find_all(
    ast_rule(id = "any_na", pattern = "any(is.na($VAR))"),
    ast_rule(id = "any_dup", pattern = "any(duplicated($VAR))")
  )

found_nodes
```

More generally, most functions come in a single-node and a multi-node variant.
For instance, we use `node_text_all()` to extract the text corresponding to
each node and `node_range_all()` to get their start and end coordinates in the
original code^[Note that in each sublist, the first value refers to the row and
the second one to the column. Also, those values are 0-indexed, so `1`
corresponds to the second row/column.]:

```{r}
found_nodes |> 
  node_text_all()
found_nodes |> 
  node_range_all()
```
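To see how general a pattern is, here is a small sketch on a made-up snippet (`other_src` and `other_matches` are names used only for this illustration). In `ast-grep` patterns, a metavariable such as `$VAR` matches any single node, so the rule should also match when the argument of `is.na()` is a function call or an arithmetic expression rather than a plain variable name:

```{r}
# Hypothetical snippet, not part of the example above: the argument of
# is.na() is a function call on the first line and an arithmetic
# expression on the second one.
other_src <- "any(is.na(f(1, 2)))
any(is.na(x + 1))"

other_matches <- other_src |>
  tree_new() |>
  tree_root() |>
  node_find_all(ast_rule(id = "any_na", pattern = "any(is.na($VAR))")) |>
  node_text_all()

other_matches
```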

Let's sum up what we have. So far, we have the original code (`src`), the
location (`node_range_all()`) and the content (`node_text_all()`) of the
patterns we were looking for. This is already enough to build a linter^[Of
course, more work is needed to make the IDE report those lints, but this is
outside the scope of `astgrepr`.].
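
As a rough sketch of what such a linter could report, the texts and ranges can be combined into a data frame with one row per match, reusing `found_nodes` from above. This assumes that the first element of each node's range is its start position, stored as a 0-indexed pair (row, then column); `texts`, `ranges` and `lints` are names introduced only for this illustration:

```{r}
texts <- node_text_all(found_nodes)
ranges <- node_range_all(found_nodes)

# Build one data frame per rule, then stack them.
lints <- do.call(rbind, lapply(names(texts), function(rule) {
  data.frame(
    rule = rule,
    code = unlist(texts[[rule]], use.names = FALSE),
    # Assumption: r[[1]] is the start position, c(row, column), 0-indexed.
    # Adding 1 gives the 1-indexed positions that editors usually display.
    line = vapply(ranges[[rule]], function(r) r[[1]][1] + 1, numeric(1)),
    column = vapply(ranges[[rule]], function(r) r[[1]][2] + 1, numeric(1)),
    row.names = NULL
  )
}))

lints
```

Keeping one row per match makes it easy to sort or filter the lints before reporting them.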

`astgrepr` offers another feature: code rewriting.


## Modifying nodes

Wouldn't it be nice if our IDE (say, RStudio) could automatically fix those
patterns?

To do so, we need two new functions: `node_replace_all()` and `tree_rewrite()`.
The first one takes a replacement for each rule, and the second one rewrites a
node based on those replacements. In a replacement, `~~VAR~~` stands for the
code captured by the metavariable `$VAR` in the corresponding rule. First, let's
see what `node_replace_all()` returns:

```{r}
nodes_to_replace <- root |>
  node_find_all(
    ast_rule(id = "any_na", pattern = "any(is.na($VAR))"),
    ast_rule(id = "any_dup", pattern = "any(duplicated($VAR))")
  )

nodes_to_replace

fixes <- nodes_to_replace |>
  node_replace_all(
    any_na = "anyNA(~~VAR~~)",
    any_dup = "anyDuplicated(~~VAR~~) > 0"
  )

fixes
```

`node_replace_all()` returns a nested list (once again) with the replacement for
each node and the coordinates indicating where this replacement should be
inserted. To finalize our code rewrite, we now need to apply those changes to
the original tree with `tree_rewrite()`:

```{r}
# original code
cat(src)
# new code
tree_rewrite(root, fixes)
```
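
The steps above can be wrapped in a single function that takes code as a string and returns the rewritten code. This is only a minimal sketch: `fix_code()` and `fixed` are hypothetical names, not something exported by `astgrepr`, and the input below is a made-up snippet in which both rules match.

```{r}
# `fix_code()` is a hypothetical helper, not a function exported by astgrepr.
fix_code <- function(code) {
  # Step 1: build the tree and get its root node.
  root <- code |>
    tree_new() |>
    tree_root()

  # Step 2: find the nodes matching each rule and prepare their replacements.
  fixes <- root |>
    node_find_all(
      ast_rule(id = "any_na", pattern = "any(is.na($VAR))"),
      ast_rule(id = "any_dup", pattern = "any(duplicated($VAR))")
    ) |>
    node_replace_all(
      any_na = "anyNA(~~VAR~~)",
      any_dup = "anyDuplicated(~~VAR~~) > 0"
    )

  # Step 3: apply the replacements to the original code.
  tree_rewrite(root, fixes)
}

fixed <- fix_code("any(is.na(a))
any(duplicated(b))")

fixed
```

This sketch only covers the case where every rule matches at least once. What happens when a rule has no match is not shown in this vignette, so that case should be checked before relying on a helper like this one.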

And that's it. Building a linter or a code rewriter is a massive effort that
is not among `astgrepr`'s objectives, but I hope this tool can serve as a
foundation to build one.
