---
title: "Logical Operators"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{logicals-vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

To start, load the package.

```{r setup}
library(extraoperators)
```

# Logical Comparisons

This section covers basic logical comparisons and shows how they might
be done in base `R` versus using the `extraoperators` package. Many of
these are quite simple, but are defined so that later operators are
possible.

First let's define our "data" as some numbers stored in `sample_numbers`.

```{r}

sample_numbers <- c(9, 1, 5, 3, 4, 10, 99)

``` 

Now we can do a series of simple logical comparisons which return a
logical vector of `TRUE` or `FALSE`.

```{r}

## base R: greater than 3?
sample_numbers > 3

## base R: greater than or equal to 3?
sample_numbers >= 3

## base R: less than 3?
sample_numbers < 3

## base R: less than or equal to 3?
sample_numbers <= 3

```

Unfortunately, we cannot use `<` or `>` in custom operators, so we use
the substitutions: `g = >` and `l = <` and `e = =`. So that `ge = <=`
etc.

```{r}

## extraoperators: greater than 3?
sample_numbers %g% 3

## extraoperators: greater than or equal to 3?
sample_numbers %ge% 3

## extraoperators: less than 3?
sample_numbers %l% 3

## extraoperators: less than or equal to 3?
sample_numbers %le% 3

```

So far there is no real gain in using `extraoperators` but this changes
for more complex operations. What if we want to know if our values
fall within some range? This is a fairly common task, such as saying
that valid ages must be between 0 and 100 years.

```{r} 

## base R: greater than 3 and less than 10?
sample_numbers > 3 & sample_numbers < 10

## base R: greater than or equal to 3 and less than 10?
sample_numbers >= 3 & sample_numbers < 10

## base R: greater than 3 and less than or equal to 10?
sample_numbers > 3 & sample_numbers <= 10

## base R: greater than or equal to 3 and less than or equal to 10?
sample_numbers >= 3 & sample_numbers <= 10

```

Base `R` accomplishes this through chaining of
operations. `extraoperators` has built in range operators.


```{r} 

## extraoperators: greater than 3 and less than 10?
sample_numbers %gl% c(3, 10)

## extraoperators: greater than or equal to 3 and less than 10?
sample_numbers %gel% c(3, 10)

## extraoperators: greater than 3 and less than or equal to 10?
sample_numbers %gle% c(3, 10)

## extraoperators: greater than or equal to 3 and less than or equal to 10?
sample_numbers %gele% c(3, 10)

```

Finally, `extraoperators` includes a `not in` operator, `%!in%`.

```{r}

## base R: not in 3 or 10
!sample_numbers %in% c(3, 10)

## extraoperators: not in 3 or 10
sample_numbers %!in% c(3, 10)

``` 

The next sections show a few examples of these operators augmented by
prefixes: `?`, `s` and `a`.

# Indices (Which Values?)

Sometimes we want to use a logical comparison and identify indices,
such as to use in a loop. `extraoperators` does this by prefixing
operators with `?` for "which".

```{r}

## base R: what are the indices that match 3 and 10?
which(sample_numbers %in% c(3, 10))

## extraoperators: what are the indices that match 3 and 10?
sample_numbers %?in% c(3, 10)

## base R: what are the indices for numbers between 3 and 10?
which(sample_numbers > 3 & sample_numbers < 10)

## extraoperators: what are the indices for numbers between 3 and 10?
sample_numbers %?gl% c(3, 10)

``` 

This can be readily incorporated in other code for further processing.

# Subsetting

Another fairly common task is selecting only certain observations.
For example, we might want to calculate the average of numbers within
a plausible range (e.g., excluding outliers).
In `extraoperators` subsetting is done by adding an `s` prefix.

```{r}

## base R: subset to only numbers between 3 and 10
mean(subset(sample_numbers, sample_numbers > 3 & sample_numbers < 10))

## or equivalently 
mean(sample_numbers[sample_numbers > 3 & sample_numbers < 10])

## extraoperators: subset to only numbers between 3 and 10
mean(sample_numbers %sgl% c(3, 10))

```

Subsetting can be especially useful in quick exploratory
analyses. Graphs are easily hard to read if there are extreme
values. Subsetting makes it fast to "zoom in" on a specific range.

# All (or None)

Finally, you might have some quality controls in place for data. For
example asserting that all ages are between 0 and 100. In `extraoperators`
this is done by adding the prefix `a`.

```{r}

## base R: are all numbers between 0 and 10?
all(sample_numbers > 0 & sample_numbers < 10)

## extraoperators: are all numbers between 0 and 10?
sample_numbers %agl% c(0, 10)

## extraoperators: are all numbers between 0 and 100?
sample_numbers %agl% c(0, 100)

```

If you want to know the opposite, are no numbers between 0 and 100, we
can negate the whole operation.

```{r}

## extraoperators: are NO numbers between 0 and 100?
!sample_numbers %agl% c(0, 100)

## extraoperators: are NO numbers between 55 and 60?
!sample_numbers %agl% c(55, 60)

```

There are also expanded all, subset, and which operators for equals
and not equals.

```{r}

## extraoperators: are all values equal?
sample_numbers %a==% sample_numbers

## extraoperators: are all values NOT equal?
c(1, 3, 5) %a!=% c(5, 1, 3)

```

# Chaining

In language, it is fairly natural to make a statement like this:
"In my study, age should be between 18 to 65 and not be missing."
In `R`, the usual implementation of this is more equivalent to:
"In my study, age should be greater than 18 and age should be less
than 65 and age should not be missing." `extraoperators` tries to
facilitate something closer to the cleaner original statement using
the chaining operator, `%c%`. The chaining operator chains a set of 
operations on the right hand side with the argument on its left hand
side passed to each. To accomplish this, the right hand side must be
quoted.

```{r}
age <- c(19, 30, 90, 50, NA, 45)
age %c% "(> 18 & < 65) & !is.na"
```

Because the right hand side of the chaining operator is a character 
string that is parsed, it is possible to do some special things in
it. `is.na`, `!is.na`, `is.nan`, and `!is.nan` are special characters 
that do not require any further value to work correctly. As shown,
parentheses also work, which allows fine grained control over exactly
what is intended. For example if we expect only adults are in the
study, but those who refused to report their age were coded -9 and
people who failed to complete the questionnaire at all are missing.

```{r}
age <- c(19, 30, 90, 50, NA, 16, -9)
age %c% "(> 18 | == -9) & !is.na"
```

As with all operators, there are prefixes for all, subset, and which.

```{r}
age %ac% "(> 18 | == -9) & !is.na"

age %ac% "> -Inf"
age %ac% "> -Inf & !is.na"
age %ac% "> -Inf | is.na"

age %sc% "(> 18 | == -9) & !is.na"

age %?c% "(> 18 | == -9) & !is.na"
```


# Interval Notation Operator

In math, interval notation often is used. For example, we might write:
\(x \in (1, 5) \cup [6, \infty)\)
to indicate that *x* is between the intervals 1 to 5 (not including 1
or 5) or between 6 and positive infinity, including 6 but not positive
infinity.
The interval notation operator, `%e%` let's you use fairly similar
language in `R`. "|" is the union operator and "&" is the intersect
operator. Variables are allowed but no functions as these cannot be
parsed.

```{r}

c(1, 2, 6, 300) %e% "(1, 5) | [6, Inf)"

## this is OK
x <- max(mtcars$mpg)
c(1, 2, 6, 300) %e% "(1, 5) | [6, x)"

## ## this would NOT be OK
## c(1, 2, 6, 300) %e% "(1, 5) | [6, max(mtcars$mpg))"


```

# Regular Expressions

Sometimes you want to pattern match. For example, you might want to
find all variable names that match a certain pattern. The `%grepl% operator
can help here, built off `R`'s `grepl` function.
  
```{r}
## sample dataset
data <- data.frame(
  ID = c(1, 2, 3),
  cesd_1 = c(4, 5, 6),
  cesd_2 = c(7, 8, 9),
  cesd_total = c(11, 13, 15)
)

## find all variables that start with "cesd"
names(data)[grepl("^cesd", names(data))]

## and the less typing version using operators
names(data) %sgrepl% "^cesd"

```