---
title: "tikatuwq: From raw water quality data to CONAMA report"
author: "tikatuwq developers"
output:
  rmarkdown::html_vignette:
    number_sections: true
vignette: >
  %\VignetteIndexEntry{tikatuwq: From raw water quality data to CONAMA report}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4,
  dpi = 96,
  message = FALSE,
  warning = FALSE,
  fig.alt = "Figure generated by tikatuwq package"
)
```

## Introduction

The tikatuwq package provides a reproducible workflow for Brazilian water quality assessment. It implements common indices, including the Water Quality Index (IQA/WQI) and the Carlson and Lamparelli trophic state indices (IET), together with CONAMA 357/2005 compliance checks, plotting helpers and basic reporting tools.

The package includes a real dataset, `wq_demo`, with water quality measurements collected by INEMA (Environmental Agency of Bahia) during monitoring campaigns between 2021 and 2024 in the Rio Buranhem watershed, Porto Seguro, Bahia, Brazil. The dataset is provided for demonstration and reproducibility, and the workflow in this vignette is built around it. All variable names correspond to real monitoring attributes.

In this vignette we demonstrate a typical pipeline: reading a raw CSV file, handling censored values (ND / <LD / <LOQ), cleaning units, computing indices, checking legal compliance and generating simple plots and a textual report. The goal is to show an end-to-end example that users can adapt to their own datasets.

This workflow is particularly useful for environmental monitoring programs that need to comply with Brazilian regulations and produce standardized reports for regulatory agencies.

## Package overview and included real data

We start by loading the package and examining the included Buranhem River dataset collected by INEMA:

```{r setup-package}
library(tikatuwq)
library(dplyr)

data("wq_demo", package = "tikatuwq")

# Inspect structure
str(wq_demo)
head(wq_demo)
```

The `wq_demo` dataset contains 20 rows and 14 columns:

- `rio`: river name (BURANHEM)
- `ponto`: monitoring point identifier
- `data`: sampling date
- `ph`, `od`, `turbidez`, `dbo`: water quality parameters
- `coliformes`: fecal coliforms
- `p_total`, `nt_total`: phosphorus and nitrogen
- `temperatura`, `tds`: temperature and total dissolved solids
- `lat`, `lon`: coordinates

## Reading and validating water quality data

The first step in any analysis is reading the raw data. The `read_wq()` function handles common CSV formats with robust parsing:

```{r read-data, eval=FALSE}
# Example: reading from a CSV file
# df <- read_wq("path/to/your/data.csv")
```

For this vignette, we use the demo dataset. In practice, `read_wq()` handles:

- Comma or semicolon delimiters
- Comma or dot decimal separators
- Unit suffixes in numeric columns (e.g., "0.04 mg/L")
- Date formats (YYYY-MM-DD or DD/MM/YYYY)
- Coordinate normalization if needed
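
To make the decimal-separator and unit-suffix cases above concrete, here is a minimal base R sketch. The helper `parse_number_sketch()` is hypothetical and is not part of tikatuwq; it only illustrates the kind of normalization involved, and it assumes values carry no thousands separators.

```{r sketch-parse-number}
# Hypothetical helper (not a tikatuwq function): extract the leading number
# from strings that may use a decimal comma and carry a unit suffix.
parse_number_sketch <- function(x) {
  x <- trimws(as.character(x))
  has_number <- grepl("^[-+]?[0-9]", x)
  # Keep only the leading numeric token, dropping any suffix such as " mg/L"
  num <- sub("^([-+]?[0-9]+(?:[.,][0-9]+)?).*$", "\\1", x, perl = TRUE)
  num[!has_number] <- NA_character_
  # Normalize a decimal comma to a dot before converting
  as.numeric(sub(",", ".", num, fixed = TRUE))
}

parse_number_sketch(c("0,04 mg/L", "7.2", "12", "n/a"))
```

Entries with no leading number come back as `NA`, so they stay visible for later QA/QC rather than being silently coerced.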

Next, we validate that required columns are present:

```{r validate-data}
df <- wq_demo

# Validate required columns
df <- validate_wq(df)

# Check structure
str(df)
```

The `validate_wq()` function ensures that the minimal set of columns required for IQA calculation and CONAMA checks is present.
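
Conceptually, a required-column check is a set difference between the expected names and the names actually present. The sketch below uses an illustrative subset of columns chosen for this example; the actual list enforced by `validate_wq()` may differ.

```{r sketch-required-columns}
# Illustrative subset only (assumption); validate_wq() may require other columns
required_sketch <- c("ph", "od", "turbidez", "dbo", "coliformes", "p_total")

# Toy data frame that lacks some of the expected columns
toy_cols <- data.frame(ph = 7.1, od = 6.3, turbidez = 12)

# Names that are expected but absent
missing_cols <- setdiff(required_sketch, names(toy_cols))
missing_cols
```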

## Handling censored values (ND / <LD / <LOQ)

Water quality data often contain censored values: concentrations reported as below the detection or quantification limit rather than as a measured number. The tikatuwq package provides explicit handling for these cases.

```{r censored-values}
# Example with censored values
# If your CSV contains values like "<0.01", "<LD", or "ND", read_wq() handles them
# with the nd_policy parameter:

# df_with_nd <- read_wq("data.csv", nd_policy = "ld2")
```

The `nd_policy` parameter accepts four options:

- `"ld2"` (default): uses half of the detection limit
- `"ld"`: uses the detection limit value
- `"zero"`: replaces with 0
- `"na"`: replaces with NA
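
The four policies can be illustrated with a small base R sketch. The helper `censor_sketch()` is hypothetical and is not the package's internal code; it assumes censored readings are written as `"<limit"` with a numeric limit. Textual markers without a numeric limit (such as `"ND"`) cannot be substituted without extra information, so this sketch returns `NA` for them.

```{r sketch-nd-policy}
# Hypothetical helper (not a tikatuwq function): replace "<limit" entries
# according to a substitution policy.
censor_sketch <- function(x, policy = c("ld2", "ld", "zero", "na")) {
  policy <- match.arg(policy)
  x <- trimws(as.character(x))
  censored <- !is.na(x) & startsWith(x, "<")
  # Numeric part of each entry, with the "<" marker removed
  value <- suppressWarnings(as.numeric(sub("^<\\s*", "", x)))
  value[censored] <- switch(policy,
    ld2  = value[censored] / 2,  # half of the detection limit
    ld   = value[censored],      # the detection limit itself
    zero = 0,
    na   = NA_real_
  )
  value
}

readings <- c("0.12", "<0.01", "0.30")
censor_sketch(readings, policy = "ld2")
censor_sketch(readings, policy = "zero")
```

The choice of policy matters for summary statistics: substituting zero biases means downward, while using the full limit biases them upward.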

For this demo, the dataset does not contain censored values, but in real data you would see them parsed automatically when using `read_wq()`.

## Cleaning units and basic QA/QC

The `clean_units()` function validates parameter ranges and can convert units if specified:

```{r clean-units}
# Clean and validate units
df_clean <- clean_units(df)

# If units need conversion (e.g., ug/L to mg/L for phosphorus):
# df_clean <- clean_units(df, units_map = list(p_total = "ug/L"))
```

The function:

- Validates typical ranges (pH, OD, turbidity, etc.)
- Warns about outliers
- Converts units when `units_map` is provided
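
As a rough illustration of these two ideas, the following base R sketch converts phosphorus from ug/L to mg/L and flags a physically implausible pH. The values and variable names are made up for the example, and the ranges actually checked by `clean_units()` may differ.

```{r sketch-unit-conversion}
# Hypothetical phosphorus readings reported in ug/L
p_total_ug <- c(40, 125, 18)
# 1 mg/L = 1000 ug/L
p_total_mg <- p_total_ug / 1000
p_total_mg

# Simple range check: pH is defined on a 0-14 scale
ph_values <- c(7.2, 6.8, 15.1)
ph_out_of_range <- ph_values < 0 | ph_values > 14
ph_out_of_range
```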

## Computing indices (IQA and trophic state)

### Water Quality Index (IQA)

The IQA combines multiple parameters into a single score (0-100):

```{r compute-iqa}
# Compute IQA
df_iqa <- iqa(df_clean, na_rm = TRUE)

# View results
head(df_iqa[, c("ponto", "IQA", "IQA_status")])

# Summary
summary(df_iqa$IQA)
table(df_iqa$IQA_status)
```

The `IQA_status` column provides a qualitative classification (Very Poor, Poor, Fair, Good, Excellent).
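
For intuition, the IQA in its widely used CETESB formulation (adapted from the NSF index) is a weighted geometric product of nine sub-index scores, $IQA = \prod_i q_i^{w_i}$, with weights summing to 1. The sketch below applies that product to made-up sub-index scores. The `q_scores` values and the `solidos` label are assumptions for illustration, the rating curves that turn raw measurements into scores are omitted, and the details of the package's `iqa()` may differ.

```{r sketch-iqa-product}
# Weights of the CETESB formulation; they sum to 1
iqa_weights <- c(od = 0.17, coliformes = 0.15, ph = 0.12, dbo = 0.10,
                 temperatura = 0.10, nt_total = 0.10, p_total = 0.10,
                 turbidez = 0.08, solidos = 0.08)
stopifnot(isTRUE(all.equal(sum(iqa_weights), 1)))

# Hypothetical sub-index scores on a 0-100 scale (not derived from wq_demo)
q_scores <- c(od = 85, coliformes = 40, ph = 92, dbo = 70,
              temperatura = 94, nt_total = 88, p_total = 60,
              turbidez = 75, solidos = 86)

# Weighted geometric product, matching scores to weights by name
iqa_sketch <- prod(q_scores[names(iqa_weights)]^iqa_weights)
iqa_sketch
```

Because the aggregation is multiplicative, a single very low sub-index pulls the overall score down more strongly than it would in a weighted arithmetic mean.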

### Trophic State Index (IET)

For lakes and reservoirs, trophic state indices indicate the degree of eutrophication:

```{r compute-iet, eval=FALSE}
# Carlson IET (requires secchi depth, chlorophyll, total phosphorus)
# df_iet <- iet_carlson(df, .keep_ids = TRUE)

# Lamparelli IET
# df_iet <- iet_lamparelli(df, .keep_ids = TRUE)
```

Note: IET functions require specific parameters (secchi depth, chlorophyll) that are not in `wq_demo`. Use them when analyzing lentic (still water) systems.
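
As background, the original Carlson (1977) index can be written as three simple logarithmic equations, one per input. The sketch below implements those published equations in base R with made-up inputs. The function names are hypothetical, and `iet_carlson()` may implement a modified variant, so this should not be read as the package's formula.

```{r sketch-carlson-tsi}
# Carlson (1977) trophic state index equations (natural logarithms)
tsi_secchi <- function(sd_m)    60 - 14.41 * log(sd_m)      # Secchi depth in m
tsi_chl    <- function(chl_ugl) 9.81 * log(chl_ugl) + 30.6  # chlorophyll-a in ug/L
tsi_tp     <- function(tp_ugl)  14.42 * log(tp_ugl) + 4.15  # total phosphorus in ug/L

# Hypothetical reservoir sample
tsi_parts <- c(secchi = tsi_secchi(1.5),
               chl    = tsi_chl(12),
               tp     = tsi_tp(35))
tsi_parts
mean(tsi_parts)
```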

## Checking CONAMA 357/2005 compliance

CONAMA Resolution 357/2005 establishes limits for water quality parameters according to water use classes (special and 1 to 4). The package provides several functions to check compliance:

```{r conama-check}
# Check compliance for class 2 (default)
df_conama <- conama_check(df_iqa, classe = "2")

# View compliance columns (one per parameter)
head(df_conama[, grep("_ok$", names(df_conama))])
```
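
To show what a per-parameter compliance flag means, here is a minimal base R sketch using four class 2 freshwater limits from the resolution: pH between 6 and 9, dissolved oxygen of at least 5 mg/L, BOD of at most 5 mg/L and turbidity of at most 100 NTU. The toy data are invented, and the `_ok` column names simply mirror the pattern selected in the chunk above; `conama_check()` covers more parameters and may handle edge cases differently.

```{r sketch-conama-limits}
# Invented samples for three hypothetical monitoring points
toy_conama <- data.frame(
  ponto    = c("P1", "P2", "P3"),
  ph       = c(7.1, 5.6, 8.2),
  od       = c(6.4, 4.2, 5.0),
  dbo      = c(3.0, 7.5, 5.0),
  turbidez = c(40, 180, 100)
)

# One logical flag per parameter: TRUE means within the class 2 limit
toy_conama$ph_ok       <- toy_conama$ph >= 6 & toy_conama$ph <= 9
toy_conama$od_ok       <- toy_conama$od >= 5
toy_conama$dbo_ok      <- toy_conama$dbo <= 5
toy_conama$turbidez_ok <- toy_conama$turbidez <= 100

toy_conama[, c("ponto", grep("_ok$", names(toy_conama), value = TRUE))]
```

Note that dissolved oxygen is a minimum requirement while the other limits are maxima, so the direction of the comparison differs by parameter.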

### Summary and reports

```{r conama-summary}
# Long-format summary
summary_long <- conama_summary(df_conama, classe = "2")
head(summary_long)

# Report table (violations only, formatted)
report_tab <- conama_report(df_conama, classe = "2", only_violations = TRUE, pretty = TRUE)
print(report_tab)

# Textual summary
summary_text <- conama_text(df_conama, classe = "2", only_violations = TRUE)
cat(summary_text, sep = "\n")
```

## Generating plots

### IQA visualization

```{r plot-iqa}
library(ggplot2)

# Bar plot of IQA by point
p1 <- plot_iqa(df_iqa)
print(p1)
```

### Time series

```{r plot-series}
# Time series of a parameter
p2 <- plot_series(df_iqa, "turbidez", facet = "ponto")
print(p2)
```

### Box plots

```{r plot-box}
# Box plots by point
p3 <- plot_box(df_iqa, "od", by = "ponto")
print(p3)
```

### Heatmap

```{r plot-heatmap}
library(tidyr)

# Reshape to long format
df_long <- df_iqa %>%
  dplyr::select(data, ponto, turbidez, od, dbo, ph) %>%
  pivot_longer(cols = c(turbidez, od, dbo, ph),
               names_to = "parametro",
               values_to = "valor")

# Heatmap
p4 <- plot_heatmap(df_long)
print(p4)
```

## Generating a textual analysis and report

The `generate_analysis()` function produces human-readable paragraphs summarizing water quality:

```{r generate-analysis}
# Generate analytical text
analysis_text <- generate_analysis(
  df_iqa,
  classe_conama = "2",
  incluir_tendencia = FALSE,  # Set TRUE to include trend analysis over time
  contexto = list(river = "Rio Buranhem", period = "2021-2024")
)

cat(paste(analysis_text, collapse = "\n\n"))
```

For a full HTML report:

```{r render-report, eval=FALSE}
# Generate HTML report (requires rmarkdown)
# report_path <- render_report(
#   df_iqa,
#   meta = list(river = "Rio Buranhem", period = "2021-2024"),
#   output_dir = tempdir()
# )
# 
# # Open in browser
# browseURL(report_path)
```

## Summary and next steps

This vignette demonstrated a complete workflow:

1. **Read data**: `read_wq()` with censored value handling
2. **Validate**: `validate_wq()` to check required columns
3. **Clean units**: `clean_units()` for unit conversion and validation
4. **Compute indices**: `iqa()` for water quality index
5. **Check compliance**: `conama_check()`, `conama_summary()`, `conama_report()`, `conama_text()`
6. **Visualize**: `plot_iqa()`, `plot_series()`, `plot_box()`, `plot_heatmap()`
7. **Report**: `generate_analysis()` and `render_report()`

### Next steps

- Explore temporal trends with `trend_param()` and `plot_trend()`
- Use `param_analysis()` functions for parameter-specific analysis
- Create interactive maps with `plot_map()` if coordinates are available
- Adapt the workflow to your own datasets

For more details, see:

- The methods vignette for index calculations and trend analysis
- Package documentation: `help(package = "tikatuwq")`
- Online documentation: https://tikatuwq.github.io/tikatuwq/