---
title: "Getting Started with gtregression"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with gtregression}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, echo=FALSE, out.width='200px'}
knitr::include_graphics("../man/figures/gtregression_hex.png")
```

# gtregression

`gtregression` is an R package that simplifies regression modeling and
generates publication-ready tables using the `gtsummary` ecosystem. It
supports a variety of regression approaches with built-in tools for
model diagnostics, selection, and confounder identification—all designed
to provide beginner and intermediate R users with clean, interpretable
output.

This package was created with the aim of empowering R users in low- and
middle-income countries (LMICs) by offering a simpler and more
accessible coding experience. We sincerely thank the authors and
contributors of foundational R packages such as `gtsummary`, `MASS`,
`risks`, `dplyr`, and others, without whom this project would not have
been possible.

## Table of Contents

-   [Vision](#vision)
-   [Features](#features)
-   [Installation](#installation)
-   [Quick Start](#quick-start)
-   [Key Functions](#key-functions)


## Vision {#vision}

At its core, `gtregression` is more than just a statistical tool—it is a
commitment to open access, simplicity, and inclusivity in health data
science. Our team is driven by the vision of empowering researchers,
students, and public health professionals in LMICs through
user-friendly, well-documented tools that minimize coding burden and
maximize interpretability.

We believe in the democratization of data science and aim to promote
open-source resources for impactful and equitable research globally.

## Features {#features}

-   Supports multiple regression approaches:
    -   Logistic (logit)
    -   Log-binomial
    -   Poisson / Robust Poisson
    -   Negative Binomial
    -   Linear Regression
-   Univariable and multivariable regression
-   Confounder identification using crude and adjusted estimates
-   Stepwise model selection (AIC/BIC/adjusted R²)
-   Stratified regression support
-   Formatted outputs using `gtsummary`
-   Built-in example datasets: `PimaIndiansDiabetes2`, `birthwt`, `epil`

## Installation {#installation}

``` r
# Install from CRAN
install.packages("gtregression")

# Or install the development version from GitHub
devtools::install_github("ThinkDenominator/gtregression")
```

## Quick Start {#quick-start}

``` r
# Load necessary libraries
library(gtregression)
library(dplyr)  # provides mutate() and case_when() used below

# Load example dataset
data("data_PimaIndiansDiabetes", package="gtregression")

# Convert diabetes outcome to binary and create categorical variables
pima_data <- data_PimaIndiansDiabetes |>
  mutate(diabetes = ifelse(diabetes == "pos", 1, 0)) |>
  mutate(bmi = case_when(
    mass < 25 ~ "Normal",
    mass >= 25 & mass < 30 ~ "Overweight",
    mass >= 30 ~ "Obese",
    TRUE ~ NA_character_),                                       
    bmi = factor(bmi, levels = c("Normal", "Overweight", "Obese")),
    age_cat = case_when(
      age < 30 ~ "Young",
      age >= 30 & age < 50 ~ "Middle-aged",
      age >= 50 ~ "Older"),
    age_cat = factor(age_cat, levels = c("Young", "Middle-aged", "Older")),
    npreg_cat = ifelse(pregnant > 2, "High parity", "Low parity"),
    npreg_cat = factor(npreg_cat, levels = c("Low parity", "High parity")),
    glucose_cat = case_when(glucose <= 140 ~ "Normal", glucose > 140 ~ "High"),
    glucose_cat = factor(glucose_cat, levels = c("Normal", "High")),
    bp_cat = case_when(
      pressure < 80 ~ "Normal",
      pressure >= 80 ~ "High"
    ),
    bp_cat= factor(bp_cat, levels = c("Normal", "High")),
    triceps_cat = case_when(
      triceps < 23 ~ "Normal",
      triceps >= 23 ~ "High"
    ),
    triceps_cat= factor(triceps_cat, levels = c("Normal", "High")),
    insulin_cat = case_when(
      insulin < 30 ~ "Low",
      insulin >= 30 & insulin < 150 ~ "Normal",
      insulin >= 150 ~ "High"
    ),
    insulin_cat = factor(insulin_cat, levels = c("Low", "Normal", "High"))
  ) |>
  mutate(
    dpf_cat = case_when(
      pedigree <= 0.2 ~ "Low Genetic Risk",
      pedigree > 0.2 & pedigree <= 0.5 ~ "Moderate Genetic Risk",
      pedigree > 0.5 ~ "High Genetic Risk"
    )
  ) |>
  mutate(dpf_cat = factor(dpf_cat, 
              levels = c("Low Genetic Risk", 
                          "Moderate Genetic Risk", 
                          "High Genetic Risk"))) |>
  mutate(diabetes_cat= case_when(diabetes== 1~ "Diabetes positive", 
                                TRUE~ "Diabetes negative")) |>
  mutate(diabetes_cat= factor(diabetes_cat, 
                        levels = c("Diabetes negative","Diabetes positive" )))

# Descriptive statistics table
exposures <- c("bmi", "age_cat", "npreg_cat", "bp_cat", "triceps_cat",
               "insulin_cat", "dpf_cat")

# Create a descriptive table by diabetes category
des_tbl <- descriptive_table(data = pima_data,
                             exposures = exposures,
                             by = "diabetes_cat")
                             
# Check the data compatibility
dissect(pima_data)

# Univariable regression
uni_tbl <- uni_reg(
  data = pima_data,
  outcome = "diabetes",
  exposures = exposures,
  approach = "logit"
)

# check models and summaries
uni_tbl$models
uni_tbl$model_summaries

# Plot univariable regression results
plot_reg(uni_tbl, 
         title = "Univariable Regression Results")
         
# Multivariable regression
multi_tbl <- multi_reg(
  data = pima_data,
  outcome = "diabetes",
  exposures = exposures,
  approach = "logit"
)

# check models and summaries
multi_tbl$models
multi_tbl$model_summaries

# Plot multivariable regression results
plot_reg(multi_tbl, 
         title = "Multivariable Regression Results")

# combined plots
plot_reg_combine(
  uni_tbl, 
  multi_tbl, 
  title = "Univariable vs Multivariable Regression Results")
  
# Combine the descriptive and regression tables
merge_tables(des_tbl, uni_tbl, multi_tbl,
             spanners = c("**Descriptive**",
                          "**Univariable**",
                          "**Multivariable**"))

# Save the table as a Word document
save_table(des_tbl, filename = "des_tbl", format = "docx")

save_docx(
  tables = list(des_tbl, uni_tbl, multi_tbl),
  filename = "Outputs.docx")
  
# Stratified regression
stratified_uni_reg(pima_data,
                     outcome= "diabetes",
                     exposures =c("bmi", "insulin_cat", "age_cat", "dpf_cat"),
                     approach = "logit",
                     stratifier = "glucose_cat")
                     
stratified_multi_reg(pima_data,
                     outcome= "diabetes",
                     exposures =c("bmi", "insulin_cat", "age_cat", "dpf_cat"),
                     approach = "logit",
                     stratifier = "glucose_cat")
                     
# Check model convergence
check_convergence(pima_data,
                  exposures = exposures,
                  outcome = "diabetes",
                  approach = "logit",
                  multivariate = FALSE)

check_convergence(pima_data,
                  exposures = exposures,
                  outcome = "diabetes",
                  approach = "logit",
                  multivariate = TRUE)


# identify confounders
identify_confounder(pima_data,
                    outcome = "diabetes",
                    exposure = "npreg_cat",
                    potential_confounder = "bp_cat",
                    approach = "logit")
                     
# check interactions
interaction_models(pima_data,
                   outcome = "diabetes",
                   exposure = "bmi",
                   effect_modifier = "glucose_cat",
                   covariates = c("insulin_cat", "age_cat", "dpf_cat"),
                   approach = "logit")
```
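
The Quick Start builds its categories with `case_when()`. As a hedged alternative, the sketch below shows how the same kind of binning could be done with base R's `cut()`. The vector `bmi_values` is made-up illustration data, not part of the package, and the cut-points simply mirror the `bmi` rule used above.

``` r
# Base-R sketch (not gtregression code): bin a numeric vector into
# ordered categories. `bmi_values` is hypothetical example data.
bmi_values <- c(22.4, 25, 29.9, 30, 41.3, NA)

bmi_cat <- cut(
  bmi_values,
  breaks = c(-Inf, 25, 30, Inf),
  labels = c("Normal", "Overweight", "Obese"),
  right  = FALSE  # intervals are [lower, upper), i.e. < 25, 25 to < 30, >= 30
)

# Missing inputs stay missing, as with the TRUE ~ NA_character_ branch above
table(bmi_cat, useNA = "ifany")
```

`cut()` returns a factor whose levels already follow the order of `labels`, so no separate `factor()` call is needed for this case.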

## Key Functions {#key-functions}

### Descriptive & Compatibility Tools

| Function Name        | Purpose                               |
|----------------------|---------------------------------------|
| `descriptive_table()`| Summarise exposures by outcome groups |
| `dissect()`          | Check outcome-exposure compatibility  |

### Regression Functions - Fit univariable and multivariable models

| Function Name | Purpose                              |
|---------------|--------------------------------------|
| `uni_reg()`   | Univariable regression (OR/RR/IRR/β) |
| `multi_reg()` | Multivariable regression             |
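
To make the effect measures in the table concrete, the sketch below fits the corresponding models directly with base R on `MASS::birthwt`. It is an assumption that these calls resemble what `uni_reg()` does internally; the object names (`bw`, `fit_logit`, and so on) are illustrative only.

``` r
# Base-R sketch (not gtregression code): one exposure, four effect measures.
data("birthwt", package = "MASS")
bw <- birthwt
bw$smoke <- factor(bw$smoke, levels = c(0, 1), labels = c("No", "Yes"))

# Logistic regression: the exponentiated coefficient is an odds ratio (OR)
fit_logit <- glm(low ~ smoke, family = binomial(link = "logit"), data = bw)
or <- exp(coef(fit_logit))[["smokeYes"]]

# Log-binomial regression: the exponentiated coefficient is a risk ratio (RR)
fit_logbin <- glm(low ~ smoke, family = binomial(link = "log"), data = bw)
rr <- exp(coef(fit_logbin))[["smokeYes"]]

# Poisson regression on a count outcome: incidence rate ratio (IRR)
fit_pois <- glm(ptl ~ smoke, family = poisson(link = "log"), data = bw)
irr <- exp(coef(fit_pois))[["smokeYes"]]

# Linear regression on a continuous outcome: beta (difference in means)
fit_lm <- lm(bwt ~ smoke, data = bw)
beta <- coef(fit_lm)[["smokeYes"]]

round(c(OR = or, RR = rr, IRR = irr, beta = beta), 2)
```

Because low birth weight is not rare in this dataset, the odds ratio is further from 1 than the risk ratio, which is one reason to consider a log-binomial or robust Poisson approach for common outcomes.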

### Regression Functions by stratifier

| Function Name            | Purpose                             |
|--------------------------|-------------------------------------|
| `stratified_uni_reg()`   | Stratified univariable regression   |
| `stratified_multi_reg()` | Stratified multivariable regression |
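
Conceptually, a stratified regression fits the same model separately within each level of the stratifier. The sketch below illustrates that idea with base R on `MASS::birthwt`, using `race` as the stratifier; it is not the package's implementation, and the names `fits` and `strat_or` are hypothetical.

``` r
# Base-R sketch (not gtregression code): the same model fitted per stratum.
data("birthwt", package = "MASS")
bw <- birthwt
bw$race <- factor(bw$race, levels = 1:3, labels = c("White", "Black", "Other"))

# Split the data by the stratifier and fit one logistic model per stratum
fits <- lapply(split(bw, bw$race), function(d) {
  glm(low ~ smoke, family = binomial(link = "logit"), data = d)
})

# Extract the stratum-specific odds ratio for smoking
strat_or <- vapply(fits, function(f) exp(coef(f))[["smoke"]], numeric(1))
round(strat_or, 2)
```

Odds ratios that differ markedly between strata can hint at effect modification, which is then worth testing formally with an interaction term.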

### Model Diagnostics & Selection

| Function Name         | Purpose                                          |
|-----------------------|--------------------------------------------------|
| `check_convergence()` | Evaluate model convergence and max fitted values |
| `select_models()`     | Stepwise model selection (AIC/BIC/adjusted R²)   |
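
The diagnostics and selection criteria named in the table can be illustrated with base R alone. The sketch below uses `stats::step()` on `MASS::birthwt`; it is an assumption that `check_convergence()` and `select_models()` report comparable quantities, and their arguments are deliberately not shown here.

``` r
# Base-R sketch (not gtregression code): convergence check and stepwise selection.
data("birthwt", package = "MASS")
bw <- birthwt
bw$race <- factor(bw$race)

full <- glm(low ~ age + lwt + race + smoke + ht + ui,
            family = binomial(link = "logit"), data = bw)

# Convergence flag and largest fitted probability
# (for a binary outcome, fitted values must stay below 1)
converged  <- full$converged
max_fitted <- max(fitted(full))

# Backward stepwise selection: k = 2 gives AIC, k = log(n) gives BIC
sel_aic <- step(full, direction = "backward", trace = 0)
sel_bic <- step(full, direction = "backward", k = log(nrow(bw)), trace = 0)

formula(sel_aic)
formula(sel_bic)
```

BIC penalizes each additional parameter more heavily than AIC once the sample has more than about 8 observations, so it tends to retain fewer predictors.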

### Confounding & Interaction

| Function Name           | Purpose                                           |
|-------------------------|---------------------------------------------------|
| `identify_confounder()` | Confounding assessment via % change or MH method  |
| `interaction_models()`  | Compare models with and without interaction terms |
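
The percent-change idea and the with/without-interaction comparison can be sketched in base R as follows. This is an illustration on `MASS::birthwt`, not the package's code: the choice of the crude estimate as the denominator and the 10% threshold are assumptions (a common rule of thumb), and the Mantel-Haenszel method is not shown.

``` r
# Base-R sketch (not gtregression code): change-in-estimate and interaction test.
data("birthwt", package = "MASS")
bw <- birthwt
bw$race <- factor(bw$race)

# Crude and adjusted odds ratios for the exposure (smoke)
crude    <- glm(low ~ smoke,        family = binomial, data = bw)
adjusted <- glm(low ~ smoke + race, family = binomial, data = bw)
or_crude <- exp(coef(crude))[["smoke"]]
or_adj   <- exp(coef(adjusted))[["smoke"]]

# Percent change in the estimate after adjustment (assumed definition)
pct_change    <- abs(or_adj - or_crude) / or_crude * 100
is_confounder <- pct_change > 10  # assumed 10% rule of thumb

# Interaction: likelihood-ratio test of models without and with smoke:race
with_int <- glm(low ~ smoke * race, family = binomial, data = bw)
lrt <- anova(adjusted, with_int, test = "Chisq")
p_interaction <- lrt[["Pr(>Chi)"]][2]

round(c(crude = or_crude, adjusted = or_adj, pct_change = pct_change), 2)
```

A change-in-estimate rule only flags statistical confounding; whether a variable should be adjusted for also depends on subject-matter knowledge of the causal structure.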

### Plots & Exports

| Function Name        | Purpose                                        |
|----------------------|------------------------------------------------|
| `plot_reg()`         | Forest plot for a single regression model      |
| `plot_reg_combine()` | Side-by-side forest plots for uni/multi models |
| `modify_table()`     | Customize column labels or output structure    |
| `save_table()`       | Export table to `.html`, `.csv`, `.docx`       |
| `save_docx()`        | Save table as Word document (`.docx`)          |
| `save_plot()`        | Save plot as `.png`, `.pdf`, etc.              |
| `merge_tables()`     | Combine descriptive and regression tables      |

## Conclusion 

The `gtregression` package simplifies regression coding and produces 
publication-ready tables with interpretation notes. It enables beginners to 
explore a variety of regression models with ease, transparency, and 
reproducibility. Explore the documentation for each function to discover 
additional options and customization features.
