---
title: "Unlocking DrugBank: Parsing and Visualizing Mechanistic Data"
author: "Mohammed Ali"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Unlocking DrugBank: Parsing and Visualizing Mechanistic Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "docs/articles/",
  out.width = "100%",
  warning = FALSE,
  message = FALSE
)
```

## Introduction

The **DrugBank** database is a widely used reference for drug information, containing rich data on chemical structures, pharmacology, and molecular targets. However, the raw data is distributed as a single large, complex XML file that is difficult to use directly in statistical analysis.

`dbparser` solves this problem. It transforms that XML into a **Drugverse Object (`dvobject`)**, a structured collection of R data frames ready for analysis.

In this tutorial, we will demonstrate how to:

1.  Parse the database.
2.  Explore the data structure.
3.  Visualize key insights using `dplyr` and `canvasXpress`.

## 1. Loading Data

### The Parsing Workflow
In a real-world scenario, you would parse the full XML database (downloaded from [DrugBank](https://go.drugbank.com/releases/latest)) using `parseDrugBank()`.

```r
library(dbparser)

# Parse the full XML file (this creates a dvobject)
# drugbank_db <- parseDrugBank("drugbank_full_database.xml")
```
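
Once parsing has finished, a first step is usually to see which tables the result contains. The sketch below assumes only that a `dvobject` behaves like a nested list of data frames. The helper `list_tables()` and the element names in `toy_dvobject` are hypothetical and chosen for illustration; they are not part of the `dbparser` API, and the real object layout may differ.

```r
# Hypothetical helper: walk a nested list and report every data frame found,
# together with its path and dimensions.
list_tables <- function(x, prefix = "") {
  out <- list()
  for (nm in names(x)) {
    path <- if (nzchar(prefix)) paste(prefix, nm, sep = "$") else nm
    el <- x[[nm]]
    if (is.data.frame(el)) {
      out[[length(out) + 1]] <- data.frame(
        table = path, rows = nrow(el), cols = ncol(el)
      )
    } else if (is.list(el)) {
      # Recurse into sub-lists, carrying the path down
      out[[length(out) + 1]] <- list_tables(el, path)
    }
  }
  if (length(out) == 0) {
    return(data.frame(table = character(), rows = integer(), cols = integer()))
  }
  do.call(rbind, out)
}

# Toy stand-in for a parsed object (made-up names and rows, not DrugBank data)
toy_dvobject <- list(
  drugs = list(
    general_information = data.frame(
      drugbank_id = c("D1", "D2"),
      type        = c("biotech", "small molecule")
    ),
    groups = data.frame(
      drugbank_id = c("D1", "D2", "D2"),
      group       = c("approved", "approved", "investigational")
    )
  ),
  targets = data.frame(drugbank_id = "D1", target_id = "T1")
)

table_overview <- list_tables(toy_dvobject)
print(table_overview)
```

Calling the same helper on a real parsed object should give a quick inventory of its tables before you pick the ones to analyze.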

### Loading Sample Data
For this vignette, we will load curated sample datasets included in the package. These represent specific tables that would be found inside the full `dvobject`.

```{r load_data}
suppressPackageStartupMessages({
  library(tidyr)
  library(dplyr)
  library(tibble)
  library(canvasXpress)
  library(dbparser)
})

# Load sample tables representing parts of the parsed dvobject
drugs <- readRDS(system.file("drugs.RDS", package = "dbparser"))
drug_groups <- readRDS(system.file("drug_groups.RDS", package = "dbparser"))
drug_targets_actions <- readRDS(system.file("targets_actions.RDS", package = "dbparser"))
```

## 2. Analysis: The Drug Landscape

What does the universe of drugs in DrugBank look like? Let's analyze the composition of the database by drug type (small molecule vs. biotech).

```{r analysis_type}
# Prepare data: Count drugs by type
type_stat <- drugs %>% 
  group_by(type) %>% 
  summarise(Count = n()) %>% 
  arrange(desc(Count)) %>% 
  column_to_rownames("type")

# Visualize
canvasXpress(
  data             = type_stat,
  graphType        = "Bar",
  title            = "Composition of DrugBank: Drug Types",
  showSampleNames  = FALSE,
  legendPosition   = "right"
)
```
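
Raw counts are easier to compare across database releases when they are also expressed as shares. The following is a minimal sketch of that calculation on a toy table; `toy_drugs` and its rows are made up for illustration and only mimic the `drugbank_id` and `type` columns used above.

```r
library(dplyr)

# Toy stand-in for the `drugs` table (hypothetical rows, not DrugBank data)
toy_drugs <- tibble(
  drugbank_id = c("D1", "D2", "D3", "D4"),
  type        = c("small molecule", "small molecule", "small molecule", "biotech")
)

# Count drugs per type, then express each count as a percentage of the total
type_share <- toy_drugs %>%
  count(type, name = "Count", sort = TRUE) %>%
  mutate(Share = round(100 * Count / sum(Count), 1))

print(type_share)
```

Here `count(type, name = "Count", sort = TRUE)` is shorthand for the `group_by()`, `summarise()`, `arrange()` sequence used in the chunk above.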

## 3. Analysis: Approval Status

Drugs are categorized into groups such as "approved", "investigational", or "experimental". How does the mix of drug types (biotech vs. small molecule) differ across these groups?

```{r analysis_groups}
# Prepare data: Cross-tabulate Type vs Group
group_stat <- drugs %>% 
  full_join(drug_groups, by = "drugbank_id") %>% 
  group_by(type, group) %>% 
  summarise(count = n(), .groups = 'drop') %>% 
  pivot_wider(names_from = group, values_from = count, values_fill = 0) %>% 
  column_to_rownames("type")

# Visualize with a Stacked Bar Chart
canvasXpress(
  data           = group_stat,
  graphType      = "Stacked",
  graphOrientation = "horizontal",
  title          = "Drug Types by Approval Status",
  xAxisTitle     = "Number of Drugs",
  legendPosition = "bottom",
  xAxis2Show     = FALSE
)
```
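
A `full_join()` keeps unmatched rows from both tables, so a drug without a group row ends up with `group = NA`, and a group row without a matching drug ends up with `type = NA`. An `NA` type cannot be used as a row name, so it can be worth checking the join keys first. The sketch below does this with `anti_join()` on toy tables; the rows are made up and only the column names follow the chunk above.

```r
library(dplyr)

# Toy stand-ins for `drugs` and `drug_groups` (hypothetical rows)
toy_drugs <- tibble(
  drugbank_id = c("D1", "D2", "D3"),
  type        = c("small molecule", "biotech", "small molecule")
)
toy_groups <- tibble(
  drugbank_id = c("D1", "D1", "D2", "D9"),
  group       = c("approved", "investigational", "approved", "experimental")
)

# Drugs with no group row: a full join would give them group = NA
drugs_without_group <- anti_join(toy_drugs, toy_groups, by = "drugbank_id")

# Group rows with no matching drug: a full join would give them type = NA
groups_without_drug <- anti_join(toy_groups, toy_drugs, by = "drugbank_id")

# An inner join keeps only matched pairs, so no NA values are introduced
matched <- inner_join(toy_drugs, toy_groups, by = "drugbank_id")

print(drugs_without_group)
print(groups_without_drug)
print(matched)
```

If both `anti_join()` results are empty, `full_join()` and `inner_join()` return the same rows and the choice between them does not matter.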

## 4. Analysis: Molecular Mechanisms

One of DrugBank's most valuable features is the detailed information on how drugs interact with their targets (Proteins, Enzymes, etc.). Are drugs mostly inhibitors, agonists, or antagonists?

```{r analysis_targets}
# Prepare data: Top 10 most common Mechanisms of Action
targetActionCounts <- drug_targets_actions %>% 
  group_by(action) %>% 
  summarise(Count = n()) %>% 
  arrange(desc(Count)) %>% 
  slice_head(n = 10) %>% 
  column_to_rownames("action")

# Visualize
canvasXpress(
  data            = targetActionCounts,
  graphType       = "Bar",
  graphOrientation = "vertical",
  colorBy         = "Count",
  title           = "Top 10 Mechanisms of Action",
  xAxisTitle      = "Number of Interactions",
  showSampleNames = FALSE,
  legendPosition  = "none"
)
```
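
Action labels in real data may be missing or empty, and such values would either fail as row names or show up as an unlabeled bar. The following sketch shows one way to drop them before ranking. The table `toy_actions` is made up for illustration, and whether the sample data actually contains missing actions is not something this vignette establishes.

```r
library(dplyr)

# Toy stand-in for `drug_targets_actions` (hypothetical rows)
toy_actions <- tibble(
  drugbank_id = c("D1", "D2", "D3", "D4", "D5", "D6"),
  action      = c("inhibitor", "inhibitor", "agonist", NA, "", "antagonist")
)

# Drop missing or empty labels, count each action, keep the most frequent ones
top_actions <- toy_actions %>%
  filter(!is.na(action), action != "") %>%
  count(action, name = "Count", sort = TRUE) %>%
  slice_head(n = 2)

print(top_actions)
```

With the real table, replacing `n = 2` by `n = 10` gives the same top 10 ranking as the chunk above, minus any unlabeled interactions.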

## 5. Next Steps: Integrated Pharmacovigilance

Now that you are familiar with the mechanistic data in DrugBank, you can combine it with real-world safety data.

`dbparser` now supports **OnSIDES** (Adverse Events) and **TWOSIDES** (Drug-Drug Interactions).

Check out the **[Integrated Pharmacovigilance Vignette](drugbank_nside.html)** to learn how to merge these databases and perform polypharmacy risk analysis.
