---
title: "Finding communities in large datasets"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{large_datasets_communities}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(dendroNetwork)
```

## Community detection in very large datasets

When using larger datasets of tree-ring series, calculating the table with similarities can take a lot of time, but finding communities even more. It is therefore recommended to use of parallel computing for Clique Percolation: `clique_community_names_par(network, k=3, n_core = 4)`. This reduces the amount of time significantly. For most datasets `clique_community_names()` is sufficiently fast and for smaller datasets `clique_community_names_par()` can even be slower due to the parallelisation. Therefore, the funtion `clique_community_names()` should be used initially and if this is very slow, start using `clique_community_names_par()`.

The workflow is similar as described in the `vignette("dendroNetwork")`, but with minor changes:

1.  load network.

2.  compute similarities.

3.  find the maximum clique size: `igraph::clique_num(network)` .

4.  detect communities for each clique size separately:

    -   `com_cpm_k3 <- clique_community_names_par(network, k=3, n_core = 6)`.

    -   `com_cpm_k4 <- clique_community_names_par(network, k=4, n_core = 6)`.

    -   and so on until the maximum clique size.

5.  merge these into a single `data frame` by `com_cpm_all <- rbind(com_cpm_k3,com_cpm_k4, com_cpm_k5,... )` .

6.  create table for use in cytoscape with all communities: `com_cpm_all <- com_cpm_all |> dplyr::count(node, com_name) |> tidyr::spread(com_name, n)` .

7.  Continue with the visualisation in Cytoscape, see the relevant section in the `vignette("dendroNetwork")` .
