---
title: "scholar: Analyse citation data from Google Scholar"
author: "Guangchuang Yu, James Keirstead and Gregory Jefferis"
date: "`r Sys.Date()`"
output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{scholar introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
  %\VignetteDepends{ggplot2}
  %\VignetteDepends{yulab.utils}
---

```{r style, echo=FALSE, results="asis", message=FALSE}
knitr::opts_chunk$set(tidy = FALSE,
		   message = FALSE)

has_scholar <- yulab.utils::has_internet("https://scholar.google.com") 
```


```{r echo=FALSE, results="hide", message=FALSE, eval=has_scholar}
library("scholar")
library("ggplot2")
theme_set(theme_minimal())
```


# Retrieving basic information

```{r eval=has_scholar}
## Define the id for Richard Feynman
id <- 'B7vSqZsAAAAJ'

## Get his profile
get_profile(id)
```

# Retrieving publications


`get_publications()` return a `data.frame` of publication records. It contains
information of the publications, including *title*, *author list*, *page
number*, *citation number*, *publication year*, *etc.*.

The `pubid` is the article ID used by Google Scholar and the identifier
that is used to retrieve the citation history of a selected publication.

```{r eval=has_scholar}
## Get his publications (a large data frame)
p <- get_publications(id)
head(p, 3)
```


# Retrieving citation data

```{r eval=has_scholar}
## Get his citation history, i.e. citations to his work in a given year
ct <- get_citation_history(id)

## Plot citation trend
library(ggplot2)
ggplot(ct, aes(year, cites)) + geom_line() + geom_point()
```


Users can retrieve the citation history of a particular publication with
`get_article_cite_history()`.


```{r eval=has_scholar}
## The following publication will be used to demonstrate article citation history
as.character(p$title[1])

## Get article citation history
ach <- get_article_cite_history(id, p$pubid[1])

## Plot citation trend
ggplot(ach, aes(year, cites)) +
    geom_segment(aes(xend = year, yend = 0), linewidth=1, color='darkgrey') +
    geom_point(size=3, color='firebrick')
```

# Comparing scholars

You can compare the citation history of scholars by fetching data with 
`compare_scholars`.
```{r eval=has_scholar}
# Compare Feynman and Stephen Hawking
ids <- c('B7vSqZsAAAAJ', 'DO5oG40AAAAJ')

# Get a data frame comparing the number of citations to their work in
# a given year
cs <- compare_scholars(ids)
```

```{r echo=FALSE, results="hide", message=FALSE}
has_cs <- FALSE

if (has_scholar && !is.null(cs)) {
  has_cs <- TRUE
}
```


```{r eval=has_cs}
## remove some 'bad' records without sufficient information
cs <- dplyr::filter(cs, !is.na(year) & year > 1900) 

ggplot(cs, aes(year, cites, group=name, color=name)) + 
  geom_line() + theme(legend.position="bottom")
```

```{r eval=has_scholar}
## Compare their career trajectories, based on year of first citation
csc <- compare_scholar_careers(ids)
```

```{r echo=FALSE, results="hide", message=FALSE}
has_csc <- FALSE

if (has_scholar && !is.null(csc)) {
  has_csc <- TRUE
}
```


```{r eval=has_csc}
ggplot(csc, aes(career_year, cites, group=name, color=name)) + 
  geom_line() + geom_point() +
  theme(legend.position = "inside", 
    legend.position.inside=c(.2, .8)
  )
```

# Visualizing and comparing network of coauthors

```{r eval=has_scholar}
# Be careful with specifying too many coauthors as the visualization of the
# network can get very messy.
coauthor_network <- get_coauthors('DO5oG40AAAAJ', n_coauthors = 4)

coauthor_network
```

```{r echo=FALSE, results="hide", message=FALSE}
has_coauthor <- FALSE

if (has_scholar && (nrow(coauthor_network) > 1)) {
  has_coauthor <- TRUE
}
```

And then we have a built-in function to plot this visualization.

```{r eval=has_coauthor}
plot_coauthors(coauthor_network)
```

Note however, that these are the coauthors listed in Google Scholar profile and not coauthors from all publications.

# Formatting publications for CV

The `format_publications` function can be used for example in conjunction with the [`vitae`](https://pkg.mitchelloharawild.com/vitae/) package to format publications in APA Style. The short name of the author of interest (e.g., of the person whose CV is being made) can be highlighted in bold with the `author.name` argument. The function after the pipe allows rmarkdown to format them properly, and the code chunk should be set to `results = "asis"`.

### APA style

```{r results = "asis", eval=has_scholar}
format_publications("DO5oG40AAAAJ", "Guangchuang Yu") |> head() |> cat(sep='\n\n')
```

### Numbering format

```{r results = "asis", eval=has_scholar}
format_publications("DO5oG40AAAAJ", "Guangchuang Yu") |> head() |> print(quote=FALSE)
```
