%\VignetteIndexEntry{econet}
%\VignetteEngine{R.rsp::tex}
\documentclass[nojss]{jss}
\usepackage{amsmath,array,multirow,bm,amssymb,amsthm,orcidlink,thumbpdf,lmodern}
\graphicspath{{Figures/}}

\newtheorem{definition}{Definition}

\author{Marco Battaglini~\orcidlink{0000-0001-9690-0721}\\Cornell University, EIEF, NBER
  \And Valerio Leone Sciabolazza~\orcidlink{0000-0003-2537-3084}\\Sapienza University of Rome, CEIS
  \AND Eleonora Patacchini~\orcidlink{0000-0002-3510-2969}\\ Cornell University, EIEF
  \And Sida Peng\\Microsoft Research}
\Plainauthor{Marco Battaglini, Valerio Leone Sciabolazza, Eleonora Patacchini, Sida Peng}

\title{\pkg{econet}: An \proglang{R}~Package for Parameter-Dependent Network Centrality Measures}
\Plaintitle{\pkg{econet}: An R Package for Parameter-Dependent Network Centrality Measures}
\Shorttitle{\pkg{econet}: Parameter-Dependent Network Centrality Measures in \proglang{R}}

\Abstract{
  The \proglang{R}~package \pkg{econet} provides methods for
  estimating parameter-dependent network centrality measures with
  linear-in-means models.  Both nonlinear least squares and maximum likelihood
  estimators are implemented.  The methods allow for both link and node
  heterogeneity in network effects, endogenous network formation and the
  presence of unconnected nodes.  The routines also compare the explanatory
  power of parameter-dependent network centrality measures with those of
  standard measures of network centrality.  Benefits and features of the
  \pkg{econet} package are illustrated using data from
  \cite{Battaglini+Patacchini:2018} and
  \cite{Battaglini+Sciabolazza+Patacchini:2020}.
}

\Keywords{network econometrics, heterogeneous peer effects, endogenous network formation, least-squares estimators, maximum likelihood estimators, \proglang{R}}
\Plainkeywords{network econometrics, heterogeneous peer effects, endogenous network formation, least-squares estimators, maximum likelihood estimators, R}

\Volume{102}
\Issue{8}
\Month{April}
\Year{2022}
\Submitdate{2018-07-05}
\Acceptdate{2021-08-17}
\DOI{10.18637/jss.v102.i08}

\Address{
  Marco Battaglini\\
  Department of Economics\\
  Cornell University\\
  Ithaca, NY 14850, United States of America\\
  \emph{and} EIEF \emph{and} NBER\\
  E-mail: \email{battaglini@cornell.edu}\\

  Valerio Leone Sciabolazza\\
  Department of Economics and Law\\
  Sapienza University of Rome\\
  Rome, 00161, Italy\\
  \emph{and} CEIS\\
  E-mail: \email{valerio.leonesciabolazza@uniroma1.it}\\

  Eleonora Patacchini\\
  Department of Economics\\
  Cornell University\\
  Ithaca, NY 14850, United States of America\\
  \emph{and} EIEF\\
  E-mail: \email{ep454@cornell.edu}\\

  Sida Peng\\
  Office of Chief Economist\\
  Microsoft Research\\
  Redmond, WA 14865, United States of America\\
  E-mail: \email{sidpeng@microsoft.com}
}

\begin{document}

\vspace*{-0.5cm}

\section{Introduction} \label{sec:intro}

Since its inception, network analysis has mostly focused on the discovery of
topological properties of network structures.  This has changed dramatically
over the past ten years.  An emerging literature in economics has shown that
network centrality measures, which were traditionally viewed as descriptive,
have an interpretation within equilibrium models of behavior.  The pioneering
paper is \cite{Ballester+Armengol+Zenou:2006}.  This paper considers a model
in which an agent's effort is triggered by the effort of his/her socially
connected peers.  It shows that the equilibrium  levels of effort are linear
functions of the agent's position in the network as measured by an indicator
within the family of Katz-Bonacich centralities.  Katz-Bonacich centralities
\citep{Katz:1953,Bonacich:1972,Bonacich:1987} are network centrality
measures that count all nodes that can be reached through a direct or
indirect path, penalizing in different ways the contributions of distant
nodes in determining a given node's centrality.  The discount factor is
captured by a parameter, thus making the measures of centrality
parameter-dependent.  The sociological literature has treated this
parameter as a nuisance parameter, arbitrarily setting it to a value
smaller than one.  However, the contribution of
\cite{Ballester+Armengol+Zenou:2006} is to show that this parameter captures
the strength of peer effects or social interactions that stem from the
aggregation of dyadic peer influences.  More specifically, the empirical
counterpart of the \cite{Ballester+Armengol+Zenou:2006} equilibrium
condition is a linear model of social interactions, where the individual
levels of effort are linear functions of the levels of effort of the
connected agents.  The parameter capturing this influence is then used to
measure the individual's importance in the network.  Since then, a
burgeoning empirical literature has used the linear-in-means model to show
that peer effects and the individual position in the social network play an
important role in explaining many social and economic outcomes, including
consumer behavior, voting patterns, job search, information diffusion,
innovation adoption, international trade and risk sharing \citep[see
e.g.,][for recent
reviews]{An:2011,An:2015a,Jackson+Rogers+Zenou:2017,Hsieh:2020,Zenou:2016}.

Very recently, \cite{Battaglini+Patacchini:2018} present a new theory of
competitive vote-buying to study how interest groups allocate campaign
contributions when legislators care about the behavior of other legislators
to whom they are socially connected.  The model provides an alternative
microfoundation for network measures within the family of Katz-Bonacich
centralities.  This theory predicts that, in equilibrium, campaign
contributions are proportional to a parameter-dependent measure of network
centrality, similar to the one proposed by
\cite{Ballester+Armengol+Zenou:2006}.  While
\cite{Ballester+Armengol+Zenou:2006} is a purely theoretical paper,
\cite{Battaglini+Patacchini:2018} test the theory with data from five recent
United States (US) Congresses.  In doing so, they confront a variety of empirical
challenges.  For example, the theories described above show the importance
of combining the information on network centrality with additional
information on characteristics of the agents, since these characteristics
can magnify or reduce the role played by a central agent.  Moreover, the
theories provide a framework to study the role of network endogeneity.  In
fact, when agents are strategic in choosing their peers, and omitted
variables (such as social skills) drive both agents' behavior and social
connectedness, the estimation of the peer effects parameter might be flawed
\citep{Manski:1993}.\footnote{See also \cite{An:2015a} and
\cite{VanderWeele+An:2013} for a discussion on the difficulties in studying
peer effects.} \cite{Battaglini+Sciabolazza+Patacchini:2020} derive a model
to control for network endogeneity allowing for a two-stage correction \`{a}
la Heckman \citep{Heckman:1979} and demonstrate the relevance of alumni
connections in shaping politicians' legislative effectiveness.  The common
trait among all these theoretical models is that they establish a link
between observable outcomes associated with a network node (for example,
educational attainments as proxies of effort levels in
\cite{Ballester+Armengol+Zenou:2006}; the money received by politicians in
\cite{Battaglini+Patacchini:2018}; the levels of legislative effectiveness
in \cite{Battaglini+Sciabolazza+Patacchini:2020}) and the respective
centrality of the node.  This theoretical link can be used to estimate the
parameter in the centrality measure using the observable outcomes.  This is
useful because, for example, it allows one to test for network effects or to
acquire a deeper knowledge of the topological features of the respective
networks.

The routines contained in the package \pkg{econet} \citep{econet} for \proglang{R}
\citep{R} allow for implementation of a number of variations of the
linear-in-means model to obtain alternative centrality measures within the
family of Katz-Bonacich centralities.  Both nonlinear least squares (NLLS) and
maximum likelihood (ML) estimators are provided.  Several methods for
dealing with the identification of network effects are implemented. 
Moreover, the \pkg{econet} package allows for comparison of the explanatory
power of parameter-dependent network centrality measures with those of
standard measures of network centrality \citep{Wasserman+Faust:1994}.  As a
result, \pkg{econet} expands the large set of tools available to
\proglang{R} users interested in network analysis.  Specifically, it has at
least four merits.  First, it complements the \proglang{R}~packages
implementing traditional individual-level centrality measures for binary
networks, \pkg{igraph} \citep{igraph} and \pkg{sna} \citep{sna}, and
weighted networks, \pkg{tnet} \citep{tnet}, and group-level centrality
measures for both binary and weighted networks, \pkg{keyplayer}
\citep{An+Liu:2016}, by introducing new eigensolution-based
techniques to rank individual agents' centrality.  Second, whereas previous
packages, such as \pkg{btergm} \citep{btergm}, \pkg{hergm} \citep{hergm},
the \pkg{statnet} suite \citep{statnet}, and \pkg{xergm} \citep{xergm},
created environments for modeling the statistical processes underlying
network formation, \pkg{econet} provides the first framework to investigate
the socio-economic processes operating on networks (i.e.,~peer effects). 
Third, it completes the collection of functions for modeling spatial
dependence in cross-sectional data provided by \pkg{spdep} \citep{spdep} and
\pkg{splm} \citep{splm}, by allowing the users to: i) consider the presence
of unconnected nodes, and ii) address network endogeneity.  Finally, it
equips the \proglang{R} archive with routines still unavailable in other
commonly used software for the investigation of relational data, such as
\proglang{MATLAB} \citep{MATLAB}, \proglang{Pajek} \citep{Batagelj:2003},
\proglang{Python} \citep{Python} and \proglang{Stata} \citep{Stata}.  The
example we use to showcase the functionality of our \proglang{R}~package is
taken from \cite{Battaglini+Patacchini:2018} and
\cite{Battaglini+Sciabolazza+Patacchini:2020}.  The \proglang{R}~package
\pkg{econet} \citep{econet} is available from the Comprehensive \proglang{R} Archive Network (CRAN) at
\url{https://CRAN.R-project.org/package=econet}.

The rest of the paper is organized as follows.  Section~\ref{sec:theory}
briefly reviews the theoretical background of the different approaches used
to model the socio-economic processes operating on the network. 
Section~\ref{sec:models} discusses the key elements for the estimation of
parameter-dependent centralities, and presents a general taxonomy of the
models implemented by the \proglang{R}~package \pkg{econet}. 
Section~\ref{sec:endogeneity} lays out various models and methods to deal
with network endogeneity.  Section~\ref{sec:econet} demonstrates, with
examples, the use of the main functions of the package \pkg{econet} to
determine agents' centrality.  Section~\ref{sec:conclusion} concludes.

%% -- Manuscript 
\section{Microeconomic foundation} \label{sec:theory}

This section provides a theoretical background for the network models of
peer effects implemented by the \proglang{R}~package \pkg{econet}.  Many
network centrality measures have been introduced in the literature, each
capturing different aspects of network topology.  What is the correct way
to measure how central an agent is in a network?  In this section we
describe three economic models that derive conditions under which the
Katz-Bonacich centralities that can be estimated using \pkg{econet} are the
correct measures of an agent's centrality.  The aim is to acquaint the
researcher interested in working with \pkg{econet} with the different
theoretical premises of these models, so that he/she can choose the model
and the corresponding \pkg{econet} functions most appropriate for conducting
his/her investigation.

Model A describes the competition between two or more lobbyists who
distribute monetary contributions among $n$ legislators to influence
their votes.  Each lobbyist aims at maximizing the number of legislators
that vote for his/her own preferred policy option.  Two legislators are
socially connected if they derive utility from voting in the same way.  The
model provides conditions under which, in the unique Nash equilibrium of the
game, the money promised to legislator $i$ is proportional to the
Katz-Bonacich centrality of $i$.  The model therefore provides a
clear economic interpretation of the Katz-Bonacich centrality and
illustrates its relevance in this context.  This model was first presented in
\cite{Battaglini+Patacchini:2018}.

Model B studies the extent to which social connections influence the
legislative effectiveness of members of the US Congress.  By legislative
effectiveness we mean the ability of a legislator to pass legislation.  In
this model, the effectiveness in passing legislation of Congress member
$i$ is described by a ``production function'' in which the inputs are
the Congress member $i$'s effort and the effectiveness of all other
socially connected Congress members, each weighted by the strength of their
social link with $i$.  To determine the optimal level of effort, a
legislator needs to predict the equilibrium effectiveness of all the
socially linked Congress members, who in turn need to do the same in a Nash
equilibrium.  Here too, the model provides conditions under which the
effectiveness of a Congress member in equilibrium is proportional to a
weighted version of the Katz-Bonacich centrality of the legislator, in which
the weights are a specific function of the legislator's characteristics. 
This model was first presented in
\cite{Battaglini+Sciabolazza+Patacchini:2020}.

Models A and B are not alternative ways to represent the same economic
problem.  Models A and B describe completely different social interactions:
the first, a competitive game between two lobbyists; the second, a
cooperative game between $n$ legislators.\footnote{The difference is
not just in the interpretation.  The models are formally different games in
a game theoretic sense: the sets of players are different, the strategy spaces
are different, the payoffs are different.} Models A and B are therefore
relevant to our discussion because they show how similar centrality
measures can emerge as relevant in completely different contexts.  Since the
tools provided in \pkg{econet} can be used to estimate both centralities,
these examples illustrate how \pkg{econet} can be useful in studying
completely different social problems.

Section~\ref{sec:alternative} also briefly discusses the popular network
model of peer effects by \cite{Ballester+Armengol+Zenou:2006}. 
Interestingly, the prediction of this model is the same as that derived from
Model B.  This implies that the functions contained in \pkg{econet} to
estimate Model B can also be used to test the predictions of the model by
\cite{Ballester+Armengol+Zenou:2006}.  The difference between Model B and
\cite{Ballester+Armengol+Zenou:2006} is in the way the strategic environment
is modeled.  In Model B, the productivity of $i$ is affected by the
productivity of the other socially connected players; in
\cite{Ballester+Armengol+Zenou:2006}, it is assumed that the cost of effort
of $i$ depends on the effort level of the other players.  In
\cite{Ballester+Armengol+Zenou:2006}, the actions (i.e.,~the levels of
effort) are predicted to be equal to the Katz-Bonacich centralities; in
Model B, it is the outcomes that are predicted to be equal to the
centralities.  Model B is better suited for empirical analysis because
effectiveness can often be observed and measured, whereas effort cannot. 
Studies that have attempted to test the predictions of
\cite{Ballester+Armengol+Zenou:2006} have approximated effort with output,
but this approximation is not valid if other unobserved factors affect
output.  The model in \cite{Ballester+Armengol+Zenou:2006}, however, is an
important reference
point because it is one of the first models to study these issues. The tools
provided by \pkg{econet} are also useful in estimating the parameters of the
model in \cite{Ballester+Armengol+Zenou:2006}.

\subsection{Setup of model A}\label{sec:modelA}

\cite*{Battaglini+Patacchini:2018}, BP henceforth, consider a model in which
a legislature with $n$ members chooses between two alternatives: a new
policy, denoted by $A$, and a status quo policy, denoted by $B$.\footnote{We
present here a simplified version of the model in BP for brevity.  We refer
to the original paper for the more general version.} Legislator $i$'s
utility of voting for policy $p$, denoted by $U^{i}({\bf x}(p))$, is:
%
\begin{equation}
U^{i}({\bf x}(p))=\omega \left( s^{i}(p)\right) +\phi
\sum_{j}g_{i,j}x_{j}(p)+\varepsilon _{p}^{i}  \label{U1}
\end{equation}%
%
The first term in Equation~\ref{U1} is the utility of the interest groups'
contributions: $s^{i}(p)$ is the contribution pledged in exchange for a vote
for $p$, and $\omega \left( s\right) $ is the utility that legislator $i$
receives from a contribution $s$.  The function $\omega (\cdot )$ is
increasing, concave and differentiable with $\lim_{s\rightarrow 0}\omega
^{\prime }(s)=\infty $, $\lim_{s\rightarrow \infty }\omega ^{\prime }(s)=0$. 
The second term in Equation~\ref{U1} describes the social interaction effects.  The
social network is described by an $n\times n$ matrix $\boldsymbol{G}$ with
generic element $g_{i,j}>0$, which measures the strength of the social
influence of legislator $j$ on legislator $i$; $x_{j}(p)$ is an indicator
function equal to one if legislator $j$ chooses $p$ and zero otherwise.  The final term in
Equation~\ref{U1} represents other exogenous factors that may affect $i$'s
preference for or aversion to voting for $p$.  The terms are normalized so
that $\varepsilon _{A}^{i}=\varepsilon ^{i}$, where $\varepsilon ^{i}$ can
be positive or negative, and $\varepsilon _{B}^{i}$ is set at
zero.\footnote{Obviously, it is natural to assume that the legislators care
about the outcome of the vote.  This effect of their vote is proportional to
the probability of being pivotal: that is, the case in which $A$ and $B$
votes are tied, or one of them is one vote below the other.  The exact
pivotal probabilities are computed and incorporated in the legislators'
expected utilities in the analysis in BP.  Here we omit these terms for
simplicity, since in any case they are very small in an election with
hundreds of voters.}

Two interest groups, also denoted $A$ and $B$, attempt to influence the
policy outcome.  Interest group $A$ is interested in persuading as many
legislators as possible to choose policy $A$; interest group $B$, instead, is
interested in persuading the legislators to choose policy $B$.  Each
interest group is endowed with a budget $W$ and promises a contingent
payment to each legislator who follows its recommendation.  Specifically,
interest group $A$ promises a vector of payments ${\bf
s}_{A}=(s_{A}^{1},\dots, s_{A}^{n})$ to the legislators, where $s_{A}^{i}$ is
the payment received by legislator $i$ if he/she chooses $A$; similarly,
interest group $B$ promises a vector of payments ${\bf
s}_{B}=(s_{B}^{1},\dots,s_{B}^{n})$ to the legislators, where $s_{B}^{i}$ is
the payment received by legislator $i$ if he/she votes for $B$.

Legislator $i$ is willing to vote for $A$ if and only if $E\left[
U_{B}^{i}(x)-U_{A}^{i}(x)\right] \leq 0$.  It is assumed that the interest
groups do not know with certainty the legislators' preferences, and so are
unable to perfectly forecast how payments affect their voting behavior: so
$\varepsilon ^{i}$ is assumed to be an independent, uniformly distributed
variable with mean zero and density $\Psi >0$, whose realization is observed
only by $i$.  Observing that the probability that legislator $i$ votes for
$A$ is $\varphi _{i}=\E(x_{i}(A))$, we therefore have that $i$ votes for $A$
if and only if:
%
\begin{equation}
\varepsilon^{i}\leq \omega (s_{A}^{i})-\omega (s_{B}^{i})+\phi
\sum\nolimits_{j}g_{i,j}\left( 2\varphi _{j}-1\right) .  \label{p1}
\end{equation}%
%
From Equation~\ref{p1}, we have that in an interior solution in which all
probabilities are in $(0,1)$, the legislators' probabilities of choosing
$A$, $\varphi =(\varphi _{1},\dots,\varphi _{n})$, are characterized by the
nonlinear system:
%
\begin{equation}
\left( 
\begin{array}{c}
\varphi _{1} \\ 
\dots \\ 
\varphi _{n}%
\end{array}%
\right) =\left( 
\begin{array}{c}
1/2+\Psi \left( \omega (s_{A}^{1})-\omega (s_{B}^{1})+\phi
\sum\nolimits_{j}g_{1,j}\left( 2\varphi _{j}-1\right) \right) \\ 
\dots \\ 
1/2+\Psi \left( \omega (s_{A}^{n})-\omega (s_{B}^{n})+\phi
\sum\nolimits_{j}g_{n,j}\left( 2\varphi _{j}-1\right) \right)%
\end{array}%
\right)  \label{sys1}
\end{equation}%
%
which gives a unique vector of equilibrium probabilities $\varphi
(s)=\{\varphi _{1}(s),\dots,\varphi _{n}(s)\}$.
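
The step from Equation~\ref{p1} to Equation~\ref{sys1} can be sketched as
follows.  The shorthand $\Delta\boldsymbol{\omega}$ and $\boldsymbol{v}$ is
introduced here for illustration only and does not appear in BP.  A mean-zero
uniform variable with density $\Psi$ has support
$[-1/(2\Psi),1/(2\Psi)]$, so for any threshold $z$ inside the support:
%
\[
\Pr (\varepsilon ^{i}\leq z)=\frac{1}{2}+\Psi z.
\]
%
Evaluating this expression at the right-hand side of Equation~\ref{p1} gives
each row of Equation~\ref{sys1}.  For given payments, the system is linear in
the probabilities.  Writing $\Delta \omega _{i}=\omega (s_{A}^{i})-\omega
(s_{B}^{i})$ and $\boldsymbol{v}=\varphi -\frac{1}{2}{\bf 1}$, and assuming
that $I-2\Psi \phi \cdot \boldsymbol{G}$ is invertible (for example, when
$|2\Psi \phi |$ times the spectral radius of $\boldsymbol{G}$ is smaller than
one), we have:
%
\[
\boldsymbol{v}=\Psi \Delta \boldsymbol{\omega }+2\Psi \phi \cdot
\boldsymbol{G}\boldsymbol{v}
\quad \Longrightarrow \quad
\varphi =\frac{1}{2}{\bf 1}+\Psi \left[ I-2\Psi \phi \cdot
\boldsymbol{G}\right] ^{-1}\Delta \boldsymbol{\omega }.
\]
%
This closed form is valid only as long as the resulting probabilities lie in
$(0,1)$, which is the interior case considered in the text.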

The game proceeds as follows.  In stage 1, the lobbyists simultaneously
commit to a vector of payments ${\bf s}_{A}$ and ${\bf s}_{B}$, without
observing $\varepsilon^{i}_{p}$.  A strategy for interest group $l$
is a probability distribution over the set of feasible transfers $S$, that
is: \[S=\{s:\sum\nolimits_{i}s^{i}\leq W,\text{ }s^{i}\geq 0\text{ {\it for}
} i=1,\dots,n\}.\]

In stage 2, the Congress members see the vector of payments and the shocks
$\varepsilon^{i}_{p}$ and optimally decide how to vote.  The lobbyists
therefore expect the Congress members to vote with probabilities $\varphi
(s)=\{\varphi _{1}(s),\dots,\varphi _{n}(s)\}$ given by Equation~\ref{sys1}.

A pair of strategies constitutes a Nash equilibrium if they are mutually
optimal: the strategy of interest group $A$ maximizes the expected
number of legislators who adopt $A$ given $\varphi$ and interest
group $B$'s strategy; and the strategy of interest group $B$
minimizes the expected number of legislators who adopt $A$ given
$\varphi$ and interest group $A$'s strategy.

Interest group $A$ solves:
%
\begin{equation}
\max_{{\bf s}_{A}\in S}\left\{ \sum_{i}\left[ \varphi _{i}({\bf s}_{A},{\bf s%
}_{B})\right] \right\}  \label{eq2}
\end{equation}%
%
taking ${\bf s}_{B}$ as given. Interest group $B$'s problem is the
mirror image of $A$'s problem, as it attempts to minimize the objective
function of Equation~\ref{eq2} taking ${\bf s}_{A}$ as given. The equilibrium
solution must satisfy the first order condition:
%
\begin{equation}
\sum\nolimits_{j}\partial \varphi _{j}({\bf s}_{A},{\bf s}_{B})/\partial
s_{l}^{i}=\lambda _{l}\text{ and }\sum\nolimits_{j=1}^{n}s_{l}^{j}=W\text{
for }i=1,\dots,n,\text{ }l=A,B  \label{eq3}
\end{equation}
%
where $\lambda_{l}$ is the Lagrangian multiplier associated with the budget
constraint $\sum\nolimits_{i}s_{l}^{i}\leq W$ in interest group $l$'s
problem.  BP show that the problem of Equation~\ref{eq2} is well behaved and fully
characterized by Equation~\ref{eq3}; in equilibrium, moreover, $A$ and $B$ have the
same Lagrangian multipliers $\lambda_{A}=\lambda _{B}=\lambda _{\ast}$, so
the first-order condition is:
%
\[
D{\bf \varphi }^{\top}\cdot {\bf 1}=\lambda _{\ast } 
\]%
%
where $D{\bf \varphi }^{\top}{\bf =}(\partial \varphi _{1}^{\ast }/\partial
s_{A}^{i},\dots,\partial \varphi _{n}^{\ast }/\partial s_{A}^{i})$.

To understand the relationship between lobbying and centralities we need to
``unpack'' the voting probabilities.
Differentiating Equation~\ref{sys1} and rearranging, we obtain:
%
\begin{equation}
D{\bf \varphi}=\Psi \left[ I-2\Psi \phi \cdot \boldsymbol{G}\right] ^{-1}D{\bf \omega}
\label{fi_3}
\end{equation}%
%
where $D{\bf \varphi}$ and $D{\bf \omega}$ are the Jacobians of,
respectively, ${\bf \varphi }$ and ${\bf \omega}$. Using Equation~\ref{fi_3}, we
can rewrite the first order condition for the optimality of the lobbyists as:%
%
\begin{eqnarray}
D{\bf \varphi }^{\top}\cdot {\bf 1} &=&\Psi \cdot D{\bf \omega }^{\top}\cdot \left(
I-\phi ^{\ast }\cdot \boldsymbol{G}^{\top}\right) ^{-1}\cdot {\bf 1}=\lambda _{\ast }
\label{fi_5} \\
&\Rightarrow &D{\bf \omega }^{\top}\cdot {\bf b}\left( {\bf \phi }^{\ast
},\boldsymbol{G}^{\top}\right) =\lambda _{\ast }/\Psi   \nonumber
\end{eqnarray}
%
where $\phi ^{\ast }=2\Psi \phi $ and ${\bf b}\left( {\bf \phi }^{\ast },
\boldsymbol{G}^{\top}\right) $ is the vector of Bonacich centralities of the
matrix $\boldsymbol{G}^{\top}$ with parameter $\phi ^{\ast }$.  Note that $D{\bf
\omega}$ is a vector of zeros except for its $i$-th element that is equal to
$\omega ^{\prime}(s_{\ast }^{i})$.  We can therefore write our necessary
and sufficient condition of Equation~\ref{fi_5} as:
%
\begin{equation}
b_{i}\left( \phi ^{\ast },\boldsymbol{G}^{\top}\right) \cdot \omega ^{\prime }(s_{\ast
}^{i})=\lambda _{\ast }\text{ for }i=1,\dots,n  \label{fi_4}
\end{equation}%
%
where, without loss of generality, we have incorporated the constant $\Psi$
in the Lagrangian multiplier $\lambda _{\ast}$. The necessary and
sufficient condition of Equation~\ref{fi_4} shows the determinants of the interest
group's monetary allocation. The interest group chooses $s_{\ast}^{i}$
equalizing the marginal cost of resources to its marginal benefit.  The
marginal cost is measured by the Lagrangian multiplier $\lambda_{\ast}$ of
the problem in Equation~\ref{eq2}.  The marginal benefit is measured by the
increase in expected
votes for $A$.  Equation~\ref{fi_4} makes clear that, because of network
effects, the direct benefit of making a transfer to $i$ is magnified by a
factor that is exactly equal to $b_{i}\left(\phi
^{\ast},\boldsymbol{G}^{\top}\right) $, the Bonacich centrality of $i$ in
$\boldsymbol{G}^{\top}$ with parameter $\phi ^{\ast}$.
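
To illustrate Equation~\ref{fi_4} with a concrete functional form, consider
the assumption $\omega (s)=\ln s$.  This choice is ours, made only for
illustration, and is not imposed in BP; it is increasing, concave and
differentiable, and $\omega ^{\prime }(s)=1/s$ satisfies the two limit
conditions stated above.  Recall from Equation~\ref{fi_5} that ${\bf
b}\left( \phi ^{\ast },\boldsymbol{G}^{\top}\right) =\left( I-\phi ^{\ast
}\cdot \boldsymbol{G}^{\top}\right) ^{-1}\cdot {\bf 1}$, and write $b_{i}$ for
its $i$-th element, assumed here to be strictly positive.  Equation~\ref{fi_4}
then becomes $b_{i}/s_{\ast }^{i}=\lambda _{\ast }$, and imposing the budget
constraint $\sum\nolimits_{j}s_{\ast }^{j}=W$ pins down the multiplier:
%
\[
\lambda _{\ast }=\frac{1}{W}\sum\nolimits_{j}b_{j}
\quad \Longrightarrow \quad
s_{\ast }^{i}=W\cdot \frac{b_{i}}{\sum\nolimits_{j}b_{j}}\text{ for }i=1,\dots,n.
\]
%
Under this assumption, each legislator is promised a share of the budget equal
to his/her share of total Bonacich centrality.  Other admissible choices of
$\omega (\cdot )$ would give a different, generally nonlinear, mapping from
centrality to contributions.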

\subsection{Setup of model B}\label{sec:modelB}

\cite*{Battaglini+Sciabolazza+Patacchini:2020}, BLP henceforth, consider a
congress comprised of $n$ legislators, where ${\cal N}=\{1,\dots,n\}$ is the
set of legislators.  Each legislator has a pet legislative project that
she/he wants to implement.  The goal of each legislator is to maximize
her/his legislative effectiveness, measured by the probability of
implementing the project.  They assume that legislator $i$'s legislative
effectiveness at the $r$-th congress $\mathbf{y}_{\mathbf{r}, i}$ is a
function of $i$'s characteristics, her/his effort and the legislative
effectiveness of all the legislators that $i$ has befriended.  Specifically,
the technology is assumed to be:
%
\begin{equation}
\mathbf{y}_{\mathbf{r}, i}=A_{r, i}+\varphi \sqrt{\sum_{j}g_{i,j}\mathbf{y}_{\mathbf{r}, j}}\cdot l_{i}
\label{E}
\end{equation}
%
Equation~\ref{E} represents the ``production
function'' for legislative effectiveness.  The first term,
$A_{r, i}$, is a fixed effect idiosyncratic to $i$.\footnote{This term may
include a variety of characteristics that have been highlighted in the
existing literature as important for effectiveness: the legislator's
seniority, gender and race (potentially in the presence of
discrimination) and the legislator's position in the committee system and
party hierarchy.} The second term, which is new in our model, captures the
importance of social connections.  The social network is described by an
$n\times n$ matrix $\boldsymbol{G}$ with the generic element $g_{i,j}$ that
measures the strength of the social influence of legislator $j$ on
legislator $i$.  The adjacency matrix can be as simple as tracking the
connections between legislators $i$ and $j$, for example, $g_{i,j}=1$ if $i$ is
connected to $j$ ($j\neq i$) and $g_{i,j}=0$ otherwise.  We set $g_{i,i}=0$. 
The level of effort is $l_{i}$; the cost of exerting a level of effort
$l_{i}$ is $\left( l_{i}\right) ^{2}/2$.

A strategy for a legislator is described by a function $l_{i}:{\cal
T}\rightarrow \left[ 0,1\right]$, mapping $i$'s type $A_{r, i}$ to an effort
level.  It is assumed that when the floor opens for business, each
legislator $i$ chooses her/his own level of effort $l_{i}$ simultaneously,
taking as given the social network and her/his own expectations of the other
legislators' effectiveness.  Given the optimal reaction functions, the
levels of effectiveness are endogenously determined by Equation~\ref{E}.\footnote{
This approach is similar to that in general equilibrium theory in economics
where consumers choose their optimal consumption taking prices as given:
here legislators choose their levels of effort taking the other legislators'
effectiveness as given.  As in general equilibrium theory (where prices are
endogenous since they need to clear markets), here the levels of
effectiveness are endogenous since they must satisfy the externality
Equation~\ref{E} given the optimal effort levels.}

The optimal level of effort $l_{i}$ of a type $i=1,\dots,n$ solves the
problem:
%
\begin{equation*}
\max\limits_{l_{i}}\left\{ A_{r, i}+\varphi \left(
\sqrt{\sum_{j=1}^{n}g_{i,j}\mathbf{y}_{\mathbf{r}, j}}\right) \cdot
l_{i}-\left( l_{i}\right) ^{2}/2\right\} \text{,} \label{E_0}
\end{equation*}%
%
taking $\mathbf{y}_{\mathbf{r}}=(\mathbf{y}_{\mathbf{r},
1},\dots,\mathbf{y}_{\mathbf{r}, n})$ as given.  Substituting the solution to
this maximization problem into Equation~\ref{E}, we obtain that the
equilibrium levels of legislative effectiveness for a type $i=1,\dots ,n$ are
given by: $\mathbf{y}_{\mathbf{r}, i}=A_{r, i}+\frac{\varphi
^{2}}{2}\sum_{j=1}^{n}g_{i,j}\mathbf{y}_{\mathbf{r}, j}$.  These equations
can be expressed in matrix form as:%
%
\begin{equation}
\left[ I-\left( \varphi ^{2}/2\right) \cdot \boldsymbol{G}\right] \cdot \mathbf{y}_{\mathbf{r}}={\bf A_r}
\label{E_01}
\end{equation}%
%
where $\mathbf{y}_{\mathbf{r}}=(\mathbf{y}_{\mathbf{r},
1}(\boldsymbol{G},{\bf A_r}),\dots,\mathbf{y}_{\mathbf{r},
n}(\boldsymbol{G},{\bf A_r}))^{\prime }$ is the vector of legislative
effectiveness $\mathbf{y}_{\mathbf{r}, i}(\boldsymbol{G},{\bf A_r})$ solving
Equation~\ref{E_01}, and ${\bf A_r}=(A_{r,1},\dots ,A_{r,n})^{\prime}$ is the vector
of types' characteristics.  The equilibrium levels of effectiveness are
therefore uniquely defined as:
%
\[
\left( \mathbf{y}_{\mathbf{r}, 1}(\boldsymbol{G},{\bf
A_r}),\dots,\mathbf{y}_{\mathbf{r}, n}(\boldsymbol{G},{\bf A_r})\right)
^{\prime }=\left[ I-\left( \varphi ^{2}/2\right) \cdot
\boldsymbol{G}\right] ^{-1}{\bf A_r},
\]%
%
whose $i$-th element is the weighted Katz-Bonacich centrality of legislator
$i$ in network $\boldsymbol{G}$ with discount factor $\varphi ^{2}/2$ and
weights ${\bf A_r}=(A_{r,1},\dots,A_{r,n})^{\prime}$.  Without social
spillovers (i.e.,~$\varphi =0$), the effectiveness of each legislator depends
only on her/his own characteristics $A_{r,i}$.  In the presence of social
spillovers among connected legislators (i.e.,~$\varphi >0$), however, the
effectiveness of any legislator depends on the characteristics of all other
legislators, with each legislator weighted using their distance in the
network (the weights given by the rows of $\left[I-\left( \varphi
^{2}/2\right) \cdot \boldsymbol{G}\right] ^{-1}$).  The standard model is
nested as a special case of the more general model (with $\varphi
=0$), and so we are able to test if social connections improve the fit of
our estimates of legislative effectiveness.
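
The distance weighting can be made explicit with a standard series expansion,
sketched here under the assumption that $\delta =\varphi ^{2}/2$ times the
spectral radius of $\boldsymbol{G}$ is smaller than one (the shorthand
$\delta$ is introduced for illustration only and is not used in BLP):
%
\[
\left[ I-\delta \cdot \boldsymbol{G}\right] ^{-1}{\bf A_r}
=\sum_{k=0}^{\infty }\delta ^{k}\boldsymbol{G}^{k}{\bf A_r}
={\bf A_r}+\delta \boldsymbol{G}{\bf A_r}+\delta ^{2}\boldsymbol{G}^{2}{\bf
A_r}+\dots
\]
%
The $k$-th term collects the characteristics of the legislators reached
through walks of length $k$ in the network, discounted by $\delta ^{k}$.  As
a minimal example, take two legislators who are connected to each other
($g_{1,2}=g_{2,1}=1$) and $\delta <1$.  Inverting the $2\times 2$ matrix
directly gives:
%
\[
\mathbf{y}_{\mathbf{r}, 1}=\frac{A_{r,1}+\delta A_{r,2}}{1-\delta ^{2}},
\qquad
\mathbf{y}_{\mathbf{r}, 2}=\frac{A_{r,2}+\delta A_{r,1}}{1-\delta ^{2}},
\]
%
so that, whenever $\delta >0$, the effectiveness of each legislator increases
with the characteristics of the other, and both expressions reduce to
$\mathbf{y}_{\mathbf{r}, i}=A_{r,i}$ when $\delta =0$.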

\subsection{Alternative setup}\label{sec:alternative}

Alternative microfoundations are games with linear-quadratic utilities that
capture linear externalities in agents' actions.  A popular setup is a
social network model of peer effects with conformity preferences.  Let
$\mathbf{y}_{\mathbf{r}, i}$ denote legislator $i$'s legislative
effectiveness at the $r$-th congress.  Denote by
$\overline{\mathbf{y}}_{\mathbf{r}, i}$ the average effort of individual
$i$'s peers, given by:%
%
\begin{equation*}
\overline{\mathbf{y}}_{\mathbf{r}, i}=\frac{1}{\bar g_{i}}\sum_{j=1}^{n}g_{i, j}\mathbf{y}_{\mathbf{r}, j},
\label{aver}
\end{equation*}%
%
where $\bar g_{i}=\sum_{j=1}^{n}g_{i,j}$.  Each legislator $i$ at the congress $r$ selects an effort
$\mathbf{y}_{\mathbf{r}, i}$, and obtains a payoff $u_{r,
i}(\mathbf{y}_{\mathbf{r}})$ that depends on the effort profile
$\mathbf{y}_{\mathbf{r}}$ in the following way:%
%
\begin{equation}
u_{r, i}(\mathbf{y}_{\mathbf{r}})=\left( a_{r, i}+\eta _{r}+\varepsilon _{r,
i}\right) \mathbf{y}_{\mathbf{r}, i}- \frac{1}{2}\mathbf{y}_{\mathbf{r},
i}^{2}-\frac{d}{2}\,(\mathbf{y}_{\mathbf{r},
i}-\overline{\mathbf{y}}_{\mathbf{r}, i})^{2}
\label{utility}
\end{equation}%
%
where $d>0$.  The benefit part of this utility function is given by $\left(
a_{r, i}+\eta _{r}+\varepsilon _{r, i}\right)\mathbf{y}_{\mathbf{r}, i}$,
while the cost is $\frac{1}{2}\mathbf{y}_{\mathbf{r}, i}^{2}$; both are
increasing in own effort $\mathbf{y}_{\mathbf{r}, i}$.  In the benefit part,
$a_{r, i}$ denotes the agent's ex-ante \textsl{idiosyncratic heterogeneity},
which is assumed to be deterministic, perfectly \textsl{observable} by all
individuals in the network and corresponds to the observable characteristics
of individual $i$ and to the observable average characteristics of
individual $i$'s peers.  To be more precise, $a_{r,i}$ can be written as:
%
\begin{equation*}
a_{r, i}=\sum_{m=1}^{M}\beta _{m}x_{r, i}^{m}+\frac{1}{\bar g_{i}}%
\sum_{m=1}^{M}\sum_{j=1}^{n}\theta _{m}g_{ij}\,x_{r, j}^{m}  \label{MUI}
\end{equation*}%
%
where $x_{r, i}^{m}$ is a set of $M$ variables accounting for observable
differences in individual characteristics of individual $i$, and $\beta
_{m},\theta _{m}$ are parameters.  In the utility function (Equation~\ref{utility}),
$\eta _{r}$ denotes the unobservable network characteristics and
$\varepsilon _{r, i}$ is an error term, meaning that there is some
uncertainty in the benefit part of the utility function.  Both $\eta _{r}$
and $\varepsilon _{r, i}$ are observed by the individuals but not by the
researcher.  The second part of the utility function,
$\frac{d}{2}\,(\mathbf{y}_{\mathbf{r},
i}-\overline{\mathbf{y}}_{\mathbf{r}, i})^2$, reflects the influence of
peers' behavior on own behavior.  It is such that each individual wants to
minimize the \emph{social distance} between herself and her reference group,
where $d$ is the parameter describing the \emph{taste for conformity}. 
Here, the individual loses utility $\frac{d}{2}\,(\mathbf{y}_{\mathbf{r},
i}-\overline{\mathbf{y}}_{\mathbf{r}, i})^2$ from failing to conform to
others.  This is the standard way economists have been modeling conformity
(see, among others, \citealp{Akerlof1980}, \citealp{Bernheim1994},
\citealp{Kandel1992}, \citealp{Akerlof1997}, \citealp{Fershtman1998},
\citealp{Patacchini2012},
\citealp{PatacchiniRainone2012}).\footnote{\cite{Ballester+Armengol+Zenou:2006}
and \cite{Armengol2009} present similar microfoundations for peer effects
where agents' behavior depends on the aggregate (rather than average)
behavior of peers.} The social norm $\overline{\mathbf{y}}_{\mathbf{r}, i}$\
can be interpreted as peers' social status.  Observe that the social norm
here captures the differences between individuals due to network effects. 
It means that individuals have different types of friends and thus different
reference groups $\overline{\mathbf{y}}_{\mathbf{r}, i}$.  As a result, the
social norm each individual $i$ faces is endogenous and depends on her
location in the network as well as the structure of the network.

In this game, where agents choose their effort level
$\mathbf{y}_{\mathbf{r}, i}\geq 0$ simultaneously, there exists a unique
Nash equilibrium \citep[see, e.g.,][]{Patacchini2012} given by:%
%
\begin{equation*}
\mathbf{y}_{\mathbf{r}, i}^{\ast }=\phi \frac{1}{\bar g_{i}}\sum_{j=1}^{n}g_{ij}\mathbf{y}_{\mathbf{r}, j}^{\ast
}+\left( 1-\phi \right) \left( a_{r, i}+\eta _{r}+\varepsilon _{r, i}\right) 
\label{FOC}
\end{equation*}%
%
where $\phi =d/(1+d)$.  The optimal effort level depends on the individual
ex-ante heterogeneity ($a_{r, i}$) and on the unobserved network characteristics
($\eta _{r}$), and it is increasing in the average effort of the reference
group. 
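
As a minimal sketch of where this expression comes from, assume an interior
solution and that agent $i$ takes $\overline{\mathbf{y}}_{\mathbf{r}, i}$ as
given (i.e.,~$g_{i,i}=0$; both are assumptions made here for illustration). 
Differentiating the utility function in Equation~\ref{utility} with respect
to own effort gives the first-order condition
%
\begin{align*}
0 &=\left( a_{r, i}+\eta _{r}+\varepsilon _{r, i}\right)
-\mathbf{y}_{\mathbf{r}, i}-d\,(\mathbf{y}_{\mathbf{r},
i}-\overline{\mathbf{y}}_{\mathbf{r}, i}), \\
\mathbf{y}_{\mathbf{r}, i} &=\frac{d}{1+d}\,\overline{\mathbf{y}}_{\mathbf{r},
i}+\frac{1}{1+d}\left( a_{r, i}+\eta _{r}+\varepsilon _{r, i}\right),
\end{align*}
%
which coincides with the equilibrium condition above because $1-\phi
=1/(1+d)$.  The second derivative of the utility function is $-(1+d)<0$, so
the condition is also sufficient.  For instance, $d=1$ gives $\phi =1/2$: own
characteristics and the average effort of peers receive equal weight.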

\section{Network models of peer effects} \label{sec:models}

Following the notation defined in the previous section, for each network $r$
with adjacency matrix $\boldsymbol{G}=[g_{ij}]$, the $k$-th power of
$\boldsymbol{G}$, given by $\boldsymbol{G}^{k}=
\underbrace{\boldsymbol{G}\cdots \boldsymbol{G}}_{k\text{ times}}$,
keeps track of direct and indirect connections in $r$.  More precisely, the
$(i,j)$-th cell of $\boldsymbol{G}^{k}$ gives the number of paths of length
$k$ in $r$ between $i$ and $j$.  In particular,
$\boldsymbol{G}^{0}=\boldsymbol{I}$.
%
\begin{definition}[\citealp{Katz:1953,Bonacich:1987}] \label{Def1}
%
Given a vector $\boldsymbol{u}\in \mathbb{R}_{+}^{n}$, and $\phi\geq 0$ a
small enough scalar, the vector of Katz-Bonacich centralities of parameter
$\phi$ in network $g$ is defined as:
%
\begin{equation}
\boldsymbol{b}\left( g,\phi \right) =\left( \boldsymbol{I}-\phi
\boldsymbol{G}\right) ^{-1}\boldsymbol{u}=\sum\limits_{p=0}^{\infty}\phi
^{p} \boldsymbol{G}^{p}\boldsymbol{u}.
\label{KB}
\end{equation}
\end{definition}
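
The two expressions in Definition~\ref{Def1} can be checked numerically.  The
following is a minimal sketch in base \proglang{R}, not the routine used by
\pkg{econet}: the four-node network \code{G}, the value \code{phi = 0.2} and
the truncation of the series at 200 terms are assumptions made only for
illustration.  Here ``small enough'' is read as $\phi$ being smaller than the
inverse of the spectral radius of $\boldsymbol{G}$.

\begin{CodeChunk}
\begin{CodeInput}
R> # Toy undirected network on four nodes (hypothetical data).
R> G <- matrix(c(0, 1, 1, 0,
+               1, 0, 1, 0,
+               1, 1, 0, 1,
+               0, 0, 1, 0), nrow = 4, byrow = TRUE)
R> u <- rep(1, nrow(G))
R> phi <- 0.2
R> # Check that phi is below the inverse of the spectral radius of G.
R> rho <- max(Mod(eigen(G, only.values = TRUE)$values))
R> stopifnot(phi * rho < 1)
R> # Closed form: solve (I - phi G) b = u without forming the inverse.
R> b_closed <- solve(diag(nrow(G)) - phi * G, u)
R> # Series form: sum over p of phi^p G^p u, truncated at 200 terms.
R> b_series <- u
R> term <- u
R> for (p in 1:200) {
+    term <- phi * (G %*% term)
+    b_series <- b_series + term
+  }
R> all.equal(b_closed, as.numeric(b_series))
\end{CodeInput}
\end{CodeChunk}

In this toy network the third node, which is linked to all the others,
obtains the largest centrality.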
%
The reduced form of the first order necessary and sufficient condition for
optimality in the behavioral model A developed by BP (see
Equation~\ref{fi_4}), can be written as:
%
\begin{equation}
\mathbf{y}_{\mathbf{r}}=\alpha \cdot \left( \boldsymbol{I}-\phi \boldsymbol{G%
}\right) ^{-1}\mathbf{1}+X_{r}^\top\mathbf{\beta }+\mathbf{\epsilon }_{r},
\label{s_r_2}
\end{equation}
%
where $\mathbf{y}_{\mathbf{r}}$ is the vector of outcomes for the $n$ agents
in network $r$,\footnote{In the context investigated by BP, $\mathbf{y}$ is
the amount of money received by legislators from interest groups in support
of their electoral campaign.  Observe that model A can be applied to the
study of peer effects in other contexts where the behavior of the agents
under study is consistent with the theory underlying this model.  The same
reasoning applies to model B.  Interested readers are referred to
\cite{An:2011,An:2015a,Jackson+Rogers+Zenou:2017,Hsieh:2020,Zenou:2016} for
comprehensive reviews of the many empirical applications of network models
of peer effects.} $X_{r}$ is a matrix collecting the characteristics of the
agents and $\mathbf{\epsilon }_{r}$ is a random error term.  The
coefficients $\alpha $, $\phi $ and $\mathbf{\beta }$ are the parameters to
estimate.  Model~\ref{s_r_2} can be written as
%
\begin{equation*}
\mathbf{y}_{r}=\alpha \cdot \boldsymbol{b}_{1}\left( g,\phi \right)
+X_{r}^\top\mathbf{\beta }+\mathbf{\epsilon }_{r},
\end{equation*}%
%
where $\boldsymbol{b}_{1}\left(g,\phi \right) =\boldsymbol{b}\left(g,\phi
\right) $ from Equation~\ref{KB} when $\boldsymbol{u}=1$.\newline For a
sample with $\bar{r}$ networks, one can stack up the data by defining
$y=(\mathbf{y}_{1}^\top,\cdots,\mathbf{y}_{\bar{r}}^\top)^\top$,
$\mathbf{\epsilon }=(\boldsymbol{\epsilon }_{1}^\top,\cdots
,\boldsymbol{\epsilon }_{\bar{r}}^\top)^\top$,
$\mathbf{b}(\phi)=\left(\boldsymbol{b}\left(g_{1},\phi
\right)^\top,\dots ,\boldsymbol{b}\left(g_{\bar{r}},\phi\right)^\top\right)^\top$,
$X=(\boldsymbol{X}_{1},\dots ,\boldsymbol{X}_{\bar{r}})^\top$ and
$G=\mathrm{diag}\{\boldsymbol{G}_{r}\}_{r=1}^{\bar{r}}$.  Observe that
the generic matrix $\boldsymbol{G}_{r}$ has dimension $n_{r}\times n_{r}$,
and $G$ has dimension $n\times n$, with $n=\sum_{r=1}^{\bar{r}} n_{r}$.  For
the entire sample, the model is:
%
\begin{equation}
y=\alpha \cdot \mathbf{b}(\phi)+X\mathbf{\beta}+\mathbf{\epsilon }
% \tag{19 \code{lim}}
\label{NLLS}
\end{equation}
%
We extend Model~\ref{NLLS} by accounting for heterogeneity in network
spillovers.  Specifically, Model~\ref{NLLS} becomes:
%
\begin{equation}
y=\alpha \lbrack I-G(\phi I+\gamma \Lambda )]^{-1}1+X\mathbf{\beta
}+\mathbf{\epsilon }
% \tag{20 \code{het}}
\label{BP_2}
\end{equation}
% \setcounter{equation}{20}\noindent 
%
where $\Lambda =\mathrm{diag}(z)$ is a matrix with the values in the vector $z$ on
the diagonal and zeros elsewhere.  Matrix $I$ has dimension $n
\times n$ and the vector $z$ has dimension $1\times n$.  The
vector $z$ represents a given characteristic of the agents.  Accordingly,
the term $\gamma$ allows for the possibility that agents with different
characteristics may be more or less susceptible to social spillovers.

The reduced form of the first order necessary and sufficient condition for
optimality in the behavioral model developed by BLP (see
Equation~\ref{E_01}), is:
%
\begin{equation*}
\mathbf{y}_{r}=\left(I-\phi G\right) ^{-1}(\alpha
+X_{r}^\top\mathbf{\beta })+\mathbf{\epsilon }_{r}
\label{BLP0}
\end{equation*}
%
which can be rewritten as:
%
\begin{equation}
\mathbf{y}_{r}=\boldsymbol{b}_{2}\left( g,\phi \right)+\mathbf{\epsilon}_{r}
% \tag{22 \code{lim}}
\label{BLP}
\end{equation}
%
where $\boldsymbol{b}_{2}\left( g,\phi \right) =\boldsymbol{b}\left( g,\phi
\right) $ from Equation~\ref{KB} when $\boldsymbol{u}=(\alpha
+\boldsymbol{X}_{r}^\top\mathbf{\beta })$.  Equation~\ref{BLP} shows that, in
this case as well, the optimal behavior is proportional to a centrality
measure within the Family~\ref{KB}.  When we consider the case where the
parameter $\phi$ associated with network externalities is not constant
across agents, Model~\ref{BLP} in matrix formulation becomes:
%
\begin{equation}
y=(I-\theta \Lambda G)^{-1}(\alpha +X\mathbf{\beta })+\mathbf{\epsilon}
% \tag{23 \code{het\_l}}
\label{BLP_3a}
\end{equation}
%
\begin{equation}
y=(I-\eta G\Lambda)^{-1}(\alpha +X\mathbf{\beta })+\mathbf{\epsilon}
% \tag{24 \code{het\_r}}
\label{BLP_3b}
\end{equation}
%
In Equations~\ref{BLP_3a} and \ref{BLP_3b}, $\Lambda $\ is an identity
matrix with dimension $n \times n$.  In Equation~\ref{BLP_3a}, $\theta
=\theta _{0}+\theta _{1}z$, where $\theta _{0}$\ is a rescaling factor, and
$\theta _{1}$ quantifies the interaction between the adjacency matrix $G$
and the vector $z$.  Consequently, $\theta _{1}$ measures the extent to
which the peers of an agent with a given characteristic are susceptible to
her/his influence.  Similarly, in Equation~\ref{BLP_3b}, $\eta =\eta
_{0}+\eta _{1}z$, where $\eta _{0}$ is a rescaling factor, and $\eta _{1}$
measures the extent to which an agent with a given characteristic $z$ is
more susceptible to the influence of her/his peers.\footnote{For additional
details on these models see the online appendix of the paper by
\cite{Battaglini+Sciabolazza+Patacchini:2020}.}

In addition, BLP also consider the possibility of heterogeneous links
(rather than nodes).  We consider the case in which agents belong to two
different groups and interactions are different between and within groups. 
To allow for group effects, one can reorder the matrix $G$ so that the first
$n_{1}$ columns refer to agents in the first group, and the other
$n_{2}=n-n_{1}$ columns refer to agents in the second group.  The matrix $G$
can now be divided into four submatrices.  The submatrices in the main
diagonal (of dimensions $n_{1}\times n_{1}$ and $n_{2}\times n_{2}$) collect
the interactions within groups, whereas the remaining two submatrices (of
dimensions $n_{1}\times n_{2}$ and $n_{2}\times n_{1}$) collect interactions
between groups.  $G$ can thus be decomposed into two $n\times n$ matrices
$G_{wit}$ and $G_{btw}$, with $G=G_{wit}+G_{btw}$.  $G_{wit}$\ is a matrix
that has the same top left and bottom right components as $G$ and it is zero
otherwise, and $G_{btw}$ is a matrix that has the same bottom left and top
right components as $G$ and it is zero otherwise.  As a result, Model~\ref{BLP} becomes
%
\begin{equation}
y=(I-\phi _{1}G_{wit}-\phi _{2}G_{btw})^{-1}(\alpha +X\mathbf{\beta })+
\mathbf{\epsilon }
% \tag{25 \code{par}}
\label{BLP_4}
\end{equation}
% \setcounter{equation}{25}\noindent 
%
where $\phi _{1}$ captures within-group spillovers, and $\phi _{2}$
between-group spillovers.  The taxonomy of models outlined above shows that
centrality measures should be calculated according to the most appropriate
behavioral model describing the agents' behaviors and the requested level of
heterogeneity of agents and network links.  In BP and BLP we bring these
theories to the data and find support for their predictions in different
contexts.
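
The decomposition $G=G_{wit}+G_{btw}$ does not require reordering the matrix
in practice.  The following base \proglang{R} sketch builds both matrices
from a group indicator and computes the centrality in Model~\ref{BLP_4} with
$\boldsymbol{u}=1$.  The five-agent network, the indicator \code{z} and the
values of \code{phi\_1} and \code{phi\_2} are hypothetical, and the code is
an illustration rather than the implementation used by \pkg{econet}.

\begin{CodeChunk}
\begin{CodeInput}
R> # Hypothetical 5-agent network and group indicator (e.g., party).
R> G <- matrix(c(0, 1, 0, 1, 0,
+               1, 0, 1, 0, 0,
+               0, 1, 0, 1, 1,
+               1, 0, 1, 0, 1,
+               0, 0, 1, 1, 0), nrow = 5, byrow = TRUE)
R> z <- c(1, 1, 0, 0, 0)
R> # same_group[i, j] is TRUE when i and j belong to the same group.
R> same_group <- outer(z, z, "==")
R> G_wit <- G * same_group
R> G_btw <- G * (!same_group)
R> all(G == G_wit + G_btw)
R> # Centrality with separate within- and between-group spillovers.
R> phi_1 <- 0.15
R> phi_2 <- 0.05
R> b_par <- solve(diag(nrow(G)) - phi_1 * G_wit - phi_2 * G_btw,
+    rep(1, nrow(G)))
\end{CodeInput}
\end{CodeChunk}

Masking $G$ element by element gives the same two matrices as the block
construction described in the text, whatever the ordering of the agents.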

\subsection{Estimation} \label{sec:estimation}

Models~\ref{NLLS} and~\ref{BLP} cannot be estimated by a simple OLS
regression in which $\mathbf{y}$ represents the dependent variable and
$\mathbf{b}(\phi)$ and $X$ are the independent variables because
$\mathbf{b}(\phi)$ is a nonlinear function of a parameter to be estimated,
$\phi $.  We can, however, obtain estimates for $\alpha $, $\phi$ and
$\mathbf{\beta}$ using NLLS or ML.

NLLS estimation requires solving the nonlinear least-squares problem for
Equations~\ref{NLLS} and~\ref{BLP}.  This task is performed by \pkg{econet}
using the Levenberg-Marquardt algorithm implemented by \cite{Box:1969} in
the \proglang{R}~package \pkg{minpack.lm} developed by \cite{minpack.lm}. 
Details of how the algorithm works in practice can be found in
\cite{More:1978}.
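
To make the structure of the problem concrete, the sketch below estimates
Model~\ref{NLLS} on simulated data using base \proglang{R} only.  It is not
the Levenberg-Marquardt routine used by \pkg{econet}: it exploits the fact
that, for a fixed $\phi$, the model is linear in $\alpha$ and
$\mathbf{\beta}$, so these can be concentrated out by OLS while $\phi$ is
chosen on a grid.  The network, the regressor \code{x}, the true parameter
values and the grid are all assumptions made for illustration.

\begin{CodeChunk}
\begin{CodeInput}
R> set.seed(1)
R> n <- 40
R> # Hypothetical undirected, unweighted network.
R> G <- matrix(rbinom(n * n, 1, 0.1), n, n)
R> G[lower.tri(G, diag = TRUE)] <- 0
R> G <- G + t(G)
R> # The spectral radius of G is at most its largest row sum, so any
R> # phi below phi_max keeps I - phi * G invertible.
R> phi_max <- 1 / max(1, rowSums(G))
R> kb <- function(phi) solve(diag(n) - phi * G, rep(1, n))
R> # Simulated data with alpha = 1, beta = -0.5 and very little noise.
R> x <- rnorm(n)
R> phi_true <- 0.5 * phi_max
R> y <- kb(phi_true) - 0.5 * x + rnorm(n, sd = 0.001)
R> # Sum of squared residuals after concentrating out alpha and beta.
R> ssr <- function(phi) {
+    b <- kb(phi)
+    sum(resid(lm(y ~ 0 + b + x))^2)
+  }
R> grid <- seq(0, 0.9, by = 0.01) * phi_max
R> phi_hat <- grid[which.min(sapply(grid, ssr))]
R> b_hat <- kb(phi_hat)
R> fit <- lm(y ~ 0 + b_hat + x)
R> c(phi = phi_hat, coef(fit))
\end{CodeInput}
\end{CodeChunk}

The network is deliberately left unnormalized: with a row-normalized matrix
and no isolates, $(I-\phi G)^{-1}1$ is a constant vector and $\alpha$ and
$\phi$ cannot be separately identified in this specification.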

ML estimation requires likelihood functions, which are derived under the
assumption $\mathbf{\varepsilon}\sim N(0,\sigma ^{2}I)$.  For
Equation~\ref{NLLS}, the log-likelihood function is
%
\begin{equation*}
\ln \left( L\right) =-\frac{n}{2}\ln \left( 2\pi \right) -\frac{n}{2}\ln
\sigma ^{2}-\frac{1}{2\sigma ^{2}}\left[y-\alpha (I-\phi G)^{-1}1-X\mathbf{\beta }%
\right] ^{\prime }\left[ y-\alpha (I-\phi G)^{-1}1-X\mathbf{\beta }\right]
\end{equation*}
%
where $n$\ is the total sample size.  The log-likelihood for
Equation~\ref{BP_2} has the same form, with $\phi G$\ replaced by $G(\phi
I+\gamma \Lambda)$.

In a similar fashion, we consider the following log-likelihood functions
for Equations~\ref{BLP}, \ref{BLP_3a}, \ref{BLP_3b} and~\ref{BLP_4}.  For
Equation~\ref{BLP}, the log-likelihood function is:
%
\begin{equation}
\ln \left( L\right) =-\frac{n}{2}\ln \left( 2\pi \right) -\frac{1}{2}\ln \lvert\Omega \rvert-\frac{1}{2}%
\left[ y-(I-\phi G)^{-1}X\mathbf{\beta }\right] ^{\prime }\Omega ^{-1}\left[
y-(I-\phi G)^{-1}X\mathbf{\beta }\right]  
\label{MLE_BLP}
\end{equation}
%
where $\Omega =\sigma ^{2}(I-\phi G)^{-1}(I-\phi G^{\prime })^{-1}$. 
Equations~\ref{BLP_3a}, \ref{BLP_3b} and~\ref{BLP_4} have the same
likelihood function as above, with $\phi G$\ replaced by $\theta
\Lambda G$, $\eta G\Lambda $\ or $\phi _{1}G_{wit}+\phi _{2}G_{btw}$, respectively. 
When the network is sparse, isolates (i.e.,~individuals with no neighbors)
may exist.  In this case, we can modify the likelihood function to speed up
computation.  Rewrite
%
\begin{equation*}
G=\left( 
\begin{array}{cc}
G_{n_{1}\times n_{1}}^{c} & \mathbf{0}_{n_{1}\times n_{2}} \\ 
\mathbf{0}_{n_{2}\times n_{1}} & \mathbf{0}_{n_{2}\times n_{2}}%
\end{array}\right)
\end{equation*}
%
where $n_{1}$ is the number of connected individuals and $n_{2}$ is the
number of isolates.  Then
%
\begin{equation*}
\left( I-\phi G\right) ^{-1}=\left( 
\begin{array}{cc}
\left(I-\phi G_{n_{1}\times n_{1}}^{c}\right) ^{-1} & \mathbf{0}_{n_{1}\times n_{2}}
\\
\mathbf{0}_{n_{2}\times n_{1}} & I_{n_{2}\times n_{2}}
\end{array}\right)
\end{equation*}
%
Define $y=\left( y^{c^{\prime }},y^{u^{\prime }}\right) ^{\prime }$ and
$X=\left( X^{c^{\prime }},X^{u^{\prime }}\right) ^{\prime }$.  The
likelihood function (Equation~\ref{MLE_BLP}) can be transformed into:
%
\begin{equation}
\begin{split}
\ln \left( L\right) = &-\frac{n_{1}}{2}\ln \left( 2\pi \right) -\frac{1}{2}\ln \lvert\Omega
_{1}\rvert-\frac{1}{2}\left[ y^{c}-(I-\phi G^{c})^{-1}X^{c}\mathbf{\beta }
\right] ^{\prime}\Omega _{1}^{-1}\left[y^{c}-(I-\phi G^{c})^{-1}X^{c}
\mathbf{\beta }\right] \\
& -\frac{n_{2}}{2}\ln \left( 2\pi \right) - \frac{n_{2}}{2}\ln \sigma
^{2}-\frac{1}{2\sigma ^{2}}\left[ y^{u}-X^{u}\mathbf{\beta}\right] ^{\prime }\left[
y^{u}-X^{u}\mathbf{\beta}\right]
\label{ML_ISO}
\end{split}
\end{equation}
%
where $\Omega _{1}=\sigma ^{2}(I-\phi G^{c})^{-1}(I-\phi G^{c\prime })^{-1}$.

Our package allows the adjacency matrix to be used as a direct input.  This
setting simplifies the data processing procedure compared with other
\proglang{R}~packages like \pkg{spdep} when dealing with social network data.  Packages
like \pkg{spdep} are designed for spatial data.  In this environment, the
network data is required to be imported as neighbor pairs.  However, social
network data differ from spatial data because isolates (nodes that have no
connection with any other node) may exist.  Networks containing isolates are
not compatible with the data structure for packages like \pkg{spdep}.

Our package not only provides a way around this problem but also
offers an efficient algorithm for handling isolates.  Instead of
inverting the full $n\times n$ matrix $I-\phi G$, the algebra above shows that one
only needs to invert the block corresponding to connected nodes.  The
likelihood function can be written as a sum of the spatial auto-regressive
(SAR) likelihood function for connected nodes and a standard linear
likelihood function of isolates (see Equation~\ref{ML_ISO}).
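
The block structure behind this shortcut can be verified directly.  The base
\proglang{R} sketch below uses a hypothetical network with three connected
agents and two isolates, and an arbitrary value \code{phi = 0.3}; it compares
the inverse of the full matrix with the one obtained by inverting only the
connected block.  It illustrates the algebra and is not the code used
internally by \pkg{econet}.

\begin{CodeChunk}
\begin{CodeInput}
R> # Hypothetical network: agents 1-3 are connected, 4-5 are isolates.
R> G_c <- matrix(c(0, 1, 1,
+                 1, 0, 0,
+                 1, 0, 0), nrow = 3, byrow = TRUE)
R> n1 <- nrow(G_c)
R> n2 <- 2
R> G <- matrix(0, n1 + n2, n1 + n2)
R> G[1:n1, 1:n1] <- G_c
R> phi <- 0.3
R> # Inverse of the full (n1 + n2) x (n1 + n2) matrix ...
R> full_inv <- solve(diag(n1 + n2) - phi * G)
R> # ... versus inverting only the connected block and padding with I.
R> block_inv <- diag(n1 + n2)
R> block_inv[1:n1, 1:n1] <- solve(diag(n1) - phi * G_c)
R> all.equal(full_inv, block_inv)
R> # Row sums for the isolates: their centrality does not depend on phi.
R> rowSums(full_inv)[(n1 + 1):(n1 + n2)]
\end{CodeInput}
\end{CodeChunk}

Since the cost of a dense inversion grows roughly with the cube of the
matrix dimension, working with the $n_{1}\times n_{1}$ block can be
noticeably cheaper when many nodes are isolates.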

\section{Addressing network endogeneity} \label{sec:endogeneity}

In many real-world contexts, the network topology is as much the result of
agents' choices as is their behavior on the observed
topology.  As a result, the network structure can be endogenous, and
inference neglecting this issue would be invalid.  The simplest way to
tackle the problem is to model network formation using a homophily model
\citep[see
e.g.,][]{Fafchamps+Gubert:2007,Mayer+Puller:2008,Lai+Reiter:2017,Apicella+Marlowe+Fowler+Christakis:2012,Attanasio+Barr+Cardenas+Genicot+Meghir:2012}
where the existence of a link between $i$ and $j$, $g_{i,j}$, is explained
by the distance between $i$ and $j$ in terms of characteristics, according
to the model
%
\begin{equation}
g_{i,j}=\delta _{o}+\sum_{l}\delta _{l+1}\lvert x_{i}^{l}-x_{j}^{l}\rvert+u_{i,j}
\label{NF1}
\end{equation}
%
where $x_{i}^{l}$ for $l=1,\dots, L$ are $i$'s characteristics.  As standard in
the literature on dyadic link formation, the main assumption underlying
Model~\ref{NF1} is dyadic independence, i.e.,~the assumption that each
agent's choices are not influenced by others' decisions, and therefore each
link in the network occurs with the same probability.\footnote{This
approach can be extended to dyadic dependence using latent space models or
exponential random graph models \citep[see][for a discussion]{An:2011}.}
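
A dyadic regression of this kind can be sketched in a few lines of base
\proglang{R}.  The example below simulates a hypothetical network in which
links are more likely between agents with similar values of a single
characteristic \code{x}, reshapes the data to one row per ordered dyad, and
fits a linear probability version of Model~\ref{NF1} by OLS.  The
data-generating process is an assumption made for illustration, and the
first step implemented in \pkg{econet} may differ in its details.

\begin{CodeChunk}
\begin{CodeInput}
R> set.seed(1)
R> n <- 30
R> # Hypothetical characteristic; links are more likely when the
R> # distance |x_i - x_j| is small (homophily).
R> x <- rnorm(n)
R> dist_x <- abs(outer(x, x, "-"))
R> p_link <- plogis(1 - 2 * dist_x)
R> G <- matrix(rbinom(n * n, 1, p_link), n, n)
R> # One row per ordered dyad (i, j) with i different from j.
R> off_diag <- row(G) != col(G)
R> dyads <- data.frame(g = G[off_diag], dist_x = dist_x[off_diag])
R> # Linear probability version of the homophily model.
R> first_step <- lm(g ~ dist_x, data = dyads)
R> coef(first_step)
\end{CodeInput}
\end{CodeChunk}

A negative coefficient on \code{dist\_x} indicates homophily: the larger the
distance between two agents' characteristics, the less likely a link.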

\cite{Fafchamps+Leij+Goyal:2010} and \cite{Graham:2015} suggest a variation
of this model where this assumption can be tested.  They suggest including
in the model the length of the shortest path between $i$ and $j$
\citep{Fafchamps+Leij+Goyal:2010}, or the number of shared friends between
$i$ and $j$ \citep{Graham:2015}.

Let us denote this additional variable by $\kappa _{i,j}$:
%
\begin{equation}
g_{i,j}=\delta _{o}+\delta _{2}\kappa _{i,j}+\sum_{l}\delta
_{l+2}\lvert x_{i}^{l}-x_{j}^{l}\rvert +u_{i,j}  
\label{NF2}
\end{equation}
%
A statistically significant estimate of the parameter $\delta _{2}$ would
suggest that the presence of a link depends on the presence of links at path
lengths higher than 2 (or on the number of shared friends), thus indicating
a violation of the hypothesis of dyadic independence.

A different variation of Model~\ref{NF1} is proposed by
\cite{Graham:2016}.  The model in \cite{Graham:2016} accounts for agents'
unobserved heterogeneity by adding fixed effects for agents $i$ and $j$:
%
\begin{equation}
g_{i,j}=\delta _{o}+\delta _{1}\omega _{i,j}+\sum_{l}\delta
_{l+1}\lvert x_{i}^{l}-x_{j}^{l} \rvert+\iota _{i}+\iota _{j}+u_{i,j}.  \label{NF3}
\end{equation}
%
These models of network formation can help mitigate concerns about network
endogeneity in linear models of peer effects, such as Models~\ref{NLLS}
and~\ref{BLP}, in one of two ways: i) they can be used to predict network
connections on the basis of exogenous agents' characteristics and then use
the predicted network topology as an instrument for the actual network
structure, or ii) they can be used as a first step selection equation to
derive a correction for network endogeneity \`{a} la Heckman.\footnote{See
\cite{Heckman:1979}, who first proposed this technique.  For different
applications of the Heckman approach in spatial statistics and for network
models see \cite{Johnnson+Moon:2019,Qu+Lee:2015}.}

BLP follow approach ii).  Under the assumption that $\varepsilon
_{r}=(\varepsilon _{r,1},\dots ,\varepsilon _{r, n})^{\prime}$ and
$\{(u_{i,j,r})\}_{i,j}$\ are jointly normal with $\E(\epsilon
_{i,r}^{2})=\sigma _{\epsilon }^{2}$, $\E(\epsilon_{i,r}u_{i,j,r})=\sigma
_{\epsilon u }$ for all $i\neq j$, $\E(u_{i,j,r}u _{i,k,r})=\sigma _{u }^{2}\
\forall j=k$, and $\E(u_{i,j,r}u _{i,k,r})=0\ \forall j\neq k$, the expected
value of the error term conditional on the link formation is $\E(\epsilon
_{i,r} \mid \{u_{i,j,r}\}_{j \neq i})=\psi \cdot \sum_{j\neq i}u _{i,j,r}$, where
$\psi =\sigma _{\epsilon u }/\sigma _{u }^{2}$.  It follows that Model~\ref{BLP}
can be written as:
%
\begin{equation}
\mathbf{y}_{r}=\left[ I-\phi \boldsymbol{G}\right] ^{-1}\cdot \left[ \alpha
\cdot \mathbf{1}+\boldsymbol{X}_{r}\mathbf{\beta }+\psi \mathbf{\xi }_{r}%
+\mathbf{\varepsilon }_{r}\right]   \label{final_0}
\end{equation}%
%
where $\xi _{i,r}=\sum_{j\neq i}u_{i,j,r}$ with $\mathbf{\xi
}_{r}=(\xi_{1,r},\dots,\xi _{n,r})^{\prime }$.  The term $\psi \mathbf{\xi
}_{r}$ now captures the selection bias.  The model can then be estimated
with NLLS (see Section~\ref{sec:models}).  Under these assumptions, approach
i) and approach ii) should produce similar results since both deliver
consistent estimators.

Under approach ii), however, inference is complicated because the selectivity
term $\xi$ is a generated regressor from a previous estimation and no
closed-form solution is available for the NLLS standard error estimates in a
network context.  For this reason, BLP use bootstrapped standard errors. 
Because of the inherent structural dependency of network data, the design of
the resampling scheme for this bootstrap procedure needs special
consideration.  The residual vector in Equation~\ref{final_0} does not
contain i.i.d.\ elements, and one cannot sample with replacement from this
vector.  Therefore, BLP use the residual bootstrap procedure, which is common in
spatial econometrics \citep[see][]{Anselin:1990}, where resampling is
performed on the structural errors, under the assumption that they are
i.i.d.  In practice, the vector of structural errors is derived from
$\mathbf{\varepsilon}=[I-\widehat{\phi}G]\boldsymbol{u}$, where
$\boldsymbol{u}$ is the residual vector from Equation~\ref{final_0}.

An important challenge in using the two step procedures for both approach i)
and ii) is finding exclusion restrictions, that is factors that affect
network formation only.\footnote{Technically, the two step model is
identified even using exactly the same set of regressors in both stages
since the dyad-specific regressors used in the first stage (the network
formation stage) are expressed in absolute values of differences.  These
differences do not appear in the outcome equation.  Identification is thus
achieved by exploiting non-linearities specific to the network structure of
our model.  While this strategy has been used in the applied network
literature \citep[see e.g.,][]{GPI:2013,CSL:2016}, it may be a tenuous source
of identification in some cases.} This is a notoriously difficult task.  BLP
use an original instrument: connections between agents made during
adolescence.  Those connections are powerful predictors of social contacts
later in life, but are clearly predetermined with respect to decisions taken
in adulthood.

\section[Implementing econet]{Implementing \pkg{econet}}\label{sec:econet}

We now turn the discussion to the implementation of the functions contained
in \pkg{econet}.  \pkg{econet} implements the set of linear models of social
interactions introduced in Section~\ref{sec:theory}, where an agent's outcome is a function
of the outcomes of the connected agents in the network.  The routines
provide both NLLS and ML estimators.  The possible sources of endogeneity
that could hinder the identification of a causal effect in the model can be
addressed by implementing the two-step correction procedures described in
Section~\ref{sec:endogeneity}.  The estimated parameter capturing the impact of the social
network on an agent's performance is then used to measure the individual
importance in the network, obtaining a weighted version of Katz-Bonacich
centrality.  Finally, the explanatory power of the parameter-dependent
centrality can be compared with that of standard measures of network
centrality.  It is worth emphasizing that \pkg{econet} allows the inclusion
of unconnected agents, for whom the Katz-Bonacich centrality is constant.

Specifically, \pkg{econet} provides four functions.  The first one is
\code{net\_dep}, which allows one to estimate a model of social interactions
and compute the relative weighted Katz-Bonacich centralities of the agents. 
Different behavioral models can be chosen (i.e.,~those provided by BP and
BLP).  Moreover, the hypothesis of homogeneous or heterogeneous spillovers can
be tested.  The second function is \code{boot}, which is built to obtain
valid inference when the NLLS estimator with Heckman correction is used. 
The third function is \code{horse\_race}, which allows one to compare the
explanatory power of parameter-dependent centralities relative to other
centrality measures.  The fourth function is \code{quantify}, and it is used
to assess the effect of control variables in the framework designed by BLP.

\subsection{Detailing the functions}

The modeling choices presented so far are implemented by the function
\code{net\_dep}. The first three arguments of this function are: i) \code{formula}, an object
of class `\code{formula}' which specifies the dependent variable and the
controls; ii) \code{data}, an object of class `\code{data.frame}' containing
the values of the variables included in \code{formula}; iii) \code{G}, an
object of class `\code{Matrix}' where the generic element $g_{ij}$ is used to
track the connection between $i$ and $j$ in the social network.  \code{G}
can be unweighted (i.e.,~$g_{ij}=1$ if $i$ and $j$ are connected, and 0
otherwise), or weighted (e.g.,~$g_{ij}$ collects the intensity of the
relations between $i$ and $j$) and column-normalized.  The matrix must be
arranged in the same order as the data, and row and column names must
indicate agents' ids.

The next two arguments in \code{net\_dep}, \code{model} and
\code{hypothesis}, are used to specify the model to be estimated through an
object of class `\code{character}', as documented in Table~\ref{tab:table1}. 
Specifically, the argument \code{model} indicates the framework to be
applied: it is set to \code{"model\_A"} or \code{"model\_B"} to implement
the framework by BP or BLP, respectively.  The argument \code{hypothesis}
indicates whether peer effects are assumed to be homogeneous
(\code{hypothesis = "lim"}) or heterogeneous (\code{hypothesis = c("het", "het\_l",
"het\_r", "par")}).
%
\begin{table}[t!]
\centering
\resizebox{\textwidth}{!}{
\begin{tabular}{lllp{6cm}p{5.5cm}}
\hline
Model & Hypothesis & Equation & \multicolumn{2}{l}{Centrality measure
$\boldsymbol{b}\left(g,\phi \right)$} \\ \hline
A & \code{lim} & \ref{NLLS} & 
$\phi$: homogeneous & $\boldsymbol{b}\left(g,\phi \right) =\left(I-\phi
G\right) ^{-1}1$\\
& \code{het} & \ref{BP_2} & $\phi$: heterogeneous by node type &
$\boldsymbol{b}\left( g,\phi \right) =[I-G(\phi I+\gamma \Lambda )]^{-1}1$\\ \hline
B & \code{lim} & \ref{BLP} & $\phi$: homogeneous &
$\boldsymbol{b}\left( g,\phi \right) =\left( I-\phi G\right) ^{-1}1$\\
& \code{het\_l} & \ref{BLP_3a} & $\phi$: heterogeneous outgoing influence by
node type & $\boldsymbol{b}\left( g,\phi \right) =(I-\theta \Lambda G)^{-1}1$\\
& \code{het\_r} & \ref{BLP_3b} & $\phi$: heterogeneous ingoing influence by node
type & $\boldsymbol{b}\left(g,\phi \right) =(I-\eta G\Lambda )^{-1}1$\\
& \code{par} & \ref{BLP_4} & $\phi$: heterogeneous by link type &
$\boldsymbol{b}\left(g,\phi \right)=(I-\phi_{1}G_{wit}-\phi_{2}G_{btw})^{-1}1$\\
\hline
\end{tabular}
}
\caption{\label{tab:table1} Field specification in \code{net\_dep}.}
\end{table}
%

The argument \code{z} is used to specify the source of heterogeneity for the
peer effects parameter $\phi$ (i.e.,~the variable $z$ in
Equations~\ref{BP_2},~\ref{BLP_3a},~\ref{BLP_3b}), or the groups in which the
network should be partitioned when \code{hypothesis = "par"}.  Specifically,
\code{z} is a numeric vector where the generic element $i$ refers to agent
$i$'s characteristic (e.g.,~it equals 1 if $i$ is female, and 0 otherwise).

In order to correct for the potential bias arising from network endogeneity,
we include in the function four arguments: i) \code{endogeneity}, a logical
object equal to \code{TRUE} if \code{net\_dep} should implement the two-step
correction procedure as e.g.,~in Model~\ref{final_0}, and \code{FALSE}
otherwise; ii) \code{correction}, an object of class `\code{character}' that
is set to indicate whether \code{net\_dep} should implement a Heckman
correction (\code{correction = "heckman"}), or an instrumental variable
approach (\code{correction = "iv"}); iii) \code{exclusion\_restriction}, an
object of class `\code{Matrix}' used to specify the matrix to instrument the
endogenous network; iv) \code{first\_step}, an object of class
`\code{character}' which specifies the network formation model to be used in
the first step of the procedure for both \code{correction = "heckman"} and
\code{correction = "iv"}.  This argument can be equal to \code{standard}
(Equation~\ref{NF1}), \code{shortest} \citep[Equation~\ref{NF2}, as
in][]{Fafchamps+Leij+Goyal:2010}, \code{coauthor} \citep[Equation~\ref{NF2},
as in][]{Graham:2015}, \code{fe} (Equation~\ref{NF3}), and \code{degree},
where the difference in degree centrality between agents is an additional
regressor of Equation~\ref{NF1}.

Finally, the argument \code{estimation} allows the user to choose the
estimation technique.  This is an object of class `\code{character}' that can
be one of two options: \code{NLLS} (nonlinear least squares), which
implements the Levenberg-Marquardt optimization algorithm for solving the
nonlinear least-squares problem using the function \code{nlsLM} contained in
the \proglang{R}~package \pkg{minpack.lm} \citep{minpack.lm} or \code{MLE}
(maximum likelihood), which uses the function \code{mle2} of the
\proglang{R}~package \pkg{bbmle} to implement the quasi-Newton method by
\cite{bbmle} for maximum-likelihood bound constrained optimization.

The complete list of all the inputs of the function \code{net\_dep} is
available in the help page of the package \pkg{econet}, and it is accessible
from \proglang{R} by running the code \code{?net\_dep}.

The output of \code{net\_dep} consists of a list of three objects: i) the
point estimates and relative standard errors of the model's parameters; ii)
the vector of agents' network centrality; iii) the point estimates and
relative standard errors of the parameters of the first stage model, if
\code{endogeneity = TRUE}.

We provide below a list of examples to illustrate the functionality of
\code{net\_dep}.  Since in all examples we reject the hypothesis of
normality of the errors, the NLLS estimation method is
used.\footnote{Observe that in the examples presented in the paper, a set of
starting values is used to estimate NLLS with the function \code{net\_dep}. 
When these are not provided by the user, \code{net\_dep} uses some random
values, and NLLS takes significantly more time to converge in this case. 
The reader interested in learning more about how starting values should be
used when running NLLS estimations can refer to \cite{Box:1969}.  Additional
details on how to specify starting values with \code{net\_dep} can be found
by running the code \code{?net\_dep} from \proglang{R}.}

\vspace*{-0.2cm}

\subsubsection{Exercise 1: Katz-Bonacich centrality with parameter constant
across agents}

In the first example, we estimate the association between a Congress
member's network centrality and the amount of money she or he received from
interest groups to finance the electoral campaign for the 111th Congress
using Model~\ref{NLLS}.  The variables used to control for the effect of
legislators' characteristics are: party affiliation (\code{party}); gender
(\code{gender}); chairmanship (\code{nchair}); whether or not the Congress
member has at least one connection in the network (\code{isolate}).

The network used for this exercise represents the connections between agents
which are made during adolescence.  The network is constructed using
information on the educational institutions attended by the Congress
members.  Specifically, we assume that a tie exists between two Congress
members if they graduated from the same institution within eight years of
each other.  We set a link between two Congress members, $g_{ij}$, to be
equal to the number of schools they both attended within eight years of each
other; then we row normalize the social weights so that $\sum_{j}g_{ij}=1$
for any $i$.  This analysis is a simplified version (in terms of both data
and controls included in the model specification) of the analysis in BP.
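
As a minimal sketch of the row normalization just described, the base
\proglang{R} code below rescales a hypothetical four-agent weight matrix
(\code{W\_toy}, which is not part of \pkg{econet}) so that each row sums to
one.  The fourth agent has no ties; leaving its all-zero row untouched is an
assumption made here for illustration only.
%
\begin{CodeChunk}
\begin{CodeInput}
R> W_toy <- matrix(c(0, 2, 1, 0,
+                    2, 0, 0, 0,
+                    1, 0, 0, 0,
+                    0, 0, 0, 0), nrow = 4, byrow = TRUE)
R> row_tot <- rowSums(W_toy)
R> G_toy <- W_toy / ifelse(row_tot > 0, row_tot, 1)
R> rowSums(G_toy)
\end{CodeInput}
\begin{CodeOutput}
[1] 1 1 1 0
\end{CodeOutput}
\end{CodeChunk}
%
The estimation code for this exercise follows.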

\vspace*{-0.4cm}

\begin{CodeChunk}
\begin{CodeInput}
R> library("econet")
R> set.seed(2)
R> data("a_db_alumni", package = "econet")
R> data("G_alumni_111", package = "econet")
R> db_model_A <- a_db_alumni
R> G_model_A <- a_G_alumni_111
R> are_factors <- c("party", "gender", "nchair", "isolate")
R> db_model_A[are_factors] <- lapply(db_model_A[are_factors], factor)
R> db_model_A$PAC <- db_model_A$PAC/1e+06
R> f_model_A <- formula("PAC ~ gender + party + nchair + isolate")
R> starting <- c(alpha = 0.47325, beta_gender1 = -0.26991,
+    beta_party1 = 0.55883, beta_nchair1 = -0.17409,
+    beta_isolate1 = 0.18813, phi = 0.21440)
R> lim_model_A <- net_dep(formula = f_model_A, data = db_model_A,
+    G = G_model_A, model = "model_A", estimation = "NLLS",
+    hypothesis = "lim", start.val = starting)
R> summary(lim_model_A)
\end{CodeInput}
\begin{CodeOutput}
Call:
Main Equation: PAC ~ alpha * solve_block(I - phi * G) %*% Ones + 
beta_gender1 * gender1 + beta_party1 * party1 + 
beta_nchair1 * nchair1 + beta_isolate1 * isolate1
              Estimate Std. Error t value Pr(>|t|)    
alpha          0.47325    0.17969   2.634  0.00876 ** 
beta_gender1  -0.26991    0.10504  -2.570  0.01052 *  
beta_party1    0.55883    0.08363   6.682  7.5e-11 ***
beta_nchair1  -0.17409    0.19004  -0.916  0.36016    
beta_isolate1  0.18813    0.18196   1.034  0.30178    
phi            0.21440    0.27005   0.794  0.42768    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC: 1042.55  loglik: -514.28
\end{CodeOutput}
\end{CodeChunk}
%
In this model, the estimate of the spillover effect ($\phi$), assumed to be
the same for all Congress members, is positive (even though it is not
statistically significant in this example).  The estimated value of $\phi$ is
used to calculate the weighted Katz-Bonacich centrality of the agents (as
mentioned, this value can be extracted from the second object stored in the
output of \code{net\_dep}: e.g.,~\code{lim_model_A\$centrality}).  The
estimate of the intercept $\alpha$ directly measures the impact of this
network centrality measure.  Its interpretation is akin to the one of an
estimated coefficient in a linear regression model.  We find a positive
effect of a legislator's centrality on campaign contributions, showing that
more connected Congress members are likely to receive more attention from
interest groups.  Specifically, when the Katz-Bonacich centrality of agent
$i$ increases by one unit, the amount of dollars received by
$i$ from interest groups increases by $0.47\times 1,000,000 = 470,000\$$. 
The same logic can be applied to interpret the other estimated coefficients.
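
To make the link between $\phi$ and the centrality vector concrete, the
sketch below computes $(I-\phi G)^{-1}\mathbf{1}$ for a hypothetical
three-agent directed network (\code{G\_toy}), plugging in the point estimate
of $\phi$ reported above.  It relies on base \proglang{R}'s \code{solve}
rather than on the routine \code{solve\_block} that appears in the
\code{Call}, so it illustrates the formula and not the package's
implementation.
%
\begin{CodeChunk}
\begin{CodeInput}
R> G_toy <- matrix(c(0,   1,   0,
+                    0.5, 0,   0.5,
+                    0,   0,   0), nrow = 3, byrow = TRUE)
R> phi_hat <- 0.21440
R> kb <- solve(diag(3) - phi_hat * G_toy, rep(1, 3))
R> kb_series <- rep(1, 3)
R> term <- rep(1, 3)
R> for (k in 1:50) {
+    term <- phi_hat * (G_toy %*% term)
+    kb_series <- kb_series + term
+  }
R> all.equal(kb, as.numeric(kb_series))
\end{CodeInput}
\begin{CodeOutput}
[1] TRUE
\end{CodeOutput}
\end{CodeChunk}
%
The agreement with the truncated power series
$\sum_{k=0}^{50}\phi^{k}G^{k}\mathbf{1}$ reflects the usual reading of
Katz-Bonacich centrality as a discounted count of the walks starting from
each agent.  The third agent in this toy network has no outgoing links, so
its centrality equals one.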

Let us now repeat the same exercise for Model~\ref{BLP}.  We use a
simplified version (in terms of both data and controls included in the model
specification) of the analysis presented in BLP.  Our goal here is to
investigate the association of legislative networks with Congress members'
legislative effectiveness score (LES).  Network ties are defined here as the
number of bills that $j$ cosponsored with $i$.  Also in this case, we impose
that $\sum_{j}g_{ij}=1$.  The underlying idea is that legislators'
productivity is affected by the productivity of the other legislators with
whom they interact.  However, while alumni connections are formed during
adolescence and can thus be reasonably assumed to be exogenous to a Congress
member's political activity, cosponsorships are instead endogenous, since
legislators are clearly strategic in choosing with whom to cosponsor a bill. 
The function \code{net\_dep} allows the users to control for network
endogeneity using the Heckman correction procedures described in Section~\ref{sec:estimation}.
The legislators' characteristics used in this context
are the same used in the previous example except the variable
\code{isolate}, since all Congress members have at least one cosponsorship
link.
%
\begin{CodeChunk}
\begin{CodeInput}
R> data("db_cosponsor", package = "econet")
R> data("G_alumni_111", package = "econet")
R> db_model_B <- db_cosponsor
R> G_model_B <- G_cosponsor_111
R> G_exclusion_restriction <- G_alumni_111
R> are_factors <- c("gender", "party", "nchair")
R> db_model_B[are_factors] <- lapply(db_model_B[are_factors], factor)
R> f_model_B <- formula("les ~ gender + party + nchair")
R> starting <- c(alpha = 0.23952, beta_gender1 = -0.22024,
+    beta_party1 = 0.42947, beta_nchair1 = 3.09615,
+    phi = 0.40038, unobservables = 0.07714)
R> lim_model_B <- net_dep(formula = f_model_B, data = db_model_B,
+    G = G_model_B, model = "model_B", estimation = "NLLS",
+    hypothesis = "lim", endogeneity = TRUE, 
+    correction = "heckman", first_step = "standard", 
+    exclusion_restriction = G_exclusion_restriction, 
+    start.val = starting)
R> summary(lim_model_B)
\end{CodeInput}
\begin{CodeOutput}
Call:
Main Equation: les ~ solve_block(I - phi * G) %*% (alpha * Ones + 
beta_gender1 * gender1 + beta_party1 * party1 + 
beta_nchair1 * nchair1 + beta_unobservables * unobservables)
                   Estimate Std. Error t value Pr(>|t|)    
alpha               0.23952    0.07130   3.359 0.000851 ***
beta_gender1       -0.22024    0.14052  -1.567 0.117787    
beta_party1         0.42947    0.10111   4.247 2.65e-05 ***
beta_nchair1        3.09615    0.25951  11.931  < 2e-16 ***
phi                 0.40038    0.06136   6.525 1.90e-10 ***
beta_unobservables  0.07714    0.06138   1.257 0.209519    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC: 1342.51  loglik: -664.25
\end{CodeOutput}
\begin{CodeInput}
R> summary(lim_model_B, print = "first.step")
\end{CodeInput}
\begin{CodeOutput}
First step:  y ~ exclusion_restriction + gender1 + party1 + nchair1
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            1.481e-03  3.809e-05  38.887  < 2e-16 ***
exclusion_restriction  4.657e-03  4.129e-04  11.279  < 2e-16 ***
gender1               -6.638e-05  2.135e-05  -3.109  0.00188 ** 
party1                 2.366e-03  1.940e-05 121.922  < 2e-16 ***
nchair1               -4.180e-04  3.436e-05 -12.163  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R2: 0.07
\end{CodeOutput}
\end{CodeChunk}
%
The output of the function \code{summary(lim_model_B, print = "first.step")}
reports the estimates of the first step Model~\ref{NF1}.  The
interpretation of the results from the first step model follows the standard
interpretation of the results of a linear probability model.  Being
connected in the alumni network (\code{G\_exclusion\_restriction}) has a
positive and significant impact on the probability that two legislators will
cosponsor a bill, hence the alumni network is an important predictor of the
cosponsorship network (\code{G\_model\_B}).

The output of the function \code{summary(lim_model_B)} presents the
estimation results of Model~\ref{final_0}.  Because of the two-step
procedure, standard errors of Model~\ref{BLP} are bootstrapped using the
function \code{boot}.  This function takes the following arguments: i)
\code{object}, the output of \code{net\_dep} (as in the call below); ii) \code{group},
a numeric vector used to specify if the resampling should be performed
within specific groups; iii) \code{niter}, an object of class `\code{numeric}' which
indicates the number of bootstrap iterations; iv) \code{weights}, a logical
object equal to \code{TRUE} if \code{object} was estimated with the
argument \code{to\_weight} different from \code{NULL}, and \code{FALSE}
otherwise (the latter is the default option); v) \code{parallel}, a logical
object equal to \code{TRUE} if the user wants to use parallel computation,
and \code{FALSE} otherwise (the latter is the default option); and vi)
\code{ncores}, an object of class `\code{numeric}' which indicates the number of
cores to be used for running parallel computation.\footnote{Please note that
this function can take considerable time to run.  In addition, the argument
\code{parallel} has been fully tested only within the Windows operating
system.  We plan in the future to make this option available across other
major operating systems.}
%
\begin{CodeChunk}
\begin{CodeInput}
R> boot_lim_estimate <- boot(object = lim_model_B, hypothesis = "lim",
+    group = NULL, niter = 2, weights = FALSE, parallel = FALSE, 
+    ncores = NULL)
R> boot_lim_estimate
\end{CodeInput}
\begin{CodeOutput}
                   coefficient boot.Std.Error boot.t.value boot.p.value
alpha               0.23951648     0.08747214     2.738203 6.432331e-03
beta_gender1       -0.22023753     0.14368757    -1.532753 1.260671e-01
beta_party1         0.42946569     0.11875662     3.616352 3.339708e-04
beta_nchair1        3.09614813     0.24416950    12.680323 1.487650e-31
phi                 0.40038484     0.06001437     6.671483 7.756648e-11
beta_unobservables  0.07714043     0.05965438     1.293123 1.966581e-01
\end{CodeOutput}
\end{CodeChunk}
%
The results of this second exercise show a positive and significant network
effect ($\phi$) on the effectiveness of agents, meaning that Congress
members benefit from their interactions with the colleagues conscripted to
their own causes.  It is worth noting that network endogeneity does not seem
to be a major concern in this simple example, since the correlation between
the unobservables of the link formation and outcome equations (captured by
the coefficient \code{beta\_unobservables} in the output above)
is not statistically significant.

Observe that while in Model~\ref{NLLS} the estimated effect of network
centrality is captured by one parameter $\alpha$, in Model~\ref{BLP} it
requires further elaboration since it varies with individual
characteristics.  For the $k$-th covariate in Model~\ref{BLP}, if $\phi
>0$, the centrality's marginal effect is $(I-\phi G)^{-1}(I\beta _{k})$,
which is an $n\times n$ matrix with its $(i,j)$-th element representing the
effect of a change in the characteristic $k$ for agent $j$ on the outcome
of agent $i$.  The diagonal elements capture the direct effect of a marginal
change in the characteristic $k$ for agent $i$.  The elements outside the
diagonal instead capture the indirect effects, that is the effects on the
outcome of $i$ triggered by variation of the characteristic $k$ in other
agents.  The direct effects are comparable to the OLS estimated effects
without considering the network effects.  The important difference in
comparing the estimates of the covariates in the models with and without
network effects is precisely that when $\phi>0$, the marginal effect of the
$k$-th\ covariate in Model~\ref{BLP} is not just $\beta_{k}$ but it also
depends on the individual's position in the network (i.e.,~on the
individual's network centrality).
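
The decomposition into direct and indirect effects can be sketched in a few
lines of base \proglang{R}.  The example below uses a hypothetical
three-agent row-normalized network (\code{G\_toy}) together with the point
estimates of $\phi$ and of the coefficient on \code{nchair} reported above. 
The summary statistics are computed by hand and are only meant to mirror the
quantities described in the text, not the internals of any \pkg{econet}
function.
%
\begin{CodeChunk}
\begin{CodeInput}
R> G_toy <- matrix(c(0,   0.5, 0.5,
+                    1,   0,   0,
+                    0.5, 0.5, 0), nrow = 3, byrow = TRUE)
R> phi_hat <- 0.40038
R> beta_k <- 3.09615
R> M <- solve(diag(3) - phi_hat * G_toy) * beta_k
R> direct <- diag(M)
R> indirect <- M[row(M) != col(M)]
R> c(mean = mean(direct), sd = sd(direct),
+    max = max(direct), min = min(direct))
R> c(mean = mean(indirect), sd = sd(indirect),
+    max = max(indirect), min = min(indirect))
\end{CodeInput}
\end{CodeChunk}
%
Since $(I-\phi G)^{-1}=I+\phi G+\phi^{2}G^{2}+\dots$ for $0<\phi<1$ and a
row-normalized $G$, each diagonal element of the inverse is at least one, so
every direct effect is at least as large in magnitude as $\beta_{k}$.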

The function \code{quantify} allows the user to run this task. 
Specifically, it provides the estimated impacts of the agents'
characteristics with network effects (which is equivalent to the estimated
impacts of agents' network centrality by characteristic) and compares them
with the OLS estimates.  Because, as we said before, the marginal effects of
characteristics are different for different agents, the function
\code{quantify} reports the mean, standard deviation, maximum and minimum
for both direct and indirect effects.  It requires the argument \code{fit},
which is the output of either the function \code{net\_dep} or
\code{horse\_race} (discussed below).
%
\begin{CodeChunk}
\begin{CodeInput}
R> quantify(fit = lim_model_B)
\end{CodeInput}
\begin{CodeOutput}
                      beta Direct_mean Direct_std Direct_max Direct_min
beta_gender1       -0.2202     -0.2205     0.0002    -0.2202    -0.2217
beta_party1         0.4295      0.4299     0.0003     0.4323     0.4295
beta_nchair1        3.0961      3.0992     0.0025     3.1162     3.0961
beta_unobservables  0.0771      0.0772     0.0001     0.0776     0.0771
                   Indirect_mean Indirect_std Indirect_max Indirect_min
beta_gender1             -0.0003       0.0005       0.0000      -0.0149
beta_party1               0.0007       0.0009       0.0290       0.0000
beta_nchair1              0.0047       0.0068       0.2091       0.0000
beta_unobservables        0.0001       0.0002       0.0052       0.0000
\end{CodeOutput}
\end{CodeChunk}
%
The estimation results show the mean, standard deviation, maximum and
minimum for both direct and indirect effects.  Perhaps unsurprisingly, it
appears that the indirect effects are smaller than the direct
effects.\footnote{For additional details on the interpretation of the
estimated parameters of network models with peer effects see
\cite{LeSage:2009}, Chapter~2.7.}

As already shown, agents' centrality measures can be accessed using the
operator \code{\$} with \code{net\_dep}'s output, e.g.,~\code{lim_model_B\$centrality},
and they can be used for different
applications.  For example, we can use them to rank agents' positions in the
Congress social space, or to study the centrality distribution in the
Republican and the Democratic party, as we do in Figure \ref{fig:figure1}.
%
\begin{figure}[t!]
\centering
\includegraphics[width=0.72\textwidth, trim=0 5 0 5, clip]{Figure1}
\caption{\label{fig:figure1} BLP model: Distribution of Parameter-Dependent
Network Centrality.}
\end{figure}
%

\subsubsection{Exercise 2: Katz-Bonacich centrality with heterogeneous by
node parameter}

The code below estimates Model~\ref{BP_2} using gender as the relevant
dimension of heterogeneity.  Gender is a dummy variable which takes the
value 1 if the legislator is female, and 0 otherwise.
%
\begin{CodeChunk}
\begin{CodeInput}
R> z <- as.numeric(as.character(db_model_A[, "gender"]))
R> f_het_model_A <- formula("PAC ~ party + nchair + isolate")
R> starting <- c(alpha = 0.44835, beta_party1 = 0.56004,
+    beta_nchair1 = -0.16349, beta_isolate1 = 0.21011,
+    beta_z = -0.26015, phi = 0.34212, gamma = -0.49960)
R> het_model_A <- net_dep(formula = f_het_model_A, data = db_model_A,
+    G = G_model_A, model = "model_A", estimation = "NLLS",
+    hypothesis = "het", z = z, start.val = starting)
R> summary(het_model_A)
\end{CodeInput}
\begin{CodeOutput}
Call:
Main Equation:  PAC ~ alpha * solve_block(I - G %*% (phi * I + 
gamma * G_heterogeneity)) %*% Ones + beta_party1 * party1 + 
beta_nchair1 * nchair1 + beta_isolate1 * isolate1 + beta_z * z
              Estimate Std. Error t value Pr(>|t|)    
alpha          0.44835    0.17942   2.499   0.0128 *  
beta_party1    0.56004    0.08342   6.713 6.21e-11 ***
beta_nchair1  -0.16349    0.18984  -0.861   0.3896    
beta_isolate1  0.21011    0.17927   1.172   0.2418    
beta_z        -0.26014    0.10655  -2.441   0.0150 *  
phi            0.34212    0.25377   1.348   0.1783    
gamma         -0.49960    0.34662  -1.441   0.1502    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC: 1042.34  loglik: -513.17
\end{CodeOutput}
\end{CodeChunk}
%
The results show that in this simple example $\gamma$\ is not significant,
suggesting that female Congress members do not have a different level of
influence than their male peers.  Note that the weighted Katz-Bonacich
centralities derived from the spillover effect are stored in the object
\code{het\_model\_A\$centrality}.
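
The role of $\gamma$ can be illustrated with a short base \proglang{R}
sketch that follows the expression printed in the \code{Call} above.  It
assumes that \code{G\_heterogeneity} is the diagonal matrix built from
\code{z}; this is an assumption about the notation, not a statement about
the package internals.  The network \code{G\_toy} and the vector
\code{z\_toy} are hypothetical, while the values of $\phi$ and $\gamma$ are
the point estimates reported above.
%
\begin{CodeChunk}
\begin{CodeInput}
R> G_toy <- matrix(c(0,   0.5, 0.5,
+                    1,   0,   0,
+                    0.5, 0.5, 0), nrow = 3, byrow = TRUE)
R> z_toy <- c(1, 0, 0)
R> phi_hat <- 0.34212
R> gamma_hat <- -0.49960
R> A_het <- G_toy %*% (phi_hat * diag(3) + gamma_hat * diag(z_toy))
R> kb_het <- solve(diag(3) - A_het, rep(1, 3))
R> kb_hom <- solve(diag(3) - phi_hat * G_toy, rep(1, 3))
R> cbind(heterogeneous = kb_het, homogeneous = kb_hom)
\end{CodeInput}
\end{CodeChunk}
%
Post-multiplying $G$ by a diagonal matrix rescales its columns, so in this
sketch the spillover generated by agent $j$ depends on $z_{j}$.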

If we use Equations~\ref{BLP_3a} and~\ref{BLP_3b}, we distinguish between
outgoing and incoming influence, that is, whether females are more (or less)
able to influence their peers and to be influenced by them, respectively.  The code
below implements this analysis.  For ease of exposition, we do not consider
the possible endogeneity of the social network.
%
\begin{CodeChunk}
\begin{CodeInput}
R> z <- as.numeric(as.character(db_model_B[, "gender"]))
R> f_het_model_B <- formula("les ~ party + nchair")
R> starting <- c(alpha = 0.23952, beta_party1 = 0.42947,
+    beta_nchair1 = 3.09615, beta_z = -0.12749,
+    theta_0 = 0.42588, theta_1 = 0.08007)
R> het_model_B_l <- net_dep(formula = f_het_model_B, data = db_model_B,
+    G = G_model_B, model = "model_B", estimation = "NLLS",
+    hypothesis = "het_l", z = z, start.val = starting)
R> starting <- c(alpha = 0.04717, beta_party1 = 0.51713,
+    beta_nchair1 = 3.12683, beta_z = 0.01975,
+    eta_0 = 1.02789, eta_1 = 2.71825)
R> het_model_B_r <- net_dep(formula = f_het_model_B, data = db_model_B,
+    G = G_model_B, model = "model_B", estimation = "NLLS",
+    hypothesis = "het_r", z = z, start.val = starting)
R> summary(het_model_B_l)
\end{CodeInput}
\begin{CodeOutput}
Call:
Main Equation:  les ~ solve_block(I - (theta_0 * I - 
theta_1 * G_heterogeneity) %*% G) %*% (alpha * Ones + 
beta_party1 * party1 + beta_nchair1 * nchair1 + beta_z * z)
             Estimate Std. Error t value Pr(>|t|)    
alpha         0.22740    0.07584   2.998  0.00287 ** 
beta_party1   0.41382    0.10198   4.058 5.88e-05 ***
beta_nchair1  3.07797    0.26148  11.771  < 2e-16 ***
beta_z       -0.12749    0.21199  -0.601  0.54789    
theta_0       0.42587    0.07418   5.741 1.77e-08 ***
theta_1       0.08007    0.12320   0.650  0.51611    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC: 1343.68  loglik: -664.84
\end{CodeOutput}
\begin{CodeInput}
R> summary(het_model_B_r)
\end{CodeInput}
\begin{CodeOutput}
Call:
Main Equation: les ~ solve_block(I - G %*% (eta_0 * I - 
eta_1 * G_heterogeneity)) %*% (alpha * Ones + 
beta_party1 * party1 + beta_nchair1 * nchair1 + beta_z * z)
             Estimate Std. Error t value Pr(>|t|)  
alpha         0.04717    0.06867   0.687  0.49251    
beta_party1   0.51713    0.09259   5.585 4.13e-08 ***
beta_nchair1  3.12683    0.25680  12.176  < 2e-16 ***
beta_z        0.01976    0.15098   0.131  0.89595    
eta_0         1.02790    0.22400   4.589 5.85e-06 ***
eta_1         2.71825    0.91547   2.969  0.00315 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC: 1336.63  loglik: -661.31
\end{CodeOutput}
\end{CodeChunk}
%
The results show that females do not seem to be able to influence their
socially connected peers ($\theta_{1}$ is not significant), but they are helpful
to their colleagues ($\eta_{1}>0$).  In terms of network analysis, this
implies that all else being equal, Congress members located close to female
colleagues benefit from their position since they can leverage females to be
more effective in their legislative activity.  In this case, the weighted
Katz-Bonacich centralities are stored in \code{het_model_B_l\$centrality}
and \code{het_model_B_r\$centrality}.

\vspace*{-0.15cm}

\subsubsection{Exercise 3: Katz-Bonacich centrality with heterogeneous by link parameter}

In this last exercise, we show how to explore the hypothesis that relations
within and between parties might have a different impact on the Congress
members' LES.  The code below estimates Model~\ref{BLP_4} where the
within and between effects are shaped by party membership.

\vspace*{-0.25cm}

\begin{CodeChunk}
\begin{CodeInput}
R> z <- as.numeric(as.character(db_model_B[, "party"]))
R> starting <- c(alpha = 0.242486, beta_gender1 = -0.229895,
+    beta_party1 = 0.42848, beta_nchair1 = 3.0959,
+    phi_within = 0.396371, phi_between = 0.414135)
R> party_model_B <- net_dep(formula = f_model_B, data = db_model_B,
+    G = G_model_B, model = "model_B", estimation = "NLLS",
+    hypothesis = "par", z = z, start.val = starting)
R> summary(party_model_B)
\end{CodeInput}
\begin{CodeOutput}
Call:
Main Equation:  les ~ solve_block(I - phi_within * G_within - 
phi_between * G_between) %*% (alpha * Ones + beta_gender1 * gender1 +
beta_party1 * party1 + beta_nchair1 * nchair1)
             Estimate Std. Error t value Pr(>|t|)    
alpha         0.24249    0.08988   2.698 0.007251 ** 
beta_gender1 -0.22990    0.14120  -1.628 0.104221    
beta_party1   0.42848    0.12623   3.394 0.000751 ***
beta_nchair1  3.09590    0.26008  11.904  < 2e-16 ***
phi_within    0.39637    0.07306   5.425 9.66e-08 ***
phi_between   0.41414    0.24175   1.713 0.087420 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC: 1344.1  loglik: -665.05
\end{CodeOutput}
\end{CodeChunk}
%
The estimation results show that connections within one's own party are
strongly significant for advancing a piece of legislation, whereas
connections between parties are significant only at the 10\% level. 
The weighted Katz-Bonacich centrality can be found in the object
\code{party_model_B\$centrality}.
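
The following base \proglang{R} sketch shows one way in which the matrices
\code{G\_within} and \code{G\_between} appearing in the \code{Call} could be
formed.  It assumes that \code{G\_within} keeps the links between agents
with the same value of \code{z} and that \code{G\_between} keeps all the
others; the network \code{G\_toy} and the vector \code{z\_toy} are
hypothetical, and the values of the two spillover parameters are the point
estimates reported above.
%
\begin{CodeChunk}
\begin{CodeInput}
R> G_toy <- matrix(c(0,   0.5, 0.5,
+                    1,   0,   0,
+                    0.5, 0.5, 0), nrow = 3, byrow = TRUE)
R> z_toy <- c(1, 1, 0)
R> G_within <- G_toy * outer(z_toy, z_toy, "==")
R> G_between <- G_toy * outer(z_toy, z_toy, "!=")
R> phi_within <- 0.39637
R> phi_between <- 0.41414
R> kb_par <- solve(diag(3) - phi_within * G_within -
+    phi_between * G_between, rep(1, 3))
R> all.equal(G_within + G_between, G_toy)
\end{CodeInput}
\begin{CodeOutput}
[1] TRUE
\end{CodeOutput}
\end{CodeChunk}
%
Because the two matrices add up to the original network, setting the two
parameters equal to each other returns the homogeneous specification.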

\subsection{Centrality measure comparison}

Network centrality measures adopt different criteria for ranking the
importance of an agent in a network.  As a result, it might be the case that
a network centrality measure robustly predicts how the individual's
importance in the network determines her/his outcome, while other measures of
centrality do not do so as well.  Therefore, one can expect that the outcome
of an agent may be significantly predicted by different centrality measures,
but only the measures which better explain the agents' outcome will remain
significant when these are included together in the same regression model,
while the others will not be distinguished from zero.

The \proglang{R}~package \pkg{econet} allows the user to evaluate the explanatory
power of parameter-dependent centralities relative to other centrality
measures, which depend on network topology only.  More specifically,
\pkg{econet} considers the following measures, which are computed using the
\proglang{R}~package \pkg{igraph} \citep{igraph}: indegree centrality,
outdegree centrality, degree centrality, betweenness centrality, incloseness
centrality, outcloseness centrality, and closeness centrality.  It also
reports eigenvector centrality, which is calculated using the \proglang{R}
package \pkg{sna} \citep{sna}.\footnote{Indegree centrality is the number of
incoming links of one node; outdegree centrality is the number of outgoing
links of one node; degree centrality is the sum of in and out degree;
betweenness centrality is the number of times a node falls on the shortest
path between two other nodes; incloseness centrality is the inverse of the
average distance of one node from all the other nodes passing through
incoming links; outcloseness centrality is the inverse of the average
distance of one node from all the other nodes passing through outgoing
links; closeness centrality is the sum of in and out closeness, and
eigenvector centrality is proportional to the sum of the centrality of
agent's neighbors \cite[see][for further details]{Jackson:2010}.} We then
implement an augmented version of Models~\ref{NLLS} and~\ref{BLP} where
we add one (or more) centrality measures in the matrix of individual
characteristics $X_{r}^\top$.  By doing so, we can run a horse race across
different centrality measures.\footnote{It is worth noting that centrality
measures are computed assuming that the underlying network is fixed. 
However, as pointed out by \cite{An:2015b}, social networks are easily
malleable.  How robust the centrality measure is to the potential changes in
the network is worth further studying.}
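
For intuition on the topology-only measures, the degree-based centralities
defined in the footnote above can be reproduced with base \proglang{R} on a
hypothetical adjacency matrix (\code{A\_toy}).  The sketch assumes that rows
index the sender of a link and columns the receiver, and it does not call
\pkg{igraph} or \pkg{sna}.
%
\begin{CodeChunk}
\begin{CodeInput}
R> A_toy <- matrix(c(0, 1, 1, 0,
+                    0, 0, 1, 0,
+                    1, 0, 0, 0,
+                    0, 0, 1, 0), nrow = 4, byrow = TRUE)
R> outdegree <- rowSums(A_toy != 0)
R> indegree <- colSums(A_toy != 0)
R> degree <- indegree + outdegree
R> rbind(indegree, outdegree, degree)
\end{CodeInput}
\begin{CodeOutput}
          [,1] [,2] [,3] [,4]
indegree     1    1    3    0
outdegree    2    1    1    1
degree       3    2    4    1
\end{CodeOutput}
\end{CodeChunk}
%
Unlike the parameter-dependent centralities, these counts depend only on
which links are present and not on any estimated parameter.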

The arguments of the function \code{horse\_race} are similar to the
\code{net\_dep} ones.  The different ones are: i) \code{centralities}, an
object or a vector of class `\code{character}' specifying the names of the
centrality measure(s) to be used; ii) \code{directed}, a logical object
which is set to \code{TRUE} if the network is directed, and \code{FALSE}
otherwise; iii) \code{weighted}, a logical object equal to \code{TRUE} if
links between agents have weights, and \code{FALSE} otherwise; and iv)
\code{normalization}, an object of class `\code{character}' which can be used to
normalize centrality measures before the estimations.\footnote{The options
available are: \code{NULL}, no normalization; \code{bygraph}, divide by the
number of nodes in the network minus 1 (for degree and closeness) or the
number of possible links in the network (betweenness); \code{bycomponent},
divide by the number of nodes in agent's component minus 1 (for degree and
closeness) or the number of possible links in agent's component
(betweenness); \code{bymaxgraph}, divide by the maximum centrality value in
the network; \code{bymaxcomponent}, divide by the maximum centrality value
in agent's component.}
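
As a small illustration of two of these options, the sketch below applies
the \code{bygraph} and \code{bymaxgraph} rules, as worded in the footnote,
to a hypothetical degree vector (\code{degree\_toy}) for a five-node
network.  The rescaling is done by hand in base \proglang{R} and is not
taken from the package code.
%
\begin{CodeChunk}
\begin{CodeInput}
R> degree_toy <- c(3, 2, 3, 1, 1)
R> n <- length(degree_toy)
R> degree_bygraph <- degree_toy / (n - 1)
R> degree_bymaxgraph <- degree_toy / max(degree_toy)
R> round(rbind(degree_bygraph, degree_bymaxgraph), 2)
\end{CodeInput}
\end{CodeChunk}
%
Both rules preserve the ranking of the agents and only change the scale of
the regressor.  In \code{horse\_race}, these rescalings are requested
through the argument \code{normalization}.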

The output of this function is also similar to \code{net\_dep}, i.e.,~a list
containing the results of the main estimates in the first object, a
\code{data.frame} listing the centrality measures in the second object, and
the results of the first step estimation in the third object (if
\code{endogeneity = TRUE}).  The following example shows how this model is
implemented and how its results are stored, using betweenness centrality as
the additional regressor.
%
\begin{CodeChunk}
\begin{CodeInput}
R> starting <- c(alpha = 0.214094, beta_gender1 = -0.212706,
+    beta_party1 = 0.478518, beta_nchair1 = 3.09234,
+    beta_betweenness = 7.06287e-05, phi = 0.344787) 
R> horse_model_B <- horse_race(formula = f_model_B, 
+    centralities = "betweenness", directed = TRUE, weighted = TRUE, 
+    normalization = NULL, data = db_model_B, G = G_model_B,
+    model = "model_B", estimation = "NLLS", start.val = starting)
R> summary(horse_model_B, centrality = "betweenness")
\end{CodeInput}
\begin{CodeOutput}
Call:
Main Equation:  les ~ gender + party + nchair + betweenness
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  3.300e-01  9.042e-02   3.650 0.000295 ***
gender1     -8.904e-02  1.445e-01  -0.616 0.538011    
party1       7.054e-01  1.139e-01   6.194 1.36e-09 ***
nchair1      3.202e+00  2.644e-01  12.112  < 2e-16 ***
betweenness  1.851e-04  3.985e-05   4.645 4.51e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC: 1362.19  loglik: -675.09
\end{CodeOutput}
\begin{CodeInput}
R> summary(horse_model_B)
\end{CodeInput}
\begin{CodeOutput}
Call:
Main Equation: les ~ solve_block(I - phi * G) %*% (alpha * Ones + 
beta_gender1 * gender1 + beta_party1 * party1 + 
beta_nchair1 * nchair1 + beta_betweenness * betweenness)
                   Estimate Std. Error t value Pr(>|t|)    
alpha             2.141e-01  7.676e-02   2.789  0.00552 ** 
beta_gender1     -2.127e-01  1.408e-01  -1.511  0.13162    
beta_party1       4.785e-01  1.105e-01   4.331 1.85e-05 ***
beta_nchair1      3.092e+00  2.593e-01  11.927  < 2e-16 ***
beta_betweenness  7.063e-05  4.561e-05   1.549  0.12219    
phi               3.448e-01  7.101e-02   4.856 1.68e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC: 1341.61  loglik: -663.8
\end{CodeOutput}
\end{CodeChunk}
%
The results of this estimation show evidence that being able to broker
connections in the network, as measured by betweenness centrality in the
output of \code{summary(horse_model_B, centrality = "betweenness")}, is
associated with higher legislative
effectiveness.  The effect disappears when we include network effects, that
is when we add betweenness centrality as an additional regressor in the
linear model of social interactions (Equation~\ref{BLP}).  This suggests
that Katz-Bonacich centrality is a more robust predictor of effectiveness in
this context.  Estimates are stored in the first object of
\code{horse\_model\_B}, whereas centrality measures are stored in
\code{horse_model_B\$centrality}.

\section{Conclusions}\label{sec:conclusion}

We have described the key elements for the estimation of parameter-dependent
centralities derived from equilibrium models of behavior, and discussed the
use of the package \pkg{econet} for the implementation of such metrics.  The
methods described in the paper are derived from several modifications to the
linear-in-means model -- for which both nonlinear least squares and maximum
likelihood estimators are provided -- and they allow one to model both link
and node heterogeneity in network effects, endogenous network formation and
the presence of unconnected nodes.  Furthermore, they provide the means to
compare the explanatory power of parameter-dependent network centrality
measures with those of standard measures of network centrality.  A number of
examples are used to walk the reader through the discussion and orientate
the application of these methods to new potential directions of research.

\bibliography{v102i08}

\end{document}
