Abstract
We consider point estimation and inference based on modifications of the profile likelihood in models for dyadic interactions between n agents featuring agent-specific parameters. The maximum-likelihood estimator of such models has bias and standard deviation of order \(n^{-1}\) and so is asymptotically biased. Estimation based on modified likelihoods leads to estimators that are asymptotically unbiased and likelihood ratio tests that exhibit correct size.
1 Introduction
A growing literature has uncovered the importance of interactions between agents through networks as drivers for economic and social outcomes. A leading approach to statistical modeling of dyadic interaction is through the inclusion of agent-specific parameters (see, e.g., Snijders 2011 for many references). A specific example that has received substantial attention in the recent literature is the \(\beta \)-model for network formation. There, agent fixed effects serve to capture degree heterogeneity in link formation and the inclusion of dyad-level covariates reflects homophily; see, e.g., Graham (2017), Jochmans (2018), and Dzemski (2019).
Estimation of fixed-effect models for dyadic data is non-standard as the number of parameters grows with the sample size in a similar manner as in the classic incidental-parameter problem for one-way panel data discussed in Neyman and Scott (1948). Under so-called dense-network asymptotics, common parameters in regular models can be consistently estimated, but inference is plagued by asymptotic bias; see Fernández-Val and Weidner (2016, 2018) and Graham (2017) for examples, discussion, and approaches to bias-correcting the point estimator.
In this paper we look at generic estimation problems for undirected dyadic data and consider inference based on modifying the likelihood function in the spirit of Pace and Salvan (2006) and Arellano and Hahn (2006, 2007). In its most general form, the modified likelihood is a bias-corrected version of the profile likelihood, that is, of the likelihood after having profiled-out the nuisance parameters. The adjustment is both general and simple in form, involving only the score and Hessian of the likelihood with respect to the nuisance parameters. The adjustment term removes the leading bias from the profile likelihood and leads to asymptotically unbiased inference and likelihood ratio statistics that are \(\chi ^2\)-distributed under the null. The form of the adjustment can be specialized by using the likelihood structure, as in DiCiccio et al. (1996).
We work out the modifications to the profile likelihood in a linear version of the \(\beta \)-model and (in the appendix) in a linear version of the Bradley and Terry (1952) model for paired comparisons. These simple illustrations give insight into how the adjustments work. We next apply them to the \(\beta \)-model in the simulation designs of Graham (2017). We find that both modifications dramatically improve on maximum likelihood in terms of bias and mean squared error as well as reliability of statistical inference, and that they are considerably more reliable than ex post bias-correction of the maximum-likelihood estimator.
2 Fixed-effect models for dyadic data
We consider data on dyadic interactions between n agents. For each of \(\nicefrac {n(n-1)}{2}\) distinct agent pairs (i, j) with \(i<j\) we observe the random variable \(z_{ij}\), which may be a vector. The density of \(z_{ij}\) (relative to some dominating measure) takes the form \( f(z_{ij};\vartheta ,\beta _i,\beta _j), \) where \(\vartheta \) and \(\beta _1,\ldots ,\beta _n\) are unknown Euclidean parameters. We may observe an outcome \(y_{ij}\) generated by pair (i, j) together with a vector of dyad characteristics \(x_{ij}\), in which case we have \(z_{ij}=(y_{ij}, x_{ij}^\prime )^\prime \), and we could consider the distribution of the outcome conditional on the covariates. In what follows, we take the \(z_{ij}\) to be (conditionally) independent. Models of this form are relevant in many areas. Examples include the analysis of network formation as mentioned before, but also the study of strategic behavior among agents (Bajari et al. 2010), and the construction of rankings (Bradley and Terry 1952). Our goal is to perform inference on \(\vartheta \) treating the \(\beta _i\) as fixed effects.
The log-likelihood is
\[ \ell (\vartheta ,\beta ) = \sum _{i<j} \log f(z_{ij};\vartheta ,\beta _i,\beta _j), \]
where we let \(\beta =(\beta _1,\ldots ,\beta _n)^\prime \). For simplicity of exposition we ignore any normalization that may be needed on \(\beta \) to achieve identification. When a normalization of the form \(c(\beta )=0\) is needed, everything to follow goes through on replacing \(\ell (\vartheta ,\beta )\) by the constrained likelihood \(\ell (\vartheta ,\beta )-\lambda \, c(\beta )\), where \(\lambda \) denotes the Lagrange multiplier. We give a detailed example in the appendix.
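It is useful to recall that the maximum-likelihood estimator of \(\vartheta \) can be expressed as
\[ {\hat{\vartheta }} = \arg \max _{\vartheta } {\hat{\ell }}(\vartheta ), \]
where \({\hat{\ell }}(\vartheta ) = \ell (\vartheta ,{\hat{\beta }}(\vartheta ))\), with
\[ {\hat{\beta }}(\vartheta ) = \arg \max _{\beta } \ell (\vartheta ,\beta ), \]
is the profile likelihood.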
Inference based on the profile likelihood performs poorly because the estimation noise in \({\hat{\beta }}(\vartheta )\) introduces non-negligible bias. Moreover, in regular settings, both the bias and the standard deviation of \({\hat{\vartheta }}\) are of order \(n^{-1}\), so that they are of the same order of magnitude. Consequently, the maximum-likelihood estimator is asymptotically biased.
2.1 Modified profile likelihood
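In their simplest form, modified likelihoods can be understood as yielding a superior approximation to the target likelihood
\[ \ell (\vartheta ,\beta (\vartheta )), \qquad \beta (\vartheta ) = \arg \max _{\beta } {\mathbb {E}}(\ell (\vartheta ,\beta )). \]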
Moreover, the profile likelihood is the sample counterpart to this infeasible likelihood. Replacing \(\beta (\vartheta )\) with \({\hat{\beta }}(\vartheta )\) introduces bias that leads to invalid inference. To see this suppose that
where we introduce
An expansion of the profile likelihood around \(\beta (\vartheta )\) yields
Combining the two expansions and taking expectations then shows that the bias of the profile likelihood is of the form
for
the variance of \(V(\vartheta )\).
Equation (2.1) is a conventional asymptotically linear representation of the estimator of the fixed effects; see, e.g., Rilstone et al. (1996). Low-level conditions for it to go through in specific models are provided in Fernández-Val and Weidner (2016, 2018). The difficulty in the current case, as opposed to, say, the one-way panel data model (as dealt with in Hahn and Newey 2004), is to handle the non-sparse nature of the Hessian matrix.
With (2.2) in hand, a modified likelihood is
where we define the plug-in estimators
for matrices
and
In large samples, this modification removes the leading bias from the profile likelihood. Consequently, in large samples, the likelihood ratio statistic has correct size and
will have bias \(o(n^{-1})\). Furthermore, under usual regularity conditions, we have the limit result
as \(n\rightarrow \infty \), where we let
be the Fisher information for \(\vartheta \).
The only point at which the likelihood setting has been used so far is in the statement of the limit distribution of \({\dot{\vartheta }}-\vartheta \), where the expression for the asymptotic variance exploits the information equality. Bias-corrected estimation—using the same formula for the bias as before—thus carries over to more general extremum-type estimation problems; the only change being that, now, the asymptotic variance is \( I(\vartheta )^{-1} \Omega (\vartheta ) \, I(\vartheta )^{-1}. \)
Alternatively, following the arguments in Arellano and Hahn (2007) we can exploit the likelihood structure to get
which validates the alternative modified likelihood
see DiCiccio et al. (1996). Its maximizer, say \(\ddot{\vartheta }\), satisfies the same asymptotic properties as \({\dot{\vartheta }}\).
2.2 Illustration: a linear \(\beta \)-model
Consider the following extension of the classic many normal means problem of Neyman and Scott (1948). Data are generated as
and are independent across dyads. The likelihood function for all parameters (ignoring additive constants) is
Its first two derivatives with respect to the \(\beta _i\) are
and
Let \({\tilde{z}}_i=(n-2)^{-1}\sum _{j>i} z_{ij}+(n-2)^{-1}\sum _{j<i} z_{ji}\) and \({\overline{z}}=(2(n-1))^{-1}\sum _{i=1}^n {\tilde{z}}_i\). Solving for the maximum-likelihood estimator of \(\beta _i\) gives \( {\hat{\beta }}_i = {\tilde{z}}_i - {\overline{z}} \) for any \(\vartheta \). The profile likelihood is therefore
and its maximizer is
Some tedious but straightforward calculations yield
which confirms that the maximum-likelihood estimator of \(\vartheta \) suffers from asymptotic bias. Moreover,
as \(n\rightarrow \infty \).
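The closed-form expressions above can be checked numerically. The sketch below assumes that the data are \(z_{ij}\sim N(\beta _i+\beta _j,\vartheta )\), which is the specification implied by \({\hat{\beta }}_i={\tilde{z}}_i-{\overline{z}}\); the display defining the model is not reproduced in this copy, so this is an inference rather than a quotation. Function names, the seed, and the sample sizes are illustrative. Under this assumption the residual sum of squares has \(n(n-3)/2\) degrees of freedom, which would give a bias of \(-2\vartheta /(n-1)\) for \({\hat{\vartheta }}\); that value is derived here, not taken from the text.

```python
import random


def fit_linear_beta(z, n):
    """Closed-form maximum likelihood in the (assumed) linear beta-model.

    z maps each pair (i, j) with i < j to the outcome z_ij; requires n >= 3.
    Returns (beta_hat, theta_hat).
    """
    # z_tilde_i = (n-2)^{-1} * (sum of outcomes over all dyads involving i)
    z_tilde = [0.0] * n
    for (i, j), value in z.items():
        z_tilde[i] += value / (n - 2)
        z_tilde[j] += value / (n - 2)
    z_bar = sum(z_tilde) / (2 * (n - 1))
    beta_hat = [zt - z_bar for zt in z_tilde]
    # theta_hat averages squared residuals over all n(n-1)/2 dyads
    rss = sum((v - beta_hat[i] - beta_hat[j]) ** 2 for (i, j), v in z.items())
    theta_hat = rss / (n * (n - 1) / 2)
    return beta_hat, theta_hat


def simulate(beta, theta, rng):
    """Draw z_ij = beta_i + beta_j + N(0, theta) noise for all i < j."""
    n = len(beta)
    sd = theta ** 0.5
    return {(i, j): beta[i] + beta[j] + rng.gauss(0.0, sd)
            for i in range(n) for j in range(i + 1, n)}


def mc_mean_theta_hat(n, theta=1.0, reps=2000, seed=0):
    """Monte Carlo average of theta_hat over `reps` simulated data sets."""
    rng = random.Random(seed)
    beta = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # arbitrary fixed effects
    total = 0.0
    for _ in range(reps):
        total += fit_linear_beta(simulate(beta, theta, rng), n)[1]
    return total / reps


if __name__ == "__main__":
    for n in (10, 25):
        # simulated bias next to the derived value -2 * theta / (n - 1)
        print(n, mc_mean_theta_hat(n) - 1.0, -2.0 / (n - 1))
```

The simulated bias should shrink roughly in proportion to \(1/n\), in line with the statement that bias and standard deviation of \({\hat{\vartheta }}\) are of the same order.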
To set up the modified likelihood, first note that
and that
It is then easily seen that
From this we obtain
and its maximizer
Clearly, this estimator removes the leading bias from the maximum-likelihood estimator. Moreover,
which shows that the remaining bias in the point estimator is small relative to its standard deviation.
As an alternative correction, we may exploit the likelihood structure to adjust the profile likelihood by the term
where c is a constant that does not depend on \(\vartheta \). This yields the modification
whose maximizer satisfies
This estimator is exactly unbiased.
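Exact unbiasedness can be verified without simulation. The sketch below again assumes \(z_{ij}\sim N(\beta _i+\beta _j,\vartheta )\) and uses the fact that the residuals are linear in the data, so that the expected residual sum of squares equals \(\vartheta \) times the trace of the residual-maker matrix. It then compares dividing the residual sum of squares by the number of dyads (maximum likelihood) with dividing by the number of dyads minus \(n\). Identifying the second divisor with \(\ddot{\vartheta }\) is an assumption, made because it is the only rescaling of the residual sum of squares that is exactly unbiased; all function names are hypothetical.

```python
def residuals(z, n):
    """Residuals z_ij - b_i - b_j at the least-squares fit (requires n >= 3)."""
    z_tilde = [0.0] * n
    for (i, j), value in z.items():
        z_tilde[i] += value / (n - 2)
        z_tilde[j] += value / (n - 2)
    z_bar = sum(z_tilde) / (2 * (n - 1))
    b = [zt - z_bar for zt in z_tilde]
    return {(i, j): v - b[i] - b[j] for (i, j), v in z.items()}


def residual_trace(n):
    """Trace of the residual-maker matrix M, built one diagonal entry at a time.

    Because residuals are linear in z, M[d, d] is the residual at dyad d when
    z is the d-th unit vector. For mean-zero errors with variance theta,
    E[RSS] = theta * trace(M).
    """
    dyads = [(i, j) for i in range(n) for j in range(i + 1, n)]
    trace = 0.0
    for d in dyads:
        unit = {e: (1.0 if e == d else 0.0) for e in dyads}
        trace += residuals(unit, n)[d]
    return trace


def exact_relative_bias(n):
    """Exact E[estimator] / theta - 1 for two divisors of the RSS."""
    n_dyads = n * (n - 1) / 2
    trace = residual_trace(n)
    ml = trace / n_dyads - 1.0               # divisor: number of dyads
    adjusted = trace / (n_dyads - n) - 1.0   # divisor: dyads minus n (assumed)
    return ml, adjusted


if __name__ == "__main__":
    for n in (10, 25):
        print(n, exact_relative_bias(n))
```

The loop is quadratic in the number of dyads, so it is meant for small \(n\) only; for larger networks the trace can be written down directly as \(n(n-3)/2\).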
To give an idea of the magnitude of the bias in this problem, Table 1 contains the bias and standard deviation of the estimators \({\hat{\vartheta }}\), \({\dot{\vartheta }}\), and \(\ddot{\vartheta }\) for various sample sizes n and variance parameter fixed to \(\vartheta =1\). These results are invariant to the value of the \(\beta _i\) and can be interpreted as relative bias for general values of \(\vartheta \).
3 Application to the \(\beta \)-model
The \(\beta \)-model of network formation models Bernoulli outcome variables as having success probability
where \(F(a)=(1+e^{-a})^{-1}\) is the logit link function. We now present the results from a Monte Carlo experiment. The designs are borrowed from Graham (2017). All designs are of the following form. Let \(u_i\in \lbrace -1, 1 \rbrace \) so that \({\mathbb {P}}(u_i=1) = \frac{1}{2}\). We generate the dyad covariate as
and the fixed effects as
where \(v_i\sim \textrm{Beta}(\lambda _1,\lambda _2)\). We set \(\mu = - \lambda _1 (\lambda _1+\lambda _2)^{-1}\), so that \(\mu +v_i\) has mean zero, and consider several choices for the parameters \((\gamma _1,\gamma _2)\) and \((\lambda _1,\lambda _2)\). The parameter choices are summarized in Table 2. In the first four designs (A1–A4), the \(\beta _i\) are drawn independently of \(x_{ij}\) from symmetric Beta distributions. In the next four designs (B1–B4) the \(\beta _i\) are generated from skewed distributions that depend on \(u_{i}\) (and thus correlate with the regressor \(x_{ij}\)). For both the designs labeled A and B, the average number of observed links per agent goes down as we move from the first design (A1 and B1) to the fourth design (A4 and B4). The average number of links decreases from about \(50\%\) to \(12\%\). This is clear from the second block of Table 2, which contains the average, minimum, and maximum number of links per agent (in percentages).
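A data-generating process of this type can be sketched as follows. Several ingredients are assumptions, because the corresponding displays are missing from this copy: the link probability is taken to be \(F(\beta _i+\beta _j+x_{ij}\vartheta )\), the covariate is taken to be \(x_{ij}=u_iu_j\), and the fixed effects are taken to be \(\beta _i=\mu +v_i\) with no dependence on \(u_i\) (so the \(\gamma \) parameters play no role here). The function names and parameter values are illustrative and are not the entries of Table 2.

```python
import math
import random


def logistic(a):
    """Logit link F(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def simulate_network(n, theta, lam1, lam2, rng):
    """Draw one undirected network; returns (links, x, beta)."""
    u = [rng.choice((-1, 1)) for _ in range(n)]   # P(u_i = 1) = 1/2
    mu = -lam1 / (lam1 + lam2)                    # makes mu + v_i mean zero
    # Assumed form of the fixed effects: beta_i = mu + v_i
    beta = [mu + rng.betavariate(lam1, lam2) for _ in range(n)]
    x, links = {}, {}
    for i in range(n):
        for j in range(i + 1, n):
            x[i, j] = u[i] * u[j]                 # assumed dyad covariate
            p = logistic(beta[i] + beta[j] + theta * x[i, j])
            links[i, j] = 1 if rng.random() < p else 0
    return links, x, beta


def degree_shares(links, n):
    """Share of the n - 1 potential partners each agent is linked to."""
    degree = [0] * n
    for (i, j), y in links.items():
        degree[i] += y
        degree[j] += y
    return [d / (n - 1) for d in degree]


if __name__ == "__main__":
    rng = random.Random(0)
    links, x, beta = simulate_network(50, 1.0, 1.0, 1.0, rng)
    shares = degree_shares(links, 50)
    print(min(shares), sum(shares) / len(shares), max(shares))
```

The minimum, average, and maximum degree shares printed at the end are the kind of summary the second block of Table 2 is described as reporting.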
We simulate 10,000 data sets for each design for \(n\in \lbrace 25, 50, 75, 100\rbrace \) and \(\vartheta =1\). Because the results across designs are qualitatively very similar, we present the full set of results only for Design A1 (Table 3). Tables 4 and 5 provide the results for \(n\in \lbrace 50,100 \rbrace \) for all designs. Each table contains the mean and median bias of \({\hat{\vartheta }}\), \({\dot{\vartheta }}\), and \(\ddot{\vartheta }\), along with their standard deviation and their interquartile range (both across the Monte Carlo replications). The tables also provide the empirical size of the likelihood ratio test for the null that \(\vartheta =1\) for theoretical size \(\alpha \in \lbrace .05,.10 \rbrace \). Inference results based on the Wald statistic, using a plug-in estimator of \(I(\vartheta )\), are very similar and not reported for brevity.
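The empirical size reported in such tables is the share of replications in which the test rejects a true null. A minimal sketch of that computation follows, assuming a scalar \(\vartheta \) so that the likelihood ratio statistic is compared with a \(\chi ^2\) distribution with one degree of freedom. The statistics used in the demonstration are synthetic stand-ins (squares of normal draws), not output from the designs in this paper, and the function names are hypothetical.

```python
import math
import random


def chi2_1_pvalue(stat):
    """P(chi-square with 1 df > stat), which equals erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def empirical_size(lr_stats, alpha):
    """Share of replications in which the test rejects at level alpha."""
    rejections = sum(1 for s in lr_stats if chi2_1_pvalue(s) < alpha)
    return rejections / len(lr_stats)


if __name__ == "__main__":
    rng = random.Random(0)
    # Correctly centred statistic: the square of a standard normal draw
    centred = [rng.gauss(0.0, 1.0) ** 2 for _ in range(10000)]
    # Mis-centred statistic: mimics an estimator whose bias equals one
    # standard deviation, as happens when bias and spread are of equal order
    shifted = [rng.gauss(1.0, 1.0) ** 2 for _ in range(10000)]
    for alpha in (0.05, 0.10):
        print(alpha, empirical_size(centred, alpha), empirical_size(shifted, alpha))
```

The mis-centred case illustrates why a bias of the same order as the standard deviation leads to over-rejection that does not vanish as the number of replications grows.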
Because the results for \(n=100\) can be compared (up to Monte Carlo error) to the numerical results collected in Graham (2017, Table 2), Table 5 contains two additional columns in which we reproduce the results for his analytically bias-corrected maximum-likelihood estimator (\({\hat{\vartheta }}_{\textrm{BC}}\)) and his ‘tetrad logit’ estimator (\({\hat{\vartheta }}_{\textrm{TETRAD}}\)). The latter is based on moment conditions that are free of \(\beta _i\) using a sufficiency argument. Bias-correcting \({\hat{\vartheta }}\) does not salvage the likelihood ratio statistic, and the conditional likelihood function of the ‘tetrad logit’ estimator is a quasi-likelihood and, therefore, does not satisfy the information equality. Hence, the results on size for these two estimators are based on the Wald statistic.
Table 3 clearly shows that both the bias and standard deviation of \({\hat{\vartheta }}\) are of order \(n^{-1}\). Consequently, the likelihood ratio test is size distorted even in large samples. Point estimation through the modified likelihoods gives estimators with small bias relative to their standard error. Even for \(n=25\), the bias is only about \(20\%\) of the bias of the maximum-likelihood estimator. In larger samples, the estimators are essentially unbiased. Both \({\dot{\vartheta }}\) and \(\ddot{\vartheta }\) are also less volatile than is \({\hat{\vartheta }}\). This phenomenon has been observed elsewhere; we refer to Schumann et al. (2022). Thus, here, bias-correction does not come at the cost of an increase in dispersion. Together with the substantial decrease in mean squared error, inference, too, improves dramatically. The likelihood ratio statistics for \({\dot{\ell }}(\vartheta )\) and \(\ddot{\ell }(\vartheta )\) have near-theoretical size for all n.
To give a more complete picture on inference via modifying the profile likelihood, Fig. 1 presents power curves for the likelihood ratio statistic that go along with Table 3. The curves for \({\hat{\ell }}(\vartheta )\) (solid lines) are symmetric but not correctly centered, reflecting the fact that they are size distorted. This is so for all sample sizes and significance levels considered. Modifying the likelihood shifts the power curve so that the likelihood ratio test is (approximately) size correct. This is done without significantly altering the shape of the power curves. For the smallest sample size considered (\(n=25\); upper two plots) there is a small difference in power between the likelihood ratio test for \({\dot{\ell }}(\vartheta )\) (dashed lines) and \(\ddot{\ell }(\vartheta )\) (dashed-dotted lines); the former has slightly higher power than the latter for alternatives \(\vartheta >1\) and slightly less power for \(\vartheta <1\). This difference vanishes rapidly as n increases, however, which is in line with the similar performance of both corrections observed in Table 3.
Tables 4 and 5 show that all conclusions from Design A1 carry over to the other designs. Moreover, the introduction of correlation between regressors and heterogeneous coefficients or skewing the distribution from which the latter are drawn does not prevent the modified likelihood from improving on maximum likelihood both in terms of point estimation and inference. A comparison of the two tables clearly shows that both the bias and standard deviation of \({\hat{\vartheta }}\) shrink by a factor of one half as n doubles, again illustrating that both are of order \(n^{-1}\). The subsequent reduction in bias by considering \({\dot{\vartheta }}\) and \(\ddot{\vartheta }\) and improvement in size are manifested for all designs.
Table 5 further shows that the modified-likelihood approach outperforms bias-correction of the maximum-likelihood estimator in Designs A3 and B3 and, in particular, in Designs A4 and B4. There, bias-correction of maximum likelihood introduces rather substantial additional bias relative to \({\hat{\vartheta }}\). The additional bias also leads to a large deterioration of the empirical size of the Wald statistic associated with \({\hat{\vartheta }}_{\textrm{BC}}\), with actual sizes ranging up to seven times the nominal size. This type of sensitivity of analytical bias-correction has equally been observed in panel data applications; see Dhaene and Jochmans (2015) and Higgins and Jochmans (2023). The performance of the modified likelihood is comparable to Graham’s ‘tetrad logit’ estimator \({\hat{\vartheta }}_{\textrm{TETRAD}}\) in terms of bias, and it tends to be somewhat more accurate in terms of the empirical size of the associated hypothesis tests. Moreover, inference based on the ‘tetrad logit’ estimator is conservative in all designs even though, with \(n=100\) and therefore 4,950 dyadic observations, the sample size is large. In addition, the ‘tetrad logit’ estimator is computationally prohibitive in large networks.
References
Arellano M, Hahn J (2006) A likelihood-based approximate solution to the incidental parameter problem in dynamic nonlinear models with multiple effects. Unpublished manuscript
Arellano M, Hahn J (2007) Understanding bias in nonlinear panel models: some recent developments. In: Blundell RW, Newey WK, Persson T (eds) Advances in economics and econometrics, vol III. Cambridge University Press, Cambridge
Bajari P, Hong H, Nekipelov D (2010) Estimating static models of strategic interactions. J Bus Econ Statistics 28:469–482
Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs: I the method of paired comparisons. Biometrika 39:324–345
Dhaene G, Jochmans K (2015) Split-panel jackknife estimation of fixed-effect models. Rev Econ Stud 82:991–1030
DiCiccio TJ, Martin MA, Stern SE, Young A (1996) Information bias and adjusted profile likelihoods. J Royal Statistical Soc Ser B 58:189–203
Dzemski A (2019) An empirical model of dyadic link formation in a network with unobserved heterogeneity. Rev Econ Statistics 101:763–776
Fernández-Val I, Weidner M (2016) Individual and time effects in nonlinear panel models with large \(N, T\). J Econom 192:291–312
Fernández-Val I, Weidner M (2018) Fixed effect estimation of large-\(T\) panel data models. Ann Rev Econom 10:109–138
Graham BS (2017) An econometric model of link formation with degree heterogeneity. Econometrica 85:1033–1063
Hahn J, Newey WK (2004) Jackknife and analytical bias reduction for nonlinear panel models. Econometrica 72:1295–1319
Higgins A, Jochmans K (2023) Bootstrap inference in fixed-effect models. TSE Working Paper 22–1328
Jochmans K (2018) Semiparametric analysis of network formation. J Bus Econ Statistics 36:705–713
Neyman J, Scott E (1948) Consistent estimates based on partially consistent observations. Econometrica 16:1–32
Pace L, Salvan A (2006) Adjustments of profile likelihood from a new perspective. J Statistical Plann Inference 136:3554–3564
Rilstone P, Srivastava VK, Ullah A (1996) The second-order bias and mean squared error of nonlinear estimators. J Econom 75:369–395
Schumann M, Severini TA, Tripathi G (2022) The role of information bias in panel data likelihoods. Forthcoming in J Econom
Simons G, Yao Y-C (1999) Asymptotics when the number of parameters tends to infinity in the Bradley-Terry model for paired comparisons. Ann Statistics 27:1041–1060
Snijders TAB (2011) Statistical models for social networks. Ann Rev Sociol 37:129–151
Financial support from the European Research Council, Grant ERC-2016-STG-715787, and from the French Government and the ANR under the Investissements d’Avenir program, Grant ANR-17-EURE-0010 is gratefully acknowledged. The guest editor and referee gave much appreciated comments on an earlier version of this paper.
Appendix: A linear Bradley–Terry model
As an alternative to the specification of the Neyman and Scott (1948) model with complementarities, now suppose that
independently across dyads. This model is overparameterized as, clearly, the mean of the \(\beta _i\) is not identified. A common normalization in this type of model is \(\sum _{i=1}^n \beta _i=0\) (Simons and Yao 1999), and we will maintain it here. The constrained likelihood is
where \(\lambda \) is the Lagrange multiplier for our normalization constraint. The first-order condition for the constrained problem for \(\beta _i\) for a given \(\vartheta \) equals
This gives
for all i and any \(\vartheta \). Observe that the sign of \({\hat{\beta }}_i\) is driven by the comparison of the magnitudes of \(\sum _{i<j} z_{ij}\) and \(\sum _{i>j} z_{ji}\). Also note that \(\sum _{i=1}^n {\hat{\beta }}_i=0\) holds. We therefore have
and with it, the maximum-likelihood estimator
A calculation shows that \({\mathbb {E}}({\hat{\vartheta }}-\vartheta ) = -2n^{-1}\vartheta \).
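The stated bias can be reproduced exactly in a few lines. The sketch assumes \(z_{ij}\sim N(\beta _i-\beta _j,\vartheta )\) for \(i<j\) with \(\sum _{i=1}^n\beta _i=0\); the model display is missing from this copy, so this specification is inferred from the surrounding discussion. Under it, the constrained first-order conditions give \({\hat{\beta }}_i=n^{-1}(\sum _{j>i}z_{ij}-\sum _{j<i}z_{ji})\), which is derived here rather than quoted. As the residuals are linear in the data, the expectation of \({\hat{\vartheta }}\) follows from the trace of the residual-maker matrix. Function names are hypothetical.

```python
def bt_fit(z, n):
    """Constrained maximum likelihood in the (assumed) linear Bradley-Terry model.

    z maps each pair (i, j) with i < j to z_ij. Returns (beta_hat, theta_hat).
    """
    beta_hat = [0.0] * n
    for (i, j), value in z.items():
        beta_hat[i] += value / n   # z_ij enters positively for the first agent
        beta_hat[j] -= value / n   # and negatively for the second agent
    rss = sum((v - beta_hat[i] + beta_hat[j]) ** 2 for (i, j), v in z.items())
    theta_hat = rss / (n * (n - 1) / 2)
    return beta_hat, theta_hat


def bt_exact_relative_bias(n):
    """Exact E[theta_hat] / theta - 1, using E[RSS] = theta * trace(M)."""
    dyads = [(i, j) for i in range(n) for j in range(i + 1, n)]
    trace = 0.0
    for d in dyads:
        unit = {e: (1.0 if e == d else 0.0) for e in dyads}
        b, _ = bt_fit(unit, n)
        # diagonal entry of M: residual at dyad d when z is the d-th unit vector
        trace += 1.0 - (b[d[0]] - b[d[1]])
    return trace / len(dyads) - 1.0


if __name__ == "__main__":
    for n in (10, 25, 50):
        # exact relative bias next to the value -2 / n stated in the text
        print(n, bt_exact_relative_bias(n), -2.0 / n)
```

The two printed columns should agree up to floating-point error, and the estimated \({\hat{\beta }}_i\) sum to zero by construction because each \(z_{ij}\) is added to one agent and subtracted from the other.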
It is immediate that
and that
Therefore,
The corresponding estimators are
Both remove the leading bias from the maximum-likelihood estimator, as
but, in this case, neither is exactly unbiased. The first estimator has bias that is strictly negative (for any finite n). The second estimator overcorrects and has strictly positive bias. The second-order bias is monotone in n. We have
for all \(n> 7\). As \(n\rightarrow \infty \),
and \(\Vert \ddot{\vartheta }-{\dot{\vartheta }} \Vert = o_p(n^{-1})\); that is, the two modifications to the likelihood yield asymptotically equivalent estimators.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jochmans, K. Modified-likelihood estimation of fixed-effect models for dyadic data. SERIEs 14, 417–433 (2023). https://doi.org/10.1007/s13209-023-00284-0