1 Introduction

In many relational contexts, a set of events is observed, with each event involving an arbitrary number of actors, possibly even a single actor. These events give rise to a so-called affiliation network in which there are two types of node: actors and events. Following the current literature (see Wang et al. 2009, among others), we refer to this structure as a bipartite network, also known as a two-mode network, in contrast to a one-mode network, which has a single type of node. An example, which motivates the present paper, is that of academic articles in top statistical journals (Ji and Jin 2017), typically involving more than two authors but that might also be written by a single researcher. In these applications, the interest is in studying the relations between units, with the aim of modeling separately the tendency of each unit to be involved in an event and the tendency of each pair of units to cooperate. We are also interested in studying the time evolution of social behaviors and thus the dynamics over time of both these tendencies. The dataset of statistical publications we aim to analyze also has a particular feature that is important for the following developments: the events are only partially ordered, since we know the year of publication of each article, but no order is available between articles published in the same year.

The literature on bipartite networks is mainly based on models having characteristics similar to those for one-mode networks in which direct connections are observed between certain pairs of actors, such as Exponential Random Graph Models (ERGMs; Frank and Strauss 1986; Wasserman and Pattison 1996); for a review see Snijders (2011) and Amati et al. (2018). One of the first models for the analysis of bipartite networks is proposed in Iacobucci and Wasserman (1990) for data with no temporal dimension and is based on an ERGM structure with specific effects for both types of node (i.e., actors and events) and strong assumptions of independence between the response variables. This approach was extended in several directions by Skvoretz and Faust (1999), whereas Wang et al. (2009) presented a flexible class of ERGMs for bipartite networks and related estimation methods. More recently, a review of models for this type of network has been provided by Aitkin et al. (2014), including certain versions of the Rasch (1967) model and the latent class model (Goodman 1974). The approach proposed in the present paper is also related to models for the analysis of longitudinal one-mode networks, such as actor-oriented models (Snijders and van 1997; Snijders et al. 2010), dynamic ERGMs (Robins and Pattison 2001), hidden Markov models (Yang et al. 2011; Matias and Miele 2017; Bartolucci et al. 2018), and the models for relational events described in DuBois et al. (2013), Perry and Wolfe (2013), Butts and Marcum (2017), Stadtfeld et al. (2017), Fox et al. (2016), and Xia et al. (2016). All these works handle longitudinal data, but not bipartite networks. The cited actor-oriented models, relational event models and, generally speaking, all models based on a point process approach deal with time-stamped data, whose dynamics are controlled by an intensity function. On the other hand, our approach, similarly to dynamic ERGMs, is based on discrete-time data. Still, there are crucial differences in this regard between the proposed method and dynamic ERGMs: in the latter the temporal dynamics are parameterized through a Markov transition matrix, which we do not require; moreover, we can handle data that are only partially ordered, as is the case for the articles published within the same calendar year considered below.

Furthermore, we stress that, relative to ERGMs, our methodology is not based on network summary statistics: since we directly model the vector of allocations of subjects to events, two equal adjacency matrices can be probabilistically treated as different. This allows us, for instance, to consider the joint absence from an event as a point of closeness between two subjects, whereas in ERGMs it is not possible to disentangle bilateral from unilateral absence from an event on the basis of the adjacency matrix. Finally, with our marginal modelling approach we can parsimoniously model only subsets of the whole allocation vectors.

In the current paper we therefore model bipartite temporal networks through a marginal approach on the related event allocation vectors, instead of through the association matrix, as is more common in the literature. More formally, for the analysis of bipartite networks, and in particular for the dataset of publications in top statistical journals (Ji and Jin 2017), we represent each event by a vector of response variables \({\varvec{Z}}^{(e)}=(Z_1^{(e)},\ldots ,Z_n^{(e)})'\), with \(Z_i^{(e)}\) equal to 1 if unit i is involved in event e and 0 otherwise. First, our aim is to directly formulate a statistical model for the response vectors \({\varvec{Z}}^{(e)}\) having a meaningful interpretation. In particular, we rely on a marginal model (Bergsma and Rudas 2002; Bergsma et al. 2009) based on first- and second-order effects. The first-order effects correspond to the logit of the marginal distribution of each \(Z_i^{(e)}\) variable and represent the general tendency of actor i to be involved in event e. The second-order effects are the log-odds ratios (Agresti 2013, Ch. 2) for the marginal distribution of each pair of variables \((Z_i^{(e)},Z_j^{(e)})'\), representing the tendency of actors i and j to be jointly involved in the same event e. As we show in detail in the sequel, this parameter may be directly interpreted as the tendency of i and j to cooperate. Moreover, even if we do not directly consider higher-order effects, we do not pose any restrictions on these effects. To the best of our knowledge, the use of marginal models for the analysis of social network data is new in the statistical literature.

Second, we pay particular attention to the parametrization of the above effects so as to account for the time evolution, and represent individual trajectories in terms of tendency to participate in an event and tendency to cooperate. This feature is common to latent growth models (Bollen and Curran 2006); however, in the proposed approach we use individual fixed parameters, rather than random parameters, applied to polynomials of time of suitable order. The proposed approach is therefore particularly appropriate when the interest is in the evaluation of the behavior of a single actor in terms of the tendencies mentioned above. Estimating fixed parameters is feasible thanks to the amount of information, which is typically huge in the applications of interest. For instance, in the motivating example, there are more than three thousand papers that play the role of events. For a related approach, based however on a simpler log-linear parametrization, see Bianchi et al. (2020); see also Bartolucci et al. (2023).

Third, in order to estimate the fixed individual parameters, we rely on a composite likelihood approach (Lindsay 1988; Varin et al. 2011), where, generally speaking, individual components of conditional or marginal densities are multiplied, whether or not independent, and the resulting derivative constitutes an unbiased estimating equation. Composite likelihood methods have found applications in various areas, such as spatial statistics (Besag 1974, 1975; Heagerty and Lele 1998), longitudinal and panel studies (Henderson and Shimakura 2003), and the estimation of time-varying correlation matrices (Pakel et al. 2021), to name a few. In network analysis, Chen et al. (2018) propose a sequential composite likelihood approach to efficiently estimate social intercorrelations in large-scale social networks, whilst Bartolucci et al. (2015) and Asuncion et al. (2010) estimate via composite likelihood, respectively, a Hidden Markov Model for dynamic networks and an Exponential Random Graph Model. In the current proposal, we use a likelihood function based on the marginal distribution of every ordered pair of actors. For each of these pairs, the likelihood component directly depends on the first- and second-order effects described above, and on individual parameters referred to the two actors. Then, to maximize the target function, we propose a simple iterative algorithm with \(O(n^2)\) complexity, which is thus computationally tractable even if the number of actors is large. This is an important feature given the large scale of today's social network data; see also the discussion in Vu et al. (2013).

Fourth, in the presence of many statistical units, we show how to cluster them in groups that are homogeneous in terms of tendency to be involved in an event or tendency to cooperate with other units. For this aim, we rely on a classification composite likelihood function that is related to that used for estimating the individual fixed parameters. This allows us to represent trajectories referred to homogeneous groups, rather than to individuals, so as to simplify the interpretation of the evolution of the data structure and of the social perspective of the phenomenon under study.

The paper is organized as follows. In the next section we describe assumptions and interpretation of the proposed approach. In Sect. 3 we outline the method of inference based on the use of fixed effects and clustering techniques. The properties of the proposed estimators are illustrated by a simulation study in Sect. 4, whilst the application is illustrated in Sect. 5. In the last section we draw the main conclusions and outline some possible extensions, such as the inclusion of third-order effects and of individual covariates. The estimation algorithm is implemented in a series of R functions that we make available to the reader upon request.

2 Proposed model

Let n denote the number of actors and r the number of observed relational events. Also let \(Z_i^{(e)}\) be a binary outcome equal to 1 if the relational event e involves unit i and to 0 otherwise, with \(i=1,\ldots ,n\) and \(e=1,\ldots ,r\). As already mentioned, these variables are collected in the column vector \({\varvec{Z}}^{(e)}=(Z_1^{(e)},\ldots ,Z_n^{(e)})'\), a generic configuration of which is denoted by \({\varvec{z}}=(z_1,\ldots ,z_n)'\). Moreover, let \(Y_{ij}^{(e)}\) be a binary variable equal to 1 if units i and j are involved in event e, and to 0 otherwise. Note that

$$\begin{aligned} Y_{ij}^{(e)}=Z_i^{(e)}Z_j^{(e)}, \end{aligned}$$
(1)

so that the set of variables \(Y_{ij}^{(e)}\) is a function of the set of variables \(Z_i^{(e)}\); the converse does not hold (\(Y_{ij}^{(e)}=0\) can be determined by \(Z_i^{(e)}=0\) or by \(Z_j^{(e)}=0\)), confirming that the direct analysis of the \(Y_{ij}^{(e)}\) leads, in general, to an information loss. On this point, see also the discussion in Aitkin et al. (2014).
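The information loss can be made concrete with a minimal sketch that enumerates, for a single pair of units, which configurations \((z_i,z_j)\) map to each value of \(Y_{ij}^{(e)}\) under (1). The function name `pair_indicator` and the dictionary `preimages` are illustrative choices, not part of the proposed method.

```python
from itertools import product

def pair_indicator(z_i: int, z_j: int) -> int:
    """Y_ij = Z_i * Z_j, as in equation (1)."""
    return z_i * z_j

# Group the four possible (z_i, z_j) configurations by the value of Y_ij.
preimages = {0: [], 1: []}
for z_i, z_j in product((0, 1), repeat=2):
    preimages[pair_indicator(z_i, z_j)].append((z_i, z_j))

# Y_ij = 1 identifies a single configuration, whereas Y_ij = 0 is shared by
# three configurations that the Z-based analysis keeps distinct.
for y, configs in preimages.items():
    print(f"Y_ij = {y}: {configs}")
```

Joint absence, \((0,0)\), and the two unilateral absences, \((0,1)\) and \((1,0)\), all collapse to \(Y_{ij}^{(e)}=0\).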

2.1 Marginal effects

The main issue is how to parametrize the distribution of the random vectors \({\varvec{Z}}^{(e)}\). We adopt a marginal parametrization (Bergsma and Rudas 2002; Bergsma et al. 2009) based on hierarchical effects up to a certain order. This parametrization is less common than the log-linear parametrization, adopted even in ERGMs (Frank and Strauss 1986; Wasserman and Pattison 1996), in which

$$\begin{aligned} \log \frac{p({\varvec{Z}}^{(e)}={\varvec{z}})}{p({\varvec{Z}}^{(e)}={\varvec{0}})}={\varvec{g}}({\varvec{z}})^{\prime }{\varvec{\gamma }}, \end{aligned}$$

for all configurations \({\varvec{z}}\) different from the null configuration \({\varvec{0}}\), where \({\varvec{g}}({\varvec{z}})\) is a vector-valued function depending on \({\varvec{z}}\).

Indeed, a marginal parametrization may be expressed on the basis of a sequence of log-linear parametrizations referring to the marginal distribution of a selected subset of variables, that is,

$$\begin{aligned} \log \frac{p({\varvec{Z}}_{\mathcal{{M} }}^{(e)}={\varvec{z}}_{\mathcal{{M} }})}{p({\varvec{Z}}_{\mathcal{{M} }}^{(e)}={\varvec{0}}_{\mathcal{{M} }})}={\varvec{g}}_{\mathcal{{M} }}({\varvec{z}}_{\mathcal{{M} }})^{\prime }{\varvec{\gamma }}_{\mathcal{{M} }}, \end{aligned}$$
(2)

where \(\mathcal{{M}}\) is the set of indices of such variables and \({\varvec{Z}}_{\mathcal{{M} }}^{(e)}\) is the corresponding subvector of \({\varvec{Z}}^{(e)}\).

Log-linear parameters are defined as certain sums and differences of logarithms of cell probabilities. Marginal parameters are log-linear parameters calculated from marginal probabilities. In our approach, in particular, we rely on first- and second-order effects, specified, for all e, as follows. Starting from \(\log p(Z_i^{(e)}=1) = \eta ^{(e)}_{{\cdot }} + \eta _i^{(e)}/2\) and \(\log p(Z_i^{(e)}=0) = \eta ^{(e)}_{{\cdot }} - \eta _i^{(e)}/2\) for \(i=1,\ldots ,n\), we can obtain

$$\begin{aligned} \eta _i^{(e)}=\log \frac{p(Z_i^{(e)}=1)}{p(Z_i^{(e)}=0)},\quad i=1,\ldots ,n, \end{aligned}$$
(3)

as a particular case of (2) with \(\mathcal{{M}}=\{i\}\). Also, from the parameterization

$$\begin{aligned} \left\{ \begin{array}{c}\log p(Z^{(e)}_i=1,Z^{(e)}_j=1) = \eta ^{(e)}_{{\cdot \cdot }} + \eta _{i\cdot }^{(e)} + \eta _{\cdot j}^{(e)} + \tfrac{1}{4}\eta _{ij}^{(e)} \\ \log p(Z^{(e)}_i=0,Z^{(e)}_j=0) = \eta ^{(e)}_{{\cdot \cdot }} - \eta _{i\cdot }^{(e)} - \eta _{\cdot j}^{(e)} + \tfrac{1}{4}\eta _{ij}^{(e)} \\ \log p(Z^{(e)}_i=1,Z^{(e)}_j=0) = \eta ^{(e)}_{{\cdot \cdot }} + \eta _{i\cdot }^{(e)} - \eta _{\cdot j}^{(e)} - \tfrac{1}{4}\eta _{ij}^{(e)} \\ \log p(Z^{(e)}_i=0,Z^{(e)}_j=1) = \eta ^{(e)}_{{\cdot \cdot }} - \eta _{i\cdot }^{(e)} + \eta _{\cdot j}^{(e)} - \tfrac{1}{4}\eta _{ij}^{(e)} \end{array}\right. , \end{aligned}$$

we obtain

$$\begin{aligned} \eta _{ij}^{(e)}=\log \frac{p(Z^{(e)}_i=0,Z^{(e)}_j=0)p(Z^{(e)}_i=1,Z^{(e)}_j=1)}{p(Z^{(e)}_i=0,Z^{(e)}_j=1)p(Z^{(e)}_i=1,Z^{(e)}_j=0)},\quad i,j=1,\ldots ,n,\, j\ne i, \end{aligned}$$
(4)

as a special case of (2) with \(\mathcal{{M}}=\{i,j\}\). In terms of interpretation (see also Bartolucci et al. 2007), we can easily realize that the marginal logit \(\eta _i^{(e)}\) is a measure of tendency of unit i to be involved in the e-th relational event. On the other hand, the log-odds ratio \(\eta _{ij}^{(e)}\) is a measure of the tendency of units i and j to cooperate with reference to the same e-th relational event.

To better interpret the \(\eta _{ij}^{(e)}\) effects, it is worth recalling that the log-odds ratio is a well-known measure of association between binary variables (Agresti 2013, Ch. 2), being 0 in the case of independence. In fact, an alternative expression for this effect is

$$\begin{aligned} \eta _{ij}^{(e)}= & {} \log \frac{p(Z^{(e)}_i=1|Z^{(e)}_j=1)}{p(Z^{(e)}_i=0|Z^{(e)}_j=1)}- \log \frac{p(Z^{(e)}_i=1|Z^{(e)}_j=0)}{p(Z^{(e)}_i=0|Z^{(e)}_j=0)}\nonumber \\= & {} \log \frac{p(Z^{(e)}_j=1|Z^{(e)}_i=1)}{p(Z^{(e)}_j=0|Z^{(e)}_i=1)}- \log \frac{p(Z^{(e)}_j=1|Z^{(e)}_i=0)}{p(Z^{(e)}_j=0|Z^{(e)}_i=0)}, \end{aligned}$$
(5)

corresponding to the increase in the logit of the probability that unit i (or j) is involved in the e-th event when unit j (or i) is present in the same event, relative to the case in which the latter is absent. More details in this regard are provided in the following section.
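As a numerical check of definitions (3) to (5), the following minimal sketch computes the marginal logits and the log-odds ratio from a bivariate probability table and verifies that the log-odds ratio equals the difference of conditional logits. The function names and the probability values are illustrative assumptions only.

```python
import math

def marginal_effects(p00, p01, p10, p11):
    """Return (eta_i, eta_j, eta_ij) as in (3) and (4).

    The first index of each cell refers to Z_i, the second to Z_j.
    """
    p_i = p10 + p11                      # p(Z_i = 1)
    p_j = p01 + p11                      # p(Z_j = 1)
    eta_i = math.log(p_i / (1 - p_i))
    eta_j = math.log(p_j / (1 - p_j))
    eta_ij = math.log(p00 * p11 / (p01 * p10))
    return eta_i, eta_j, eta_ij

def conditional_logit_difference(p00, p01, p10, p11):
    """First line of (5): logit of Z_i given Z_j = 1 minus logit given Z_j = 0."""
    return math.log(p11 / p01) - math.log(p10 / p00)

# Hypothetical table: both units are rarely present, but often together.
table = (0.85, 0.05, 0.06, 0.04)
eta_i, eta_j, eta_ij = marginal_effects(*table)
diff = conditional_logit_difference(*table)
print(eta_i, eta_j, eta_ij, diff)
```

In this hypothetical table both marginal logits are negative, while the log-odds ratio is positive, which illustrates how the two kinds of effect capture different aspects of the data.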

Before illustrating how we parametrize in a parsimonious way the marginal effects defined above, it is worth recalling that, apart from the trivial case of \(n=2\) actors, the knowledge of these effects is not sufficient to obtain univocally the corresponding distribution of the vectors \({\varvec{Z}}^{(e)}\). To formulate this argument more formally, let \({\varvec{p}}^{(e)}\) denote the vector containing the \(2^n\) joint probabilities \(p({\varvec{Z}}^{(e)}={\varvec{z}})\) for all possible configurations \({\varvec{z}}\) in lexicographical order. Also let \({\varvec{\eta }}_1^{(e)}\) be the vector containing the first-order effects \(\eta _i^{(e)}\) for \(i=1,\ldots ,n\) and let \({\varvec{\eta }}_2^{(e)}\) denote the corresponding vector of second-order effects \(\eta _{ij}^{(e)}\) for \(i=1,\ldots ,n-1\) and \(j=i+1,\ldots ,n\). It is possible to prove that

$$\begin{aligned} {\varvec{\eta }}^{(e)}=\begin{pmatrix}{\varvec{\eta }}_1^{(e)}\\ {\varvec{\eta }}_2^{(e)}\end{pmatrix}={\varvec{C}} \log ({\varvec{M}}{\varvec{p}}^{(e)}) \end{aligned}$$
(6)

for a suitably defined matrix of contrasts \({\varvec{C}}\) and a marginalization matrix \({\varvec{M}}\) with elements equal to 0 or 1. However, this relation is not one-to-one, in the sense that it is not possible to obtain a unique probability vector \({\varvec{p}}^{(e)}\) starting from \({\varvec{\eta }}^{(e)}\).
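The lack of a one-to-one relation can be illustrated with a small sketch for \(n=3\): two different joint distributions that share all bivariate (and hence univariate) marginal distributions, and therefore the same first- and second-order effects. The specific distributions and the helper `pair_margin` are illustrative assumptions.

```python
from itertools import product
import numpy as np

# All 2^3 configurations z = (z_1, z_2, z_3) in lexicographical order.
configs = list(product((0, 1), repeat=3))

# Two different joint distributions: the uniform one, and the uniform one plus
# a perturbation whose sign alternates with the parity of z_1 + z_2 + z_3.
p = np.full(8, 1 / 8)
parity_sign = np.array([(-1) ** sum(z) for z in configs])
q = p + 0.05 * parity_sign

def pair_margin(prob, i, j):
    """2 x 2 marginal table of (Z_i, Z_j) obtained by summing over the third unit."""
    table = np.zeros((2, 2))
    for z, pr in zip(configs, prob):
        table[z[i], z[j]] += pr
    return table

# The perturbation cancels when any single unit is summed out, so every
# bivariate margin of q coincides with that of p.
same_margins = all(
    np.allclose(pair_margin(p, i, j), pair_margin(q, i, j))
    for i, j in [(0, 1), (0, 2), (1, 2)]
)
print(same_margins, np.allclose(p, q))
```

The two distributions differ only in their third-order interaction, which is exactly the information that first- and second-order marginal effects leave unspecified.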

In order to have an invertible parametrization, the structure of higher-order effects must be specified. To give an idea, a third-order marginal effect between units i, j, and k may be defined as

$$\begin{aligned} \eta _{ijk}^{(e)}= \eta _{ijk|1}^{(e)}-\eta _{ijk|0}^{(e)}, \end{aligned}$$

where

$$\begin{aligned} \eta _{ijk|z}^{(e)}=\log \frac{p(Z^{(e)}_i=0,Z^{(e)}_j=0|Z_k^{(e)}=z)p(Z^{(e)}_i=1,Z^{(e)}_j=1|Z_k^{(e)}=z)}{p(Z^{(e)}_i=0,Z^{(e)}_j=1|Z_k^{(e)}=z)p(Z^{(e)}_i=1,Z^{(e)}_j=0|Z_k^{(e)}=z)},\quad z=0,1. \end{aligned}$$

This is the difference between the conditional log-odds ratio for units i and j given that unit k is present and the same log-odds ratio given that unit k is absent. This directly compares to the triangularization effect in an ERGM, as it measures how much the presence of unit k affects the chance that units i and j collaborate. In a similar way we may recursively define effects of order higher than three, up to order n (Bartolucci et al. 2007), so that, once these effects are also specified, the parametrization in (6) becomes invertible.

In the present approach, however, we prefer to focus only on first- and second-order effects as formulated in (3) and (4), leaving the structure of higher-order interactions unspecified. In fact, as we show in detail in Sect. 3, to make inference on these effects it is not necessary to specify the structure of these higher-order effects as we base inference on a pairwise likelihood function. This is an advantage of the marginal parametrization with respect to the log-linear parametrization; the latter does not allow us to directly express the marginal distribution of a subset of variables without specifying the full set of interactions. The way of obtaining each bivariate probability vector

$$\begin{aligned} {\varvec{p}}_{ij}^{(e)}=\begin{pmatrix} p(Z_i^{(e)}=0,Z_j^{(e)}=0)\\ p(Z_i^{(e)}=0,Z_j^{(e)}=1)\\ p(Z_i^{(e)}=1,Z_j^{(e)}=0)\\ p(Z_i^{(e)}=1,Z_j^{(e)}=1) \end{pmatrix} \end{aligned}$$
(7)

on the basis of the parameters \(\eta _{i}^{(e)}\), \(\eta _j^{(e)}\), and \(\eta _{ij}^{(e)}\), which are collected in the vector \({\varvec{\eta }}_{ij}^{(e)}=(\eta _{i}^{(e)},\eta _j^{(e)},\eta _{ij}^{(e)})^{\prime }\), is clarified in the Appendix.
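Since the Appendix is not reproduced here, the following is only a sketch of one standard way to perform this inversion, and not necessarily the procedure adopted there: the marginal probabilities follow from the logits, and \(p(Z_i^{(e)}=1,Z_j^{(e)}=1)\) is the admissible root of the quadratic equation implied by the odds ratio (the Plackett formula). The function names are illustrative.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pair_probs(eta_i, eta_j, eta_ij):
    """Return (p00, p01, p10, p11), ordered as in (7), from the marginal effects."""
    p_i, p_j = expit(eta_i), expit(eta_j)
    psi = math.exp(eta_ij)               # odds ratio
    s = p_i + p_j
    if abs(psi - 1.0) < 1e-10:           # independence: p11 = p_i * p_j
        p11 = p_i * p_j
    else:
        # Admissible root of (psi-1) x^2 - (1 + (psi-1) s) x + psi p_i p_j = 0.
        a = 1.0 + (psi - 1.0) * s
        disc = a * a - 4.0 * psi * (psi - 1.0) * p_i * p_j
        p11 = (a - math.sqrt(disc)) / (2.0 * (psi - 1.0))
    return (1.0 - s + p11, p_j - p11, p_i - p11, p11)

def marginal_effects(p00, p01, p10, p11):
    """Forward map (3)-(4), used here only to check the round trip."""
    p_i, p_j = p10 + p11, p01 + p11
    return (math.log(p_i / (1 - p_i)),
            math.log(p_j / (1 - p_j)),
            math.log(p00 * p11 / (p01 * p10)))

eta = (-3.0, -2.0, 1.5)                  # arbitrary values in R^3
probs = pair_probs(*eta)
recovered = marginal_effects(*probs)
print(probs, recovered)
```

The round trip from effects to probabilities and back recovers the starting values, in line with the one-to-one nature of the bivariate parametrization.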

2.2 Interpretation of the log-odds ratios

To clarify the interpretation of the log-odds ratio \(\eta _{ij}^{(e)}\), it is useful to consider that it directly compares with the logit of the probability that there is a connection between units i and j, in the sense that both units are involved in the same event. In fact, from (1) we have

$$\begin{aligned} \tilde{\eta }_{ij}^{(e)}=\log \frac{p(Y_{ij}^{(e)}=1)}{p(Y_{ij}^{(e)}=0)}=\log \frac{p(Z_i^{(e)}=1,Z_j^{(e)}=1)}{1-p(Z_i^{(e)}=1,Z_j^{(e)}=1)}, \end{aligned}$$

that is a commonly used effect in typical social network models; see, for instance, Hoff et al. (2002). There is an important difference between \(\tilde{\eta }_{ij}^{(e)}\) and \(\eta _{ij}^{(e)}\): the former corresponds to the tendency of units i and j to be involved in the same event, but it does not disentangle this joint tendency from the marginal tendency of each single unit to be involved in the same event. For instance, \(\tilde{\eta }_{ij}^{(e)}\) could attain a large value only because both units have, separately, a high tendency to be involved in the event (both authors are very active in their publication strategy) even if there is not a particular “attraction” between them, namely with low values of \(p(Z_i^{(e)}=0,Z_j^{(e)}=0)\). On the other hand, \(\eta _{ij}^{(e)}\) is a proper measure of attraction because, as clearly shown by (5), it corresponds to the increase in the chance that one unit is present in the event given that also the other unit is present in the same event. This difference between parameters \(\eta _{ij}^{(e)}\) and \(\tilde{\eta }_{ij}^{(e)}\) is key in the proposed approach and is of particular relevance in the application of our interest, where each event may involve a variable number of actors and, possibly, also only one actor. Indeed, in our approach the general tendency of unit i to be involved in an event is meaningfully measured by effects \(\eta _i^{(e)}\) defined in (3). Note that effects of this type cannot be directly included in an ERGM.

The above arguments may be further clarified considering that, for given marginal distributions \(p(Z_i^{(e)})\) and \(p(Z_j^{(e)})\), or, in other terms, for fixed \(\eta _i^{(e)}\) and \(\eta _j^{(e)}\), the log-odds ratio \(\eta _{ij}^{(e)}\) is an increasing function of \(p(Z_i^{(e)}=1,Z_j^{(e)}=1)=p(Y_{ij}^{(e)}=1)\) and thus of \(\tilde{\eta }_{ij}^{(e)}\). Moreover, for a given value of this joint probability, \(\eta _{ij}^{(e)}\) is a decreasing function of \(\eta _i^{(e)}\) and \(\eta _j^{(e)}\). In particular, considering that

$$\begin{aligned} p(Z_i^{(e)}=0,Z_j^{(e)}=0)= & {} 1-p(Z_i^{(e)}=1)-p(Z_j^{(e)}=1)+p(Z_i^{(e)}=1,Z_j^{(e)}=1),\\ p(Z_i^{(e)}=0,Z_j^{(e)}=1)= & {} p(Z_j^{(e)}=1)-p(Z_i^{(e)}=1,Z_j^{(e)}=1),\\ p(Z_i^{(e)}=1,Z_j^{(e)}=0)= & {} p(Z_i^{(e)}=1)-p(Z_i^{(e)}=1,Z_j^{(e)}=1), \end{aligned}$$

we can easily realize that

$$\begin{aligned} \frac{\partial \eta _{ij}^{(e)}}{\partial \tilde{\eta }_{ij}^{(e)}}=p(Z_i^{(e)}=1,Z_j^{(e)}=1)[1-p(Z_i^{(e)}=1,Z_j^{(e)}=1)] \sum _{z_1=0}^1\sum _{z_2=0}^1\frac{1}{p(Z_i^{(e)}=z_1,Z_j^{(e)}=z_2)}>0. \end{aligned}$$

Similarly, we have

$$\begin{aligned} \frac{\partial \eta _{ij}^{(e)}}{\partial \eta _i^{(e)}}=-p(Z_i^{(e)}=1)[1-p(Z_i^{(e)}=1)] \sum _{z_1=0}^1\frac{1}{p(Z_i^{(e)}=z_1,Z_j^{(e)}=0)}<0, \end{aligned}$$

with a corresponding expression for \(\partial \eta _{ij}^{(e)}/\partial \eta _j^{(e)}\). An illustration of this behavior is provided in Fig. 1, where for a pair of individuals we represent the value of \(\eta _{ij}^{(e)}\) with respect to \(\eta _i^{(e)}\) (with \(\eta _j^{(e)}=-2\) and \({\tilde{\eta }}_{ij}^{(e)}=-4\)), to \(\eta _j^{(e)}\) (with \(\eta _i^{(e)}=-3\) and \({\tilde{\eta }}_{ij}^{(e)}=-4\)), and to \({\tilde{\eta }}_{ij}^{(e)}\) (with \(\eta _i^{(e)}=-3\) and \(\eta _j^{(e)}=-2\)).

Fig. 1 Plot of the log-odds ratio \(\eta _{ij}^{(e)}\) with respect to \(\eta _i^{(e)}\) (solid line), \(\eta _j^{(e)}\) (dashed line), and \(\tilde{\eta }_{ij}^{(e)}\) (dotted line)
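The monotonicity just described can be checked numerically. The sketch below evaluates \(\eta _{ij}^{(e)}\) as a function of \(\eta _i^{(e)}\), \(\eta _j^{(e)}\), and \(\tilde{\eta }_{ij}^{(e)}\) at the reference values used for Fig. 1 and approximates the three partial derivatives by central finite differences. The helper names and the step size are illustrative assumptions.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_odds_ratio(eta_i, eta_j, eta_tilde):
    """eta_ij as a function of the two marginal logits and of the logit of p11."""
    p_i, p_j, p11 = expit(eta_i), expit(eta_j), expit(eta_tilde)
    p10 = p_i - p11
    p01 = p_j - p11
    p00 = 1.0 - p_i - p_j + p11
    if min(p00, p01, p10) <= 0:
        raise ValueError("incompatible values: a cell probability is not positive")
    return math.log(p00 * p11 / (p01 * p10))

def central_difference(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Reference point: eta_i = -3, eta_j = -2, eta_tilde = -4.
d_tilde = central_difference(lambda x: log_odds_ratio(-3.0, -2.0, x), -4.0)
d_i = central_difference(lambda x: log_odds_ratio(x, -2.0, -4.0), -3.0)
d_j = central_difference(lambda x: log_odds_ratio(-3.0, x, -4.0), -2.0)
print(d_tilde, d_i, d_j)
```

The `ValueError` branch reflects the fact that \(\tilde{\eta }_{ij}^{(e)}\) cannot be chosen freely once the two marginal logits are fixed.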

Another advantage of \(\eta _{ij}^{(e)}\) with respect to \(\tilde{\eta }_{ij}^{(e)}\) is that the former induces a variation independent parametrization (Bergsma and Rudas 2002b). This means that the joint distribution of \((Z_i^{(e)},Z_j^{(e)})'\) exists for any value in \(\mathbb {R}\) of the first-order effects \(\eta _i^{(e)}\) and \(\eta _j^{(e)}\) and of the second-order effect \(\eta _{ij}^{(e)}\). More formally, the function relating \({\varvec{\eta }}_{ij}^{(e)} = (\eta _{i}^{(e)},\eta _{j}^{(e)},\eta _{ij}^{(e)})'\) with \({\varvec{p}}_{ij}^{(e)}\) defined in (7) is one-to-one for \({\varvec{\eta }}_{ij}^{(e)}\) in \(\mathbb {R}^3\) and \({\varvec{p}}_{ij}^{(e)}\) in the simplex of probability vectors with four elements. This has advantages in terms of model interpretation and estimation. On the other hand, the effect \(\tilde{\eta }_{ij}^{(e)}\) has a limited range of possible values, with bounds depending on \(\eta _i^{(e)}\) and \(\eta _j^{(e)}\), making the joint estimation and interpretation of \(\tilde{\eta }_{ij}^{(e)}\), \(\eta _i^{(e)}\), and \(\eta _j^{(e)}\) more problematic.

2.3 Parametrization of marginal effects

Formulating a model for relational events requires parametrizing, in a parsimonious way, the effects \(\eta _i^{(e)}\) and \(\eta _{ij}^{(e)}\) of main interest. In the absence of individual covariates, we propose the following parametrization of the first-order effects:

$$\begin{aligned} \eta _i^{(e)}={\varvec{f}}_1(t_e)^{\prime }{\varvec{\alpha }}_i, \end{aligned}$$
(8)

where \({\varvec{f}}_1(t_e)\) is a vector-valued function of the time \(t_e\) of event e. For instance, this function may contain the terms of a polynomial of suitable order of the day or year of event e starting from the beginning of the study. This parametrization is similar to that of a latent trajectory model (Dwyer 1983; Crowder and Hand 1996; Menard 2002), with the difference that, as we clarify in the sequel, each \({\varvec{\alpha }}_i\) is considered as a vector of fixed individual parameters. In any case, it is possible to represent individual trajectories regarding the tendency over time to be present in an event.

Regarding the second-order effects, a natural extension of (8) would lead to a vector of specific parameters for each pair of units. However, to obtain a parsimonious model, we prefer to rely on an additive parametrization of type

$$\begin{aligned} \eta _{ij}^{(e)}={\varvec{f}}_2(t_e)^{\prime }({\varvec{\beta }}_i+{\varvec{\beta }}_j), \end{aligned}$$
(9)

where \({\varvec{f}}_2(t_e)\) is defined analogously to \({\varvec{f}}_1(t_e)\) and, again, each vector \({\varvec{\beta }}_i\) represents the evolution over time of the tendency of unit i to collaborate. We use two different functions, \({\varvec{f}}_1(t_e)\) and \({\varvec{f}}_2(t_e)\), to allow for a different order of the involved polynomials of time. Note, however, that the additive structure in (9) implies that \({\varvec{f}}_2(t_e)^{\prime }{\varvec{\beta }}_i\) may be interpreted as the “general” tendency of individual i to collaborate with other individuals in an event at time \(t_e\).

Overall, for each bivariate probability vector \({\varvec{p}}_{ij}^{(e)}\), the parametrization based on (8) and (9) is linear in the parameters. In particular, if we let \({\varvec{\delta }}_i=({\varvec{\alpha }}_i^{\prime },{\varvec{\beta }}_i^{\prime })^{\prime }\), we have that

$$\begin{aligned} {\varvec{\eta }}_{ij}^{(e)}=\begin{pmatrix} {\varvec{D}}_{ij1}&{\varvec{D}}_{ij2} \end{pmatrix} \begin{pmatrix} {\varvec{\delta }}_i\\ {\varvec{\delta }}_j \end{pmatrix}, \end{aligned}$$
(10)

where \({\varvec{D}}_{ij1}\) and \({\varvec{D}}_{ij2}\) are suitable design matrices.
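One possible construction of these design matrices, under the assumption that \({\varvec{f}}_1\) and \({\varvec{f}}_2\) contain plain powers of time, is sketched below; note that the matrices depend on the event time \(t_e\). The function names `poly_basis` and `design_matrices`, the polynomial orders, and the parameter values are illustrative.

```python
import numpy as np

def poly_basis(t, order):
    """Polynomial terms (1, t, ..., t^order); one possible choice for f_1 or f_2."""
    return np.array([t ** k for k in range(order + 1)], dtype=float)

def design_matrices(t, order1, order2):
    """Matrices D_ij1 and D_ij2 such that eta_ij = D_ij1 delta_i + D_ij2 delta_j,

    with eta_ij = (eta_i, eta_j, eta_ij)' and delta = (alpha', beta')'.
    """
    f1, f2 = poly_basis(t, order1), poly_basis(t, order2)
    k1, k2 = f1.size, f2.size
    D1 = np.zeros((3, k1 + k2))
    D2 = np.zeros((3, k1 + k2))
    D1[0, :k1] = f1      # eta_i  = f1' alpha_i
    D2[1, :k1] = f1      # eta_j  = f1' alpha_j
    D1[2, k1:] = f2      # eta_ij = f2' beta_i + ...
    D2[2, k1:] = f2      #          ... + f2' beta_j
    return D1, D2

# Hypothetical parameters: linear trend for alpha, quadratic trend for beta.
alpha_i, beta_i = np.array([-3.0, 0.10]), np.array([0.5, -0.02, 0.001])
alpha_j, beta_j = np.array([-2.0, 0.05]), np.array([-0.3, 0.04, 0.000])
delta_i = np.concatenate([alpha_i, beta_i])
delta_j = np.concatenate([alpha_j, beta_j])

D1, D2 = design_matrices(t=4.0, order1=1, order2=2)
eta = D1 @ delta_i + D2 @ delta_j
print(eta)
```

The third row of both matrices carries \({\varvec{f}}_2(t_e)^{\prime }\), which is what produces the additive structure in (9).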

To clarify the proposed parametrization, consider a sample of \(n=9\) individuals for a single event, for different values of the intercepts \(\alpha _i\) (from -3 to -1) and of \(\beta _i\) (from -1 to 1). These values are reported in Table 1 together with certain average probabilities that help to understand the meaning of these parameters. The single \(2\times 2\) tables for each pair of actors are reported in Table 2.

Table 1 Parameters \(\alpha _i\) and \(\beta _i\) together with mean marginal probabilities and probabilities of two actors being involved in the same event
Table 2 Single \(2\times 2\) tables for each pair of actors considered in Table 1

3 Pairwise likelihood inference

Before introducing the proposed methods of inference for the model described above, we clarify the data structure used in applications. We start from the data

$$\begin{aligned} {\varvec{w}}_{ij}^{(e)}=\begin{pmatrix} I(z_i^{(e)}=0,z_j^{(e)}=0)\\ I(z_i^{(e)}=0,z_j^{(e)}=1)\\ I(z_i^{(e)}=1,z_j^{(e)}=0)\\ I(z_i^{(e)}=1,z_j^{(e)}=1)\\ \end{pmatrix}, \end{aligned}$$
(11)

where \(I(\cdot )\) is the indicator function, for \(i=1,\ldots ,n-1\), \(j=i+1,\ldots ,n\), and \(e=1,\ldots ,r\). In our motivating application, \({\varvec{w}}_{ij}^{(e)}\) denotes the joint participation of authors i and j in the scientific article e. For instance, \({\varvec{w}}_{ij}^{(e)}=(0,1,0,0)'\) means that only author j is involved in article e, whereas \({\varvec{w}}_{ij}^{(e)}=(0,0,0,1)'\) means that both authors are involved. Moreover, in the applications of interest, it is possible to group events that, by assumption, have the same distribution. For instance, in the application based on the academic articles published by statisticians, it is plausible to assume that for all articles published in the same year the distribution of the binary vector is the same, also because the precise dates of publication, and thus the precise time order of the articles, are not available. In other words, it is sensible to group events in homogeneous periods \(t=1,\ldots ,t(r)\), where t(e) denotes the time of event e. Then, the relevant information is that contained in the frequency vectors

$$\begin{aligned} \tilde{{\varvec{w}}}_{ij}^{(t)}=\sum _{e:t(e)=t}{\varvec{w}}_{ij}^{(e)}, \end{aligned}$$

and consequently we denote by \(\tilde{{\varvec{p}}}_{ij}^{(t)}\) the corresponding probability vector having the same structure as in (7).
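The construction of the frequency vectors can be sketched as follows, assuming the data are stored as an \(r\times n\) binary matrix with one row per event and a vector of period labels. The function name `pair_frequencies` and the toy data are illustrative.

```python
import numpy as np

def pair_frequencies(Z, period, i, j):
    """Frequency vectors for the pair (i, j), one row per time period.

    Columns follow the order of (11): (0,0), (0,1), (1,0), (1,1).
    """
    cell = 2 * Z[:, i] + Z[:, j]         # index of the observed cell for each event
    periods = np.unique(period)
    W = np.zeros((periods.size, 4), dtype=int)
    for row, t in enumerate(periods):
        W[row] = np.bincount(cell[period == t], minlength=4)
    return W

# Toy data: 5 events (rows) and 3 actors (columns), observed in 2 periods.
Z = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [1, 1, 1],
              [0, 1, 0]])
period = np.array([1, 1, 1, 2, 2])

W01 = pair_frequencies(Z, period, 0, 1)
print(W01)
```

Each row of the result sums to the number of events observed in the corresponding period.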

3.1 Fixed-effects estimation

It is possible to estimate the parameters of interest by maximizing the pairwise log-likelihood function (Lindsay 1988; Varin et al. 2011):

$$\begin{aligned} p\ell ({\varvec{\theta }})=\sum _{i=1}^{n-1}\sum _{j=i+1}^n\sum _{e=1}^r[{\varvec{w}}_{ij}^{(e)}]^{\prime }\log {\varvec{p}}_{ij}^{(e)}, \end{aligned}$$

where \({\varvec{\theta }}\) is the vector of all such parameters, that is the collection of individual parameter vectors \({\varvec{\delta }}_i\), \(i=1,\ldots ,n\), used in (10). An alternative expression is

$$\begin{aligned} p\ell ({\varvec{\theta }})= & {} \sum _{i=1}^{n-1}\sum _{j=i+1}^n\ell _{ij}({\varvec{\delta }}_i,{\varvec{\delta }}_j),\nonumber \\ \ell _{ij}({\varvec{\delta }}_i,{\varvec{\delta }}_j)= & {} \sum _{t=1}^{t(r)}[\tilde{{\varvec{w}}}_{ij}^{(t)}]^{\prime }\log \tilde{{\varvec{p}}}_{ij}^{(t)}, \end{aligned}$$
(12)

which is faster to compute as it relies on the frequency vectors \(\tilde{{\varvec{w}}}_{ij}^{(t)}\) defined above.

In order to maximize \(p\ell ({\varvec{\theta }})\), it is important to obtain the score vector of each component \(\ell _{ij}({\varvec{\delta }}_i,{\varvec{\delta }}_j)\). To this aim, it is convenient to introduce the log-linear effects corresponding to \(\tilde{{\varvec{p}}}_{ij}^{(t)}\), which are collected in the vector \(\tilde{{\varvec{\lambda }}}_{ij}^{(t)} = (\lambda _i^{(e)},\lambda _j^{(e)},\lambda _{ij}^{(e)})^{\prime }\), where

$$\begin{aligned} \lambda _i^{(e)}= & {} \log \frac{p(Z_i^{(e)}=1|Z_j^{(e)}=0)}{p(Z_i^{(e)}=0|Z_j^{(e)}=0)},\\ \lambda _j^{(e)}= & {} \log \frac{p(Z_j^{(e)}=1|Z_i^{(e)}=0)}{p(Z_j^{(e)}=0|Z_i^{(e)}=0)},\\ \lambda _{ij}^{(e)}= &\, {} \eta _{ij}^{(e)}, \end{aligned}$$

and e is any of the events at time occasion t. Also let \(\tilde{{\varvec{\eta }}}_{ij}^{(t)}\) denote the corresponding vector of marginal parameters. We have that

$$\begin{aligned} {\varvec{s}}_{ij}({\varvec{\delta }}_i):=\frac{\partial \ell _{ij}({\varvec{\delta }}_i,{\varvec{\delta }}_j)}{\partial {\varvec{\delta }}_i}=\sum _{t=1}^{t(r)}{\varvec{D}}_{ij1}^{\prime }\frac{\partial \tilde{{\varvec{\eta }}}_{ij}^{(t)}}{\partial [\tilde{{\varvec{\lambda }}}_{ij}^{(t)}]^{\prime }}{\varvec{G}}[\tilde{{\varvec{w}}}_{ij}^{(t)}-m_{ij}^{(t)}\tilde{{\varvec{p}}}_{ij}^{(t)}], \end{aligned}$$

where \(m_{ij}^{(t)}\) is the sum of the elements of \(\tilde{{\varvec{w}}}_{ij}^{(t)}\), that is, the number of events in time period t, whereas \({\varvec{G}}\) and the derivative of \(\tilde{{\varvec{\eta }}}_{ij}^{(t)}\) with respect to \(\tilde{{\varvec{\lambda }}}_{ij}^{(t)}\) are defined in the Appendix.

The estimation algorithm is based on the following steps. First, define an initial guess for the parameters \({\varvec{\delta }}_i\), denoted by \({\varvec{\delta }}_i^{(0)}\), \(i=1,\ldots ,n\). Then, for every unit i, find the values of \({\varvec{\delta }}_i\) such that

$$\begin{aligned} \sum _{j=1,\,j\ne i}^n{\varvec{s}}_{ij}({\varvec{\delta }}_i) = {\varvec{0}}, \end{aligned}$$

so as to maximize

$$\begin{aligned} p\ell _i({\varvec{\theta }})=\sum _{j=1,\,j\ne i}^n\ell _{ij}({\varvec{\delta }}_i,{\varvec{\delta }}_j), \end{aligned}$$
(13)

with respect to \({\varvec{\delta }}_i\), with all other parameters kept fixed. Iterate this process until convergence in \(p\ell ({\varvec{\theta }})\), and denote the final parameter estimates by \(\hat{{\varvec{\delta }}}_i=(\hat{{\varvec{\alpha }}}_i',\hat{{\varvec{\beta }}}_i')'\), \(i=1,\ldots ,n\), which are collected in the vector \(\hat{{\varvec{\theta }}}\). In practice, the algorithm steps may be implemented by using a readily available numerical solver.
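The cyclic scheme described above can be sketched in a few lines of Python. This is an illustration under strong simplifying assumptions, not the implementation used for the results in the paper: each unit has a single scalar parameter, a golden-section search stands in for the generic numerical solver, and the component `toy_ell` is a hypothetical symmetric concave function rather than the actual \(\ell _{ij}\). All function names are ours.

```python
import math


def golden_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal scalar function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c  # the maximizer lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d  # the maximizer lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2.0


def pairwise_loglik(ell, delta):
    """Sum of ell(i, j, delta_i, delta_j) over all ordered pairs with i != j."""
    n = len(delta)
    return sum(ell(i, j, delta[i], delta[j])
               for i in range(n) for j in range(n) if j != i)


def fit_fixed_effects(ell, delta0, lo=-10.0, hi=10.0, tol=1e-10, max_iter=200):
    """Cycle over units, maximizing the unit-specific component in delta_i
    with all other parameters kept fixed, until the objective stabilizes."""
    delta = list(delta0)
    n = len(delta)
    old = pairwise_loglik(ell, delta)
    new = old
    for _ in range(max_iter):
        for i in range(n):
            def pl_i(x, i=i):
                return sum(ell(i, j, x, delta[j]) for j in range(n) if j != i)
            delta[i] = golden_max(pl_i, lo, hi)
        new = pairwise_loglik(ell, delta)
        if abs(new - old) < tol:
            break
        old = new
    return delta, new


# Hypothetical toy component (NOT the model's ell_ij): symmetric and concave.
Y_TOY = [0.0, 1.0, 2.0]


def toy_ell(i, j, d_i, d_j):
    return -(d_i - Y_TOY[i]) ** 2 - (d_j - Y_TOY[j]) ** 2 - 0.5 * (d_i - d_j) ** 2
```

Calling `fit_fixed_effects(toy_ell, [0.0, 0.0, 0.0])` returns the estimated parameters together with the final value of the objective. Because the toy component is symmetric in its two arguments, each unit-wise update cannot decrease the overall objective, which is the property exploited by the stopping rule.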

3.2 Clustering

With large samples, it is typically of interest to find clusters of units with similar behavior. In our approach this amounts to assuming that there are \(h_1\) groups of individuals with a similar tendency to be involved in an event and \(h_2\) groups of individuals with a similar tendency to collaborate in the network. For each group we have specific parameter vectors denoted by \({\varvec{\alpha }}_{g_1}^*\) and \({\varvec{\beta }}_{g_2}^*\), with \(g_1=1,\ldots ,h_1\) and \(g_2=1,\ldots ,h_2\), all collected in the parameter vector \({\varvec{\theta }}^*\).

For unit i, let \(d_{i1}\) denote the cluster to which the unit is assigned with respect to the first type of tendency and \(d_{i2}\) the cluster assigned with respect to the second type of tendency. The corresponding classification pairwise log-likelihood has the same expression as \(p\ell ({\varvec{\theta }})\) defined in (12), with \({\varvec{\alpha }}_i={\varvec{\alpha }}_{d_{i1}}^*\), \({\varvec{\beta }}_i={\varvec{\beta }}_{d_{i2}}^*\), and then \({\varvec{\delta }}_i=(({\varvec{\alpha }}_{d_{i1}}^*)^{\prime },({\varvec{\beta }}_{d_{i2}}^*)^{\prime })^{\prime }\). This function is denoted by \(cp\ell ({\varvec{\theta }}^*,{\varvec{d}}_1,{\varvec{d}}_2)\), where \({\varvec{d}}_1\) is the vector with elements \(d_{i1}\) and \({\varvec{d}}_2\) is that with elements \(d_{i2}\), respectively, with \(i=1,\ldots ,n\).

To cluster units in homogeneous groups, we maximize \(cp\ell ({\varvec{\theta }}^*,{\varvec{d}}_1,{\varvec{d}}_2)\) by an iterative algorithm that is initialized from the output of a k-means clustering of the individual estimates \(\hat{{\varvec{\alpha }}}_i\) and \(\hat{{\varvec{\beta }}}_i\). The algorithm then alternates the following three steps until convergence:

  1. for \(i=1,\ldots ,n\) try to change the cluster \(d_{i1}\) of unit i by finding the cluster that maximizes the individual component of the classification pairwise log-likelihood, which is defined as in (13) accounting for the cluster structure, with all other parameters kept fixed;

  2. for \(i=1,\ldots ,n\) try to change the cluster \(d_{i2}\) of unit i by the same procedure as above;

  3. update the parameter estimates of \({\varvec{\alpha }}_{g_1}^*\), \(g_1=1,\ldots ,h_1\), and \({\varvec{\beta }}_{g_2}^*\), \(g_2=1,\ldots ,h_2\), by maximizing \(cp\ell ({\varvec{\theta }}^*,{\varvec{d}}_1,{\varvec{d}}_2)\) with respect to \({\varvec{\theta }}^*\) with \({\varvec{d}}_1\) and \({\varvec{d}}_2\) kept fixed.
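The alternation between reassignment and parameter update can be illustrated on a deliberately simplified problem. The Python sketch below is not the algorithm applied to the pairwise likelihood: it assumes independent participation (no collaboration effect), a single classification instead of the two vectors \({\varvec{d}}_1\) and \({\varvec{d}}_2\), and one constant logit per cluster, so that the update step has a closed form. The names `unit_loglik` and `classify` are hypothetical.

```python
import math


def unit_loglik(y, m, a):
    """Log-likelihood kernel of a unit involved in y out of m events,
    with participation logit a (independence is assumed in this sketch)."""
    return y * a - m * math.log1p(math.exp(a))


def classify(y, m, a_star, d, max_iter=100):
    """Alternate cluster reassignment and parameter update until the
    labels stop changing.

    y      : number of events each unit is involved in
    m      : total number of events
    a_star : initial cluster-specific logits
    d      : initial cluster labels (0-based)
    """
    a_star, d = list(a_star), list(d)
    h = len(a_star)
    for _ in range(max_iter):
        # Reassignment: each unit moves to the cluster that maximizes
        # its own component, with the cluster parameters kept fixed.
        new_d = [max(range(h), key=lambda g: unit_loglik(y_i, m, a_star[g]))
                 for y_i in y]
        # Update: with labels fixed, the maximizer for each cluster is the
        # logit of the pooled participation frequency of its members.
        for g in range(h):
            members = [y_i for y_i, d_i in zip(y, new_d) if d_i == g]
            if members:
                p = sum(members) / (m * len(members))
                p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the logit finite
                a_star[g] = math.log(p / (1.0 - p))
        if new_d == d:
            break
        d = new_d
    return a_star, d
```

In the full method the reassignment is carried out separately for the two classifications and the update step requires numerical maximization, but the control flow is the same.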

We select \(h_1\) and \(h_2\) as the smallest numbers of clusters such that the initial clustering of the estimates \(\hat{{\varvec{\alpha }}}_i\) and \(\hat{{\varvec{\beta }}}_i\), performed by the k-means algorithm, leads to a between-cluster sum of squares equal to at least 80% of the total sum of squares. Then, at convergence of the three steps illustrated above, we check that the number of clusters is adequate by comparing the maximum value of \(cp\ell ({\varvec{\theta }}^*,{\varvec{d}}_1,{\varvec{d}}_2)\) with that of \(p\ell ({\varvec{\theta }})\), as we show in connection with the application in Sect. 5.
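The 80% rule can be sketched as follows in plain Python. The code is a minimal illustration and makes assumptions that are not in the text: k-means is implemented by Lloyd's algorithm with a deterministic farthest-point initialization (any standard k-means routine could be used instead), the function names are hypothetical, and the points are assumed not to be all identical. In the procedure described above the rule would be applied separately to the estimates \(\hat{{\varvec{\alpha }}}_i\) and \(\hat{{\varvec{\beta }}}_i\).

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(points):
    """Componentwise mean of a list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda g: sq_dist(p, centers[g]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:
                centers[g] = mean(members)
    return labels, centers


def bss_ratio(points, labels, centers):
    """Between-cluster sum of squares divided by the total sum of squares."""
    grand = mean(points)
    tss = sum(sq_dist(p, grand) for p in points)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return (tss - wss) / tss


def smallest_k(points, threshold=0.8, k_max=10):
    """Smallest k whose k-means solution reaches the required ratio."""
    for k in range(1, k_max + 1):
        labels, centers = kmeans(points, k)
        if bss_ratio(points, labels, centers) >= threshold:
            return k
    return k_max
```

For instance, with nine scalar estimates forming three well-separated groups, two clusters explain less than 80% of the total sum of squares in this sketch, so the rule selects three.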

4 Simulation

As clarified in Sect. 2.1, knowledge of the main and second-order marginal effects, without knowledge of the higher-order effects, is not enough to generate data from an assumed model. Moreover, even with a small number of actors, generating data is computationally infeasible, apart from the case in which all second- and higher-order effects are null. Given these constraints, we now carry out a study on simulated data that mimic the scientific publications data analyzed in the next section, to provide relevant indications on the properties of the proposed estimation methods.

The simulation study is based on a benchmark design (DESIGN 0) and on alternative designs that are variations of the benchmark. Under DESIGN 0, we assume that \(r=250\) events involve an overall number of \(n=500\) individuals and cover \(T=5\) consecutive periods. The individual parameters measuring the tendency to participate in an event, denoted by \(\alpha _i\), take with equal probability one of \(h_1=4\) values equally spaced from \(-5.0\) to \(-2.5\). To simplify the generation of the simulated data, the parameters measuring the tendency to reciprocate, \(\beta _i\), are all equal to 0. The alternative simulation designs alter DESIGN 0 by using \(n=1000\) rather than 500 (DESIGN 1), \(T=10\) rather than 5 (DESIGN 2), \(h_1=8\) rather than 4 clusters (DESIGN 3), and \(r=500\) events rather than 250 (DESIGN 4).
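A data-generating step in the spirit of DESIGN 0 can be sketched as follows. The Python code is an illustration under explicit assumptions that go beyond the description above: all second- and higher-order effects are taken to be null, so that the participation indicators are independent Bernoulli draws; the events are spread evenly over the T periods; and each \(\alpha _i\) is a constant logit. The function name and the seed are hypothetical.

```python
import math
import random


def simulate_design0(n=500, r=250, T=5, h1=4, lo=-5.0, hi=-2.5, seed=0):
    """Generate participation indicators under a DESIGN 0-like setting.

    Returns the h1 possible values of alpha, the cluster of each actor,
    the period of each event, and an r x n matrix z of 0/1 indicators.
    """
    rng = random.Random(seed)
    # h1 values of alpha equally spaced between lo and hi.
    values = [lo + g * (hi - lo) / (h1 - 1) for g in range(h1)]
    # Each actor is assigned one of the values with equal probability.
    cluster = [rng.randrange(h1) for _ in range(n)]
    prob = [1.0 / (1.0 + math.exp(-values[g])) for g in cluster]
    # Assumption: events are spread evenly over periods 1, ..., T.
    period = [e * T // r + 1 for e in range(r)]
    # Assumption: null second- and higher-order effects, hence
    # independent Bernoulli draws for every actor and event.
    z = [[1 if rng.random() < prob[i] else 0 for i in range(n)]
         for _ in range(r)]
    return values, cluster, period, z
```

The alternative designs would correspond to calls such as `simulate_design0(n=1000)`, `simulate_design0(T=10)`, `simulate_design0(h1=8)`, and `simulate_design0(r=500)`.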

Under each design, 100 samples have been generated and the model based on a second-order time polynomial for both the tendency to connect and the tendency to reciprocate is assumed. This is the same model estimated in the following section to analyze the data on scientific publications. Estimation is based on both the fixed-effects method illustrated in Sect. 3.1 and the clustering estimation method illustrated in Sect. 3.2. The quality of these estimation methods is measured, for each sample, by the average across individuals of the absolute value of the difference between each parameter estimate and the corresponding true value. The average across samples of these sample-specific averages is reported in Table 3 for the fixed-effects estimation method and in Table 4 for the cluster estimation method, where the measures of error are indicated by \(\textrm{err}(\hat{\alpha }_l)\) and \(\textrm{err}(\hat{\beta }_l)\) for \(l=1,2,3\). The clustering results in Table 4 also report the value of the Adjusted Rand Index (ARI; Hubert and Arabie 1985) between the true and predicted clustering.
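The two performance measures can be written compactly. The following Python sketch, with hypothetical function names, computes the average absolute error for one parameter in one sample and the ARI in its standard pair-counting form; it is meant as an illustration of the definitions rather than the code used to produce Tables 3 and 4.

```python
from collections import Counter
from math import comb


def mean_abs_error(estimates, truth):
    """Average across individuals of |estimate - true value| in one sample."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


def average_over_samples(sample_errors):
    """Average across samples of the sample-specific error measures."""
    return sum(sample_errors) / len(sample_errors)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same units."""
    n = len(labels_a)
    # Number of pairs of units that are together in both partitions.
    sum_ab = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Number of pairs of units that are together in each partition.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions have one cluster
    return (sum_ab - expected) / (max_index - expected)
```

The ARI equals 1 when the two partitions coincide up to a relabeling of the clusters, and it is close to 0 (possibly negative) when the agreement is no better than expected by chance.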

Table 3 Average measure of error of the fixed-effects estimator of every parameter under each simulation scenario
Table 4 Average measure of error of the estimation method based on clustering for every parameter under each simulation scenario

We note that as the number of actors n increases (DESIGN 1 vs 0), the average error of the fixed-effects method does not change substantially because, as expected, the greater amount of information is compensated by the larger number of parameters to be estimated. By contrast, and consistently with this explanation, the error of the cluster estimation method decreases markedly as the number of subjects increases. Also, we observe that as the number of time periods T increases (DESIGN 2 vs 0), both methods improve, since more data are available but there are no additional parameters to estimate, with a more pronounced gain in the performance of the clustering method. When the number of clusters \(h_1\) increases (DESIGN 3 vs 0), the performance of the fixed-effects method does not change, whilst that of the cluster estimation method worsens: the same amount of information is now split among a higher number of clusters, which degrades the estimation of the cluster-specific parameters. As a result, the ARI, which is satisfactorily close to 1 in all other designs, is markedly lower in DESIGN 3. Finally, as the number of events r increases (DESIGN 4 vs 0), both methods improve considerably: more occasions are available to learn the marginal and joint tendencies of subjects to participate in the events.

5 Application

In order to illustrate the approach based on individual-specific effects, see assumptions (8) and (9), we propose an application based on the data recently made available by Ji and Jin (2017). The data refer to the publication history of all authors with at least one paper published in four top statistical journals (Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society - Series B) between 2003 and the first half of 2012. Overall, the data involve 3607 authors and 3248 articles. In Table 5 we report some descriptive statistics, whereas in Fig. 2 we represent the distribution of the number of articles and the number of coauthors for each individual in the dataset.

Table 5 Descriptive statistics on the number of authors per article
Fig. 2 Distribution of the number of articles per author (left panel) and of the number of coauthors (right panel)

From Table 5 we observe that the number of published papers per year does not vary considerably, even if there is an increase from 2003 to 2009 and a decrease after 2009 (year 2012 is covered only partially). Moreover, the average number of authors increases moderately during the time span, and it is important to note that the number of single-author articles is relevant in each year. Indeed, these articles represent 16.7% of the total articles considered in the dataset, and this justifies the use of the proposed approach for the analysis. Regarding Fig. 2, we note the concentration of the number of articles, with the majority of authors (2335) having published only one article in one of the four top journals considered in the reference period and 520 having published only two articles. On the other hand, the three most productive researchers published 33, 40, and 82 articles, with a total of 155 overall (ignoring possible overlapping). The distribution of the number of coauthors is also highly concentrated, although the situation is somewhat different as the third modality (i.e., 2 coauthors) has the highest frequency, equal to 970. The authors with zero and one coauthors are 154 and 842, respectively, whereas the three highest modalities are 81, 94, and 112.

The application is based on two phases, that is, fixed-effects estimation and clustering, which are illustrated in Sects. 3.1 and 3.2, respectively. The fixed-effects estimation method runs in about 27 h and the clustering method takes about twice as long, on an Intel(R) Core(TM) i7-8565U CPU @ 1.80 GHz 1.99 GHz machine. Regarding the first phase, with reference to (8) and (9), we assume second-order polynomials for the effect of time. In this way we estimate 3607 parameter vectors \({\varvec{\alpha }}_i\) and \({\varvec{\beta }}_i\), each of length 3. On the basis of these estimates it is possible to obtain trajectories both in terms of tendency to publish an article in a certain period and in terms of tendency to collaborate with other authors. We recall that, for the former, the effect that is represented is the logit defined in (3) and for the latter it is the log-odds ratio (4). The choice of a second-order polynomial is driven by model parsimony: it keeps the number of parameters reasonably low while allowing for non-linear trajectories of the tendencies to publish and to collaborate. Still, we stress that the proposed method is designed more generally for vector-valued functions of time \({\varvec{f}}_1\) and \({\varvec{f}}_2\), which may be extended, depending on the application and the available computational capacity, to higher-order polynomials or splines. In order to illustrate these results, we consider the five authors with the largest number of published articles in the period, whom we identify with the letters from A to E; the publication patterns of these authors are reported in Table 6.

Table 6 Publication profiles of the five authors with the largest number of published papers in the period considered

For these top five authors, we represent the estimated profiles in Fig. 3, where we can clearly identify author E as the most productive one, with a profile that follows a reverse U-shape having its peak around years 2006 and 2007, coherently with the data in Table 6. On the other hand, this author shows the lowest profile in terms of tendency to collaborate with other authors, in accordance with a ratio between number of coauthors and number of published papers in the table which, for author E, tends to be lower than for the other main authors. The conclusion is that the large number of coauthors of author E can be mostly ascribed to this author's tendency to publish; see Sect. 2.2 for general comments on the interpretation of these results. In a similar way we can interpret the profiles of the other authors. For instance, authors C and D, who published the same number of papers (namely 40), have very similar profiles in terms of tendency to publish but, according to the proposed model, D has a higher tendency to collaborate, with a difference that also increases in time; in particular, the curve for D always dominates that for C, which is again coherent with the data in Table 6. Finally, authors A and B have profiles that are in agreement with a smaller number of published papers which, at the same time, tends to increase from 2003 to 2012.

Fig. 3 Profiles in terms of tendency of authors to publish papers (left panel) and to collaborate (right panel), for the top five authors A, B, C, D, and E

To improve the interpretability of these profiles, instead of using the logits defined in (3), we can also express the tendency to publish in terms of expected number of publications per year. These expected values are obtained by multiplying the probability of being involved in a publication by the yearly number of publications available in Table 5. The resulting profiles are reported in Fig. 4 and confirm the previous conclusions.
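The conversion from the logit scale to the expected number of yearly publications is elementary and can be sketched as follows. In this Python illustration the polynomial basis \((1,t,t^2)\), the coefficient values, and the yearly article counts are all hypothetical: the basis actually used is the one defined through \({\varvec{f}}_1\), and the actual counts are those in Table 5.

```python
import math


def logit_trajectory(alpha, times):
    """Second-order polynomial in time, a0 + a1*t + a2*t**2.
    The plain (1, t, t^2) basis is an assumption of this sketch."""
    a0, a1, a2 = alpha
    return [a0 + a1 * t + a2 * t * t for t in times]


def expected_publications(logit, n_articles):
    """Probability of being involved in an article, obtained from the logit,
    times the number of articles published in the year."""
    return n_articles / (1.0 + math.exp(-logit))


# Hypothetical coefficients and yearly counts, for illustration only.
times = [0, 1, 2, 3]
counts = [300, 320, 340, 360]
profile = [expected_publications(eta, m)
           for eta, m in zip(logit_trajectory((-5.0, 0.4, -0.1), times), counts)]
```

Each element of `profile` is the expected number of articles in the corresponding year for an author with the given trajectory on the logit scale.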

Fig. 4 Profiles of tendency of authors to publish papers, measured by the expected number of yearly publications, for the top five authors A, B, C, D, and E

When we examine the overall sample of 3607 authors, the analysis may be effectively carried out by building clusters of authors. Following the approach described in Sect. 3.2, we find evidence of \(h_1=6\) different profiles in terms of tendency to publish and \(h_2=5\) profiles in terms of tendency to collaborate. The model with clustered profiles attains a maximum pairwise log-likelihood (normalized by dividing by the number of ordered pairs of units) equal to \(-34.916\), which is close to that of the fixed-effects method, equal to \(-33.693\). On the other hand, the maximum pairwise log-likelihood with only one cluster is equal to \(-75.401\). This means that a structure of \(6\times 5\) clusters recovers 97.1% of the improvement in pairwise log-likelihood that the fixed-effects model achieves over a single cluster, despite the huge reduction in the number of parameters with respect to the fixed-effects model: rather than using n individual-specific parameter vectors \({\varvec{\alpha }}_i\) and \({\varvec{\beta }}_i\), we use a very limited number of parameter vectors denoted by \({\varvec{\alpha }}^*_{g_1}\) and \({\varvec{\beta }}^*_{g_2}\). The corresponding profiles are represented in Fig. 5.
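The 97.1% figure can be checked from the three normalized values reported above. Reading it as the share of the gap between the one-cluster fit and the fixed-effects fit that is closed by the clustered model (our reading of the computation, which reproduces the reported value), we have

$$\begin{aligned} \frac{-34.916-(-75.401)}{-33.693-(-75.401)}=\frac{40.485}{41.708}\approx 0.971. \end{aligned}$$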

Fig. 5 Profiles in terms of tendency of authors to publish papers (left panel) and to collaborate (right panel), for clusters 1–5, obtained following the analysis in Section 3.2

It is interesting to note that the 6 cluster profiles in terms of tendency to publish cover different possibilities. In particular, the profiles for the first 3 clusters have a reversed U-shape with peaks located at different years. On the other hand, the profiles for clusters 4 and 5 have a U-shape but are rather different from each other. Finally, authors in cluster 6 have a tendency to publish that is rather constant over time and higher than for the other clusters. In any case, the values of the logit even for this class correspond to low probability levels, with an expected number of publications that is smaller than 1 for all years. In a similar way we can interpret the 5 clusters in terms of tendency to collaborate. For instance, individuals in the first cluster have a general tendency to collaborate that is lower than that of the other authors and rather constant in time.

It is important to stress that, in principle, any author may belong to any cluster of the first type (in terms of tendency to publish) and of the second type (in terms of tendency to collaborate). In order to better understand this aspect, we consider the cross classification of authors according to both criteria. This cross classification is reported in Table 7, which also shows the size of each cluster in terms of units assigned to it.

Table 7 Cross classification of authors in terms of tendency to publish and to collaborate

On the basis of the results in Table 7 we observe that all clusters have a comparable size, without a clear prevalence of a specific cluster, although cluster 3 is the largest in terms of tendency to publish and the same happens for the tendency to collaborate. Regarding the association between the two classification criteria, it is interesting to comment on certain regular patterns that appear evident. The most relevant one is that individuals in cluster 6 in terms of tendency to publish are all in the first cluster in terms of tendency to collaborate. In summary, the authors with the highest tendency to publish have, at the same time, the smallest tendency to collaborate. As noted above, when commenting on the results shown in Fig. 3 for author E, this means that, for the most productive authors, having a large number of coauthors is more plausibly ascribed to the tendency to publish than to a pure tendency to collaborate. Author E can be considered as a pivotal or representative author for this joint class.

Other patterns may be easily discovered by looking at Table 7; for instance, authors in cluster 1 in terms of tendency to publish are mostly in cluster 3 in terms of tendency to collaborate. Comparing the two profiles we observe that they have opposite shapes, which is coherent with the previous reasoning, according to which the tendency to have a large number of coauthors may be more reasonably ascribed to the general tendency to publish than to a specific social behavior. This is likely due to the fact that scientific collaborations are viewed as long-term investments: once a productive author establishes a team of researchers that effectively collaborate with each other, he/she tends to fully exploit these known scientific relations, with the aim of authoring even a large number of articles without changing the team. The whole analysis is repeated on a reduced dataset that excludes 2012, the partly available last year of the sample. The results are substantially equivalent to those reported above, and minor differences are discussed in detail in the Supplementary Material.

6 Conclusions

We propose a new model for social networks arising from a sequence of events, each involving an arbitrary number (one or more) of actors. The main novelties, relative to available approaches, may be summarized as follows: (i) binary variables \(Z_i^{(e)}\) for actor i being involved in event e are directly modeled instead of the tie variables \(Y_{ij}^{(e)}=Z_i^{(e)}Z_j^{(e)}\); (ii) the model is based on marginal first- and second-order effects that have a meaningful interpretation in terms of tendency of an actor to participate in an event and tendency to cooperate; (iii) these effects are parametrized accounting for each event time and individual fixed-effects parameters, so that the evolution of individual behaviors may be represented by suitable trajectories; (iv) inference is based on a composite likelihood function, built on the distribution of each ordered pair of units, and maximized with numerical complexity of order \(O(n^2)\), n being the network size; (v) units may be clustered in groups having the same behavior so as to simplify the interpretation of the data structure. The results of a simulation study confirm the adequacy of the proposed estimation methods.

In conclusion, it is worth noting that the proposed approach may potentially be used in contexts different from our motivating example. In particular, we can formulate third- (or higher-) order effects to account for triangularizations. This amounts to relying on a different composite likelihood function: a composite likelihood based on triples or a pairwise conditional likelihood, depending on the type of third-order effects used. However, triples would necessarily lead to a higher computational complexity.

Furthermore, individual covariates can be easily incorporated. In fact, we rely on a parametrization based on a linear predictor with suitable polynomials of event times. This linear predictor can include, in a natural way, individual covariates with no increase in complexity. However, covariates specific to each pair of units (rather than to single units) and to events increase the computational burden.

Finally, our fixed-effects model can be naturally extended to random effects and lends itself to a Bayesian formulation. By simply adding suitable priors on the model parameters, the same inferential algorithm used for finding the composite maximum likelihood estimate can be adopted, with minor adjustments, to find maximum a posteriori estimates.