Introduction

Research collaboration has been continuously growing in academia to increase scientific productivity, to share research costs, and to achieve new knowledge and interdisciplinary skills. Despite being predominantly found in the fields of science, technology, engineering, and math (STEM), it has been gaining relevance even in areas which have historically been less cooperative, such as the humanities and social sciences (Dahlander & McFarland, 2013; Wuchty et al., 2007).

As research collaboration becomes the norm, the study of the social networks of scientific communities has gained importance. Structural and relational studies analyze how individuals, communities and institutions interact and influence one another (Blau, 2017; S. P. Borgatti & Cross, 2003; Burt, 1992; Granovetter, 1973; Marsden, 1990; M. McPherson et al., 2001; Uzzi, 1997). Furthermore, the advancement of network analysis methods has given rise to several studies that explored the behaviors of collaborative academic communities (Barabási et al., 2002; S. Borgatti et al., 2009; De Montjoye, Stopczynski, Shmueli, Pentland, & Lehmann, 2014; Ding, 2011; Katz, 1994; Lee & Bozeman, 2005; Newman, 2000; Newman & Park, 2003; Zhang et al., 2018). However, a lot is yet to be learned as research collaborations are constantly evolving and researchers have not yet been able to unveil all the complexities of these networks, such as addressing how academic endogamy impacts research collaboration networks.

How are scientific collaboration groups structured in universities where so many scholars share the same alma mater? Endogamy is not limited to a few countries; it is found in both developed and developing countries. It affects both established and new institutions, which are often structurally constrained by their pool of applicants due to induced homophily, or occasionally even nepotism (choice homophily) (Kossinets & Watts, 2009). Uncovering the type of research relationships that are built in an environment with high endogamy can be vital to understanding the underlying causes of success and failure of scientific productivity in an array of institutions worldwide.

At first glance, a group in which members have similar characteristics might seem positive, given that they share local boundaries and have viewpoints that are more likely to converge. However, scientific work is not always performed by a homogeneous group of researchers. On the contrary, nowadays the process of coming up with innovative ideas requires crossing scientific boundaries such that a diverse range of actors and fields of knowledge can intersect (Cummings & Kiesler, 2005; Star & Griesemer, 1989). Networks with high homophily are characterized by strong provincial ties and vast numbers of links to redundant contacts, which result in a flow of repetitive information. Besides, clustered network structures, with their resulting lack of opportunities to contact external actors, may invariably limit the construction of new ideas (Burt, 1992, 2004; Granovetter, 1973; M. McPherson et al., 2001; Michelfelder & Kratzer, 2013). Despite the dilemma of local cohesion, it is in weak ties that members of homogeneous groups find some of their greatest opportunities for building and trading new ideas. Also known as structural holes, these network opportunities allow for well-positioned players to build bridges connecting actors from distinct clusters, thereby providing faster and more direct access to unique information (S. Borgatti et al., 2009; Burt, 1992, 2004; Granovetter, 1973, 1983; Hansen, 1999).

Elite research universities in Brazil hire a significant number of their own alumni as faculty members. In Brazil’s most extreme case, the University of São Paulo (USP), 70% of faculty members were hired from the university’s own alumni pool. USP is not only the country’s most prestigious and affluent university, but it is also commonly recognized among the top three universities in Latin America (Times Higher Education, 2021; QS World University Ranking, 2022). Moreover, it boasts the largest student enrollment of all Brazil’s public universities and is considered the main birthplace of Brazilian professors. These characteristics make USP an interesting case to analyze from a network analysis angle.

This study aims to shed light on how scientific collaborative communities are structured in elite research universities with high levels of academic endogamy. It further aspires to understand the dynamics of local scientific networks and their changes over time. Is such a homogeneous academic setting open to newcomers? Are scholars in those settings more responsive to further homophily?

Literature review

Social connections are more likely to occur among individuals who are alike, that is, who share physical attributes or have a similar educational level and socioeconomic background (Dahlander & McFarland, 2013; J. M. McPherson & Smith-Lovin, 1987; M. McPherson et al., 2001; Ruef, Aldrich, & Carter, 2003; Smith et al., 2016). The literature has focused on academic endogamy, a type of homophily that takes place in higher education institutions, with the hiring of faculty members who are also alumni of that school (Blau, 1973; Dutton, 1980; Hargens & Farr, 1973; McGee, 1960; Smyth & Mishra, 2014). What it means to be an alumnus may vary from author to author, although many focus on the most recent degree, usually the PhD degree (Delamont & Atkinson, 2001).

The literature is interested in these types of connections usually due to their possible impact on scientific productivity (Eisenberg & Wells, 2000; Horta, 2013; Horta et al., 2010; Inanc & Tuncer, 2011; Yudkevich & Sivak, 2012). However, it considers endogamy a byproduct of both individual choice and structural constraints (Kossinets & Watts, 2009; M. McPherson et al., 2001).

This provides an interesting case to be investigated with tools based on network analysis, especially Granovetter’s Strength of Weak Ties theory. According to Granovetter (1983), weak ties allow for a wider diffusion of information as they can reach individuals connected to other social networks, whereas strong ties are more likely to bond similar individuals, thereby limiting the spread of information to their own cluster. This means that outsiders or non-inbred scholars could be a local source of non-redundant connections and lead to more effective information diffusion in an environment where so many scholars share the same contacts.

This discussion is far from being consensual, however. Many have advocated for the relevance of strong ties, associating them with team excellence, stronger information diffusion patterns and the likelihood of change due to the familiarity these members already have with each other (Brown & Reingen, 1987; De Montjoye et al., 2014; Krackhardt, Nohria, & Eccles, 2003; Rawlings et al., 2015, p. 1717). In a more appeasing tone, others suggest that a balance between strong and weak ties can be optimal for information exchange and creativity (Michelfelder & Kratzer, 2013; Zhou et al., 2009).

Burt (1992, 2004) also defends the role of weak ties in social networks. He argues that the similarity of ideas and attitudes within the group, which occurs with redundant contacts, may reduce opportunities for the exchange of new knowledge. This highlights the role of bridge-builders and their ability to obtain new ideas that can be shared with other members in their own cluster. In other words, they are a point of access for new information.

The literature has given its attention to scientific collaboration and co-authorship. There are studies that focus on comparisons between fields, concluding that STEM areas tend to show more cooperative efforts (Dahlander & McFarland, 2013; Wuchty et al., 2007), but also studies that focus on research impact and productivity (Li et al., 2013; Bordons et al., 2015) or even the effects of the Covid-19 pandemic on co-authorship networks (Sachini et al., 2021).

Data sources differ, however. Kossinets and Watts (2009) used e-mail interactions and course registrations to identify relationships in a large US university. Zhang et al. (2018) used papers extracted from Web of Science for a coauthoring analysis. Hâncean and Perc (2016) used a similar strategy but restricted their analysis to sociology and Eastern European countries. The main conclusion of this literature is that when homophily is present, highly productive authors tend to work together, increasing output inequalities, which may be bad for the output of the system as a whole.

Research questions

This study explores the structure and characteristics of research collaboration in a homogeneous scientific community and, therefore, contributes to the relevant literature on endogamy, homophily and social networks. Its main hypothesis is that scientific cooperation in an environment with high academic endogamy is largely influenced by homophily, which could lead to collaboration clusters of inbred scholars, making it hard for non-inbred scholars to form local ties. Thus, this paper addresses the following questions:

  (1) How are local research collaboration networks structured in elite research universities with high levels of academic endogamy? Do they change over time?

  (2) Does homophily influence the formation of ties between faculty members? Are scholars with identical academic endogamy status more likely to work together? Are these preferences maintained over time?

The case of professors in the University of São Paulo could lead to important findings helpful not only to the Brazilian higher education system, but also to the evaluation and planning of educational policies in other countries, mainly ones with the same level of system maturity as Brazil, where academic endogamy also occurs.

This study contributes to this literature in four ways: (i) it uses an official dataset, in which affiliation and publications are self-reported but submitted for regulatory review, unlike most studies, which obtain data from the same sources (usually Web of Science) and usually only from publications in English, whereas here publications in other languages are included as well; (ii) it analyzes the case of one single large ego network, which also happens to be the largest and most elitist university network in Brazil; (iii) it examines a network with a 70% level of endogamy; (iv) it analyzes co-authoring patterns in a developing country. For all these reasons, we believe this study is relevant and contributes to the literature.

Data

The University of São Paulo (USP) is the best ranked Brazilian university in the Times Higher Education and QS University rankings. It was selected as a case study due to its status in Brazil and its high level of endogamy, 70% in 2016 (Grochocki, 2020). USP is a public university with an enrollment of 97,982 students and 5,631 faculty members. Because it is such an important institution for higher education in Brazil, it is also responsible for the PhD degrees of 24.4% of active scholars in the whole system.

The dataset used was collected by CAPES (Coordination for the Improvement of Higher Education Personnel), the Brazilian government agency responsible for the establishment, evaluation, and financing of graduate programs in Brazil. Scholars in Brazil maintain an official profile with their curriculum vitae at “Plataforma Lattes”, an online academic resume database managed by CNPq (Brazilian National Council for Scientific and Technological Development), with information on researchers’ education, language skills, current and past employment, scientific publications, awards, and grants, among others. Information on these resumes is self-reported, but used for official purposes of funding and regulation.

An open-source Python program called “ScriptLattes” (Mena-Chalco & Cesar Junior, 2009) was used to extract the online data available on the scientific production of 5,230 unique ID numbers identified as USP professors from the years 2000 to 2019.

The total sample of scientific production of the 5,230 scholars represents 93% of the university’s faculty population, in all fields of knowledge. We excluded those scholars who were not linked to any graduate program. The final sample of 5,230 scholars led us to 196,941 journal papers, 71,239 conference papers and 18,992 books. More information is given in the appendix.

Collaboration information was split into five groups composed of four years of aggregated data each: 2000–2003, 2004–2007, 2008–2011, 2012–2015, and 2016–2019. An adjacency matrix of coauthorship data was created to run different methods of collaboration network analysis. Tables 1 and 2 summarize descriptive information on the values of variables in this sample, while Table 3 shows how variables are correlated.
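As an illustration of the period split and coauthorship matrix described above, the following is a minimal sketch and not the procedure actually used in this study. The function names, the input layout (a list of `(year, authors)` pairs) and the choice to count each coauthored item once per pair are assumptions made for the example.

```python
from itertools import combinations

# The five four-year windows named in the text.
PERIODS = [(2000, 2003), (2004, 2007), (2008, 2011), (2012, 2015), (2016, 2019)]


def period_of(year):
    """Return the index of the four-year window containing `year`, or None."""
    for index, (start, end) in enumerate(PERIODS):
        if start <= year <= end:
            return index
    return None


def coauthorship_matrices(publications, scholars):
    """Build one symmetric coauthorship count matrix per period.

    `publications` is a list of (year, [author ids]) pairs (hypothetical layout).
    Authors who are not in `scholars` (external coauthors) are ignored.
    """
    position = {scholar: k for k, scholar in enumerate(scholars)}
    size = len(scholars)
    matrices = [[[0] * size for _ in range(size)] for _ in PERIODS]
    for year, authors in publications:
        index = period_of(year)
        if index is None:
            continue  # outside 2000-2019
        local = sorted({a for a in authors if a in position})
        for a, b in combinations(local, 2):
            i, j = position[a], position[b]
            matrices[index][i][j] += 1
            matrices[index][j][i] += 1
    return matrices


# Toy data: three local scholars and one external coauthor "X".
scholars = ["A", "B", "C"]
publications = [
    (2001, ["A", "B"]),
    (2002, ["A", "B", "X"]),
    (2017, ["B", "C"]),
    (1999, ["A", "C"]),  # dropped: before the first window
]
matrices = coauthorship_matrices(publications, scholars)
```

In this toy example, scholars A and B share two items in the first window and B and C share one item in the last window; the 1999 item falls outside every window and is ignored.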

Table 1 Summary of USP’s Academic Collaboration Networks from 2000 to 2019
Table 2 Logistic regression model of predictors of tie formation (2000–2019)

Finally, the dataset was restructured from individual to dyadic format. In such a layout, every row corresponds to a potential or actual tie formed between two scholars within a given period (4 years in the case of this study). Thus, rows describe not only tie characteristics, but also attributes of both individuals. The first listed scholar is referred to as an “ego” and their immediate contact as an “alter”. Identical pairs of scholars were given a unique “dyad ID” to match the same ties in distinct time periods. Next, a binary variable “tie” was set to 1 for all 53,526 ties identified as having taken place in the last 20 years based on the coauthoring of journal papers, conference papers, and books. All other potential ties received a 0.
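The dyadic layout can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the attribute key `inbred`, and the decision to give the ordered pairs (A, B) and (B, A) the same dyad ID are all hypothetical choices, not details confirmed by the dataset.

```python
from itertools import permutations


def build_dyads(scholars, ties_by_period, n_periods):
    """Return one row per ordered (ego, alter) pair and period.

    `scholars` maps a scholar id to an attribute dict (hypothetical layout).
    `ties_by_period` maps a period index to a set of frozenset({a, b}) pairs.
    """
    dyad_ids = {}
    rows = []
    for period in range(n_periods):
        observed = ties_by_period.get(period, set())
        for ego, alter in permutations(sorted(scholars), 2):
            pair = frozenset((ego, alter))
            # Assumption: the same unordered pair keeps one id across periods.
            dyad_id = dyad_ids.setdefault(pair, len(dyad_ids) + 1)
            ego_inbred = scholars[ego]["inbred"]
            alter_inbred = scholars[alter]["inbred"]
            rows.append({
                "dyad_id": dyad_id,
                "period": period,
                "ego": ego,
                "alter": alter,
                "ego_inbred": ego_inbred,
                "alter_inbred": alter_inbred,
                "same_endogamy_status": int(ego_inbred == alter_inbred),
                "tie": int(pair in observed),
            })
    return rows


# Toy data: three scholars, two periods, one observed tie in period 0.
toy_scholars = {
    "A": {"inbred": True},
    "B": {"inbred": True},
    "C": {"inbred": False},
}
toy_ties = {0: {frozenset(("A", "B"))}}
dyad_rows = build_dyads(toy_scholars, toy_ties, n_periods=2)
```

Each row carries attributes of both ego and alter plus the binary outcome, which is the shape a dyad-level tie formation model needs.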

Considering the extensive number of potential ties (close to 137 million), a sample of 5 million of those was randomly selected while keeping every actual tie (Kleinbaum et al., 2013). This method was chosen as ties among older professors (former advisors/teachers) and young faculty members (former advisees/students) are expected in an environment with high endogamy. Furthermore, this study aims to illustrate cross-disciplinary collaboration. Therefore, the adoption of other methods, such as selecting potential ties based on the absolute difference in hiring year (Dahlander & McFarland, 2013) or limiting ties within fields, would not allow us to describe those relationships. As expected, no significant differences were found when comparing the variables of the full and the randomly selected tie sample.
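The order of magnitude of these figures can be checked with simple arithmetic. The sketch below assumes that potential ties are counted as ordered ego-alter pairs in each of the five periods; this is one reading that reproduces the reported figure, not a derivation stated in the data description. The sampling helper is likewise a hypothetical illustration of keeping every observed tie while drawing non-ties uniformly at random.

```python
import random

N_SCHOLARS = 5230        # scholars in the final sample (from the text)
N_PERIODS = 5            # four-year windows (from the text)
N_OBSERVED_TIES = 53526  # observed ties (from the text)
N_SAMPLED = 5_000_000    # randomly sampled potential ties (from the text)

# Assumption: ordered (ego, alter) pairs, repeated for every period.
ordered_pairs = N_SCHOLARS * (N_SCHOLARS - 1)
potential_dyads = ordered_pairs * N_PERIODS

# Assumption: the analysis sample is the sampled dyads plus all observed ties.
tie_share = N_OBSERVED_TIES / (N_SAMPLED + N_OBSERVED_TIES)


def sample_dyads(rows, n_non_ties, seed=0):
    """Keep every observed tie and a uniform random subset of the non-ties."""
    ties = [row for row in rows if row["tie"] == 1]
    non_ties = [row for row in rows if row["tie"] == 0]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    kept = rng.sample(non_ties, min(n_non_ties, len(non_ties)))
    return ties + kept


# Toy data: 1,000 dyads, of which every 50th is an observed tie.
toy_rows = [{"id": k, "tie": int(k % 50 == 0)} for k in range(1000)]
toy_sample = sample_dyads(toy_rows, n_non_ties=100)
```

Under these assumptions the count of potential dyads is about 136.7 million, and observed ties make up roughly 1% of the analysis sample.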

Method

This study uses descriptive social network analysis (SNA) methods, as well as multilevel modeling (MLM). At the network level, methodologies and tools were adopted to replicate and measure characteristics of complete networks. Images were produced using Gephi, version 0.9.2, with the ForceAtlas2 layout method and the Polygon shape method. Edge weights were rescaled to a normalized range.

Relational data challenges the assumption that observations are independent of one another. Consequently, multilevel modeling (MLM) has been widely adopted to address this limitation of Ordinary Least Squares regressions when analyzing ego networks. Furthermore, MLM avoids both the ecological and the atomistic fallacy, allowing for cross-level inferences. Multilevel modeling simultaneously estimates the variance within and between groups for an outcome variable, and its association with individual and group independent variables (Crossley et al., 2015; Peugh, 2010; Rabe-Hesketh & Skrondal, 2012; Snijders & Bosker, 2012; Snijders, Spree, & Zwaagstra, 1995). Among other assumptions, MLM holds that Level 1 residual variance is constant within and between Level 2 units and that Level 1 and Level 2 residuals are uncorrelated (Perry, Pescosolido, & Borgatti, 2018).

To correct for heteroscedasticity, cluster-robust standard errors were computed at the ego level. In addition, an unstructured covariance matrix was adopted to maintain the assumption of uncorrelated residuals.

The main model for this study is described in the equation:

$$\mathrm{logit}\left({Y}_{ij}\right)={\beta }_{0j}+{\beta }_{1j}{x}_{1ij}+{\beta }_{2j}{x}_{2j}+{\beta }_{3j}{x}_{3ij}+{\beta }_{4j}{x}_{2j}{x}_{3ij}+{u}_{0j}$$

where \({Y}_{ij}\) is the outcome variable of interest, “tie”, between j (ego) and i (alter). \({\beta }_{ij}\) represents random differences between groups, where \({\beta }_{0j}\) equals the average intercept plus the group-dependent deviation \({u}_{0j}\). \({x}_{1ij}\) denotes the characteristics of an individual (level 1) in an ego network j (level 2). Likewise, \({x}_{2j}\) denotes the characteristics of ego j. Next, \({x}_{3ij}\) represents homophily, that is, the traits shared by alter and ego. Finally, \({x}_{2j}{x}_{3ij}\) denotes the interaction terms between homophily and ego characteristics. \({u}_{0j}\) is an ego-level (level 2) residual (error) term. Thus, \({\sigma }_{{u}_{0j}}^{2}\) represents the magnitude of variation found among the average tie values. Cluster-robust standard errors were computed for all models at the ego level.
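To make the logit scale of this model more concrete, the sketch below converts a linear predictor into a tie probability and shows how an ego-level deviation shifts that probability. All numeric values and covariate names are hypothetical placeholders chosen for illustration; they are not estimates from this study.

```python
import math


def inverse_logit(eta):
    """Map a value on the logit (log-odds) scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def tie_probability(average_intercept, ego_deviation, effects, covariates):
    """Probability of a tie for one dyad in a random-intercept logit model.

    The ego-specific intercept is the average intercept plus the
    ego-level deviation; `effects` and `covariates` are dicts keyed by
    (hypothetical) covariate name.
    """
    eta = average_intercept + ego_deviation
    eta += sum(effects[name] * covariates[name] for name in effects)
    return inverse_logit(eta)


# Hypothetical placeholder values, NOT estimates from the paper.
effects = {"same_field": 1.0, "same_gender": 0.1}
dyad = {"same_field": 0, "same_gender": 0}

baseline = tie_probability(-4.6, 0.0, effects, dyad)       # typical ego
well_connected = tie_probability(-4.6, 1.0, effects, dyad)  # ego with a positive deviation
same_field = tie_probability(-4.6, 0.0, effects, {"same_field": 1, "same_gender": 0})
```

With an average intercept of -4.6 the baseline probability is close to 1%, which is the order of magnitude of tie prevalence in a sparse dyadic sample; a positive ego-level deviation or a positive covariate effect raises it.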

While non-inbred (dummy) is the treatment variable, other individual characteristics of egos and alters are female (dummy), age (continuous), years of experience (continuous), academic experience abroad (dummy), number of postdoctoral researchers (continuous), PhD students (continuous), Master’s students (continuous), and undergraduates (continuous), number of published papers in journals (continuous) and conferences (continuous), number of published books (continuous) and book chapters (continuous), and fields of study (categorical). Tie characteristics are degree (continuous), same academic endogamy origin (dummy), same gender (dummy), same field (dummy), and year period (categorical). The estimates of all models can be found in Table 4 in the results section.

Results and discussion

As discussed in the previous sections, collaboration information was split into five groups of four years of aggregated data each: 2000–2003, 2004–2007, 2008–2011, 2012–2015, and 2016–2019. In the figures, inbred professors are represented by squares and non-inbred ones by circles. Scholars' fields are represented by colors: medical and health sciences (pink), social sciences (light green), natural sciences (blue), humanities (orange), engineering and technology (brown), interdisciplinary (red) and agricultural sciences (dark green).

Although a large university wide connected network was identified for every four-year period, a significant number of nodes neither connected to the broad university network nor formed any local collaboration (image with all nodes included at the left corner).
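The distinction between the large connected network, smaller local groups, and unconnected nodes can be made operational with a connected-component search. The following is a minimal sketch on toy data; the function name and input layout are assumptions, and it is not the procedure used to produce the figures.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as lists of nodes."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            members.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(members))
    return components


# Toy network: one main cluster, one small group, one isolated scholar.
toy_nodes = ["A", "B", "C", "D", "E", "F"]
toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]

toy_components = connected_components(toy_nodes, toy_edges)
largest = max(toy_components, key=len)
isolates = [c[0] for c in toy_components if len(c) == 1]
```

Here the largest component plays the role of the university-wide network, while nodes in components of size one correspond to scholars with no local collaboration in the period.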

Figures 1, 2, 3, 4, and 5 represent the academic community of professors in the University of São Paulo and their local scientific collaborations between years 2000 and 2019, grouped into four-year periods while Table 5 summarizes the information in those figures.

Fig. 1 USP’s academic collaboration between 2000–2003
Fig. 2 USP’s academic collaboration between 2004–2007
Fig. 3 USP’s academic collaboration between 2008–2011
Fig. 4 USP’s academic collaboration between 2012–2015
Fig. 5 USP’s academic collaboration between 2016–2019

Figures 1, 2, 3, 4, and 5 and Table 5 show that collaboration clusters within the University of São Paulo have increased over time. However, the core of these networks remains the STEM fields, with groups in the social sciences positioned on the edges, and collaboration in the humanities rather limited and barely showing in our figures. Interdisciplinary academic collaboration remained scattered during the whole period considered in our sample, which is expected, given its possible ties with several other fields. In other words, our data show that, in the case of USP, collaboration happens within fields or, at most, between loosely connected fields, as the strong clusters that emerged show.

Medical and health sciences were, and still are, the most important feature of these networks throughout the whole period considered. However, these clusters expanded and developed ties with researchers in other fields, such as the natural sciences, engineering and social sciences, which gained importance in these networks over the years.

Although nodes are represented with squares for inbred scholars and with circles for non-inbred ones, a visual analysis based on Figs. 1–5 would be limited. Thus, to further analyze the issue, we now turn to our multilevel logit model, which focuses on evaluating how collaboration is affected by endogamy status. Since, in our data, positive events represent around 1% of the total sample and, therefore, the dataset is a sparse matrix, rare event bias could be a concern. However, this is not the case in our sample.

The first model is a multilevel logit model with fixed effects (level 1) and random effects for ego (level 2). This model includes individual characteristics of ego (j) and homophily tie traits with alter (i) as control variables. Standard errors were clustered at the ego level. Next, interaction terms were added in Model 2 to compare the likelihood of tie formation across four groups: non-inbreds (j) to non-inbreds (i), inbreds (j) to inbreds (i), non-inbreds (j) to inbreds (i), and inbreds (j) to non-inbreds (i). Furthermore, Model 3 adds new interaction terms between shared endogamy status and the five distinct year periods. This model contributes to the analysis of how the effects of homophily on the likelihood of ties change over time.
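The four-group comparison follows from combining an ego dummy with a shared-status dummy and their interaction. The sketch below assumes that the ego dummy equals 1 for non-inbred egos and that the homophily dummy equals 1 when ego and alter share endogamy status; the coefficient values are hypothetical placeholders, not the estimates reported in Table 4.

```python
import math


def linear_predictor(coef, ego_non_inbred, same_status):
    """Log-odds of a tie from an ego dummy, a homophily dummy and their interaction."""
    return (
        coef["intercept"]
        + coef["ego_non_inbred"] * ego_non_inbred
        + coef["same_status"] * same_status
        + coef["interaction"] * ego_non_inbred * same_status
    )


def homophily_odds_ratio(coef, ego_non_inbred):
    """Odds ratio of same-status versus different-status ties for one ego type."""
    same = linear_predictor(coef, ego_non_inbred, 1)
    different = linear_predictor(coef, ego_non_inbred, 0)
    return math.exp(same - different)


# The four ego-alter combinations as (ego_non_inbred, same_status) codes.
GROUPS = {
    "inbred ego, non-inbred alter": (0, 0),
    "inbred ego, inbred alter": (0, 1),
    "non-inbred ego, inbred alter": (1, 0),
    "non-inbred ego, non-inbred alter": (1, 1),
}

# Hypothetical placeholder coefficients, NOT estimates from the paper.
coef = {"intercept": -5.0, "ego_non_inbred": -0.2, "same_status": 0.3, "interaction": 0.1}

log_odds = {name: linear_predictor(coef, *codes) for name, codes in GROUPS.items()}
or_inbred = homophily_odds_ratio(coef, 0)      # exp(same_status)
or_non_inbred = homophily_odds_ratio(coef, 1)  # exp(same_status + interaction)
```

Under this coding, the homophily effect for inbred egos is the shared-status coefficient alone, while for non-inbred egos it is the sum of that coefficient and the interaction term; a positive interaction therefore means stronger homophily among non-inbred scholars.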

As in previous studies (Dahlander & McFarland, 2013; M. McPherson et al., 2001), results indicate that homophily influences the establishment of academic collaboration among faculty members who share the same academic endogamy status. Data on joint academic publications show that inbred scholars are more likely to hold ties with inbreds, as are non-inbreds with their non-inbred colleagues. This means that those with endogamy ties to the university seem to collaborate more among themselves. Likewise, those who received their PhD degrees elsewhere seem to collaborate more among themselves, which suggests that research networks within the university may not be as integrated and diverse as they could be.

Ties among faculty members of distinct endogamy status occur, but they seem to be less likely than ties among those who share the same endogamy status. This trend could be the result of endogamy status homophily influencing the formation of local ties among inbred scholars, which in turn leaves non-inbred scholars only the option of collaborating with each other or with scholars outside the University of São Paulo. Considering that non-inbred scholars are a minority of around 30% of the faculty body at USP, a higher probability of ties between non-inbreds and inbreds was expected based on numbers alone. However, our models show that collaboration is more likely to be found among those who share the same endogamy status.

Outcomes suggest that the slope for these homophily effects seems to become steeper with every four-year period among non-inbreds. Therefore, it is likely that endogamy status preference has become more influential among non-inbred faculty members over the years.

Besides these findings on the effects of shared endogamy status, other homophily characteristics also seem to impact the likelihood of academic collaborations. Collaboration networks are more likely to be found among scholars of the same gender and field. Female scholars are also more likely to contribute to academic collaboration. On the other hand, scholars who have experienced international academic mobility are less likely to collaborate within the university, suggesting that scholars with these kinds of experiences abroad may prefer collaborating academically with their external networks, doing research by themselves, or pursuing other types of collaboration within their own university.

Conclusion

Results show that local research collaboration has been growing among University of São Paulo faculty members. Both inbred and non-inbred scholars have been benefitting from the opportunity of cooperating with their university colleagues. Nevertheless, there is still a significant number of them who are weakly connected or not connected at all with their co-workers.

The hypothesis that a homogeneous setting is prone to an increased likelihood of ties being formed among faculty members who share mutual characteristics is confirmed. Same academic endogamy status is a highly statistically significant predictor of local research collaboration. That would be expected of inbred scholars for being a majority group which already knows the local culture and shares mutual contacts. However, non-inbred faculty members were also more likely to build collaboration ties among themselves. This could mean that research networks are not as connected and integrated as they could be. Furthermore, outcomes suggest that over time endogamy status preferences get stronger for non-inbred scholars. In other words, non-inbred scholars are more likely to increase their collaboration among themselves over the years. We would have expected that years of work at the university would have allowed for these scholars to integrate and to establish new and more intense collaborations with local inbred scholars and their established research clusters, but that does not seem to be the case in the University of São Paulo.

High academic endogamy may further promote the bonding of similar individuals in local research collaboration networks. Consequently, these communities might be isolating their faculty members who were trained at other academic institutions, leading to segregated networks. Over time, such practice might discourage newcomers from integrating into already established research clusters, pushing these non-inbred scholars to create their own local ties.

Universities with high endogamy could be neglecting a highly valued local resource: non-redundant contacts, their connections, and the possibilities they bring for increased and more diverse collaboration, internationalization, and research productivity. Such behavior could limit opportunities for new information to be exchanged and for knowledge to be jointly produced. Non-inbred faculty members have the potential to form bridges connecting inbred scholars to scientific network contacts outside their own departments and university. Perhaps a more balanced environment is the optimal format for information exchange and creativity to flourish within and outside universities.