1 Introduction

A social network (SN) is a set of actors (individuals, groups, teams, companies, etc.) connected to each other via some kind of relationship or tie such as kinship, friendship, sharing common members, etc. (Newman 2001a). SN (especially online SN) have an immense impact on people and the relationships between them. They paved the way for new forms of communication that were hardly imaginable just a short time ago (Ghali et al. 2012). Moreover, they have an equally important effect on businesses and academia. Modeling the different relationships between different entities using SN allows for the use of SN analysis (SNA) tools to capture patterns and extract useful information about these entities and the interactions between them.

A collaboration (or a co-authorship) network (ColNet) is a type of SN in which the actors are scholars/researchers and a tie between two actors represents the fact that they have co-authored at least one paper together (Newman 2001a). The paper can be a peer-reviewed paper published in a scientific journal or it can be of other types such as patents or project reports (Singh et al. 2016; Biggiero and Angelini 2015; Guffarth and Barber 2017). The study of research publication and collaboration has a long history, with the first paper of scientific collaboration dating back as early as 1665 (Uddin et al. 2013). Since then (and especially over the past few decades), scientific collaborations have increased rapidly in various areas of research (Luukkonen et al. 1993). During the 1990s, a number of authors noted the potential utility of collaborative data and, in some cases, conducted statistical analysis of small-scale ColNet (Kretschmer 1994). The study of scientific ColNet has been a field of intense interest in recent years, because it not only depicts the academic society but also represents the knowledge structure in an open innovation community (Newman 2004).

Scholars have been analyzing ColNet extensively to identify the factors that affect the behavior, motivation and performance of collaborations (Newman 2001a). The employed methods included statistical analysis of data at a basic level using regression and charts (Luukkonen et al. 1992) as well as extracting patterns and structures of ColNet (Newman 2001a). The goals of analyzing ColNet include understanding the effect of many issues/factors (political, economic, social, etc.) on a research community globally or within a certain region. This would help in explaining many phenomena. Another benefit is extracting the patterns of successful researchers, which might be beneficial to both researchers and decision makers who are seeking to improve the researchers’ productivity (Sarigöl et al. 2014).

The ColNet considered in this work is customized based on two aspects. The first one is the field of the published papers based on which the ties are established and the second one is the region in which the actors are based. However, such customizations give rise to several challenges as discussed later. After constructing the network, we aim at studying it thoroughly and analyzing the results with the hope of finding answers to questions like: how do the Levantine biomedical researchers interact with each other? That is, do they prefer to work individually or in groups? If it is the latter, do they prefer to collaborate with local or foreign researchers? What is the effect of political, economic, social and other factors on their productivity? Are there special collaboration patterns common among highly productive researchers?

The rest of this paper is organized as follows. The following section discusses the related works. The proposed methodology is discussed in Sect. 3, where we discuss the construction of the ColNet and its analysis. The final section is dedicated to concluding remarks and future works.

2 Related works

In one of the earliest works on ColNet, Newman (2001a) analyzed scientific collaborations in four disciplines: physics, biomedicine, high-energy physics and computer science. He studied some fundamental statistics of these networks. In Newman (2004), he also studied other scientific ColNet constructed from three bibliographic databases in biology, mathematics and physics. He built these networks to answer some questions about collaboration patterns such as the number of papers, the number of authors and the average distance between them. He also studied how collaboration patterns varied over time and from one field to another.

Other works on ColNet are as follows. The authors of Cotta and Merelo (2006) analyzed a ColNet of researchers in Evolutionary Computation (EC). The network contained 6199 papers and 5492 authors. They highlighted the fundamental characteristics of the network such as the authors with the highest centrality in the main component, ranking them based on number of collaborators, closeness centrality, betweenness centrality, etc. The authors of Tomassini et al. (2007) studied a ColNet of Genetic Programming (GP) researchers to reveal some differences and similarities between the GP ColNet and ColNet in other scientific fields via primary features of the ColNet such as distances, assortativity and betweenness centrality. The author of Ding (2011) analyzed both ColNet and citation networks (CitNets) together to determine whether productive authors tend to co-author with (or cite) scientists with the same or different research interests. The author also tried to determine whether highly cited researchers tend to co-author with or cite each other, taking papers on information retrieval as a test field. The authors of Ye et al. (2013) investigated research collaboration among hospitality and tourism scholars from 1991 to 2010. The authors aimed to identify some critical scholars and showed the fundamental statistical properties of the ColNet over time. The authors of Bian et al. (2014) constructed a ColNet from a research grants dataset collected from the Clinical and Translational Science Award (CTSA) program. They employed a weighted network model to represent co-authorship strength, Random Walk with Restart (RWR) to suggest new research collaborations, and rank aggregation to detect important scholars. The authors of Vanni et al. (2014) constructed a ColNet to assess collaborations between authors and countries in HIV and HPV research. The network was constructed from 1793 records retrieved from the Global Health, OVID Medline and PubMed databases. Finally, the authors of Andrade et al. (2014) studied several research ColNet to discuss how processes of scientific collaboration can be characterized and identified through complex and social networks. The data were collected from the reports of the Brazilian official educational authority (CAPES) from 2001 to 2009.

Several works exist on ColNet in specific geographical regions. Examples of such papers are as follows. The author of Perc (2010) studied the evolution of scientific collaboration for Slovene scientists from 1960 till 2010. He showed fundamental collaboration patterns of the scientific ColNet such as the size of the largest component, the mean distance between authors, the clustering coefficient and the number of collaborators over time. The authors of De Stefano et al. (2013) studied a scientific ColNet of Italian academic statisticians. The network was constructed from three data sources to explore the effects of having multiple sources on network analysis. Their results provide evidence of distinct collaboration patterns among statisticians as well as distinct effects of scientists' network positions on scientific performance, by both statistics subfield and data source. The authors of Zhai et al. (2014) analyzed the evolution of scientific collaborations among Chinese scholars in management research from 1985 till 2009. The network dataset was collected from 258 journals indexed by the ISI Web of Science database. They also constructed an international ColNet and determined the countries collaborating with Chinese authors. The authors of Low et al. (2014) studied scientific cooperation in Malaysian Clinical Medicine from 2001 till 2010. The network contained 3941 articles classified based on the impact factors of the Journal Citation Report (JCR). They also studied local and international collaborations for local authors. The authors of Kumar and Jan (2014) constructed and analyzed two ColNet for Malaysian and Turkish authors from 2009 to 2011 in the energy fuels field. They compared the two ColNet using fundamental statistics. They also studied the correlation between research performance and centrality measures. The authors of Lozano et al. (2014) studied the evolution of the scientific ColNet at Atapuerca’s archaeological complex from 1992 till 2011. The network was built based on fieldwork and bibliometric data. They presented the characteristics of the basic structures of the ColNet.

3 Methodology

The objective of this work is to study scientific collaboration patterns by building and analyzing a ColNet of biomedical researchers in the Levant region, which includes the four Arab states of Jordan, Lebanon, Syria and Palestine. The steps we follow in this work are discussed in the following subsections.

3.1 Crawling PubMed

Capturing the relationships between entities (actors) to model them as social networks can be challenging depending on the domain of this process. While some relationships can be automatically constructed from available databases, others may require significant manual work and involve asking actors to fill out questionnaires. Luckily, with the spread of online scientific indices and databases, the information required to construct ColNet is generally available, eliminating the need to ask researchers to provide information about their collaborators. However, for specialized/customized types of ColNet, things are not as easy as they seem. For example, the ColNet considered in this work is customized based on two aspects. The first one is the field of the published papers based on which the ties are established and the second one is the region in which the actors are based. Such customizations give rise to several issues/challenges.

Restricting the ColNet to a certain field and a certain “type” of authors has its advantages and disadvantages. For example, focusing on the field of biomedicine allows for the exploitation of specialized search engines such as PubMed to locate papers of interest. It is worth mentioning that one of the goals set at the beginning of this work is to mainly focus on publicly available tools. Our intention is to allow other researchers to benefit from our experience while building their own customized ColNet in different domains and different regions (or any other customizations of the ColNet).

Due to lack of sufficient data, an additional restriction on the ColNet considered in this work is related to the time of publication. The considered papers were published between 1991 and 2010. This period was selected to ensure sufficient and consistent data with minimal gaps throughout the period. It also covers the periods of time through which local universities enjoyed large growth rates (World Bank 2009). To meet the second and more challenging “regional” restriction on the ColNet considered here, a list must be compiled containing the papers written by authors affiliated with Levant-based universities. Such authors are considered “local”. On the other hand, there is a non-negligible percentage of authors whose affiliations are unknown; we call such authors “undetermined” authors.

We extract information about articles written by Levantine researchers which are indexed in PubMed. This information is queried using the names of Levantine universities as the “Affiliation” in addition to the publication years. The time interval of interest in this research covers the two decades from 1991 till 2010. As with other longitudinal studies, this period is divided into five time intervals of equal length: 1991–1994, 1995–1998, 1999–2002, 2003–2006 and 2007–2010. The articles contained information about the publishing journal and the authors. We have noticed that PubMed only shows the affiliation of the first author of each paper published prior to 2010. So, we constructed our ColNet based on this policy.

3.2 Constructing the collaboration networks

A ColNet is constructed for each one of the five intervals of interest. Each network consists of both local authors as well as undetermined authors who collaborated with local authors on at least one paper.
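The per-interval construction can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this work: the record layout `(year, [author identifiers])` and the names `period_of` and `build_colnets` are hypothetical, author identifiers are assumed to be already disambiguated, and the sketch does not distinguish local from undetermined authors.

```python
from itertools import combinations

# The five equal-length intervals used in this study.
PERIODS = [(1991, 1994), (1995, 1998), (1999, 2002), (2003, 2006), (2007, 2010)]

def period_of(year):
    """Return the (start, end) interval containing `year`, or None if out of range."""
    for start, end in PERIODS:
        if start <= year <= end:
            return (start, end)
    return None

def build_colnets(papers):
    """Build one undirected co-authorship network per interval.

    `papers` is an iterable of (year, [author ids]) records (assumed layout).
    Returns {period: {author: set of co-authors}}.
    """
    nets = {period: {} for period in PERIODS}
    for year, authors in papers:
        period = period_of(year)
        if period is None:
            continue  # paper falls outside 1991-2010
        net = nets[period]
        unique = sorted(set(authors))
        for author in unique:
            net.setdefault(author, set())  # keep single-author nodes too
        for a, b in combinations(unique, 2):
            net[a].add(b)
            net[b].add(a)
    return nets

papers = [(1992, ["A", "B", "C"]), (1993, ["A"]), (2008, ["A", "D"]), (2015, ["X", "Y"])]
nets = build_colnets(papers)
```

Representing each network as a dictionary of neighbor sets keeps a tie unweighted: two authors are linked once regardless of how many papers they share, which matches the "at least one paper" definition of a tie.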

While constructing the ColNet, several challenges are faced. The first challenge is related to accurately attributing each article to its author(s). Issues like authors with similar names (especially, when the first and middle names are missing or abbreviated), authors who write their names in (slightly) different ways, etc., must be addressed. Another challenge is related to the fact that many databases only list the affiliation of the first author or do not always include the country name. To address this issue, we compile a list of all academic institutions in the Levant region and search for authors affiliated with each one of them. Despite these measures and the careful systematic inspection of the authors and their affiliations, we are still left with a relatively large number of authors with undetermined affiliations. Such authors are placed in a category of their own.

Our approach for handling the name problems works as follows. It starts by scanning the raw data and generating a unique identifier for each author. It also extracts the authors' affiliations. The goal is to collect all possible formats for each author’s name and group matching names of the different formats under a single unique identifier.

For example, consider an author with the name “Omari, Mohammed Ali”. There are five conventions/formats typically found in PubMed for such a name. These conventions are listed in Table 1 along with a numbering system we devise to simplify our discussions.

Table 1 Formats for the authors names typically found in PubMed

Before discussing our approach of handling this problem, we note that in order to speed up the processing, we use hash tables to store all formats of names as shown in Table 1. We also note that we treat names in a hierarchical way where more specific names appear at a higher level. For example, if we come across two names: “Omari, M. A.” and “Omari, Mohammed A.”, then the latter can be the parent of the former provided that we do not have higher matching names such as “Omari, Mohammed Ali” and we do not have other names matching “Omari, M. A.” such as “Omari, Majed A.”. The latter case of having the names “Omari, Mohammed A.”, “Omari, Majed A.” and “Omari, M. A.” forces us to label the name “Omari, M. A.” as ambiguous. Nonetheless, as a final attempt to remove this ambiguity, we resort to checking the address of each author if possible. For example, if “Omari, Mohammed A.” is affiliated with American University of Beirut, and both “Omari, Majed A.” and “Omari, M. A.” are affiliated with Jordan University, then we assign “Omari, Majed A.” to be the parent of “Omari, M. A.”. Finally, we note that ambiguous names are ignored when constructing the ColNet. Table 2 shows the amount of ignored information.
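The hierarchical matching described above can be illustrated with the following sketch. It is a simplified approximation under several assumptions: names are taken to be in the form “Surname, Given names” with optional initials (the five formats of Table 1 are not reproduced here), the function names `parse`, `is_abbreviation_of` and `resolve` are hypothetical, the most specific matching name is returned as the canonical form, and a plain list scan is used instead of hash tables for brevity.

```python
from itertools import combinations

def parse(name):
    """Split 'Surname, Given names' into (surname, tuple of given-name tokens)."""
    surname, _, given = name.partition(",")
    tokens = tuple(t.rstrip(".").lower() for t in given.split())
    return surname.strip().lower(), tokens

def is_abbreviation_of(short, full):
    """True if `short` is a strictly less specific form of `full`."""
    s_sur, s_tok = parse(short)
    f_sur, f_tok = parse(full)
    if s_sur != f_sur or len(s_tok) > len(f_tok) or s_tok == f_tok:
        return False
    return all(f.startswith(s) for s, f in zip(s_tok, f_tok))

def is_chain(names):
    """True if every pair of names is compatible (one abbreviates the other)."""
    return all(
        parse(a) == parse(b) or is_abbreviation_of(a, b) or is_abbreviation_of(b, a)
        for a, b in combinations(names, 2)
    )

def resolve(name, known, affiliation=None):
    """Return the most specific known name matching `name`,
    `name` itself if nothing matches, or None if it stays ambiguous."""
    candidates = [n for n in known if is_abbreviation_of(name, n)]
    if not candidates:
        return name
    if not is_chain(candidates) and affiliation and affiliation.get(name) is not None:
        # Last resort: keep only candidates sharing the author's affiliation.
        same_place = [n for n in candidates if affiliation.get(n) == affiliation[name]]
        candidates = same_place or candidates
    if not is_chain(candidates):
        return None  # ambiguous: ignored when building the network
    return max(candidates, key=lambda n: sum(len(t) for t in parse(n)[1]))

known = ["Omari, Mohammed A.", "Omari, Majed A."]
places = {
    "Omari, Mohammed A.": "American University of Beirut",
    "Omari, Majed A.": "Jordan University",
    "Omari, M. A.": "Jordan University",
}
parent = resolve("Omari, M. A.", known, places)  # "Omari, Majed A."
```

Without the affiliation table, the same call returns `None`, which corresponds to labeling “Omari, M. A.” as ambiguous.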

Table 2 Ignored authors and papers

3.3 Analyzing the ColNet

We input the ColNet constructed in the previous step into a well-known SNA tool called Pajek (Batagelj and Mrvar 1998) to extract all topological measures of the ColNet. Also, the top 20 local authors are extracted and several measures of centrality are computed for each one such as betweenness, closeness and degree in addition to the proposed measures. Finally, we analyze and discuss the computed measures. During the discussion, we provide some insights and findings.

3.3.1 Topological measures

Here, measures related to the topology of the network such as size and main component are discussed.

Size of the network The most direct and intuitive measure of size is the number of nodes (authors or researchers) and the number of edges (the co-authorship relationships between authors). These two numbers determine the scale of the network, which is important since some measures cannot be computed on large-scale networks whereas some measures might not make sense for small-scale networks.

Distances A path in a network is simply a sequence of nodes such that each pair of consecutive nodes is connected with an edge. The length of a path is simply the number of edges traversed along this path. The distance between two nodes is the length of the shortest path connecting them. The maximum distance (or diameter) and mean distance are important measures of SN. They are computed by taking the distance between every pair of nodes in the network and taking the maximum and the mean of these distances (De Nooy et al. 2011). The diameter is related to the extent of connectedness of the authors in the network, while the average of the distance is related to the ‘degree of separation’, i.e., the average number of steps needed to reach any node starting from any node in the network (Newman 2001b).
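As an illustration of these definitions, the diameter and the mean distance of an unweighted network can be obtained with a breadth-first search from every node. The sketch below is not the Pajek procedure used in this work; it assumes the network is stored as a dictionary of neighbor sets, the function names are hypothetical, and pairs of nodes with no connecting path are simply left out of both statistics.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_mean_distance(adj):
    """Return (maximum distance, mean distance) over all connected pairs."""
    lengths = []
    for source in adj:
        lengths.extend(d for target, d in bfs_distances(adj, source).items()
                       if target != source)
    if not lengths:
        return 0, 0.0
    # Each unordered pair appears twice, which leaves max and mean unchanged.
    return max(lengths), sum(lengths) / len(lengths)

# A chain of four authors: A - B - C - D
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
diameter, mean_distance = diameter_and_mean_distance(chain)  # 3 and 10/6
```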

Main component A component in a SN is a set of nodes connected together via ties such that any node in the component can reach all other nodes in the group or can be reached by any node via intermediate nodes (Ghali et al. 2012; Tomassini et al. 2007). From its name, the main component is the largest component in the network and it is usually measured in size (number of nodes) as well as percentage (the ratio of its size with respect to the size of the network). In ColNet, the size of the main component represents the largest number of authors who collaborated with each other directly or indirectly.
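A minimal sketch of how the main component and its percentage could be extracted is given below. It assumes an undirected network stored as a dictionary of neighbor sets; the function name `main_component` is hypothetical and the code is illustrative rather than the procedure applied by Pajek.

```python
def main_component(adj):
    """Return the set of nodes in the largest connected component."""
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        component = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    stack.append(v)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Three components: {A, B, C}, {D, E} and the isolated author F.
net = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"},
       "D": {"E"}, "E": {"D"}, "F": set()}
largest = main_component(net)
percentage = 100.0 * len(largest) / len(net)  # 50.0
```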

Clustering coefficient (CC) The Clustering Coefficient (CC) measures clustering or transitivity which is the extent to which nodes of the network are connected directly. It is basically the fraction of connected triples in the network that are triangles (Newman 2001a, 2003). CC is computed in two ways, Transitivity and Watts–Strogatz. According to Bian et al. (2014), the transitivity way is more accurate. It is computed as

$$\begin{aligned} CC=\frac{3\times \text {Number of triangles}}{\text {Number of connected triples}} \end{aligned}$$

where a triangle represents a trio of nodes in which each node is connected to the other two, while a connected triple represents a trio of nodes in which at least one node is connected to both of the others (Bian et al. 2014). With respect to scientific collaborations, a high value of CC means that the authors tend to collaborate in sets of three or more authors (Tomassini et al. 2007). Part of the clustering in the network can be attributed to papers that are authored by three or more authors; such papers introduce triangles of collaborating authors and thus increase CC (Newman 2004).
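The transitivity formula can be sketched in a few lines. The code below is an illustration under the assumption that the network is undirected and stored as a dictionary of neighbor sets; the function name `transitivity` is hypothetical. A connected triple is counted once per center node, so a node with k neighbors contributes k(k-1)/2 triples, and each triangle closes three of them.

```python
from itertools import combinations

def transitivity(adj):
    """CC = 3 * (number of triangles) / (number of connected triples)."""
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    closed = 0  # closed triples; equals 3 * number of triangles
    for center, nbrs in adj.items():
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                closed += 1
    return closed / triples if triples else 0.0

# A triangle A-B-C with one extra author D attached to C.
net = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
cc = transitivity(net)  # 3 closed triples out of 5, i.e. 0.6
```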

Density The density is a measure to identify the connectedness of the network as a whole. It is defined as the number of edges in a network divided by the number of edges in a complete network with the same number of nodes (Hou et al. 2008). It is used to measure cohesion of the network (De Nooy et al. 2011). When the density of the network is low, this means that the network is loose, and vice versa (Andrade et al. 2014). It is used in a ColNet as an indication of connectedness strength between authors where the ColNet is expressed as loose or coherent.
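For an undirected network without self-loops, a complete network on N nodes has N(N-1)/2 edges, so the density can be computed as sketched below. The dictionary-of-neighbor-sets representation and the function name `density` are assumptions made for illustration.

```python
def density(adj):
    """Edges present divided by edges in a complete network of the same size."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge seen twice
    return edges / (n * (n - 1) / 2)

# A chain of four authors has 3 of the 6 possible edges.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
d = density(chain)  # 0.5
```

Because the denominator grows quadratically with the number of nodes, the density of a network tends to fall as it grows unless the number of ties grows equally fast.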

Degree The degree of a vertex is the number of edges incident on it. It can include in-degree, out-degree or both (Ding 2011). The average degree of the network is calculated as

$$\begin{aligned} Dn = \frac{\text {Sum of all vertices degrees}}{\text {Number of vertices}} \end{aligned}$$

The average degree of a network can be a better measure of structural cohesion than the density because it does not depend on the size of the network (De Nooy et al. 2011). Also, the degree of a vertex can be used as a measure to identify central vertices in the network as a whole if they have high degrees (Hou et al. 2008), because vertices with high degrees are more likely to be found in the dense parts of the network (De Nooy et al. 2011). Another related measure depends on the variation in the degrees of all vertices in the network and the possible variation of the maximum degree (De Nooy et al. 2011). The formula to calculate this degree of the network is

$$\begin{aligned} VDn = \frac{\sum DCi}{(N - 1) \times (N - 2)} \end{aligned}$$

where X is the maximum value of In-degree or Out-degree in the network, Di is the degree of a vertex i, N is the number of vertices and \(DCi = X - Di\).
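The two degree-based formulas can be sketched as follows, assuming an undirected network stored as a dictionary of neighbor sets, so that X is simply the maximum degree. The function names are hypothetical. With this normalization, a star network (one hub connected to all other vertices) reaches the maximum value of 1, and a network in which all vertices have the same degree yields 0.

```python
def average_degree(adj):
    """Dn: sum of all vertex degrees divided by the number of vertices."""
    n = len(adj)
    return sum(len(nbrs) for nbrs in adj.values()) / n if n else 0.0

def degree_variation(adj):
    """VDn: sum over vertices of (X - Di), divided by (N - 1)(N - 2)."""
    n = len(adj)
    if n < 3:
        return 0.0  # the denominator vanishes for fewer than three vertices
    degrees = [len(nbrs) for nbrs in adj.values()]
    x = max(degrees)
    return sum(x - d for d in degrees) / ((n - 1) * (n - 2))

star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
dn = average_degree(star)     # 6 / 4 = 1.5
vdn = degree_variation(star)  # (0 + 2 + 2 + 2) / (3 * 2) = 1.0
```

For a simple undirected network, the average degree equals the density multiplied by (N - 1), so a large network can combine a low density with a comparatively high average degree.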

Betweenness The betweenness of a network measures the betweenness centrality scores of all vertices (De Nooy et al. 2011). The betweenness centrality depicts the importance of a vertex in the network depending on its occurrence on shortest paths between more pairs of nodes. It is defined as the probability that a particular vertex appears on a shortest path between any pair of vertices in the network (Ding 2011). Betweenness centralization is the variation in the betweenness centrality of vertices divided by the maximum variance in betweenness centrality scores possible in a network of the same size (De Nooy et al. 2011).
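As a hedged illustration, the vertex betweenness scores of an unweighted, undirected network can be computed with Brandes' algorithm, which is used here as a standard technique and is not necessarily the routine implemented in Pajek. Scores are normalized by (N - 1)(N - 2) so that they lie between 0 and 1, and the centralization is then taken as the sum of the differences from the maximum score divided by (N - 1), the value attained by a star network. The function names and the dictionary-of-neighbor-sets representation are assumptions.

```python
from collections import deque

def betweenness(adj):
    """Normalized betweenness centrality of every vertex (Brandes' algorithm)."""
    scores = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                           # vertices in order of discovery
        preds = {v: [] for v in adj}         # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)        # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while order:                         # back-propagate dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    n = len(adj)
    scale = (n - 1) * (n - 2)                # ordered pairs not involving the vertex
    return {v: (b / scale if scale > 0 else 0.0) for v, b in scores.items()}

def betweenness_centralization(adj):
    """Variation of the scores relative to the maximum possible variation."""
    n = len(adj)
    if n < 3:
        return 0.0
    scores = betweenness(adj)
    top = max(scores.values())
    return sum(top - b for b in scores.values()) / (n - 1)

star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
scores = betweenness(star)                   # hub: 1.0, leaves: 0.0
central = betweenness_centralization(star)   # 1.0
```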

3.3.2 Papers measures

Here, we discuss the following fundamental measures of articles or research papers considered when constructing our ColNet.

  • Number of papers The considered papers are those found on PubMed with at least one local author. Obviously, they have to be published between 1991 and 2010 to be considered by our study.

  • Number of authors We consider two measures that differentiate between papers based on the numbers of their authors. The first measure is about the total number of authors and the second measure is about the number of local authors. We explore through these measures (especially the second one) the extent of collaboration among the local authors with each other and also we count the papers published by a different number of the local authors.

  • Percentage of local authors.

  • Authors per paper For these measures, we explore the average numbers of local and undetermined authors per paper.

3.3.3 Authors measures

Here, we discuss the following measures about the local authors and their collaborators in the ColNet.

  • Total local and undetermined authors.

  • Papers per author These measures explore the average productivity of local as well as undetermined authors.

  • Collaborators The total and average numbers of collaborators for local authors are considered here. Note that we count only the distinct collaborators of each local author.

  • Collaborating authors This measure is concerned with the percentage of collaborating local authors. Collaborating local authors are those who collaborated with others, at least in one paper, while non-collaborating local authors are those who published all their articles alone (i.e., as single authors).

  • Percentage of local collaborators This measure defines the percentage of local collaborators for each local author.

3.3.4 Proposed measures

We propose three new measures related to the evolution of the set of collaborators for each author over time. The measures are the number of new collaborators, the number of expired or deleted collaborators, and the stability rate of the set of collaborators. They are calculated per year to better capture the effect of changes in an author’s set of collaborators on his/her productivity.

New collaborators This measure is simply the number of new collaborators added by an author every year. For a certain author, the new collaborators in a certain year are those with whom he/she published a paper in this year but did not publish any paper in the previous year. In addition to computing the new collaborators, we are interested in investigating the common patterns among authors who have similar numbers of new collaborators. In order to do so, we compute the different percentages of new collaborators. We give an illustrative example. Let there be three local authors: L1, L2 and L3. Assume that, in 1992, the added authors are as follows: L1 added five new collaborators, L2 added ten new collaborators and L3 added five new collaborators. Then, the percentage of authors who added five new collaborators is \((2/3\times 100\%) =66.7\%\) whereas the percentage of authors who added ten new collaborators is \((1/3\times 100\%) =33.3\%\).
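The percentages in this example can be reproduced with a short sketch. The function name `share_by_count` and the input layout (a mapping from each local author to the number of new collaborators in a given year) are assumptions made for illustration.

```python
from collections import Counter

def share_by_count(new_counts):
    """Percentage of authors for each observed number of new collaborators."""
    total = len(new_counts)
    if total == 0:
        return {}
    tally = Counter(new_counts.values())
    return {count: 100.0 * authors / total for count, authors in tally.items()}

# The 1992 example from the text: L1 and L3 added five, L2 added ten.
shares = share_by_count({"L1": 5, "L2": 10, "L3": 5})
rounded = {count: round(pct, 1) for count, pct in shares.items()}
# rounded == {5: 66.7, 10: 33.3}
```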

Deleted collaborators This measure is simply the number of deleted collaborators by an author every year. A deleted collaborator of a certain author in a certain year is one with whom the author published a paper in the previous year, but not in this year. As with the new collaborators, we are also interested in the percentages of deleted collaborators in each year.

Stability rate of collaborators The stability rate of collaborators is concerned with the changes made to the set of collaborators of a certain author in each year. To illustrate how this measure is computed, take the example of an author who had the collaborators C1, C2 and C3 in 1991 and the collaborators C1, C2, C4 and C5 in 1992. The stability rate would be \((2/3) \approx 0.67\) since only two out of the original three collaborators remained in the set of collaborators.
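All three proposed measures follow from comparing an author's sets of collaborators in two consecutive years, as the sketch below illustrates. The function name `yearly_change` is hypothetical, and returning `None` as the stability rate when the author had no collaborators in the previous year is an assumption, since that case is not defined in the text.

```python
def yearly_change(previous, current):
    """Return (new collaborators, deleted collaborators, stability rate)."""
    previous, current = set(previous), set(current)
    new = current - previous        # present this year, absent last year
    deleted = previous - current    # present last year, absent this year
    if previous:
        stability = len(previous & current) / len(previous)
    else:
        stability = None            # undefined without earlier collaborators
    return new, deleted, stability

# The example from the text: {C1, C2, C3} in 1991 and {C1, C2, C4, C5} in 1992.
new, deleted, stability = yearly_change({"C1", "C2", "C3"},
                                        {"C1", "C2", "C4", "C5"})
# new == {"C4", "C5"}, deleted == {"C3"}, stability == 2/3
```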

4 Results and discussion

The analysis of SN is an important means of obtaining beneficial information for the actors relevant to the study. One of the important studies in SN is the analysis of ColNet, which examines researchers' direct and indirect collaborations with other researchers in order to improve their scientific research in terms of both quality and quantity (Low et al. 2014; Cho et al. 2010). We analyze ColNet for biomedical researchers in the Levant region. In addition to the apparent importance of the biomedical field and the scientific research conducted within it, the importance of this work comes from the lack of previous studies on ColNet of Levantine researchers in any field, to the best of our knowledge. The results are presented in terms of periods of time from 1991 till 2010 to explore the progress and the evolution of the ColNet. The selection of the period is made to ensure consistency and comparability of the available data. The results are also classified into several categories. Each category is discussed in a separate section. One of the main contributions of this work is to compare the characteristics of the constructed ColNet with those constructed for other communities in other regions. We choose the last period of interest (2007–2010) for such comparisons since it is the biggest and most recent one.

Table 3 Topological measures

4.1 Topological measures

In this section, we discuss the topological measures of the ColNet shown in Table 3. These measures are obtained via the Pajek tool.

  • Size of the network Table 3 shows the evolution of the constructed networks (in terms of the number of vertices and the number of edges) over time. These numbers increase significantly over time. This is due to the increasing number of local authors and their collaborators, whether local or undetermined, over time.

  • Main component The table shows the size of the main component and the percentage it represents of the network as a whole over time. Starting from the second time interval we consider, the percentage of the main component grows from near 60% to over 72.5%. This is consistent with the main component percentages for the similar ColNet of MEDLINE, SPIRES and NCSTRL, from Newman (2001b), which were 92.6, 88.7 and 57.2%, respectively. The evolution of the size and percentage of this component over time is also interesting to examine. The table shows that the main component started with a relatively small set of authors and grew over time to cover most of the authors. The size of the main component grew more than 18-fold while its percentage doubled. This indicates that the communication and collaboration of local authors with each other and with undetermined authors increased.

  • Distances The table shows the maximum distance (diameter) and the mean distance in the constructed ColNet. The diameter of the network starts at 15 and grows to reach 26 for the time period 1999–2002. However, in the following time periods, the diameter dropped to 23. Similarly, the average distance started at 5.27 and grew to reach 8.25 before dropping to 7.05. The initial increases in both the maximum and the average distances suggest that the network is getting bigger, but with new authors tending to work separately or in small groups with a low level of collaboration. The later drop in both metrics suggests that the networks are getting more connected and the collaboration level is increasing. This hypothesis is supported by the observations mentioned when we discussed the size of the main component. Comparing the above results with those of Newman (2001b), the mean distances for MEDLINE, SPIRES and NCSTRL were 4.6, 4.0 and 9.7, respectively, while in the last period of our collaboration networks, it was 7.05. However, Newman mentioned that the typical distance between any pair of researchers is about six (Newman 2001b). With respect to the diameter, the values for the MEDLINE, SPIRES and NCSTRL ColNet were 24, 19 and 31, respectively, while in the last period of our ColNet it was 23.

  • Clustering coefficient (CC) After discussing average degrees, diameter and main components as measures of connectivity in the network, it is common in the literature to compute the clustering coefficient. From the table, we note that the clustering coefficient does not change much over time. This means that the number of closed triangles of authors remains much lower than the number of connected triples that are not closed. The only explanation is that collaboration triplets are not growing in number in our networks. The clustering coefficients for MEDLINE, SPIRES and NCSTRL, in Newman (2001b), were 0.07, 0.73 and 0.50, respectively, while in the last period of our collaboration networks, it was 0.21. Moreover, all values of the clustering coefficient of the networks constructed in Newman (2001a) and Ye et al. (2013) are higher than the values of the clustering coefficient of our ColNet.

  • Density From the table, we see that the density of our ColNet starts at 0.005 and decreases with time to reach 0.00097. This might be due to the significant increases in the number of authors over time, which are not accompanied by similar increases in the number of links. The densities of the networks constructed in Kumar and Jan (2014) are 0.0067 and 0.0103, respectively. We see that our networks are the lowest in terms of density. This is also true when comparing the densities of our ColNet with the densities of the networks of Bian et al. (2014) and Hou et al. (2008).

  • Average degree The average degree of our network in the last period and the average degrees of the networks constructed in Kumar and Jan (2014) are 9.075, 4.605 and 6, respectively. We see that our ColNet is the highest in terms of the average degree. The only explanation for the observation that our network has a lower density and a higher average degree is its high number of vertices compared with those networks. The table shows that the average degree of all authors in the ColNet grows over time. This means that the collaboration of authors and the cohesion among them increase over time.

  • Degree Here, we discuss another measure related to the degrees of the ColNet. We see that the highest value occurs in 1991–1994. After that, it is significantly reduced. This indicates more variation in the number of collaborators of the authors in this period of time, while in later periods the variation in the number of collaborators of authors is small.

  • Betweenness For the betweenness measure, the table shows that the betweenness at 1991–1994 is low. Then, after one jump, it becomes stable.

Table 4 Fundamental features of the articles in the ColNet

4.2 Papers measures

In this section, we analyze some of the features of the articles under consideration. These features are listed in Table 4. As with the other measures, we divide the interval of interest into five subintervals and study the evolution of the measures over time. We also compare them with previous works on ColNet.

  • Number of papers The table shows the changes in the total number of papers published by local as well as undetermined authors. It depicts the productivity of the authors in the ColNet. The table shows an increase in the number of papers that matches the increase in the number of authors, probably due to the growth spurt enjoyed by the academic institutions in the countries of interest starting in the early 2000s.

  • Papers authored by local authors The table shows the papers authored by different numbers of local authors. The percentage of papers with multiple local authors in the last period is 54.96%, which is very close to the percentage for the network of Ye et al. (2013), which is 59.51%. The table shows that, for the papers published in the early 1990s, the vast majority have only one local author. The scientific collaboration between local authors is weak for these periods, as only 1% of the papers have four or more local authors. While the number of papers with a single local author continues to increase over the following time periods due to the growth of the academic institutions and the increased importance of research in the region, the percentage of such papers decreases significantly. On the other hand, papers with multiple local authors have increased in both number and percentage. This is the first set of results directly showing an increase in scientific collaboration between local researchers over the years. Moreover, the sharp increase in papers with four or more local authors, from being rare cases to constituting almost 10% of the papers, suggests that relatively large research groups of local researchers are starting to form. Another observation supporting this tendency to work in larger groups is the increase in the average number of authors per paper from 3 to 4 over the entire period under consideration.

  • Papers authored by local and undetermined authors The table shows how the number of papers changes with the number of authors, whether they are local or undetermined. We see that the percentage of papers authored by a single author tends to decrease, while the percentages of papers authored by two or three authors are relatively stable over time. However, the percentage of papers authored by more than three authors grows over time. This means that the rate of collaboration in these articles is relatively high.

  • Percentage of local authors We now discuss the percentage of local authors in the papers in order to examine their presence and their proportion in these papers. The table shows the percentages of local authors in the considered papers over the intervals of interest. From the table, we see that in the majority of the considered papers, 20–60% of the authors are local. Papers with a dominant presence of local authors make up a reasonable share overall; interestingly, however, the share of such papers declines over the years, which means that the local authors under consideration tend to keep some level of collaboration with non-local authors, even if this level is weak.

  • Authors per paper We now examine the average number of local and undetermined authors for each paper. The table shows that the average number of local authors per paper increases over time to reach 1.91 in the last interval. Compared with the averages of the MEDLINE, SPIRES and NCSTRL networks of Newman (2001b), which were 3.75, 1.99 and 8.96, respectively, the value for our network seems reasonable. Finally, we note that the averages for undetermined authors are higher than those for the local authors.
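
As an illustration of how the paper-level measures above could be derived, the following sketch computes them from a list of papers. It assumes each paper is given as a pair of author-id lists (local authors, undetermined authors) and that every paper has at least one local author; the function name `paper_measures` and the toy data are hypothetical and do not come from the dataset.

```python
def paper_measures(papers):
    """Paper-level statistics for a non-empty list of (local, undetermined) author lists."""
    total = len(papers)

    # Papers with two or more local authors, as a percentage of all papers.
    multi_local = sum(1 for local, _ in papers if len(local) >= 2)

    # Average numbers of local authors and of all authors per paper.
    local_per_paper = sum(len(local) for local, _ in papers) / total
    authors_per_paper = sum(len(local) + len(other) for local, other in papers) / total

    # Percentage of local authors on each individual paper.
    local_share = [100 * len(local) / (len(local) + len(other))
                   for local, other in papers]

    return {
        "pct_multi_local": 100 * multi_local / total,
        "local_per_paper": local_per_paper,
        "authors_per_paper": authors_per_paper,
        "local_share": local_share,
    }


# Hypothetical toy data: lower-case a-c are local authors, x-z are undetermined.
toy_papers = [
    (["a"], []),
    (["a", "b"], ["x"]),
    (["b"], ["x", "y", "z"]),
    (["a", "b", "c"], ["y"]),
]
paper_stats = paper_measures(toy_papers)
print(paper_stats)
```

Running the same function once per four-year interval would give the kind of time series summarized in Table 4.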

Table 5 Fundamental features of the local authors in the ColNet

4.3 Authors measures

In this section, we analyze some of the features of local authors and their collaborators. These features are listed in Table 5.

  • Total local and undetermined authors Here, we are concerned with the growth in the numbers of local as well as undetermined authors. The table shows that the number of undetermined authors is almost twice the number of local authors over all time intervals under consideration. Moreover, there are steady increases in the numbers of both local and undetermined authors. One interesting observation about this trend is that the highest jump in the number of local authors from one interval to the next happened from 1995–1998 to 1999–2002, which coincides with the beginning of the growth spurt enjoyed by the academic institutions in the countries of interest.

  • Papers per author We now discuss the productivity of local and undetermined authors. The table shows that the average number of papers per author is steady over the entire interval of interest, ranging between 2.32 and 2.72. At first glance, these numbers seem very low compared with the averages of the MEDLINE, SPIRES and NCSTRL networks of Newman (2001b), which were 6.4, 4.8 and 11.6, respectively. However, this difference might be due to the fact that our averages are computed for a short period of time (4 years), which means that authors with papers published over several intervals will have a different average computed for each interval. The table shows that the average number of papers for the local authors is higher than the average for the undetermined authors, which is expected since, for each undetermined author, we only consider the papers he/she co-authored with a local co-author. He/she might have other papers that were not considered in our study.

  • Collaborating authors Here, we discuss the percentage of collaborating and non-collaborating local authors. From the table, we can clearly see that the percentage of authors who prefer to work independently without collaborators is diminishing.

  • Collaborators This part discusses measures directly related to one of the main issues addressed in this work, which is the collaboration between researchers. The table shows that the total number of collaborations of local authors (i.e., the total number of edges incident on a local author node in the network) increases significantly with the passage of time. On the other hand, the table shows that the average number of collaborators per local author (i.e., the average degree of local author nodes in the network) does not grow equally fast, since the number of local authors also increases significantly with time. The highest value it reaches is 7.56 in the last interval of interest. For the three networks of Newman (2001b), namely MEDLINE, SPIRES and NCSTRL, the averages were 18.1, 173.0 and 3.59, respectively.

  • Percentage of local collaborators This part discusses, for local authors, the percentages of collaborators, whether they are local or undetermined. The table shows that in the early 1990s, only one quarter of the collaborators are local. With the passage of time, this percentage increases to 41%, showing a tendency of local authors to collaborate with each other. However, over the last two periods under consideration, this percentage levels off. To compare our results with another ColNet with emphasis on the concept of local vs non-local authors, the tourism network of Ye et al. (2013) reports a collaboration percentage of 76%.
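
The author-level measures above can be sketched in the same spirit. The code below assumes the same hypothetical paper representation (a pair of local and undetermined author-id lists, with disjoint id spaces) and counts each distinct collaborator once per local author; whether the paper counts collaborators exactly this way is an assumption. The name `author_measures` is illustrative only.

```python
from collections import defaultdict


def author_measures(papers):
    """Per-local-author statistics from (local, undetermined) author lists."""
    paper_count = defaultdict(int)     # papers per local author
    local_collabs = defaultdict(set)   # distinct local collaborators
    other_collabs = defaultdict(set)   # distinct undetermined collaborators

    for local, other in papers:
        for author in local:
            paper_count[author] += 1
            local_collabs[author].update(a for a in local if a != author)
            other_collabs[author].update(other)

    n_local = len(paper_count)
    total_local = sum(len(s) for s in local_collabs.values())
    total_other = sum(len(s) for s in other_collabs.values())
    total = total_local + total_other

    return {
        "papers_per_author": sum(paper_count.values()) / n_local,
        "collaborators_per_author": total / n_local,
        "pct_local_collaborators": 100 * total_local / total if total else 0.0,
    }


# Hypothetical toy data: a and b are local authors, x and y are undetermined.
toy_author_papers = [
    (["a"], []),
    (["a", "b"], ["x"]),
    (["b"], ["x", "y"]),
]
author_stats = author_measures(toy_author_papers)
print(author_stats)
```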

4.4 Proposed measures

In this section, we study the collaborators of local authors in different ways. We rely on our proposed measures to extract the numbers of new and deleted collaborators and the stability rate of each author’s set of collaborators, and to examine how these numbers change over the years.
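
A minimal sketch of these three measures, treating an author's collaborators in two consecutive years as sets, is given below. The definition of the stability rate used here (the share of the previous year's collaborators who are retained in the current year) is an assumption made for illustration, since this section does not restate the formula; the function name and the toy ids are hypothetical.

```python
def collaborator_dynamics(previous, current):
    """Return (new, deleted, stability) for one author between two consecutive years."""
    previous, current = set(previous), set(current)

    new = current - previous        # collaborators who appear only this year
    deleted = previous - current    # collaborators who were dropped
    retained = previous & current   # collaborators kept from last year

    # Assumed definition: fraction of last year's collaborators that are retained.
    stability = len(retained) / len(previous) if previous else 0.0
    return len(new), len(deleted), stability


# Hypothetical collaborator sets for one local author in two consecutive years.
last_year = {"x", "y", "z", "w"}
this_year = {"x", "u", "v"}
dynamics = collaborator_dynamics(last_year, this_year)
print(dynamics)
```

Under this reading, adding many new collaborators while dropping many existing ones yields a low stability rate, which is how diversity in collaborators is characterized later in the text.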

4.4.1 Collaborators of local authors

This subsection discusses the number of collaborators of each local author in each year. The results are shown in Tables 6 and 7. For each year separately, we report how many collaborators the local authors had and the percentages of local authors who co-authored with them. From these tables, we see that most of the local authors have 1–25 collaborators in all periods, while only a very small percentage of the local authors have 26–50 or 51–74 collaborators, particularly in recent years.

Table 6 Local authors and number of their collaborators from 1991 till 2000
Table 7 Local authors and number of their collaborators from 2001 till 2010

4.4.2 New collaborators

This subsection discusses the percentage of the local authors who added new collaborators in each year. We use this measure to find the number of new collaborators added by the local authors and the percentage of local authors who added them per year. The results are shown in Tables 8 and 9. From these tables, we see that most of the local authors added 1–25 new collaborators in all periods, while only a very small percentage of local authors added 26–50 or 51–74 new collaborators, particularly in recent years.

Table 8 Local authors and number of their new collaborators from 1991 till 2000
Table 9 Local authors and number of their new collaborators from 2001 till 2010

4.4.3 Deleted collaborators

This subsection discusses the percentage of the local authors who deleted collaborators in each year, studied for each year separately. We calculate this measure to find the number of collaborators deleted by the local authors and the percentage of local authors who deleted them per year. The results are shown in Tables 10 and 11. From these tables, we see that most of the local authors deleted 1–25 collaborators in all periods, while only a very small percentage of the local authors deleted 26–50 or 51–74 collaborators, in recent years.

Table 10 Local authors and number of their deleted collaborators from 1991 till 2000
Table 11 Local authors and number of their deleted collaborators from 2001 till 2010

4.4.4 Stability rate of collaborators

This subsection discusses the stability rates of collaborators for the local authors, studied for each year separately. We calculate this measure to find out how stable each local author's set of collaborators is. The rates are shown in Tables 12 and 13, which give the percentages of local authors together with the stability rates of their collaborators for all periods.

Table 12 Local authors and their stability rates with collaborators from 1991 till 2000
Table 13 Local authors and their stability rates with collaborators from 2001 till 2010

4.5 Highly productive and influential authors

In this section, we focus on the highly productive and influential local authors in our ColNet. For this purpose, we determine the top 20 local authors in all intervals by computing the degree centrality, betweenness centrality, closeness centrality, productivity and numbers of new and deleted collaborators. Moreover, we focus specifically on local authors with the highest productivity (highest number of papers) and compute the following statistics: the numbers of new and deleted collaborators, the stability rate of collaborators, the number of collaborators, and their productivity in each year.

4.5.1 Top authors

In this subsection, we find the top 20 local authors in terms of degree, betweenness, closeness centrality and productivity in all periods. We also find the top 20 local authors in terms of the number of new collaborators and the number of deleted collaborators. The last period is chosen as a case study to highlight some observations. In the 2007–2010 interval, we notice that the top local author in terms of productivity is the same as the top local author in terms of the number of new collaborators, the number of deleted collaborators and the number of collaborators. Moreover, the top local author in terms of closeness and betweenness is not the same as the top local author in terms of productivity. Table 14 shows the matching between the top local authors in terms of the used measures and the top local authors in terms of productivity.
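
One plausible way to compute such a matching, sketched below, is to take the top-k authors under each measure and report the fraction that also appear in the top-k by productivity. This is an assumed reading of the matching ratio in Table 14, not a confirmed definition; the function names, the tie-breaking rule and the toy scores are hypothetical.

```python
def top_k(scores, k):
    """Ids of the k highest-scoring authors (ties broken by smaller id)."""
    ranked = sorted(scores, key=lambda author: (-scores[author], author))
    return set(ranked[:k])


def matching_ratio(measure_scores, productivity, k=20):
    """Fraction of the top-k authors by a measure who are also top-k by productivity.

    Assumes both dictionaries contain at least k authors.
    """
    overlap = top_k(measure_scores, k) & top_k(productivity, k)
    return len(overlap) / k


# Hypothetical scores keyed by local author id, with k = 2 for brevity.
toy_productivity = {1: 10, 2: 8, 3: 3, 4: 1}
toy_degree = {1: 30, 2: 5, 3: 12, 4: 2}
ratio = matching_ratio(toy_degree, toy_productivity, k=2)
print(ratio)
```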

Table 14 Matching ratio between the top local authors in terms of the used measures and the top local authors in terms of the productivity

4.5.2 Top 20 local authors in terms of productivity

In this subsection, we compute some statistics about the top 20 local authors in terms of productivity. The computed statistics for each local author are: number of new collaborators, number of deleted collaborators, stability rate of collaborators, number of collaborators and productivity in each year. We study these statistics to explore their effect on productivity. The last period is chosen as a case study.

Intuitively speaking, one might assume that these factors do have an effect on productivity. Here, some examples are extracted to support this conjecture. Local authors are denoted by their identifiers. Local author 522’s interaction with his collaborators is considered as an example of the impact of all the factors on productivity. He authored 22 and 17 papers in years 2009 and 2010, respectively. Looking into his statistics, we note that in year 2010 he added 43 new collaborators and deleted 50 collaborators. In total, he had 61 collaborators with a stability rate of 30%. On the other hand, in year 2009, he added 51 new collaborators and deleted 31 collaborators. In total, he had 68 collaborators with a stability rate of 40%.

The following are other examples that illustrate the impact of these factors individually on productivity. Local author 518 authored five papers in year 2009 and 10 papers in year 2010. In year 2009, he added 19 new collaborators and deleted 12 collaborators. In total, he had 23 collaborators with a stability rate of 30%. In year 2010, he added 45 new collaborators and deleted 17 collaborators. In total, he had 51 collaborators with a stability rate of 30%. Local author 518 is an example of the impact of the number of collaborators on productivity. On the other hand, the number of collaborators might not always be the only factor influencing productivity. For instance, local author 67 authored 10 papers with 66 collaborators in year 2010, while local author 1187 authored 12 papers with only 17 collaborators in the same year. The following examples relate to other measures. Regarding the stability rate, local author 1081 authored eight papers in year 2008 with a stability rate of 80%. In the following year, he authored five papers with a much lower stability rate of 20%. Another example is local author 853, who authored three papers in year 2008 with a stability rate of 10% and three papers in year 2009 with a stability rate of 80%.

From the above discussion, we find that diversity in the collaborators contributes to the improvement of researchers’ productivity. Diversity in the collaborators is characterized by adding new collaborators and deleting existing collaborators, which gives a low stability rate. This improvement is due to the exchange of ideas and scientific experiences among the authors, as well as not being restricted to certain ways of thinking and working.

4.5.3 Highly productive local authors

Table 15 Statistics for the local authors who have authored 15 papers in a single year

Table 15 shows the statistics of the local authors who have produced more than 15 papers in a single year. The best local author in terms of productivity is ATT. Looking at his statistics, we can see that his productivity increases with the increase in the number of new collaborators and the increase in the number of deleted collaborators. Moreover, his stability rate is low (0.3–0.4) and the number of his collaborators is high (41–68). Looking at the entire set, we note that the ranges for the number of new collaborators, the number of deleted collaborators, the stability rate and the number of collaborators are 13–51, 6–50, 0.3–0.4, and 12–68, respectively.

According to the results discussed previously, we can argue that the proposed measures are positively correlated with productivity. Furthermore, this finding is supported by calculating the correlations via slope lines. We compute these correlations in three case studies. In the first one, we consider all local authors. In the second one, we consider the set of local authors who published at least five papers in a single year. Finally, in the third one, we consider the set of local authors who published at least ten papers in a single year. Figures 1, 2 and 3 show the positive correlation of the proposed measures, as well as of the number of collaborators, with productivity in every case study. From these figures, we see that the number of collaborators, the number of new collaborators and the number of deleted collaborators are positively correlated with productivity. Also, this correlation increases with increased productivity.
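
The correlation analysis described above can be sketched as follows. The code assumes that "slope lines" refers to ordinary least-squares trend lines and pairs them with the Pearson correlation coefficient; both choices are assumptions, as are the function names and the toy (papers, collaborators) records. The `min_papers` threshold mirrors the three case studies (all authors, at least five papers, at least ten papers).

```python
from math import sqrt


def pearson_and_slope(x, y):
    """Pearson correlation and least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy), sxy / sxx


def case_study(records, min_papers=0):
    """Correlate a measure with productivity for authors with >= min_papers papers.

    records: list of (papers, measure_value) pairs, one per author and year.
    Assumes the filtered sample has at least two distinct values on each axis.
    """
    kept = [(p, c) for p, c in records if p >= min_papers]
    measure = [c for _, c in kept]
    papers = [p for p, _ in kept]
    return pearson_and_slope(measure, papers)


# Hypothetical records: (papers in a year, number of collaborators in that year).
toy_records = [(1, 2), (2, 4), (2, 6), (4, 8)]
r_all, slope_all = case_study(toy_records)
r_sub, slope_sub = case_study(toy_records, min_papers=2)
print(r_all, slope_all, r_sub, slope_sub)
```

A positive slope and a positive correlation coefficient in each subset would correspond to the upward trend lines shown in the three figures.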

Fig. 1 Correlation of our proposed metrics with the productivity based on all local authors
Fig. 2 Correlation of our proposed metrics with the productivity based on local authors who published at least five papers in a single year
Fig. 3 Correlation of our proposed metrics with the productivity based on local authors who published at least ten papers in a single year

With regard to the correlation between the stability rate and productivity, we see that for the second and third case studies, which consist mainly of highly productive authors, most of the authors have a stability rate between 0.3 and 0.4. We can therefore conclude that the authors’ high interaction with their collaborators is correlated with improved productivity. The justification for this finding might lie in the improved exchange of ideas and scientific experiences that comes with a higher level of interaction, as well as in not being restricted to working with certain collaborators and on certain problems.

5 Conclusion and future work

In this work, we analyzed ColNet for Levantine researchers in the biomedical field. The analysis focused on the patterns of scientific collaboration based on the published papers found on PubMed from 1991 to 2010. To the best of our knowledge, these ColNet are still largely understudied. Analyzing the ColNet reveals that local researchers are shifting from working individually or with non-local collaborators to working within relatively large groups of local collaborators. Furthermore, we find that the authors’ high interaction with their collaborators, based on the proposed measures, is correlated with improved productivity. The justification for this finding might lie in the exchange of ideas and scientific experiences with more researchers, as well as in not being restricted to working with certain collaborators and on certain problems.

There are many future directions this work can take. The first is to cover larger regions and different areas of the Arab world. The second is to come up with additional measures and study their effect on productivity. The third is to expand this study to consider the quality of publications in addition to their quantity, i.e., to incorporate publication quality measures such as citations and impact factor into the dataset and to focus on knowledge creation and transfer (Wang 2016).