Synonyms

Disciplinary; Knowledge flow; Scholarly communication; Scholarly networks; Science maps; Scientific collaboration; Scientific evaluation; Topic identification

Glossary

Node (in Scholarly Networks):

Entities such as words, papers, patents, authors, journals, institutions, fields, or countries

Edge (in Scholarly Networks):

Citation, cocitation, co-word, coauthor, bibliographic coupling, or hybrid relations

Scholarly Network:

The combination of edge properties and node properties defines a scholarly network

Macro-level Approach:

Statistics that are used to identify the global structural features of the networks, including component, bicomponent, shortest distance, clustering coefficient, degree distribution, and error and attack tolerance

Meso-level Approach:

Approaches that focus on the behavior of a group of actors, including topic identification and community detection

Microlevel Approach:

Indicators that are useful to understand individual node's power, stratification, ranking, and inequality in social structures, including centrality measures and PageRank and its variants

Introduction

In recent years, we have witnessed a growing trend of studying various types of networks, such as social networks, information networks, technical networks, and biological networks (Newman 2003). These studies were informed by the social studies of human interactions, were accelerated by the discovery of small-world and scale-free properties, and were also enriched by various macro-level statistics, meso-level clustering techniques, and microlevel indicators.

Studying the characteristics of scholarly communication is crucial for understanding and exploring the conditions that foster scientific innovation, scientific collaboration, and scientific activities in general. Scholars have used different types of networks to answer a wide spectrum of questions related to research interaction, scholarly communication, and science policy making; these efforts have greatly advanced the scholarship of scientometrics and informetrics. The earliest well-defined network in scholarly communication is probably the paper bibliographic coupling network, proposed by Kessler in the 1960s (Kessler 1963). Since then, various types of networks have been proposed and examined, for instance, co-citation networks, citation networks, coauthorship networks, co-word networks, and hybrid networks. For these networks, the paper is usually the basic research unit and can be aggregated into several higher levels, such as the author, journal, institution, and field level. Network types define edge properties, and aggregation levels define node properties. The combination of edge properties (i.e., citation, co-citation, co-word, coauthor, bibliographic coupling, or hybrid) and node properties (i.e., words, papers, patents, authors, journals, institutions, fields, or countries) precisely defines a network. Such networks are referred to as scholarly networks in this entry.

Various types of scholarly networks provide an ideal research instrument to quantitatively study scholarly communication. In particular, scholarly networks have been employed to study several essential aspects of scholarly communication: conducting scientific impact evaluation (primarily through citation networks), studying scientific collaboration (primarily through collaboration networks), identifying research specialties and topics (primarily through co-occurrence networks), and studying knowledge flow patterns (primarily through citation networks).

Scholarly Networks as a Type of Networks

In an important review article on complex networks, Newman (2003) distinguished four kinds of real-world networks: social networks (e.g., collaboration networks), information networks (e.g., citation networks), technical networks (e.g., Internet router networks), and biological networks (e.g., protein networks). Based on such division, two types of scholarly networks can be distinguished: social networks vs. information networks. In social networks such as coauthorship networks, a node is a social actor (i.e., an author); in information networks, a node is usually an artifact, such as a paper, a journal, or an institution.

In addition to “social networks vs. information networks,” another distinction can be made, which is “real connection-based networks vs. similarity-based networks.” Coauthorship networks and citation networks are constructed based on real connections, whereas co-citation, bibliographic coupling, topical, and co-word networks are constructed based on similarity connections. These scholarly networks can also be viewed from their edge types: collaboration-based, citation-based, or word-based. Citation-based scholarly networks include citation networks, co-citation networks, and bibliographic coupling networks; word-based scholarly networks include topical networks and co-word networks; collaboration-based networks include coauthorship networks. These distinctions (social networks vs. information networks, real connection-based networks vs. similarity connection-based networks, citation-based networks vs. non-citation-based networks) help clarify how different types of scholarly networks relate to each other.
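
The difference between real connections and similarity connections can be illustrated with a small sketch. The citation lists below are hypothetical toy data, and the function names are chosen for illustration only. Starting from real citation links, the sketch derives two similarity-based networks: co-citation (two papers cited together) and bibliographic coupling (two papers sharing references).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of papers it cites.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C"},
}

def cocitation_counts(references):
    """Count how often each pair of cited papers shares a reference list."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

def coupling_counts(references):
    """Count the references shared by each pair of citing papers."""
    counts = Counter()
    for p, q in combinations(sorted(references), 2):
        shared = len(references[p] & references[q])
        if shared:
            counts[(p, q)] = shared
    return counts

cocitation = cocitation_counts(references)
coupling = coupling_counts(references)
```

Both derived networks are undirected and weighted, whereas the underlying citation links are directed; this is one reason the two families of networks behave differently in analysis.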

Yan and Ding (2012) constructed six types of scholarly networks aggregated at the institution level and found that topic networks and coauthorship networks have the lowest similarity and these two types of networks set two boundaries (social and cognitive) for all six types; co-citation networks and citation networks have high similarity; bibliographic coupling networks and co-citation networks have high similarity; co-word networks and topical networks have high similarity.

The Use of Scholarly Networks

Before network theories were introduced to scientometrics, cumulative citation counting was widely used in the area of scientific evaluation. In the same vein of research, several citation-based indicators were proposed, such as the Journal Impact Factor and the h-index (Hirsch 2005). Cumulative citation counting and citation-based indicators give all citations the same weight, without consideration of the citing papers, citing authors, or citing journals. This equal counting mechanism has been questioned, as scholars (e.g., Pinski and Narin 1976; Bollen et al. 2006; Yan et al. 2011) argued that it is more reasonable to differentiate the weight of citations based on the source of endorsement. This tension has largely been alleviated by the construction of different types of scholarly networks and the invention of various network-based bibliometric indicators. Compared with traditional citation counting, scholarly networks have the advantage of taking the source of the citation endorsement into account. In this way, scholarly networks can capture the complex research communication and interaction more precisely.
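
As a point of comparison for the network-based indicators discussed later, the following minimal sketch computes the h-index from a list of per-paper citation counts. The function name and the example counts are illustrative. Note that every citation contributes equally, regardless of who issued it, which is exactly the limitation described above.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for five papers by one author.
example_h = h_index([10, 8, 5, 4, 3])
```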

In addition to scientific evaluation, scholarly networks also contribute to other realms of scholarly communication and science policy making. For instance, coauthorship networks have been used to detect research communities and identify collaboration patterns (e.g., Newman and Girvan 2004); co-citation networks, bibliographic coupling networks, and co-word networks have been used to identify research specialties, examine interdisciplinarities, and map the backbone of science; and citation networks have been used to study knowledge flows and knowledge transfer in science and technology (e.g., Jaffe et al. 1993; Yan et al. 2013).

The Framework of Studying Scholarly Networks

Through scholarly network analysis, scientists and policy makers have gained unprecedented insights into the interaction of various research aggregates. The study of scholarly networks in general can be presented in a framework (Fig. 1), including approaches, network types, aggregation levels, and applications.

Scholarly Networks Analysis, Fig. 1

A framework of scholarly network studies

Approaches

Once a scholarly network has been established, its properties can be described on three levels: by macro-level metrics (global graph statistics), meso-level techniques (community characteristics), and microlevel metrics (individual actor properties). Macro-level metrics seek to describe the global characteristics of a scholarly network as a whole, with the aim of capturing the generic structural features of the network. Commonly used measures include diameter, mean distance, components, and degree distribution. Meso-level techniques focus on identifying research communities and studying how communities interact with each other. Microlevel metrics relate to the analysis of the individual properties of network actors, for example, actor position, actor status, and distance to others, which informs us about “the differential constraints and opportunities facing individual actors which shape their social behavior” (Yin et al. 2006, p. 1600). They zoom in to capture the features of the individual nodes/actors in a network with consideration of the topology of the network. Microlevel metrics usually refer to centrality, which indicates how central an actor is to the network. Central actors are well connected to other actors, and metrics of centrality measure an actor's degree (degree centrality), average distance to others (closeness centrality), or the degree to which geodesic paths between pairs of other actors pass through the actor (betweenness centrality).
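
The three centrality notions named above can be sketched in plain Python. The toy graph and all function names are hypothetical; the normalizations (degree divided by n - 1, closeness as the inverse of the mean geodesic distance, unnormalized betweenness) are common conventions rather than choices made in this entry. Betweenness is computed here with Brandes' accumulation scheme for unweighted graphs.

```python
from collections import deque

# Hypothetical undirected toy network stored as an adjacency dict.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

def degree_centrality(graph):
    """Fraction of the other nodes to which each node is directly tied."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def bfs_distances(graph, source):
    """Geodesic distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(graph):
    """Inverse of the mean geodesic distance; assumes a connected graph."""
    n = len(graph)
    return {v: (n - 1) / sum(bfs_distances(graph, v).values()) for v in graph}

def betweenness_centrality(graph):
    """Number of geodesics between other node pairs that pass through a node."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of geodesics from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is visited from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}
```

In this toy graph, node "c" bridges the triangle a-b-c and the chain d-e, so it scores highest on all three measures.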

Macro-level

Macro-level metrics are useful to identify the global structural features of the network. There are many ways of characterizing the structure of a network, such as component, bicomponent, k-core, mean distance, clustering coefficient, degree distribution, and error and attack tolerance of the network. In network analysis, maximal connected subgraphs are called components.

  • Component analysis can be used to learn about the macro-level structure of a network.

  • In a bicomponent, no node can control the information flow between two other nodes completely because there is always an alternative path that information may follow (Nooy et al. 2005).

  • The k-core of a network is a substructure in which each node has ties to at least k other nodes (Seidman 1983).

  • A geodesic is the shortest path between two nodes.

  • The degree of a node is the number of other nodes connected with it. The degree distribution characterizes a network; in many real-world networks it is highly skewed, so that a few nodes have many links and the majority have only a few.
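
Several of the macro-level notions listed above can be sketched with breadth-first search on an adjacency dict. The toy graph and function names are hypothetical, and the k-core routine uses simple iterative pruning, which is one straightforward way to obtain it rather than the most efficient one.

```python
from collections import Counter, deque

# Hypothetical undirected toy network with two components.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c"}, "e": {"f"}, "f": {"e"},
}

def connected_components(graph):
    """Return the maximal connected node sets of an undirected graph."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        components.append(component)
    return components

def geodesic_length(graph, source, target):
    """Length of the shortest path, or None if target is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None

def degree_distribution(graph):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(nbrs) for nbrs in graph.values())

def k_core(graph, k):
    """Repeatedly remove nodes with fewer than k remaining neighbours."""
    core = {v: set(nbrs) for v, nbrs in graph.items()}
    changed = True
    while changed:
        changed = False
        for v in list(core):
            if len(core[v]) < k:
                for w in core[v]:
                    core[w].discard(v)
                del core[v]
                changed = True
    return set(core)
```

For the toy graph, the 2-core is the triangle formed by "a", "b", and "c", since every other node has only one tie.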

Meso-level

Meso-level scholarly network analyses focus on clustering various scholarly objects into groups based on certain clustering or modeling techniques. The clustering of papers, authors, institutions, journals, and subject categories is usually referred to as community detection, and the clustering of words and research topics is usually referred to as topic identification. Broadly perceived, clustering techniques fall into two branches: one yields discrete results, where a node in a scholarly network is grouped into one or a couple of clusters; the other yields fractional results, where a node is grouped into clusters with certain probabilities. “Discrete” clustering techniques are traditional methods that include graph partitioning (e.g., the Kernighan-Lin algorithm), hierarchical clustering, partitional clustering (e.g., k-means), and spectral clustering (e.g., algorithms utilizing Laplacian matrices). In recent years, more and more clustering tasks have used modularity-based methods, which use modularity to measure the strength of a division into communities. “Fractional” clustering techniques use probabilistic models, such as topic models, to assign papers, journals, or authors to clusters. The outcomes of topic models are probability distributions of words, papers, journals, or authors for each topic (e.g., Blei et al. 2003).
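
To make the notion of modularity concrete, the sketch below evaluates a given partition of an undirected, unweighted graph using the standard form Q = sum over communities of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The toy graph and the function name are hypothetical; the sketch only scores a partition and does not search for the best one.

```python
# Hypothetical toy network: two triangles joined by the single edge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

def modularity(graph, communities):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # number of edges
    q = 0.0
    for community in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for v in community for w in graph[v] if w in community) / 2
        degree_sum = sum(len(graph[v]) for v in community)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

two_groups = modularity(graph, [{"a", "b", "c"}, {"d", "e", "f"}])
one_group = modularity(graph, [set(graph)])
```

Splitting the graph into its two triangles yields a clearly positive score, whereas placing all nodes in one community yields zero, which matches the intuition that modularity rewards dense groups with few ties between them.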

Micro-level

Freeman (1979) elaborated concepts of centrality in a social network, which have since been further developed into degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality. Eigenvector centrality is based on the principle that the importance of a node depends on the importance of its neighbors. PageRank, on the other hand, is derived from the influence weights proposed by Pinski and Narin (1976); it was formalized by Brin and Page (1998), who developed a method for assigning a universal rank to Web pages based on a weight-propagation algorithm called PageRank. A page has a high rank if the sum of the ranks of its backlinks is high. Actors in the PageRank of Web information retrieval systems are Web pages, and actors in the PageRank of coauthorship networks are authors. The underlying idea is that a citation from an influential publication, a prestigious journal, or a renowned author should be regarded as more valuable than a citation from an insignificant publication, an obscure journal, or an unknown author. It is sometimes argued that non-recursive indicators measure popularity and recursive indicators measure prestige.
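
The weight-propagation idea can be sketched as a simple power iteration. This is a minimal illustration, not the formulation of any particular indicator mentioned in this entry: the damping factor of 0.85, the fixed number of iterations, the uniform redistribution of rank from nodes without out-links, and the toy citation data are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each node to the list of nodes it cites (its out-links)."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1 - damping) / n + damping * dangling / n
        new_rank = dict.fromkeys(nodes, base)
        for v, targets in links.items():
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank

# Hypothetical citations: X and Y each receive exactly one citation,
# but X is cited by C, which is itself cited by A and B.
citations = {"A": ["C"], "B": ["C"], "C": ["X"], "E": ["Y"]}
ranks = pagerank(citations)
```

Although X and Y have the same citation count, X receives the higher rank because its single citation comes from a paper that is itself well cited. This is the sense in which recursive indicators are said to measure prestige rather than popularity.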

Network Types

In addition to the different approaches, the interaction of research aggregates can be explored through different types of scholarly networks. Each type of scholarly network has its own use and can bring a different perspective to the study of research interaction and scholarly communication. For example, social networks such as coauthorship networks focus on finding collaboration patterns of contacts or interactions between social actors. Similarity-based networks such as co-citation networks, bibliographic coupling networks, and co-word networks focus on identifying research topics or schools of thought. In citation networks, each node is a piece of knowledge and a link denotes the knowledge flow.

Aggregation Levels

In the network types mentioned above, an article is usually a single research unit and can be aggregated into several higher levels. Figure 2 shows the different aggregation levels discussed in scholarly network studies.

Scholarly Networks Analysis, Fig. 2

Aggregation levels of scholarly networks

The right side of the cascade is connected through “journal-ship” affiliation: a paper is published in a journal, a journal is classified into a subject category, and a subject category is further classified into a class. The left side of the cascade is connected through authorship affiliation: a paper is written by authors, an author is affiliated to an institution, and an institution is located in a country. Through studies of different research aggregates, we are provided with multiple focus lenses that allow us to zoom in and gain a concrete, detailed perspective on research interaction, while zooming out allows us to obtain a holistic and integrated view of the interacting institutions and disciplines.
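
Aggregation along the journal side of the cascade can be sketched as a relabeling of edge endpoints followed by counting. The paper-level citations and the membership tables below are hypothetical, as is the function name. The sketch assumes that each paper belongs to exactly one journal and each journal to exactly one field; the authorship side, where a paper may have several authors, would additionally require a counting rule (for example, full or fractional counting) that is not shown here.

```python
from collections import Counter

# Hypothetical paper-level citation edges as (citing, cited) pairs.
paper_citations = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p4", "p1")]

# Hypothetical membership tables for the journal side of the cascade.
paper_journal = {"p1": "J1", "p2": "J1", "p3": "J2", "p4": "J2"}
journal_field = {"J1": "F1", "J2": "F2"}

def aggregate_edges(edges, membership):
    """Relabel both endpoints of every edge and count the resulting edges."""
    counts = Counter()
    for source, target in edges:
        counts[(membership[source], membership[target])] += 1
    return counts

# Journal-level citation network.
journal_edges = aggregate_edges(paper_citations, paper_journal)

# Field-level network, obtained by composing the two membership tables.
paper_field = {p: journal_field[j] for p, j in paper_journal.items()}
field_edges = aggregate_edges(paper_citations, paper_field)
```

Edges whose endpoints fall into the same aggregate, such as the citation from p1 to p2 within journal J1, become self-citations at the higher level; whether to keep or discard them is an analytic decision.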

Key Applications

Scholarly networks have rich applications in the studies of scholarly communication and research interactions. Broadly perceived, six applications are apparent to us. In this section, brief introductions are given for each application.

Evaluating Research Impact

Impact evaluation has become an important issue in the science community. Scientists as well as policy makers now have a keen interest in evaluating scientific output. For scientists, evaluations of research impact help them find potential collaborators, discover new research topics, and locate appropriate venues to publish their work. For science policy makers, evaluations of research impact help inform them how to allocate research funds, promote emerging research fields, and monitor discipline developments. The traditional citation-based bibliometric indicators do not consider the source of the citation endorsement. However, in reality, being cited by a renowned author, a prestigious journal, and/or a highly influential paper differs from being cited by a remote author, a peripheral journal, and/or an obscure paper. Network-based bibliometric indicators are capable of considering the provenance of citation endorsement; specifically, PageRank and its variants have gained popularity in evaluating research impact. PageRank-like indicators denote a collection of algorithms based on Google's PageRank, such as Y-factor (Bollen et al. 2006), CiteRank (Walker et al. 2007), Eigenfactor (Bergstrom and West 2008), and SCImago Journal Rank (SCImago 2007). In these network-based bibliometric indicators, citations are weighted differently depending on the status of the citing publication (e.g., Walker et al. 2007), the citing journal (e.g., Bollen et al. 2006; Pinski and Narin 1976), or the citing author (e.g., Radicchi et al. 2009).

Studying Scientific Collaboration

Scientific collaboration, as a large-scale real-world social phenomenon, has a particular appeal to scientists and social scholars. Coauthorship networks provide an accurate and expeditious medium, allowing scientists and scholars to explore various intriguing questions pertinent to this social phenomenon. Physicists and mathematicians have discovered small-world and scale-free properties in coauthorship networks, for the first time providing a systematic inquiry into humans' social relationships. Coauthorship networks were later used as a testing field for various modern clustering techniques (e.g., Newman and Girvan 2004). Such techniques are useful to examine scientific collaboration at a more granular level, providing insights to study the science of team science.

Studying Disciplinarity and Interdisciplinarity

The topic of interdisciplinarity has long been a research focus for social scientists. The quantitative study of interdisciplinarity has been enhanced by studying citation networks aggregated at the field level. Scholars usually choose some representative journals, or all journals from a field based on the ISI's classification of journals, and then measure the extent to which the publications of the chosen field cite the publications of other subject categories. Network-based indicators have also been proposed to measure how interdisciplinary disciplines are, using measures such as entropy (Zhang et al. 2010), integration and specialization (Porter et al. 2006), diversity and coherence (Rafols and Meyer 2010), and relative openness (Rinia et al. 2002).
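
As an illustration of how an entropy-style measure responds to the spread of citations over subject categories, the sketch below computes the plain Shannon entropy of a field's outgoing citation distribution. This is a generic formulation and is not claimed to reproduce the exact indicator of any study cited above; the function name and the citation counts are hypothetical.

```python
import math

def citation_entropy(citations_by_field):
    """Shannon entropy (natural log) of a citation distribution over fields."""
    total = sum(citations_by_field.values())
    entropy = 0.0
    for count in citations_by_field.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy

# Hypothetical counts of citations issued by two fields to subject categories.
inward_looking = {"Own field": 95, "Field B": 3, "Field C": 2}
outward_looking = {"Own field": 40, "Field B": 30, "Field C": 30}

low = citation_entropy(inward_looking)
high = citation_entropy(outward_looking)
```

A field that cites almost exclusively within itself obtains an entropy close to zero, while a field whose citations are spread evenly over many categories approaches the maximum, the logarithm of the number of categories.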

Identifying Research Expertise and Research Topics

Human knowledge, in the form of scholarly publications, increases at a fast pace. How to effectively organize the expanding knowledge has become an important issue. With this motivation, scholars have proposed various clustering techniques to group papers, authors, journals, institutions, and fields, with the aim of identifying and organizing research specialties in an effective way. For similarity-based scholarly networks such as co-citation networks and bibliographic coupling networks, the assumption is that if two research entities co-occur frequently, then they are more likely to have similar characteristics. Therefore, co-occurrence networks can help identify and organize scientific knowledge (e.g., White and McCain 1998; Boyack et al. 2005; Waltman et al. 2010).
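
The co-occurrence assumption can be sketched for a co-word network. Raw co-occurrence counts favor frequent terms, so the sketch normalizes them with the cosine measure, dividing the co-occurrence count of two words by the square root of the product of their individual occurrence counts. The keyword sets and the function name are hypothetical, and cosine normalization is one common choice among several.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets attached to four papers.
paper_keywords = {
    "p1": {"citation", "network", "pagerank"},
    "p2": {"citation", "network"},
    "p3": {"network", "community"},
    "p4": {"community", "modularity"},
}

def coword_similarity(paper_keywords):
    """Cosine-normalized co-occurrence strength for each pair of keywords."""
    occurrences = Counter()
    cooccurrences = Counter()
    for keywords in paper_keywords.values():
        occurrences.update(keywords)
        cooccurrences.update(combinations(sorted(keywords), 2))
    return {
        (a, b): count / math.sqrt(occurrences[a] * occurrences[b])
        for (a, b), count in cooccurrences.items()
    }

similarity = coword_similarity(paper_keywords)
```

The resulting weighted edges can serve as input to the clustering techniques described in the meso-level approach.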

Producing Science Maps

Clustering results can also be presented in science maps, which can deliver richer and more informative messages to a broader audience. Science maps on author and journal interactions are usually used to identify research topics (e.g., Boyack et al. 2005). As institutions are associated with geographical locations, science maps at the institution level are useful to illustrate the geographical distribution of scientific productivity (e.g., Leydesdorff and Persson 2010). Science maps at the field level provide a unique view on the backbone of science (e.g., Boyack et al. 2005) or on the knowledge flow in scientific disciplines (e.g., Rosvall and Bergstrom 2008).

Finding Knowledge Paths

The production and creation of knowledge is not dependent on a single isolated entity; instead, knowledge is diffused, exchanged, and circulated among various entities. Knowledge flow, in the past 20 years, has become more inter-sectoral, more interorganizational, more interdisciplinary, and more international. How scientific and technological knowledge, innovative ideas, management skills, or certain influences are transferred within different sectors, between different organizations, and between different scientific disciplines is pertinent to understanding patterns of knowledge transfer and dissemination. Citation networks serve as an ideal research instrument to uncover such patterns. In citation networks, a node is a research aggregate, and a link denotes a citation from the citing research aggregate to the cited research aggregate.
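
One simple way to read knowledge flow from a field-level citation network is sketched below. It assumes the common interpretation that knowledge flows in the direction opposite to the citation, from the cited aggregate to the citing aggregate, so that a field "exports" knowledge when it is cited and "imports" knowledge when it cites. The field names, counts, and function name are hypothetical.

```python
from collections import Counter

# Hypothetical field-level counts keyed by (citing field, cited field).
field_citations = {
    ("Physics", "Mathematics"): 30,
    ("Mathematics", "Physics"): 10,
    ("Biology", "Physics"): 20,
    ("Biology", "Mathematics"): 5,
}

def knowledge_balance(field_citations):
    """Citations received from other fields minus citations given to them."""
    exported, imported = Counter(), Counter()
    for (citing, cited), count in field_citations.items():
        if citing == cited:
            continue  # self-citations carry no flow between fields
        exported[cited] += count
        imported[citing] += count
    fields = set(exported) | set(imported)
    return {field: exported[field] - imported[field] for field in fields}

balance = knowledge_balance(field_citations)
```

A positive balance marks a field that is, on net, a source of knowledge for others, and a negative balance marks a net recipient; the balances always sum to zero across fields.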

Future Directions

Studies on scholarly networks usually choose one type of network at one aggregation level. The choice of a type of network can be inconsistent or even arbitrary, and the findings have been fragmented and cannot be generalized to address a wider spectrum of research questions. We recommend that, in order to capture varied aspects of research interactions, different types of networks be combined to form a hybrid network. Beyond hybrid approaches, scholars have proposed heterogeneous scholarly networks to incorporate different academic entities while keeping edge semantics. The study of heterogeneous networks has evolved from bi-typed networks to star-typed heterogeneous networks. By adding more academic entities (e.g., authors, journals, articles, words), heterogeneous networks can better simulate the mutual engagement of various academic entities in the complex academic environment.

Therefore, future research on this topic would benefit from (1) constructing hybrid and heterogeneous scholarly networks and (2) evaluating different approaches on hybrid networks or heterogeneous scholarly networks through possible “gold standards” (such as award lists or expert judgments) in order to determine which approach can yield more precise clustering results and more useful information for scientific evaluations.

Cross-References

Centrality Measures

Components of the Network Around an Actor

Similarity Metrics on Social Networks

Social Interaction Analysis for Team Collaboration