MACA: a modified author co-citation analysis method combined with general descriptive metadata of citations

Bu, Yi; Liu, Tian-yi; Huang, Win-bin

doi:10.1007/s11192-016-1959-5


Published: 03 May 2016

Volume 108, pages 143–166, (2016)

Scientometrics




Abstract

Author co-citation analysis (ACA) is a well-known and frequently used method for mapping academic researchers and sketching a professional field according to the co-citation relationships between authors in a set of articles. However, ACA supports only limited fine-grained visual examination because it uses author co-citation information alone. The proposed method, called modified author co-citation analysis (MACA), exploits the author co-citation relationship together with the publication time, publication carriers, and keywords of citations to construct MACA-based co-citation matrices. According to the results of our experiments: (1) MACA produces clustering results that are finer-grained and clearer; (2) involving more information in co-citation analysis improves visual acuity; (3) in the visualization of the co-citation network produced by MACA, points in different categories lie much farther apart, and points representing authors in the same category lie closer together. As a result, compared with ACA, the proposed MACA is found to yield more detailed and subtle information about the knowledge domain under analysis.


Introduction

Co-citation analysis (CA) is a significant branch of citation analysis in bibliometrics. It can be divided into at least three types according to the object of study: author co-citation analysis (ACA), document co-citation analysis (DCA), and journal co-citation analysis (JCA). H. D. White and B. C. Griffith brought ACA into Library and Information Science (LIS) in the 1980s (White and Griffith 1981) in order to depict the intellectual structure of one or more fields. The main purpose of ACA is to map scientific domains from the perspective of co-cited authors by identifying co-citation relationships in which the object of study (i.e. the unit of analysis) is the author rather than the document or journal (Jeong et al. 2014). The basic assumptions of ACA can be summarized as follows: all cited articles play equal roles in co-citation analysis, and the more often two authors are co-cited, the stronger their relevance is. Moreover, the four usual steps of ACA are as follows (McCain 1990; Eom 2008a): (1) selection of the author set and retrieval of co-cited author counts; (2) formation of the raw co-citation matrix; (3) transformation of the raw co-citation matrix into the correlation matrix; (4) multivariate analyses (e.g., cluster analysis, multi-dimensional scaling (MDS), factor analysis, etc.). The concepts and methods of ACA have frequently been applied in other disciplines to map scientific domains and academic researchers (Eom 1999; Tsay 2011). Recently, ACA has been further combined with content-based analysis (Jeong et al. 2014) and artificial intelligence technologies (An et al. 2011).
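Steps (2) and (3) above can be illustrated with a minimal sketch. It assumes that each citing article is represented only by the set of authors it cites; the names `raw_cocitation`, `pearson`, `correlation_matrix`, and the toy data are hypothetical and not taken from the paper. The diagonal of the raw matrix is simply left at zero here, which is one convention among several.

```python
from itertools import combinations
from math import sqrt


def raw_cocitation(reference_lists, authors):
    """Count, for every author pair, the citing articles that cite both authors."""
    index = {name: i for i, name in enumerate(authors)}
    n = len(authors)
    matrix = [[0] * n for _ in range(n)]
    for cited in reference_lists:
        # Keep each selected author at most once per citing article.
        present = sorted({index[a] for a in cited if a in index})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1  # the raw matrix is symmetric
    return matrix  # diagonal stays 0 (an assumption of this sketch)


def pearson(x, y):
    """Pearson's r between two co-citation profiles (rows of the raw matrix)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation is undefined, report 0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(matrix):
    """Transform the raw co-citation matrix into a correlation matrix."""
    n = len(matrix)
    return [[pearson(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


# Toy data: four citing articles and the selected authors each one cites.
authors = ["A", "B", "C", "D"]
reference_lists = [{"A", "B"}, {"A", "B", "C"}, {"C", "D"}, {"A", "B"}]
raw = raw_cocitation(reference_lists, authors)
corr = correlation_matrix(raw)
```

In this toy data, authors A and B are co-cited by three articles, so `raw[0][1]` is 3. The correlation matrix `corr` would then be passed to the multivariate analyses of step (4).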

However, White and Griffith (1981) assume that each citation in an article makes an equal contribution. This assumption cannot reveal significance and relevance, because the purposes of these citations can differ from the citers’ perspective. For example, the article “PageRank for ranking authors in co-citation networks” (Ding et al. 2009) has two references, a coauthorship-related one (Liu et al. 2007) and a PageRank-related one (Bianchini et al. 2005), whose corresponding authors are Liu and Bianchini. Although their studies are co-cited, these authors work in different fields, Library and Information Science (LIS) and Computer Science (CS). Dr. Bianchini could therefore appear in the citation network (graph) obtained from the multivariate analyses even when only LIS is under consideration. If many such cases occur, potential authors in LIS may be overlooked. In other words, the performance of ACA has been accepted and tolerated even though author co-citation relationships are the only information it uses to construct a knowledge domain. The major purpose of this paper is to reduce this oversight by incorporating more general information about citations into ACA. This information can be the general descriptive metadata of a citation, such as its publication time, the publication in which it appeared, and its keywords. From the time perspective, for example, a small difference between the publication times of two citations implies that the authors tended to focus on similar issues in the same period. The representation of the authors’ relationship in the knowledge graph might differ between periods because of varying concepts, methods, or even demands. Similarly, if two authors publish in similar journals or their citations use similar keywords, they tend to research similar issues.

As a result, the proposed method, called Modified Author Co-Citation Analysis (MACA), exploits four kinds of general descriptive metadata in citations to construct a citation network: the authors of a citation, the time when a citation was published, the carrier (i.e. journals, conferences, monographs, and even electronic sources, etc.) in which a citation was published, and the keywords of a citation. As in ACA, the author information in citations (i.e. the author co-citation count) is used to establish the co-citation relationships among authors. The publication time of the citations of every pair of co-cited authors is used to form the co-citation matrix from the time perspective, called the time-based parameter. The carrier information of the citations is extracted first, and the professional fields to which the carriers belong are determined according to the issues they focus on. The relationship of co-cited authors from the carrier perspective, called the carrier-based parameter, is calculated from the similarity of the professional fields of their articles. Similarly, the professional fields to which the keywords of citations belong are first obtained from the meaning of the keywords. These fields are then used to calculate the relationship of co-cited authors from the keyword perspective, called the keyword-based parameter.

Related works are described in the “Related works” section. The calculations and explanations of the proposed MACA are detailed in the “Modified author co-citation analysis (MACA)” section. The dataset and its pre-processing are described, and the performance and analysis of the proposed MACA are presented, in the “Experimental results and discussion” section. Finally, conclusions are provided in the “Conclusion” section.

Related works

ACA has been a research hotspot in informetrics and scientometrics. It aims to guide scientific research by identifying co-citation relationships between authors in a set of academic articles and by mapping knowledge domains (McCain 1990). Much empirical research has indicated that ACA is effective and applicable for evaluating the development of a discipline and for identifying the micro-structures of a field and its sub-fields, since it can reveal dynamic changes and future developments.

The major steps of ACA are shown in Fig. 1. In the first two steps, an academic dataset is selected by some method (e.g. selection of specific journal(s), snowballing, etc.) and the authors’ names are disambiguated. Author name disambiguation is mainly based on the authors’ affiliations, collaboration records, and research areas. The co-cited authors within the dataset are then extracted to construct a raw symmetric co-citation matrix from their co-citation counts, whether first-author or all-author information is counted. In the next step, the raw co-citation matrix is transformed into a correlation matrix for normalization. Several similarity measures (e.g. Pearson’s r, Jaccard, cosine, Euclidean distance, etc.) should be evaluated in this step and one of them selected. Finally, a series of data analysis methods (e.g., factor analysis, cluster analysis, network analysis, and multi-dimensional scaling) is used to produce a more accurate interpretation of the results. For example, to cluster the given authors, a hierarchical agglomerative or iterative partitioning method is applied to the correlated authors. Domain experts then interpret the results before peer review.
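Two of the similarity measures named above, cosine and Jaccard, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: cosine is computed on two authors' rows of the raw co-citation matrix, and Jaccard on the sets of citing articles of each author. The function names and the toy values are hypothetical.

```python
from math import sqrt


def cosine(x, y):
    """Cosine similarity between two co-citation profiles (matrix rows)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an author with no co-citations has no direction
    return dot / (norm_x * norm_y)


def jaccard(citers_a, citers_b):
    """Share of citing articles that cite both authors among those citing either."""
    union = citers_a | citers_b
    if not union:
        return 0.0
    return len(citers_a & citers_b) / len(union)


# Toy profiles: rows of a raw co-citation matrix with a zero diagonal.
row_a = [0, 3, 1, 0]
row_b = [3, 0, 1, 0]
cos_ab = cosine(row_a, row_b)

# Toy sets of citing-article identifiers for two authors.
citers_a = {1, 2, 4}
citers_c = {2, 3}
jac_ac = jaccard(citers_a, citers_c)
```

Note that `cos_ab` is low even though the two authors are co-cited three times, because the zero diagonal hides their mutual count from the row comparison. This shows why the treatment of diagonal values matters when a measure is chosen.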

Over the past 30 years, four major concerns of traditional ACA can be summarized as follows: (1) Data collection methods (White and McCain 1998; Cothill et al. 1989) and database selection (Zhao and Strotmann 2008); (2) Raw matrix formation and definition or modification of ACA; (3) Correlation matrix transformation and similarity measurement in ACA (Ahlgren et al. 2003; White 2003a; Bensman 2004; Leydesdorff and Vaughan 2006; Egghe 2009; Mêgnigbêto 2013); (4) Further analysis methods (e.g. factor analysis, multi-dimensional scaling, cluster analysis, network analysis, etc.) and visualization (White 2003b; An et al. 2011; Chen 1999; Moya-Anegón et al. 2007).

Regarding raw matrix formation and the definition or modification of ACA, researchers have focused on the diagonal values in the raw co-citation matrix (White and McCain 1998; McCain 1991) and on first-author versus all-author co-citation analysis (Persson 2001; Zhao and Logan 2002; Zhao 2006; Rousseau and Zuccala 2004; Zhao and Strotmann 2008; Schneider and Larsen 2009; Eom 2008b). The latter line of research has made traditional ACA more informative, since more authors’ co-citation relationships are included. However, these studies focused only on author-related information and not on the other metadata available in citations. Moreover, some researchers have studied content-based ACA. Jeong et al. (2014), for example, used the similarity of citances (i.e. citing sentences) to modify traditional ACA, which in essence improves the calculation of the raw co-citation “count”. Their results showed that content-based ACA performed better than previous methods. Nevertheless, content-based ACA requires full-text data in TXT or HTML format and has higher computational complexity. Given these disadvantages, in this paper we modify the construction of the raw co-citation matrix by combining it with other descriptive metadata of citations (i.e., citations’ publication time, citations’ publication carrier, and citations’ keywords) in order to integrate more types of information and to improve the performance of ACA. This paper tries to modify traditional ACA by adding an “author-based parameter calculation” step (white block in Fig. 1).

Modified author co-citation analysis (MACA)

The framework of the proposed MACA, which analyzes the relationship between two authors by using the general descriptive metadata of citations, including publication time, keywords, and carrier, is shown in Fig. 2. The major difference between ACA and MACA lies in the stage that constructs the raw co-citation matrix. In the first stage, the authors’ names, publication time, carrier, and keywords of each citation are extracted. The co-citation matrix of MACA is then constructed from four matrices, called the author-based, time-based, carrier-based, and keyword-based parameters, each based on the corresponding kind of descriptive metadata. Note that in Fig. 2 the white blocks denote the new steps we introduce, while the green blocks denote traditional steps. The calculations of the three different parameters and of the co-citation matrix are detailed in the following.

Calculation of the time-based parameter between two authors

An academic article usually reveals the research interest, professional field, and specific contribution of an author. The publication time of an article may also implicitly indicate the period in which the authors worked on it. In the general academic research procedure, researchers usually read the literature first and formulate their problem from those studies, and then look for current solutions or algorithms related to their problem. Researchers, especially in engineering fields, cite recent studies in order to exploit, modify, or compare against them. This implies that the works of two authors could be related, collaborative, or successive when the publication times of their articles are close, especially when the articles are co-cited by the same article.

Nevertheless, the purposes of the citations in an article often differ, and the citations might belong to different professional fields (Bu et al. 2015; Brooks 1985). For example, a mathematical theory proposed in one citation may be cited to derive an algorithm, while a method from another citation, belonging to bibliometrics, may be cited to evaluate its results. When the analysis concentrates on a specific field, the result of author co-citation combined with publication time is unlikely to be affected. However, authors belonging to different professional fields would still appear prominently in the knowledge graph. In other words, the relationship between the authors of two citations with similar publication times should be reflected in the knowledge graph if their studies are in similar research fields.

Three academic researchers and their professional fields are listed in Table 1. Fig. 3 shows a histogram of the number of times each pair of authors is co-cited, by the difference in publication time. The distributions for the author pairs 1 and 2, 2 and 3, and 1 and 3 are drawn as solid lines with circular, triangular, and square markers at the data points, respectively. A total of 36 articles are co-cited, and 72% of them differ by less than 3 years. Manual examination shows that these articles, which are close in publication time, address similar or related issues in network science. The same similarity is observed for the other author pairs. Moreover, there are few co-citations with a difference of more than 5 years, and in such cases one of the articles could be a literature review or a classic study in a professional field. This implies that the authors’ fields of interest might be related in the same period when their co-cited articles differ only slightly in publication time. In other words, authors who have many co-citations with small differences in publication time can be placed closer together on the knowledge graph.
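The kind of tally behind such a histogram can be sketched as follows, assuming that each co-citation of an author pair is recorded as the two publication years of the co-cited articles. The function names and the toy years are hypothetical and are not the data from Table 1 or Fig. 3.

```python
from collections import Counter


def year_gap_histogram(pairs):
    """Map each absolute publication-year difference to its number of co-citations."""
    return Counter(abs(year_a - year_b) for year_a, year_b in pairs)


def share_within(pairs, max_gap):
    """Fraction of co-citations whose year difference is strictly below max_gap."""
    if not pairs:
        return 0.0
    close = sum(1 for year_a, year_b in pairs if abs(year_a - year_b) < max_gap)
    return close / len(pairs)


# Toy data: publication years of co-cited article pairs for one author pair.
pairs = [(2001, 2002), (2003, 2003), (1999, 2004), (2005, 2007)]
histogram = year_gap_histogram(pairs)
close_share = share_within(pairs, 3)
```

With these toy years, three of the four co-citations differ by less than 3 years, so `close_share` is 0.75. The same tally applied to real data would give the per-pair distributions and the share of co-citations that are close in time.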

Table 1 Three authors and their areas of interest
