Introduction

In recent years, Library and Information Science (LIS) in China has received much attention in both theoretical research and practical applications, and considerable progress has been made. It is therefore necessary to grasp the current status of LIS in China and its development trends.

Research advances of LIS in China have been examined by many researchers (e.g., Qiu et al. 2009; Zhang et al. 2011; Li 2011; Sun and Zhang 2011). In particular, there are numerous quantitative studies of the field based on keyword analysis. For example, Xiao et al. (2009) investigated keywords collected from nine core LIS journals and identified research hotspots based on keyword frequency, including digital library, information retrieval, information service, and information resources. Zheng (2010) conducted a statistical analysis of keywords extracted from the titles of journal articles and found four major research hotspots in LIS: basic theory, ontology and digital library, management and service, and technology and application. Li (2011) revealed the research hotspots and trends of LIS in China through a statistical analysis of high-frequency keywords, concluding that many research hotspots were not isolated but correlated, such as library 2.0 and digital library, ontology and information retrieval, and information resources and information service. Yang (2012a) collected relevant data from CJFTD and reviewed the status of LIS in China; the clustering analysis identified the major research contents, including information service and sharing, library service based on information technology, and competitive intelligence. More notably, Wang (2011) applied co-word analysis to the high-frequency keywords extracted from eight core LIS journals in China. By clustering the co-word matrix, that article identified nine hot research topics and, according to the number of links of each topics cluster, described the development status of each topic. These results indicated that the basic research areas and important research hotspots of LIS in China, such as information resources construction, knowledge management and information retrieval, remained stable. Library service and information service under the network environment would become new research hotspots, and digital library in particular would attract more and more attention in the near future. Yang (2012b) studied the current status of LIS in China from the perspectives of topic distribution and the structure of the co-word network, concluding that there were 15 research topics in LIS and that the research was still dispersed on the whole. Considering the overall characteristics of the co-word network, university library, digital library, knowledge management and information service were central in the field.

It can be seen that many studies on research advances of LIS in China have been carried out, and that some important research sub-areas or branches of LIS in China have continued to mature, such as information service, information retrieval, information resources, digital library, and competitive intelligence. Meanwhile, the application of information technology (for example, ontology, data mining, and the semantic web) has become a new research hotspot that promotes the development of LIS in China. However, the above works only revealed the basic properties of LIS based on word frequency statistics, and the relationships among research topics were not revealed clearly, such as their location within LIS in China as a whole, the internal correlation of each research topic, and the external correlation among topics.

Therefore, based on the above review, this article aims to reveal the overall structure of, and the correlations among, research topics of LIS in China with the aid of co-word analysis. First, the research hotspots (core keywords) are identified based on the co-word data; second, the internal and external correlations of research topics are described, including the topics clusters and their visualization in a two-dimensional map; third, the research status and trends are derived from the correlation structure and shown in a strategic diagram. On this basis, the article gives a more accurate picture of the research advances of LIS in China, which can help us grasp the current status and trends.

Methodology

Taking co-word analysis as methodology, our study first extracts keywords of LIS journal articles from 2008 to 2012. A co-word matrix is then generated based on the co-occurrence number of the keywords. We use multivariate statistical analysis and social network analysis to reveal the research status and trends of LIS in China. Finally, we conclude with our interpretations of the data.

Co-word analysis

Co-word analysis is similar to co-citation or co-occurrence analysis (Small 1973; Small and Griffith 1974); and it has been accepted as a reasonable way to map the relationships among concepts, ideas, and problems (Callon et al. 1983, 1991). Co-word analysis has been utilized to reveal research advances in many fields (e.g., Coulter et al. 1998; Ding et al. 2001; Wang et al. 2011; Liu et al. 2012).

In co-word analysis, it is assumed that keywords extracted from papers could represent a specific research direction, research topic or subject of a field. If two keywords co-occur within one paper, the two research topics they represent are related. Higher co-word frequency means stronger correlation in keywords pairs, which can further suggest that two keywords are related to a specific research topic (Cambrosio et al. 1993).

Co-word analysis has the potential to reveal patterns and trends in a specific discipline effectively (Ding et al. 2001). In this paper, we conduct a co-word analysis using multivariate statistics and social network analysis in order to identify the intellectual structure and development trends of LIS in China.

Data collection and pre-processing

The Chinese Journal Full-Text Database (CJFTD) is the largest Chinese journal full-text database. We use it as the data source in this study because it covers the 18 core LIS journals in China (as listed in CSSCI and shown in Table 1). A search of CJFTD for articles published between 2008 and 2012 returned 24,713 articles. After manually filtering out duplicate and irrelevant articles, we obtained 21,593 articles and 80,431 keywords (3.72 per article).

Table 1 18 core LIS journals in China

Subsequently, a symmetric co-word matrix is generated by counting the co-occurrences of each pair of keywords. The diagonal cells are treated as missing data, and the values of the non-diagonal cells are co-word frequencies. In order to obtain better results, a Pearson’s correlation matrix, indicating the degree of correlation of each keyword pair, is derived from the original matrix. The correlation matrix is the basis for further analysis.
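The construction described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually run in SPSS: the function names `coword_matrix` and `pearson_from_coword`, the toy papers, and the choice to set an undefined correlation to 0 are assumptions made for the example. Diagonal cells are handled as missing by dropping, for each keyword pair, the two positions that correspond to the pair itself.

```python
from itertools import combinations

import numpy as np


def coword_matrix(papers, vocabulary):
    """Count, for each keyword pair, the number of papers that list both."""
    index = {kw: i for i, kw in enumerate(vocabulary)}
    counts = np.zeros((len(vocabulary), len(vocabulary)))
    for keywords in papers:
        present = sorted({index[kw] for kw in keywords if kw in index})
        for i, j in combinations(present, 2):
            counts[i, j] += 1
            counts[j, i] += 1
    return counts


def pearson_from_coword(counts):
    """Correlate keyword profiles, treating diagonal cells as missing."""
    n = counts.shape[0]
    corr = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            # Drop positions i and j: they hold the (missing) diagonal cells.
            keep = [k for k in range(n) if k != i and k != j]
            x, y = counts[i, keep], counts[j, keep]
            if x.std() == 0 or y.std() == 0:
                r = 0.0  # correlation undefined; set to 0 by assumption
            else:
                r = np.corrcoef(x, y)[0, 1]
            corr[i, j] = corr[j, i] = r
    return corr


# Hypothetical toy data: each inner list holds the keywords of one paper.
papers = [
    ["digital library", "information service"],
    ["digital library", "information service", "ontology"],
    ["ontology", "information retrieval"],
    ["information retrieval", "ontology", "semantic web"],
]
vocabulary = ["digital library", "information service", "ontology",
              "information retrieval", "semantic web"]

counts = coword_matrix(papers, vocabulary)
corr = pearson_from_coword(counts)
```

In this toy example, "digital library" and "information service" co-occur in two papers and share the same co-occurrence profile with the remaining keywords, so their correlation is 1.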

Method of data analysis

Compared with other approaches, the multivariate statistical analysis and social network analysis are better for conducting co-word analysis (Ding et al. 2001). In multivariate statistical analysis, clustering and multidimensional scaling (MDS) are commonly used. With the aid of SPSS19.0, clustering analysis can directly demonstrate clusters of keywords and the relations among topics. Based on co-word data, the importance and status of each topics cluster in China can then be obtained. The map generated by MDS can indicate the correlation degree, research emphasis and directions through the location of research topics.

Furthermore, we analyze the network characteristics of the co-word matrix using Ucinet6.0, including centrality, density, and the core-periphery structure (Lee 2008). In a network, a node that has a large number of relations with others has a higher centrality and occupies an essential position. Centrality is therefore used to measure the degree of correlation among different topics. Similarly, a higher density means higher cohesiveness, that is, a higher degree of internal correlation among nodes. The density of a research field represents its capability to maintain and develop itself (Law et al. 1988). Therefore, a strategic diagram based on the centrality and density of each topics cluster is drawn in order to indicate the status and evolutionary trends of LIS in China.
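As a rough illustration of these two measures at the cluster level, the sketch below treats a correlation matrix as a valued network. The definitions used here are assumptions in the spirit of Callon et al. (1991), not the exact Ucinet procedure: density is taken as the mean tie strength among keywords inside a cluster, and centrality as the total tie strength between the cluster and all keywords outside it. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def cluster_density(weights, members):
    """Mean tie strength among the keywords inside one cluster."""
    n = len(members)
    if n < 2:
        return 0.0
    sub = weights[np.ix_(members, members)]
    # Exclude the diagonal and average over the n * (n - 1) ordered pairs.
    return (sub.sum() - np.trace(sub)) / (n * (n - 1))


def cluster_centrality(weights, members):
    """Total tie strength between the cluster and keywords outside it."""
    inside = set(members)
    outside = [k for k in range(weights.shape[0]) if k not in inside]
    return weights[np.ix_(members, outside)].sum()


# Hypothetical symmetric tie strengths among four keywords.
weights = np.array([
    [0.0, 0.8, 0.1, 0.2],
    [0.8, 0.0, 0.3, 0.0],
    [0.1, 0.3, 0.0, 0.6],
    [0.2, 0.0, 0.6, 0.0],
])
cluster_a, cluster_b = [0, 1], [2, 3]

density_a = cluster_density(weights, cluster_a)
centrality_a = cluster_centrality(weights, cluster_a)
density_b = cluster_density(weights, cluster_b)
```

Here cluster A is more cohesive than cluster B (0.8 against 0.6), while its external ties sum to 0.6.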

The higher the centrality, the more central the research topic is in the whole research field; the higher the density, the more mature the research topic is or the greater its potential. Accordingly, in the strategic diagram the x-axis stands for degree centrality and the y-axis stands for density, and the origin is placed at the average or median of the two measures. The four quadrants, with their different combinations of centrality and density, display the different status of research topics. In quadrant I, research topics have high centrality and density; they are mature and stand at the core of the field. In quadrant II, research topics are not central but are well-developed. In quadrant III, research topics are marginal and get little attention. In quadrant IV, research topics are central in the field but are undeveloped or immature (Callon et al. 1991).
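The quadrant rule can be written down compactly. The following sketch assumes the origin is the mean of the two measures and that values equal to the origin count as "high"; the cluster names and their centrality and density values are hypothetical and serve only to exercise the four cases.

```python
def quadrant(centrality, density, origin):
    """Return the strategic-diagram quadrant for one topics cluster."""
    origin_centrality, origin_density = origin
    central = centrality >= origin_centrality
    dense = density >= origin_density
    if central and dense:
        return "I"    # central and mature
    if dense:
        return "II"   # well-developed but not central
    if not central:
        return "III"  # marginal and immature
    return "IV"       # central but immature


# Hypothetical (centrality, density) pairs for four clusters.
clusters = {
    "A": (40.0, 0.35),
    "B": (10.0, 0.30),
    "C": (12.0, 0.10),
    "D": (45.0, 0.12),
}

# Origin taken as the mean of each measure (the median is the alternative).
origin = (
    sum(c for c, _ in clusters.values()) / len(clusters),
    sum(d for _, d in clusters.values()) / len(clusters),
)
placement = {name: quadrant(c, d, origin) for name, (c, d) in clusters.items()}
```

Using the median instead of the mean only changes how `origin` is computed; the classification itself is unchanged.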

Result and discussion

Frequency of keywords

In order to achieve more precise results, we standardize the keywords. First, we use the “Chinese classified thesaurus” (2005) to standardize them (using their English translations), also taking researchers’ advice into account. Second, we merge or alter terminology (e.g., “college library” is replaced by “university library”, and “information resource organization” is replaced by “information organization”) and filter out general terms (e.g., theories, construction, development, influence, applications, and competition) that are too broad to be of practical concern. Finally, 181 keywords with a frequency of more than 13 are chosen as the research sample for co-word analysis. We believe that these 181 keywords, with a total frequency of 23,127 (about 29% of the total), are able to represent the main contents of LIS research in China. Table 2 shows the top 50 keywords.
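The selection step and the reported shares can be checked with a short sketch. It assumes keywords are counted once per occurrence and that the coverage is measured against all keyword occurrences before filtering; the function name `select_keywords` and the toy papers are hypothetical, while the merge pairs and general terms are the examples quoted in the text.

```python
from collections import Counter

# Example merges and general terms quoted in the text (not the full lists).
MERGE = {
    "college library": "university library",
    "information resource organization": "information organization",
}
GENERAL_TERMS = {"theories", "construction", "development",
                 "influence", "applications", "competition"}


def select_keywords(papers, min_frequency):
    """Return keywords with frequency above min_frequency and their coverage."""
    freq = Counter()
    total = 0
    for keywords in papers:
        for kw in keywords:
            total += 1
            kw = MERGE.get(kw, kw)
            if kw not in GENERAL_TERMS:
                freq[kw] += 1
    selected = {kw: n for kw, n in freq.items() if n > min_frequency}
    return selected, sum(selected.values()) / total


# Hypothetical toy corpus.
papers = [
    ["college library", "digital library"],
    ["university library", "theories"],
    ["digital library", "ontology"],
]
selected, coverage = select_keywords(papers, min_frequency=1)

# Arithmetic behind the figures reported in the text.
reported_share = 23127 / 80431         # about 0.29 of all keyword occurrences
keywords_per_article = 80431 / 21593   # about 3.72
```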

Table 2 The top 50 keywords

To a great extent, co-word data (co-word frequency and co-word correlation coefficient) shows the importance of keywords. The top ten keywords with high co-word frequency and co-word correlation coefficient are noted in Table 3. These research topics are the focus of LIS in China. Note that Information Service is a major focus of LIS and serves as an important bridge connecting Library Science and Information Science, which also indicates the presence of interdisciplinary studies between Library Science and Information Science.

Table 3 The top 10 keywords with high co-word data

Multivariate statistical analysis

In this study, hierarchical clustering is chosen with Ward’s method as the cluster method and Squared Euclidean distance as the distance measurement. Two different clustering results are achieved according to different clustering steps (one and five) in order to accurately interpret research topics and correlations of LIS in China.
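For readers without SPSS, a comparable clustering can be sketched with SciPy. This is an assumption-laden stand-in, not the authors' procedure: SciPy's `ward` linkage works on Euclidean distances between rows, which is expected to give the same merge order as Ward's method on squared Euclidean distances, although the reported merge heights differ. The toy profiles and the number of clusters requested are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical correlation profiles for six keywords (one row per keyword).
profiles = np.array([
    [1.00, 0.90, 0.80, 0.10, 0.00, 0.10],
    [0.90, 1.00, 0.85, 0.00, 0.10, 0.05],
    [0.80, 0.85, 1.00, 0.10, 0.05, 0.00],
    [0.10, 0.00, 0.10, 1.00, 0.90, 0.80],
    [0.00, 0.10, 0.05, 0.90, 1.00, 0.85],
    [0.10, 0.05, 0.00, 0.80, 0.85, 1.00],
])

# Build the merge tree with Ward's minimum-variance criterion.
tree = linkage(profiles, method="ward")

# Cut the tree into a chosen number of clusters; cutting the same tree at a
# coarser level yields fewer, larger clusters.
labels = fcluster(tree, t=2, criterion="maxclust")
```

Cutting one tree at two different levels is what allows a fine-grained and a coarse-grained solution to be compared on the same data.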

If we set the clustering step to one, these keywords are divided into 13 clusters (Cluster1 to Cluster13). After much discussion, we find that this 13-cluster solution is a better fit for interpreting the current status of LIS research in China; and these 13 topics clusters could represent the current sub-areas of LIS in China more comprehensively.

Taking the frequency, co-word frequency and co-word correlation coefficient into consideration, the top five to ten keywords in each cluster are chosen to represent these 13 topics clusters because keywords with lower indexes attract little attention. The keywords of each cluster are shown in Table 4.

Table 4 13 topics clusters of the LIS field in China

In Table 4, the keywords in each cluster reflect the corresponding research topics, as well as the research directions of LIS in China. For example, Cluster1 is related to the application of Ontology and Semantic Web in Information Retrieval (e.g., Zhang et al. 2010); Cluster2 includes the research topics on Data Mining, Information Recommendation and Knowledge Discovery (e.g., Ding 2010; Wu 2010).

In order to achieve an in-depth understanding of each cluster, we calculate the total co-word frequency and correlation coefficient of each cluster, as well as their averages (shown in Table 5). The averages in Table 5 are treated as the essential indexes for distinguishing the topics clusters.

Table 5 The frequency and co-word data of each cluster
  1) Cluster11 has the highest average frequency, which indicates that it receives much attention in China. Its high level of co-word data also indicates its overall importance. Because they receive more attention from researchers and are highly correlated with topics in other clusters, Cluster6, 7, and 8 have higher average frequencies and average co-word data; Cluster8 in particular has the highest average co-word data. Thus topics in Cluster6, 7, 8, and 11 are research focuses and can also be treated as bridges within the whole research structure.

  2) Every index of Cluster4 is low because it receives little research attention and has little connection with other clusters. We may therefore conclude that topics in Cluster4 are relatively isolated and are either not a focus (being emerging or neglected) or in a marginal location in China at present. More discussion is provided in the next section.

  3) Cluster12 and 13 have low average frequencies but relatively high average co-word correlation coefficients. This reflects that little attention is given to them, but they are highly correlated with other topics. Conversely, Cluster1 and 9 have high average frequencies but low average co-word correlation coefficients, which indicates that researchers give them more attention but that they have fewer connections with other clusters. This demonstrates that topics in these two clusters are well-developed but independent. In fact, studies related to Information Retrieval, Ontology, and Library Cause are becoming an organic whole, and their development momentum is therefore strong.

If we set the clustering step to five, six clusters (Cluster1′ to 6′) are obtained. Table 6 shows the statistical data of these six clusters.

Table 6 The statistical data of each cluster with the clustering step set to five
  1) This clustering result demonstrates the relationships among the clusters in Table 4. Where clusters in Table 4 are aggregated into one cluster in Table 6, the research topics they contain are highly correlated. It can be seen that there are currently six large research directions of LIS in China; a detailed description is given in the following section.

  2) Technology and application, including Information Retrieval, Recommendation, Ontology, and Data Mining, is a major direction of LIS in China. The development of LIS in the network environment, especially Visualization and Information Policy, has received a lot of attention. Competitive Intelligence and Library Cause are two independent branches of LIS in China. Cluster4′ and 6′ are the two largest sub-areas of LIS in the country. Cluster4′ includes Information Sharing, Information Service, Information Resource, and Information Literacy. In Cluster6′, Information Organization, Digital Library, E-government, and Informatization are important at present.

  3) According to the average data, the research topics in Cluster4′ (including Cluster6, 7 and 8) are important research sub-areas of LIS, in that they are the most popular research topics in China. Conversely, the research topics in Cluster2′ (including Cluster3 and 4) are isolated and receive little attention, which corresponds to the analysis above. The other topics clusters show moderate development in China.

In order to give an intuitive understanding of the correlations among topics, we select the top two keywords in each cluster (a total of 26 keywords) to generate a two-dimensional map using MDS. The resulting configuration fits the data reasonably well, with a stress value of 0.1239 and an RSQ of 0.9287. In Fig. 1, research topics clustered together are linked with lines.
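The idea of placing keywords on a two-dimensional map from pairwise dissimilarities can be illustrated with classical (Torgerson) MDS in NumPy. This is a swapped-in technique for illustration only: it is not the SPSS routine used in the study, and the squared-correlation fit computed here is not directly comparable with the stress and RSQ values reported above. The toy dissimilarities are built from hypothetical 2D positions so that the recovered map can be checked; in a co-word setting the input would have to be derived from the correlation matrix (for example as 1 − r, which is an assumption).

```python
import numpy as np


def classical_mds(dissimilarity, dims=2):
    """Embed items in `dims` dimensions from a symmetric dissimilarity matrix."""
    n = dissimilarity.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared dissimilarities to obtain a Gram matrix.
    gram = -0.5 * centering @ (dissimilarity ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]  # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def map_distances(coords):
    """Euclidean distances between all pairs of points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical positions used only to generate a consistent toy input.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 0.5]])
dissimilarity = map_distances(points)

coords = classical_mds(dissimilarity, dims=2)
fitted = map_distances(coords)

# Squared correlation between input dissimilarities and map distances.
upper = np.triu_indices(len(points), k=1)
rsq = np.corrcoef(dissimilarity[upper], fitted[upper])[0, 1] ** 2
```

Because the toy input is exactly two-dimensional, the map reproduces the input distances; real co-word data would leave some residual misfit.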

Fig. 1 A two-dimensional map of research topics of LIS in China

These 26 keywords are divided into seven clusters. The relationships and degrees of correlation among topics clusters can be discerned from their locations and distances in Fig. 1. Dimension 1 (from left to right) represents the degree of internal correlation of each topics cluster: the research topics on the left have higher correlation, and those on the right have lower correlation. Dimension 2 (from bottom to top) represents the emphasis of research topics: from bottom to top, the research emphasis shifts from Information Science to Library Science.

  1) Research topics of Library Science are located at the top of Fig. 1, and topics of Information Science are in the lower part. It is worth noting that the distance between these two large research areas is long, indicating a lack of interdisciplinary LIS studies in China. On the whole, the correlations among these topics clusters are not high, and the studies of LIS in China are scattered.

  2) The majority of keywords belonging to the two large topics clusters are concentrated in the third quadrant, demonstrating the LIS research emphases in China, including Information Service, Information Resource, Knowledge Management, Information Sharing, and Information Literacy. The short distance between these two clusters also demonstrates the high correlation among their topics.

  3) The other topics clusters are scattered across Fig. 1, and the distribution of research topics is dispersed. We can therefore conclude that LIS research in China is imbalanced and is mainly focused on Information Service, Knowledge Management, and related research, whereas other research topics are unsystematic or immature.

Analysis of co-word network

In this study, the co-word correlation matrix is analyzed by Ucinet6.0 to obtain the centrality, density and a core-periphery matrix. Subsequently, according to the centrality and density of each cluster, a strategic diagram is drawn to clearly display the current status and trends of research topics. Furthermore, a relation network that visualizes the structure and relationship of keywords is generated by NetDraw.

Table 7 lists the top ten keywords with high degree centrality and betweenness centrality.

Table 7 The top 10 keywords with high centrality
  1) Keywords with a high degree centrality are Library Service, Information Service, Search Engine, Reference Service and Cloud Computing, indicating that the research topics represented by these keywords are at the core of LIS in China as a whole. Keywords with a high betweenness centrality are Information Architecture, Information Management, Tacit Knowledge, Information Behavior, and Knowledge Management, which act as bridges among research topics.

  2) In the core-periphery analysis, 63 keywords are identified as the core keywords of the whole structure. These keywords largely represent the research focuses of LIS in China, which include Information Service, Knowledge Management, Information Sharing, Information Resource, Information Literacy, Librarian, Knowledge Service, Library Service, E-Government, and Library Management.

The density of the correlation network with all keywords is 0.152, which is a relatively low level and indicates that LIS in China is decentralized. In this study, a new calculation is conducted to obtain the centrality and density of each cluster, as shown in Table 8. A strategic diagram whose origin is (27.88, 0.22) (the average of centrality and density) is generated in Fig. 2.

Table 8 Density and centrality of each cluster
Fig. 2 The strategic diagram of thirteen clusters

The strategic diagram reveals the status and trends of current LIS research in China by dividing these 13 clusters into four quadrants.

  1) As shown in Fig. 2, quadrant I includes Cluster7 and 11. The density and centrality of these two clusters are both high. High density indicates that these clusters have high internal correlation and that their research topics tend to be mature in China. High centrality indicates that these clusters are widely connected with other clusters. That is, research topics in Cluster7 and 11 are the cores of LIS in China. Note that the degree of development of Cluster11 is not very pronounced. By contrast, Cluster6 and 8 are close to quadrant I, indicating that they have a stronger tendency to become well-developed and to become the core of the field.

  2) Clusters in quadrant II (Cluster1 and 9) have close internal connections. That is, many researchers in China have paid attention to them, and these topics are well-developed and mature. However, their low centrality, meaning little connection with other clusters, reflects that the research topics in these two clusters are isolated in China. In general, Cluster1 and 9 are more independent and mature research areas.

  3) Many clusters are located in quadrant III, including Cluster2, 3, 4, 5, and 10. Their low density and centrality reveal that the research topics of these clusters are marginal and immature in China. This result corresponds to Table 5 and provides further evidence for the results noted above. That is to say, there are many marginal and immature research topics of LIS in China at present.

  4) Cluster6, 8, 12, and 13, located in quadrant IV, have high centrality and low density. This illustrates that the research topics in these clusters are at the core of the LIS field in China but are not mature. Thus topics in these clusters, including Information Sharing, Knowledge Management, and Information Ecology, will become new research trends that need more in-depth study. In particular, research topics in Cluster6 and 8, which are located close to quadrant I, have great potential for development.

In order to visualize the entire structure of these keywords or research topics of LIS in China, we draw the correlation network chart shown in Fig. 3. The relative size of a node represents the keyword’s frequency, and the relative thickness of a line represents the degree of correlation between two keywords. In Fig. 3, keywords belonging to the same research topic are aggregated together, forming a few big clusters and many scattered clusters. This result agrees with the observations above.

Fig. 3 The structure of keywords in 2008–2012 (the Pearson coefficient >0.6)
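The thresholding implied by the caption can be sketched as follows: only keyword pairs whose Pearson coefficient exceeds 0.6 are kept as links. The function name `network_edges`, the three keywords chosen, and the correlation values are hypothetical; the drawing itself (node size, line thickness) is left to a network tool such as NetDraw.

```python
def network_edges(keywords, corr, threshold=0.6):
    """List the keyword pairs whose correlation exceeds the threshold."""
    edges = []
    for i in range(len(keywords)):
        for j in range(i + 1, len(keywords)):
            if corr[i][j] > threshold:
                edges.append((keywords[i], keywords[j], corr[i][j]))
    return edges


# Hypothetical correlation values among three keywords.
keywords = ["information service", "library service", "ontology"]
corr = [
    [1.00, 0.72, 0.31],
    [0.72, 1.00, 0.60],
    [0.31, 0.60, 1.00],
]
edges = network_edges(keywords, corr)
```

With a strict inequality, a pair at exactly 0.6 is not linked, so only the first pair survives in this toy example.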

Conclusion

In this study, we conduct a co-word analysis using SPSS19.0 and Ucinet6.0 in order to obtain a clear understanding of the development of LIS in China over the five years from 2008 to 2012. The study identifies the major research focuses, the correlations among research topics, and the current status and trends in the field.

  1) The core keywords, identified according to frequency, co-word data and correlation network data, include Information Service, Knowledge Management, Knowledge Service, Information Resource, Digital Reference Service, Digital Library, Library Management, Social Network, Information Literacy, and Intellectual Property.

  2) We identify 13 topics clusters of LIS in China, where each cluster represents a research direction of LIS. Based on their correlations, these 13 topics clusters are aggregated into six large branches of LIS. Major and developed research topics of LIS have formed in China, but the majority of topics are marginal or immature. That is to say, on the whole, the development of LIS research in China is imbalanced, and the research topics are relatively decentralized given the low correlation among them.

  3) The research topics in Cluster 6, 7 and 8 are found to be at the core of LIS in China, in that they have been well-developed. Topics such as Information Service, Knowledge Management, and Information Sharing have great potential for development. In particular, topics in Cluster 12 and 13 are focuses but are undeveloped. In fact, studies on topics such as E-Government, Information Ecology, and Informatization are emerging and will become new hotspots of LIS in China.

In summary, based on co-word analysis, this study reveals the research advances of LIS in China and offers valuable results. It helps us grasp the current status and trends of LIS in China through an explicit description and a reasonable interpretation of the results. In the future, we will seek to compare domestic and international LIS research, in the hope of finding gaps and disparities, thus strengthening LIS research in China and connecting it with international trends.