Introduction

Research institutions are crucial to scientific innovation and the competitiveness of nations (Abramo and D’Angelo 2014; An et al. 2014). Yan and Sugimoto (2011) stated that the research institution is “a stable and representative element of studying the production, diffusion and consumption of knowledge”. An analysis of the research profiles of important institutions not only helps identify their weaknesses and find collaborators, but also provides information for making scientific policies at several levels (Hicks 2012). Institutions within the same field may differ in research emphasis in terms of specialization and diversification (Glänzel et al. 2009). Prominent institutions usually hold leading positions in hot research topics and maintain their competitive advantages (An et al. 2014). As a result, it is necessary to identify the research focus of different institutions and compare their specializations.

Library and Information Science (LIS) research in China has developed rapidly in the past decade, which has motivated researchers to map the intellectual structure of LIS research in China through co-author (Yan et al. 2010), co-citation (Hu et al. 2011), and co-word (Zong et al. 2013; Hu et al. 2013) analysis. However, the research topics of LIS institutions in China have seldom been analyzed in depth (An et al. 2014). In China, LIS research institutions are diverse in research functions and research backgrounds. For example, some institutions are government-led and have specific research functions, while others belong to different schools of universities with diverse research backgrounds. Their research focuses therefore differ significantly from each other, which should not be ignored when studying LIS research in China as a whole. Studying the research profiles of LIS institutions in China can give a clearer picture of LIS research in China and can improve research collaboration based on the institutions’ research advantages.

In this paper, we analyze the publication keywords of institutions from a new perspective: instead of identifying research focus using hot topics, our approach investigates the research topics that better represent the specialization of an institution, aiming to reveal the comparative advantages of different institutions. The goals of this paper are to (1) establish a quantitative method for identifying the institution-specific topics of the main LIS institutions in China; and (2) reveal the research focus and comparative advantages of these institutions based on their institution-specific topics. The key innovation of our method is to calculate the Activity Index of an institution for particular keywords, so that we can select its institution-specific topics.

Related work

Bibliometrics analysis of research institutions

Nowadays, academic ranking has become a very popular way of assessing institutions, especially universities (Stolz et al. 2010; Aguillo et al. 2010; Hazelkorn 2014). Although peer review is believed to be the best way to evaluate research output (Bornmann 2011), its major limitations are cost and time (Abramo et al. 2013). Therefore, bibliometric indicators have been widely used to capture the quantity and quality of academic research output. These indicators mainly focus on macro-level analysis of the research activity and performance of institutions based on their publication and citation data (Bornmann et al. 2012; Waltman et al. 2012). For example, the most common method uses the number of publications as the basic indicator of research activity and citation impact as the basic indicator of research performance (Moed et al. 2011), and there are many improved approaches based on these basic indicators. Zhu et al. (2014) incorporated international collaboration as a new dimension besides the quantity and quality dimensions when evaluating the research institutions of China. Recently, Bornmann et al. (2014) introduced a web application to visualize institutional performance within specific subject areas based on the basic indicators mentioned above.

As Thijs and Glänzel (2008) pointed out, “the usage of bibliometrics indicators cannot disguise that comparing institution remains often like comparing apples with pears”. Institutions usually have different subject profiles in the context of specialization and diversification, so it is necessary to use more sophisticated techniques in performance evaluation at the institutional level (Glänzel et al. 2009). Leta et al. (2006) developed a method to classify and map the European and Brazilian institutional landscape on the basis of research profiles. Miguel et al. (2008) used 53 subject categories to describe the disciplinary profile of an institution so as to reveal its intellectual structure and main research fronts. Thijs and Glänzel (2010) constructed field vectors based on the field assignment of journals to represent the research profiles of institutions; they also argued that even groups within the same field differ in their particular profiles (Thijs and Glänzel 2008). It is therefore necessary to distinguish their differences at a finer level. Belter and Sen (2014) recommended topics as one metric for the evaluation of NOAA R&D. An et al. (2014) found that earlier institution-level studies within a given research field lacked detail on specific research topics, so they defined the salient research topics as the high-frequency controlled terms and then compared the main LIS institutions in America and China.

Identifying research topic using keywords

In bibliometrics, publication keywords are regarded as the basic knowledge elements that represent a publication’s key concepts (Ding et al. 2001; Yi and Choi 2012). Many researchers have used keywords to reveal the intellectual structure of research entities in the form of co-word clustering (Rip and Courtial 1984; Callon et al. 1991) or keyword networks (Choi et al. 2011; Assefa and Rorissa 2013). When identifying the research topics of research groups, the high-frequency keywords assigned by more authors or indexers are usually considered more important. For example, Zhao and Wang (2011) used keywords with a frequency above 60 to analyze the research foci of pervasive and ubiquitous computing. Niu et al. (2014) used the top 30 high-frequency keywords to find significant differences between geosciences, multidisciplinary and environmental sciences. In some other databases, high-frequency keywords are used to analyze the hotspots and developing trends of research fields (Su et al. 2014).

However, the use of high-frequency keywords as research topics has long been questioned. Quoniam et al. (1998) argued that classical bibliometric techniques ignore innovative aspects, which are often described by long-tail keywords; Milojević et al. (2011) pointed out that some high-frequency keywords are non-specific words and are therefore of limited value in bibliometric analysis; Liu and Ma (2013) argued that high-frequency keywords cannot be used to distinguish knowledge content. In the analysis of research institutions, Moed et al. (2011) found that “an institution may show internally a low publication activity in a field compare to its output in other filed, but externally, compare to other institutions in the same field, be among the most productive ones.” Normally, a research institution may have a special preference for some topics due to its development history and research orientation (Huang et al. 2006). We regard such topics as “institution-specific” topics; identifying them can help us recognize the strategic positioning of a research institution as well as its comparative advantage.

Methodology

The Keyword Activity Index

Institution-specific topics are usually concentrated in the publications of the given institution and reflect its research focus. To identify these topics, we borrow the idea of the Activity Index from Frame (1977). The Activity Index derives from the Revealed Comparative Advantage Index in economics (Balassa 1965). It measures whether a country has a relative comparative advantage in a particular research field. It is defined as:

$${\text{AI}} = \frac{\text{the share of the given country in publications in the given field}}{\text{the share of the given country in publications in all science fields}}$$
(1)

In formula (1), AI > 1 indicates that the country emphasizes the given science field relative to its overall research production, while AI < 1 indicates that its research in the field is relatively weak compared to its overall research production.
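As a purely illustrative calculation with hypothetical shares (not taken from any of the cited studies), suppose a country contributes 6% of the world’s publications in a given field but 4% of the world’s publications across all fields. Formula (1) then gives:

$$\text{AI} = \frac{0.06}{0.04} = 1.5 > 1$$

so the field would count as emphasized by that country, whereas a 2% share in the same field would give AI = 0.5 and mark it as underemphasized.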

AI has been widely used to represent the research profiles of countries/regions. Thijs and Glänzel (2008) used AI to describe the national profile of 8 research fields in European countries. López-Illescas et al. (2011) used AI as an indicator to investigate the university’s research performance with considerations on discipline specializations. Pouris and Ho (2014) used AI to identify the emphasized and underemphasized research fields of Africa. Harzing and Giroud (2014) used AI to discover the competitive advantages of nations for different academic disciplines.

The AI has mostly been used at the macro level (to reveal the research field emphasis of countries or regions); a similar idea can also be applied at the micro level to measure whether or not a research institution emphasizes a given research topic. As research topics are usually represented by publication keywords, we can extend the AI to the Keyword Activity Index (KAI):

$${\text{KAI}} = \frac{\text{the share of the institution in publications containing the given keyword}}{\text{the share of the institution in all publications}}$$
(2)

The idea of the KAI resembles the Revealed Comparative Advantage Index in economics. We can think of an institution’s research on a given topic as a country’s export of a given product. In formula (2), the denominator is analogous to a country’s share of world exports; the numerator is analogous to the country’s share of world exports of a given product. Accordingly, the KAI can help us measure whether an institution has a comparative advantage in studying a given topic. KAI > 1 indicates that the topic is emphasized in the institution above its average level; KAI < 1 indicates that the topic is underemphasized in the institution.

In bibliometrics, an institution’s share of publications can be calculated by dividing its number of publications by the total number of publications of all institutions; its share of publications on a given topic can be calculated by dividing the number of its publications containing the given keyword by the number of all publications containing the given keyword. Let n (i, j) denote the number of publications containing the keyword K (i) in the given institution I (j); n (i, all) the total number of publications containing the keyword K (i) in all institutions; n (all, j) the number of publications contributed by the given institution I (j); and n (all, all) the total number of publications in the given research field. Accordingly, formula (2) can be rewritten as:

$${\text{KAI}} = \frac{{n(i,j)/n(i,{\text{all}})}}{{n({\text{all}}, j)/n({\text{all}},{\text{all}})}}$$
(3)
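Eq. (3) translates directly into a few lines of code. The following Python function is a minimal sketch rather than the implementation used in this study; the function name, the argument names, and the example counts are all hypothetical.

```python
def keyword_activity_index(n_ij, n_i_all, n_all_j, n_all_all):
    """Keyword Activity Index following Eq. (3).

    n_ij      -- publications of institution j that contain keyword i
    n_i_all   -- publications of all institutions that contain keyword i
    n_all_j   -- all publications of institution j
    n_all_all -- all publications in the research field
    """
    if min(n_i_all, n_all_j, n_all_all) <= 0:
        raise ValueError("denominator counts must be positive")
    keyword_share = n_ij / n_i_all        # numerator of Eq. (3)
    overall_share = n_all_j / n_all_all   # denominator of Eq. (3)
    return keyword_share / overall_share


# Hypothetical counts: the institution holds 40 of the 200 publications on
# the keyword (20%) but only 5,000 of 50,000 publications overall (10%),
# so KAI = 0.2 / 0.1 = 2 and the topic is emphasized.
example_kai = keyword_activity_index(n_ij=40, n_i_all=200,
                                     n_all_j=5000, n_all_all=50000)
```

A value above 1 marks the keyword as emphasized by the institution, and a value below 1 marks it as underemphasized.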

Procedures

The procedure for identifying the research focus of institutions based on their institution-specific keywords is shown in Fig. 1.

Fig. 1 Procedures of mapping research focus of institutions based on their specific keywords

  • Step 1: Constructing the background corpus and institutional corpuses based on the publications collected from an academic database. The background corpus consists of all publications from the core LIS journals in China. The number of publications for each given keyword is calculated and then listed in <keyword, frequency> format, so that we can get the value of n (i, all) from the background corpus when calculating the KAI values according to Eq. (3). Each institution corpus consists of all publications contributed by the given institution in the background corpus; the keywords are calculated using the same method as in the background corpus, so that we can get the value of n (i, j) in Eq. (3) from the related institution corpus.

  • Step 2: Constructing the keyword frequency matrix of institutions, in which the rows, columns, and cells indicate institutions, keywords, and the frequency of keywords (which equals the number of publications containing the keywords) within the institution’s publications.

  • Step 3: For each institution, we compute the KAI of each keyword in the institution corpus according to Eq. (3), and then select the keywords with the highest KAI values as its institution-specific keywords. Note that some keywords have a very low frequency in a given institution but extremely high KAI values because they were rarely assigned by authors in other institutions. We believe that they are not good representations of the institution’s research focus. Therefore, we select the keywords with high KAI values from among each institution’s high-frequency keywords as its institution-specific keywords. Specifically, in our data corpus, the high-frequency keywords are defined as the top 200 keywords of each institution, a cutoff that neither eliminates too many hot topics for the relatively large institutions nor retains too many unimportant topics for the relatively small ones. The KAI threshold is set to 1.5 because the keywords with KAI values between 1 and 1.5 are mostly words with general meanings (such as “information” and “library”), which do not help distinguish between institutions.

  • Step 4: Clustering the institution-specific keywords of each institution based on the co-word clustering method and identifying the comparative advantage research focus of each institution based on the clusters.
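Steps 1 to 3 above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code used in the study: the input format (a list of `(institution, keywords)` pairs covering the whole background corpus), the function name, and the toy data are hypothetical, while the defaults `top_n=200` and `kai_threshold=1.5` follow the values given in Step 3.

```python
from collections import Counter, defaultdict


def institution_specific_keywords(publications, top_n=200, kai_threshold=1.5):
    """Return {institution: {keyword: KAI}} for keywords passing both filters.

    publications -- iterable of (institution, keywords) pairs, one pair per
                    publication in the background corpus (assumed format).
    """
    background = Counter()                   # n(i, all)
    per_institution = defaultdict(Counter)   # n(i, j)
    pubs_per_institution = Counter()         # n(all, j)
    total = 0                                # n(all, all)

    # Steps 1 and 2: count publications per keyword, overall and per institution.
    for institution, keywords in publications:
        total += 1
        pubs_per_institution[institution] += 1
        for kw in set(keywords):             # count a keyword once per publication
            background[kw] += 1
            per_institution[institution][kw] += 1

    # Step 3: restrict to each institution's top-N keywords, then filter by KAI.
    result = {}
    for institution, freq in per_institution.items():
        overall_share = pubs_per_institution[institution] / total
        selected = {}
        for kw, n_ij in freq.most_common(top_n):
            kai = (n_ij / background[kw]) / overall_share
            if kai > kai_threshold:
                selected[kw] = kai
        result[institution] = selected
    return result


# Toy corpus with hypothetical institutions A, B and C.
toy_corpus = (
    [("A", ["ontology", "library"])] * 2
    + [("B", ["open access", "library"])] * 2
    + [("B", ["library"])]
    + [("C", ["library"])] * 5
)
toy_result = institution_specific_keywords(toy_corpus)
```

In the toy corpus, the general keyword "library" appears in every publication, so its KAI is about 1 for every institution and it is filtered out, while "ontology" and "open access" are retained for the institutions that concentrate on them.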

Data collection and process

Constructing the background corpus of LIS research in China

The Chinese Journal Full-Text Database (CJFTD) was used as the data source. Like other studies (Hu et al. 2013), we use the papers published in all core LIS journals in China to represent the research field of LIS. We collected the relevant papers by setting (1) the category of the data source to “core journals” in “Library and Information Science”, which contains all 19 core LIS journals in China; and (2) the time span from 2000 to 2013. In total, we retrieved 65,653 publications. Note that one of the core journals, Journal of the China Society for Scientific and Technical Information, has not been indexed by CJFTD since 2003, so we downloaded its publications for 2003–2013 from Wanfang Data (http://librarian.wanfangdata.com.cn/) in China. The data were easily merged using the NoteExpress format, which is available in both databases. We extracted all author-provided keywords of those publications as the representation of research topics, because these keywords are believed to be carefully selected to identify the distinctive research focus of scientific papers (Abrahamson 1996; McCloskey 1998), and they are easy to obtain from our database.

Next, we manually removed the keywords that were mistakenly assigned by indexers (Law and Whittaker 1992) and eliminated the domain stop words (He 1999) such as “research”, “counter measure”, “problem” and so on. Since different authors may use different keywords to describe the same concept, we normalized keywords with the same meaning into a standard form; for example, “electronic library” was normalized to “digital library”. Because there are so many keywords, we only checked keywords with a frequency higher than 1 (keywords occurring once were not checked). After the data preprocessing, 67,786 keywords were kept, with a total frequency of 277,721. The background corpus of LIS in China is constructed from these keywords.
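The stop-word removal and synonym normalization described above were done manually in this study. The Python sketch below only illustrates how such rules could be applied once they have been compiled; the lookup tables contain just the examples mentioned in the text, and the lowercasing and whitespace handling are assumptions added for the English illustration.

```python
# Hypothetical lookup tables holding only the examples named in the text;
# the lists actually used in the study were compiled manually.
DOMAIN_STOP_WORDS = {"research", "counter measure", "problem"}
SYNONYMS = {"electronic library": "digital library"}


def normalize_keywords(raw_keywords):
    """Clean the keyword list of one publication.

    Collapses whitespace and lowercases (an assumption for this sketch),
    maps synonyms to their standard form, drops domain stop words, and
    removes duplicates while keeping the original order.
    """
    cleaned = []
    for raw in raw_keywords:
        keyword = " ".join(raw.lower().split())
        keyword = SYNONYMS.get(keyword, keyword)
        if keyword and keyword not in DOMAIN_STOP_WORDS and keyword not in cleaned:
            cleaned.append(keyword)
    return cleaned


sample = normalize_keywords(
    ["Electronic  Library", "research", "digital library", " metadata "]
)
```

Here "Electronic  Library" and "digital library" collapse into a single standard form, and the stop word "research" is dropped.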

Identifying the top institutions and constructing institutional corpuses

As Van Raan (2005) pointed out, identifying the attribution of publications, i.e., finding the specific organization of a publication from the provided affiliation information, is an extremely important and challenging technical problem in bibliometric analysis. Taking into account the problems that Van Raan mentioned as well as the naming conventions of Chinese organizations, we first select the institution addresses that have at least 200 papers and extract their main organization names. After that, the main organization names are searched in the background corpus one by one, so that we can manually normalize the results. An example is shown in Fig. 2, in which the Wuhan University Library is not merged because it is a research institute independent of the School of Information Management.

Fig. 2 An example of institution name normalization

After normalizing the major research institutions, we select several top institutions as our analysis targets. The distribution of the number of publications over institutions is shown in Fig. 3. As we can see, there is a clear gap in output between the 8th and the 9th most prolific institutions. In addition, the top 8 institutions have been widely studied in bibliometric research and are universally recognized as the most important LIS research institutions in China. The top 8 most prolific institutions are therefore chosen for our analysis. Their institutional corpuses are constructed in the same way as the background corpus. The basic information of all corpuses is shown in Table 1.

Fig. 3 The distribution of papers over institutions

Table 1 Basic information of eight institutional corpuses and the background corpus of LIS in China

Clustering the institution-specific keywords

Following Step 3 of the “Procedures” section, the KAI values of the top 200 high-frequency keywords in each institution corpus are calculated based on Eq. (3). The keywords with KAI > 1.5 in each institution are selected as its specific keywords. Note that author keywords are unsystematic and some of them are idiosyncratic. These effects are more noticeable when identifying institution-specific keywords. Since the manuscript submission systems of the core LIS journals in China do not yet let authors select keywords from a controlled thesaurus, we mapped these specific keywords to established thesauri. The online “Library, Information Science & Technology Thesaurus” in the EBSCO HOST Database and the printed “Great Dictionary of Library and Information Science” were used as references.

For better analysis and comparison, we used the co-word analysis method to cluster keywords based on their co-occurrence in the publications of each institution. First, the co-word frequency of each pair of institution-specific keywords is counted as the number of publications containing both of them. Second, a symmetric co-word matrix is built for each institution, in which the rows and columns represent the specific keywords of the institution and each cell holds the co-word frequency of the corresponding keyword pair. These original matrices are then transformed into Pearson’s correlation matrices to indicate the similarity and dissimilarity of each keyword pair (Van Eck and Waltman 2009). Third, the matrices are imported into SPSS 19.0, in which cluster analysis is conducted using hierarchical clustering with Ward’s method and the distance measure set to “Squared Euclidean distance”. Note that there are few high-frequency keywords among the institution-specific keywords, so the relationships among them are much looser, leading to sparser co-word matrices. To obtain a better clustering result, we invited 3 experts in LIS to adjust the co-word clustering results and assign a subject label to each cluster.
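The first two stages of this procedure, building the symmetric co-word matrix and transforming it into a Pearson correlation matrix, can be sketched in plain Python as below. The sketch stops before the clustering step, which was carried out in SPSS and is not reproduced here. The function names and toy data are hypothetical, and two details are assumptions because the text does not specify them: diagonal cells are left at zero, and a keyword whose row is constant is given a correlation of 0.

```python
from itertools import combinations
from math import sqrt


def coword_matrix(publications, keywords):
    """Symmetric co-occurrence counts for the given keywords.

    publications -- iterable of keyword lists, one list per publication.
    Diagonal cells stay 0 (an assumption of this sketch).
    """
    index = {kw: i for i, kw in enumerate(keywords)}
    size = len(keywords)
    matrix = [[0] * size for _ in range(size)]
    for pub_keywords in publications:
        present = sorted({index[kw] for kw in pub_keywords if kw in index})
        for a, b in combinations(present, 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def pearson(x, y):
    """Pearson correlation of two equal-length rows; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(matrix):
    """Pairwise Pearson correlations between the rows of a co-word matrix."""
    return [[pearson(row_a, row_b) for row_b in matrix] for row_a in matrix]


# Toy data: four publications and three institution-specific keywords.
toy_pubs = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c", "d"]]
toy_counts = coword_matrix(toy_pubs, ["a", "b", "c"])
toy_corr = correlation_matrix(toy_counts)
```

The resulting correlation matrix is the kind of input that a hierarchical clustering routine would then take in the third stage.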

Result analysis

The clustering results of institution-specific keywords of the top 8 LIS institutions in China are listed in Table 2. A keyword may belong to different cluster subjects in different institutions, because different institutions may focus on different aspects of the same topic. From Table 2, we can see that, on the one hand, different institutions have their own focus on research topics; on the other hand, some institutions share the same research focus, but we are still able to find their differences in detail by comparing their institution-specific keywords.

Table 2 Clusters of institution-specific keywords of the top eight LIS institutions in China

For better visualization of the results, we displayed the hierarchical structure of clusters and the relationships between clusters and institutions in the EXTRAVIS software (Holten et al. 2007) (see Fig. 4). In Fig. 4, the clusters are organized hierarchically according to Table 2; for example, “S&T evaluation” and “Humanity & Social Science evaluation” are both sub-clusters of “research evaluation”. A link between an institution and a cluster indicates that the institution focuses on the research topics in that cluster.

Fig. 4 A visualization of relationships between clusters and institutions

In the following sections, we treat a cluster that belongs to only one institution as that institution’s “unique advantage subject”, and its other clusters as its “advantage subjects”. As Huang et al. (2006) pointed out, institutions focus on different research topics mainly because of their development history and research orientation. When analyzing the results in Table 2, we look to each institution’s research background and organizational function for the reasons why it focuses on those topic clusters.
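The distinction between “unique advantage subjects” and “advantage subjects” amounts to checking how many institutions each cluster label is attached to. The Python sketch below illustrates that rule; the function name and data layout are hypothetical, and the example input is a small excerpt based on the cluster labels discussed in the text, not the full content of Table 2.

```python
from collections import defaultdict


def classify_clusters(institution_clusters):
    """Split each institution's clusters into unique and shared subjects.

    institution_clusters -- {institution: list of cluster labels}
    A cluster attached to exactly one institution is a unique advantage
    subject; all other clusters are (shared) advantage subjects.
    """
    owners = defaultdict(set)
    for institution, clusters in institution_clusters.items():
        for cluster in clusters:
            owners[cluster].add(institution)

    classified = {}
    for institution, clusters in institution_clusters.items():
        labels = set(clusters)
        classified[institution] = {
            "unique_advantage_subjects": sorted(
                c for c in labels if len(owners[c]) == 1),
            "advantage_subjects": sorted(
                c for c in labels if len(owners[c]) > 1),
        }
    return classified


# Small excerpt for illustration only.
excerpt = {
    "Wuhan University": ["publishing", "informetrics"],
    "Nanjing University": ["social informatization", "informetrics"],
}
excerpt_result = classify_clusters(excerpt)
```

In this excerpt, “informetrics” is shared by both institutions, while “publishing” and “social informatization” each belong to a single institution.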

  1. School of Information Management, Wuhan University

This institute dates back to the Boone Library School in the early twentieth century, which was the first modern library school in China. Since then, Wuhan University has consistently ranked first in evaluations of LIS research and education in China. Nowadays, it is the biggest and most comprehensive LIS school in China, with 5 majors: Library Science, Information Management and System, Archives, Editing and Publishing Science, and E-Commerce.

As we can see in Table 2, Wuhan University has unique advantages in public information resource management and publishing. The advantage in public information resource management, particularly the development and utilization of government information, may originate from the integration of its traditional strengths in information science and archives. The comparative advantage in publishing may come from the breadth of its coverage (five majors). Besides, informetrics, information economics, user and service, and LIS education are also advantage subjects of Wuhan University. Based on Table 2, we find that (1) compared with Nanjing University, the informetrics research of Wuhan University focuses more on research content (such as content analysis and co-word analysis) and information visualization (such as information visualization and Multidimensional Scaling); (2) compared with Sun Yat-sen University, its research on user and service focuses more on user interaction and user experience; and (3) it has gained an edge in cloud computing service research, which is a recently emerging topic.

Besides, as a leader of LIS research in China, Wuhan University has also invested research effort in LIS education and gained a corresponding advantage in this subject. However, we also find that in the past decade, Wuhan University seems to have given up its longstanding research advantages in traditional library science.

  2. The National Science Library of China

NSLC is also known as the Documentation and Information Centre of the Chinese Academy of Sciences, which is the highest academic institution of science in China. With its dual role as a service-focused library as well as a research-focused institution, NSLC has contributed far more research outputs than other libraries in China, confirming its leading role in LIS research. The functions of NSLC are described on its homepage (National Science Library 2013) as:

NSLC functions as the national reserve library for information resources in natural sciences, inter-disciplinary fields and high-tech fields. It also provides services in information analysis, research information management, digital library development, scientific publishing, and promotion of sciences. NSLC is actively participating and leading national efforts to build a powerful National Scientific Information Infrastructure.

The institution-specific keywords of NSLC in Table 2 show that it has unique advantages in scientific research oriented service and open access, which correspond to its research functions. As “the national reserve library for information resources”, its research emphasizes topics related to scientific information management; as an institution which is “leading national efforts to build a powerful National Scientific Information Infrastructure”, its research has emphasized topics related to open access in the last decade. Besides these two unique research advantages, NSLC also has a unique research advantage in Classification. In fact, NSLC compiled the “Classification for Chinese Academy of Sciences Library”, which has been widely used in scientific institutions and university libraries in China. Other research advantages of NSLC include knowledge organization technology and competitive intelligence. Compared with that of ISTIC, its research on knowledge organization technology (such as Mashups, linked data, and metadata) is more closely related to serving scientific research. Its research on competitive intelligence focuses more on strategic intelligence than that of Nanjing University and ISTIC.

  3. School of Information Management, Nanjing University

Nanjing University is well known in China as the founder and manager of the Chinese Social Science Citation Index (CSSCI) Database, which is the most widely used database for academic evaluation of Humanity and Social Science in China (Su et al. 2014). As we can see from the institution-specific keywords of Nanjing University in Table 2, it has paid great attention to CSSCI-based research evaluation of Humanity and Social Science, such as academic evaluation, journal evaluation and evaluation systems. Another unique research advantage of Nanjing University is social informatization, in which its researchers have conducted a series of studies including a social informatization measurement system and an investigation of the digital divide in China. Besides, competitive intelligence is an advantage subject of Nanjing University. Compared with the application-oriented research of NSLC and ISTIC, its research in competitive intelligence focuses more on basic theory, in which counterintelligence research is a highlight. Another research advantage of Nanjing University is informetrics, in which it focuses more on relation mining (such as co-citation analysis and link analysis) than Wuhan University does.

  4. Department of Information Management, Peking University

As one of the most famous universities in China, Peking University is renowned for its humanistic spirit, which is also reflected in its LIS research. Researchers in this institution have paid great attention to the public library, library policy, and the information industry, showing their keen interest in social utilities. Other research advantages of Peking University include the basic theory of LIS, information retrieval, and LIS education, all of which are traditional subjects of LIS. The research focus of Peking University shows that it still maintains traditional LIS research and pays great attention to the social responsibility of libraries, which is rare because so much emphasis is currently placed on LIS technologies in China.

  5. Department of Information Management, Jilin University

Jilin University is a special case among the top LIS institutions in China. Its Department of Information Management sits within the School of Management and originated from the former engineering economics specialty, whereas the LIS schools or departments in the other 5 universities are all independent. It therefore seems natural that the research focus of Jilin University in LIS would be influenced by its unusual research background.

As we can see from Table 2, the institution-specific keywords of Jilin University show research preferences very different from those of other institutions. It has unique advantages in enterprise information management and knowledge management, making it the only top LIS institution in China that approaches LIS research from the perspective of the enterprise. This could be attributed to its special research background. Its other research advantages include knowledge organization technology and information ecology. Compared with NSLC and ISTIC, its research in knowledge organization technology focuses more on the technologies used in digital libraries, such as formal concept analysis and semantic grid. Its research in information ecology is not as mature as that of Central China Normal University.

  6. School of Information Management, Sun Yat-sen University

Sun Yat-sen University has a unique advantage in web analysis, in which online public opinion is a recently emerging topic in China. Its other research focuses include the public library, user and service, and LIS education. Its research on the public library is very similar to that of Peking University. Compared with Wuhan University, its research on user and service focuses more on service usability and user satisfaction.

  7. School of Information Management, Central China Normal University

Central China Normal University (CCNU) has a unique research advantage in the topic of knowledge community, despite its small output scale. Its other research advantages include information economics and information ecology, which overlap with those of Wuhan University and Jilin University. Its research in information ecology has reached a certain scale and shows more obvious advantages than that of Jilin University. Note that CCNU has many scattered research focuses, which indicates development potential.

  8. Institute of Scientific and Technical Information of China

The ISTIC is a national research and service institute subordinated to the Ministry of Science and Technology of China. This background endows it with a special research orientation. The functions of ISTIC are described on its homepage (ISTIC 2013) as:

ISTIC is designed to provide decision making support to the government agencies that take care of S&T activities in the country, in addition to its mandate of providing comprehensive information services to industry, universities, research institutes, and research personnel. It functions as a major pillar in the national science and technology innovation system, providing guidance to S&T activities and staging demonstrations for the same purpose.

Owing to its special responsibilities, it has many unique advantages despite its small output scale. As we can see in Table 2, it has a unique advantage in the evaluation of S&T in China, which is directly related to its function of “providing decision making support to the government agencies that take care of S&T activities in the country”. It has established the Chinese Science and Technology Paper Citation Database (CSTPCD), which is the most authoritative database for S&T evaluation in China. Besides, its research advantages in language processing, competitive intelligence, and knowledge organization technology reflect its efforts in S&T information analysis for providing “comprehensive information services”. Its unique research advantage in information retrieval, especially in machine translation, is a natural result of tracking foreign S&T development and providing related information services nationwide. Compared with Nanjing University and NSLC, its competitive intelligence research focuses more on industry. Another unique research advantage of ISTIC is the thesaurus. In fact, the Chinese Library Classification and the Chinese thesaurus were both constructed by ISTIC, so it is no surprise that ISTIC has paid great attention to word analysis and thesaurus construction.

Conclusion and discussions

In this paper, we use the Keyword Activity Index (KAI, adapted from the Activity Index) to identify which topics a given institution emphasizes in the context of LIS research in China. We chose the eight most prolific LIS research institutions in China for analysis. For each institution, the keywords with the highest KAI values among its top 200 high-frequency keywords are selected as its institution-specific keywords. To investigate the research focus of these institutions, we cluster the institution-specific keywords of each institution as a representation of its research advantages, because the institution is more productive on those research topics relative to its overall output.

Our research, based on a large-scale dataset of 65,653 publications, finds that: (1) the KAI indeed helps to better position the research advantages of each LIS research institution in China; and (2) the top 8 LIS research institutions in China each have their own research advantages. Some of these advantages are unique, emphasized by only one institution; others are shared by several institutions, although differences between institutions within the same research subject can still be distinguished by their institution-specific keywords. In the analysis, we found a strong relation between the research advantages of institutions and their research functions and research backgrounds.

Keywords of publications have been widely used as a proxy for research topics in bibliometric research. Some researchers use the high-frequency keywords in a research group’s publications to represent its topical focus. However, some high-frequency keywords are general concepts shared by many other researchers. High-frequency keywords alone may therefore be ineffective for uncovering the topical specialization of a research group, especially when the aim is to identify differences in research topics between groups. In this paper, we proposed a method that selects institution-specific keywords from a more holistic perspective, that is, by considering the frequency of keywords both in the institution and in other similar institutions. A background corpus representing the whole LIS field in China is constructed to help identify whether a keyword is specific to a few institutions or commonly shared by many. As Table 2 shows, the institution-specific keywords differ between institutions. Thus, we can use fewer keywords to reveal the research advantages of institutions, and the results fit well with the research functions and research backgrounds of those institutions.
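The comparison between an institution and the background corpus can be illustrated with a minimal sketch. It assumes that the KAI takes the usual Activity Index form, namely the share of a keyword among an institution's keyword occurrences divided by the share of the same keyword in the background corpus. The function name, the record layout, and the toy data are hypothetical and serve only as an illustration; the formulas defined earlier in the paper remain authoritative.

```python
from collections import Counter, defaultdict


def keyword_activity_index(records):
    """Compute an Activity-Index-style ratio for every (entity, keyword) pair.

    records: iterable of (entity, keywords) pairs, one pair per publication.
    Returns a dict {entity: {keyword: ratio}}.
    Assumption: ratio = (keyword share within the entity)
                        / (keyword share in the background corpus),
    where the background corpus is the union of all records.
    """
    per_entity = defaultdict(Counter)
    background = Counter()
    for entity, keywords in records:
        # Count each keyword at most once per publication.
        for kw in set(keywords):
            per_entity[entity][kw] += 1
            background[kw] += 1

    total_background = sum(background.values())
    result = {}
    for entity, counts in per_entity.items():
        total_entity = sum(counts.values())
        result[entity] = {
            kw: (n / total_entity) / (background[kw] / total_background)
            for kw, n in counts.items()
        }
    return result


# Hypothetical toy data: two institutions, two publications each.
records = [
    ("A", ["information retrieval", "thesaurus"]),
    ("A", ["thesaurus", "machine translation"]),
    ("B", ["information retrieval", "digital library"]),
    ("B", ["information retrieval", "bibliometrics"]),
]

kai = keyword_activity_index(records)

# Values above 1 mark keywords that are over-represented in an institution
# relative to the background corpus.
for entity, scores in kai.items():
    specific = sorted(kw for kw, value in scores.items() if value > 1)
    print(entity, specific)
```

In this toy example, "information retrieval" is the most frequent keyword overall but is over-represented only in institution B, whereas "thesaurus" stands out for institution A. In practice the ratio would be computed only for each institution's high-frequency keywords, so that rare keywords with unstable shares do not dominate the ranking. Because the entity is an arbitrary label, the same sketch works when publications are grouped by author, journal, or country.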

Our method can also be applied to other research in several ways. First, keywords assigned by indexers or extracted automatically can also be measured with the KAI using the same formulas, so as to better represent the research topics of institutions. Second, our method can be generalized to discover the research focus of other research entities (authors, journals, countries, etc.), research domains, time spans and so on. Their research focus can be identified by extracting their specific keywords, which are more concentrated in their own publications than in all publications.

Our method has some limitations. The first is the source of keywords. In this paper we use author keywords because LIS journals and databases in China do not provide options for authors to select keywords from a controlled thesaurus. Author-assigned keywords are relatively subjective and some idiosyncratic keywords may be used; the effects are more noticeable when identifying institution-specific keywords. It cost us a lot of time and effort to clean the keywords and match them with the established thesaurus. Therefore, we suggest that future researchers use controlled terms as the objects of analysis if possible. Another problem is how to set the criteria for selecting keywords for later analysis, which is also a common problem in keyword-based bibliometrics. In this paper the criteria are set manually, taking the actual data into account. We believe that there is still room for improvement, and we will look for a more quantitative method in future work.