Introduction

Identifying the most prominent and emerging themes of a journal is indispensable for intellectual and policy reasons in the sciences and social sciences. In the era of computer technology and automation, it has become easier to extract knowledge from text such as research papers, conference papers, newspaper articles and book chapters. Co-word analysis is one of the methods used to identify themes and the relationships among them. It is related to co-citation analysis (Small 1973; Small and Griffith 1974), but it deals directly with sets of terms shared by documents instead of shared citations. It counts and analyzes the co-occurrences of keywords in the publications on a given subject. In addition, it has the potential to describe the interactions that exist between different phases of the innovation process and to show whether basic or applied research is the moving force (Callon et al. 1991).

Co-word analysis is a technique that uses patterns of co-occurrence of words and phrases in a corpus. It establishes relations between the ideas and concepts of the subject area represented in the corpus. The occurrence of two keywords within the same paper indicates a relationship between the topics to which they refer (Cambrosio et al. 1993). A keyword or phrase with many co-occurrences acts as a central point that has many connections with other words in the corpus, and may correspond to a research theme. The technique measures the co-occurrence strength of terms and creates a set of lexical graphs that illustrate the strongest associations between terms.

In this study an attempt has been made to trace the intellectual structure that developed under the umbrella of Scientometrics over the years 2005–2010. Scientometrics is an international journal launched in 1978. The journal covers all aspects of scientometrics and published 46.31 % of the world's scientometrics research papers (Mooghali et al. 2011). The whole period (2005–2010) has been divided into two phases, i.e. 2005–2007 and 2008–2010. Nine hundred and fifteen articles published in this period have been analyzed using co-word analysis. Multidimensional scaling (MDS) has been used to plot and identify the proximity between keywords. The Salton index has been used to draw the network and understand the relationships between keywords. Characteristics of the network have been calculated using inner link, inner link key, outer link, outer link key, centrality and density, and, using these characteristics, developed and emerging themes have been identified.

Literature review

Co-word analysis has been used by many researchers to explore conceptual networks in different disciplines, such as Management information systems (Culnan 1986), Software engineering (Coulter et al. 1998), Polymer chemistry (Callon et al. 1991), Environmental acidification research (Law and Whittaker 1992; Law et al. 1988), Scientometrics (Courtial 1994), Neural network research (Van Raan and Tijssen 1993; Noyons and van Raan 1998), Biological safety (Cambrosio et al. 1993), Patents (Courtial et al. 1993), Optomechatronics (Noyons and van Raan 1994), Bioelectronics (Hinze 1994), Medicine (Rikken et al. 1995), Biology (Rip and Courtial 1984; Looze and Lemarie 1997) and Condensed matter physics (Bhattacharya and Basu 1998).

More recently, mature visualization techniques of co-word analysis have been applied in many subjects and disciplines, such as Information retrieval (Ding et al. 2001), Nanotechnology (Kostoff et al. 2006), International scientific studies (Hou et al. 2006), Human genome (Musgrove et al. 2003), Medical informatics (Wagner and Leydesdorff 2005), Management science (Yue 2012), Iranian nanoscience and technology (Mohammadi 2012) and Knowledge management (Ponzi 2002; Hou et al. 2006; Sedighi and Jalalimanesh 2014).

Methodology

This study is based on articles published in the journal Scientometrics during the study period (2005–2010). A total of 959 papers from that period were retrieved from the journal through the ‘Springer Link’ database. Of these, 44 papers were excluded because they are not articles but reviews, editorials, letters, short communications, etc. Finally, 915 articles were selected for co-word analysis. The ‘R’ software (R Core Team 2012) has been used in this study. It is open source statistical analysis software, freely available on the internet.

Steps of co-word analysis

Co-word analysis helps us to structure the data at various levels of analysis: as networks of links and nodes, as distributions of interacting networks, and as transformations of networks over time periods (Ding et al. 2001). It reduces the data to a visual representation that conveys the essential information contained in them. The step-wise procedure of the analysis is described below and shown in Fig. 1:

Data collection

Keywords play a very important role in co-word analysis. There are two ways to extract keywords from the targeted sample (i.e. journal articles, conference papers, reports, book chapters, etc.): non-parametric and parametric. The non-parametric approach relies on manually collecting the keywords given by the author, journal database, abstract database or citation database. Some important keywords can also be extracted from the abstract and title of the document. Coulter et al. (1998) selected keywords for their study that were added by indexers and selected from the articles, and then standardized them to remove variant forms of keywords. Courtial (1994) used keywords given by authors. Coulter et al. (1998) chose descriptors provided by the GUIDE database as keywords. Looze and Lemarie (1997) used keywords suggested by subject experts. Noyons and van Raan (1998) conducted a study using the co-occurrence of classification codes.

The other method uses software, such as NP tools, Ti.exe or GenEx, to extract keywords from the text. Such software uses an algorithm, for example a text-ranking algorithm, to select keywords that represent the core topics of a specific subject. Mihalcea and Tarau (2004) examined different algorithms and developed TextRank, a graph-based ranking model for keyword extraction from full text. This method of keyword extraction eliminates indexer bias and the need to build thesauri and classification systems. Ding et al. (2001) used the above-mentioned methods in their study.

The non-parametric way of keyword extraction has been used in this study. We have accepted not only all keywords added by the authors to the articles but have also collected the keywords assigned by LISA to the same articles, together with some keywords extracted non-parametrically from the articles’ titles and abstracts. Thus a total of 889 raw keywords were collected from the 915 articles. These keywords were standardized (the standardizing process is described below under ‘Data standardizing’), which yielded 240 unique keywords for the study. The average number of selected keywords per article is 16.07. The number of the 240 selected keywords found in a single article varies from 1 to 51. About 5.84 % of the articles have more than 30 keywords (2 of these papers have 51 keywords), 21.74 % have 21–30 keywords and 45.57 % have 11–20 keywords. Thus 26.85 % of the articles have up to 10 keywords, while 73.15 % use more than 10 keywords.

Data standardizing

After the keywords have been selected, they should be standardized with a vocabulary tool (Ding et al. 2001; Yue 2012), as related concepts are often expressed by different words. Standardization removes synonyms, ambiguity, general terms (terms that occur very frequently in a subject) and variant forms of a word.

All keywords chosen for this study from the titles and abstracts were standardized with the help of the LCSH, SLSH and Bibliometric Dictionary in order to remove variant forms of keywords, such as singular/plural forms and synonyms.

Keywords representing the same concept have been merged into a standardized form. Words with low frequency have been merged into a broader term. Words that have no broader or similar term, that are not specific, or that are common words, such as author, citation and bibliometric, have been ignored (Ding et al. 2001). This procedure eliminates variant forms of keywords such as singular/plural forms and synonyms. After multiple corrections, 240 keywords were chosen as the research sample for co-word analysis; each of them has a frequency of at least ten.
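The standardizing and selection procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation (the study used ‘R’ and manual checks against controlled vocabularies). The lookup tables `CANONICAL` and `GENERAL_TERMS`, the function names and the choice to count frequency as the number of articles containing a keyword are all assumptions made for illustration, and the merging of low-frequency words into broader terms is left out.

```python
from collections import Counter

# Hypothetical lookup tables; a real study would derive them from a
# controlled vocabulary rather than hard-code them.
CANONICAL = {
    "citations analysis": "citation analysis",
    "h index": "h-index",
    "hirsch index": "h-index",
}
GENERAL_TERMS = {"author", "citation", "bibliometric"}


def standardize(keyword):
    """Return the standardized form of a keyword, or None if it is ignored."""
    cleaned = " ".join(keyword.lower().split())  # normalize case and spacing
    cleaned = CANONICAL.get(cleaned, cleaned)    # map variants and synonyms
    return None if cleaned in GENERAL_TERMS else cleaned


def select_keywords(articles, min_frequency=10):
    """Standardize per-article keyword lists and keep frequent keywords.

    Frequency is counted as the number of articles containing a keyword
    (an assumption of this sketch).
    """
    standardized = []
    for keywords in articles:
        forms = {standardize(k) for k in keywords}
        forms.discard(None)
        standardized.append(forms)
    counts = Counter(k for forms in standardized for k in forms)
    kept = {k for k, n in counts.items() if n >= min_frequency}
    return [forms & kept for forms in standardized], counts
```

Calling `select_keywords(articles)` on a list of per-article keyword lists returns the filtered keyword set of each article together with the frequency counts, which can then feed a document-term matrix.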

Matrix calculation

After the 240 keywords had been selected, a document-term matrix was prepared on the basis of keyword occurrence in the corpus. With the help of the ‘R’ software, the document-term matrix was transformed into a symmetrical co-occurrence matrix of the 240 keywords, and the diagonal values of the matrix were treated as missing data.
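As an illustration of this step, the following Python sketch derives a symmetrical co-occurrence matrix from a binary document-term matrix. The study itself performed this step in ‘R’; the function name and the use of `None` to mark the missing diagonal are assumptions of the sketch.

```python
def cooccurrence_matrix(doc_term):
    """Turn a binary document-term matrix into a co-occurrence matrix.

    doc_term[d][t] is 1 if keyword t occurs in document d, else 0.
    Entry [i][j] of the result counts the documents containing both
    keywords i and j; the diagonal is set to None (missing data).
    """
    n_terms = len(doc_term[0])
    cooc = [[0] * n_terms for _ in range(n_terms)]
    for row in doc_term:
        present = [t for t, flag in enumerate(row) if flag]
        for i in present:
            for j in present:
                if i != j:
                    cooc[i][j] += 1
    for i in range(n_terms):
        cooc[i][i] = None  # diagonal treated as missing data
    return cooc
```

Because every pair (i, j) is incremented together with (j, i), the result is symmetrical by construction.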

Data mapping

Co-word analysis can be carried out with different methods. Many scholars use software: LEXIMAPPE (Law and Whittaker 1992; Cambrosio et al. 1993; Courtial 1994; Looze and Lemarie 1997); CAIR, Content Analysis and Information Retrieval (Coulter et al. 1998); BibTechMon, Bibliometric Technology Monitoring (Kopcsa and Schiebel 1998); and Kohonen’s neural network algorithm (Polanco et al. 1998).

The most commonly used methods, however, are multidimensional scaling and clustering techniques. Ding et al. (2001) applied these methods using the SPSS software. This study has used hierarchical clustering, multidimensional scaling and social network maps to visualize the co-word structures in Scientometrics.

Clustering

Clustering is a method of grouping objects into sets on the basis of similarity and dissimilarity. Single linkage is a widely used clustering technique, but it limits a cluster to 10 co-words only (Lee and Jeong 2008). Leydesdorff and Zaal (1988) suggested that Ward’s mode of analysis is better suited to symmetrical matrices than single linkage clustering, because the large number of zero hits may lead to ‘chaining’ in the first cluster and to isolates. This study has used the ‘Complete’ method of clustering, and five clusters of keywords have been formed for co-word analysis.
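The idea behind complete linkage can be sketched in a few lines of Python. This is a naive illustration, not the ‘R’ routine used in the study. It assumes that a symmetric matrix of pairwise keyword distances is already available; the text does not state which distance measure was used.

```python
def complete_linkage(dist, n_clusters):
    """Agglomerative clustering with complete linkage.

    dist is a symmetric matrix of pairwise distances. The distance between
    two clusters is the largest distance between any pair of their members;
    the two closest clusters are merged until n_clusters remain.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters
```

Because the cluster distance is the maximum over member pairs, a merge happens only when all members of both clusters are close, which avoids the ‘chaining’ effect of single linkage. With `n_clusters=5` the function would return five groups of keyword indices.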

MDS and Correlation

The following steps were followed in order to plot a two-dimensional MDS map for each cluster:

  (a) Pearson’s r was calculated on the basis of the co-occurrence matrix of keywords.

  (b) With the help of the ‘R’ statistical software, MDS was applied to all keywords of each cluster.

  (c) Five MDS maps were generated for each segment of the period.

  (d) Each generated map displays the relationships among the keywords of a cluster.

The similarity between two words (Pearson correlation coefficient) is calculated from all the co-occurrence frequencies that the two words have in the cluster, and this similarity matrix has been used for plotting the MDS map. Pearson’s r is used to identify the relations among the keywords (Leydesdorff and Zaal 1988; An and Wu 2011). The coefficient gives a value that indicates the similarity of a pair of keywords and lies between −1 and +1. A positive value indicates similarity between the keywords, whereas a negative value indicates dissimilarity. Negative values cause problems when the proximity among the keywords is measured; this problem can be solved by the linear transformation (r + 1)/2, which maps the values to the range 0 to 1 (Leydesdorff and Vaughan 2006). Words with a high Pearson correlation coefficient are located close together in the MDS map, and words located close together in the map have highly similar co-occurrence profiles within the whole matrix.
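A minimal Python sketch of the similarity calculation is given here for illustration. It compares the co-occurrence profiles (rows) of two keywords with Pearson's r and applies the (r + 1)/2 transformation. How the missing diagonal cells are handled is not specified in the text; skipping positions i and j when comparing rows i and j, and returning 0 for a constant profile, are assumptions of this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return cov / sqrt(var_x * var_y)


def similarity_matrix(cooc):
    """Rescaled Pearson similarity, (r + 1) / 2, between keyword profiles.

    cooc is a symmetric co-occurrence matrix whose diagonal is None.
    Positions i and j are skipped when comparing rows i and j, so the
    missing diagonal cells never enter the calculation (an assumption).
    """
    n = len(cooc)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            keep = [k for k in range(n) if k not in (i, j)]
            r = pearson([cooc[i][k] for k in keep], [cooc[j][k] for k in keep])
            sim[i][j] = sim[j][i] = (r + 1) / 2
    return sim
```

A dissimilarity suitable as MDS input can then be taken as 1 minus the rescaled similarity, so that keywords with similar profiles end up close together on the map.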

Network of keywords

Thin lines drawn between keywords on the MDS map form a network that corresponds to a social network. The links among keywords represent co-occurrence strength, which is measured by the Salton index. The Salton index can be calculated using the following formula (Yue 2012):

$$ Salton = \frac{C_{ij}}{\sqrt{N_{i} N_{j}}} $$

Here, \( C_{ij} \) = co-occurrence frequency of keywords i and j; \( N_{i} \) = frequency of keyword i; \( N_{j} \) = frequency of keyword j. The Salton value lies between 0 and 1.

In this study, a link between two keywords is drawn in the network map only when their Salton index is greater than 0.2 (Ding et al. 2001; Yue 2012).
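The Salton index and the link threshold can be illustrated with a short Python sketch. The function names and the input layout (a co-occurrence matrix `cooc` plus a list `freq` of keyword frequencies) are assumptions made for this example.

```python
from math import sqrt


def salton_index(c_ij, n_i, n_j):
    """Salton index: co-occurrence count over the geometric mean of frequencies."""
    return c_ij / sqrt(n_i * n_j)


def strong_links(cooc, freq, threshold=0.2):
    """Return (i, j, salton) for keyword pairs whose index exceeds threshold."""
    links = []
    for i in range(len(freq)):
        for j in range(i + 1, len(freq)):
            s = salton_index(cooc[i][j], freq[i], freq[j])
            if s > threshold:
                links.append((i, j, s))
    return links
```

For example, two keywords with frequencies 16 and 25 that co-occur in 10 articles have a Salton index of 10/√(16 × 25) = 0.5, so a line would be drawn between them.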

Result

In order to understand the overall co-word structure, the co-occurrence of keywords was analyzed for the whole period (2005–2010). The whole period was then divided into two segments, i.e. 2005–2007 and 2008–2010, so that dynamic changes between these two periods could be observed.

Co-word analysis (year 2005–2010)

General overview structure

In order to understand the relative position of the different clusters of the scientometric field during 2005–2010, a general overview structure was generated with the help of MDS. Each cluster was selected as an input variable. As shown in Fig. 2, each cluster is represented by the five keywords with the highest frequency in the cluster. The position of a cluster (sub-domain) in the graph depicts the relation of its keywords to other clusters. The strongest linkages between clusters, those with a Salton index >0.35, are shown by the network of thin lines.

Fig. 1 Steps of co-word analysis used in Scientometrics

Fig. 2 General overview structure of Scientometrics in 2005–2010

Detail co-word structure

In order to plot the detailed structure of each cluster, its keywords are selected as input variables to map the sub-domain, based on the correlation matrix of the 240 keywords. Detailed co-word structures (Figs. 3, 4, 5, 6, 7) are plotted to visualize the specific characteristics of each sub-domain (cluster) in the field of scientometrics. The thin lines represent links between two keywords with a Salton index >0.2. Keywords with a cross mark are outer link keys in the detailed co-word structure map.

Fig. 3 Detail co-word structure of cluster 1 (C1) in 2005–2010

Fig. 4 Detail co-word structure of cluster 2 (C2) in 2005–2010

Fig. 5 Detail co-word structure of cluster 3 (C3) in 2005–2010

Fig. 6 Detail co-word structure of cluster 4 (C4) in 2005–2010

Fig. 7 Detail co-word structure of cluster 5 (C5) in 2005–2010

Cluster 1 includes 43 keywords, among them high-frequency keywords such as Citation analysis, Scientific output, University ranking and Case study. Cluster 2 has 45 keywords, such as Cluster analysis, Self citation, Publication output, Citation pattern, Search strategy and Statistical analysis. Cluster 3 includes 54 keywords, such as Scientometric analysis, Keyword analysis, Collaboration, Authorship and Bibliometric law. Cluster 4 includes 51 keywords related to Interdisciplinary research, Power law, Centrality, Informetrics, Patent citation analysis and so on. Cluster 5 includes 47 topics relating to Website, URL, Co-link analysis, Gini-index, Web-citation, Co-word analysis, Collaboration pattern and so on.

It can be observed from the network plots of the different clusters that connection strength is not uniform. University ranking, Bibliometric analysis and Citation analysis in cluster 1 (C1); Citation pattern, Publication output and Citation index in C2; Patent analysis, Co-authorship network, Scientific collaboration, International collaboration and Lotka law in C3; International patent, University patent, Bibliometric map and Citation performance in C4; and Co-citation analysis, Co-citation count, Multidimensional scale, Website and Webometrics in C5 are active keywords, i.e. keywords that have more links.

For the comparative study of the different clusters, the following characteristics of the network are tabulated below:

  1. Outer link refers to the number of links of a cluster that connect it with other clusters (sub-domains).

  2. Inner link refers to the number of links that exist between keywords of a cluster. If A and B are connected by an arc, A and B each have one link.

  3. Total link refers to the sum of inner and outer links.

  4. \( Inner \, link\,\% = \frac{Inner \, link}{Total \, link} \times 100 \)

  5. \( Outer \, link\,\% = \frac{Outer \, link}{Total \, link} \times 100 \)

  6. Inner link key refers to the number of keywords that have links within the cluster.

  7. Outer link key refers to the number of keywords that have links from other clusters.

  8. Total key refers to the number of keywords within a cluster.

  9. \( Inner \, link \, key\,\% = \frac{Inner \, link \, key}{Total \, key} \times 100 \)

  10. \( Outer \, link \, key\,\% = \frac{Outer \, link \, key}{Total \, key} \times 100 \)

  11. \( Average \, link \, per \, key = \frac{Total \, link}{Total \, key} \)

  12. Centrality is defined as the mean of the outer links (sum of the Salton index of the outer links/number of outer links). It measures the intensity of the links of a given cluster with other clusters.

    $$ Centrality = \frac{\sum X}{Outer \, links} $$

    Here, X = values of the Salton index of the outer links.

  13. Density is defined as the mean of the inner links (sum of the Salton index of the inner links/number of inner links). It measures the strength of the links that tie together the words making up the cluster.

    $$ Density = \frac{\sum Z}{Inner \, links} $$

    Here, Z = values of the Salton index of the inner links.
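The characteristics listed above can be computed from a list of links, as in the following Python sketch. It is an illustration under stated assumptions, not the authors' ‘R’ code: the input layout (tuples of two keywords and their Salton index, plus a mapping from keyword to cluster label) and the function name are hypothetical, and each link is counted once, whereas the study may count one link per endpoint.

```python
def cluster_characteristics(links, cluster_of, cluster):
    """Link statistics of one cluster.

    links is a list of (keyword_a, keyword_b, salton) tuples and cluster_of
    maps each keyword to its cluster label. Each link is counted once
    (an assumption of this sketch).
    """
    inner, outer = [], []
    inner_keys, outer_keys = set(), set()
    for a, b, salton in links:
        in_a, in_b = cluster_of[a] == cluster, cluster_of[b] == cluster
        if in_a and in_b:
            inner.append(salton)
            inner_keys.update((a, b))
        elif in_a or in_b:
            outer.append(salton)
            outer_keys.add(a if in_a else b)
    total_links = len(inner) + len(outer)
    total_keys = sum(1 for c in cluster_of.values() if c == cluster)
    return {
        "inner_link": len(inner),
        "outer_link": len(outer),
        "inner_link_pct": 100 * len(inner) / total_links if total_links else 0.0,
        "inner_link_key": len(inner_keys),
        "outer_link_key": len(outer_keys),
        "average_link_per_key": total_links / total_keys if total_keys else 0.0,
        "density": sum(inner) / len(inner) if inner else 0.0,      # mean Salton of inner links
        "centrality": sum(outer) / len(outer) if outer else 0.0,   # mean Salton of outer links
    }
```

Running the function once per cluster label yields one row of a table of cluster characteristics.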

From Table 1, it can be observed that the intra-connection strength (inner link) of C1, C3, C4 and C5 is much higher than their outer link, whereas C1’s inner link is only slightly higher than its outer link. Overall, 63 % of the links are inner links; these links reflect the substantial relationships among the keywords of the clusters.

Table 1 Co-word structure of five clusters during year 2005–2010

The number of keywords that bear inner links is much higher than the number that bear outer links. Overall, 56 % of the keys bear inner links, while only 35 % bear outer links. On average, C1 and C3 have 2 links (outer and inner) per key, and the other clusters have only 1 link per key.

C4 has the highest centrality and the highest density in the whole research network of this period. Its centrality is slightly higher than its density, which means that this cluster has connections not only within the cluster but also, extensively, with the keywords of other clusters. This indicates that the cluster’s topics lie at the core of the research field and that the research subjects formed by its keywords are tending towards maturity. C2 has the highest density and a relatively high centrality. The inner connections in this cluster are intense, which suggests that its research topics have the capacity to sustain themselves and to develop over the course of time. C1 and C3 have the lowest centrality, with slightly higher density, which means that their topics are immature in comparison with the other clusters. C5’s centrality is slightly higher than its density, which means that its topics are in a strategic position in comparison with C1 and C2.

Co-word analysis of 2005–2007

General overview structure

During this period, 188 keywords were selected as the research sample. The 52 keywords (see Table 4 of Appendix 1) that did not reach the required frequency (<10) were excluded. The previous method was applied to generate the general overview map of Scientometrics in 2005–2007 by MDS (Fig. 8); each cluster is represented by its five most frequent keywords, and the network of thin lines shows the strongest relations between clusters according to the Salton index (>0.35).

Fig. 8 General overview of the structure of Scientometrics in 2005–2007

Detail co-word structure

In order to plot the detailed structure, keywords are selected as input variables to map the sub-domain, based on the correlation matrix of the 188 keywords. Thus five detailed sub-domain (cluster) structures (Figs. 9, 10, 11, 12, 13) are plotted to visualize the specific characteristics of each sub-domain (cluster) in the field of scientometrics.

Fig. 9 Detail co-word structure of cluster 1 (C1) in 2005–2007

Fig. 10 Detail co-word structure of cluster 2 (C2) in 2005–2007

Fig. 11 Detail co-word structure of cluster 3 (C3) in 2005–2007

Fig. 12 Detail co-word structure of cluster 4 (C4) in 2005–2007

Fig. 13 Detail co-word structure of cluster 5 (C5) in 2005–2007

During 2005–2007, each cluster has an average of 37 keywords.

Cluster 1 describes research topics related to Bibliographical analysis, Citation analysis, Hyper link, Collaboration network and Knowledge diffusion. Cluster 2 describes research topics related to Keyword analysis, Precision, Interdisciplinary research, Case study and Informetrics. Cluster 3 describes topics on Cluster analysis, Self-citation, H-index, Editorial board, Power law, Journal impact factor and Citation distribution. Cluster 4 focuses on Scientific output, Web page, Website, Internet, URL, Co-link analysis, Content analysis and Correlation coefficient. Finally, Cluster 5 appears to focus on Co-author analysis, Authorship, International collaboration, Patent analysis, Scientific collaboration, University rank and Correspondence analysis.

Observation of the co-word networks of the different clusters during this period (2005–2007) shows that C2 and C4 had more nodal keywords. Co-authorship network, Collaboration network, Hyper link and Bibliometric network in C1; Co-citation rate, Co-citation analysis, Co-citation cluster analysis, Network analysis, Centrality, Co-occurrence analysis, Citation environment, Citation index, Information retrieval, Text analysis and Scientometric analysis in C2; Citation count, Journal rank, Journal impact factor, Self-citation, Lotka law and Complex network in C3; University patent, Patent citation analysis, Patent productivity, Data mining, Domestic collaboration, Internal link, Web impact factor, Web-link analysis, URL and Cybermetrics in C4; and Patent analysis, Co-authorship analysis and International collaboration in C5 are active keywords that have more links with other keywords and represent the main research topics of this period. The characteristics of the five clusters are given in Table 2.

Table 2 Co-word structure of five clusters during year 2005–2007

As shown in Table 2, during this period C2, C3 and C4 have many more inner links than outer links, whereas C1 shows the opposite. This reflects abundant internal connections among keywords. C3 and C4 had about 88 % inner link keys, the highest share in the whole research network, and C1 had 67 %, the lowest. Conversely, C1 has the highest share of outer link keys, which indicates a strong relation with other clusters. Overall, 81 % of the keys are inner link keys, while only 42 % have outer links. Thus these links show a stable internal composition in each cluster but weak external relations among clusters.

On average, C1 had the highest number of links (inner and outer) per key, which shows a coherent network among its keywords, while C2 had the lowest number of links per key. The overall average is three links per key, which clearly indicates dense links between keywords.

In this period, C4 had the highest density and a high centrality, which means that this cluster not only had intense inner connections but also good connections with other clusters. On this basis it can be said that C4’s research topics had already received attention and were able to sustain themselves and develop over the course of time, and that this field was also considered important by the scientometrics research community. C1 and C2 both have lower density and centrality. These research fields are immature and still developing; these clusters did not have important networks or nodal points.

Co-word analysis of 2008–2010

General overview structure

During this period, only nine of the 240 keywords were excluded (see Table 5 of Appendix 2), as they did not appear with the required frequency (they occurred fewer than ten times in the corpus). Thus 231 keywords were selected as the research sample. The same method was used to generate the general overview structure of Scientometrics in 2008–2010 by plotting an MDS map (Fig. 14), and each sub-domain (cluster) was labeled, as before, by the five most frequent keywords within the cluster.

Fig. 14 General overview structure of Scientometrics in 2008–2010

In order to plot the detailed structure of each cluster during 2008–2010, each cluster’s keywords were selected as input variables to map the sub-domain, based on the correlation matrix of the 231 keywords. Thus five detailed sub-domain (cluster) structures (Figs. 15, 16, 17, 18, 19) were plotted to visualize the specific characteristics of each of the five sub-domains (clusters) in the field of scientometrics.

Fig. 15 Detail co-word structure of cluster 1 (C1) in 2008–2010

Fig. 16 Detail co-word structure of cluster 2 (C2) in 2008–2010

Fig. 17 Detail co-word structure of cluster 3 (C3) in 2008–2010

Fig. 18 Detail co-word structure of cluster 4 (C4) in 2008–2010

Fig. 19 Detail co-word structure of cluster 5 (C5) in 2008–2010

During this period (2008–2010), C1, C2, C3, C4 and C5 have 50, 76, 37, 37 and 31 keywords respectively. Cluster 1 includes research topics related to Bibliometrics, Scientific output, Citation index, Editor, International collaboration and Citation analysis. Cluster 2 includes keywords related to Website, Co-citation analysis, Citation rate, Patent citation, Patent application and Co-word analysis. Cluster 3 contains topics related to Co-author analysis, Cluster analysis, Interdisciplinary research and Hyper link. Cluster 4 includes topics on H-index, Co-citation cluster, Web page, Cited paper, Citation impact and Internet. Finally, cluster 5 includes research topics on Scientometrics, Collaboration pattern, Keyword analysis, Centrality, Social network analysis, Network analysis, Patent analysis and Knowledge mapping.

Observation of the plotted figures shows that Bibliometric indicator, Bibliometric analysis, International collaboration and Scientific output in C1; Patent-related topics, Co-citation, Websites, Web link, Out link, Web link analysis and Academic ranking in C2; Lotka law in C3; h-index, Web citation, URL and internet-related topics in C4; and Informetrics, Scientometrics, Network analysis, Co-authorship network, Scientific collaboration, Collaboration pattern and Network structure in C5 are active keywords that are associated with more keywords in their cluster and represent the main research themes of this period.

From Table 3, it can be seen that each cluster had more inner links than outer links. Cluster 2 (C2) has the highest number of outer links, whereas cluster 3 is at the other extreme, with only 16 % outer link keys. On average, each cluster has 67 % inner link keys, which indicates strong relationships between keywords. Overall, around 59 % of the links are inner links located within the clusters. These links reflect intense networks within the clusters but loose linkage with other clusters.

Table 3 Co-word structure of five clusters during year 2008–2010

Clusters 5 and 4 have the highest numbers of links per key, 5 and 3 respectively, which is evidence of abundant links among keywords and of a coherent network.

During this period C3 has the highest density and centrality in the whole research network. Its centrality is lower than its density, which means that the topics embedded in C3 had already formed their own subfields with a strong internal composition. C1 has the lowest density and centrality in the whole research network, which indicates its immaturity during this period. C2 is the cluster with the highest centrality, indicating its strong linkage with other clusters. C4 and C5 also have higher density than centrality, which indicates intense connections between their keywords and suggests that their topics have already received attention and are developed.

Discussion and conclusion

Co-word analysis is a powerful tool for identifying the linkages and associations among the different themes of a subject through the analysis of co-occurrence frequencies of keywords and phrases. It is used to detect the themes of a research area, the relationships between these themes, the extent to which these themes are central to the area and the degree to which they are internally structured (He 1999). This technique is mainly used for the quantitative study of literature. It is used to better understand and communicate the development of a scientific field (Borner et al. 2003) as well as for evaluation purposes (Noyons 2005). The analysis can be performed with different open source and proprietary software such as Word Stat, Text Mapping, Python, R and T-Lab.

‘R’, the open source statistical analysis software used in this study, is very sophisticated: it helps to find the frequency (tm package) and co-occurrences of keywords in the documents of a corpus. It can be used for MDS plotting, for creating networks among keywords and for calculating their characteristics. It can also be used to reduce a large space of related descriptors to multiple related smaller spaces that are easier to comprehend.

This study provides a description of the intellectual structure of the field of scientometrics from the perspective of frequently appearing keywords and phrases, using co-word analysis, hierarchical clustering and link analysis. An increasing number of new keywords, and their associations within or outside a cluster, is evidence of micro-level study and of the conception of new themes. Keyword associations provide a detailed structure through which the trends and patterns of a publication can be analyzed.

Hence, conclusions can be drawn about the structure of scientometrics from the co-word analysis of the scientometrics literature in the journal Scientometrics. The field is evolving rapidly, as demonstrated by the increasing number of keywords appearing in the publications. The data for 2005–2007 show a research trend focusing on hyper-text, hyper-link, citation analysis, co-citation cluster, network analysis, URL, web-link analysis, authorship, text analysis, co-word analysis, patent citation, co-occurrence analysis, cluster analysis, scientific output, international collaboration, information retrieval and collaboration network. Some topics, such as keyword analysis and web resource, were marginal in this period, without strong associations or high frequency, but gained popularity in 2008–2010.

Consistent themes such as citation analysis, network analysis, web-link analysis, Lotka law and collaboration network have been noticed in the second phase (2008–2010), but the focus is moving towards h-index, website, citation rate, co-author analysis, author co-citation analysis, web resource, keyword analysis, scientometric analysis, scientific collaboration, bibliographic coupling, information retrieval, webometrics and informetrics. At the same time, cluster analysis, content analysis, self-citation, university patent, Bradford law and Zipf law are becoming less prominent. Some new keywords, such as ARWU (Academic Ranking of World Universities), H-indicator, Back-link, Knowledge mapping, Web co-link analysis, Garfield impact factor, Trend analysis, Lotka distribution, Intra-regional collaboration, Co-citation analysis, Distributional model, H(2)index and Citation performance, have been found during the analysis (see Table 9 of Appendix 2).

Some scientometrics themes have well-defined genealogies, such as Citation analysis, Author productivity and Bibliometric analysis, while others appear to emanate from multiple preceding themes, such as H-indicator, Co-citation map and Co-citation link. Some topics, such as Knowledge mapping, Back-link, Weak-tie, Bayesian analysis and Stochastic model, emerge quickly with little evidence of ancestry. Thus it can be concluded that during 2005–2010 scientometrics had a good number of established themes and that new themes emerged owing to the dynamic nature of the subject.