Introduction

Knowledge production and scientific change are crucial to economic growth (Solow 1956). Rapid scientific development has substantially stimulated knowledge convergence, which affects the progress of innovation (Kodama 2005). With that trend, knowledge relatedness and scientific relatedness have attracted great attention (Joo and Kim 2010; Makri et al. 2010). The relatedness of a scientific topic with other topics varies considerably. For instance, some scientific knowledge is diverse enough to interact with other knowledge (Geum et al. 2012), while some knowledge elements have low combinatorial potential. Knowledge productivity is unbalanced across countries (Guan and Ma 2007), and the geographical distribution of knowledge elements is also uneven. For example, scientific research in some developed countries covers a vast range of knowledge elements and draws on a large pool of scientists (Livingstone 2010). The distribution of knowledge productivity has changed over the past decade due to global trends and international collaboration (Zhang et al. 2010). Traditional centers such as North America have been overtaken by some Asian countries (Matthiessen and Schwarz 2010).

The USA and China are almost the same in geographical area (about 9.629 million and 9.600 million km², respectively), and the population of the USA is about a quarter of China’s. China’s economy has grown rapidly in recent years; its GDP has surpassed Japan’s since 2010 and now ranks second only to that of the United States. Moreover, China’s average annual GDP growth rate was 9.9 % during 1952–2013 (National Bureau of Statistics of China, http://www.stats.gov.cn/). Some scholars have projected that China’s economy will equal that of the USA in size in about 10 years (Virmani 2005). China is also the second largest energy consumer. Energy demand pressures have prompted the government to review its energy and economic policies in support of sustainable development (Li et al. 2007). Both the USA and China are making impressive strides in harnessing solar energy (Fthenakis et al. 2009). For example, China has the largest solar panel producer (Bradsher 2010), and the United States shows significant research strengths in the solar energy field (Hassan et al. 2014). The growth of China’s scientific and technological capacities can be expected to reinforce its economic development (Zhou and Leydesdorff 2006). The USA and China are both prominent scientific producers. China has significantly increased its publication output in solar energy in recent years, bringing it close to that of the USA. A comparative study of solar energy research in these two prominent nations can therefore contribute to scientific research and technological practice in this field.

This raises some questions: How does the relatedness between a knowledge element and others evolve over time? What is the difference between relatedness in the USA and in China? To what extent does scientific relatedness influence literature productivity? Previous studies have shown that technological relatedness can explain the entry and exit of cities specializing in particular knowledge (Rigby 2013), and that relatedness is a main driving force behind technological change (Boschma et al. 2013). However, few scholars have focused on scientific relatedness (Boschma et al. 2014). There is little empirical evidence on the impact of scientific relatedness on innovation output, especially at the country level.

The goals of this article are to compare the evolution of scientific relatedness in the USA and China and to explore the impact of relatedness on scientific output in the period 2000–2013. Specifically, we first analyze the relatedness between knowledge topics, and then investigate the relatedness between knowledge topics and countries. We then assess the extent to which the scientific output of a knowledge topic depends on its relatedness. To measure the main indices, we use keywords to capture the co-occurrence relationships among knowledge topics and to explore the dynamic evolution of scientific topics. Evolutionary patterns of relatedness are also compared between the USA and China. Analyzing relatedness at the country level allows us to examine contrasts between scientific knowledge in the USA and China. More specifically, we can compare how relatedness affects output in the two countries.

The rest of the paper is organized as follows. Section “Literature review” reviews related literature on relatedness and innovation theories, which leads to our hypotheses. Afterwards, in Sect. “Data and context”, we describe our data and compare the knowledge topics in the USA and China by constructing co-occurrence networks. Section “Relatedness between topic and topic” analyzes the relatedness between topics and explores its dynamic evolution. Section “Relatedness between topic and country” investigates the relatedness between topic and country. Section “Analyses: relatedness and output” empirically tests our hypotheses and explains the results. Finally, Sect. “Conclusion” presents the conclusion and discussion.

Literature review

The production of innovation usually demands knowledge from multiple scientific fields (Schumpeter 1934). Attention to the relationship between knowledge and innovation output raises a question: what kind of influence does the relatedness of knowledge exert on innovation output? The relatedness between a topic and other topics can bring about combinatorial opportunities (Wang et al. 2014). However, if the combinatorial capacity of a knowledge topic with other topics is too strong, the potential of their creative convergence could be largely exhausted (Kim and Kogut 1996). Relative to a knowledge topic with high relatedness, a topic with weak relatedness tends to have an advantage in scientific performance. First, knowledge transfer among highly related topics is repetitive most of the time, and the content transferred may be homogeneous and of little use. By contrast, the information transmitted among topics with low relatedness is non-redundant and useful (Hansen 1999), which exposes such topics to a variety of other topics (Granovetter 1973). Different perspectives and approaches can extend innovative expertise (Coser 1975), which is crucial to innovation. Second, because knowledge topics with low relatedness have low similarity (Granovetter 1983), knowledge exchange among them creates opportunities for ideas to collide and spark innovation (Jerome 2013), which helps to stimulate innovation across different scientific backgrounds and areas. Existing studies suggest that the technology portfolio of actors should not be too related (Granstrand et al. 1997; Hussinger 2010). Diversification of related knowledge can help researchers to understand emerging technological opportunities and keep up with rapid technological developments (Hussinger 2010). Moreover, a knowledge topic with weak relatedness faces fewer constraints and thus enjoys strong autonomy. Relatively weak relatedness brings fresh knowledge, making innovation easier. Thus, a knowledge topic with low relatedness to other knowledge topics tends to be strong in innovation output.

At the country level, relatedness between topic and country measures the closeness between a knowledge topic and a country. We expect that relatedness between topic and country also has a negative impact on scientific output, for two reasons. On the one hand, although knowledge production shows path dependence (Garud and Karnoe 2013), such path dependence limits the learning of new knowledge. Relatively high relatedness between topic and country indicates that the topic is cognitively close to the country and that a high percentage of the knowledge topics associated with that topic are present in that country (Boschma et al. 2014). As such, the knowledge topic in that country may reach a point where its scientific value has been exhausted (Carnabuci and Bruggeman 2009), suggesting that the costs may ultimately exceed the benefits (Phelps et al. 2012). The exploration space of a topic that is highly related to the country is limited and restricted. We therefore predict that further output on that topic in that country is no longer fruitful. On the other hand, knowledge production is inevitably becoming globalized (Hassan and Haddawy 2013). International collaboration promotes international communication and knowledge dissemination (Ma et al. 2014), which narrows the cognitive gap among nations. For this reason, the scientific output of a topic with low relatedness to the country has great potential to increase. Based on the above discussion, we propose our hypotheses:

Hypothesis:

Scientific relatedness has a negative relationship with scientific output.

Hypothesis (a):

The relatedness between the topic and other topics negatively affects its scientific output.

Hypothesis (b):

The relatedness between the topic and country negatively affects its scientific output.

Data and methodology

Data and context

We draw on data sets of articles to analyze the research innovation networks of the USA and China in the period 2000–2013. Though solar energy research has a long history, it was not seen as a priority area of research until recently (Hassan et al. 2014). The period 2000–2013, which offers relatively new information and plentiful empirical data, can be used to capture recent research trends and identify the current top collaborative authors. Moreover, we found that in 1999 China had only 31 papers in this field, whereas the USA had 333. On the one hand, a small number of scientific documents may lead to sparse networks. On the other hand, the large gap between the United States and China before 2000 may make the two countries hard to compare. Because China’s scientific output increased significantly and its gap with the United States narrowed after 2000, our study interval is 2000–2013. Two scientific data sets downloaded from the Web of Science (WoS) form the core of our study. The database has been utilized extensively in earlier research (Zhu and Guan 2013). The search terms adopted in this paper to retrieve solar energy records were developed by Sanz-Casado et al. (2013), and the “Appendix” displays the retrieval profile for solar energy. Many research fields utilize bibliometric methods to analyze research trends from publication output, keywords, subject categories, etc. (Mallik and Mandal 2013). The total number of articles identified in the WoS is 70,562 over the period 2000–2013. The sample includes 156 countries. The United States ranks first with 14,342 publications, and China ranks second with 11,896 articles. Figure 1 plots the number of articles from the USA and China since 2000. Both countries show significant growth in research productivity. As Fig. 1 clearly displays, the USA occupied the leading position and China was the second largest scientific producer before 2011. The exponential growth in solar energy articles in both countries stands out. China has a relatively high growth in scientific productivity in solar energy (CN: \( y = 28.6e^{.341x} \), \( R^{2} = .995 \)) and has exceeded the USA since 2011 (US: \( y = 227.2e^{.168x} \), \( R^{2} = .965 \)).
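As a rough consistency check on the two fitted curves, the point at which China’s fitted output overtakes that of the USA can be found by equating the two exponentials. The sketch below assumes that x is a year index with x = 1 for 2000 (the text does not state the indexing), and the function name `crossing_index` is purely illustrative.

```python
import math

def crossing_index(a1, b1, a2, b2):
    """Return x at which a1*exp(b1*x) equals a2*exp(b2*x); requires b1 != b2."""
    return math.log(a2 / a1) / (b1 - b2)

# Fitted curves reported for Fig. 1:
#   CN: y = 28.6 e^{.341x}    US: y = 227.2 e^{.168x}
x_cross = crossing_index(28.6, 0.341, 227.2, 0.168)

# ASSUMPTION (not stated in the text): x = 1 corresponds to the year 2000,
# so the calendar year is 1999 + x.
year_cross = 1999 + x_cross
print(round(x_cross, 2), round(year_cross, 1))
```

Under that indexing assumption the fitted curves cross near x = 12, that is, around 2011, which agrees with the observation that China exceeded the USA after 2011.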

Fig. 1 Growth in solar energy scientific articles

A large proportion of articles in each country have more than one topic. The convergence of knowledge contributes to the growth in research output for the USA, China and the world. Keywords are related to topics (Cataldi et al. 2010) in the set of solar energy publications. Keywords summarize the core content of the literature and can reflect the distribution and characteristics of themes in a particular field (Glänzel and Thijs 2012). The use of keywords in the set of solar energy publications gives us an indication of the cognitive development of topics within the field (Rip and Courtial 1984). Keywords can be used to map knowledge structure (Su and Lee 2010) and are useful and important for both readers and authors. Accurate and concise keywords can arouse readers’ interest and attention, which benefits the citations of an article, so authors generally choose keywords carefully and accurately. We also considered using the WoS classification to construct the co-occurrence network, but the small number of categories (N = 200 here) led us to abandon the idea. We used data from WoS to track each country’s scientific activities. The dataset contains details about articles, including publication dates, countries, and keywords. WoS keywords have been considered valid proxies for tracing scientific development. Researchers assign keywords carefully, and the keywords are usually anchored in a subject matter. Changes in a country’s keywords over time can be clearly discerned, which makes the internal cognitive structure visible (Mutschke and Haase 2001). In this study, we use a keyword as a proxy for a topic. We clean the dataset because different words may represent the same concept, such as “solar cells” and “solar cell”. Figure 2 shows the change in topic numbers for the USA and China. As displayed in Fig. 2, the number of topics per year in the USA is far higher than in China before 2010. Nevertheless, the gap between the USA and China has narrowed remarkably year by year. Both the USA and China exhibit an exponential growth pattern (US: \( y = 383.2e^{.167x} \), \( R^{2} = .961 \); CN: \( y = 78.94e^{.306x} \), \( R^{2} = .990 \)). However, growth in the USA is relatively stable, and after 2010 the USA lost its leading position to China. China demonstrates great speed and strength in catching up. Note that the growth in topic numbers in China exceeds that in the USA. The USA used to be the major scientific producer, but in the number of topics in the solar energy field, China now dominates.

Fig. 2 Growth of topics in solar energy field

Article output presented in Fig. 1 does not capture the full picture of academic status. As a comprehensive index of the richness and evenness of data, diversification describes the distribution of topic shares (Rafols and Meyer 2010). From a resource-based view (RBV), countries engaging in topic diversification are willing to exploit existing scientific resources to develop new capabilities that enhance innovation (Quintana-García and Benavides-Velasco 2008). Diversified countries have more opportunities for knowledge utilization, and their returns increase because of the topic combinations arising from diversification. Given these perspectives, we examine the topic diversification of the two countries. In our study, diversification is measured by the entropy measure developed by Palepu (1985), which is calculated from article data in the focal year using the following equation: \( D = \sum\nolimits_{k = 1}^{N} P_{k} \ln \left( \frac{1}{P_{k}} \right) \), where \( P_{k} \) is the proportion of articles in topic k, and \( \ln \left( \frac{1}{P_{k}} \right) \) is the weight for each topic k, for a country with N different topics. To present the dynamic change of topic diversification in the solar energy field, we calculate the time series of topic shares in the period 2000–2013 for the USA and China. Figure 3 portrays the topic diversification of the two countries over time. The figure reveals that the USA always has a higher diversification than China; however, the gap becomes relatively smaller after 2007, as China closes it at a rapid rate.
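The entropy measure can be illustrated with a minimal sketch. It assumes that the share of topic k is its article count divided by the sum of all topic counts, so that the shares add up to 1; the function name and the toy counts are hypothetical and are not taken from the study’s data.

```python
import math

def entropy_diversification(topic_counts):
    """Entropy measure D = sum_k P_k * ln(1 / P_k) over topic shares P_k.

    ASSUMPTION: P_k is the article count of topic k divided by the sum of
    all topic counts, so the shares sum to 1.
    """
    total = sum(topic_counts.values())
    d = 0.0
    for count in topic_counts.values():
        if count > 0:  # a topic with zero share contributes nothing
            p = count / total
            d += p * math.log(1.0 / p)
    return d

# Hypothetical toy counts: same number of topics, different evenness.
even = entropy_diversification(
    {"solar cells": 10, "nanowires": 10, "zno": 10, "tio2": 10})
skewed = entropy_diversification(
    {"solar cells": 37, "nanowires": 1, "zno": 1, "tio2": 1})
print(even > skewed)
```

With N equally sized topics the measure reaches its maximum of ln N, and it falls towards 0 as output concentrates on a single topic, which is why it reflects both richness and evenness.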

Fig. 3 Dynamic change of topic diversification

Current focal points of academic research in the solar energy field differ across nations. Social network analysis, such as co-word occurrence analysis, co-authorship analysis, and other indicators of scientific discovery and knowledge production, has been a useful tool for bibliometric analyses (Kumar and Jan 2014). To observe the differences and similarities among countries, Fig. 4 visualizes crucial connections in the country-topic association network (2-mode network) in the solar energy field in 2013. The visualization distinguishes two kinds of nodes: countries (square nodes) and their associated topics (circular nodes). Topic nodes whose degrees exceed 3 and edges whose weights exceed 2 are shown in Fig. 4, and the figure includes the G7 countries (the United States, Britain, France, Germany, Japan, Italy, and Canada) and China. We show only the labels of the 24 topics that appear most frequently in articles (e.g. Solar Cells, Nanostructures, Organic Photovoltaics, Morphology). The greater the academic productivity of a node in the network, the larger that vertex is drawn. The USA is the largest country node in the network, China ranks second, and Germany third. These countries are the most extensively associated with academic topics in their respective scientific activities, which shows that they are the three most comprehensive literature producers in this network. We can identify the major topics closely linked to a country by the size of the nodes in each country’s sub-network. In the USA sub-network, for example, the “Solar Cells” node is the largest, with an extensive association with the USA node. Furthermore, we find that Nanowires, Organic Photovoltaics, ZnO, Density Functional Theory, Saudi Arabia, Plasma Expansion, and Spacecraft Arcing are all strongly connected to the USA. Another high-density sub-network that we can identify in Fig. 4 is that of China and its associated topics. Dye-sensitized Solar Cells, Polymer Solar Cells, Solar Cells, Counter Electrode, Optical Properties, TiO2, Solar Energy, and Self-assembly are all highly associated with China, reflecting the increasing attention they have received in current research in China. Besides helping to identify the major topics associated with a country, the center of the network (Fig. 4) shows some topics connecting with many countries, suggesting that they attract common interest. “Solar Cells”, for example, connects not only to the USA but also to Germany, Japan, England, and France, revealing an interrelationship between these countries. Conversely, some quite specific topics are associated with specific countries. Organic Dyes (whose label is not shown in Fig. 4), for example, connects only to China, showing that some specific topics are studied by a single nation.

Fig. 4 2-mode network

In this study, we used keywords to construct keyword co-occurrence networks. Like patents, keywords provide a means to organize scholarly literature intellectually (Leydesdorff et al. 2014). Figure 5 shows the largest connected component of the topic co-occurrence network in the USA in 2013. Nodes represent topics and lines represent keyword co-occurrences. An edge is shown in this figure if and only if its weight exceeds 2, which means that the two topics co-occur more than twice. Only topic nodes whose degrees exceed 2 are shown in Fig. 5, which means that each displayed topic node connects with at least three other topic nodes. The size of a node corresponds to the number of articles on that topic. As mentioned above, the map shows the important associations between topics. We further detect scientific sub-networks in Fig. 5 using the Girvan–Newman algorithm, and the color of each topic indicates the subgroup to which it belongs. The Girvan–Newman algorithm is a community identification method (Newman and Girvan 2004) that uses centrality indices in an iterative process to find community boundaries. Figure 5 clearly shows four subgroups in the co-occurrence network of the USA. Subgroup I (red), which mainly involves the solar power literature, includes the biggest node: solar cells. Subgroup II (black), located at the top right of Fig. 5, is mainly associated with chemical synthesis. Subgroup III (yellow) has relatively low article production and reflects the convergence between solar energy and other renewable energy. Subgroup IV (blue), named “efficiency and environment”, has dense associations among its topics.

Fig. 5 Topic co-occurrence network in the USA in 2013

As Fig. 6 shows, the topics in China are classified into three subgroups. Subgroup I (red), in the center of the whole network, is related to the solar power literature and has the largest number of topics. Subgroup II (black), corresponding to “chemical synthesis”, is on the left side of Fig. 6. Subgroup III (blue), named “other energy”, has few relationships with the other subgroups. As indicated by Figs. 5 and 6, the subgroup classification in China differs from that in the USA. Nevertheless, their research priorities have a lot in common, such as “Dye-sensitized Solar Cells”, “Polymer Solar Cells”, and “Optical Properties”. Possible reasons for the different subgroup classifications are as follows. On the one hand, the USA pays more attention than China to the efficiency of energy utilization (Howarth et al. 2000) and the impact on the environment (Chu and Majumdar 2012). On the other hand, China relies on an extensive mode of economic growth (Fei et al. 2011). Significant investment-driven and factor-driven growth is a specific feature of China’s economic development (Sabir and Sabir 2010). Thus, in the USA, “energy efficiency and environment” is more developed, while in China, “solar power” is more central.

Fig. 6 Topic co-occurrence network in China in 2013

According to the basic theory of social networks, topic nodes with high betweenness centrality are considered to represent key research directions (Chi and Young 2013). Topics with relatively high betweenness centrality play important roles in bridging and linking other academic topics. They are usually at the hub of a network and critical for the formation of some sub-networks (Valente 2012). We rank the topics in Figs. 5 and 6 according to their betweenness centrality, multiplied by 100. The top 10 topics in the USA and China are listed in Table 1. As reported in Table 1, the highest-ranked topic is “Solar Cells” in both countries, which demonstrates its importance. The rankings of the remaining topics vary between the two countries, but 3 topics (italics and bold) rank in the top 10 in both the USA and China.

Table 1 Top 10 high betweenness centrality topics

Relatedness between topic and topic

To measure the relatedness between topics, we follow the method of Boschma et al. (2014) and compute relatedness based on topic co-occurrence analysis. A simple, normalized measure is used to analyze the relatedness of scientific topics. Two keywords are considered to be associated with each other if they co-occur in one article. Thus, we obtain a co-occurrence matrix for the solar energy field in each period. We then use the Jaccard index to normalize the co-occurrence values (Leydesdorff 2008). Following previous studies, we define \( R_{ijt} \) as the relatedness between topic i and topic j at time t, which is given in Eq. (1).

$$ R_{ijt} = \frac{{{\text{occ}}_{ij} }}{{{\text{occ}}_{i} + {\text{occ}}_{j} - {\text{occ}}_{ij} }} $$
(1)

We define \( {\text{occ}}_{ij} \) as the number of co-occurrences between topic i and topic j. \( {\text{occ}}_{i} \) represents the total number of occurrences of topic i, which can be calculated by summing the ith row of the co-occurrence matrix. Similarly, \( {\text{occ}}_{j} \) denotes the number of occurrences of topic j. We also construct \( R_{it} \) as the relatedness of topic i at time t, which is given by the following equation:

$$ R_{it} = \frac{{\mathop \sum \nolimits_{i \ne j}^{n} R_{ijt} }}{n} $$
(2)

where n denotes the number of topics having co-occurrence relationships with i. We further compute the scientific relatedness of the whole network at time t on this basis. The relatedness of the network, \( R_{t} \), is given by:

$$ R_{t} = \frac{{\mathop \sum \nolimits_{i}^{m} R_{it} }}{m} = \frac{{\mathop \sum \nolimits_{i}^{m} \frac{{\mathop \sum \nolimits_{i \ne j}^{n} R_{ijt} }}{n}}}{m} $$
(3)

where m denotes the number of topics belonging to the whole network. Earlier studies have investigated co-occurrences of technology classes (Luan et al. 2013) and co-occurrences of title words (Milojević et al. 2011). However, little scientific work has analyzed the evolution of scientific relatedness over time. To fill this gap, we plot the dynamic change of \( R_{t} \) over the period 2000–2013. As shown in Fig. 7, both the USA and China display a decrease in relatedness. Overall, even though the numbers of articles and topics in solar energy have increased rapidly over time, the average scientific relatedness between topics has declined dramatically year by year. This curve suggests that, as new topics emerge and science develops, each topic becomes associated with a wider range of topics, but the associations grow weaker over the years.
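The three relatedness measures in Eqs. (1)–(3) can be sketched on a toy set of keyword lists. This is only an illustration under stated assumptions: occ_i is taken as the row sum of the co-occurrence matrix, as described in the text, and m is taken to be the number of topics that have at least one co-occurrence link. The function names and the toy articles are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(articles):
    """Count how often each unordered keyword pair appears in the same article."""
    occ = defaultdict(int)
    for keywords in articles:
        for i, j in combinations(sorted(set(keywords)), 2):
            occ[(i, j)] += 1
    return occ

def relatedness(articles):
    occ = cooccurrence(articles)
    # occ_i: row sum of the co-occurrence matrix (as described in the text).
    row_sum = defaultdict(int)
    for (i, j), n_ij in occ.items():
        row_sum[i] += n_ij
        row_sum[j] += n_ij
    # Eq. (1): Jaccard-normalised relatedness for every linked pair.
    r_pair = {(i, j): n_ij / (row_sum[i] + row_sum[j] - n_ij)
              for (i, j), n_ij in occ.items()}
    # Eq. (2): average over the n topics that co-occur with topic i.
    linked = defaultdict(list)
    for (i, j), r in r_pair.items():
        linked[i].append(r)
        linked[j].append(r)
    r_topic = {i: sum(v) / len(v) for i, v in linked.items()}
    # Eq. (3): average over the m topics in the network.
    # ASSUMPTION: m counts only topics with at least one co-occurrence link.
    r_net = sum(r_topic.values()) / len(r_topic) if r_topic else 0.0
    return r_pair, r_topic, r_net

# Hypothetical toy data: three articles with their keyword lists.
toy = [["a", "b", "c"], ["a", "b"], ["b", "d"]]
r_pair, r_topic, r_net = relatedness(toy)
print(r_pair[("a", "b")], r_topic["a"], r_net)
```

In the toy data, topics a and b co-occur twice and have row sums of 3 and 4, so their pairwise relatedness is 2 / (3 + 4 − 2) = 0.4.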

Fig. 7 Dynamic change of scientific relatedness

Relatedness between topic and country

To further investigate how close a topic is to a country, we construct a country-topic level variable \( R_{ict} \) by combining the relatedness \( R_{ijt} \) with country information. Information about countries is taken from the author addresses included in the articles. In our study, each article is fully attributed to every author’s nation (Boschma et al. 2014). \( R_{ict} \), which describes the relatedness between topic i and country c at time t, is given by Eq. (4):

$$ R_{ict} = \frac{{\mathop \sum \nolimits_{j \in c,j \ne i} \varphi_{ijt} }}{{\mathop \sum \nolimits_{j \ne i} \varphi_{ijt} }} $$
(4)

where \( \sum\nolimits_{j \in c,j \ne i} \varphi_{ijt} \) is the number of topics linked to topic i at time t that can be found in the dataset of country c, and \( \sum\nolimits_{j \ne i} \varphi_{ijt} \) is the total number of topics that topic i links to at time t. For instance, if a specific topic i is related to 50 other topics in the whole dataset, and country c covers 10 of those topics, then \( R_{ict} \) equals 10/50 × 100 % = 20 %. \( R_{ict} \) can be seen as the percentage of related topics found in a country, and as such, we can attribute topics to countries in each year.
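The worked example above can be reproduced with a short sketch. It assumes that φ_ijt acts as a 0/1 indicator of whether topic j is linked to topic i at time t, which matches the description of the two sums as counts of topics; the function name and the topic labels are hypothetical.

```python
def topic_country_relatedness(linked_topics, country_topics):
    """Share of the topics linked to topic i that also occur in country c (Eq. 4).

    ASSUMPTION: phi_ijt is treated as a 0/1 link indicator, so both sums
    in Eq. (4) reduce to counts of topics.
    """
    linked = set(linked_topics)
    if not linked:
        return 0.0  # topic i has no links, so no share can be computed
    return len(linked & set(country_topics)) / len(linked)

# Topic i links to 50 other topics; country c covers 10 of them,
# plus one topic that is unrelated to i and therefore does not count.
linked_to_i = {f"topic_{k}" for k in range(50)}
in_country = {f"topic_{k}" for k in range(10)} | {"unrelated_topic"}
share = topic_country_relatedness(linked_to_i, in_country)
print(f"{share:.0%}")
```

Topics that occur in the country but are not linked to topic i do not enter the numerator, so the value always lies between 0 and 1.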

Table 2 shows the scientific relatedness between each country (the USA and China) and the 10 topics that appear most frequently in its articles in 2013 (\( R_{ict} \) is multiplied by 100). For instance, the scientific relatedness between the USA and the topic “Solar Cells” is 18.89, which implies that researchers from the USA work on 18.89 % of the topics that are related to “Solar Cells”. Comparing the two countries, we find that 8 topics (italics and bold) rank in the top 10 in both the USA and China, and there are some significant differences between the two countries. For example, the scientific relatedness between the USA and the topic “Dye-sensitized Solar Cells” is 7.65, whereas the value is 44.39 in China, indicating that researchers from China cover more of the topics related to “Dye-sensitized Solar Cells” than those from the USA.

Table 2 Scientific relatedness between countries and topics

Analyses: relatedness and output

We collect our sample by downloading all solar energy articles from WoS, which contain details about publication dates, authors, and locations, to construct topic co-occurrence matrices as discussed above. To achieve high precision in analyzing networks and a fine-grained analysis of relatedness dynamics (Gulati et al. 2012), we use one-year windows from 2000 to 2013 to construct co-occurrence networks. The total dataset includes 156 countries and 64,658 scientific topics. The USA sample includes 23,933 topics, and the China sample includes 21,284 topics. Our goal is to empirically investigate the impact of relatedness on innovation output. Our unit of analysis is the topic, and we regress the number of articles containing a topic in the next year on its scientific relatedness.

The dependent variable \( {\text{Output}}_{t + 1} \) is the number of articles containing the topic in year t + 1. The explanatory variable \( R_{it} \) is the average scientific relatedness of topic i with other topics in year t. The other important explanatory variable is \( R_{ict} \), which is the relatedness of topic i with country c in year t. We control for some factors that can influence the dependent variable. The productivity of a topic is controlled by the variable Stock, which is the total number of articles in which the topic occurs in a given one-year period. To control for the impact of network structure on innovation output, we include two variables: centrality and structural holes. The centrality of a topic in the co-occurrence network is calculated with the Pajek software. Because betweenness centrality measures the ability of a topic to control resources, we use it as the centrality measure (Kumar and Jan 2013):

$$ C_{B} \left( {n_{i} } \right) = \frac{{\mathop \sum \nolimits_{j < k} g_{jk} \left( {n_{i} } \right)/g_{jk} }}{{\left( {g - 1} \right)\left( {g - 2} \right)}} $$
(5)

where \( g_{jk} \) is the number of shortest paths from node j to node k, \( g_{jk} \left( {n_{i} } \right) \) is the number of those paths that pass through node i, and g is the number of nodes in the network.
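For readers who want to see Eq. (5) in operation, the following is a minimal sketch for an unweighted, undirected network. The study itself uses Pajek; this code is not Pajek’s implementation but a plain breadth-first accumulation (Brandes-style) that applies the (g − 1)(g − 2) denominator exactly as written in Eq. (5). The adjacency dictionaries are hypothetical toy networks.

```python
from collections import deque

def betweenness(adj):
    """Betweenness per Eq. (5) for an undirected, unweighted graph.

    adj maps each node to the set of its neighbours.
    """
    g = len(adj)
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and record predecessors on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    denom = (g - 1) * (g - 2)
    # Each unordered pair (j < k) was visited from both ends, hence the / 2.
    return {v: (score[v] / 2) / denom if denom else 0.0 for v in adj}

# Hypothetical toy networks: a three-node path and a three-leaf star.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
star = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
print(betweenness(path)["b"], betweenness(star)["h"])
```

Note that many references normalise undirected betweenness by (g − 1)(g − 2)/2 instead, in which case the bridging node of the path and the hub of the star would both score 1 rather than 0.5.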

We compute structural holes using Burt’s measure (Burt 1992). The index CO captures the extent to which i’s ego network is invested in its association with each contact j. The CO index is computed by the following equation:

$$ {\text{CO}}_{i} = \mathop \sum \limits_{j} \left( {p_{ij} + \mathop \sum \limits_{q,q \ne i,q \ne j} p_{iq} p_{qj} } \right)^{2} $$
(6)

where \( p_{ij} \) is the proportion of i’s relationships invested in the link to j. Because the value of \( {\text{CO}}_{i} \) may exceed 1 (Lee 2010), we subtract \( {\text{CO}}_{i} \) from 2 to obtain the structural holes measure \( S_{i} \) (Wang et al. 2014; Guan et al. 2014), as shown in Eq. (7).

$$ S_{i} = 2 - {\text{CO}}_{i} . $$
(7)
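Equations (6) and (7) can be sketched as follows. The code assumes symmetric tie weights without self-loops and takes p_ij as the weight of the tie between i and j divided by the total weight of i’s ties; the function name and the toy networks are hypothetical.

```python
def structural_holes(weights):
    """S_i = 2 - CO_i, with CO_i computed as in Eq. (6).

    weights maps each node to a dict {neighbour: tie weight}.
    ASSUMPTION: ties are symmetric and there are no self-loops.
    """
    # p[i][j]: proportion of i's total tie weight invested in j.
    p = {i: {j: w / sum(nbrs.values()) for j, w in nbrs.items()}
         for i, nbrs in weights.items() if nbrs}
    result = {}
    for i, p_i in p.items():
        co = 0.0
        for j in p_i:
            # Indirect investment in j through i's other contacts q.
            indirect = sum(p_i[q] * p.get(q, {}).get(j, 0.0)
                           for q in p_i if q != j)
            co += (p_i[j] + indirect) ** 2
        result[i] = 2.0 - co
    return result

# Hypothetical toy networks: a closed triangle and a three-leaf star.
triangle = {"a": {"b": 1, "c": 1}, "b": {"a": 1, "c": 1}, "c": {"a": 1, "b": 1}}
star = {"h": {"x": 1, "y": 1, "z": 1}, "x": {"h": 1}, "y": {"h": 1}, "z": {"h": 1}}
print(structural_holes(triangle)["a"], structural_holes(star)["h"])
```

In the closed triangle each node has CO = 1.125, which illustrates why CO can exceed 1 and why it is subtracted from 2 rather than 1; the hub of the star, whose contacts are not connected to each other, obtains a higher structural holes score than any node in the triangle.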

Table 3 provides descriptive statistics and correlations for the variables included in our analysis of the USA. The control variables (Stock, Structural holes, and Centrality) are all positively related to output, which is consistent with earlier studies.

Table 3 Means, standard deviations, and correlations (US)

We also report descriptive statistics and correlations for the variables included in our analysis of China. The results displayed in Table 4 are similar to those in Table 3. Consistent with the literature review, \( R_{it} \) is negatively related to a topic’s article output in both the USA and China (r = −.08, p < .01; r = −.15, p < .01, respectively), and \( R_{ict} \) is also negatively related to output (r = −.23, p < .01; r = −.21, p < .01, respectively).

Table 4 Means, standard deviations, and correlations (China)

As shown in Tables 3 and 4, there are no strong bivariate correlations among the explanatory variables. The highest VIF (variance inflation factor) is 3.01, which suggests that multicollinearity is not a concern (O’brien 2007). The dependent variable in our study is a count variable that takes non-negative integer values (the number of articles), and it suffers from overdispersion: its mean is .64 and its SD is 2.55 in the USA, and its mean is .91 and its SD is 3.65 in China. We therefore use negative binomial models, which can accommodate overdispersion (Cameron and Trivedi 2013). According to the results of the Hausman test (p < .001), we use fixed effects models, which are appropriate for the data (Hausman 1978). Table 5 shows the results of the negative binomial models with fixed effects. Model 1 includes all control variables for the USA, and model 2 adds \( R_{it} \) and \( R_{ict} \). Model 3 includes all control variables for China, and model 4 adds \( R_{it} \) and \( R_{ict} \). The control-variable results show no significant difference between the USA and China: as we expected, models 1 and 3 indicate that stock and structural holes are positively associated with article output, whereas centrality is negatively but non-significantly associated with it. In model 2, \( R_{ict} \) has a significant negative impact on innovation output (β = −.903, p < .01), whereas the coefficient for \( R_{it} \) is negative but non-significant (β = −.181, p > .05). In model 4, both \( R_{ict} \) and \( R_{it} \) exhibit a negative and significant influence (β = −.557, p < .01; β = −3.328, p < .01, respectively). As such, our results support the view that higher scientific relatedness leads to lower scientific output.
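The overdispersion argument can be checked with simple arithmetic. A Poisson model implies a variance equal to the mean, so a variance-to-mean ratio well above 1 points towards a negative binomial specification. The sketch below uses only the means and SDs reported above; the helper name `variance_to_mean` is illustrative, and the ratio is an informal diagnostic rather than the test used in the study.

```python
def variance_to_mean(mean, sd):
    """Variance-to-mean ratio; a Poisson variable would give roughly 1."""
    return sd ** 2 / mean

# Means and SDs of the dependent variable as reported in the text.
us_ratio = variance_to_mean(0.64, 2.55)
cn_ratio = variance_to_mean(0.91, 3.65)
print(round(us_ratio, 2), round(cn_ratio, 2))
```

Both ratios are far above 1 (roughly 10 for the USA and 15 for China), which is consistent with the choice of negative binomial models over Poisson models.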

Table 5 Results of regression models

Conclusion

In this study, we compare the scientific relatedness of the USA and China between 2000 and 2013, taking a network approach to the solar energy field based on panel data from the WoS database. We find both differences and similarities in the scientific research of the two countries. Our analysis yields the following findings.

Although the literature productivity of the USA is the highest before 2011, China’s output increases steadily and exceeds that of the USA in 2011, making China a new major scientific contributor to the solar energy field. Both countries show exponential growth in the number of knowledge topics. The USA shows a noticeably low growth rate in topic numbers, whereas China shows a marked rise and has been the largest producer of scientific topics since 2010. Although China’s topic diversification is consistently lower than that of the USA, the gap between the two countries has narrowed in recent years.

We apply social network methods to the country information and keywords extracted from the database and visualize a 2-mode network, which helps us identify the major topics associated with each country. Word co-occurrence networks, which are widely used to capture key knowledge topics, are representative of research priorities. We detect comparatively different structures in the word co-occurrence networks of the USA and China. Both countries show important associations with “solar energy”, “chemical synthesis”, and “combining with other energy”. The USA, however, also shows a concern with “Efficiency and environment”.

Moreover, we construct keyword co-occurrence networks to analyze the relatedness between knowledge topics, and we then use this topic-to-topic relatedness to develop a second variable, the relatedness between a knowledge topic and a country. This paper therefore involves two kinds of scientific relatedness: (1) relatedness between topic and topic; (2) relatedness between topic and country. We also investigate how scientific relatedness changes over time and find a dramatic declining trend. We then investigate how close a topic is to a country by analyzing the relatedness between topic and country. We find that eight topics rank in the top 10 in both the USA and China.
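To illustrate the first step summarized above, keyword co-occurrence counts can be derived from per-article keyword lists as in the following minimal Python sketch. The keyword lists are hypothetical and are not data from this study, and the helper name `cooccurrence_counts` is an assumption. The sketch stops at raw co-occurrence counts and does not attempt the relatedness measures themselves.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per article (not data from the study).
articles = [
    ["solar energy", "chemical synthesis", "efficiency"],
    ["solar energy", "efficiency"],
    ["solar energy", "chemical synthesis"],
]


def cooccurrence_counts(keyword_lists):
    """Count how many articles mention each unordered keyword pair."""
    counts = Counter()
    for keywords in keyword_lists:
        # sorted(set(...)) drops duplicate keywords and fixes the pair order.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Each key is a keyword pair (an edge); each value is the edge weight.
edges = cooccurrence_counts(articles)

for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```

Each pair corresponds to an edge in the co-occurrence network, weighted by the number of articles in which the two keywords appear together.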

Furthermore, previous studies have used patent and literature documents to examine the influence of scientific relatedness on the emergence and disappearance of topics (Boschma et al. 2014; Rigby 2013) and have investigated the relationship between knowledge relatedness and innovation outcomes at the firm level (Tanriverdi and Venkatraman 2005). Our study investigates the impact of scientific relatedness on innovation output in the solar energy field during 2000–2013. As expected, we find empirical evidence that scientific relatedness tends to negatively affect scientific performance in both the USA and China. Our study thus contributes to innovation and knowledge research.

The theoretical contributions of our research are as follows. First, we contribute to the literature on scientific relatedness at the topic level (i.e. relatedness between topic and topic) and the country level (i.e. relatedness between topic and country). Second, we study the extent to which scientific relatedness influences literature productivity at the country level. Finally, we provide a better understanding of the research directions in the USA and China. Our work has potential implications for future innovative research in the solar energy field.

Our work may inform policymakers and researchers about further scientific development at the country level. This is an urgent issue in solar energy science given the importance of energy development. Our results show that researchers should take the relatedness of topics into account. At the country level, the scientific relatedness between a topic and a country negatively affects innovation performance in the next period. Thus, national governments should provide policy direction and support topics that have low scientific relatedness with the country. These topics are cognitively distant from the country and have great potential to be developed there. In addition, our results suggest that each country should set clear directions for research and focus on research topics with low scientific relatedness rather than imitating other nations. Researchers should likewise focus on scientific topics with low relatedness in order to yield more scientific output.

The current study has several strengths. We differentiate two kinds of scientific relatedness and operationalize them in different ways. We empirically test the relationship between scientific relatedness and innovation performance, which fills a gap in previous studies. Our study also has some limitations. Our empirical setting is the solar energy field; future studies should collect data from a wider range of fields to check the consistency of the results. In addition, we investigate only the relationship between scientific relatedness and the number of articles; future studies could probe the underlying mechanism between them and the origin of scientific relatedness.