Introduction

A complex system is a system formed out of many components whose behavior is emergent: the behavior of the system cannot be simply inferred from the behavior of its components (Bar-Yam 1997). Emergent properties are the most frequently observed real-world phenomena in a complex system. They are patterns and regularities that arise through interactions among smaller or simpler entities in a system, entities that do not themselves exhibit such properties. Moreover, emergent properties appear at many levels of observation and are scale-invariant (Katz 2016a, b).

Innovation systems are frequently called complex systems, a term that is intuitively understood but poorly defined (Katz 2016a, b). Complex systems have several generally accepted properties, in particular scale invariance, especially of their emergent properties (Baranger 2001; Marković and Gros 2014; Vicsek 2002). A scale-invariant property can be identified because it is described solely by a power law function, \(f(x) = kx^{\alpha}\), where the exponent α is a measure of the scale invariance (Katz 2016a, b). Scale invariance is defined mathematically as \(p(bx) = g(b)p(x)\) for any b (Newman 2005): if we increase the scale or units by which we measure x by a factor of b, the shape of the distribution p(x) is unchanged, except for an overall multiplicative constant (Katz 2016a, b). The power law, or allometric, function \(f(x) = kx^{\alpha}\) is the only mathematical function that possesses the property of scale invariance.
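To make the definition concrete, the sketch below checks numerically that rescaling x by a factor b multiplies a power law by the same constant, \(b^{\alpha}\), whatever the value of x, whereas for an exponential function the ratio depends on x. The parameter values (k = 3, α = 1.19, β = 0.01, b = 2) are arbitrary illustrative choices, not data from this study.

```python
import math

def power_law(x: float, k: float, alpha: float) -> float:
    """f(x) = k * x**alpha."""
    return k * x ** alpha

def exponential(x: float, k: float, beta: float) -> float:
    """g(x) = k * exp(beta * x); included only as a non-scale-invariant contrast."""
    return k * math.exp(beta * x)

def rescaling_ratios(f, b: float, xs):
    """Return f(b*x) / f(x) for each x; a constant list signals scale invariance."""
    return [f(b * x) / f(x) for x in xs]

xs = [1.0, 10.0, 100.0, 1000.0]
b = 2.0
pl = rescaling_ratios(lambda x: power_law(x, k=3.0, alpha=1.19), b, xs)
ex = rescaling_ratios(lambda x: exponential(x, k=3.0, beta=0.01), b, xs)
print([round(r, 4) for r in pl])  # every ratio equals b**alpha (up to rounding)
print([round(r, 4) for r in ex])  # ratios grow with x
```

The power law ratios are identical at every x, while the exponential ratios change by orders of magnitude across the same range.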

There are two general types of scale-invariant relationships: power law probability distributions and power law correlations (Katz 2016a, b). A power law distribution is defined by \(p(x) = kx^{-\alpha}\) for \(x \ge x_{\min}\). A power law correlation exists between two entities, x and y, if they are related by \(y = kx^{n}\), where n is the scaling factor and k is a constant (Katz 2005).

A global scientific community can be considered a complex innovation system. Baranger (2001) gives six features that characterize complex systems: (1) complex systems contain many constituents interacting nonlinearly; (2) the constituents of a complex system are interdependent; (3) a complex system possesses a structure spanning several scales; (4) a complex system is capable of emerging behavior; (5) complexity involves an interplay between chaos and non-chaos; and (6) complexity involves an interplay between cooperation and competition. A close examination of a scientific community may reveal these six criteria in action. The process of generating and disseminating new knowledge in scientific disciplines fosters the emergence of collaborative networks among academic, governmental or corporate institutions and countries that cooperate while at the same time competing for scarce resources such as funds, personnel, equipment, R&D infrastructure and the recognition of their partners.

As the knowledge dissemination process expands, a complex network emerges, composed of a large number of interconnected nodes whose connectivity (node degree) has a power law probability distribution \(p(x) = kx^{-\alpha}\) for \(x \ge x_{\min}\) (Katz 2016a, b; Newman 2004). The exponent α, called the scaling factor, and k are constants. For most real-world networks the scaling factor lies in the range 2 < α < 3 (Clauset et al. 2009), meaning that the distribution has a finite mean but infinite variance, which in turn makes measures based on averages and central tendency inappropriate.
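In practice α has to be estimated from data. The sketch below uses the continuous maximum-likelihood estimator described by Clauset et al. (2009), \(\hat{\alpha} = 1 + n\left[\sum_{i} \ln(x_{i}/x_{\min})\right]^{-1}\), rather than a regression on a binned histogram. It assumes \(p(x) \propto x^{-\alpha}\) and that \(x_{\min}\) is already known (selecting \(x_{\min}\), for example by a Kolmogorov–Smirnov criterion, is not shown). The data are synthetic (α = 2.5, \(x_{\min}\) = 1), so this only checks the estimator and is not a result of the study.

```python
import math
import random

def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ∝ x**(-alpha), x >= x_min,
    by inverse-transform sampling (1 - u lies in (0, 1], so no division by zero)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_alpha(xs, x_min):
    """Continuous MLE of alpha using only the tail x >= x_min.
    Returns (alpha_hat, asymptotic standard error (alpha_hat - 1) / sqrt(n))."""
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err

rng = random.Random(42)                     # fixed seed for reproducibility
data = sample_power_law(50_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, se = mle_alpha(data, x_min=1.0)
print(f"alpha_hat = {alpha_hat:.3f} ± {se:.3f}")
```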

A complex innovation system evolves dynamically in nonlinear ways, which gives it unique properties that distinguish it from other systems. The citations to scientific papers produced by innovation systems have been found to follow power law distributions (Brzezinski 2015; Clauset et al. 2009; Katz 2016a, b; Newman 2005). Power laws can also take the form of a correlation between two exponentially growing outputs of innovation systems; for example, power law correlations between citations and the collaboration patterns of papers have recently been reported (Coccia and Bozeman 2016; Coccia and Wang 2016; Ronda-Pupo and Katz 2016a).

The structure of such scale-invariant systems, and the processes taking place inside them, are the same over a broad range of spatial and temporal scales (Kwapień and Drożdż 2012). A scale-invariant property is statistically similar at many levels of observation. If an innovation system is complex, it does not reside in the domain of attraction of Gaussian distributions; the Central Limit Theorem no longer applies, and population averages cannot be used to characterize it (Newman 2011). Furthermore, for α ≤ 2 both the mean and the variance are infinite; hence, unlike power law distributions with α > 3, the conventional measures often used as indicators of the performance of an innovation system are incapable of quantifying this property (Katz 2012b). This suggests that scale-independent measures are necessary to fully inform innovation policy.
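The moment conditions behind these statements can be made explicit. Assuming a continuous power law \(p(x) = kx^{-\alpha}\) normalized on \([x_{\min}, \infty)\), so that \(k = (\alpha - 1)x_{\min}^{\alpha - 1}\) for α > 1, the m-th moment is

$$\langle x^{m} \rangle = \int_{x_{\min}}^{\infty} x^{m}\, k x^{-\alpha}\, dx = \frac{\alpha - 1}{\alpha - 1 - m}\, x_{\min}^{m} \qquad \text{for } \alpha > m + 1,$$

and it diverges for α ≤ m + 1. The mean (m = 1) is therefore finite only for α > 2, and the second moment, and with it the variance (m = 2), only for α > 3.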

The aim of the present study is to explore the power law relationship between the size (S) of complex innovation systems (CIS) and their citation-based impact (I). The research questions for the present study are:

  • Are the size of a complex innovation system and its citation-based impact correlated according to a power law?

  • How much greater is the citation-based impact of a complex innovation system when it doubles its size over time?

To answer the research questions, we analyzed articles and reviews from the InCites™ database grouped by organization type. This paper shows how scale-invariant properties can be used to construct scale-independent measures that inform decision makers about the effect that the size of complex innovation systems can have on research-based impact.

Background

The roots of studies on scaling behavior for bibliometric purposes date back to Lotka's law (Lotka 1926). Naranan (1971) introduced the power law as an estimation method to study Bradford's law in journals. These studies were the basis of Egghe's (2005) Lotkaian informetrics. The Lotkaian approach has focused on characterizing distributions that satisfy Price's law and on the consequences for the laws of Zipf and Mandelbrot (Egghe and Rousseau 1986).

The power law approach has also been used to analyze citation networks (Mayernik 2010; Milojevic 2010; Zhao and Ye 2013), citation probability distributions (Brzezinski 2015; Milojevic 2010; Thelwall 2016a, b, c), citation-based indicators (Egghe et al. 2009; Ye and Rousseau 2008) and research output (Sutter and Kocher 2001). The literature on scaling shows that there are two general types of scale-invariant relationships: power law probability distributions and power law correlations (Katz 2016a, b). While power law distributions have been extensively analyzed, studies of power law correlations are less frequent.

Katz (1999) found a power law relationship between citations and size for world science systems, with an exponent of 1.27 ± 0.03. Katz also found that the scaling exponent was constant over time and relatively independent of system size and nationality. Katz (2000) found that the recognition of an institution, community or nation, measured by the number of citations it receives, increases with its size in a power law, or scale-independent, manner. Katz (2000) also found a positive and significant power law correlation between international collaboration and the size of countries, with an exponent of 1.14 ± 0.03 (r² = 0.95). Coccia and Bozeman (2016) found a power law growth process for international collaboration across applied research fields.

van Raan (2013) reported that important socioeconomic characteristics of cities, such as wealth and innovation capacity, exhibit nonlinear, specifically power law, scaling with population size. van Raan (2008a) also reported a size-dependent cumulative advantage for the impact of universities in terms of total number of citations. No previous study has analyzed the power law relationship between size and citation impact for organization types treated as complex innovation systems.

Katz (2006, 2012a, 2016a, b) analyzed the evolution of citation impact, R&D intensity and national wealth through scale-invariant models constructed using scaling correlations. The scale invariant approach will be used in the present study to illustrate how scale-invariant measures and models can inform decision makers on how the size of a complex innovation system predicts its impact.

Methods

A complex innovation system is defined as a system that produces new knowledge and disseminates it through papers published in journals. This process is expected to show emergent scale-invariant properties, which can be observed in some or all outputs of the system. For example, the degree distribution of the nodes in the collaboration network that emerges from the information generation process follows a power law (Girvan and Newman 2002; Newman 2001, 2004). The distribution of citations to peer-reviewed papers has been shown to be scale-invariant at many levels of observation, from the system as a whole to domains, fields and subfields (Katz 2016a, b; van Raan 2008b). The number of citations and the number of papers the system produces are also expected to be power law correlated (Katz 2006; Ronda-Pupo and Katz 2016a, b). In the present study, we analyze two characteristics of a complex innovation system: first, its publishing size, defined as the knowledge it produces and disseminates through the papers it has published in top-tier journals; and second, the impact these papers have had on the scientific community, measured by the number of times they are cited in other papers published by peers.

Definition of variables and the model

Dependent variable

Citation-based impact (I) is the total number of citations received by the Web of Science articles and reviews published by a complex innovation system in the time frame analyzed. The citation count is therefore a partial measure of the impact the innovation system has had on the scientific community.

Independent variable

Size (S). The word ‘size’ refers to an amount of something that is measured. The size of an innovation system can be measured by the number of investigators, the amount of budget it receives or the number of grants it wins, among others. Frame and Carpenter (1979) pioneered the measurement of the size of scientific systems by the number of papers published. In the present study, we measure the size of a complex innovation system by the number of Web of Science documents (articles and reviews) it published during the time frame analyzed. The number of documents published by a system is thus a partial measure of its size.

The model

Any pair of coupled exponentially growing or decaying processes shows a scaling correlation (Katz 2005; Sahal 1981). The scaling exponent of the power law correlation between two exponentially growing functions is given by the ratio of their growth rates, \(\alpha = \beta_{2}/\beta_{1}\), where \(x \approx e^{\beta_{1} t}\) and \(y \approx e^{\beta_{2} t}\) are the two exponentially growing quantities and β1 and β2 are their growth rates (Katz 2012b); eliminating t gives \(y \approx x^{\beta_{2}/\beta_{1}}\). A similar approach was used by Coccia and Bozeman (2016) to study the evolution of international collaboration patterns.
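The ratio rule can be checked with a short simulation. The sketch below generates two hypothetical exponential series (initial values 2 and 5, growth rates β1 = 0.05 and β2 = 0.06, 33 time steps, all arbitrary) and fits the slope of log y on log x by ordinary least squares; because both logarithms are linear in t, the fitted slope recovers β2/β1.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

beta1, beta2 = 0.05, 0.06          # hypothetical growth rates of x and y
ts = range(33)                     # arbitrary number of time steps
x = [2.0 * math.exp(beta1 * t) for t in ts]
y = [5.0 * math.exp(beta2 * t) for t in ts]

# Slope of log y on log x: the scaling exponent of the power law correlation
alpha = ols_slope([math.log(v) for v in x], [math.log(v) for v in y])
print(round(alpha, 4), round(beta2 / beta1, 4))
```

The initial values only shift the intercept of the log–log line; they do not affect the exponent.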

The model for the study follows the power law model used in Ronda-Pupo and Katz (2016b), where I stands for citation-based impact, S for the size of complex innovation systems, γ is a constant (log γ is the intercept of the log–log regression line) and α is the scaling factor (the slope of the log–log regression line):

$$I = \gamma S^{\alpha }.$$
(1)
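Equation (1) links directly to the second research question. Assuming γ and α stay fixed as a system grows, multiplying its size by a factor b changes its impact by a factor that depends only on α, not on γ or on the starting size:

$$\frac{I(bS)}{I(S)} = \frac{\gamma (bS)^{\alpha}}{\gamma S^{\alpha}} = b^{\alpha}, \qquad \text{so doubling } S \text{ multiplies } I \text{ by } 2^{\alpha}.$$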

The logarithmic transformation of the model is expressed as a simple linear relationship:

$$\log (I) = \alpha \log (S) + \log (\gamma)$$
(2)

The parameters γ and α of the correlation are estimated with the Ordinary Least Squares (OLS) method, because it produces the fitted values with the smallest error, defined as the sum of squared residuals \(\sum_{i} (y_{i} - \hat{y}_{i})^{2}\) (Legendre and Legendre 2012). Furthermore, the aim of the study is to predict citation-based impact from the size of complex innovation systems. In this case OLS is the appropriate method because it is asymmetrical: it treats S as the predictor and I as the response (Smith 2009).
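A minimal plain-Python sketch of this fit is given below; the software actually used in the study is not stated, so this is an illustration rather than the authors' procedure. It regresses log10(I) on log10(S) (the log base does not change α, only how γ is back-transformed) and also returns Pearson's r and the standard error of the slope, which the significance and SMA checks later in the paper rely on. The sizes and impacts are synthetic, generated from I = 3·S^1.2 with log-normal noise.

```python
import math
import random

def fit_power_law(sizes, impacts):
    """Fit I = gamma * S**alpha by OLS on log10(I) = alpha*log10(S) + log10(gamma).

    Returns (alpha, gamma, r, se_alpha): slope, back-transformed intercept,
    Pearson r of the logged data and the standard error of the slope.
    """
    lx = [math.log10(s) for s in sizes]
    ly = [math.log10(i) for i in impacts]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    alpha = sxy / sxx                  # OLS slope: minimizes the sum of squared residuals
    log_gamma = my - alpha * mx        # OLS intercept = log10(gamma)
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation of log S and log I
    sse = sum((b - (log_gamma + alpha * a)) ** 2 for a, b in zip(lx, ly))
    se_alpha = math.sqrt(sse / (n - 2) / sxx)
    return alpha, 10 ** log_gamma, r, se_alpha

# Synthetic example (hypothetical numbers): I = 3 * S**1.2 with log-normal noise
rng = random.Random(1)
sizes = [10, 30, 100, 300, 1_000, 3_000, 10_000, 30_000]
impacts = [3.0 * s ** 1.2 * 10 ** rng.gauss(0.0, 0.05) for s in sizes]
alpha, gamma, r, se = fit_power_law(sizes, impacts)
print(f"alpha = {alpha:.3f} ± {se:.3f}, gamma = {gamma:.2f}, r = {r:.4f}")
```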

The symbol α denotes the exponent of the scaling correlation between citation-based impact and the size of complex innovation systems. It measures the scale-invariance properties of this relationship. α is also considered a measure of the magnitude of the Matthew Effect, or cumulative advantage. Its value can fall into three ranges. When α = 1, citation-based impact and the size of the complex innovation system grow at the same rate, and there is no cumulative advantage of one variable over the other; in the terminology of biology and ecology, this is isometry (Warton et al. 2006). When α > 1, the dependent variable increases more quickly than the independent variable: the relationship is superlinear, and citation-based impact has a positive cumulative advantage over the size of the system. Citations scale positively with the size of the system (positive allometry). Finally, when α < 1, the dependent variable increases less quickly than the independent variable: the relationship is sublinear, and citation-based impact has a negative cumulative advantage over the size of the system. Citations scale negatively with the size of the system (negative allometry).

Data source

The data source for the study consists of all articles and reviews in the InCites™ dataset updated 2016-06-13, which includes Web of Science content indexed through 2016-04-04. The data were exported on 2016-06-17. To prepare the data we downloaded the information for Entity type = Organizations, filtered by organization type, with the following settings: Document type = article and review; thresholds: Web of Science documents minimum = 2, citations minimum = 2; time period = 1980 through 2012, inclusive. Thomson Reuters assigns each unified organization to a unique organization type to facilitate filtering into broad groupings. Table 1 shows the organization types and their descriptions according to Thomson Reuters (Reuters 2016).

Table 1 Organization types and their description.

Results

Table 2 shows the sizes and citation-based impact of the organizations included in the present study, by organization type.

Table 2 Size and citation-based impact according to organization types

Figure 1 and Table 3 show the results of the power law correlation between citation-based impact and size by organization type. Normality was tested (Shapiro–Wilk) and met for all datasets, and all datasets also passed the constant-variance test. The median scaling exponent of the power law correlation is 1.19 ± 0.02, suggesting that the citation impact of complex innovation systems increases nonlinearly with their size, by a factor of about \(2^{1.19 \pm 0.02}\), or 2.28, when they double their size. The results also show that Academic Systems have a stronger cumulative advantage: the exponent for this type of organization is 1.22 ± 0.01, so the I of Academic Systems increases \(2^{1.22 \pm 0.01}\), or 2.32, times when it doubles its size. The scaling exponents for the complex innovation systems analyzed in the present study are similar to the exponents reported by Katz (1999) for the research fields Education, Medical, Industry and others in the UK science system.
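The doubling factor is simple arithmetic on the exponent. The sketch below evaluates \(2^{\alpha}\) at the reported median α = 1.19 and, treating ±0.02 as a plain interval rather than a formal confidence interval (an assumption), at its two ends.

```python
def doubling_factor(alpha: float) -> float:
    """Factor by which I grows when S doubles, under I = gamma * S**alpha."""
    return 2.0 ** alpha

alpha, err = 1.19, 0.02   # median exponent and its reported uncertainty
low, mid, high = (doubling_factor(a) for a in (alpha - err, alpha, alpha + err))
print(f"{mid:.2f} (range {low:.2f} to {high:.2f})")  # 2.28 (range 2.25 to 2.31)
```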

Fig. 1

Power law relationship between citation-based impact (I) and size (S) of complex innovation systems (CIS) according to their classification

Table 3 Scaling exponents (α) and standard deviations (SD) of size (S) predicting citation-based impact (I)

Student's t statistic was used to test whether the scaling exponent of each complex innovation system under study differs from 1. Table 3 shows the t statistic for all complex innovation systems according to their Thomson Reuters classification. All exponents differ significantly from 1, meaning that the relationship is nonlinear. In addition, to ascertain the existence of a true power law correlation we followed Legendre and Legendre (2012, p. 550, Eq. 10.13). The procedure uses the OLS parameters α (scaling exponent) and r (Pearson correlation) to calculate the Standardized Major Axis (SMA) slope with Eq. (3) below. Because |r| ≤ 1, \(\alpha_{SMA}\) is never smaller than \(|\alpha_{OLS}|\): the two are approximately equal when the variables are highly correlated, and \(\alpha_{SMA}\) increasingly exceeds \(\alpha_{OLS}\) as the correlation weakens. The results for all correlations show \(\alpha_{SMA} \approx \alpha_{OLS}\). These results support the existence of a scaling correlation between citation impact and the size of complex innovation systems.

$$\alpha_{SMA} = \frac{{\left| {\alpha_{OLS} } \right|}}{{r_{xy} }}\quad \text{when}\;r_{xy} \ne 0.$$
(3)
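Both checks can be sketched in a few lines. The values below (α_OLS = 1.19, standard error 0.02, r = 0.98) are hypothetical and not taken from Table 3; in particular, Table 3 reports standard deviations, and treating them as the slope's standard error is an assumption. The t value would then be compared with the critical value of Student's t with n − 2 degrees of freedom, which is not computed here.

```python
def t_against_one(alpha_ols: float, se_alpha: float) -> float:
    """Student's t for H0: alpha = 1 (isometric, linear scaling)."""
    return (alpha_ols - 1.0) / se_alpha

def sma_slope(alpha_ols: float, r_xy: float) -> float:
    """Standardized Major Axis slope from the OLS slope and Pearson r (Eq. 3)."""
    if r_xy == 0:
        raise ValueError("the SMA slope is undefined when r_xy = 0")
    return abs(alpha_ols) / r_xy

# Hypothetical fit results, for illustration only
alpha_ols, se_alpha, r = 1.19, 0.02, 0.98
print(round(t_against_one(alpha_ols, se_alpha), 2))   # 9.5
print(round(sma_slope(alpha_ols, r), 4))              # 1.2143, about 2% above alpha_ols
```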

Discussion and conclusion

The answer to the first research question is that a power law correlation between I and the size of complex innovation systems exists: I scales with the size of the innovation systems, with a median scaling exponent of about 1.19 ± 0.02. The scaling exponents for the organization types, with the exception of Academic Systems, show no significant differences. This suggests that the I of innovation systems depends not on the type of system but on its size. The scaling exponent for Academic Systems was 1.22 ± 0.01, suggesting a citation cumulative advantage over the other organization types. This behavior could be because these organizations include a cluster of universities covering all research areas, and because they may have larger research budgets than other organizations. Similar results were reported for research fields within a national science system (Katz 1999).

The answer to the second research question is that the innovation systems included in the study increase their I by a factor of about \(2^{1.19 \pm 0.02}\), or 2.28, when they double their size. A scaling exponent greater than 1 suggests that citation impact increases more quickly than the size of the organizations. This result supports Katz's (2000) hypothesis that the recognition of organizations grows nonlinearly as a function of their size.

In the present study we have found scale-invariant emergent properties of the complex innovation systems analyzed. These properties can be qualified and quantified using the parameters of scale-invariant distributions and correlations, and these parameters can be used to construct measures and models useful for informing public policy about the scale-invariant emergent properties of a complex innovation system. Policy makers and research evaluation and funding agencies could derive performance indicators by dividing observed values by the expected values calculated from a power law regression on the correlation data, thereby accounting for the nonlinear properties of the citation impact of a system. The scaling exponent of the power law correlation can also be used to predict the future impact of a complex innovation system as its size increases, as suggested by Katz (2000). Scale-independent indicators can be constructed from the power law results to compare the observed citation impact of a complex innovation system with the citation impact expected for its size. For example, the exponent of the power law relationship between a pair of variables, \(y = \gamma x^{\alpha}\), can be used to produce a relative y measure that is scale independent (Ronda-Pupo and Katz 2016a). Using this relationship, the expected y value, \(y_{e}\), for a given x and α can be calculated. Then, using the observed y value, \(y_{0}\), for the same x, one can calculate the scale-independent indicator, RI, given by

$$RI(x,\alpha) = y_{0} /y_{e}$$
(4)

When RI = 1, the observed citation-based impact equals the expected impact. If RI > 1, the observed citation-based impact is above the expected value; conversely, if RI < 1, it is below the expected value. The innovation system with the highest RI has the best citation impact relative to its size.
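A minimal sketch of the indicator follows, assuming γ and α have already been obtained from a fit of Eq. (1). The parameters (γ = 2, α = 1.19) and the two systems are hypothetical and chosen only to show that the larger system, despite having more citations in absolute terms, can fall below the impact expected for its size.

```python
def expected_value(x: float, gamma: float, alpha: float) -> float:
    """Expected y for size x under the fitted power law y_e = gamma * x**alpha."""
    return gamma * x ** alpha

def relative_indicator(x: float, y_observed: float, gamma: float, alpha: float) -> float:
    """Scale-independent indicator RI = y_0 / y_e (Eq. 4)."""
    return y_observed / expected_value(x, gamma, alpha)

# Hypothetical fitted parameters and systems, for illustration only
GAMMA, ALPHA = 2.0, 1.19
systems = {
    "System A": (1_000, 8_000),     # (size in papers, citations)
    "System B": (10_000, 60_000),
}
for name, (size, cites) in systems.items():
    ri = relative_indicator(size, cites, GAMMA, ALPHA)
    verdict = "above" if ri > 1 else "below" if ri < 1 else "equal to"
    print(f"{name}: RI = {ri:.2f} ({verdict} expected)")
```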

This research opens new lines of inquiry, such as the study of the scaling relationship between I and the size of innovation systems at a higher level of disaggregation (e.g., research domains such as applied sciences, natural sciences or health sciences) or at the level of national innovation systems. Exploring the scaling relationship between size and I at the level of national science systems would also support the evaluation of public policies on scientific research.