1 Introduction

The role of knowledge is considered crucial for the innovation of regions.Footnote 1 But is it the ‘internal knowledge’ generated within the region that matters, or is it the ‘external knowledge’ brought into the region through trade networks? Moreover, for either of these two sources of knowledge, intra- or extra-regional, is it the ‘intensity’ of knowledge that matters, or rather its ‘variety’? These questions have recently received attention in the literature, but few studies address them in a common empirical setting. Some, such as the role of related and unrelated ‘variety’ for regional innovativeness, remain largely unexplored and open in both the traditional and the more recent literature (see Beaudry and Schiffauerova 2009; De Groot et al. 2009).Footnote 2 This paper bears directly on these questions and provides an empirical analysis of the role of the variety and intensity of internal and external knowledge in regional innovation.

A common argument in the literature is that innovation depends on the ‘intensity’ and ‘variety’ of knowledge (Audretsch and Vivarelli 1996; Audretsch and Feldman 2004; Beaudry and Schiffauerova 2009), reflecting a general shift from cost-based to knowledge-based models of regional growth. Knowledge-based theories of regional growth and innovation (see for example Maskell 2001; Maskell and Malmberg 2002) emphasize the nature of local knowledge (Tallman et al. 2004), the intensity (and frequency) of knowledge transfer among local firms (Gordon and McCann 2000; Mesquita 2007), and the variety of knowledge in the region (Jacobs 1969; Glaeser et al. 1992; Frenken et al. 2007). It is further argued that knowledge is not generated only through interactions between firms located inside a region: it can also be brought into the region from outside, through international trade networks (Castellani 2002; Bathelt et al. 2004; Keller 2004).

The aim of this paper is to test these theoretical arguments empirically in a common setting. The baseline question is: what is the role of the intensity and variety of internal and external knowledge in regional innovation? The question is not novel in itself; the contribution of the paper is to demonstrate the role of knowledge in regional innovation while bringing together two streams of literature, namely the literature on the regional knowledge production function (RKPF) and the literature on international technology diffusion. Furthermore, following recent literature (Frenken et al. 2007), the paper distinguishes between related and unrelated variety, making it possible to assess empirically the influence of each type of variety on regional innovation. To the best of the authors’ knowledge, this is the first attempt to apply the idea of related and unrelated knowledge variety in a study of regional innovation.Footnote 3

Using panel data on patent applications across 81 Swedish functional regions, the analysis shows that both the intensity and the variety of internal and external knowledge matter for regional innovation, with related variety seemingly playing a more important role than unrelated variety.

The rest of the paper is organized as follows. Section 2 develops five hypotheses concerning the role of knowledge on regional innovation. Section 3 presents the dataset and the methodology followed in the empirical analysis. Section 4 presents the main findings of the paper. Section 5 summarizes and concludes.

2 The role of intensity and variety of regional knowledge in regional innovation

A number of studies in both regional economics and strategic management have recognized that the sustained competitive advantage of regions is related to the ability of regional firms to develop and maintain innovation (Grant 1996; Krugman 1991; Maskell and Malmberg 1999; Saxenian 1996). Such innovation is further argued to depend on the ‘intensity’ and ‘variety’ of the knowledge sources available to a region (Audretsch and Vivarelli 1996; Audretsch and Feldman 2004; Beaudry and Schiffauerova 2009).

One source of knowledge is the “internal” knowledge generated within a region. A large literature on the knowledge production function (KPF) shows that new knowledge is essentially generated through the “intensity” of R&D activities (Griliches 1979) carried out by firms, universities, and research centres (Audretsch and Feldman 1996; Acs et al. 2002). Griliches’ original firm-level KPF framework has been translated to the regional level as the so-called regional knowledge production function (RKPF) (Jaffe 1989; Feldman and Florida 1994). This literature emphasizes that the relevant knowledge for many local firms is the knowledge that ‘spills over’ from local R&D activities.

In addition, it has been shown that the accumulation (intensity) of knowledge per se is not sufficient for strong innovative performance. The “variety” of knowledge inside a region also matters (Jacobs 1969; Saviotti 1996). Knowledge variety refers to the fact that the knowledge, know-how, and expertise in a region are most often heterogeneous. Exposure to heterogeneous knowledge should improve both the creative potential of firms in the region and their ability to innovate (Rodan and Galunic 2004). This relates to Schumpeter’s (1934) idea of “novelty by combination”. Duranton and Puga (2001) called regions with a wide variety of available knowledge “nursery cities”, because these regions allow firms to try a variety of processes before finding their ideal process innovation, without costly relocation after each trial. Consequently, the variety of a region’s internal knowledge can be considered an important factor in explaining its innovation. In this argument, the source of knowledge is external to the firm but internal to the region, highlighting the role of the region (location) in innovation (Feldman 2003).Footnote 4 Drawing on the above, a region’s innovation rises with both the intensity and the variety of its internal knowledge (Asheim et al. 2011; Berliant and Fujita 2011). This leads to the following two hypotheses:

Hp1

The higher the intensity of the region’s internal knowledge, the higher will be the region’s innovation.

Hp2

The higher the variety of the region’s internal knowledge, the higher will be the region’s innovation.

Such variety of knowledge is expected to have an even more positive impact on innovation if it is “related” rather than “unrelated” variety. The notion of related variety aims to capture the balance between cognitive proximity and cognitive distance across sectors in a region that is needed for knowledge to spill over effectively between sectors (Nooteboom 2000). Unrelated variety measures the extent to which a region is diversified in very different types of activity. According to Frenken et al. (2007), the higher the number of technologically related sectors in a region, the greater the inter-sectoral knowledge spillovers between those sectors, and presumably the more learning opportunities they have. This will eventually enhance regional innovation (Feldman 1994). The importance of knowledge spillovers for regional innovation is illustrated by the following statement in Audretsch and Feldman (2004, p. 2719): “innovative activity should take place in those regions where the direct knowledge-generating inputs are the greatest (e.g. R&D investment), and where knowledge spillovers are the most prevalent [which can be achieved by higher related variety in a region, as noted above]”. The benefit of related variety of knowledge for innovation-related measures has been shown in both firm-level (Breschi et al. 2003) and regional-level studies (Feldman 1994; Feldman and Audretsch 1999; Ejermo 2005; Antonietti and Cainelli 2011). In the Swedish case, Ejermo (2005) finds a positive effect of the “weighted-average-relatedness of neighbours” (WARN) on patent applications; the WARN index takes a higher value the more related the patenting activities in a region are. The unrelated variety effect, on the other hand, captures a portfolio effect that functions as a regional shock absorber (Essletzbichler 2007).
That is, a region with a large number of unrelated industries may be less vulnerable to sector-specific shocks, for example in terms of unemployment (Boschma et al. 2012). Although unrelated variety, as variety in general, may be beneficial for innovation, related variety is expected to be more important. The main argument is that the more related industries a region has, the greater the scope for intra-regional knowledge spillovers between them, which may in turn increase the chance of knowledge generation and hence regional innovation. This leads to the following additional hypothesis:

Hp3

The impact of related variety on the region’s innovation is higher than the impact of unrelated variety.

Processes of knowledge generation and combination are not only activated by interactions between local knowledge resources. Knowledge can also be brought into a region from “outside” through international trade networks. Here, the knowledge sources are external both to the firm and to the region, i.e. global pipelines (Bathelt et al. 2004).Footnote 5 This is in line with the literature on international technology diffusion, which proposes international trade as a conduit for the flow of knowledge into local firms (Keller 2004). It is also in line with the ‘Learning-By-Exporting’ literature, which argues that firms that trade internationally have better access to knowledge about customer preferences, production techniques and foreign technology, which in turn may stimulate innovation and productivity (Clerides et al. 1998; Castellani 2002). All these strands of literature emphasize the importance of the intensity of external knowledge as a source of knowledge generation (and consequently of innovation capability) within regions. In addition, there is recent evidence suggesting that the more “related” a region’s import and export portfolios are, the more learning opportunities arise within the region (Boschma and Iammarino 2009; Boschma et al. 2012), which in turn may imply more innovation in the region. Such arguments lead to the following two hypotheses:

Hp4

The higher the intensity of the external knowledge brought into the region, the higher will be the region’s innovation.

Hp5

The higher the related external knowledge brought into the region (the more imports and exports are related), the higher will be the region’s innovation.

To test the five hypotheses, an econometric analysis of 81 functional regions in Sweden over 2002–2007, including a set of control variables, is carried out. It is presented in the rest of the paper.

3 Empirical analysis

3.1 Data

The geographic unit in the empirical analysis is the functional region. The Swedish Development Agency (NUTEK) has divided Sweden into 81 functional regions, each composed of several municipalities. The basic criteria for this division are common local labour markets (LLM) and commuting time (NUTEK 2005). Andersson and Karlsson (2007) find that knowledge flows in Sweden transcend municipal borders but tend to be bounded within functional regions. This is because functional regions differ from each other in their production of and access to knowledge (Karlsson and Johansson 2006). This makes functional regions a plausible unit for the analysis of innovation, as this level of aggregation should internalize a large part of spatial dependence within the unit of analysis. Moreover, as Fig. 1 shows, patent applications in Sweden are strongly concentrated in "islands of innovation" rather than spread across neighbouring regions. This provides further support for the view that spatial dependence should not be a major concern in this study. Appendix 1 shows the map of the Swedish functional regions.

Two sources of statistics are used to build the dataset: Statistics Sweden (SCB) and the European Patent Office (EPO) database. Data on R&D investment, total employment in two-digit and five-digit NACE industries, higher-educated employment, imports and exports, inventors’ places of residence, and population during 2002–2007 all originate from Statistics Sweden. The EPO database provides patent data for Sweden covering the period 2002–2007.Footnote 6 It accounts for up to 85 % of all Swedish patent applications in this period. A Swedish patent application is defined as one that has at least one inventor with a residential address in Sweden. Patents are regionalized according to the inventors’ places of residence. If a patent application has more than one inventor, it is fractionalized equally across inventors, following Jaffe et al. (1993). For instance, if a patent application has four inventors, each inventor (and the functional region in which s/he lives) receives 25 % of that application. Merging the data on the determinants of regional innovation with the patent data yields a balanced panel of 486 observations: 81 units (functional regions) observed over 6 years (2002–2007).
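The fractional counting rule can be sketched in a few lines of Python. This is an illustration only: the function name `fractional_patent_counts` and the input format (each application stored as a list with one region identifier per inventor) are hypothetical and are not taken from the EPO or Statistics Sweden data.

```python
from collections import defaultdict


def fractional_patent_counts(applications):
    """Allocate each patent application equally across its inventors' regions.

    `applications` is a list of applications; each application is a list of
    region identifiers, one entry per inventor (hypothetical input format).
    """
    counts = defaultdict(float)
    for inventor_regions in applications:
        if not inventor_regions:
            continue  # no inventor address: the application cannot be regionalized
        share = 1.0 / len(inventor_regions)  # e.g. 0.25 for four inventors
        for region in inventor_regions:
            counts[region] += share
    return dict(counts)


# Hypothetical example: one application with four inventors (three in "A",
# one in "B") and one single-inventor application in "B".
apps = [["A", "A", "A", "B"], ["B"]]
print(fractional_patent_counts(apps))  # {'A': 0.75, 'B': 1.25}
```

Each application contributes a total weight of one, so the regional counts sum to the number of regionalized applications.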

3.2 The model and measurements

The determinants of regional patenting (or of other measures of innovation, such as product announcements) have been analysed extensively within the so-called Regional Knowledge Production Function (RKPF) framework (Jaffe 1989; Feldman 1994; Feldman and Florida 1994; Acs et al. 2002). This framework shifts the unit of analysis from the traditional firm level to the regional level, while maintaining the original Cobb–Douglas specification (Audretsch and Feldman 2004). The general specification of the RKPF framework is:

$$y_{r} = \alpha {\mathbf{X}}_{r}^{\beta }$$
(1)

where \(y_r\) is inventive or innovative output in region r, and \(\mathbf{X}_r\) is the vector of innovation inputs within region r, such as R&D investment, inter-industry knowledge spillovers within the region (usually measured by the concentration of related manufacturing industries in the region), and human capital (described in Hp1, Hp2, Hp3).Footnote 7 The main innovation inputs in the RKPF framework have therefore been “internal” sources of knowledge generated within the region. This paper extends the RKPF specification by adding “external” sources of knowledge brought into the region as additional inputs (explanatory variables) of regional innovation (output). This extension is motivated by international technology diffusion theory and by the so-called Learning-By-Exporting literature (described in Hp4, Hp5). The extended specification is presented below, after the choice of estimator is discussed.
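Reading \(\mathbf{X}_r^{\beta}\) in Eq. (1) componentwise as \(\prod_{k} X_{kr}^{\beta_k}\) (an assumption about the vector notation, made here for illustration), taking logarithms gives the log-linear form in which the Cobb–Douglas RKPF is usually estimated; each \(\beta_k\) is then the elasticity of innovative output with respect to input k:

$$\ln y_{r} = \ln \Big( \alpha \prod_{k=1}^{K} X_{kr}^{\beta_{k}} \Big) = \ln \alpha + \sum_{k=1}^{K} \beta_{k} \ln X_{kr}$$

The same reading carries over to the exponential mean of the count model that follows: for regressors entered in logs, such as R&D investment and trade value, the coefficient is the elasticity of expected patent applications with respect to that input.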

The negative binomial regression model is applied to estimate the relationship between regional innovation, proxied by patent applications, and its determinants presented in Sect. 2. This choice of estimator follows from the nature of the dependent variable: patent applications are count data.Footnote 8 They are also over-dispersed, as the sample variance is 273 times the sample mean.Footnote 9 To handle this situation, the literature suggests several models, such as the negative binomial, the zero-inflated negative binomial (ZINB), and hurdle models (Cameron and Trivedi 2008).Footnote 10 The dependent variable does not have many zero values: only 24 % of the observations are zero (119 out of 486), which provides little justification for a ZINB model.Footnote 11 Indeed, a Vuong test of the zero-inflated negative binomial versus the standard negative binomial favours the standard negative binomial. The hurdle model is not preferred either, for the same reason as the zero-inflated models. Using the negative binomial model, the formulation for regional innovation, which extends the RKPF, is written as follows:
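The two descriptive checks used above to choose the estimator, the variance-to-mean ratio and the share of zeros, can be sketched as follows. The function name and the toy counts are invented for illustration and are not the paper's data; for reference, 119 zeros out of 486 observations is a zero share of about 24 %.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Return (variance-to-mean ratio, share of zeros) for a list of counts.

    Under a Poisson model the variance equals the mean, so a ratio well
    above 1 signals over-dispersion. Assumes the mean is positive.
    """
    m = mean(counts)
    ratio = variance(counts) / m  # sample variance with n - 1 denominator
    zero_share = sum(1 for c in counts if c == 0) / len(counts)
    return ratio, zero_share


# Hypothetical, highly skewed regional patent counts
toy = [0, 0, 1, 2, 3, 5, 8, 40, 120]
ratio, zeros = dispersion_summary(toy)
print(f"variance/mean = {ratio:.1f}, zero share = {zeros:.0%}")
```

A large ratio combined with a modest zero share is the pattern that points toward a standard negative binomial rather than a zero-inflated or hurdle model.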

$$\Pr(y_{rt} = \widetilde{y_{rt}} \mid \mathbf{X}_{1rt}, \mathbf{X}_{2rt}, \mathbf{Z}_{rt}, \sigma_{rt}) = \frac{e^{-\lambda_{rt}} \times (\lambda_{rt})^{\widetilde{y_{rt}}}}{\widetilde{y_{rt}}!}, \quad \widetilde{y_{rt}} = 0, 1, 2, 3, \ldots$$
(2)

where,

$$\lambda_{rt} = \exp(\beta_{1} \mathbf{X}_{1rt} + \beta_{2} \mathbf{X}_{2rt} + \beta_{3} \mathbf{Z}_{rt}) \times \exp(\sigma_{rt})$$

where \(y_{rt}\) is the number of patent applications in functional region r in year t, \(\mathbf{X}_{1rt}\) is the vector of internal knowledge variables, \(\mathbf{X}_{2rt}\) is the vector of external knowledge variables, \(\mathbf{Z}_{rt}\) is the vector of control variables, \(\beta_j\) are the coefficients to be estimated, and \(\exp(\sigma_{rt})\) is assumed to follow a gamma distribution with mean 1 and variance alpha, which can be estimated from the data. Alpha is the over-dispersion parameter; it accommodates over-dispersion by allowing the conditional variance to exceed the conditional mean (Cameron and Trivedi 2008).
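To illustrate why the gamma term produces over-dispersion, the following Python sketch draws counts from the Poisson-gamma mixture described above. Writing μ for \(\exp(\beta_{1}\mathbf{X}_{1rt} + \beta_{2}\mathbf{X}_{2rt} + \beta_{3}\mathbf{Z}_{rt})\) (a label introduced here), the counts have mean μ and variance μ + αμ², the NB2 form in Cameron and Trivedi's terminology. The values of μ and alpha are hypothetical, and the sketch covers only the cross-sectional mixture; it does not reproduce the panel random-effects estimator used in the paper.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the moderate rates used here; not meant for very large lam.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_gamma_poisson(mu, alpha, n, seed=0):
    """Draw n counts from a Poisson-gamma mixture with mean mu.

    The multiplicative term plays the role of exp(sigma) in Eq. (2): it is
    Gamma(shape=1/alpha, scale=alpha), hence mean 1 and variance alpha.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        heterogeneity = rng.gammavariate(1.0 / alpha, alpha)
        draws.append(poisson_draw(mu * heterogeneity, rng))
    return draws


mu, alpha = 5.0, 0.5  # hypothetical values, not estimates from the paper
y = simulate_gamma_poisson(mu, alpha, 50_000)
m = sum(y) / len(y)
v = sum((c - m) ** 2 for c in y) / (len(y) - 1)
print(f"mean {m:.2f}, variance {v:.2f}, theory {mu + alpha * mu ** 2:.2f}")
```

With alpha set to zero the gamma term would collapse to 1 and the model would reduce to the Poisson; the sketch requires alpha > 0 because the gamma shape is 1/alpha.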

The likelihood-ratio (LR) test of panel versus pooled models always favours the panel models (reported in Table 2); hence, the panel version of the negative binomial model is chosen. For the dependent variable and most of the regressors, the vast majority of the variation in the data is between-variation rather than within-variation. The fixed-effects estimator may therefore not be very efficient, since it relies on within-variation (Andersson and Lööf 2011). Furthermore, it has been argued that the fixed effects may wrongly absorb the impact of variables that change only slightly over time, in this paper related and unrelated variety (Fritsch and Slavtchev 2007). The random-effects (RE) estimator is therefore used in the panel models.Footnote 12
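The distinction between between-variation and within-variation that motivates the random-effects choice can be made concrete by decomposing the total sum of squares of a panel variable. The sketch below assumes a hypothetical input format (a dict keyed by (region, year)) and toy values; it is not the routine used in the paper.

```python
from collections import defaultdict


def between_within(panel):
    """Split the total sum of squares of a panel variable into two parts.

    `panel` maps (region, year) -> value (hypothetical input format).
    Between: variation of region means around the grand mean (weighted by
    the number of years). Within: variation around each region's own mean.
    The two parts add up to the total sum of squares.
    """
    by_region = defaultdict(list)
    for (region, _year), value in panel.items():
        by_region[region].append(value)
    values = list(panel.values())
    grand_mean = sum(values) / len(values)
    between = within = 0.0
    for obs in by_region.values():
        region_mean = sum(obs) / len(obs)
        between += len(obs) * (region_mean - grand_mean) ** 2
        within += sum((v - region_mean) ** 2 for v in obs)
    return between, within


# Two hypothetical regions that differ a lot from each other but little over time
panel = {("A", 2002): 10.0, ("A", 2003): 11.0, ("A", 2004): 12.0,
         ("B", 2002): 100.0, ("B", 2003): 101.0, ("B", 2004): 102.0}
b, w = between_within(panel)
print(f"between share = {b / (b + w):.4f}")  # between share = 0.9997
```

When the between share is this close to one, an estimator that uses only within-region deviations discards almost all of the information in the data.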

3.2.1 The dependent variable

The phenomenon under study is the innovation of Swedish functional regions. The number of patent applications for 81 Swedish functional regions during 2002–2007 is used as a proxy for it (see Jaffe and Trajtenberg 2005; Acs et al. 2002). Patents have been found to be a good proxy of innovative activity in general (Griliches 1990) and for regional-level analysis in particular (Acs et al. 2002).Footnote 13 This is because patents are granted for inventions which are novel, inventive, and have industrial application (Andersson and Lööf 2011).Footnote 14

3.2.2 Independent variables

Based on the theories discussed in Sect. 2, the independent variables are grouped into two categories: (1) internal knowledge and (2) external knowledge.

Internal knowledge: To capture the intensity and variety of the region’s internal knowledge, three variables are included: the intensity of R&D activities in the region, the unrelated variety of knowledge within the region, and the related variety of knowledge within the region. The first captures the intensity aspect; the second and third capture the variety aspect.

The intensity of R&D activity (\(\text{R\&D}_{rt}\)) is measured as the log of R&D investment in region r in year t, in million Swedish kronor (MSEK).Footnote 15

The unrelated variety (URV) of knowledge within the region is measured as the entropy at the two-digit level (Frenken et al. 2007; Boschma et al. 2012). The unrelated variety (URV) index for region r in year t is given by:

$${\text{URV}}_{rt} = \mathop \sum \limits_{g = 1}^{G} {\text{Pg}}_{rt} \log_{2} \left( {\frac{1}{{{\text{Pg}}_{rt} }}} \right)$$
(3)

where \(\text{Pg}_{rt}\) is the employment share of two-digit NACE industry g in region r in year t, and G is the number of two-digit industries in region r and year t. Theoretically, \(\text{URV}_{rt}\) takes a minimum value of 0 and a maximum value of \(\log_2 G\). The minimum occurs when all employees in the region work in the same two-digit industry, so that there is no variety at all. The maximum occurs when employees are equally distributed over all two-digit industries; in this case G encompasses all two-digit sectors of the economy.Footnote 16
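Equation (3) is the Shannon entropy, in bits, of the two-digit employment shares. A minimal Python sketch, assuming employment for one region-year is held in a dict keyed by hypothetical two-digit codes, reproduces the two bounds discussed above: 0 for a single industry and \(\log_2 G\) for an equal split.

```python
import math


def entropy_bits(shares):
    """Shannon entropy in bits: sum of p * log2(1/p), skipping zero shares."""
    return sum(p * math.log2(1.0 / p) for p in shares if p > 0)


def unrelated_variety(employment_2digit):
    """URV as in Eq. (3) for one region-year.

    `employment_2digit` maps a two-digit industry code to employment
    (hypothetical input format); Pg is employment / regional total.
    """
    total = sum(employment_2digit.values())
    return entropy_bits(v / total for v in employment_2digit.values())


# Minimum: all employment in one two-digit industry
print(unrelated_variety({"24": 100}))  # 0.0
# Maximum for G = 4: employment spread equally over four industries
print(unrelated_variety({"15": 50, "24": 50, "29": 50, "32": 50}))  # 2.0
```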

To measure related variety, following Frenken et al. (2007), five-digit industries are assumed to be technologically related when they share the same two-digit class. These industries are expected to show some degree of cognitive proximity, because five-digit sectors (e.g., sub-branches of chemicals) share some technological and product characteristics within the same two-digit class (e.g., chemicals). At the same time, they show some degree of cognitive distance, because they differ at the five-digit level. Hence, the more five-digit sectors there are within each two-digit sector in a region, the higher the value of related variety. Related variety (RV) in region r and year t, defined as the weighted sum of entropy within each two-digit sector, is therefore given by:

$${\text{RV}}_{rt} = \mathop \sum \limits_{g = 1}^{G} {\text{Pg}}_{rt} {\text{Hg}}_{rt}$$
(4)

where:

$${\text{Hg}}_{rt} = \mathop \sum \limits_{{i\in {\text{Sg}}}} \frac{{{\text{Pi}}_{rt} }}{{{\text{Pg}}_{rt} }}\log_{2} \left( {\frac{1}{{Pi_{rt} /Pg_{rt} }}} \right)$$

where \(\text{Pi}_{rt}\) is the employment share of five-digit NACE industry i in region r in year t, \(\text{Pg}_{rt}\) is the employment share of two-digit NACE industry g in region r in year t, G is the number of two-digit sectors in region r and year t, I is the number of five-digit industries in the region, and every five-digit sector i falls exclusively under one two-digit sector Sg. Similar to \(\text{URV}_{rt}\), \(\text{RV}_{rt}\) takes its theoretical minimum value of 0 when, in every two-digit industry g, all employees work in the same five-digit industry i ∈ Sg. The maximum value, \(\log_2 I\), is achieved when employees are equally distributed over all five-digit industries i ∈ Sg.
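Equation (4) can be sketched in Python as an employment-weighted average of within-class entropies. The sketch assumes (for illustration only) that a five-digit code is a string whose first two characters give its two-digit class, and it uses toy employment figures. It also checks a standard property of entropy: the entropy of the five-digit shares equals URV plus RV.

```python
import math
from collections import defaultdict


def entropy_bits(shares):
    """Shannon entropy in bits: sum of p * log2(1/p), skipping zero shares."""
    return sum(p * math.log2(1.0 / p) for p in shares if p > 0)


def related_variety(employment_5digit):
    """RV as in Eq. (4) for one region-year.

    `employment_5digit` maps a five-digit code (string) to employment.
    Assumption: the two-digit class Sg of a code is its first two characters.
    """
    total = sum(employment_5digit.values())
    classes = defaultdict(dict)
    for code, emp in employment_5digit.items():
        classes[code[:2]][code] = emp
    rv = 0.0
    for members in classes.values():
        class_emp = sum(members.values())
        pg = class_emp / total  # Pg: employment share of the two-digit class
        # Hg: entropy of five-digit shares within the class (Pi / Pg = emp / class_emp)
        hg = entropy_bits(e / class_emp for e in members.values())
        rv += pg * hg
    return rv


# Hypothetical region-year: two five-digit industries in class "24", one in "29"
emp = {"24110": 40, "24420": 40, "29110": 20}
total = sum(emp.values())
two_digit = defaultdict(float)
for code, e in emp.items():
    two_digit[code[:2]] += e
urv = entropy_bits(v / total for v in two_digit.values())  # URV, Eq. (3)
h5 = entropy_bits(e / total for e in emp.values())         # entropy at five-digit level
rv = related_variety(emp)
print(f"RV = {rv:.3f}, URV = {urv:.3f}, H5 = {h5:.3f}")  # RV = 0.800, URV = 0.722, H5 = 1.522
```

The decomposition makes the two measures complementary: for a given overall five-digit diversity, a higher RV means that more of that diversity sits within two-digit classes rather than across them.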

External knowledge: To capture the intensity and variety of the region’s external knowledge, two variables are included: the value of exports and imports, as an intensity measure, and a trade related variety index, as a variety measure.

The measure adopted as a proxy for the intensity of external knowledge brought into the region relates to the amount of the international trade linkages of each functional region (Boschma and Iammarino 2009). It is measured as:

$${\text{IMP}}\_{\text{EXP}}_{rt} = {\text{Ln}} (Import\,value_{rt} + Export\,value_{rt} )$$
(5)

where \(Import\,value_{rt}\) and \(Export\,value_{rt}\) are the values of manufacturing imports and exports in region r in year t, respectively. The higher the value of \(\text{IMP\_EXP}_{rt}\), the greater the external knowledge that flows into the region.

The measure adopted as a proxy for the variety of external knowledge brought into the region is Trade Related Variety (Boschma et al. 2012). It aims to measure the extent to which the export portfolio of a region is related to its import portfolio. Let i be a five-digit industry within the two-digit class I(i), with i = 1,…, n. Then, following Boschma and Iammarino (2009), trade related variety (TRV) is given by:

$${\text{TRV}}_{rt} = \mathop \sum \limits_{i} Import_{5}^{M} \left( i \right) Export_{5} (i)$$
(6)

where \(Import_{5}^{M}(i)\) is the import entropy in the five-digit industries other than i within the same two-digit industry I(i), and \(Export_{5}(i)\) is the relative size of five-digit export industry i (with i = 1,…, n) in the region’s total exports.
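A minimal Python sketch of Eq. (6) for one region-year follows. Two details are not fixed by the text and are labelled as assumptions in both the code and here: the two-digit class I(i) is taken as the first two characters of a five-digit code, and the import entropy for industry i is computed from import shares normalised over the other five-digit industries in I(i). Codes and trade values are hypothetical.

```python
import math
from collections import defaultdict


def entropy_bits(shares):
    """Shannon entropy in bits: sum of p * log2(1/p), skipping zero shares."""
    return sum(p * math.log2(1.0 / p) for p in shares if p > 0)


def trade_related_variety(imports, exports):
    """Sketch of Eq. (6) for one region-year.

    `imports` and `exports` map five-digit codes (strings) to trade values.
    ASSUMPTION 1: the two-digit class I(i) is the first two characters of i.
    ASSUMPTION 2: Import_5^M(i) uses import shares normalised over the other
    five-digit industries in I(i), i.e. excluding i itself.
    """
    total_exports = sum(exports.values())
    classes = defaultdict(dict)
    for code, value in imports.items():
        classes[code[:2]][code] = value
    trv = 0.0
    for i, export_value in exports.items():
        others = {code: v for code, v in classes[i[:2]].items() if code != i and v > 0}
        others_total = sum(others.values())
        import_entropy = (
            entropy_bits(v / others_total for v in others.values())
            if others_total > 0 else 0.0
        )
        export_share = export_value / total_exports  # Export_5(i)
        trv += import_entropy * export_share
    return trv


# Hypothetical region-year
imports = {"24110": 10, "24420": 30, "24510": 30, "29110": 50}
exports = {"24110": 60, "29110": 40}
print(trade_related_variety(imports, exports))
```

In this toy case, exports of industry 24110 meet an evenly split import portfolio in two other chemical sub-branches (entropy of 1 bit, weighted by an export share of 0.6), whereas 29110 has no related imports and contributes nothing.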

3.2.3 Control variables

Several control variables are considered: population density, human capital, a manufacturing concentration index, a high-technology manufacturing concentration index, the number of large high-tech manufacturing firms, and year dummies. Population density (POPULATION) controls for the size of the regions and captures urbanization economies. It is measured as the population per square kilometre in each region and year. Population density is expected to have a positive effect on innovation (Feldman 1994).

Human capital (HC) is a standard variable in the KPF framework. It has been shown to have a significant positive impact on innovation in both firm-level (Andersson and Ejermo 2005) and regional-level studies (Lee et al. 2010). Such a positive effect is motivated generally by endogenous growth theory (Romer 1986), and specifically by Lucas’s (1988) model, which argues that the ability to develop new technology depends on the average level of human capital in the local economy. HC is measured as the share of higher-educated employees (employees with three or more years of university education) in region r in year t.

It has been shown that sectors differ in their propensity to patent. First, the service sector is less likely than manufacturing to patent its knowledge production (Hipp and Grupp 2005). To incorporate this, the paper includes the location quotient of manufacturing specialization (LQ_MAN), calculated as follows:

$${\text{LQ}}\_{\text{MAN}}_{{rt}} = \frac{{\left( {\frac{{{\text{Manufacturing}}\,\,{\text{employment}}\,\,{\text{in}}\,\,{\text{region}}\,\,r\,\,{\text{year}}\,\,t}}{{{\text{Total}}\,\,{\text{employment}}\,\,{\text{in}}\,\,{\text{region}}\,\,r\,\,\,{\text{year}}\,t}}} \right)}}{{\left( {\frac{{{\text{Manufacturing}}\,\,{\text{employment}}\,\,{\text{in}}\,\,{\text{economy}}\,\,{\text{year}}\,t}}{{{\text{Total}}\,\,{\text{employment}}\,\,{\text{in}}\,\,{\text{economy}}\,\,{\text{year}}\,t}}} \right)}}$$
(7)

The higher the value of LQ_MAN, the higher the concentration of manufacturing in the region. This variable controls for the higher propensity of manufacturing sectors to patent compared with service sectors, and is expected to have a positive sign (Fritsch and Slavtchev 2007; Paci and Usai 1999).
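Equations (7) and (8) share the same location-quotient form, so one small Python function covers both; for LQ_HT, high-tech manufacturing employment replaces manufacturing employment in both numerators. The function name and the employment figures are hypothetical.

```python
def location_quotient(sector_emp_region, total_emp_region,
                      sector_emp_nation, total_emp_nation):
    """Location quotient as in Eqs. (7) and (8).

    Ratio of the sector's share of regional employment to its share of
    national employment; values above 1 indicate regional specialization.
    """
    regional_share = sector_emp_region / total_emp_region
    national_share = sector_emp_nation / total_emp_nation
    return regional_share / national_share


# Hypothetical region: 30 % of local employment in manufacturing,
# against 15 % nationally, giving an LQ of about 2.
print(location_quotient(3_000, 10_000, 600_000, 4_000_000))
```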

Second, even within manufacturing, sectors differ in their propensity to patent, because they face different technological and innovation opportunities and are thus characterized by different technological regimes (Malerba and Orsenigo 1997). For instance, pharmaceutical and chemical sectors, which belong to high-tech manufacturing, are more likely to patent (Scherer 1983). To control for this second point, the paper includes the location quotient of the high-tech manufacturing sectors (LQ_HT).Footnote 17 It is calculated as follows:

$${\text{LQ}}\_{\text{HT}}_{{rt}} = \frac{{\left( {\frac{{{\text{High-Tech}}\,{\text{manufacturing}}\,\,{\text{employment}}\,\,{\text{in}}\,\,{\text{region}}\,\,r\,\,{\text{year}}\,\,t}}{{{\text{Total}}\,\,{\text{employment}}\,\,{\text{in}}\,\,{\text{region}}\,\,r\,\,{\text{year}}\,\,t}}} \right)}}{{\left( {\frac{{{\text{High-Tech}}\,\,{\text{manufacturing}}\,\,{\text{employment}}\,{\text{in}}\,\,{\text{economy}}\,\,{\text{year}}\,\,t}}{{{\text{Total}}\,\,{\text{employment}}\,\,{\text{in}}\,{\text{economy}}\,\,{\text{year}}\,\,t}}} \right)}}$$
(8)

This variable is also expected to have a positive effect on innovative capability.Footnote 18 Together, LQ_MAN and LQ_HT control for the heterogeneous industry structure across regions, which stems from the heterogeneous propensity of sectors to patent.

One may ask whether the results are driven merely by the presence of a few large firms in industries prone to patenting. To account for this, the number of large firms (firms with 500 or more employees) in high-tech manufacturing sectors in the region (LRG_HT) is included as an additional control variable. The presence and dominance of large firms is expected to have a negative effect on regional innovative capability, ceteris paribus (Acs et al. 1994, 2002).

Finally, year dummies are included to capture heterogeneity across years, accounting for macroeconomic effects and business cycles that may affect regional patent applications. To reduce simultaneity concerns, all right-hand-side variables are lagged by one year in the subsequent empirical analysis.Footnote 19 The definitions of all variables are documented in Appendix 2.
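The one-year lag must be taken within each region, never across regions. A minimal Python sketch, assuming a hypothetical dict keyed by (region, year), shows the operation; in this sketch a region's first observed year has no lagged value and is left out, so keeping it would require covariate data for the year before the panel starts.

```python
def lag_one_year(panel):
    """Lag a panel variable by one year within each region.

    `panel` maps (region, year) -> value (hypothetical input format). The
    result maps (region, year) -> value observed in (region, year - 1); a
    region's first observed year has no lagged value and is left out.
    """
    return {
        (region, year): panel[(region, year - 1)]
        for (region, year) in panel
        if (region, year - 1) in panel
    }


# Hypothetical regressor values for one region
rd = {("A", 2002): 1.0, ("A", 2003): 2.0, ("A", 2004): 3.0}
print(lag_one_year(rd))  # {('A', 2003): 1.0, ('A', 2004): 2.0}
```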

3.3 Data descriptions and correlations

The distributions of the main variables, i.e. PATENT, R&D, RV, URV, IMP_EXP, and TRV, over the functional regions are illustrated in Fig. 1. The values are averages over 2002–2007.

Fig. 1

Distribution of main variables over Swedish functional regions (average value during 2002–2007)

Patent applications are geographically concentrated in Sweden, in line with previous findings in US and European studies (Audretsch and Feldman 1996; Breschi 1999). The figure also shows clear evidence of over-dispersion in patent applications (Bettencourt et al. 2007), with four regions, Stockholm, Gothenburg, Malmö, and Linköping, outperforming the others; the first three are Sweden’s three metropolitan areas. At the other extreme, five functional regions have no patent applications during the study period, i.e. regions 66, 67, 73, 76, and 78 (see Appendix 1 for the names of the regions). Interestingly, RV is highest in exactly the four regions with the most patent applications, whereas this is not the case for URV. This is a first indication of the superior role of RV compared with URV. Another interesting point concerns the regional portfolio of Stockholm, the largest producer of patent applications: Stockholm’s variety is more related than unrelated.

The correlation matrix and descriptive statistics for all variables are presented in Table 1.

Table 1 Correlation matrix and descriptive statistics

Both internal and external knowledge sources (in terms of both intensity and variety) are positively correlated with patent applications. Interestingly, the correlation between RV and patent applications is twice as large as that between URV and patent applications. This is a further indication of the superior role of related variety compared with unrelated variety, as noted for Fig. 1. A variance inflation factor (VIF) test is performed to check for multicollinearity among the independent variables. All independent variables have a VIF below 4, and the overall VIF score is 2.16. Multicollinearity is therefore not expected to bias the regression results in Sect. 4 substantially.Footnote 20

4 Empirical results

The results of the negative binomial random-effects estimation of the knowledge-based determinants of patent applications for 81 Swedish functional regions over 2002–2007 are reported in Table 2.

Table 2 Determinants of patent application (2002–2007); panel negative binomial estimation

The first model (column (1)) considers only the effect of the intensity and variety of the region’s “internal” knowledge. The second model adds the intensity and variety of “external” knowledge. The third model controls for the size of regions by adding POPULATION. The fourth model takes into account industry heterogeneity across regions by adding LQ_MAN and LQ_HT. Finally, the fifth model adds LRG_HT to account for the possible dominance of a few (large) high-tech firms in the region. This is the full model, which includes all explanatory and control variables.

R&D investment and HC are positive and highly significant in all models, as expected. They show the importance of the intensity of internal knowledge (i.e. knowledge generated inside the region) for regional innovation. Human capital in particular signals the importance of highly educated individuals for producing patents, and suggests that there are positive externalities to schooling. URV is always positive, showing the importance of variety (in a general sense) for innovation; however, its significance diminishes from model 1 to model 5 (the full model). By contrast, the related variety of knowledge within regions (RV) is always positive and significant, even after controlling for population (model (3)), heterogeneous industry structure across regions (model (4)), and the firm-size composition of regions (model (5)). The robustness of RV can be explained as in Hypothesis 3: a region with higher RV enjoys more learning opportunities and knowledge spillovers between its existing related sectors (compared with regions with high URV) (Frenken et al. 2007), which eventually leads to higher innovation in the region (Feldman 1994; Audretsch and Feldman 2004). These results confirm hypotheses 1, 2, and 3.

The intensity of external knowledge (IMP_EXP) is positive and significant in all models in which it is included, thus confirming Hypothesis 4. Contrary to expectation, however, the TRV measure has a negative sign; its significance is weak, though, and the result is not robust enough to explain the innovation of regions. Hence the null hypothesis of no effect of TRV cannot be rejected, and Hypothesis 5 is not supported. TRV has indeed produced mixed results in previous studies: while it has a positive and significant effect on regional employment growth, it has no significant impact on regional value-added growth or labour-productivity growth, and even a negative sign in the latter (Boschma and Iammarino 2009).

As for the control variables, population density is always positive and significant, in line with previous research in the RKPF framework (Feldman 1994). This reflects the positive effect of scale, or (pure) urbanization, economies. Model (4) shows that regions with a concentration of manufacturing sectors (LQ_MAN) in general, and of high-tech manufacturing sectors (LQ_HT) in particular, perform better in terms of patent applications, as expected (Scherer 1983; Hipp and Grupp 2005). Moreover, controlling for such industry heterogeneity across regions (i.e. acknowledging that the propensity to patent differs across industries) does not change the main results concerning internal and external knowledge. Finally, in the full model LRG_HT has a negative sign. Hence it seems that the innovation of regions is not driven merely by a few large firms prone to patenting; rather, patenting activities appear to be spread across many smaller firms in the region.Footnote 21 This is in line with previous studies suggesting that regions dominated by small firms are more innovative, ceteris paribus (Acs et al. 1994, 2002). However, the coefficient is not significant, which calls for caution in interpreting it. Nevertheless, this variable controls for the possible concentration of patenting in a few (large) firms, and the main results remain the same.
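The exact construction of LQ_MAN and LQ_HT is not given in this section. Assuming they are conventional employment-based location quotients, i.e. a sector's share of regional employment divided by its share of national employment, they can be sketched as follows; the function name and the input layout are hypothetical, and the paper may use a different base (for example, establishments rather than employment).

```python
def location_quotients(regional, national_sector, national_total):
    """Employment-based location quotient for one sector group per region.

    regional maps region -> (sector_employment, total_employment).
    LQ > 1 means the sector is over-represented in the region relative
    to the nation; LQ < 1 means it is under-represented.
    """
    national_share = national_sector / national_total
    return {
        region: (sector / total) / national_share
        for region, (sector, total) in regional.items()
    }
```

Under this assumption, a region where manufacturing makes up 30% of employment, against 15% nationally, has an LQ of 2, which is the kind of "concentration of manufacturing sectors" the paragraph above refers to.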

Since Maximum Likelihood Estimation (MLE) is used, one way to compare the models is the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). Both criteria decrease when moving from model (1) to model (2) (BIC is not reported in Table 2). This means that adding the external knowledge variables in model (2) improves the fit relative to model (1), which includes only internal knowledge; since both criteria penalize additional parameters, the decrease also gives no evidence of over-fitting. In other words, internal and external knowledge together produce a better fit for modelling patent applications than including only one of them. Controlling for POPULATION in model (3) further improves the model. While moving to model (4) does not improve the model, model (5), the full model, turns out to be the best in terms of AIC. The same conclusion is obtained from Likelihood Ratio tests of restricted versus unrestricted models when moving between models.Footnote 22 This fulfils the stated aim of the paper, i.e. to test various theoretical conjectures (RKPF and international trade theory) empirically in a common empirical setting.
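The model comparison described above can be sketched in a few lines: AIC and BIC from a fitted model's maximized log-likelihood lnL, its number of parameters k and number of observations n, plus a likelihood-ratio test for nested models, where the degrees of freedom equal the number of added parameters. The function names are illustrative, and the log-likelihoods would come from the estimated models. The chi-square tail probability uses closed-form series that are exact for integer degrees of freedom, so no external library is assumed.

```python
import math


def aic(llf, k):
    """Akaike information criterion: 2k - 2 lnL (smaller is better)."""
    return 2 * k - 2 * llf


def bic(llf, k, n):
    """Bayesian information criterion: k ln(n) - 2 lnL (smaller is better)."""
    return k * math.log(n) - 2 * llf


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total


def lr_test(llf_restricted, llf_unrestricted, df):
    """Likelihood-ratio test of nested models; df = number of added parameters."""
    stat = 2.0 * (llf_unrestricted - llf_restricted)
    return stat, chi2_sf(stat, df)
```

Because each model in Table 2 adds variables to the previous one, the models are nested; moving from model (4) to model (5), for example, adds only LRG_HT, so a call such as `lr_test(llf_model4, llf_model5, 1)` (with hypothetical variable names) would apply. AIC and BIC are only comparable across models estimated on the same observations.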

The likelihood ratio (LR) test for including alpha, the over-dispersion parameter, is also reported in Table 2. In all models the null hypothesis that alpha equals zero is strongly rejected. The over-dispersion parameter is thus significantly different from zero, which shows (again) that the negative binomial is preferred over the Poisson or zero-inflated Poisson models. Previous studies using Swedish patent application data have made a similar estimation choice, at both the regional level (Ejermo 2005) and the firm level (Andersson and Lööf 2011).
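One technical detail of this test is worth making explicit: alpha = 0 lies on the boundary of the parameter space, so under the null the LR statistic is not distributed as chi-square(1) but as a 50:50 mixture of chi-square(0) and chi-square(1) (the "chibar2(01)" statistic printed by some statistical packages), which halves the usual p-value. A minimal sketch, assuming the standard NB2 setting in which Var(y) = mu + alpha * mu^2 and alpha = 0 gives the Poisson model; the function name is hypothetical and the two log-likelihoods would come from the fitted Poisson and negative binomial models.

```python
import math


def overdispersion_lr_test(llf_poisson, llf_negbin):
    """LR test of H0: alpha = 0 (Poisson) against alpha > 0 (negative binomial).

    Because alpha = 0 is on the boundary of the parameter space, the
    statistic follows a 50:50 mixture of chi2(0) and chi2(1) under H0.
    """
    lr = max(0.0, 2.0 * (llf_negbin - llf_poisson))
    # P(chi2(1) > lr) = erfc(sqrt(lr / 2)); halve it for the boundary mixture
    p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value
```

A statistic of about 2.71, which would give p = 0.10 under a naive chi-square(1) reference, gives p = 0.05 under the mixture; a strong rejection such as the one reported above is unaffected by the correction.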

One may argue that the dominant, positive effect of related variety on innovation may not be equally strong at all stages of the industry/product life cycle (ILC). Specifically, one may expect a weaker effect of related variety (and of variety in general) in later stages of the ILC, since these stages are characterized by a dominant design, standardization, and less heterogeneity among firms and products (Vernon 1966; Utterback and Abernathy 1975; Duranton and Puga 2001). However, the data at hand do not allow us to consider the effect of the ILC on related variety and the other explanatory variables, leaving this question for further research.

5 Conclusion

This paper analysed the effects of (1) the intensity and variety of internal knowledge and (2) the intensity and variety of external knowledge on regional innovation, measured by patent applications. As for variety, the paper distinguished between related and unrelated variety for internal knowledge, and used trade-related variety to capture the variety of external knowledge.

The results of the empirical analysis show that regional innovation rises with both the intensity and the variety of internal knowledge. Notably, when it comes to the variety of internal knowledge, variety per se, as captured by ‘unrelated’ variety, does not substantially affect regional innovativeness, whereas ‘related’ variety has a robust positive impact. This finding is in line with previous literature, though this seems to be the first attempt to apply the distinction between related and unrelated knowledge variety in a study of regional innovation (see Glaeser et al. 1992; Frenken et al. 2007). The results also suggest that the intensity of external knowledge flowing into a region has a positive effect on its innovation, in line with the literature on international trade diffusion (Keller 2004) and learning by exporting (Clerides et al. 1998; Castellani 2002; Andersson and Lööf 2009). The results concerning trade-related variety, which aims to capture the relatedness of the import and export portfolios of regions, are not robust. Taken together, internal and external knowledge produce a better fit for modelling patent applications than including only one of them. This fulfils the stated aim of the paper, i.e. to test various theoretical conjectures (RKPF and international trade theory) empirically in a common empirical setting. Finally, some of the control variables included in the analysis provide additional insight.

In a nutshell, some regions in Sweden produce more patents than others for several reasons: (1) they are better at generating internal knowledge (in both its variety and intensity aspects), (2) they benefit more from external knowledge flowing into the region through international trade linkages (intensity aspect), (3) they are dominated by high-tech manufacturing sectors, and (4) they benefit from urbanization (scale) economies.

What conclusions can be drawn with respect to policy? The main result shows that having related industries within a region enhances regional innovation (because of knowledge spillovers between those related industries). This implies that regions need to be smart and develop a portfolio of complementary sectors, i.e. a related-variety portfolio. Policymakers may help with this in two ways. First, one might think of a targeted policy for attracting related sectors to a given region. However, Neffke et al. (2011) show that the entry and exit of sectors in regions is governed by a self-selection process of firms (belonging to sectors) and an associated path-dependency process, rather than by targeted policy. Instead, what policymakers may do is look at the past, identify the potential sectors that could have contributed to the related-variety portfolio of a given region (but have not yet entered it), and remove the possible bottlenecks that have hindered the entry of those related sectors into the region (Neffke et al. 2011). Second, another way of helping a region achieve a variety of knowledge is to attract creative-class individuals to the region. Recent studies suggest that creativity has been a missing pillar in the theory of knowledge spillovers (Audretsch and Belitski 2013). Diversity, openness to other cultures, and the share of Bohemians all contribute to a higher level of creativity in regions and hence to more knowledge spillovers (Boschma and Fritsch 2009; Audretsch and Belitski 2013) and, eventually, more innovation. In both ways, policymakers may help regions develop a related-variety portfolio, though only in the long run. Such policies to promote variety can complement the classic policy of increasing the R&D intensity of regions, treating variety and intensity as complementary drivers of higher regional innovation.