1 Introduction

Technological innovation in the business world has become a determining factor of competitiveness and is thus a key element of productivity growth in advanced economies (Buesa et al. 2010). However, from a global perspective, not all countries or regions have the same capacity for innovation, nor do all the factors that define the development of innovative processes affect different areas with the same intensity in relation to productivity growth rates.

The well-known study by Griffith et al. (2006) used the structural model proposed by Crépon et al. (1998) to analyse the impact of technological innovation on labour productivity in four European countries (France, Germany, Spain, and the UK) from 1998 to 2000. They found significant differences between the factors related to technological innovation activities regarding their impact on the manufacturing productivity of companies based in these four countries.

Their approach could be applied to regions within a country to investigate whether the impact of technological innovation activity on firm productivity differs according to their location, while taking all economic sectors into account. This relationship between location and innovation could be relevant because business R&D is generally located in the headquarters of companies in their countries of origin (e.g. Belderbos et al. 2011; Castellani and Pieri 2013; Narula 2002). The present study investigated this issue in the three most relevant Spanish regions according to their populations and contribution to the Spanish GDP.

In Spain, differences in regional socioeconomic figures affect the way in which businesses approach technological innovation to achieve productivity improvements. As shown in Table 1, in the period ending in 2016, the three regions analysed comprised almost 48% of the Spanish population and jointly contributed more than 51% of the national GDP. However, the analysis of the remaining values shows clear differences between these three regions.

Table 1 Aggregate productivity, R&D intensity, and unemployment in Spanish regions

Although Andalusia is the Spanish region with the largest population, it has worse figures in terms of per capita income, unemployment, labour productivity, and total factor productivity than those of Madrid, Catalonia, and the average values for Spain as a whole. In the period 1970–2007, the average total factor productivity in Andalusia was very similar to the national figure obtained by Coremberg and Pérez-García (2010). Furthermore, Andalusia only invests 0.91% of its regional GDP in R&D. In line with all the data of the series since 2000, the three regions analysed and Spain as a whole continue to show values well below the European average for 2016 of 2.03%, according to EUROSTAT. It is also relevant to note that the OECD (2011) categorized these regions into different classifications. Madrid and Catalonia were classified as industrial production zones, whereas Andalusia was classified as a non-S&T-driven zone (i.e. a region that is not driven by science and technology). These classifications function as a guide regarding innovation in these regions.

Nevertheless, there are very high interregional differences in relative investment in R&D (Edler and Fagerberg 2017). As noted by Acosta et al. (2015), these differences are due to the Spanish regions developing different policies on innovation. However, it seems reasonable to assume that some other factors may also lead to differences in the way R&D investments translate into productivity figures in the Spanish regions. These other factors may include differences between Spanish regions in specialized products, the effect of the location of company headquarters and the location of production, or differences in price levels between the regions considered.

Therefore, the basic objective of this study was to analyse differences between Spanish regions in company productivity due to the implementation of technological product and process innovations as defined by the Oslo Manual (OECD 2005). That is, we studied the economically most relevant Spanish regions regarding possible variations in productivity due to differences in business policies on innovation. To the best of our knowledge, this type of regional analysis has not been previously conducted for the Spanish economy and thus represents a novel contribution to the literature. The CDM structural model originally proposed by Crépon et al. (1998) was used to quantify the relationship between investment in R&D, technological innovation, and business productivity. In this case, and in line with the study by Griffith et al. (2006), we used data from the Spanish Technological Innovation Panel (PITEC), which uses the methodological framework established for the Community Innovation Survey (CIS) developed by EUROSTAT.

In summary, this study is justified by the fact that there are structural differences and differences in innovation policies between regions that are likely to lead to significant variations in innovation-associated productivity.

The remainder of this paper is structured as follows: The next section presents a brief review of the scientific literature in this field, followed by the methodology section. We then introduce the database used and provide a descriptive analysis of this database in Sect. 4. The econometric results obtained are presented in Sect. 5, followed by the main conclusions and economic policy recommendations based on these conclusions.

2 Literature review

Scientific interest in the impact of innovative activity on company productivity was stimulated by the seminal works of Griliches (1979, 1986) and Pakes and Griliches (1980). Studies based on the methodology proposed by these authors mainly analysed this relationship in the manufacturing sectors of specific countries. These studies considered innovative activities as an input of the productive process and therefore included variables related to innovation as another explanatory variable of the production function (e.g. Lööf and Heshmati 2006; Hall and Mairesse 1995; Jaffe 1986).

In order to consider innovation activities as a result of the production process rather than as an input alone, Crépon et al. (1998) postulated a three-stage structural model that linked the decision of companies to invest in R&D and the intensity of their investment, the generation of innovations as a result of the investments made, and the productivity of the company after implementing the innovations. Subsequently, other studies applied this methodology to the analysis of business process innovation considered as output and its impact on productivity. Most of these studies used either panel data or cross-sectional data and analysed different sectors of the national economies of single countries. Specifically, these studies analysed the manufacturing sector (e.g. Acosta et al. 2015; Chudnovsky et al. 2006; Hall et al. 2009; Lee 2011; Marin 2014; Wadho and Chaudhry 2018), the service sector (e.g. Álvarez et al. 2012; García-Pozo et al. 2018; Siedschlag et al. 2011; Stelios and Aristotelis 2009), or compared both sectors (e.g. Castellacci 2011; Dutrenit et al. 2013; Goya et al. 2013; Mairesse and Robin 2010; Polder et al. 2009; Siedschlag and Zhang 2015). All these studies found a positive association between innovation and productivity. However, studies that have analysed the economic sectors of a country or region as a whole remain scarce.

The methodology proposed by Crépon et al. (1998), or variations of this methodology, has mainly been used to compare different countries regarding the impact of R&D intensity after it is transformed into innovation activities and their application to improve productivity. For example, Criscuolo (2009) and OECD (2009) analysed variations in productivity due to innovation in OECD countries; Aboal et al. (2015) analysed differences between Central and South American countries; Raffo et al. (2008) compared the results of innovation on productivity in European and Latin American countries; and Griffith et al. (2006), Lööf et al. (2003), Mohnen et al. (2006), and Peters et al. (2014) compared the results of applying the CDM model to European countries.

Few studies have used the CDM model to compare the effects of innovation on productivity between regions. This lack is probably due to difficulties in obtaining statistical information on these aspects at company and regional levels. Most of the studies in this field have used different production functions to analyse associations between innovation and productivity at the macroeconomic level (e.g. Castellani and Pieri 2013; Felsenstein 2015; Vieira et al. 2011). However, although the studies by Segarra-Blasco (2010) and Segarra and Teruel (2011) addressed a single Spanish region, they are noteworthy in that they used a CDM model.

Thus, this comparative study was motivated by the scarcity of empirical research on the impact of R&D investment and technological innovation on company productivity at the microeconomic level in the Spanish economy from a regional perspective.

3 Methodology

As mentioned, the methodology applied in this study was based on the CDM model (Crépon et al. 1998). The main advantage of this methodology is that it can be used to investigate associations between business investment in R&D, the generation of technological innovations, and variations in productivity. Different versions of the CDM model have been widely tested in this field in the economic literature (e.g. García-Pozo et al. 2018; Griffith et al. 2006; Hall et al. 2009; Lööf and Heshmati 2006; Siedschlag and Zhang 2015).

As pointed out by Griffith et al. (2006), the CDM model has a very simple basic structure that can be used to analyse: (1) the decision of companies to invest in innovation and the amount of innovative effort; (2) the innovative results of this effort; and (3) productivity obtained using the results of the innovative effort. In line with Griffith et al. (2006), these three stages can be formally represented as follows:

3.1 First stage: R&D equations

The first decisions of the company are whether to invest in R&D and to determine the amount to invest. These decisions can be formalized using two equations that identify the companies that decide to invest in R&D and that establish the intensity of the innovation effort.

Based on the foregoing, and assuming that \( i = 1, \ldots , \, N \) represents the number of companies and \( r_{i}^{*} \) represents the innovative effort of the company:

$$ r_{i}^{*} = z_{i}^{\prime } \beta + e_{i} $$
(1)

where \( r_{i}^{*} \) is an unobservable latent variable, \( z_{i} \) is the vector of the factors considered to be determinants of the innovative effort, \( \beta \) represents the vector of coefficients to be estimated, and \( e_{i} \) is the error term. Equation (1) could be estimated directly using information on internal R&D expenditures, but only for those companies that either invest or report investment; restricting the estimation to these companies would introduce an unwanted selection bias. Therefore, we used a selection equation to identify companies that invest in or report internal R&D. This equation takes the form:

$$ rd_{i} = \left\{ {\begin{array}{*{20}c} {1\quad {\text{if}} \;rd_{i}^{*} = w_{i}^{\prime } \alpha + \varepsilon_{i} > c} \\ {0 \quad {\text{if}} \;rd_{i}^{*} = w_{i}^{\prime } \alpha + \varepsilon_{i} \le c} \\ \end{array} } \right. $$
(2)

where \( rd_{i} \) is a binary observed endogenous variable that takes value 1 for companies that invest in or report internal R&D and 0 otherwise, and \( rd_{i}^{*} \) is the corresponding latent variable that represents the decision to innovate. If the value of the latent variable is higher than the threshold established by the constant \( c \), the company has decided to invest in or report R&D. \( w_{i} \) represents the vector of variables that determine the decision to invest in R&D, \( \alpha \) is the corresponding vector of coefficients to be estimated, and \( \varepsilon_{i} \) is the error term. Once the company has been identified, the following expression can be used to estimate the value of the investment in R&D:

$$ r_{i} = \left\{ {\begin{array}{*{20}l} {r_{i}^{*} = z_{i}^{\prime } \beta + e_{i} } \hfill & { {\text{if}}\; rd_{i} = 1} \hfill \\ 0 \hfill & {{\text{if}}\; rd_{i} = 0 } \hfill \\ \end{array} } \right. $$
(3)

The system formed by Eqs. (2) and (3) can be estimated by maximum likelihood, provided it is assumed that the error terms \( e_{i} \) and \( \varepsilon_{i} \) follow a bivariate normal distribution with zero mean, unit variances, and correlation coefficient \( \rho_{e\varepsilon } \). In the literature, this estimation method is called the generalized Tobit model (Type II) or the Heckman (1979) selection procedure by maximum likelihood.
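The reason a selection equation is needed can be illustrated with a small simulation. The following is a minimal sketch, not the estimation used in this study: it draws the two error terms with a chosen correlation and reports the average of \( e_{i} \) among the companies for which \( rd_{i} = 1 \). The function names, the sample size, and the simplifying assumption that \( w_{i}^{\prime } \alpha = c \) for every company are ours and serve only as an illustration.

```python
import math
import random


def simulate_selected_mean(rho, n=200_000, seed=0):
    """Average of the intensity error e_i among simulated firms with rd_i = 1.

    Hypothetical setup: w_i' alpha equals the threshold c for every firm,
    so a firm is selected exactly when its selection error eps_i is positive.
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)    # error of the selection equation (2)
        noise = rng.gauss(0.0, 1.0)  # independent standard normal draw
        # Construct e so that corr(e, eps) = rho and var(e) = 1.
        e = rho * eps + math.sqrt(1.0 - rho ** 2) * noise
        if eps > 0.0:                # rd_i = 1: the firm reports R&D
            total += e
            count += 1
    return total / count


def theoretical_selected_mean(rho):
    """E[e | eps > 0] = rho * phi(0) / (1 - Phi(0)) = rho * sqrt(2 / pi)."""
    return rho * math.sqrt(2.0 / math.pi)


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6):
        print(rho, simulate_selected_mean(rho), theoretical_selected_mean(rho))
```

When the correlation is zero, the average error in the selected sample is close to zero and a regression on that sample alone would not be distorted. When the correlation is nonzero, the selected sample has a nonzero average error, which is the bias that the joint estimation of Eqs. (2) and (3) is meant to remove.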

3.2 Second stage: knowledge or production innovation equations

In this stage, R&D investment is transformed into the technological innovations the company will be able to implement. The production innovation equation used for the two types of technological innovation can be expressed as follows:

$$ g_{i,n} = r_{i}^{*} \gamma + x_{i,n}^{\prime } \delta + u_{i,n} $$
(4)

where \( g_{i,n} \) is a binary variable representing the innovative process for each type of innovation (n). The variable \( r_{i}^{*} \) represents the predicted value of effort or investment in R&D by each company obtained by estimating Eqs. (2) and (3) in the first part of the model; \( x_{i,n} \) represents the vector of covariates or determinants of each type of innovation; \( \gamma \) and \( \delta \) are the coefficients to be estimated; and \( u_{i,n} \) is the error term for the two types of technological innovation. As Hall et al. (2009) suggested, all the companies in the sample are taken into account when the predicted effort in R&D is included in the production innovation equation. This value is included because it is assumed that all the companies make some kind of innovative effort regardless of whether or not they invest in R&D or report investing in R&D. In addition, by including the predicted value in the model instead of the actual value, we solve the problems of simultaneity and endogeneity between R&D intensity and production innovation, because \( w_{i} \) and \( z_{i} \) are independent of \( u_{i,n} \) in both cases.

3.3 Third stage: the production equation

In the last stage of the model, we estimate the influence of the two types of technological innovation on production. In this case, and in order to better understand the model, the variable labour productivity is analysed using the Cobb–Douglas production function. The two types of innovation previously estimated and the capital and labour productive factors are included as the explanatory variables. If the subscripts that identify the company and the error term are omitted, this relationship can be expressed as follows:

$$ Y = AK^{\alpha } L^{\beta } e^{{\gamma_{n} I_{n} }} $$
(5)

where \( Y \) represents the company’s production, \( K \) represents the stock of deflated physical capital, \( L \) represents the number of employees, and \( I_{n} \) denotes technological innovation of type n (i.e. process or product innovation).

The equation estimated in the present study was obtained by taking logarithms of both sides of Eq. (5) and subtracting the logarithm of the labour factor \( L \) from both sides:

$$ Ln\left( {\frac{Y}{L}} \right)_{i} = LnA + \alpha LnK_{i} + \left( {\beta - 1} \right)LnL_{i} + \gamma_{n} I_{n, i} $$
(6)

where \( \alpha \) represents the elasticity of production in terms of physical capital, \( \beta \) represents the elasticity of production in terms of labour, and \( \gamma_{n} \) represents the semi-elasticity of production according to each type of innovation. In this case, by using predicted probability values from the previous stage, we avoid the problem of endogeneity of the explanatory variables that represent the two types of technological innovation.
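The step from Eq. (5) to Eq. (6) can be written out explicitly, with the company subscripts and the error term omitted as in Eq. (5). Taking logarithms of both sides of Eq. (5) gives:

$$ LnY = LnA + \alpha LnK + \beta LnL + \gamma_{n} I_{n} $$

Subtracting \( LnL \) from both sides then yields:

$$ Ln\left( {\frac{Y}{L}} \right) = LnA + \alpha LnK + \left( {\beta - 1} \right)LnL + \gamma_{n} I_{n} $$

The coefficient estimated on \( LnL \) is therefore \( \beta - 1 \) rather than \( \beta \), so the labour elasticity is recovered by adding one to that estimate.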

4 Data

The main source of information used in this study was the Spanish PITEC database (Panel de Innovación Tecnológica). The PITEC is a panel-type database created in 2003 by the Spanish National Institute of Statistics (INE) and the Spanish Foundation for Science and Technology. It can be used to study the technological innovation activities of over 12800 Spanish companies. PITEC uses the same methodological framework as the Community Innovation Survey (CIS) developed by EUROSTAT and the innovation classification criteria listed in the Oslo Manual (OECD 2005). Thus, PITEC data and data from European Union countries can be compared.

This study included observations for all the economic activities available during 2008–2016. The observations were grouped according to the location of the companies in the three Spanish regions identified in the PITEC database (Madrid, Catalonia, and Andalusia). We also analysed the data for Spain as a whole. This process provided an unbalanced sample of 10354 Spanish companies, of which 737 were located in Andalusia, 2489 in Catalonia, 1912 in Madrid, and 5216 in the rest of Spain.

Several aspects of the variables used and defined in Table 5 (Appendix) need to be highlighted. Firstly, all variables expressed in monetary terms were deflated to 2008 euros. Data provided by the BD.MORES database (Spanish Ministry of Finance and Public Administrations) were used to create a deflator for each economic subsector. Klette and Griliches (1996) and Mairesse and Jaumandreu (2005) have already drawn attention to the advantage of using specific deflators for each subsector when estimating production functions. Secondly, the perpetual inventory method was used to estimate the stock of physical capital. This method basically consists in estimating the capital goods by accumulating the investment flows deflated by sector while making a series of assumptions about the average useful life and the depreciation pattern of the goods, following the criteria established by De Busto et al. (2008). Finally, in addition to the variables used in each stage of the model due to their relevance, two groups of dummy variables were included in all stages to control for the impact of the specific economic activity conducted by each company and the year in which each observation was recorded. The observations were treated as pooled cross-sectional data, with the year dummies controlling for the temporal nature of the data.
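The inventory method for the capital stock described above (often called the perpetual inventory method) can be sketched as follows. This is a minimal illustration under assumptions of our own: a constant geometric depreciation rate, a given initial stock, and a single price index per subsector. The criteria of De Busto et al. (2008) are not reproduced here, and the function names and figures are hypothetical.

```python
def deflate(nominal_values, price_index, base=100.0):
    """Convert nominal flows to base-year prices using a subsector price index."""
    if len(nominal_values) != len(price_index):
        raise ValueError("nominal_values and price_index must have equal length")
    return [value * base / index for value, index in zip(nominal_values, price_index)]


def capital_stock(real_investments, depreciation_rate, initial_stock=0.0):
    """Accumulate deflated investment flows: K_t = (1 - d) * K_{t-1} + I_t.

    Assumes a constant geometric depreciation rate d (hypothetical choice).
    """
    if not 0.0 <= depreciation_rate <= 1.0:
        raise ValueError("depreciation_rate must lie between 0 and 1")
    stocks = []
    stock = initial_stock
    for investment in real_investments:
        stock = (1.0 - depreciation_rate) * stock + investment
        stocks.append(stock)
    return stocks


if __name__ == "__main__":
    # Hypothetical nominal investments and price index (base year = 100).
    real = deflate([100.0, 110.0, 121.0], [100.0, 110.0, 121.0])
    print(capital_stock(real, depreciation_rate=0.10))
```

In this hypothetical example the three nominal investments are each worth 100 in base-year prices, so with a 10% depreciation rate and no initial stock the capital stock grows to 100, 190, and 271 over the three years.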

5 Results and discussion

5.1 First stage: estimation of R&D equations

Table 2 shows the estimations of the marginal effects and coefficients of R&D equations of the CDM model by region. The estimation was performed using the Heckman (1979) selection procedure by maximum likelihood with heteroskedasticity-robust standard errors. In this stage, the first equation (the selection equation) was estimated using a binomial logit model that takes the value 1 when the company decides to invest in R&D and 0 otherwise. The second equation employs linear regression using ordinary least squares (OLS) to estimate the intensity or innovative effort of companies, conditional on the company deciding to conduct innovative activities. Innovative effort was estimated using the logarithm of R&D intensity per employee. It should be noted that the value of the rho (ρ) statistic was significant, which justifies the use of this selection procedure.

Table 2 Results of estimations of engagement in R&D and R&D intensity by Spanish regions

Significant differences were found between regions in the impact of the analysed variables on the companies’ decisions to invest in R&D and the amount of innovative effort. All statistically significant values of the estimated coefficients of innovative effort were much lower in the Madrid region than in Catalonia and, in particular, Andalusia.

Company size increased the likelihood of engaging in more intense innovative activity in Catalonia and in Andalusia, although size had little effect in the case of Madrid. This variable was only included in the logit equation because innovative intensity is implicitly affected by company size, as pointed out by Griffith et al. (2006). The use of legal measures to protect innovations increased the probability of engaging in innovation in companies in all the regions. This increase was highest in Catalonia and next highest in Andalusia. However, this variable only had a positive impact on innovative effort in Catalonia and, in particular, in Andalusia. International cooperation and the participation of companies in international markets have a positive and significant impact on innovative effort. In fact, international competition in their markets increases the probability of engaging in innovation. Local/regional, national, and European Union funding for innovative projects had a positive and significant impact on the decision to conduct innovative activity and its intensity. National funding strongly increased the probability of companies in the three regions deciding to engage in R&D activity. This type of funding was most commonly used in the Madrid region and Andalusia. Once this decision was taken, the impact of national funding on innovative effort was higher in Andalusia than in the other regions. The impact of local/regional or European funds on innovative intensity was similar in the three regions, except in Andalusia where European funding had the greatest impact.

5.2 Second stage: estimation of production innovation equations

Although at this stage Griffith et al. (2006) estimated the equations for the two types of technological innovation individually, we used multivariate probit regression using simulated maximum likelihood as proposed by Cappellari and Jenkins (2003). They suggested that the multivariate probit model is more appropriate for this type of analysis, because the effects of two types of technological innovation on the estimates of both equations are considered together. This decision is also justified by the significantly nonzero value of the correlation coefficient, rho (ρ), estimated in all cases for the error terms. As suggested by Hall et al. (2009), this value implies that both types of innovation are influenced by the same nonobservable factors, but at different intensities. In the two production innovation equations, the variable production innovation is a dichotomous dummy dependent variable that takes value 1 when companies have introduced either product or process innovation during the 2 years prior to the date of observation; otherwise, it takes value 0. In addition, all explanatory variables except for the first three (i.e. R&D intensity, new products and labour productivity) are dummy variables that take value 1 when the respective issue is relevant to the company and 0 otherwise. As suggested by Hall et al. (2009), all the companies in the sample are taken into account when the predicted effort in R&D is included in the production innovation equation. All the companies in the sample were included because it was assumed that all companies make innovative efforts, regardless of whether or not they invest in R&D and report it. In addition, by using the predicted value in the knowledge production function, rather than the amount actually invested, issues of simultaneity and endogeneity between R&D intensity and production innovation can be avoided.

Table 3 shows the estimations of the production innovation equations for the three study regions and for Spain as a whole. Significant differences were found between the study regions in the estimated values.

Table 3 Product and process innovation equations

As expected, the marginal effect of R&D intensity predicted in the previous stage was strongly statistically significant and positive in the three regions for both types of innovation. It reached its highest value in Catalonia for both types of innovation and with meaningful differences compared to the two other study regions. These results indicate that increased R&D intensity significantly increases the probability of engaging in product or process innovation. The marginal effect of gross investments in tangible goods per employee for process innovation was significant and positive in Madrid and Catalonia, but not significant in Andalusia. It is noteworthy that the marginal effect of company size on process and product innovation had a positive and significant impact in the three study regions, although substantial differences were found between them. The decision to legally protect inventions and innovations had a high impact on product innovation in all the study regions, particularly in the case of the Madrid region. However, its impact on process innovation was not significant in Catalonia. The sources of information used by companies in process innovation are in line with those reported in the economic literature in this field (Lööf and Heshmati 2002; Griffith et al. 2006). Companies themselves are the main source of information on both types of innovation, followed by consumer opinion in the case of product innovation and company suppliers in the case of process innovation. In relation to the two types of innovation, marginal effects had the highest impact in Andalusia. Finally, it is noteworthy that Catalonia and Madrid have the widest range of factors that increase the likelihood of companies engaging in product and process innovation. In this regard, significant differences were found between Catalonia and the other two regions.

5.3 Third stage: estimations of production equations

The last stage of the CDM model addresses the estimation of Eq. (6) for the three study regions and the Spanish economy as a whole. Table 4 shows these estimates.

Table 4 Production equations

Firstly, it should be noted that the values of the adjusted R-squared and F statistic for the four estimations confirm the goodness of fit and significance of the model, respectively. On the other hand, and in line with the results obtained in the first two stages of the proposed CDM model, significant differences were found between the three study regions in the estimated semi-elasticities for the predicted probability of making product and process innovations.

Significant differences between the study regions were found in the productive elasticity of the productive factors (capital and labour) included in the estimated Cobb–Douglas function. Capital stock elasticity was higher in Madrid and Catalonia, whereas labour elasticity was higher in Madrid and Andalusia. The estimates obtained for the Spanish economy as a whole are in line with previous studies, taking into account the different samples, periods, sectors, and methodologies used in these studies (e.g. Acosta et al. 2015; García-Pozo et al. 2018; Goya et al. 2013).

As mentioned, the coefficients of the impact of product and process innovation on productivity reveal marked regional differences. Although the estimates suggest that product innovations are associated with an average increase in productivity of 8.3% and 4.5% in Andalusia and Madrid, respectively, the estimated value of this semi-elasticity did not reach statistical significance in Catalonia. Process innovation significantly increased productivity in Catalonia alone, where productivity increased by 2.8%. In Spain as a whole, process and product innovation positively and significantly increased productivity by 2.2% and 3.2%, respectively.
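Reading a semi-elasticity as a percentage change is a first-order approximation: in a log-linear equation such as Eq. (6), a one-unit increase in the innovation regressor changes productivity by exactly \( 100\left( {e^{{\gamma_{n} }} - 1} \right) \) per cent. The short sketch below compares the two readings. It assumes, purely for illustration, that the percentages quoted above correspond to \( 100\gamma_{n} \); Table 4 is not reproduced here, and the function names are ours.

```python
import math


def approximate_percent_change(gamma):
    """First-order reading of a semi-elasticity: 100 * gamma."""
    return 100.0 * gamma


def exact_percent_change(gamma):
    """Exact change implied by a log-linear equation: 100 * (exp(gamma) - 1)."""
    return 100.0 * (math.exp(gamma) - 1.0)


if __name__ == "__main__":
    # Hypothetical coefficients, set equal to the percentages in the text / 100.
    for label, gamma in (("Andalusia, product", 0.083),
                         ("Madrid, product", 0.045),
                         ("Catalonia, process", 0.028)):
        print(label,
              round(approximate_percent_change(gamma), 2),
              round(exact_percent_change(gamma), 2))
```

For coefficients of this size the two readings differ only slightly, so the interpretation of the regional differences is not affected by the choice between them.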

6 Summary and conclusions

This study analysed the determining factors of technological innovation in Spain by region, the transformation of such innovation into increased productivity, and differences between these regions. We used data from the PITEC database, which uses the methodological framework established for the CIS developed by EUROSTAT, and data for the three most relevant Spanish regions according to their population and contribution to the national GDP (Andalusia, Catalonia, and the Madrid region). In line with the model proposed by Griffith et al. (2006), we estimated a three-stage CDM structural model that links R&D investment and technological innovations and productivity.

The results clearly show differences between the three study regions in R&D investment, its transformation into technological process innovation, and its impact on labour productivity. However, the most relevant and most damaging factor affecting the growth of labour productivity in Spain, and therefore in the regions analysed, is probably the low investment in R&D relative to the GDP on the part of Spanish companies. In the best case, regions such as Madrid and Catalonia invest around 1.5% of GDP, whereas in Andalusia this figure does not even reach 1%. Thus, the three regions are very far from reaching the objective of 2% of the GDP set for Spain by the European Union in the present year of 2020. Although this objective will be difficult to achieve, the gap between the present data and this objective could be reduced if businesses invest in the type of innovation (i.e. in processes or products) most needed in their particular region to increase productivity. This gap would also be reduced by national and regional institutions creating incentives to invest in R&D.

In view of the results obtained, we draw attention to some general trends regarding the effect of certain variables in the three stages of the innovative process analysed. These variables present a certain degree of regional homogeneity, but with significant differences in their quantitative relevance. Regarding decisions to invest in R&D and the level of innovative effort, companies in the three regions are influenced by the same variables (e.g. business size, legal protection of their investment in R&D, cooperation with other companies, public financing of these investments). Nevertheless, there are considerable differences in their relevance at the regional level. In the case of access to public funds to finance innovation, it is surprising that an average of only 5.6% of the companies in the three regions access European Union funds. This aspect hinders R&D and should be addressed by political action on the part of the Spanish government and the European Union.

As expected, greater intensity of investment in R&D favours both product and process innovations in the three regions. It matters much more for the former than for the latter, and more in Madrid and Catalonia than in Andalusia. Business size, gross investments in tangible goods per employee, and the ability to legally protect innovation favour regional innovations to a different extent. The main sources of information about innovative processes are the work done within the company itself, together with customers for product innovation and suppliers for process innovation.

Significant differences were found between study regions in estimated labour productivity. In Madrid and Andalusia, increases in labour productivity were only associated with the increased probability of product innovation, whereas in Catalonia they were only associated with process innovation. These results suggest a lack of convergence in business strategies to improve labour productivity in these regions. It seems clear that greater commitment to process innovation in Andalusia and Madrid and to product innovation in Catalonia would provide companies in these regions with significant improvements in labour productivity.

In summary, the results and estimates suggest that interregional differences in the effect of technological innovations on productivity are strongly affected by factors such as innovative effort, business size, stock of business capital, and other variables that influence the motivation to invest in innovation. These differences are also affected by management conditions, which have a differential effect on the innovative process in each of the analysed regions. Taken together, these aspects support the conclusions presented in this article.

Finally, we would like to draw attention to some aspects that would have improved the results. These aspects include the introduction of variables that could represent the work quality of the employees. Increased quality would clearly improve labour productivity and thus increase the efficiency of investment in R&D. Population density can also act as a stimulus for regions to develop active innovation policies. On the other hand, public policies can have a significant impact on innovation decisions made by companies in all Spanish regions. Therefore, the results could have been affected by the use of indicators of public investment in education, although the inclusion of these two aspects in this type of model at the company level would not have been appropriate. However, the main problem regarding the Spanish PITEC database is its lack of continuity (e.g. there is no database for 2017) and the fact that data on the companies surveyed lack temporal homogeneity.