
1 Introduction

Planners at the European Patent Office (EPO) make forecasts of future patent filings to support business planning for all the major stages of the patent granting process, such as searches, substantive examinations, grants and renewals. There is an annual cycle that proceeds from the forecasts, via the business plan, to a finalised budget document (Hingley and Nicolas 2006). Academic scholars are interested in patent forecasts in order to better understand the determinants of innovation and productivity, which are fundamental influences on economic growth and human progress.

The purpose of this study is to model EPO filing behaviour and to derive the implications for forecasting EPO filings out of sample over the medium and longer term. The motivation is an earlier study on forecasting EPO filings by Park (2006), which took only long-term trends into account. That report was prepared amid a worldwide economic boom driven by the housing market, relatively low energy prices, and the expansion of multinational firm activity, such as outsourcing and offshoring, particularly by the leading technology companies. The earlier study did not anticipate, nor allow for, the effects of the major economic downturn that has come to be known as the Great Recession. Consequently, Park (2006) produced rather optimistic projections of real gross domestic product (GDP) and EPO filing activity. Instead, EPO filings from leading source countries such as the U.S., Japan, and Germany decreased, particularly in 2009, the critical year of the Great Recession. Business fluctuations thus seem to have some effect on patenting. However, filings also recovered relatively quickly during 2010–2011. Moreover, filings from the other countries changed relatively little during the global financial crisis, and China’s filings actually rose during the Great Recession. This raises the question of how significantly business fluctuations affect patenting and how much they matter to forecasts of future patenting. What is the direction of the effect: is patenting pro-cyclical (moving with business cycles) or counter-cyclical (moving inversely with business cycles)? How much do business cycles really contribute to changes in patenting? How lasting are the effects of business cycle shocks on patenting? What value does forecasting business cycles add to forecasting EPO filings and applications?

A variety of econometric regression-based approaches exist.Footnote 1 One of these, based on a dynamic log-linear model (see Hingley and Park 2015a; Park 2006), is considered here. In an endeavour to improve the model, a business cycle component has been included to supplement trend GDP.

Business cycles refer to fluctuations of output around some average or long run trend path. The trend path of output is usually determined by productivity, technology, or resources. The deviations of output from trend are usually due to nominal shocks, such as aggregate demand shifts in the presence of wage and price rigidities. The reason business cycles are a concern is that when output is below trend, resources are underutilized and unemployment may increase. When output is above trend, inflationary pressures tend to arise. However, there is much debate about how long lasting business cycles are. One school of thought (for example, the old-Chicago school) is that markets self-correct relatively quickly and that most of the output fluctuations are equilibrium movements or shifts in trend output itself. Another school (for example, the old-Keynesian school) is that economies experience persistent disequilibria. The findings in this chapter present a view that is intermediate between these two schools: output deviations from trend are neither permanent nor equilibrium phenomena.

The chapter is organized as follows. The next section outlines our empirical framework. Section 3 presents estimates of the model and some out-of-sample forecasts. Section 4 re-examines the model from some alternative perspectives, and Sect. 5 discusses other applications. Section 6 concludes. Overall, the study demonstrates the desirability and feasibility of forecasting EPO patent filings under uncertainty about future movements in GDP or the market outlook. The significance is that, to the extent that innovation activities are attuned to market conditions, market fluctuations should affect the capacities and incentives to seek patent protection. Furthermore, the frequency of cycles matters. EPO filings are affected by business cycles when yearly data are considered, but at shorter frequencies, namely quarter to quarter, they are not as sensitive to business fluctuations. One reason discussed is that significant innovation decisions are unlikely to be made on the basis of short-lived circumstances, whereas on a yearly scale fluctuations can affect budgetary and other cost factors that bear upon innovation plans.

2 Empirical Framework

The following regression model is used for EPO filings from a source country:

$$ { \ln }\frac{P}{L} =\upalpha_{0} +\upalpha_{1}\, { \ln }\left( {\frac{P}{L}} \right)_{ - 1} +\upalpha_{2}\, { \ln }\left( {\frac{P}{L}} \right)_{ - 2} +\upalpha_{3}\, { \ln }\frac{R}{L} +\upalpha_{4}\, { \ln }\frac{{{Y}^{T} }}{L} +\upalpha_{5}\, {u} \,+\, {\upvarepsilon} $$

where P is the number of EPO filings from a source country,Footnote 2 L is the number of workers in the source country,Footnote 3 the subscripts −1 and −2 indicate lags of one and two years respectively, R is R&D expenditure,Footnote 4 usually lagged by 5 years, and ε is an error term, assumed to be normal with constant variance; ln denotes the natural logarithm. The GDP of the source country (Y) is split into two components: \( Y^{T} \), the “trend” level of output, and u, a business cycle indicator (namely, the ratio of cyclical GDP to trend GDP).Footnote 5 The Hodrick and Prescott (2007) filter method was used to separate cycles from trend (it is explained further in Hingley and Park 2015b).

Filings P are transformed to ln(P/L) as indicated. Dividing by L standardises between countries, because L is a proxy for country size, and the logarithmic transformation stabilises the error variance. Following Ditka (2006), R is lagged by five years to reflect the idea that R&D expenditures take time to produce patentable inventions. The effects of possible booms and/or recessions within the forecast period can be assessed by manipulating u. Table 1 shows some sample statistics of the calculated u and illustrates that business cycles vary by country. Of course, there are common global shocks, but not all of them are transmitted to national economies in the same way. Some economies are better insulated against external shocks, depending upon their policy or institutional regimes. Furthermore, different countries may be in different phases of the business cycle. Hence, our decomposition method has been applied separately to each country in the sample.
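The trend/cycle split described above can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' code: it solves the standard Hodrick-Prescott minimisation problem directly, applies it to GDP in levels, and uses a smoothing parameter of 100 for annual data. The filter variant, the use of levels rather than logs, and the value of the smoothing parameter are all assumptions, since the text does not state them; `hp_filter` and `business_cycle_indicator` are hypothetical names, and the GDP figures are invented.

```python
import numpy as np


def hp_filter(y, lamb=100.0):
    """Split a 1-D series into (trend, cycle) with the Hodrick-Prescott filter.

    The trend tau minimises
        sum((y - tau)**2) + lamb * sum((second difference of tau)**2),
    whose closed-form solution is tau = (I + lamb * D'D)^{-1} y,
    where D is the (n-2) x n second-difference matrix.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Row k of D computes tau[k] - 2*tau[k+1] + tau[k+2].
    D = np.zeros((n - 2, n))
    for k in range(n - 2):
        D[k, k:k + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lamb * D.T @ D, y)
    return trend, y - trend


def business_cycle_indicator(gdp, lamb=100.0):
    """Return (trend GDP, u), with u = cyclical GDP / trend GDP.

    Assumption: the filter is applied to GDP in levels, and lamb=100 is a
    conventional (not source-given) choice for annual data.
    """
    trend, cycle = hp_filter(gdp, lamb)
    return trend, cycle / trend


# Hypothetical annual GDP figures, for illustration only.
gdp = [100.0, 103.0, 101.0, 106.0, 110.0, 108.0, 113.0, 117.0]
trend, u = business_cycle_indicator(gdp)
```

Setting u to positive or negative values in the forecast years is then one way to simulate the boom or recession scenarios mentioned above. A dense solve is adequate for a few decades of annual data; much longer series would call for a banded or sparse solver.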

Table 1 Sample statistics of the business cycle variable, u

The model is fitted over a group of countries that includes a rest-of-the-world (ROW) class. We usually forecast total filings up to 6 years beyond the training data period. To obtain the forecast of total filings in a future year, the individual country forecasts derived from the model are summed across countries. On the scale of the filings P, the multiplicative error \( e^{\upvarepsilon} \), and hence P itself, is lognormally distributed, and the estimation technique takes this into account. Other reported studies have often not made this transformation and have therefore worked with imprecise confidence intervals. Let v = P/L, with log(v) distributed as \( N(\upmu, \upsigma^{2}) \), a normal distribution with mean μ and variance \( \upsigma^{2} \). The model contains fixed effects, namely a separate intercept for each of the 28 countries (\( \upalpha_{01}, \upalpha_{02}, \ldots, \upalpha_{028} \)), and common slope parameters (\( \upalpha_{1}, \upalpha_{2}, \upalpha_{3}, \upalpha_{4} \) and \( \upalpha_{5} \)). The linear model is fitted to the transformed data for all countries simultaneously to determine the estimates \( \hat{\upmu} \) for each country and the common \( \hat{\upsigma}^{2} \). This is done in the usual way by gathering the data on the independent variables, including the {0, 1} dummy variables for the intercepts, into an (n × p) design matrix Z. The parameters to be estimated are stacked into a (p × 1) parameter vector B. Let T denote transposition:

$$ \begin{aligned} {B}^{T} & = [\upalpha_{01}, \upalpha_{02}, \ldots, \upalpha_{028}, \upalpha_{1}, \upalpha_{2}, \upalpha_{3}, \upalpha_{4}, \upalpha_{5}] \\ {Z} & = \left[ \begin{array}{ccccccccc} 1 & 0 & \cdots & 0 & a_{1,1} & a_{2,1} & a_{3,1} & a_{4,1} & a_{5,1} \\ 1 & 0 & \cdots & 0 & a_{1,2} & a_{2,2} & a_{3,2} & a_{4,2} & a_{5,2} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & \cdots & 0 & a_{1,24} & a_{2,24} & a_{3,24} & a_{4,24} & a_{5,24} \\ 0 & 1 & \cdots & 0 & a_{1,25} & a_{2,25} & a_{3,25} & a_{4,25} & a_{5,25} \\ 0 & 1 & \cdots & 0 & a_{1,26} & a_{2,26} & a_{3,26} & a_{4,26} & a_{5,26} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 1 & \cdots & 0 & a_{1,48} & a_{2,48} & a_{3,48} & a_{4,48} & a_{5,48} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & a_{1,672} & a_{2,672} & a_{3,672} & a_{4,672} & a_{5,672} \end{array} \right] \end{aligned} $$

where \( a_{i,j} \) is the value of independent variable i for the jth observation in the data set. The parameter estimates \( {\hat{\text{B}}} \) are calculated by least squares, and the error variance of ε is estimated as \( \hat{\upsigma}^{2} \):

$$ \begin{aligned} {\hat{\text{B}}} & = \left( {Z}^{T} {Z} \right)^{-1} {Z}^{T} \log(v) \\ \hat{\upsigma}^{2} & = (\log(v) - {Z}\hat{\text{B}})^{T} (\log(v) - {Z}\hat{\text{B}})/(n - p) \end{aligned} $$

where log(v) is the (n × 1) vector \( \log(v_{1,1}, v_{1,2}, \ldots, v_{1,24}, v_{2,1}, \ldots, v_{28,24})^{T} \), that is, the string of transformed observations (or differences of transformed observations) for each country by year in the training set, laid out with years within countries varying faster than countries.

On the logarithmic scale, the fitted values within the training set are given by the matrix product \( {Z}\hat{\text{B}} \). The forecast for each country at each future time point is obtained by forming a further (1 × p) row z, equivalent to a row of Z but containing values of the independent variables beyond the data set, and calculating the inner product \( {\text{z}}\hat{\text{B}} \).
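The estimation step can be sketched in Python with NumPy. This is an illustration on simulated data, not the EPO implementation: `build_design` and `fit_ols` are hypothetical names, the data are random, and `np.linalg.lstsq` is used in place of the explicit inverse \( (Z^{T}Z)^{-1} \) because it returns the same least-squares solution with better numerical behaviour.

```python
import numpy as np


def build_design(country_ids, X):
    """Build Z = [country dummies | common regressors].

    country_ids: length-n labels, sorted so that years vary fastest within
    each country. X: (n x k) matrix of common regressors (in the chapter
    k = 5: the two lags of ln(P/L), ln(R/L), ln(Y^T/L) and u). One {0, 1}
    column per country carries that country's intercept, so no global
    constant is added.
    """
    ids = np.asarray(country_ids)
    countries = np.unique(ids)
    dummies = (ids[:, None] == countries[None, :]).astype(float)
    return np.hstack([dummies, np.asarray(X, dtype=float)])


def fit_ols(Z, log_v):
    """Return B_hat = (Z'Z)^{-1} Z' log(v) and sigma2_hat = RSS / (n - p)."""
    n, p = Z.shape
    B_hat, *_ = np.linalg.lstsq(Z, log_v, rcond=None)
    resid = log_v - Z @ B_hat
    sigma2_hat = float(resid @ resid) / (n - p)
    return B_hat, sigma2_hat


# Simulated example: 3 countries x 10 years, 2 common regressors.
rng = np.random.default_rng(0)
ids = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(30, 2))
true_B = np.array([1.0, 2.0, 3.0, 0.5, -0.25])
Z = build_design(ids, X)
log_v = Z @ true_B + rng.normal(scale=0.1, size=30)
B_hat, sigma2_hat = fit_ols(Z, log_v)
fitted = Z @ B_hat            # fitted values on the log scale
z_new = Z[-1]                 # a (1 x p) row; in practice built from projected regressors
forecast_log = z_new @ B_hat  # the inner product z B_hat
```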

The lognormal distribution of v has mean \( \upgamma = \exp(\upmu + \upsigma^{2}/2) \) and variance \( \text{Var}[v] = \upgamma^{2}(\exp(\upsigma^{2}) - 1) \) (see Johnson et al. 1994). The fitted values on the scale of v are therefore taken from the linear model as \( \hat{\upgamma} = \exp({z}\hat{\text{B}} + \hat{\upsigma}^{2}/2) \). The estimated number of filings from a country at a given time point is then \( {w} = {L}\,\hat{\upgamma} \), with estimated variance \( \text{Var}[w] = {L}^{2}\,\hat{\upgamma}^{2}(\exp(\hat{\upsigma}^{2}) - 1) \).
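The back-transformation from the log scale to filings can be written as a small Python function. It is a sketch: `lognormal_forecast` is a hypothetical name and the inputs are made-up values, chosen only to show that the lognormal mean exceeds the naive back-transform \( L\exp(z\hat{\text{B}}) \) whenever \( \hat{\upsigma}^{2} > 0 \).

```python
import math


def lognormal_forecast(zB, sigma2_hat, L):
    """Map a log-scale prediction zB to filings for one country and year.

    gamma_hat = exp(zB + sigma2_hat / 2)     lognormal mean of v = P/L
    w         = L * gamma_hat                expected filings
    Var[w]    = L**2 * gamma_hat**2 * (exp(sigma2_hat) - 1)
    """
    gamma_hat = math.exp(zB + sigma2_hat / 2.0)
    w = L * gamma_hat
    var_w = L ** 2 * gamma_hat ** 2 * (math.exp(sigma2_hat) - 1.0)
    return gamma_hat, w, var_w


# Hypothetical inputs: 40 million workers, zB = -9.0, sigma2_hat = 0.02.
gamma_hat, w, var_w = lognormal_forecast(-9.0, 0.02, 40e6)
naive = 40e6 * math.exp(-9.0)   # ignores the sigma2/2 correction
```

With \( \hat{\upsigma}^{2} = 0.02 \) the correction factor is \( e^{0.01} \), about 1 %, which is small for one country but is applied to every country before summation.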

It should be noted that, during the forecast period beyond the training data set, it is necessary to forecast L and also the independent variables \( Y^{T} \), u and R.Footnote 6 This is usually done by a straight-line regression projection from the 10 most recent years of data in the training set, which is a source of additional variability in the forecasts. The goal is to forecast total filings as the sum of the forecasted filings per country of origin, \( {\hat{\text{w}}}_{\text{TOTAL}} = \sum\,{\text{w}} = \sum\,{\text{L}}_{i} \hat{\upgamma}_{i} \), where \( \sum \) indicates summation over countries and \( \hat{\upgamma}_{i} \) is the fitted filings per worker for country i.
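The projection of the regressors, together with the recursive use of lagged filings forecasts that the model's autoregressive terms require, can be sketched as follows. This is a simplified Python/NumPy illustration under stated assumptions: each regressor is extended by a least-squares straight line through its 10 most recent values, the recursion runs on the log scale of ln(P/L) for a single country with made-up parameter values, and `project_linear` and `recursive_forecast` are hypothetical names.

```python
import numpy as np


def project_linear(history, horizon, window=10):
    """Extend a series by a straight line fitted to its last `window` values."""
    y = np.asarray(history, dtype=float)[-window:]
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)   # degree-1 least-squares fit
    return intercept + slope * np.arange(y.size, y.size + horizon)


def recursive_forecast(alpha0, a, lnPL_hist, lnRL, lnYTL, u):
    """Iterate ln(P/L) forward, feeding back lagged forecasts.

    a = (a1, ..., a5) are the common slopes; lnPL_hist holds observed
    ln(P/L) values (at least two); lnRL, lnYTL and u are the regressor
    values for each future year (ln(R/L) already lagged 5 years).
    """
    path = list(lnPL_hist[-2:])
    for r, yt, uu in zip(lnRL, lnYTL, u):
        nxt = (alpha0 + a[0] * path[-1] + a[1] * path[-2]
               + a[2] * r + a[3] * yt + a[4] * uu)
        path.append(nxt)   # this forecast becomes a lag for later years
    return np.array(path[2:])


# Hypothetical single-country example over a 6-year horizon.
horizon = 6
lnYTL_future = project_linear(np.linspace(10.0, 10.3, 15), horizon)
u_future = np.zeros(horizon)          # neutral business cycle scenario
lnRL_future = np.full(horizon, 6.0)   # hypothetical lagged R&D values
fc = recursive_forecast(-3.0, (0.6, 0.1, 0.05, 0.2, 0.5), [-7.1, -7.0],
                        lnRL_future, lnYTL_future, u_future)
```

Each element of `fc` would then be back-transformed with the lognormal step and multiplied by the projected L before summing over countries.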

In the following, Ξ is the (28 × 28) covariance matrix between countries on the log scale, that is estimated from the linear model by \( \hat{\Xi } = \left( {{Z}^{T} {Z}} \right)^{ - 1}\, \hat{\upsigma}^{2} \). Two alternative approaches are taken to estimate the variance of \( {\hat{\text{w}}}_{\text{TOTAL}} \):

  1.

    The sum of the estimated variances for the countries.

    $$ {\text{Var}}[{\hat{\text{w}}}_{\text{TOTAL}} ] = {\text{Var}}\left[\sum\limits_{i} {{L}_{i} } \cdot \hat{\upgamma}_{i} \right] = \sum\limits_{i} {{L}_{i}^{2} } \cdot \hat{\upgamma}_{i}^{2}\, ({ \exp }(\hat{\Xi }_{ii} ) - 1) $$

    The summation is over all countries i: 1, …, 28.

  2.

    The sum of the estimated covariances for all pairs of countries i and j.

    $$ {\text{Var}}[{\hat{\text{w}}}_{\text{TOTAL}} ] = {\text{Var}}\left[\sum\limits_{i} {{L}_{i} } \cdot \hat{\upgamma }_{i} \right] = \sum\limits_{i} {{L}_{i} } \sum\limits_{j} {{L}_{j} } \cdot \hat{\upgamma}_{i} \cdot \hat{\upgamma}_{j}\, ({ \exp }(\hat{\Xi }_{ij} ){-} 1) $$

    This uses a formula for the variance of a sum that is based on an extension of the formula for the variance of the mean of the lognormal distribution (Soderlind 2013). The summation is over all country pairs i and j. The country pairs include each country paired with itself, in which case the covariance term is the same as the term that feeds into expression 1 above.

Approach 1 is slightly easier to calculate than approach 2, but the covariances are likely to matter: the model is fitted simultaneously over countries with some parameters in common, international developments affect patent filings in many countries at the same time, and the historical training data may be non-stationary (Ditka 2006). Approach 2 helps to take account of these factors. Here \( {\text{Var}}[{\hat{\text{w}}}_{\text{TOTAL}} ] \) is calculated by both approaches, and the estimates are compared to gauge the stability of the model and to assess the attempt to achieve stationarity by differencing the variables.
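Both variance approaches can be sketched in a few lines of Python. This is illustrative only: the function names are hypothetical, the two-country inputs are invented, and \( \hat{\Xi} \) is simply taken as given, since how its 28 × 28 entries are extracted from the fitted model is not reproduced here.

```python
import math


def var_total_approach1(L, gamma_hat, Xi):
    """Approach 1: sum of per-country variances (diagonal of Xi only)."""
    return sum(L[i] ** 2 * gamma_hat[i] ** 2 * (math.exp(Xi[i][i]) - 1.0)
               for i in range(len(L)))


def var_total_approach2(L, gamma_hat, Xi):
    """Approach 2: sum over all ordered country pairs (i, j), i == j included."""
    k = len(L)
    return sum(L[i] * L[j] * gamma_hat[i] * gamma_hat[j]
               * (math.exp(Xi[i][j]) - 1.0)
               for i in range(k) for j in range(k))


# Hypothetical two-country example with a positive log-scale covariance.
L = [1000.0, 2000.0]          # workers (arbitrary units)
gamma_hat = [0.5, 0.2]        # fitted filings per worker
Xi = [[0.01, 0.004],
      [0.004, 0.02]]
w_total = sum(l * g for l, g in zip(L, gamma_hat))   # 500 + 400 = 900
v1 = var_total_approach1(L, gamma_hat, Xi)
v2 = var_total_approach2(L, gamma_hat, Xi)
```

Each off-diagonal term in approach 2 has the sign of \( \hat{\Xi}_{ij} \), so positive covariances make approach 2 larger than approach 1; with a diagonal \( \hat{\Xi} \) the two approaches coincide.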

As already mentioned, a linear trend approach is used to forecast the independent variables \( Y^{T} \), u and R, which are then included as inputs to Z so that the estimated parameter vector \( {\hat{\text{B}}} \) can be used to find the fitted values \( \hat{\upgamma}_{i} \) for each country i in each forecasted year outside the training data set. In the case of R, the assumed 5-year lag means that known historical values can be used for the forecasts of the next few years. Since the model includes autoregressive terms for filings lagged by one and two years, forecasts beyond the first year ahead use the filings forecasts of earlier years as inputs in place of observed lagged filings. These inputted forecasts are subject to variability from the same error process, and the other inputs are also subject to error because they are estimated from a trend. It is therefore likely that \( {\text{Var}}[{\hat{\text{w}}}_{\text{TOTAL}} ] \), as given directly by method 1 or 2 above, understates the variability inherent in this approach. To cope with this to some extent, a pragmatic compound variance method is used. Writing \( \text{var}(\hat{\text{w}}_{\text{TOTAL}})_{l} \) for the variance of forecast year l given directly by method 1 or 2, the variance of the filings forecast for future year s is taken as the sum of these variances over all forecasted years up to and including s:

$$ {\text{Var}}[({\hat{\text{w}}}_{\text{TOTAL}} )_{s} ] = \sum\limits_{{\text{l} = 1}}^{s} {{\text{var(}}\hat{\text{w}}_{\text{TOTAL}} )_{\text{l}} } $$

From this variance, 95 % confidence limits for the forecast of total filings in a future year s are calculated under the usual normal assumption as \( ({\hat{\text{w}}}_{\text{TOTAL}})_{s} \pm 1.96 \times {\text{SE}}[({\hat{\text{w}}}_{\text{TOTAL}})_{s}] \), where SE denotes the standard error, the square root of \( {\text{Var}}[({\hat{\text{w}}}_{\text{TOTAL}})_{s}] \). The confidence limits apply to predicted values of the mean. The mean is forecast because the process uses essentially unchanging historical training data for all years up to the last year in the data set, with only one data point per country added in each successive annual forecasting exercise.Footnote 7
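The compound variance and the resulting 95 % limits can be sketched as follows. The forecasts and direct variances below are invented numbers, and `compound_intervals` is a hypothetical helper; in practice the direct variances would come from approach 1 or 2.

```python
import math
from itertools import accumulate


def compound_intervals(w_total, var_direct, z=1.96):
    """Return compound variances and (lower, forecast, upper) per year.

    Var_s = var_1 + ... + var_s (the pragmatic compound variance), and the
    limits are w_s -/+ z * sqrt(Var_s) under the normal approximation.
    """
    compound = list(accumulate(var_direct))
    limits = []
    for w, v in zip(w_total, compound):
        se = math.sqrt(v)
        limits.append((w - z * se, w, w + z * se))
    return compound, limits


# Hypothetical three-year example.
compound, limits = compound_intervals([100.0, 104.0, 108.0], [4.0, 5.0, 7.0])
# compound == [4.0, 9.0, 16.0], so the standard errors are 2, 3 and 4
```

Because the variances accumulate, the interval half-width grows with the forecast horizon, which produces the widening bands seen in Fig. 1.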

3 Results

3.1 Model Estimates

The model is fitted to a data set of 28 source countries of origin, using their annual EPO total filings from 1990 to 2013.Footnote 8 The linear model is fitted both to the levels data and to the first-differences data, and the parameter estimates and standard errors from both approaches are shown in Table 2. Following Park (2006), the intercept α0 is allowed to vary from country to country, while the other parameters α1 to α5 are treated as common across countries so that pooled estimates are obtained. The 5 slopes corresponding to α1 to α5 are shown in Table 2. The standard deviations of a data point estimate at the bottom of the table are similar for the two approaches. In terms of formal significance testing under an assumption of normality, several of the parameter estimates are not statistically different from zero, in that the absolute value of the estimate is less than 1.96 times its estimated standard error.

Table 2 Estimates of the log-linear model

For the model in levels, all parameter estimates are significant by this approximate test except for the second-order autoregression (AR2) and R. For the model in differences, however, none of the country intercepts is significant except those for BR, CN-HK, KR, NZ, PT, SG and ES, and AR1, AR2 and R are technically not significant either, although AR1 is almost significant. Nevertheless, we believe that all the included variables are useful structural descriptors of the process and should remain in the model to assist in making the forecasts. The structural dependence of the model on the parameters is manifested in the model in levels, which adequately describes the non-stationary process. What is not guaranteed for levels is the extent to which the independent variables are causal for the filings process.

For the model in differences, significance is also assessed on the scale of differences, and effects may in fact be significant when the data are transformed back to levels. We do not succumb to the temptation to remove parameters in order to obtain a reduced model in which every remaining parameter is statistically significant. There are 672 observations for the model in levels and 644 for the model in differences, so using up 33 degrees of freedom for estimation does not seriously degrade the residual sum of squares that is used to estimate the error variance.

At least for the model in differences, no particular importance should be given to the sign (positive or negative) of a parameter estimate, even when it is significant. A process by which differences of an independent variable have an effect on differences in filings, particularly after transformation, can look negative when it would in fact be positive back on the scale of levels.

Common economic factors that affect patenting in several countries at the same time, as well as the pooling of parameter estimation over countries, induce some correlations between the parameter estimates. Table 3 shows simplified parameter correlation matrices for both levels and differences. Since the 28 intercepts behave rather similarly to each other in terms of correlations, examples are shown only for four important source countries: China, Germany, Japan and the US.

Table 3 Correlation coefficients for parameter estimates

For the model in differences, 21 of the 36 distinct correlation coefficients are negative, while for the model in levels only 9 are. In general, the correlations between parameters are far less significant for the model in differences than for the model in levels, which is an indication of stationarity for the model in differences. Looking more deeply at the model in differences, the correlations between country intercepts are positive but generally not significant, although there are relatively high correlations between CN-HK and US, and between CN-HK and KR (0.340, not shown), though not especially between KR and US (0.083, not shown). R has negative correlations with the country intercepts, including quite high negative values with CN-HK and with KR (−0.175, not shown). EPO filings from CN-HK and KR have grown more strongly over the period than those from most other countries, which may have led to these effects. The rapid development of EPO patenting from CN-HK and KR may also explain their significant intercepts in Table 2 for the model in differences.
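Correlations such as those in Table 3 follow from the estimated parameter covariance matrix \( (Z^{T}Z)^{-1}\hat{\upsigma}^{2} \) by scaling each entry with the two standard errors involved. A minimal Python sketch, with a hypothetical 3 × 3 covariance matrix (not taken from Table 3) and hypothetical function names, including a count of negative off-diagonal pairs of the kind reported above; for 9 parameters there are 9 × 8 / 2 = 36 distinct pairs.

```python
import math


def cov_to_corr(cov):
    """Scale a covariance matrix to correlations: c_ij / (sd_i * sd_j)."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


def count_negative_pairs(corr):
    """Count distinct pairs (i < j) whose correlation is negative."""
    k = len(corr)
    return sum(1 for i in range(k) for j in range(i + 1, k) if corr[i][j] < 0)


# Hypothetical parameter covariance matrix, for illustration only.
cov = [[4.0, -2.0, 1.0],
       [-2.0, 9.0, 0.0],
       [1.0, 0.0, 1.0]]
corr = cov_to_corr(cov)                  # corr[0][1] = -2 / (2 * 3) = -1/3
n_negative = count_negative_pairs(corr)  # one negative pair: (0, 1)
```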

3.2 Model Forecasts

Table 4 shows the fitted values or forecasts for total filings \( {\hat{\text{w}}}_{\text{TOTAL}} \) for the years 2014–2019, together with estimates of their variances \( {\text{Var}}[{\hat{\text{w}}}_{\text{TOTAL}}] \) by the two methods. The forecasts depend on the assumed future levels of the independent variables R, \( Y^{T} \) and u.Footnote 9

Table 4 Total filings forecasts by the model in levels and first-differences

Figure 1 shows a comparison between the actual and predicted values in levels and differences for the years 2014–2019. Total filings are reported first, followed by the filings from the important countries of origin CN-HK, DE, JP and US (as in Table 3). 95 % confidence intervals are calculated for each individual forecasted year using the second variance method discussed in Sect. 2, and the forecasts and limits are connected over time by smoothed lines using Excel. Both models give optimistic forecasts for total filings. The model in levels projects 368,272 filings in 2019, a compound annual growth rate of 6.1 % from 257,747 in 2013. The model in differences projects 388,887 in 2019, a compound annual growth rate of 7.1 % from 2013. However, both models suggest low growth from 2013 to 2014 (1.7 % for the model in levels and 3.9 % for the model in differences). Thereafter the two models roughly agree until about 2016, after which the model in differences starts to move ahead. Forecasts towards the end of the period are of course very uncertain, so differences between the models in those years should not be over-interpreted.
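The compound annual growth rates quoted above follow from \( (w_{2019}/w_{2013})^{1/6} - 1 \). A small Python check using the totals given in the text; `cagr` is an illustrative helper name.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


levels = cagr(257_747, 368_272, 6)        # about 0.061, i.e. 6.1 %
differences = cagr(257_747, 388_887, 6)   # about 0.071, i.e. 7.1 %
```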

Fig. 1

Filings forecasts by the model in levels and the model in differences. Notes: Black lines are the forecasts and grey lines are the 95 % confidence intervals for the forecasts. a Total filings by model in levels. b Total filings by model in differences. c Filings from China. d Filings from Germany. e Filings from Japan. f Filings from United States of America

The confidence intervals widen towards the end of the forecasting period in both models. Standard error 1 is always less than standard error 2, which may indicate a tendency towards positive covariances between the forecasted filings of different countries. Standard errors 1 and 2 are closer in size for the model in differences than for the model in levels, both at the beginning of the period (in 2014 the ratio is 1.14 for levels and 1.07 for differences) and at the end (in 2019 the ratio is 1.19 for levels and 1.15 for differences), although the gap between the two models narrows over the period. This may be related to the improved stationarity of the time series when using the model in differences.

Regarding the country forecasts in Fig. 1, there are some variations between the models for levels and differences, with levels giving higher forecasts than differences for China and United States, but differences giving higher forecasts than levels for the other two countries that are shown. For Japan (Fig. 1e), the forecasts for the model in levels decline while those in differences increase after experiencing a downward “kink” in 2014. However, this difference may not be significant because of the overlap between the confidence intervals until 2019. The strong level of projected growth for China in Fig. 1c should be noted, where about 39,000 additional filings per year are expected by 2019 compared to 2013, according to the model for differences, which makes a strong contribution to the overall increases of total filings that are envisioned in Fig. 1a, b.

4 Alternative Perspectives

This section briefly addresses some open issues, with the goal of checking robustness and suggesting extensions to this study. In particular, it examines three alternative ways to study the effects of business fluctuations on patent filings at the EPO: the first is to examine quarterly data, the second is to capture cyclical movements in R&D, and the third is to examine data by technological field (or sector).

Table 5 reports the results of estimating the EPO filings per worker model on quarterly data. Column 1 shows the results for the period 1978–2011 and column 2 for a truncated period. Seasonal effects are controlled for by quarterly dummies. Quarterly data are noisier than annual data: cyclical fluctuations may be small from year to year, but within each year there may be considerable economic disturbance from month to month or quarter to quarter. However, the results seem to indicate that, while patenting can vary with the business cycle, it does not vary strongly with short-frequency business cycles: the ‘u’ variable is not significant in explaining EPO filings.

Table 5 Main regression model estimated using quarterly data

Short-frequency business cycles are not a strong determinant of EPO filings because quarter-to-quarter disturbances are relatively transient and unpredictable, and it does not seem prudent for businesses to make innovation decisions based on such events. The objective of innovation is to develop new products and processes that improve efficiency or productivity, maintain or expand market share, and strengthen the brand and core competencies of the firm. Innovation therefore typically targets a firm’s long-run profitability, which is why patenting and innovation are expected to depend, in general, on longer-run factors. Nevertheless, as the earlier results with annual data and the discussion of the theory showed, year-to-year business cycles can affect the availability of resources for innovation (or potentially the opportunity cost of innovation); hence the sensitivity of annual patent filings to long-frequency business cycles.

At the quarterly level, however, the time interval is fairly short. Firms typically seek other means to gain or maximize shorter-run profits, such as financial or portfolio investments, derivatives, swaps, pricing strategies (such as discounts, non-linear pricing, price discrimination), or employment strategies (using contractors and temporary workers). Firms generally do not use R&D investments to manipulate short-run profits. R&D and patenting strategies are principally for time horizons longer than a few quarters, which is why business cycles are more likely to affect patenting across years than across quarters.

The next issue is whether other aspects of the model also undergo cyclical shocks. In the analysis thus far, only GDP was separated into its trend and cyclical components. It may also be possible for business R&D expenditures to have a cyclical and a trend component. This was investigated, and the results are shown in Table 6. The table introduces two new variables upon applying the Hodrick-Prescott (HP) filter to R&D: trend R&D per worker and the ratio of cyclical R&D to trend R&D (uR). The uR variable is statistically insignificant at conventional levels, whether the sample period is 1978–2011 or the shorter one of 1990–2011. It is also insignificant whether we omit or control for business cycles in GDP. Thus, for now, cyclical movements in R&D do not seem to affect patent filings at the EPO. This suggests that patentable innovations are a function of long run R&D programs and are not influenced by short run boosts or declines in R&D funding. But these results are for all countries and sectors pooled. It remains to be seen whether the EPO filings of individual source countries or individual sectors are more sensitive to short run movements in R&D.

Table 6 Extensions: cyclicality of research and development

A third issue is whether different technological fields are affected differently by business cycles. Table 7 offers a preliminary look using the International Patent Classification (IPC) sectors. According to the results shown, EPO filings in Human Necessities (IPC A), Physics (IPC G), and Electricity (IPC H) are mildly procyclical (i.e., the coefficient of ‘u’ is statistically significant at the 10 % level), while filings in Chemistry (IPC C) are strongly procyclical (the positive coefficient of ‘u’ is significant at the 5 % level). For all IPC sectors except Textiles (IPC D), EPO filings are strongly influenced by trend GDP. However, the eight IPC sectors A to H are very broadly defined, and each consists of diverse fields of technology. Future research could therefore study the effects of cyclical shocks at a more disaggregated IPC level.

Table 7 Extensions: business cycles by IPC sector

Another related issue is whether business cycles affect the EPO filings of different source countries differently. Good country-by-country analyses require more time-series observations. Tentatively, therefore, this section provides a sample of results for the U.S., a major contributor to EPO filings. As shown in Table 8, business cycles have a positive and statistically significant effect on total U.S. filings at the EPO, where ‘total’ means all sectors combined (see column 1). By individual sector, U.S. EPO filings in Performing Operations and Transportation (IPC B), Mechanical Engineering, Physics, and Electricity (IPC F, G, and H) are all significantly pro-cyclical. These results are for the annual sample. The quarterly sample results are quite similar (results not shown); the only difference is that filings in Human Necessities are also significantly pro-cyclical in the quarterly sample. For EPO applications, the business cycle variable ‘u’ is statistically significant only for Electricity (IPC H), and only at the 10 % level (these results are also not shown, to conserve space). With more data, a richer analysis of business cycles at the national level can and should be pursued.Footnote 10

Table 8 Extensions: EPO Filings by the U.S. and Cyclical Shocks

5 Other Applications

A side benefit of the modelling approach taken here is that the common effects over source countries for EPO filings can be analysed. Table 3 looked at correlations between countries in terms of their estimated intercept parameters. Another way is to determine the correlation matrix of transformed filings estimates between countries (\( \hat{\Xi} \) in Sect. 2). This may indicate common behavioural patterns between applicants in groups of countries that could be associated with the overall degree of industrialisation, possession of common industries with similar patenting behaviours, geographic proximity, and so forth. Table 9 shows the derived country correlation matrix for 2013 between the major source countries for the model in differences.

Table 9 Correlation Coefficients between countries for one-year ahead ‘transformed’ filings

The correlations between these forecasts for total filings in 2013 from the most important countries are low and can be considered non-significant. From the full 28 country set, the pairs of countries with the largest positive correlations are Ireland with Greece (0.29) and with Spain (0.21); Portugal with Singapore (0.15); and China with Denmark (0.14). Pairs of countries with negative correlations are less prominent, the largest in size being Greece with Other countries (−0.13 – 0.09). While it may be tempting to draw causal inferences about the relationships between these countries, one should be cautious about the power of the analysis and the lack of clarity regarding possible common external causes of the associations. At any rate, such interesting questions about commonalities of patenting behaviour between countries of origin go beyond the narrow requirements of forecasting for EPO’s budgetary purposes.

6 Concluding Thoughts

The usefulness of the selected type of model has been demonstrated on historical EPO total filings data, and we recommend applying the model to year-to-year differences rather than to levels. We caution that the 95 % confidence limits in diagrams such as Fig. 1 are somewhat too narrow, especially for the later parts of the forecast horizon. In general, the quality of the forecasts remains dependent on the assumed future levels of the independent variables for GDP and R&D, whose uncertainties are not explicitly considered in the confidence intervals.

For future research, it would be useful to develop the model further at the industry level or at the level of individual countries as more observations become available. It would also be interesting to examine how business cycles in one country are transmitted internationally to affect innovation activity in another country. Specifically, the R&D and patenting of non-European countries may be affected not only by business cycles occurring in their own countries but also by fluctuations occurring in Europe. Other directions are more technically oriented, for example studying and incorporating the expectations formation of patent filers. EPO filings and applications may be driven by applicants’ expectations of future business cycles. In the models studied here, the contemporaneous value of ‘u’ entered the regressions. It may instead be the expected value, say \( u^{e} \) (the expected value of u in some future period), that affects current patenting decisions. Likewise, applicants may be affected by their forecasts of future trend GDP or future R&D budgets when designing their current innovation strategies. Modelling the expectations process is rather complex but should be worth exploring.Footnote 11 Another technical approach is to apply dynamic panel data methods, as in Hingley and Park (2015b), to account for the lagged dependent variables and country fixed effects.

Insofar as the aim of such research is to improve filings forecasts for the EPO budget, any method must take account of all the filings even where breakdowns of the data are considered separately or in parallel. Wider research into patenting behaviour can relax this requirement to some degree. The approach described here may turn out to be useful for such studies, including extensions to separate out particular patenting sectors or regions.

Finally, we believe that the way that the lognormal data were analysed here may be of some interest for wider areas of application. While the central limit theorem implies that lognormal data can indeed be described approximately in terms of normal error theory and the usual normal confidence limits, either care should be taken that the data sets are large enough for this or otherwise an approach such as the one described here should be used.