Introduction: some technological inventions are more influential than others

While the importance of technological innovation is widely acknowledged in terms of value creation on the level of firms and economies as a whole, the nature and impact of technological inventions vary widely. Some new technologies imply a relatively small extension of prior art (i.e. prior technological inventions), while others are highly novel and disrupt or reshape the technological landscape. A large number of new technologies never reach the commercialization phase whereas others allow companies to grow at impressive rates or even stimulate the creation of new industries. For instance, Scherer and Harhoff (2000) found, in the case of eight samples of company and university owned patents, that 10 % of the patents in the sample generated 48–93 % of the total returns. A variety of concepts have been advanced to delineate important technological inventions ranging from radical, revolutionary and breakthrough to discontinuous and disruptive. Technological breakthroughs or radical inventions introduce new concepts that depart significantly from past practices, have the potential to disrupt existing markets, generate new markets, and elicit follow-up innovations. Thus, they can be seen as critical building blocks of a company’s or a nation’s creative destruction capacity and as a key determinant of long-term economic growth.

In general, the definitions used in the management literature tend to characterize the differential nature of inventions both in technological and in economic/financial terms. From a technological perspective, radical inventions rely on a different set of science and engineering principles than previously existing technologies (Henderson and Clark 1990), and/or incorporate substantially different core technologies (Chandy and Tellis 2000). Incremental inventions, in contrast, improve and extend existing technology. Henderson and Clark (1990) introduced the notion of architectural innovation in which core components remain unchanged but are linked differently in a new architecture. Radical innovations, according to their classification, are those where not only are the core concepts linked together differently but the core concepts themselves are overturned. Along the economic and financial dimension, technological breakthroughs are characterized by the significant new value they add to the marketplace or by their impact on competitive dynamics. For example, Tushman and Anderson (1986) defined a technological breakthrough as an order-of-magnitude improvement in the maximum achievable price-versus-performance frontier of an industry. Finally, breakthroughs have been defined in terms of the profound impact they have on firms, industries and markets. Utterback (1994) defined radical innovations or discontinuous change as “change that sweeps away much of a firm’s existing investments in technical skills and knowledge, designs, production technique, plant and equipment,” and Henderson (1993) described an innovation as being radical when it renders a firm’s information filters and organizational procedures (partially) obsolete. In addition, a number of concepts closely related to radical innovations are popular in the management literature.
Tushman and Anderson (1986) classified technological breakthroughs as either competence-enhancing or competence-destroying, depending on whether they either reinforce or destroy established firms’ existing competencies, skills, and knowledge. Technological breakthroughs are also described as inventions that serve as the basis for many subsequent technological developments (Fleming 2001; Ahuja and Lampert 2001) and, as such, shape the development of fields and related industries. Christensen (2003) focuses on disruptive technologies and their implications for established firms in an industry. A disruptive technology will have features that initially only a fringe market segment will value. It redefines the performance trajectory (e.g. in the case of the disk drive industry, shrinking the size of disks). These disruptive technologies need not be radical in nature: in fact, Christensen notes that, in general, disruptive innovations are technologically straightforward.

In the evolutionary economics tradition, radical innovation is commonly evoked in typologies that attempt to characterize the degree of innovativeness of a product or a process (Dosi 1982). Freeman (1992) proposed a taxonomy for technological innovation involving four levels of change: incremental innovation, radical innovation, changes of technical systems, and changes of techno-economic paradigms. Radical innovations, according to Freeman, are discontinuous as they introduce far-reaching changes in technology and affect different parts of the economy, ultimately leading to entirely new sectors.

In the last two decades, we have not only witnessed the introduction of a variety of definitions, but a number of patent-based indicators have been advanced to assess the nature and value of patented technological inventions. Patents contain detailed information on the nature of the technology and leave a trail of patent citations, backward citations (i.e. the citations made to prior patents) and forward citations (i.e. the citations received from future patents). This information allows us to trace elements of the origin of technologies as well as their influence on future generations of technologies (when they directly or indirectly serve as prior art). In addition, patent citations provide indications of the economic value of patents (Griliches 1984; Jaffe and Trajtenberg 2002). Our contribution builds on the most notable patent-based indicators used in the literature to assess the nature and value of patents, and aims to assess which indicators allow us to identify the most important inventions that shape the development of a technological field. In order to do so, we identified major contributions within the field of biotechnology (time period: 1976–2001). In a subsequent step, the different indicators used in the literature are calculated for all granted USPTO biotech patents (time period 1976–2001). Finally, we rely on logistic regression models to assess which indicators are able to identify the most influential patented technologies. Our findings reveal that combining available indicators results in recall rates exceeding 68 % while precision amounts to 84 %. Ex-post indicators measuring technological impact and economic value clearly outperform ex-ante indicators reflecting the nature and novelty of an invention.

The remainder of the paper is organized as follows. First, we introduce the data and indicators used in this analysis. Next, we discuss the descriptive statistics and the results from the multivariate analysis. We conclude with a discussion of the implications as well as directions for further research.

Patent indicators that assess the nature and value of technological inventions: an overview

Different indicators relying on patent data have been used in the literature to assess the nature and impact of patented inventions.

Patent indicators to assess the nature and novelty of technological inventions

To assess the nature of an invention, patents can be compared in terms of backward citations, technology classes or both. Patents without backward citations to technical prior art have been labeled ‘pioneering’ (Ahuja and Lampert 2001) while dissimilar patents have been defined as having backward citations that are different compared to prior patents in the same field (Dahlin and Behrens 2005). The originality of a patent can be identified through a patent’s backward citations with original patents relying on prior art from a broad range of technology fields (Trajtenberg et al. 1997). Finally, more creative inventions have been identified as displaying novel pairwise combinations of technology subclasses or components at the patent level (Fleming et al. 2007).

Dissimilar and unique backward citations

A first method of identifying technologically radical inventions was developed by Dahlin and Behrens (2005) using backward patent citations to other patents. By calculating the overlap scores between the backward citations of each patent P granted in year t and those of all other granted patents (footnote 1) in the same field, and averaging these overlap scores within each year relative to the grant year t, one can identify which patents have a dissimilar citation structure with respect to prior art and a unique citation structure with respect to patents granted in year t. Those patents that have low overlap scores compared to prior art in the field are considered more inventive and unusual. Note that patents without backward citations have the lowest possible overlap score and, as such, are considered more revolutionary or pioneering (Ahuja and Lampert 2001).

New pairwise combination of technology subclasses

New technological inventions originate from recombining and extending pre-existing technological inventions (Nelson and Winter 1982; Basalla 1988). Fleming (2001) conceptualizes technological invention as a recombinant search process across the technology landscape in which inventors experiment with the recombination of technological components. He argues that a patent’s technology subclasses capture the different components used to develop the technology. Using patent data, Fleming (2001) empirically shows that breakthroughs, i.e. patents with the highest variability in forward citations (i.e. the citations received by a patent from future patents), most likely originate from the recombination of familiar technological subclasses, i.e. from subclasses with relatively more prior patents. Nevertheless, the findings show that patents re-using the same combination of subclasses as prior patents are less likely to be breakthroughs. Thus, breakthroughs most likely materialize from recombining disconnected but pre-existing technology subclasses. To identify particularly original contributions with a potentially high impact on future technology development, Fleming et al. (2007) looked at patents that were the first in history to recombine at least two previously disconnected technology subclasses (footnote 2).

Originality

Trajtenberg et al. (1997) developed a backward-looking measure of the ‘basicness’ of an invention. Originality captures the extent to which the nature of the research underlying the patent is based on technical prior art from a broad range of technology fields.

Patent indicators to assess the impact and value of technological inventions

To assess the impact and value of a patent, scholars have primarily used forward citations, backward citations and technology classes. The number of citations a patent receives reflects its direct impact on future technological inventions as well as its private and social value (Gambardella et al. 2008). A similar backward citation structure between a patent and later patents reflects adoption by future generations of technological inventions (Dahlin and Behrens 2005). Patents that are cited by patents from different technology fields are considered to have a more general purpose or impact (Trajtenberg et al. 1997). Finally, technologies with a novel combination of technology subclasses are considered adopted by future generations of inventions when the same combination of subclasses is frequently used by future patents (Fleming et al. 2007).

Indicators relying on the count of forward citations

The most popular indicator of patent impact or value is the number of forward citations received from future patents. The number of forward citations that a patent receives is related to its technological importance (Albert et al. 1991; Carpenter and Narin 1993; Jaffe et al. 2000) as well as its social (Trajtenberg 1990) and private value (Harhoff et al. 1999; Hall et al. 2005; Gambardella et al. 2008). The distribution of forward citations is very skewed, with a large share of patents receiving no citations and a small minority of patents obtaining a large number of forward citations. This pattern resembles the distribution of the actual value of inventions. Hence, it is likely that outliers in the distribution of forward citations pertain to more important inventions. Prior research has typically identified breakthrough patents as the top 1 or 5 % in terms of citations received compared to patents with the same application year and technology class (Ahuja and Lampert 2001; Singh and Fleming 2010).

Adoption of backward citations

Besides having a dissimilar and unique backward citation structure, technologically radical patents should also have a backward citation structure that is adopted by future patents in the same field (Dahlin and Behrens 2005). The more similar the backward citations of a patent and future patents in the field, the more influence the patent has on future technological progress.

Adoption of a novel pairwise combination of technology subclasses

To assess the diffusion or adoption of a patented invention that recombines two disconnected technology subclasses, Fleming et al. (2007) look at the number of future patents that use the same pairwise combination of technology subclasses. The larger the number of future patents re-using the same combination of subclasses, the greater the impact the patent has on future technological progress.

Generality

Trajtenberg et al. (1997) develop a measure of generality, capturing the extent to which the patented invention serves as prior art for a broad range of technology fields. So, while originality measures the broadness of the prior art of the invention (based on backward citations), generality captures the extent to which an invention directly serves as prior art for different technological fields (based on forward citations, i.e. citations received).

Biotechnology

Definition and short history

According to Bud (1993), the term biotechnology was coined as long ago as 1917, the year of the Russian revolution. Today, the best known definition is perhaps the one spelled out by the Organization for Economic Co-operation and Development (OECD 2005): “Biotechnology is the application of scientific and engineering principles to the processing of materials by biological agents to provide goods and services.” Biotechnology is a field that emerged from agriculture and animal husbandry in ancient times through the empirical use of plants and animals that could be used as food or dyes (McGloughlin and Re 2010). Moreover, contrary to its name, biotechnology is not a single technology. Rather, it is a group of technologies that share two characteristics: working with living cells and their molecules, and having a wide range of practical uses that can improve our lives (Keener et al. 2012).

According to Buchholz and Collins (2010), four periods can be discerned in the history of biotechnology (before 1850, 1850–1890, 1890–1950, and the period from 1950 onwards). This paper focuses on the last of these periods, more particularly, the period from 1976 to 2001. By the 1950s, large-scale production of, for example, beer, cheese, citric acid, pharmaceuticals and other products of social and economic relevance such as antibiotics had become well established. During that time, biotechnology benefited from major public funding and made an increasing economic impact. Major technological progress was achieved during the late 1970s and 1980s, most notably due to genetic research and recombinant technologies. A milestone was the 1953 model of DNA, which provided the molecular basis of heredity and was derived by Watson and Crick with the aid of data provided by Rosalind Franklin, who worked in Maurice Wilkins’s X-ray crystallography laboratory (Watson and Crick 1953). However, the DNA revolution, as Hotchkiss (1979) termed it, penetrated slowly into technology, initially having little effect on traditional processes and products. A significant change was triggered by the introduction of recombinant DNA (Cohen et al. 1972; Cohen and Boyer 1979/1980). The emergence of molecular biology and biochemical engineering coincided with a growing industrial interest and the range of products expanded significantly. The field’s progress is reflected in the exponential rise in the number of journals devoted to biotech established in the late 1970s and early 1980s (Buchholz and Collins 2010).

The integration of applied microbiology, biochemical engineering and molecular biology led to the creation of biotechnology as a scientific discipline in its own right, with a common paradigm at the level of molecular research. Sub-disciplines such as genomics, transcriptomics, proteomics, metabolic flux analysis with quantitative analysis of complex metabolic pathways and, finally, biochemical engineering and bioinformatics have merged to create bio-systems engineering (Sinskey 1999; Stephanopoulos 1999; Reuss 2001).

Identification of major contributions that shape the field of biotechnology

In order to identify the most important technological developments that have shaped the evolution of biotechnology, we relied on secondary sources including books, journal articles, websites of inventors, academics, companies and research institutes, and expert reports. Amongst those sources, we principally relied on scientific books providing a consistent and exhaustive overview of major technological accomplishments in biotechnology or a particular subfield of biotechnology. An exhaustive overview of the major sources used in this respect can be found on-line (see Supplementary Material). We verified multiple secondary sources to strengthen the overall consistency of our list of important inventions, since any account may well be conditioned by the personal interests or values of the authors. We concentrated initially on events labeled as discontinuous, pioneering, important, breakthrough, revolutionary, radical, drastic, cutting edge, fundamental, groundbreaking, dramatic, leap forward and original, among others. In particular, we searched for those inventions that were described as contributing to the evolution of biotechnology, highlighting fundamental leaps on certain key research trajectories or establishing new ones that were clearly stated by authors.

Relating important contributions to patents

After carrying out a comprehensive screening and assessment of technological inventions that shaped the field of biotechnology, we systematically searched for patents and publications associated with those inventions. Using information on the description and timing of the invention, the associated researchers, institutions and/or companies, we searched for corresponding patents and publications in the USPTO patent database and the ISI Web of Science (WOS), respectively. In some cases, we found more than one corresponding patent and/or more than one corresponding publication, whereas we were unable to find publications or patents for a minority of important inventions. Of the 214 important inventions identified, 117 (55 %) were found in the USPTO patent database while 153 (71 %) were found in the WOS database as scientific publications. For 37 (17 %) of the events, we found at least one corresponding patent but no publication, for 72 (34 %), we found at least one publication but no patent while for 80 (37 %), we found patent-paper pairs. For eight events, we found multiple corresponding patents. For 25 (12 %) contributions, neither a patent document nor a scientific publication (present in the Web of Science) has been identified. A detailed list of all major inventions considered in this analysis as well as the related USPTO patents can be found on-line (see Supplementary Material). Notice that a number of inventions resulted in multiple USPTO patent documents. All patents that fall within the relevant time window (filed between 1976 and 2001 and granted before 2004) and that are situated within the biotechnology domain (see infra) have been included in the subsequent analysis (n = 122).

Data and findings

Sample selection

To identify all USPTO biotechnology patents, we made use of the OECD classification scheme that relies on IPC codes (OECD 2005). Data have been extracted from the Patstat patent database (version October 2011) and include all patents filed at the USPTO between 1976 and 2001 and granted before 2004, which fall into at least one of the IPC classes. The final sample used for analysis consists of 84,119 patents. Of the 84,119 patents, 122 have been identified as relating to 117 important inventions that shaped the field of biotechnology. For the calculation of citation-related indicators, we employed the updated NBER patent database (footnote 3).

Variables

Dissimilar, unique and adopted backward citations

We follow the methodology of Dahlin and Behrens (2005) and calculate, for each patent P granted in year t, the average annual overlap scores between the backward citations of P and those of all other patents filed in the same field (3-digit US technology class) within a time window of 5 years before and 5 years after the grant year of P (i.e. patents granted between t − n and t + n with 0 ≤ n ≤ 5). We extend their methodology by comparing a patent to all other US granted patents sharing at least one technology class. So, for each of a patent’s 3-digit technology classes, we follow the methodology outlined in Dahlin and Behrens (2005). First, we label P as dissimilar compared to prior art when the average annual overlap score is 0 or the average standardized annual overlap score is smaller than or equal to the 10th percentile of all patents for each year t − n with 0 < n ≤ 5. Due to truncation, not all patents have a time window of 5 years before (and after) grant. For patents that we can only observe 3 or 4 years before grant, we require the patent’s average overlap score to be 0 or its standardized average annual overlap score to be equal to or smaller than the 10th percentile threshold for each of the observed years before grant. Patents that cannot be observed at least 3 years before and after the grant are not taken into account during the analysis. Following this methodology, 47 % of the patents in our sample have a dissimilar citation structure. Second, a patent is labeled ‘unique’ when the average standardized overlap score in the year of grant is below or equal to the 10th percentile threshold. We find 67 % of the patents to pass the uniqueness criterion. Third, a patent is labeled ‘adopted’ when the annual overlap score passes the 90th percentile threshold for each year after grant. In our sample, 8 % of the patents pass the adoption criterion. Finally, 2.2 % of the patents pass all three criteria.
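The dissimilarity test can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the text: the overlap score is computed here as the share of cited patents two patents have in common (a Jaccard-style ratio), the standardization step is omitted, and the function names and input layout are hypothetical.

```python
def overlap_score(cites_a, cites_b):
    """Assumed overlap measure: shared backward citations / all distinct citations."""
    a, b = set(cites_a), set(cites_b)
    union = a | b
    if not union:
        return 0.0  # neither patent cites prior art
    return len(a & b) / len(union)


def average_annual_overlap(focal_cites, others_by_year):
    """Mean overlap between the focal patent and the patents granted in each year.

    others_by_year maps a grant year to a list of backward-citation sets,
    one per comparison patent in the same technology class.
    """
    scores = {}
    for year, citation_sets in others_by_year.items():
        if citation_sets:
            total = sum(overlap_score(focal_cites, c) for c in citation_sets)
            scores[year] = total / len(citation_sets)
    return scores


def is_dissimilar(annual_scores, p10_by_year, grant_year, window=5):
    """True when the score is 0 or at most the 10th percentile in every observed prior year.

    Returns None when fewer than 3 years before grant can be observed,
    mirroring the exclusion rule for truncated patents.
    """
    years = [grant_year - n for n in range(1, window + 1)
             if grant_year - n in annual_scores]
    if len(years) < 3:
        return None
    return all(annual_scores[y] == 0 or annual_scores[y] <= p10_by_year[y]
               for y in years)
```

The uniqueness and adoption tests would follow the same pattern, using the grant year itself and the 90th percentile in the years after grant, respectively.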

New and adopted pairwise combination of technology subclasses

To identify patents that recombine two technology subclasses for the first time in history, we use the 2008 US technology subclass concordance to investigate all technology subclass assignments of all US-granted patents in order to identify all first pairwise subclass combinations. For each new subclass combination, we count the number of future patents re-using the same pairwise combination. In our sample of biotech patents, we find 45 % of all patents displaying a new combination of technology subclasses with, on average, 49 future patents re-using the same combination.
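As an illustration, the identification of first pairwise combinations and the count of later re-use can be sketched in Python as follows. The input layout (tuples of patent identifier, grant date and subclass list) and the function name are assumptions made for this example; ties in grant date are resolved by input order, which the text does not specify.

```python
from collections import Counter
from itertools import combinations


def first_combinations(patents):
    """Find the first patent using each subclass pair and count later re-use.

    patents: iterable of (patent_id, grant_date, subclasses) tuples.
    Returns (first_user, reuse) where first_user maps a sorted subclass pair
    to the id of the earliest patent using it, and reuse counts the number
    of later patents repeating that pair.
    """
    first_user = {}
    reuse = Counter()
    # Process patents chronologically so the earliest user of a pair is seen first.
    for patent_id, _, subclasses in sorted(patents, key=lambda p: p[1]):
        for pair in combinations(sorted(set(subclasses)), 2):
            if pair in first_user:
                reuse[pair] += 1
            else:
                first_user[pair] = patent_id
    return first_user, reuse
```

A patent then counts as displaying a new combination if it appears as a value in `first_user`, and its adoption is the re-use count of the pairs it introduced.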

Originality and generality

In line with Trajtenberg et al. (1997) and Hall et al. (2001), we calculate originality and generality using a measure reflecting the concentration of backward and forward citations, respectively, within technology classes. Originality is calculated as 1 − the bias-corrected (footnote 4) Herfindahl index of the technological classes (main) of all cited patents. Generality is calculated as 1 − the bias-corrected Herfindahl index of the technological classes (main) of all citing patents. The average originality score of the biotech patents in our sample is 0.52 while the average generality score is 0.51.
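A minimal Python sketch of this calculation is given below. The bias correction is assumed here to be the multiplication of the uncorrected score by N/(N − 1), where N is the number of citations, as commonly attributed to Hall et al. (2001); the exact correction applied in this study is not spelled out in the text, and the function name is hypothetical.

```python
from collections import Counter


def one_minus_herfindahl(classes, bias_corrected=True):
    """Return 1 - Herfindahl index of the main classes of cited (or citing) patents.

    classes: list with one main technology class per citation.
    Pass the classes of cited patents for originality and the classes of
    citing patents for generality.
    Returns None when the score is undefined (no citations, or a single
    citation when the assumed N / (N - 1) correction is requested).
    """
    n = len(classes)
    if n == 0:
        return None
    herfindahl = sum((count / n) ** 2 for count in Counter(classes).values())
    score = 1.0 - herfindahl
    if bias_corrected:
        if n < 2:
            return None
        score *= n / (n - 1)  # assumed small-sample correction
    return score
```

A patent whose citations all fall in one class scores 0, while citations spread evenly over many classes push the score towards 1.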

Indicators relying on the distribution of forward citation counts

To identify patents with the largest impact on future technologies, we calculate, for all granted US patents, the count of forward citations as the number of US patents citing the patent (citations from patents granted until 2006 inclusive) and the truncated count of forward citations as the number of citations received within 5 years of application. Prior research has typically identified breakthrough patents as the top 1 or 5 % in terms of citations received compared to patents within the same application year and technology class (e.g. Ahuja and Lampert 2001; Singh and Fleming 2010). This definition assumes that each technology field has a fixed share of high impact inventions each year and does not compare patents across years. To avoid a definition that forces a fixed proportion of breakthroughs every year into each class while allowing similar patents to be compared across years, we follow the methodology of Arts (2012) and consider the distribution of both forward citations received within five years and the distribution of forward citations received from all future patents. We use the full count of forward citations to compare all patents sharing at least one 3-digit US technology class filed in the same year and the truncated citation count to compare all patents sharing at least one technology class irrespective of time of filing. For each of the distributions, we calculate the mean and standard deviation of (truncated) forward citation counts. A patent is labeled as having a high impact on future generations of inventions when both its truncated and full count of forward citations are larger than the mean plus n times the standard deviation in at least one of its technology classes. So, for each patent’s technology class, the patent is compared with two distributions: the distributions of full and truncated forward citation counts. 
Using a 1, 2, 5 and 10 standard deviation rule to identify outliers in the distribution of forward citations, we find that, respectively, 7, 3, 0.5 and 0.1 % of biotechnology patents in our sample are labeled as having a disproportionate impact on future patents.
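The outlier rule can be sketched in Python as follows. This is an illustration under assumptions: the population standard deviation is used (the text does not say whether the population or sample formula applies), and the dictionaries of comparison-group counts per technology class are a hypothetical input layout.

```python
from statistics import mean, pstdev


def threshold(counts, n_sd):
    """Mean plus n_sd standard deviations of a list of forward citation counts."""
    return mean(counts) + n_sd * pstdev(counts)


def is_high_impact(full, truncated, full_peers, truncated_peers, n_sd):
    """Apply the mean + n * SD rule to both citation counts of one patent.

    full, truncated: the patent's full and 5-year forward citation counts.
    full_peers, truncated_peers: dicts mapping each of the patent's
    technology classes to the counts of its comparison group.
    The patent is flagged when both counts exceed their thresholds
    in at least one of its technology classes.
    """
    for tech_class, peer_counts in full_peers.items():
        if tech_class not in truncated_peers:
            continue
        if (full > threshold(peer_counts, n_sd)
                and truncated > threshold(truncated_peers[tech_class], n_sd)):
            return True
    return False
```

Raising `n_sd` tightens both thresholds at once, which is why the share of flagged patents falls as the multiplier grows.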

For all USPTO patent documents under study we calculate the abovementioned indicators. An illustrative example for the inventions related to recombinant DNA and polymerase chain reaction can be found on-line (see Supplementary Material). As a close inspection of both patent documents and the implied indicators reveals, not all indicators signal discriminatory values for these important contributions. Whether this also holds for the sample as a whole is examined in a more systematic, multivariate analysis (see Sect. “Multivariate analysis”).

Descriptive statistics

Table 1 gives an overview of descriptive statistics for the set of major technological inventions and for the control patents, including a mean-comparison t-test between the two groups.

Table 1 Descriptive statistics

In terms of indicators reflecting the origins and nature of the patented invention, we do not observe significant differences in the proportion of patents without citations to technical prior art. By contrast, most important inventions seem to more frequently cite other patents compared to the control group (ten backward patent citations compared to six on average). Furthermore, major inventions have more dissimilar backward citations (55 % of dissimilar patents compared to 47 % for the control group) and rely on more recent technical prior art with an average backward citation lag of 6.1 versus 7.5 years for the control group. Both groups are not different in terms of originality indicating that important biotech patents do not rely on prior art stemming from a broader range of technology fields. Nevertheless, patent documents associated with important technological inventions contain a larger number of technology main classes and subclasses, so they seem to cover a larger part of the technology landscape (2.5 main classes and 8.0 subclasses on average compared to 2.2 and 6.2 for the control group, respectively). Finally, important inventions display a much larger number of citations to non-patent literature (44 vs. 22 on average), are more likely to have a novel pairwise combination of technology subclasses (66 vs. 45 % on average) and contain a larger number of claims (23 vs. 15 on average). In conclusion, patents associated with major technological inventions cite more patents, cite more recent technical prior art, contain more references to non-patent literature, and have dissimilar backward citations compared to prior art in the same field but do not rely on prior art from a broader range of technology fields. Nonetheless, major patents seem to be more novel and serve a more general purpose by covering more technology fields, subfields and claims, and are more likely to combine previously disconnected technology subfields.

Besides being based on a different set of science and engineering principles and/or incorporating substantially different core technologies, the most important technological contributions are expected to have a higher and broader impact on future technology trajectories. In line with expectations, patents associated with important contributions receive significantly more forward citations on average (146 forward citations and 36 forward citations within 5 years compared to 7 forward citations and 3 forward citations within 5 years for the control group). Accordingly, looking at outliers in the distributions of forward citations we observe that 74 % of the important patents are 1 SD outliers compared to 7 % for non-radical patents, and 29 % are 10 SD outliers compared to only 0.1 % for the control group. Besides serving more extensively as prior art for future generations of inventions, they also tend to remain cited for a longer time with an average forward citation lag of 7.6 years compared to 5.9 for the control group. Furthermore, patents associated with important inventions seem to serve as prior art for a broader range of technology fields, reflected by an average generality score of 0.74 (compared to 0.51 for the control group). Likewise, for patents that make at least one new pairwise combination of previously disconnected technology subfields, the pairwise combination of important contributions is adopted by a much larger number of future patents. On average, 1,526 future patents will use the same component configuration compared to 46 future patents for the control group. Also, the backward citations of major patents are more likely to be adopted by future patents with 36 % of important inventions having adopted backward citations compared to only 8 % for the control group.

Finally, Dahlin and Behrens (2005) suggest the use of a composite measure to identify technologically radical inventions. Besides requiring a backward citation structure that is dissimilar to prior art and becomes adopted by future patents, they add a uniqueness criterion, i.e. having backward citations different from patents granted in the same year in the same technology field. We find that important contributions have a backward citation structure that is less unique, with 59 % of the important patents satisfying the uniqueness criterion compared to 67 % for the control patents. According to the authors, technologically radical inventions should satisfy all three criteria. We find that 14 % of the most important inventions satisfy all three criteria compared to 2 % for the control group.

Table 2 presents the correlation coefficients between the most notable indicators. The dummies representing outliers in the distribution of forward citations correlate most strongly with being an important contribution to the field, particularly the 5 and 10 SD outliers. Furthermore, the number of future patents re-using the same pairwise combination of subclasses is also strongly correlated with being an important contribution.

Table 2 Correlation matrix

In conclusion, the descriptive results suggest that both backward-looking measures reflecting novelty with respect to prior art, as well as forward-looking measures of value and impact, signal important inventions within a field. In particular, measures reflecting an impact on future technological progress seem to reveal discriminatory power.

Multivariate analysis

Given that our dependent variable, indicating whether the patent was identified as being a major contribution to the field, is binary (0/1), we use logit models to assess the discriminatory power of the different indicators. All models include technology dummies for each of a patent’s main technology classes (3-digit) as well as a set of additional control variables including the number of assignees, the number of inventors, and patent age. Note that patents in a main technology class without important contributions are dropped from the analysis during estimation. To assess the discriminatory power of the different indicators, we provide a number of statistics below each regression model in Table 3. We are particularly interested in the recall, i.e. the percentage of externally identified major patented inventions that are predicted as such, as well as in the precision, i.e. the percentage of patents that are predicted to be very important and that actually prove to be so.
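
As an illustration of the two statistics defined above, the following minimal sketch computes precision and recall from a vector of actual labels and a vector of predicted labels. The function name and the example data are hypothetical; they do not correspond to the patents in our sample.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for binary (0/1) labels."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    # Precision: share of predicted-important patents that are truly important.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: share of truly important patents that are predicted as such.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall

# Hypothetical example: 4 important patents and 6 controls.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
print(precision, recall)
```

With a logit model, the predicted label would typically be obtained by thresholding the fitted probability; the threshold itself is a modelling choice that this sketch does not fix.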

Table 3 Identifying important contributions: full sample USPTO biotechnology patents

Table 3 presents the results obtained for the full set of patents. Note that important contributions only represent 0.15 % of the total sample, which seriously hampers the assessment of precision and recall rates of the different models (any model that predicts all patents as not important would classify over 99 % of all patents correctly). Therefore, we present parallel results for a reduced sample of matched patents in Table 4. We generate a more balanced sample of patents by only retaining control patents with exactly the same combination of technology main classes (3-digit), application year, and grant year as at least one of the patents associated with a major technological invention. Treatment and control patents for which no proper matches are found are excluded from the analysis. For each of the 92 remaining major patents, we randomly sample 4 control patents among those that match and rerun the model on the reduced sample.
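
The matching procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact routine used for Table 4: the record layout (`id`, `classes`, `app_year`, `grant_year`), the function names, the fixed random seed, and the decision to use each control patent at most once are all hypothetical.

```python
import random
from collections import defaultdict

def match_key(patent):
    """Exact-match key: set of main classes, application year, grant year."""
    return (frozenset(patent["classes"]), patent["app_year"], patent["grant_year"])

def build_matched_sample(major, controls, n_controls=4, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    pool = defaultdict(list)
    for control in controls:
        pool[match_key(control)].append(control)
    sample = []
    for patent in major:
        key = match_key(patent)
        candidates = pool[key]
        if not candidates:
            continue  # major patents without any match are excluded
        drawn = rng.sample(candidates, min(n_controls, len(candidates)))
        drawn_ids = {d["id"] for d in drawn}
        # Assumption: a control patent is used at most once.
        pool[key] = [c for c in candidates if c["id"] not in drawn_ids]
        sample.append(patent)
        sample.extend(drawn)
    return sample

# Hypothetical records (illustrative only).
major = [
    {"id": "M1", "classes": ["435", "530"], "app_year": 1990, "grant_year": 1993},
    {"id": "M2", "classes": ["800"], "app_year": 1985, "grant_year": 1988},
]
controls = [
    {"id": f"C{i}", "classes": ["530", "435"], "app_year": 1990, "grant_year": 1993}
    for i in range(6)
] + [{"id": "C9", "classes": ["435"], "app_year": 1990, "grant_year": 1993}]

matched = build_matched_sample(major, controls)
print([p["id"] for p in matched])
```

In this example, M2 has no control with an identical class combination and timing and is therefore dropped, while M1 is retained together with four randomly drawn controls.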

Table 4 Identifying important contributions: matched sample USPTO biotechnology patents

Marginal effects are calculated as the percentage change with respect to the average likelihood of being an important invention, i.e. 0.15 %. In terms of the indicators capturing the nature and novelty of the patented invention, we find that recombining previously disconnected technology subclasses makes a patent 47 % more likely to be a major invention (Table 3, Column 3). Column 4 (Table 3) presents the findings for the different measures suggested by Dahlin and Behrens (2005). We find that patents dissimilar to prior art in terms of backward citations are 67 % more likely to be a major contribution to the field. Surprisingly, the uniqueness dummy has a negative impact, suggesting that important contributions have more similar backward citations to patents filed in the same year and field. Patents with unique backward patent citations are 73 % less likely to be associated with major inventions. This might indicate that similar and parallel inventions building on the same set of technical prior art are conducted during the same time period. Furthermore, we find that major patents are not more original, i.e. they do not rely on technical prior art from a broad range of technology fields. In fact, in Column 8 of Table 3, originality is negative and significant. A one standard deviation increase in originality reduces the likelihood of a major invention by 23 % compared to the average likelihood.
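
To make the scale of these effects concrete, the following minimal sketch converts a reported relative effect into an implied absolute likelihood, taking the 0.15 % average likelihood as the reference point. The function and variable names are hypothetical, and the calculation is only a reading aid for the percentages reported above.

```python
BASELINE = 0.0015  # average likelihood of being an important invention (0.15 %)

def implied_probability(relative_change_pct, baseline=BASELINE):
    """Apply a relative change (in percent) to the baseline likelihood."""
    return baseline * (1 + relative_change_pct / 100)

# New subclass recombination: 47 % more likely than the average patent.
p_recombination = implied_probability(47)
# Unique backward citations: 73 % less likely than the average patent.
p_unique = implied_probability(-73)

print(f"{p_recombination:.6f}")
print(f"{p_unique:.6f}")
```

Even a large relative effect thus corresponds to a small absolute likelihood, which reflects how rare important contributions are in the full sample.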

However, in line with the descriptive statistics, the number of citations to non-patent literature and the number of claims have a positive and significant impact. Surprisingly, the number of main technology classes has a negative and significant effect while the number of backward patent citations is insignificant.

For the indicators reflecting impact and value, we find that the dummies indicative of outliers in the distribution of forward citations have the most predictive power of all indicators. In Column 2 of Table 3, we find that patents that are 1 SD outliers in the distribution of forward citations are 173 % more likely to be of major importance for the evolution of biotechnology, 2 SD outliers are 313 % more likely, and 10 SD outliers are 693 % more likely. The stricter the criterion for being an outlier, the better the discriminative performance (see Footnote 5). While a new pairwise subclass combination makes a patent 47 % more likely to be a major invention, the number of future patents adopting the same pairwise subclass combination is also positive and significant (Table 3, Column 3). A one standard deviation increase in the number of future patents adopting the same pairwise combination is associated with a predicted increase of 9 % in the likelihood of being a major invention. Also, important contributions have backward patent citations that become adopted by future patents. This effect is strong: patents whose backward citations strongly overlap with future patents are predicted to be 146 % more likely to become breakthroughs, whilst we find no support for a significant impact of combining dissimilarity, uniqueness and adoption (Table 3, Column 4). Column 5 of Table 3 presents results for the originality and generality measures. While important contributions clearly serve a general purpose as technical prior art, they themselves do not seem to rely on technical prior art from a broad range of fields. A one standard deviation increase in generality is associated with an expected increase of 303 % in the likelihood of being a very important invention.
Finally, we present the results for all ex-ante indicators reflecting dissimilarity or novelty with respect to prior art (Table 3, Column 6), ex-post indicators reflecting impact (Table 3, Column 7) and both combined (Table 3, Column 8). The results clearly signal the superior performance of ex-post indicators. Measures reflecting direct use as prior art through forward citations, as well as measures indirectly reflecting adoption through subclass combinations and backward citations, display considerably more predictive power. The full model (Table 3, Column 8) combining all indicators is able to identify 25 % of the major patents (recall) while 70 % of the patents predicted to be important are indeed important (precision). Note that ex-post indicators account for the recall and precision. The model with only the ex-ante indicators has a recall of 0 % (Table 3, Column 6).

Table 4 presents the regression results for the matched sample of patents. Note that being a 10 SD outlier in the distribution of forward citations is a perfect predictor of importance; not a single control patent belongs to this category. Therefore, we dropped it from the regression analysis, which means that the calculated recall and precision are underestimates. The obtained results are broadly in line with the results on the full sample of patents in Table 3. In contrast to the results for the full sample, however, having dissimilar or unique backward citations is no longer significant. Moreover, the adoption of backward citations by future patents becomes insignificant in Columns 7 and 8. The full model (Table 4, Column 8) combining all indicators is able to identify 69 % of the important patents (recall) while 84 % of the patents predicted to be of high importance are indeed important (precision). Note again that ex-post indicators account for the lion’s share of both recall and precision.

Discussion and conclusion

Technological innovation is an important constituent of economic growth. At the same time, technological inventions vary widely in terms of nature and impact. While there has been a great interest in the development of new technologies and their commercialization, only a minority of these new technologies will contribute significantly to private and social welfare in the future. Consequently, analyzing and understanding both the discovery and the exploitation phases of technological inventions, including their differentiated nature and impact, is important for companies and countries alike. While there has been great interest in the competitive dynamics following the commercialization of inventions, large-scale empirical research on the actual discovery of important technological contributions is scarce. As noted by Dahlin and Behrens (2005), this is mainly due to the lack of reliable indicators that allow such a large-scale quantitative assessment.

In this contribution, we rely on secondary sources to identify the most important technological inventions in the field of biotechnology and relate these to US patent data. Thus, it becomes feasible to examine whether, and to what extent, patent-based indicators advanced in the literature are able to identify these distinctive technological inventions. Indicators advanced so far can be labeled ‘ex ante’ to the extent that the indicator can be calculated as soon as the invention appears as a patent publication (e.g. dissimilarity of backward citation patterns, new technology subclass combination). Available ex-ante indicators reflect, at least partly, the nature and novelty of the patented invention. At the same time, a number of indicators can only be assessed ex post, i.e. after an invention’s impact becomes visible (e.g. the number of received citations, the number of patents displaying similar citation patterns afterwards). Available ex-post indicators reflect, at least in part, the impact and value of inventions. Our results show that relying on ex-post indicators allows us to identify patents associated with the most important inventions on a much larger scale (67 % are correctly identified) and more accurately (79 % of the patents that are predicted to be of major importance are correctly classified) compared to ex-ante indicators (21 and 61 % respectively in the matched sample). The ‘ex-post’ indicators clearly outperform the ‘ex-ante’ indicators in terms of precision and recall. In consequence, some of the recently proposed indicators, which rely heavily on novelty, do not qualify as accurate predictors of important contributions to the field. In addition, our findings clearly signal potential for future research seeking to identify the nature of inventions more precisely. As currently available indicators do not directly take into account the technical content of the inventions under study (e.g. by engaging in a textual analysis of the abstracts or claims), novel, complementary indicators relying on text mining algorithms may well achieve better results (e.g. Magerman et al. 2010; Kaplan and Vakili 2012). In terms of impact, the deployment of more sophisticated, network-oriented indicators could enhance the prospect of added value. Finally, the role of non-patent references, as well as the number of claims, would appear to deserve further attention. When included as control variables, they tend to consistently predict important contributions in a positive manner. While precision and recall rates for the control variables alone are modest, it seems worthwhile to further investigate the nature of this relationship and to assess the relevance of additional indicators based on claims and non-patent references, respectively. We hope our contributions inspire colleagues to engage in such endeavors.