Introduction

This paper proposes a new, alternative approach for the analysis of patent data based on social network analysis (SNA). Our quantitative assessment allows us to fully exploit the richness of patent data as well as to extract knowledge patterns from patent inventors’ network of collaborations. Furthermore, our approach describes a methodology in order to address several issues related to inventive activities and it ultimately provides a more complete understanding of the patterns that characterize the geography of innovation.

In particular, the study proposes a new SNA-based approach (with simple and informative empirical applications) to extract knowledge patterns about patent inventors and their structure of relationships. Indeed, in our network, nodes represent inventors while edges denote co-inventorship relations. Moving from a basic network analysis, we provide new developments to map and study co-inventorship.

The advent of SNA has provided an important tool that helps researchers to better understand the impact of scientific contributions to the existing body of literature. Specifically, the analyses developed using citations data provide meaningful metrics for gauging the impact of scholars’ contributions in a certain field (Griffin et al. 2016). Wasserman and Faust (1994) emphasized that the usefulness of SNA lies in the successful recognition of the existing relationships between social entities, models, and the associations between the two. From its introductory developments in sociology in the early 1970s (Wellman and Whitaker 1974) the use of SNA has increased significantly in popularity throughout the social sciences (Otte and Rousseau 2002). Since then, SNA has been applied in many disciplines, ranging from anthropology (Boissevain 1979) to zoology and animal biology (e.g., dolphin social networks; Lusseau 2003).

Recent advances in citation indexing and bibliometric studies provide new tools for network analysis methods and their applications across the social sciences as well. For instance, communication researchers have widely relied on SNA to understand publication patterns and trends in specific content areas such as health communication (Kim et al. 2010). Scientific publications have also been used to predict job placements in the communication sector (Feeley et al. 2011). As the current study illustrates, SNA is a tool that can be meaningfully incorporated into a tailored methodology in order to rank authors. As an example, Griffin et al. (2016) chose to use eigenvector centrality rather than simply tallying the number of publications to create a ranking of authors. Kim et al. identified the structural patterns of theoretical relationships in communication sciences by using both webometric and SNA methods. In particular, they analyzed the networked structure of theories in social sciences represented by co-occurrences on the World Wide Web. To do so, co-occurrences in communication science theories were retrieved from the Web and analyzed using SNA tools. Although relational bibliometrics usually examines relationships within science mainly through the Institute for Scientific Information (ISI), it has undergone a revolutionary change through the wide diffusion of important new sources of scientific communication such as scholars’ webpages and online article databases (e.g., Google Scholar) (Delgado and Repiso 2013; Lee et al. 2009; Park 2010; Thelwall 2009). On the one hand, the use of webometrics as a new type of bibliometric source has clearly emerged. Webometrics is widely understood as the study of web-based contents and web phenomena through quantitative methods (Thelwall 2009). SNA methods, on the other hand, constitute a set of analytical techniques for understanding the structure of scholarly communication across many academic disciplines (Chung et al. 2013).
More precisely, SNA is used to identify structural properties based on relationships or patterns of relationships among interacting components (Wasserman and Faust 1994). The intuition behind this methodology is that the social ties in which actors are embedded have important consequences for those actors (Freeman 2004). Several scholars have argued that the combination of SNA methods and bibliometric tools can provide a richer picture of the status of a given field (Leydesdorff 2007). In addition, webometrics offers interesting analytical developments to provide a more comprehensive understanding of the status of communication sciences (Anderson 1996: 8).

Before returning to the specific content of this paper, it is appropriate to briefly review the notion of gatekeepers. Following Le Gallo and Plunket, we can adopt two different definitions: Giuliani and Bell (2005) and Morrison (2008) consider gatekeepers as local actors with strong connections outside their cluster and a central position within it, enabling them to exploit externally produced knowledge within their local context; Gould and Fernandez (1989) instead consider gatekeepers as individuals with unique and non-redundant ties to external actors (Breschi and Lenzi 2015; Graf and Kruger 2011).

Le Gallo and Plunket have further explored whether and how gatekeepers contribute to the performance of innovations within clusters. Using patent data and inventor networks in genomics, they analyzed whether gatekeepers affect the quality of patents produced both by the teams to which they belong and by the teams to which they are socially connected within regional networks. They were thus able to separate each individual’s intra-regional and inter-regional connections. This micro-level approach can help us to understand how embeddedness within regional and global networks might influence inventive performance.

As a matter of fact, the analysis of patent data through SNA methods can represent a building block in investigating scientific developments and understanding the nature of innovative processes of a country. In addition, such analysis can lead policymakers to make better and more efficient investment choices. Finally, investigating inventions via patent data allows decision makers to monitor the evolution of R&D activities over time among different organizations, institutions and countries. By looking at patent networks, indeed, we may also provide a precise and detailed picture of cooperation occurring among different entities and/or countries in the R&D sector, an indicator of geopolitical developments and possibly a forecast of future patterns of innovative activities.

In several fields the amount of data available to researchers and scholars has rapidly increased over the last few years. Patent data have been growing as well. To deal with this huge quantity of information, scientists have been relying more and more on a wide range of statistical methods that belong to the domain of data science and big-data analytics. Although new statistical approaches are starting to spread, most of the analyses in the patent literature still rely on classical methodologies.

In fact, approaches based on classical statistical methods present several difficulties for the investigation of phenomena that involve large amounts of data and where variables are strictly related to each other (Tsvetovat and Kouznetsov 2011), whereas SNA appears particularly well suited to them.Footnote 1

Bearing in mind these methodological issues, and unlike most other studies, we try to exploit methods drawn from data science to provide a meaningful and comprehensive elaboration of such a large quantity of data. In particular, we propose a new, general approach to the extraction of knowledge patterns by looking at patent inventors and their relationships.

Thanks to this approach we can describe many features of scientific and R&D collaboration among different inventors and across countries, characterize new measures and concepts, such as the notions of “power inventor” and “clique of inventors”, and adapt to our analysis some SNA metrics already employed in different domains.

The paper is structured as follows. After reviewing the relevant literature in the following section, “Data, research questions and methodology” Section illustrates our data and methodology showing how we address several questions about inventors’ networks and the geography of innovation. Furthermore, we present the results of our measurements and computations for a set of countries that in our opinion yield the most interesting and representative outcomes. Then, we briefly discuss our contribution to the literature. Finally, “Concluding remarks” Section concludes.Footnote 2

Literature review

Since the seventies many researchers have reflected on the influence that many companies can have on research and on fields of expertise. Consider, for instance, Crawford and Biderman (1970), Blumer (1967a, b), Green (1971), Horowitz (1967), McCartney (1971), Orlans (1967), Platt (1971), Rist (1973, 1975), Galliher and McCartney (1973) and Furstenberg (1971), who discuss how sponsorship by various governmental agencies and commissions, as well as private foundations, has influenced the “objectivity” of scientific investigations.

Broadhead and Rist (1976) speculated and synthesized the effect of the sponsorship of the research activity, summarizing this action in three fundamental steps:

  1. Through detailed specification of the research issue so that the eventual problem is cast within a framework congruent with the sponsor’s perspective.

  2. Through emphasis upon a positivistic style of research thought more susceptible to manipulation for the purposes of controlling the results.

  3. Through the threat of withdrawing present funding and denying future support should the researcher move into areas “not in the best interest” of the sponsor.

They point out that: “The actual manifestations of social control over social research can be shown as they appear within organizational frameworks. A key component of that control organization is the small group of managers and administrators within a formal organization who screen prospective researchers seeking funding, entering into the organization itself, or accessing to data already collected. This small group of “gatekeepers” has a central role in deciding the fate of those who desire to conduct social research with someone else’s money, data, or organization.” (Broadhead and Rist 1976)

Recent studies have shown that the situation has not changed much after forty years. Zsindely et al. (1982a, b) showed that, at least in chemistry, the editorial bodies of international journals are true gatekeepers and that their professional status is positively correlated with the scientific quality of the international journals in this discipline. The editor-in-chief alone seems to have a much lesser influence. They also note that the lower participation of scientists of certain countries in the editorial boards of international chemistry journals is frequently compensated by the high citation frequency of the editors coming from these countries. The higher than average citation frequency of editors from countries with scientific life of a more closed character shows that, in order to receive an invitation to the editorial boards of international chemistry journals, scientists from such countries must acquire a higher than average international prestige.

Zsindely et al. point out the significant correlation between the number of science journal editors from different countries and the number of scientists. They argue for using “the extent of participation in the editorial board of international science journals as a new science indicator. The deviations from the regression lines between the new publication indicator and other indicators allow one to assess the “open” or “closed” character of the scientific life of a given country”. They also consider that the critical mentality and decisions of the editors have so far protected, and will continue to warrant in the future, the social and intellectual integrity of science. Moreover, Crane (1965) rightfully considers the members of such boards the publication gatekeepers of a given field.

In the field of economics and management of innovation (EMIT) there is strong evidence that research collaborations across firms and regions are a key factor for the acquisition of external knowledge (Singh 2005), as well as for promoting innovation (Miguélez and Moreno 2013).

Several papers analyze various aspects connected to researchers’ professional relationships and their impact on innovative activities. In this regard, our work aims at providing an original approach for the analysis of these relationships and for the extraction of knowledge patterns from patent data. As already mentioned, our analysis is based on inventors rather than applicants, as in Lissoni (2012) and unlike Hingley and Bas (2009), for instance, who investigate the evolution of both the number of applicants and the distribution of applicant size.

Patent data and innovation

Economic studies of innovation have long made use of patent data (Griliches 1990; Nagaoka et al. 2010). Assisted by the digitalization of records and increasing computational power, economists and other social scientists have extracted increasing quantities of information from patent documents: the applicants’ identity and location, the technological contents of the invention, or the latter’s impact, as measured by citations. More recently, information on inventors has attracted a good deal of attention. Identifying inventors allows studying their mobility patterns, both in space and across companies (Agrawal et al. 2006; Marx et al. 2009) as well as their social capital, as measured by their position in co-inventor networks (Fleming et al. 2007; Breschi and Lissoni 2009; Lissoni et al. 2010). Inventor data can also be matched to additional information at the individual level, ranging from professional identities (does the inventor appear also on a list of R&D employees or a list of academic scientists?) to other types of archival data on knowledge-related activities (such as scientific publications; see Azoulay et al. 2009; Breschi et al. 2008; Lissoni et al. 2008). Identifying inventors within any given set of patent data, as well as matching them to any other list of individuals, requires the elaboration of complex disambiguation algorithms. They are necessary to analyze in a non-trivial way the text strings containing the inventors’ names, surnames, and addresses. Yet, it is only of late that users of inventor data have started openly discussing the disambiguation techniques they employ and examining their implications in terms of data quality and reliability of the evidence produced (Raffo and Lhuillery 2009; Li et al. 2014).

Over the past decade universities and public research organizations have devoted increasing attention to the issue of Intellectual Property (IP), as part of a general trend towards a greater commitment to work with industry, market their research results and promote entrepreneurship based on science. The Academic Patents in Europe (APE-INV) coordinated the collection of micro-level data on inventors, scientists and patents. Moreover, it promoted the production of studies on a national basis across Europe. APE-INV has also promoted an operational notion of “academic patent”, centered on the identity of the inventor (the academic researcher) rather than that of the assignee (the researcher’s university). Most academic patents are assigned to commercial companies, as a result of either research sponsorship or market transactions. APE-INV has also actively promoted a debate on the economic value of academic patents, the role of IP law and university autonomy in determining the propensity of universities to invest in patents, and the relationship between the productivity of scientists and their patent-related activities.

Miguélez and Moreno (2013) document the existence of a positive correlation between cross-regional networking and innovation in Europe. Similarly, Breschi and Lissoni (2009) find that network activity across firms and locations is in large part responsible for the localization of knowledge flows. Nicholas (2009) analyses the relationship between inventors and the locations of R&D laboratories of firms, identifying a strong positive effect of distance from a laboratory where the average quality of externally available inventions is high.

Thelwal (2014) explores the relationship between geographic distance and triadic closure, which represent two important forces driving the evolution of collaboration networks. By analyzing the dynamics of inventor networks in German biotechnology sector she is able to show—both theoretically and empirically—that inventors increasingly rely on network resources, forming links with partners of partners. On the other hand, the direct impact of geographic distance on the formation of ties is decreasing: initially the triadic closure reinforces the effect of geographic distance by closing triads among proximate inventors, then it becomes an increasingly powerful vehicle to generate longer-distance collaboration ties, and so the effect of geographic proximity decreases.

Guellec and Pottelsberghe de la Potterie (2001) present three new patent-based indicators of technology internationalization that reflect international cooperation in research and the location of research facilities of multinational firms. Exploiting these indicators, they find that there is an increasing trend towards technological globalization in the OECD area and that Nordic countries have a particularly high propensity to collaborate together.

Ponds et al. (2010) investigate the effect of knowledge spillovers from academic research on regional innovation. They find that this effect is mediated not only by geographic proximity, but also by networks stemming from university-industry collaboration. They also show that this type of collaboration is not limited to a regional scale.

Patent data and international collaboration

The importance of knowledge spillovers is well known in the literature (see, for instance, Griliches 1992). Especially at the international level they can play an important role in reducing the growing technological gap between the most and the least sophisticated economies. To investigate the impact of international knowledge flows and their effects, many studies have focused on international R&D collaborations, cross-border inventions (i.e., inventions involving people coming from different countries) and their impact on innovation quality.

At the theoretical level, some authors have argued that these collaborations could induce higher-quality innovations thanks to the combination of different skills and knowledge (Levinthal and March 1993; March 1991). Other authors suggest instead that international collaborations may not be efficient due to high coordination costs and difficulties in integrating knowledge coming from different research teams (Furman et al. 2006; Grant 1996; Singh 2008).

At the empirical level, results are mixed. Branstetter et al. (2014) investigate Indian and Chinese inventors and find that cross-border inventions receive more citations, and hence are more valuable, than patents filed by inventors of only one country. Montobbio and Sterzi (2011) show that innovative activities of Latin American countries have largely benefited from R&D performed in some OECD countries. Recently, Giuliani et al. (2016) find that cross-border inventions between BRICS firms and EU actors are growing and are more valuable than domestic ones, representing an opportunity to accumulate technological capabilities. Alnuaimi et al. (2012) also find that patent quality is positively influenced by international collaborations. However, they also highlight the difficulties of research teams in absorbing external knowledge. Hence it is not clear whether such collaborations may enhance the accumulation of technological competencies in developing countries.

As for developed countries, Kim et al. (2009) investigate the international transmission of knowledge in the USA. They find that the number of US firms that collaborate with researchers having foreign experience has increased. They also show that these firms have easier access to non-US technological know-how than firms not collaborating with inventors characterized by foreign research experience.

Menon (2014) also analyses the US scenario, and in particular the effect of top inventing companies on local inventors. His aim is to check whether the patenting activity of the most inventive companies has any causal effect on other local inventors: his results show that this effect is not bounded within narrow technological categories. Developing an original approach based on SNA, our work goes in this direction.

Patent data, international collaboration and SNA

The literature that studies patent data and dynamics of innovative activities using SNA is still small yet rapidly growing, indicating that it may be a promising and insightful approach. In an early study, Balconi et al. (2002) show that both classical and advanced contributions to the economics of knowledge—as well as most of the empirical research on contributions of universities to commercial innovation—can be investigated using SNA.

Lissoni and Miguelez (2014) study the geography of innovation analyzing patents. They show how localized knowledge flows are largely mediated by labor and technology markets and how social distance among inventors, along with inter- and intra-national borders, affects the diffusion of knowledge. Lubango (2015) finds that both the links and the H-indexes of co-inventors and co-authors highly enhanced the flows of academic knowledge into industrial patents in South African firms, as well as the diffusion of knowledge in large R&D and innovation clusters. Hsueh and Wang (2009) also propose a network-based analysis of patent data, applying it to the field of Liquid Crystal Display. They aim at understanding how knowledge flows spread among institutions and countries. In particular, their approach identifies the key influential players, knowledge spillovers patterns among them and the overall knowledge spillover efficiency. Wagner and Leydesdorff (2005) hypothesize that international collaboration is a self-organizing network, exploiting SNA to show that growth of international co-authorship can be explained on the basis of the organizing principle of preferential attachment, even if the attachment mechanism deviates from an ideal power-law.

Furthermore, Chen and Guan (2016) use patent citations data and SNA to show that linkages among core and peripheral nodes are sparse, more so for linkages between peripheral nodes only. Linkages between core and peripheral nodes though are reported to be important to building innovation capabilities and technological catch-up.

Forti et al. (2013) analyze academic inventors’ neighbors to investigate the contribution of social networks to the generation of inventive ideas within universities. They show that the ego-networks of inventors are more cohesive than those of non-inventors, probably because of a stronger climate of trust. They also find that both inventors and non-inventors extend their networks and become more central over time, and find no evidence that inventors close their networks after patenting. Ejermo and Karlsson (2006) study the structure of interregional inventor networks in Sweden, showing that several factors influence the spatial affinity of regions. In particular, they show that a region’s spatial affinity extends beyond its borders if it has fewer R&D-related resources of its own, is close to the other region and is relatively small.

Similar to our work, Sternitzke et al. (2008) analyze cooperation networks between inventors and applicants—in the field of optoelectronics—emphasizing bibliometric measures and network theoretical ones. Furthermore, they investigate “boundary spanner” inventors and, in a general way, the role played by applicant position within citation networks.

Another notable work using SNA was carried out by Cammarano et al. (2017). They focus on bio-pharmaceutical companies and point to a relationship between network structure and innovation strategies.Footnote 3 In particular, their research finds a rationale behind international R&D collaborations, with SNA validating the technique for innovation studies.

Wang et al. (2014) use patent data to show how the structure of inventors’ networks affects researchers’ exploratory innovation. In particular, their study reveals a U-shaped relationship between the average degree centrality of inventors and their exploratory attitude.

Building upon Wang et al. (2014), Tseng et al. (2016) focus on semiconductor patents data to investigate the relationship between inventors’ network properties and their innovation capabilities. They show that firms that are more central in the network and have more linkages within the network are more likely to generate new knowledge.

Data, research questions and methodology

In this section, we present our approach, along with its support data structures and metrics, showing how it can address several questions related to patents and inventors and how it can ultimately provide an understanding of patterns about the geography of innovation.

The EP-INV database is an output of the EU-funded project Knowledge-Based Entrepreneurship: Innovation, Networks and Systems (KEINS; Id: 506022 under the FP6-2002-CITIZENS-2 call), led by the Invernizzi Center for Research on Innovation, Organization, Strategy and Entrepreneurship at Bocconi University (ICRIOS). Produced by ICRIOS, which implemented the so-called Massacrator algorithm (Pezzoni et al. 2014), EP-INV contains all EPO applications, reclassified by applicant and inventor. It was combined with three lists of university professors of all ranks (from assistant to full professors), one for each country (PROFLISTs). Academic inventors were identified by matching the names and surnames of inventors in the EP-INV database with those in the PROFLISTs, and by checking the identity of the matches by e-mail and phone in order to exclude homonyms. This database has been widely used worldwide. The methodology used to build the database on academic inventors from France, Italy, and Sweden (1978–2004), which was delivered to the European Commission as part of the KEINS project and which will provide the basis for future publications, is described in Lissoni et al. (2006a, b), which also provide an overview of the database contents, as well as information on access rules and on related datasets by CESPRI-Università Bocconi. The database is the result of joint efforts by CESPRI-Bocconi University (IT) and other European universities. Following Raffo and Lhuillery (2009), Pezzoni et al. describe disambiguation as a three-step process: cleaning and parsing, matching, and filtering. Based on Monte Carlo simulations and sensitivity analysis, they show how various filtering criteria can be manipulated in order to obtain optimal combinations of precision and recall. They also show how these different combinations generate different results for applications to studies on inventors’ productivity, mobility, and networking, and discuss quality issues related to linguistics. Some filtering criteria are sensitive to data quality, while those based upon co-inventorship networks are always effective. Details on data access and data quality improvement via feedback collection are also discussed. Lissoni et al. (2008) describe networks of inventors and the position of academic inventors in France; in particular, they build upon Balconi’s methodology in order to map the networks of inventors in France. They also exploit the French section of the KEINS database, complemented with the novel dataset on CNRS inventors collected following the KEINS methodology.
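The three-step disambiguation process mentioned above (cleaning and parsing, matching, filtering) can be illustrated with a deliberately small sketch. It is not the Massacrator algorithm: the record fields (`id`, `name`, `city`, `coinventors`), the exact-match rule and the filtering rule (same city or a shared co-inventor) are all assumptions made for this example.

```python
import unicodedata
from itertools import combinations

def clean(name):
    """Step 1 (cleaning and parsing): drop accents, case and punctuation,
    then sort the name tokens so that their order does not matter."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    letters = "".join(ch if ch.isalpha() else " " for ch in ascii_name.lower())
    return " ".join(sorted(letters.split()))

def match(records):
    """Step 2 (matching): candidate pairs are records with equal cleaned names."""
    return [(a, b) for a, b in combinations(records, 2)
            if clean(a["name"]) == clean(b["name"])]

def filter_pairs(pairs):
    """Step 3 (filtering): keep a pair only if extra evidence supports it.
    Here the (hypothetical) evidence is a common city or a shared co-inventor."""
    kept = []
    for a, b in pairs:
        same_city = a["city"] == b["city"]
        shared = set(a["coinventors"]) & set(b["coinventors"])
        if same_city or shared:
            kept.append((a["id"], b["id"]))
    return kept

# Made-up records, for illustration only
records = [
    {"id": 1, "name": "Müller, Hans", "city": "Berlin", "coinventors": ["K. Weber"]},
    {"id": 2, "name": "Hans Muller", "city": "Munich", "coinventors": ["K. Weber"]},
    {"id": 3, "name": "Hans Muller", "city": "Hamburg", "coinventors": []},
]
same_person = filter_pairs(match(records))
```

In this toy case all three records share a cleaned name, but only records 1 and 2 pass the filter, because they share a co-inventor. Tightening or loosening the filtering rule trades precision against recall.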

Patents have been one of the main topics investigated in several fields,Footnote 4 as they provide a wealth of useful information on the dynamics of innovative processes, on the state of the art of Research & Development (R&D) in different sectors and on its main actors, i.e. researchers and inventors (Balconi et al. 2002; Breschi and Lissoni 2009; Guellec and Pottelsberghe de la Potterie 2001; Hingley and Bas 2009; Hsueh and Wang 2009; Singh 2008).Footnote 5 As a matter of fact, patent submission is usually the first public claim of a new invention that could become an innovation (Garcia and Calantone 2002).

The data used in this research come from the PATSTAT-ICRIOS database. Hence, we have access to a large amount of information: the priority, the number of applications and publications and the related dates, the title and abstract, the designated states for protection, the status of application, the main and secondary International Patent Classification (IPC) codes, the applicants’ and inventors’ names and addresses, and the references (citations) to prior art patents and to non-patent literature.Footnote 6 Like Lissoni (2012), we use data regarding inventors rather than applicants.

Thanks to the wealth of details about patents and inventors present in our dataset we can provide a first, intuitive answer to several questions:

  (a) What is the degree of cooperation among inventors coming from different countries?

  (b) Does it evolve over time?

  (c) Are there groups of inventors that cooperate intensively with each other?

  (d) Do countries maintain research collaborations with specific other countries?

  (e) Are some inventors more “important” than others?

  (f) What is the distribution of patents across inventors and IPC classes?

To address these issues we consider a social network whose nodes and edges represent inventors and co-inventorship relations, respectively. We use the universe of data, including all inventors filing for patent applications between 1980 and 2010 in all technology fields represented by the 2-digit technology classification according to WIPO (2011).Footnote 7 On top of that we define other support data structures and accompanying metrics, both establishing new ones and adapting existing ones to make them suitable for this type of analysis.
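As an illustration of the network just described, the following minimal sketch builds a co-inventorship graph from a mapping of patents to inventor lists. The input format, the function name `build_coinventor_network` and the choice to weight each edge by the number of jointly filed patents are assumptions made for this example, not a description of the data structures used in the paper.

```python
from collections import defaultdict
from itertools import combinations

def build_coinventor_network(patents):
    """Build an undirected co-inventorship network.

    `patents` maps a patent id to the list of its inventor ids
    (a hypothetical input format chosen for this sketch).
    Returns the node set and a dict {(a, b): number of joint patents}.
    """
    nodes = set()
    edge_weights = defaultdict(int)
    for inventors in patents.values():
        team = sorted(set(inventors))      # one entry per inventor, stable order
        nodes.update(team)
        for a, b in combinations(team, 2): # every pair of co-inventors
            edge_weights[(a, b)] += 1
    return nodes, dict(edge_weights)

# Made-up sample data, for illustration only
patents = {
    "EP1": ["alice", "bob"],
    "EP2": ["alice", "bob", "carol"],
    "EP3": ["dave"],
}
nodes, edges = build_coinventor_network(patents)
```

In this toy example `dave` is an isolated node, since a single-inventor patent creates no co-inventorship edge, while `alice` and `bob` are linked by an edge of weight 2.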

For instance, among those structures we introduce and characterize the concept of “power inventor”, which aims at capturing those inventors that display a high impact in terms of both number of patents and number of collaborations—including at the international level—and hence have the potential to favor further developments and improvements of R&D activities within their countries. As another example we define a new metric to quantitatively understand how much inventors tend to cooperate—intensively—within compact groups.

In the rest of the paper we introduce several metrics and additional data structures in the additional material (more formally characterized in Appendix A), addressing different questions about inventors and patents and coming up with a series of statistical facts about knowledge patterns among the countries in our dataset.

Distribution of patents and inventors’ network

Here we provide a characterization of patents and inventors present in our dataset, showing how patents are distributed among inventors for a large number of countries and how the inventors’ network has evolved over time.

Distribution of patents among inventors

Given a node \(n_{i}\) in a social network G where the nodes correspond to inventors,Footnote 8 we define the metric \(M_{1}\) such that \(M_{1i}\) denotes how many patents were filed by inventor \(n_{i}\); this metric coincides with the classical weighted degree centrality (Hanneman and Riddle 2005). We measure \(M_{1}\) for all countries in the European Union, all Mediterranean and North African countries, BRICS (Brazil, Russia, India, China, South Africa), South Korea, Japan, Vietnam and Taiwan.
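A minimal sketch of how \(M_{1}\) could be computed is given below. It simply counts, for each inventor, the patents on which that inventor appears; the input format, the function name `patents_per_inventor` and the sample data are assumptions made for illustration.

```python
from collections import Counter

def patents_per_inventor(patents):
    """Return M1: the number of patents filed by each inventor.

    `patents` maps a patent id to its list of inventor ids
    (a hypothetical input format chosen for this sketch).
    """
    m1 = Counter()
    for inventors in patents.values():
        # set() guards against an inventor listed twice on one patent
        m1.update(set(inventors))
    return m1

# Made-up sample data, for illustration only
patents = {
    "EP1": ["alice", "bob"],
    "EP2": ["alice", "bob", "carol"],
    "EP3": ["alice"],
}
m1 = patents_per_inventor(patents)

# Frequency table: how many inventors filed exactly v patents
distribution = Counter(m1.values())
```

The frequency table `distribution` is the quantity whose shape is examined when asking whether \(M_{1}\) follows a power law.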

For most of these countries \(M_{1}\) follows a power law distribution, implying that there is a low number of inventors filing a high number of patents and a large number of inventors filing a very low number of patents. Some other countries (i.e., Greece, Croatia, the Principality of Monaco, Slovenia, Turkey and the BRICS) show a slightly disturbed power law distribution, as the highest number of inventors and filed patents is quite low. Other countries (i.e. Egypt, Lebanon, Malta, Morocco and Tunisia) present a disturbed power law distribution. Again, this is due to the low number of inventors and filed patents in the corresponding countries. Finally, for some countries (i.e. Albania, Algeria, Libya and Montenegro) the distribution of \(M_{1}\) is totally different from a power law. In some cases, even a linear distribution can be observed, as in the case of Algeria.Footnote 9

Figure 1 displays the distribution of \(M_{1}\) for France, Greece, Egypt and Algeria, each one representing the distribution of the different aforementioned categories of countries.

Fig. 1 Distribution of \(M_{1}\) for selected countries

Evolution of inventors’ networks over time

With the support of the social network \(G_{k}\),Footnote 10 we now present the temporal evolution of the number of nodes \(\left| {N_{k} } \right|\), where \(N_{k}\) is the set of inventors of a given country \(k\); the number of edges \(\left| {E_{k} } \right|\), where \(E_{k}\) is the set of collaborations established by inventors in country \(k\); and the densityFootnote 11 \(D_{k}\).

Generally, all countries under consideration present a growing number of nodes and edges over the years. Some countries (e.g. those of BRICS, South Korea, Taiwan, Turkey and Israel) present an exponential growth. The majority of EU countries instead show an increasing linear trend, although some of them, together with the North African countries, show a growing but irregular trend over time.

As an example, Fig. 2 displays the temporal evolution of these variables for China and for Italy. In the case of China, notice that both nodes and edges start growing exponentially in 2002, whereas in the case of Italy both nodes and edges show an increasing trend until 2008 and have remained substantially constant thereafter.

Fig. 2 Trends of \(\left| {N_{k} } \right|\), \(\left| {E_{k} } \right|\) and \(D_{k}\) for selected countries (China and Italy)

As for density \(D_{k}\), it generally decreases, since for a given number of edges the density of a network is approximately inversely proportional to the square of the number of nodes. To keep density constant over time, the number of edges would have to grow proportionally to the square of the number of nodes, which is unrealistic in real scenarios.

In Fig. 2 we also show the trend of \(D_{k}\) for China and Italy. Observe that the decrease is more pronounced in China than in Italy: this is due to the fact that the increase of the number of nodes is exponential for China and linear for Italy.
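
The argument about density can be illustrated with a short sketch. The paper's exact definition of \(D_{k}\) is given in Footnote 11 and is not reproduced here; the code below assumes the standard density of a simple undirected graph, \(2|E|/(|N|(|N|-1))\), and the growth path in the example (three edges per node) is purely hypothetical.

```python
def density(num_nodes, num_edges):
    """Density of a simple undirected graph: 2|E| / (|N| * (|N| - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2.0 * num_edges / (num_nodes * (num_nodes - 1))

# Hypothetical growth path: nodes double at each step while the number
# of edges grows only linearly (three edges per node).
for n in (100, 200, 400, 800):
    print(n, density(n, 3 * n))
```

Under these assumptions density roughly halves each time the number of nodes doubles, which is the qualitative pattern described for \(D_{k}\).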

Cooperation patterns among inventors

We now address the issue of cooperation among inventors. As a first step in our investigation, we define a new metric \(M_{2}\), such that \(M_{2i}\) indicates the dimension of the neighborhood of \(n_{i}\), i.e. the number of inventors directly connected to \(n_{i}\) by a co-inventorship relation.

This is useful to understand how much the inventors of a given country \(k\) tend to cooperate for filing patents. We measure \(M_{2}\) for all Mediterranean and North African countries, the BRICS and some EU member states. For most of them it follows a quite disturbed power law distribution.
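
A minimal sketch of this computation follows, assuming that two inventors are linked whenever they appear on the same patent and that the neighborhood dimension counts distinct co-inventors. The record layout and the names `coinventor_graph` and `neighborhood_sizes` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def coinventor_graph(records):
    """Adjacency sets: two inventors are linked if they share a patent."""
    adj = defaultdict(set)
    for _patent_id, inventors in records:
        team = set(inventors)
        for inv in team:
            adj[inv]  # touch the key so solo inventors appear as isolated nodes
        for a, b in combinations(team, 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

def neighborhood_sizes(adj):
    """M2 as a dict: inventor id -> number of distinct co-inventors."""
    return {inv: len(nbrs) for inv, nbrs in adj.items()}

# Hypothetical toy data.
records = [("p1", ["ann", "bob"]), ("p2", ["ann", "cho"]), ("p3", ["dee"])]
m2 = neighborhood_sizes(coinventor_graph(records))
print(m2)
```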

For instance, in Fig. 3 we show the distribution of \(M_{2}\) for Brazil, which presents a peak for values of \(M_{2}\) between 0 and 10. The only country presenting a perfect power law distribution is Austria, as shown in the same figure.

Fig. 3 Distribution of \(M_{2}\) for Brazil and Austria

Cliques of inventors

To understand the dynamics of inventorship relations, and hence how cooperation occurs among inventors, we resort to the concept of clique. In SNA, a clique is a sub-graph in which every node is adjacent to all the others. In our scenario it indicates the presence of a compact group of inventors who cooperate intensively with each other. We compute the distribution of the cliques’ dimension for the social network \(G_{k}\).

Since in a clique all nodes are connected to each other, its dimension can be considered a valid metric to quantify the tendency of inventors in a given country to form larger or smaller working groups. The general trend we find is a power law distribution, with the dimension of the largest clique differing from country to country.
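
The distribution of clique dimensions can be sketched as follows. The text does not state which enumeration procedure is used, nor whether only maximal cliques are counted; the code below assumes maximal cliques and uses the Bron-Kerbosch algorithm with pivoting on a graph stored as adjacency sets without self-loops. The toy graph is hypothetical.

```python
from collections import Counter

def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the node with most neighbours in p to reduce branching.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

def clique_size_distribution(adj):
    """Map clique dimension -> number of maximal cliques of that dimension."""
    return Counter(len(c) for c in maximal_cliques(adj))

# Hypothetical toy graph: a triangle (a, b, c), a pendant edge (c, d)
# and an isolated inventor e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
sizes = clique_size_distribution(adj)
print(sorted(sizes.items()))
```

Note that, under this assumption, an isolated inventor counts as a maximal clique of dimension 1. Enumerating maximal cliques is expensive in the worst case, so on networks with hundreds of thousands of nodes a per-component or per-country run would likely be needed.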

For instance, in Fig. 4 we report the distribution of cliques for Japan and the United Kingdom. In both cases we can observe a power law distribution, with a different maximum clique dimension: although Japan has far more nodes than the UK, the dimension of its largest clique is lower than that of the UK. A case of particular interest is Israel (Fig. 4), which presents a maximum clique dimension equal to 7. This is a very high value considering that the number of Israeli inventors (30,358) is much lower than that of Japan (924,554) and the UK (231,128).

Fig. 4 Distribution of clique size for selected countries

To capture and quantify this last observation we define a parameter that indicates how much the inventors of a country \(k\) are aggregated in cliques. The previous result suggests that the difficulty of forming larger and larger cliques grows exponentially. Hence, to define a corresponding index, we deem it appropriate to consider only the cliques of maximum, sub-maximum and sub–sub-maximum dimension, and to assign an exponentially decreasing weight to these cliques.

Specifically, the aggregation index \(Agg_{k}\) of inventors on cliques for country \(k\) is defined as:

$$Agg_{k} = \frac{1}{{\left| {N_{k} } \right|}} \cdot \mathop \sum \limits_{l = 0}^{2} 2^{{\left( {\left| {C_{k} } \right| - l} \right)}} \cdot \nu_{{\left( {\left| {C_{k} } \right| - l} \right)}}$$

where \(C_{k}\) indicates the largest clique of country \(k\), \(|C_{k} |\) denotes its dimension and \(\nu_{x}\) is the number of cliques having dimension \(x\).
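
The formula above translates directly into code. In the sketch below the function name `aggregation_index`, the input format (a mapping from clique dimension to number of cliques) and the numbers in the example are illustrative assumptions and are not taken from Table 1.

```python
def aggregation_index(num_inventors, clique_counts):
    """Agg_k = (1 / |N_k|) * sum_{l=0..2} 2^(|C_k| - l) * nu_(|C_k| - l).

    clique_counts maps a clique dimension x to nu_x, the number of
    cliques of that dimension.
    """
    sizes = [size for size, count in clique_counts.items() if count > 0]
    if num_inventors <= 0 or not sizes:
        return 0.0
    largest = max(sizes)  # |C_k|
    total = 0
    for l in range(3):  # maximum, sub-maximum, sub-sub-maximum
        size = largest - l
        if size < 1:
            break
        total += (2 ** size) * clique_counts.get(size, 0)
    return total / num_inventors

# Hypothetical country: 1,000 inventors, largest clique of dimension 5.
example = aggregation_index(1000, {5: 2, 4: 10, 3: 40, 2: 300})
print(example)
```

In this example only the cliques of dimension 5, 4 and 3 contribute, giving \((32 \cdot 2 + 16 \cdot 10 + 8 \cdot 40)/1000\); the 300 cliques of dimension 2 are ignored by construction.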

In Table 1, we report the dimensions of the maximum, sub-maximum and sub–sub-maximum cliques for several (selected) countries, along with the corresponding aggregation index. We have chosen to split the table into three parts, grouping countries that share a similar number of inventors: the first includes countries with 35,000 inventors, the second those having a number of inventors between 40,000 and 65,000, and the third countries having more than 110,000 inventors.

Table 1 Values of \(Agg_{k}\) for several countries

In the first group of countries, Taiwan has the highest Aggregation Index, since, despite displaying a number of nodes quite low compared to the other countries in the same group, it has a high number of cliques of maximum, sub-maximum and sub–sub-maximum dimension. In the same group, India has the lowest value of \(Agg_{k}\): notwithstanding the high number of cliques of sub-maximum dimension, there is only one clique of maximum dimension and no clique of sub–sub-maximum dimension.

Analogous reasoning can be made for the second group of countries, where the highest value of \(Agg_{k}\) belongs to Spain and the lowest to Belgium. As for the third group reported in Table 1, the values of \(Agg_{k}\) are generally small. The highest value is observed for South Korea, which, in spite of having fewer inventors than the UK and Japan, displays a maximum clique dimension equal to that of the UK and even higher than that of Japan.

Foreign collaborations

We are now interested in the international dimension of inventorship relationships. We start by looking at the level of cooperation with foreign colleagues for inventors of a given country \(k\).

For this purpose, we define a metric \(M_{3}\) such that \(M_{3k}\) represents the average fraction of foreign collaborations established by inventors of \(k\). \(M_{3}\) ranges between 0 and 1 and allows us to understand how much the inventors of \(k\) tend to work with foreign colleagues: the higher \(M_{3}\), the higher the tendency to establish scientific collaborations abroad.
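
One possible reading of this definition is sketched below. It assumes that a collaboration is a link to a distinct co-inventor, that the fraction is computed per inventor and then averaged over the inventors of \(k\), and that inventors without any collaborator are left out of the average. These choices, together with the names `foreign_collaboration_share` and `country_of`, are assumptions made for illustration.

```python
def foreign_collaboration_share(adj, country_of, k):
    """Average, over inventors of country k, of the share of foreign co-inventors."""
    fractions = []
    for inv, nbrs in adj.items():
        if country_of[inv] != k or not nbrs:
            continue  # skip other countries and inventors with no collaborator
        foreign = sum(1 for n in nbrs if country_of[n] != k)
        fractions.append(foreign / len(nbrs))
    return sum(fractions) / len(fractions) if fractions else 0.0

# Hypothetical toy network with two countries.
adj = {"a": {"b", "x"}, "b": {"a"}, "x": {"a", "y"}, "y": {"x"}}
country_of = {"a": "IT", "b": "IT", "x": "FR", "y": "FR"}
m3_it = foreign_collaboration_share(adj, country_of, "IT")
print(m3_it)
```

Here inventor `a` has one foreign co-inventor out of two and inventor `b` has none, so the value for the hypothetical country `IT` is the average of 0.5 and 0.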

We compute \(M_{3}\) over a large period of time for several countries to understand their degree of internationalization and how it has evolved over time. We consider some EU member states, some countries of BRICS, some North African countries, as well as South Korea and Taiwan.

Findings are heterogeneous. Some countries—such as Spain, Taiwan, South Korea, China and Brazil—present a generally decreasing trend. Others—such as Austria, Italy and South Africa—display an increasing trend for this measure. There are also countries that show quite an irregular trend for \(M_{3}\), characterized by the presence of peaks and decays over time.

In Fig. 5 we report the trend of \(M_{3}\) for South Korea, Austria and Romania. We can observe that, while South Korean inventors have cooperated more and more with each other within South Korea over the last years, Austrian inventors have increased their number of foreign collaborations, becoming more internationalized. In the case of Romania, no clear trend emerges.

Fig. 5 Temporal evolution of \(M_{3}\) for selected countries

After the characterization of inventors’ tendency to cooperate with foreign colleagues, we now aim at understanding which countries a given country mostly cooperates with, as far as patents are concerned.

Given a country \(k\), for each node \(n_{i} \in N_{k}\), we consider the set \(nbh_{i}^{F}\), which is the set of direct neighbours of \(n_{i}\) that belong to a country other than that of \(n_{i}\). We then compute, for all Mediterranean and North African countries, the BRICS and some EU member states, the distribution of the countries to which the nodes in these neighborhoods belong. In particular, we focus our attention on some former colonies. In Fig. 6, we show the results obtained for some former French colonies (Algeria, Morocco and Tunisia) and for a former British colony (Egypt).

Fig. 6 Distribution of foreign collaborations for Algerian, Moroccan, Tunisian and Egyptian inventors

As one would expect, a large fraction of inventors in Algeria, Morocco and Tunisia mainly cooperate with French inventors. However, for both Morocco and Tunisia, there is a notable degree of cooperation with Germany as well; Algeria, on the other hand, does not show significant inventorship relations with any other country. Looking at Egypt, which is a former British colony, we do not observe such a high degree of cooperation with the United Kingdom. In fact, the highest numbers of collaborations are found with inventors of the United States, Germany and France. Interestingly, there is also a notable number of collaborations with inventors from Saudi Arabia.Footnote 12

In Fig. 7, we show the results obtained for Israel, Austria, Slovenia and Taiwan. In the case of Israel, we can observe strong cooperation with US inventors and with German ones as well. Germany is also an important source of scientific collaborations for Austria and Slovenia.

Fig. 7 Distribution of foreign collaborations for selected countries

In addition, for Slovenian inventors we can observe a high concentration of links with Austrian ones (evidence of the very strong links between Slovenia and German-speaking countries dating back to the Austro-Hungarian Empire period) and, to a lesser extent, with Croatian inventors. The same figure also displays the foreign collaborations of Taiwanese inventors. Somewhat unexpectedly, given the controversial political relations between the two countries, Taiwan has a high degree of scientific cooperation with China.

To gain additional insights into the international collaborations existing among countries, we compute the variety of the countries with which the inventors of a country \(k\) cooperate. Drawing on the measure of biodiversity introduced by Simpson (1949), we build an indicator of the internationalization level of inventor teams, which we use to conduct an explorative empirical analysis of the trends and features of research groups’ internationalization by using cross-patent data from the Worldwide Patent Statistical Database.

The same index is also used to measure the size of firms in relation to their industry and the degree of competition among them; in that setting it is known as the Herfindahl Index (HI). In our context it tells us whether the inventors of a given country privilege collaborations with inventors of one or a few foreign countries. The higher the \({\text{HI}}_{k}\) of a country \(k\), the more concentrated the external collaborations of \(k\).
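
The exact formula used for \({\text{HI}}_{k}\) is not reproduced in this section. As a minimal sketch, the code below assumes the standard form shared by the Simpson and Herfindahl indices, i.e. the sum of squared shares, applied to the number of collaborations that inventors of \(k\) have with each foreign country. The function name and the collaboration counts are hypothetical.

```python
def herfindahl_index(collab_counts):
    """Sum of squared shares of collaborations across partner countries."""
    total = sum(collab_counts.values())
    if total == 0:
        return 0.0
    return sum((count / total) ** 2 for count in collab_counts.values())

# Hypothetical collaboration counts by partner country.
concentrated = herfindahl_index({"FR": 90, "DE": 5, "US": 5})
dispersed = herfindahl_index({"US": 25, "DE": 25, "FR": 25, "SA": 25})
print(concentrated, dispersed)
```

With these numbers the first country, whose collaborations are dominated by a single partner, obtains a value close to 1, while the second, with four equally important partners, obtains 0.25.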

Table 2 reports the value of the Herfindahl Index (HI) for several countries. Among the North African countries, the highest value of \({\text{HI}}\) is obtained by Algeria. This result is due to the fact that Algerian inventors cooperate mainly with inventors of one country, France. An analogous observation can be drawn for Tunisia and Morocco.

Table 2 Values of \({\text{HI}}\), \({\text{HI}}\) Top 80% and \({\text{HI}}^{*}\) Top 80% for North African countries

Unlike these three countries, Egypt has a much lower value of \({\text{HI}}\), because the collaborations of Egyptian inventors are more distributed among several countries. Analogous reasoning applies to EU countries, where the highest value of \({\text{HI}}\) is obtained for Austria.

As for the BRICS, the value of \({\text{HI}}\) is generally small, since Brazil, South Africa, China and India tend to cooperate with several countries. High values of \({\text{HI}}\) can be observed also for Taiwan, Israel and Turkey.

However, it should be noted that there is one caveat that applies to the HI as we defined it: its value could be strongly distorted by the presence of a large number of extemporaneous collaborations between an inventor of country \(k\) and a foreign one who cooperated for one or two patents only.

Considering the power law trend, usual for the measures in our reference scenario, we use a modified version of the HI, limiting the countries under consideration to the top 80%. In this way the main cause of the distortion, namely the tail of the power law distribution, is removed. The results obtained with this new measure of the HI are reported in the third column of Table 2.
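
The text does not spell out how the top 80% is selected, so the following sketch is only one possible reading. It assumes that partner countries are ranked by number of collaborations, that the smallest set of top-ranked countries covering at least 80% of all collaborations is kept, and that shares are recomputed on the kept countries only. A different reading (for instance, keeping 80% of the partner countries) would give different values.

```python
def herfindahl_top_share(collab_counts, share=0.8):
    """HI restricted to the top partner countries covering `share` of collaborations."""
    total = sum(collab_counts.values())
    if total == 0:
        return 0.0
    kept, running = [], 0
    for count in sorted(collab_counts.values(), reverse=True):
        if running >= share * total:
            break  # the remaining countries form the discarded tail
        kept.append(count)
        running += count
    kept_total = sum(kept)
    return sum((count / kept_total) ** 2 for count in kept)

# Hypothetical counts: two main partners and a tail of occasional ones.
counts = {"FR": 60, "DE": 25, "US": 5, "IT": 4, "ES": 3, "UK": 3}
hi_top = herfindahl_top_share(counts)
print(hi_top)
```

In this example only the two largest partners are kept (together they cover 85 of 100 collaborations), so the tail of occasional collaborations no longer affects the index.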

Power inventors

Before characterizing the notion of “power inventor”, we define another metric, namely \(M_{4}\), which is analogous to \(M_{2}\) with the difference that it considers only the neighborhood of foreign inventors rather than the whole neighborhood,Footnote 13 since we are now interested in collaborations with foreign countries.

We measure \(M_{4}\) for Euro-Mediterranean and North African countries, and the BRICS. For many of them \(M_{4}\) follows a power law distribution, even if some, such as Greece, Turkey and the Principality of Monaco, present a slightly disturbed power law distribution.

For other countries, such as Morocco, Slovenia, Tunisia, Croatia, Cyprus and Egypt, \(M_{4}\) follows a more disturbed power law distribution.

Finally, for other countries such as Syria, Albania, Algeria, Bosnia-Herzegovina, Lebanon, Libya and Malta, \(M_{4}\) does not follow a power law distribution.Footnote 14 Figure 8 displays all the different cases. Hence, \(M_{1}\), \(M_{2}\) and \(M_{4}\) generally follow a (potentially disturbed) power law distribution.

Fig. 8 Distribution of \(M_{4}\) for selected countries

We are now ready to introduce the concept of “power inventor”, which indicates an inventor with the following characteristics:

  • \(C1\): filing of many patents;

  • \(C2\): many collaborations;

  • \(C3\): having an international stature, which implies that she cooperates a lot with inventors of foreign countries.

To evaluate these conditions, we can use, respectively, \(M_{1}\), \(M_{2}\) and \(M_{4}\).Footnote 15

Then, the set \(P_{k}^{X}\) of power inventors of a country \(k\) is the set of those inventors that simultaneously belong to the top \(X\%\) of the distributions of \(M_{1}\), \(M_{2}\) and \(M_{4}\) in country \(k\).

Since available data are huge, and since the power law distributions characterizing \(M_{1}\), \(M_{2}\) and \(M_{4}\) are generally steep, we choose a low value for \(X\) and we set \(X = 5\).Footnote 16
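
The selection of power inventors can be sketched as an intersection of three top-\(X\%\) sets. The code assumes that the three metrics are available as dictionaries keyed by inventor identifier and that "top \(X\%\)" means the \(\lceil X\% \cdot n \rceil\) highest-ranked inventors of the country; the handling of ties at the cut-off and the toy data are further assumptions of this sketch.

```python
import math

def top_percent(metric, x):
    """Inventors ranked in the top x% of a metric (at least one inventor)."""
    if not metric:
        return set()
    n_top = max(1, math.ceil(len(metric) * x / 100))
    # Ties at the cut-off are broken by insertion order (sorted is stable).
    ranked = sorted(metric, key=metric.get, reverse=True)
    return set(ranked[:n_top])

def power_inventors(m1, m2, m4, x=5):
    """Inventors simultaneously in the top x% of M1, M2 and M4."""
    return top_percent(m1, x) & top_percent(m2, x) & top_percent(m4, x)

# Hypothetical country with 40 inventors, so the top 5% contains 2 of them.
ids = [f"i{j}" for j in range(40)]
m1 = {inv: j for j, inv in enumerate(ids)}
m2 = dict(m1)
m4 = dict(m1)
m4["i37"] = 100  # i37 is very international but not in the top 5% for M1, M2
print(power_inventors(m1, m2, m4))
```

In the toy data only `i39` is in the top 5% of all three metrics, which shows how requiring the three conditions at once makes the set of power inventors smaller than any single top-\(X\%\) set.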

With regard to this definition of power inventor we would like to point out that our aim is not proposing a new concept characterized by a precise supporting mathematical foundation.

Instead, we would like to introduce an informal and empirical, yet reasonable, concept that can capture the presence of inventors more important than others along several dimensions and that can support the extraction of knowledge patterns about this phenomenon.

The notion is similar to the one of “star scientist” employed in several studies that analyze the dynamics underlying the generation of new ideas and the value that some elite scientists and researchers bring to these processes, as well as their impact on other researchers.Footnote 17

The idea behind the concept of power inventor is that of an inventor that has a notable scientific productivity (demonstrated through the filing of many patents) that can catalyze the growth of innovative activities and help the internationalization process of his own country.

Hence, all the aforementioned criteria are necessary in order to be considered a power inventor. If an inventor filed very few patents, even if all of them were in cooperation with foreign inventors, he would not have enough weight to influence the R&D scenario of his country.

If an inventor filed many patents, but all of them were in cooperation with few inventors, she would not manage to stimulate through collaborations other inventors to file patents.

Finally, if an inventor filed many patents, but all of them were in cooperation with inventors of his country, he would certainly play a prominent role in his country but he would not have the capability to stimulate contacts with foreign countries.

Characteristics of power inventors’ networks

After having established what power inventors are, we are now interested in understanding the features of the networks they operate within.

We construct another support data structure, the power inventor’s social network, and two related metrics \(rAgg_{k}\) and \(rf_{k}\), which are useful in that they allow us to understand whether power inventors tend to cooperate more or less intensively among each other compared to other inventors in the same country, i.e. whether there is a backbone of power inventors or not in a given country.Footnote 18 Values for the two metrics are reported in Table 3.

Table 3 Values of \(rAgg_{k}\) and \(rf_{k}\) for selected countries

Looking at Table 3 we can see that, for countries like Austria and Italy, power inventors are much more aggregated in cliques than all other inventors—they display the highest value of \(rAgg_{k}\). As for \(rf_{k}\), the values are quite homogeneous across countries and range between 3 and 7.

Since both \(rAgg_{k}\) and \(rf_{k}\) present values higher than 1, we can conclude that there exists a backbone of power inventors in each of these countries.

To expand our inquiry into this issue, we consider a new data structure, the clique social network.Footnote 19 In Fig. 9, we plot the clique social network of Spain and Israel. In the case of Spain, the network is characterized by a particularly dense core, implying that in this country there is a group of particularly active inventors who often cooperate with each other. In the case of Israel there are instead several nodes whose dimension is generally smaller than that of the nodes belonging to the core of Spain.

Fig. 9 Visualization of the clique social network (left) and a zoomed portion of it (right) for two countries

In Table 4 we report some characteristics of clique social networks, namely the number of nodes and edges, and density.

Table 4 Number of nodes, number of edges and density of the clique social networks of some countries

Looking at it, we can see that, for instance, France and Italy have the lowest values for density. The reason is that both these countries have many nodes that are only partially connected to each other. By contrast, Brazil has a very low number of nodes and edges but, at the same time, the highest density among the countries under consideration, implying that Brazilian power inventors are strongly connected to each other.

The impact of power inventors

Underlying the notion of power inventor is the idea that these inventors have the ability to stimulate further inventions and also improve the scientific production of their colleagues. To quantitatively assess this intuition, we compute two different measures of scientific activity, namely the average number of patents filed by the neighbors of power inventors of a country \(k\), denoted with \(AvgPatNumNbh_{k}^{P}\), and the average number of patents filed by neighbors of generic inventors in the same country, denoted with \(AvgPatNumNbh_{k}\). The values of these metrics are reported in Table 5.

Table 5 Average number of patents of the neighbours of a power inventor, of a generic inventor and values of the parameter \(rPatNumNbh\)

A comparison between the two sets of values can support us in quantifying the benefit induced by cooperating with a power inventor, since high values of \(AvgPatNumNbh_{k}^{\rm P}\) relative to \(AvgPatNumNbh_{k}\) indicate that being in the neighbourhood of a power inventor stimulates patent filings.

From the Table it is evident, as we would expect, that being in the neighborhood of a power inventor is associated with an increase of filed patents. To sharpen our understanding we also define the following ratio:

$$rPatNumNbh_{k} = \frac{{AvgPatNumNbh_{k}^{\rm P} }}{{AvgPatNumNbh_{k} }}$$

A value of \(rPatNumNbh_{k}\) higher than 1 indicates that belonging to the neighbourhood of a power inventor is beneficial for filing patents. These values are reported in Table 5, and we can observe that values for this ratio are greater than 1 for all countries, in some cases even greater than 3—Brazil and Spain. Thus, we can conclude that cooperating with power inventors is beneficial for other inventors.
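
A minimal sketch of this ratio is given below. It assumes that the averages are taken over the distinct neighbours of the inventors considered, that "generic inventors" means all inventors of the country, and that patent counts are given by \(M_{1}\). The function names and the toy network are hypothetical.

```python
def avg_neighbor_patents(adj, m1, inventors):
    """Mean number of patents (M1) over the distinct neighbours of `inventors`."""
    neighbours = set()
    for inv in inventors:
        neighbours |= adj.get(inv, set())
    if not neighbours:
        return 0.0
    return sum(m1[n] for n in neighbours) / len(neighbours)

def r_pat_num_nbh(adj, m1, power):
    """Ratio of the two averages; NaN when the denominator is zero."""
    avg_power = avg_neighbor_patents(adj, m1, power)
    avg_all = avg_neighbor_patents(adj, m1, adj.keys())
    return avg_power / avg_all if avg_all else float("nan")

# Hypothetical network: p is a power inventor working with a and b.
adj = {
    "p": {"a", "b"}, "a": {"p"}, "b": {"p"},
    "c": {"d"}, "d": {"c"}, "e": {"f"}, "f": {"e"},
}
m1 = {"p": 10, "a": 6, "b": 6, "c": 1, "d": 1, "e": 1, "f": 1}
ratio = r_pat_num_nbh(adj, m1, {"p"})
print(ratio)
```

As in the main text, a value above 1 only describes an association between proximity to a power inventor and patent counts; the sketch does not establish a causal effect.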

We now focus on the dimension of the neighborhood of power inventors. We already know, by definition, that a power inventor has a lot of scientific collaborations, but we would like to quantify their magnitude. We exploit the metrics \(AvgDimNbh_{k}^{\rm P}\), which is the average dimension of the neighbourhood of a power inventor, and \(AvgDimNbh_{k}\), which is the average dimension of the neighbourhood of a generic inventor. A comparison between these two measures can tell us about the importance and centrality of a power inventor. We compute their variation over time for the same countries as in Table 3. Figure 10 displays, as an example, the time series of \(AvgDimNbh_{k}^{\rm P}\) and \(AvgDimNbh_{k}\) for Spain. Note that the former measure remained almost constant from 1990 to 2003 and then increased, whereas the latter shows an increasing trend.

Fig. 10 Temporal evolution of the average neighbourhood dimension for Spain

Finally, we also report the values of the following ratio:

$$rDimNbh_{k} = \frac{{AvgDimNbh_{k}^{\rm P} }}{{AvgDimNbh_{k} }}$$

When \(rDimNbh_{k}\) is higher than 1, the average dimension of neighbourhoods of power inventors is higher than the corresponding one of generic nodes. In Table 6 we report the values of \(rDimNbh_{k}\) for several countries in the year 2013, and we can observe that the values obtained for all the countries under consideration are always higher than 1 and range from 1.597 (for Brazil) to 2.870 (for Taiwan).
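
The ratio can be sketched in a few lines, assuming that neighbourhood dimensions are given by \(M_{2}\) and that the generic average is taken over all inventors of the country, power inventors included. The function name and the numbers are illustrative only.

```python
def r_dim_nbh(m2, power):
    """Average neighbourhood size of power inventors over that of all inventors."""
    if not m2 or not power:
        return float("nan")
    avg_power = sum(m2[inv] for inv in power) / len(power)
    avg_all = sum(m2.values()) / len(m2)
    return avg_power / avg_all if avg_all else float("nan")

# Hypothetical neighbourhood sizes; p is the only power inventor.
m2 = {"p": 6, "a": 2, "b": 2, "c": 1, "d": 1}
ratio_dim = r_dim_nbh(m2, {"p"})
print(ratio_dim)
```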

Table 6 Values of \(rDimNbh\) for several countries in the year 2013

Discussion

With respect to the existing literature, our study allows us to characterize inventors’ relationships from a quantitative standpoint and to understand the cooperation patterns within inventors’ networks in an innovative way. To this end, we have developed several new metrics and introduced new concepts, such as the notion of “power inventor”, the aggregation coefficient as well as the parameters based on the modified HI for the computation of both the variety of external collaborations entertained by a given country and of the variability of IPC classes.

We have also redefined concepts already existing in SNA, such as the “internal” and “external neighborhood” of a node or the “clique” of nodes, adapting them so that they can be useful to gain insights on research collaborations and their dynamics. As in other studies, we exploit SNA to analyze relationships among inventors within and across countries. While we are not the first to employ an approach based on SNA, the originality of our contribution consists in presenting a general methodology for the extraction of several knowledge patterns with a reference to the geography of innovation. The suggested approach can then be applied to any country or to any sector of interest.

Concluding remarks

The goal of this research was to provide an overall understanding of the dynamics concerning knowledge flows in inventive activities. We show how the network of inventors is on average increasing in size: more and more inventors are contributing to technological innovation and they are more connected to each other. We also show to what extent inventors from different countries tend to cooperate with their local peers or internationally. Furthermore, an analysis of the clustering of inventors is carried out to show differences across countries in the structure of inventors’ communities, with a particular focus on the dynamics of collaboration for power inventors (i.e. star inventors).

From a methodological standpoint, new and alternative approaches to the extraction of knowledge patterns about the geography of innovation based on an SNA of patent data are proposed. These approaches are extremely general and can be exploited to analyze patents, inventors and their relationships within and among countries. First, an indicator originally proposed by Simpson (1949) to measure biodiversity is applied here to investigate trends and features in inventive teams’ internationalization. Second, an index that measures the dimension of the neighborhood of power inventors is proposed to see how power inventors differentiate themselves from a generic inventor. Third, the benefits of cooperating with a power inventor are quantified with an appropriate index that highlights the expected increase in productivity that such a collaboration would mean for inventors in different countries. Finally, a parameter that indicates how much the inventors of a country are aggregated in cliques is proposed.

This research has implications for policy makers and R&D managers. Firstly, it provides an understanding of how inventive activities are carried out in different countries in terms of openness to cooperation. Secondly, the dynamics of knowledge flows for power inventors are distinguished from the dynamics of knowledge flows when a power inventor is not involved. This provides insights on how to nurture national innovation systems that can help generate more knowledge. Finally, insights on innovation cliques in different countries have implications for the definition of new market entry strategies and the organization of R&D internationalization endeavors.

It is important to stress that our work is aimed at providing a first, intuitive characterization of some patterns related to innovation and the dynamics of inventors’ collaborations: no causality statements can be made on the basis of our analysis, yet many facts can be deduced from the data we have. Furthermore, even though the quality of the dataset reduces concerns about the disambiguation of inventors, different optimization algorithms may still produce different results; a further check that we carried out showed that those results do not differ significantly from the ones presented here. More rigorous analyses and more precise answers will be the object of future research, as will a taxonomy of all the countries in our database based on nodes and cliques, which requires more time due to the vast amount of data to be analyzed. Nevertheless, this work represents a promising starting point.

Deepening our understanding of power inventors could be a promising line of research, given their importance within the innovation system and the inventors’ network of a given country. For instance, it would be possible to construct the profiles of power inventors of a given country using some classification methodology, as well as applying techniques of information diffusion to understand how much the mobility from one country to another of power inventors can impact on the filing of new patents in these two countries. Moreover, it would be interesting to inquire on the main causes that lead power inventors to leave their homeland, or the reasons that induce power inventors to file patents in another country.

Answering these questions could give important strategic suggestions to policymakers for the definition of strategic actions aimed at retaining inventors in their country and to favor the arrival of new ones from abroad.

In conclusion, forthcoming research will focus on how the structure of innovation networks is related to innovation performance and internationalization strategies. This would help to set guidelines for policymakers that want to better understand how to nurture national and regional systems that take advantage of global value chains.