Introduction

Patents are framed in different contexts: in addition to being among the outputs of the production system of knowledge, patents can also serve as input to the economic process of innovation. Furthermore, intellectual property in patents is legally regulated, for example, in national patent offices (e.g., Granstrand 1999). Patents reflect these different contexts in terms of attributes: names and addresses of inventors and assignees provide information about the locations of inventions; patent classifications and claims within the patents can be used to map technological developments; citations provide measures of impact and value, etc. (e.g., Hall et al. 2002; Porter and Cunningham 2005). Can patent analysis and patent maps provide us with an analytical lens for studying the complex dynamics of technological innovations? (e.g., Jaffe and Trajtenberg 2002; Balconi et al. 2004; Feldman and Audretsch 1999; Mowery et al. 2001).

In this study, we argue that when one understands technologies as complex adaptive systems, further development of methodologies is required more urgently than further development of theories. The various contexts provide different selection environments that are further explored with the development of the technology. The diffusion of a new technology in different dimensions may vary in terms of the rate and the directions.

In the case of small interference RNA (siRNA), for example, we found in a previous study (Leydesdorff and Rafols 2011) that the initial discovery was academic and published in Nature (Fire et al. 1998). After a few years, however, the centers of preferential attachment shifted from the academic inventors to institutional centers of excellence in metropolitan areas such as London, Boston, and Seoul. A spin-off company (Alnylam) was created by MIT and the Max Planck Society (in 2002) in order to secure the revenues of a number of patents. However, economic exploitation of the technology as a reagent became more attractive than as a diagnostic tool when the transition from in vitro to in vivo encountered problems (Lundin 2011). Accordingly, the center of patenting shifted to Denver, Colorado during the 2000s (Leydesdorff and Bornmann 2012). In the meantime, the academic research front shifted focus from “small interference RNA” to “micro interference RNA” (Rotolo et al. in preparation).

The example illustrates that in order to appreciate the complexity of innovation processes and understand the emerging and evolving patterns, one needs instruments to study the different dimensions and the interactions among them over time and in relation to one another. In this study, we build on the recent development of geographical maps of patents and maps in terms of patent classes as different projections (Leydesdorff and Bornmann 2012; Leydesdorff et al. 2012). We extend the static maps with a methodology to study the evolution of inventions over time in the different dimensions. For example, using the proposed methodology one can overlay the networks of co-inventors on a Google Map or analyze these networks using measures from social network analysis (Breschi and Lissoni 2004). Different dimensions and dynamics can thus be distinguished and then related. Can the co-evolution in different dynamics be grasped in order to show two or more dynamics in parallel by using split screens?

Several teams have generated patent maps and overlays for patent classes (Kay et al. in press; Schoen et al. 2012). However, our main objective is to make these overlays interactive so that one can use them as versatile instruments across samples gathered for different reasons. In our opinion, one must be able to change the focus in order to capture the resulting dynamics. In summary, we add to the previous mappings and overlays: (1) the dynamics by using time series, (2) the social networks, and (3) options to consider more than a single dynamics concurrently—but not necessarily synchronously—using split screens (Leydesdorff and Ahrweiler in press).

As a case, we focus on a specific material technology for photovoltaic cells (CuInSe2), but our aim is to demonstrate the methodology and further develop the overlay techniques for sequential years into animations. Accompanying websites provide instructions for using the instruments for other sets.Footnote 1

Patent data

Despite the well-known limitations (e.g., Archibugi and Pianta 1996; OECD 2009), patents can be used for analyzing patterns of invention along the dimensions of locations, technology classes, and organizations. The freely accessible interface of the United States Patent and Trademark Office (USPTO) allows us to download sets of patents in batch jobs on the basis of composed search strings, and additionally to track their citation rates. An SQL script was furthermore developed that enables the user to draw patents similarly from the Worldwide Patent Statistical Database (PatStat) of the European Patent Office (EPO).

The PatStat database includes patents of more than 80 patent offices worldwide (including USPTO, EPO, and the Japanese Patent Office), but access to this database requires institutional subscription. The expectation is that PatStat, because of its broad coverage in terms of patent offices, can inform us about networks at national or regional levels that may be coupled to developments in USPTO to varying extents. The US market provides a highly competitive environment, whereas technologies can also be further developed in niche markets. The latter may be more visible in PatStat data than USPTO data.

Cooperative patent classifications of “photovoltaic cells”

On the 1st of January 2013, USPTO and EPO introduced a new system of Cooperative Patent Classifications (CPC)Footnote 2 that unlike existing patent classifications (such as International Patent Classifications IPC, and its American or European equivalents), can also be indexed with a focus on emerging technologies using specific tags in the new Y-class (Scheu et al. 2006; Veefkind et al. 2012). Whereas the previous classification systems have grown historically with the institutions, and combine patents that cover product and process innovations at different scales, the classification in terms of CPC adds technological classes from the perspective of hindsight under the category “Y”.

EPO first experimented with the class Y01 as an additional tag for nanotechnology patents (Scheu et al. 2006), while USPTO tried to accommodate nanotechnology into a subclass 977 of its existing classification system. “Y01” was subsequently integrated into IPC v8 as class B82. More recently, a new CPC tag for emerging technologies was developed as Y02: “Climate Change Mitigating Technologies.” In the meantime, these new classifications have been backtracked into the existing databases for indexing.Footnote 3 The tag and its subclasses are now operational in both USPTOFootnote 4 and PatStat data.Footnote 5

More than 150,000 patents are tagged with Y02 in USPTO, among which 5,021 US patents with the search string cpc/y02e10/54$ for material technologies in photovoltaic (PV) cells (cf. Peters et al. 2012; Shibata et al. 2010). In terms of CPC, these technologies are further subdivided into nine specific technologies as shown in Table 1.Footnote 6

Table 1 Nine material technologies for photovoltaic cells distinguished in the Cooperative Patent Classifications (CPC)

We focus in this study on developing the relevant instruments using the first subclass Y02E10/541 that covers “CuInSe2 material PV Cells.” CuInSe2 is used in thin-film solar cells; thin-film solar cells are an emerging technology and are expected to be a dominant photovoltaic (PV) technology in the future (Unold and Kaufmans 2012). Although this technology has only a small share of the market, it continues to attract most of the funding for R&D among the material technologies for photovoltaic cells (ibid., p. 12).

We retrieved 419 patents at USPTO (on August 20, 2013) and 3,428 patents in PatStat (using the version of April 2013) with the CPC “Y02E10/541”.Footnote 7 Figure 1 provides the trends.

Fig. 1 Development of patenting in USPTO and PatStat under the CPC tag Y02E10/541 for “CuInSe2 material PV cells”, 1975–2010

The attribution of this class in PatStat (right vertical axis) is an order of magnitude larger than in USPTO (on the left vertical axis). This difference accords with the expectation specified above: PatStat data contain duplicates from different patent offices. One can use priority patents to prevent this (de Rassenfosse et al. 2013), but we use PatStat data in addition to USPTO data also for studying the geographical diffusion in markets other than the U.S. (Heimeriks et al. in preparation). In other words, all patents at all offices are counted in the PatStat analysis, leading to double-counts, i.e., the actual number of different priority patents is smaller.

Methods

In this section, we discuss the routines and provide instruction on how to use the software that is freely available online for generating geographic maps (“Geographic maps”) and classification maps (“Classification maps”).

Existing routines for overlaying patent data to Google Maps (Leydesdorff and Bornmann 2012) and a map based on aggregated citations among IPC (Leydesdorff et al. 2012) were initially further developed for the purpose of dynamic mapping. The resulting routines are available at http://www.leydesdorff.net/software/patentmaps/dynamic for USPTO data and at http://www.leydesdorff.net/software/patstat for PatStat data. (These webpages also provide instructions about how to generate the various files.) The USPTO interface is accessed online by the routines, while the PatStat data have to be exported from a local installation of the database by using the dedicated scripts provided in SQL. The interface with USPTO additionally allows downloading the forward citations.

Unlike USPTO data, forward citation information in PatStat data is not uniformly standardized because references are provided by different patent offices. Considering citations from different offices raises questions about bias, as (at least part of) the citation could be due to differences in office practices and regulations, rather than to the quality and relevance of the patents considered (Criscuolo 2006; Squicciarini et al. 2013, p. 8). Colors indicating citation counts above or below expected citation rates are therefore only provided when mapping USPTO data. As specified more extensively in Leydesdorff and Bornmann (2012), the proportion of top-cited patents in a sample of USPTO data can be (z-)tested for each location against the expectation, but only in the case of more than five patents at a city-location. As in the previous study, we test against the expectation that 25 % of the patents at a location, ceteris paribus, can be expected to belong to the top-25 % most-highly cited of the set.
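The test described above can be sketched as follows. This is a minimal illustration and not the routine used for the maps: the standard one-sample z-test for a proportion and the 5 % two-sided threshold (|z| > 1.96) are assumptions, and the function names are hypothetical.

```python
import math


def z_score_top_cited(n_patents, n_top, expected_share=0.25):
    """z-score of the observed share of top-cited patents at one location.

    Assumes the standard one-sample z-test for a proportion, tested
    against the expectation that 25 % of the patents are top-cited.
    """
    observed = n_top / n_patents
    std_error = math.sqrt(expected_share * (1 - expected_share) / n_patents)
    return (observed - expected_share) / std_error


def classify_city(n_patents, n_top, min_patents=5, z_crit=1.96):
    """Label a city; only locations with more than five patents are tested."""
    if n_patents <= min_patents:
        return "not tested"
    z = z_score_top_cited(n_patents, n_top)
    if z > z_crit:  # assumed 5 % two-sided threshold
        return "significantly above"
    if z < -z_crit:
        return "significantly below"
    return "not significant"


print(classify_city(20, 12))  # 12 of 20 patents are in the top-25 %
```

With 20 patents at a location, exactly five top-cited patents correspond to the expectation (z = 0); the sketch reports a deviation only when the observed share lies far enough from 25 %.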

Using colors similar to those of traffic lights, cities with (USPTO) patent portfolios significantly below expectation in terms of citedness are colored dark-red and cities with portfolios significantly above expectation dark-green. Lighter colors (lime-green and red–orange) are used for cities with an expected number of patents smaller than five (which should not statistically be tested) and for non-significant scores above or below expectation (light-green and orange).Footnote 8 (See at http://www.leydesdorff.net/photovoltaic/cuinse2/cuinse2_inventors.htm for the aggregated set.) The precise values are provided in the descriptors which can be accessed by clicking on the respective nodes. Additionally, all numerical values are stored in the file “geo.dbf” for statistical analysis.Footnote 9

Data from PatStat are not z-tested in terms of citation rates, but rated in terms of percentiles of the patent distributions. Using a different color scheme [that is, the same colors as used by Bornmann et al. (2011)], the top-1 % cities are in this case colored red (as “hot spots”), the top-5 % fuchsia, the top-10 % pink, the top-25 % orange, the top-50 % cyan, and the remainder (bottom-50 %) is colored blue (“cold”). The percentile classes are relative to the specific years or sets of years under study.
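The percentile-based coloring can be illustrated with a short sketch. The color labels and class boundaries follow the text; how ties and boundary cases are ranked is not specified there, so the rule used here (the share of cities with strictly more patents) is an assumption, as is the function name.

```python
# Percentile classes and colors as listed in the text (upper bound, color).
CLASSES = [
    (0.01, "red"),      # top-1 %, "hot spots"
    (0.05, "fuchsia"),  # top-5 %
    (0.10, "pink"),     # top-10 %
    (0.25, "orange"),   # top-25 %
    (0.50, "cyan"),     # top-50 %
]


def percentile_color(count, all_counts):
    """Return the color class of a city given the patent counts of all cities.

    Assumption: a city belongs to the top-x % when fewer than x % of the
    cities in the same period have strictly more patents.
    """
    share_above = sum(1 for c in all_counts if c > count) / len(all_counts)
    for upper_bound, color in CLASSES:
        if share_above < upper_bound:
            return color
    return "blue"  # bottom-50 %, "cold"


# Toy distribution: 200 cities with 1, 2, ..., 200 patents (illustrative only).
counts = list(range(1, 201))
print(percentile_color(200, counts), percentile_color(100, counts))
```

Because the classes are relative to the set passed in as `all_counts`, the same city can change color between periods even when its own patent count is unchanged.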

Geographic maps

The user is first prompted to choose between an analysis of the address information of either inventors or assignees for the generation of geographic overlays. The addresses are then aggregated at the city level as provided in the patents. Using USPTO data, the addresses are almost always complete and standardized in the case of granted patents, but much less so in the case of patent applications. We use granted patents for this reason, but all time-series are organized in terms of the (earlier) filing dates.

PatStat data are drawn from different (e.g., national) databases and therefore heterogeneous in terms of the organization and quality of the address information. Our routines try to exhaust this data, but correction of error remains an uphill battle. Among the corrections to systematic error, we notably tried to correct for the state information when this is provided for addresses in the USA because the same city names may occur in different states (e.g., Athens, GA or Athens, OH). Several such minor adjustments are made automatically by the routine and we intend to improve this error-correction further.

In both cases (USPTO and PatStat), the addresses are first listed and have to be geocoded (for example, at http://www.gpsvisualizer.com/geocoder/).Footnote 10 Co-occurrence matrices of the addresses at the patent level are then generated for each year (or period of years). After completing this for the aggregated set(s), the new routines provide filters that allow the user to generate overlays to Google Maps for compilations of moving aggregates of years or single years. Because of the low numbers in the first decades (Fig. 1), we used overlapping periods of 5 years in this study, as follows: 1974–1978; 1975–1979; 1976–1980; etc. However, the user can choose another time frame.Footnote 11
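The two steps described above, overlapping periods and co-occurrence counts at the patent level, can be sketched as follows. The data layout (a filing year plus a list of city names per patent) and the function names are assumptions made for illustration; the toy records are not taken from the data.

```python
from collections import Counter
from itertools import combinations


def moving_windows(first_year, last_year, width=5):
    """Overlapping periods such as 1974-1978, 1975-1979, 1976-1980."""
    return [(start, start + width - 1)
            for start in range(first_year, last_year - width + 2)]


def cooccurrence(patents, start, end):
    """Count how often two cities occur on the same patent within a period.

    `patents` is a list of (filing_year, [city, ...]) tuples (assumed layout).
    """
    counts = Counter()
    for year, cities in patents:
        if start <= year <= end:
            # each unordered pair of distinct cities on a patent counts once
            for a, b in combinations(sorted(set(cities)), 2):
                counts[(a, b)] += 1
    return counts


# Toy records for illustration only.
patents = [
    (1975, ["Golden", "Denver"]),
    (1977, ["Golden", "Denver", "Munich"]),
    (1981, ["Munich", "Tokyo"]),
]
for start, end in moving_windows(1974, 1980):
    print(start, end, dict(cooccurrence(patents, start, end)))
```

Choosing another `width` gives the other time frames mentioned in the text; `width=1` yields single years.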

The routines for both USPTO and PatStat data produce time-series of output filesFootnote 12 that can be used as input for the generation of overlays to Google Maps at http://www.gpsvisualizer.com/map_input?form=data or a dedicated interface at http://data2semantics.github.io/PatViz. This latter site provides dynamic loading, visualization, and animation of the patent data using the JavaScript libraries of jQuery (http://jquery.com) and Google Maps (https://developers.google.com/maps/). This eliminates a number of steps in producing the visualizations. The resulting animations can be saved locally and made available at one’s own website. The source code and program of PatViz are available for download at https://github.com/Data2Semantics/PatViz/releases; one can use this version locally and/or at the internet (see “Appendix” for further instructions).

The routines also write a series of files (paj1974.txt, paj1975.txt, paj1976.txt, etc.) as input for network analysis using Pajek or any other network-analysis program reading the Pajek format.Footnote 13 These files contain symmetrical co-inventor (or co-assignee) data among cities in matrix format. One can use these files for generating network statistics such as density, degree distributions, etc., both for each year (or period of years) and over time.
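For readers who want to produce comparable input themselves, the following sketch writes a symmetrical city-by-city matrix in the Pajek matrix layout (a `*Vertices` list followed by a `*Matrix` block). The exact layout of the paj*.txt files is not shown here, so this follows the general Pajek conventions as an assumption; the function name and toy data are hypothetical.

```python
def to_pajek_matrix(labels, pair_counts):
    """Return a Pajek-style text with a symmetrical co-occurrence matrix.

    `labels` is a list of city names; `pair_counts` maps (city_a, city_b)
    to the number of shared patents.
    """
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for (a, b), weight in pair_counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = weight
        matrix[j][i] = weight  # co-inventor relations are undirected
    lines = [f"*Vertices {n}"]
    lines += [f'{i + 1} "{label}"' for i, label in enumerate(labels)]
    lines.append("*Matrix")
    lines += [" ".join(str(value) for value in row) for row in matrix]
    return "\n".join(lines) + "\n"


# Toy data for illustration only.
text = to_pajek_matrix(
    ["Denver", "Golden", "Munich"],
    {("Denver", "Golden"): 2, ("Golden", "Munich"): 1},
)
print(text)
```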

Classification maps

For mapping the classifications, we use the base maps of aggregated citation relations among IPC in the USPTO data 1975–2011 provided by Leydesdorff et al. (2012). These maps are available at http://www.leydesdorff.net/ipcmaps for both three and four digits of the current IPC version 8. We can use these maps for CPC because the first four digits of IPC were kept in the CPC scheme.

The initial step for the construction of the time-series is again the construction of the overall map for the aggregated set. Subsequently, the time series are generated by setting filters for consecutive years to this aggregate. In the case of USPTO data, the routine ipcyr.exe (available at http://www.leydesdorff.net/software/patentmaps/dynamic) generates input information for consecutive years in the format of VOSviewer for the mapping (http://vosviewer.com). Two time series of files are generated as input for the mapping for three and four digits of IPC, respectively. Another routine (ps_ipcyr.exe at http://www.leydesdorff.net/patstat) provides the same functionality for downloads from PatStat.

Both routines additionally write a file “rao.dbf” which contains Rao-Stirling diversity for both three and four-digit IPC-based maps for each consecutive year (or set of years). Rao-Stirling diversity is a measure that takes into account both the variety and the disparity in a patent portfolio under study across the IPC classes. The indicator is defined as follows (Rao 1982; Stirling 2007):

$$ \Delta = \sum\nolimits_{ij} p_{i} p_{j} d_{ij} \tag{1} $$

where $d_{ij}$ is a disparity measure between two classes i and j (the categories are in this case IPC classes at the respective level of specificity) and $p_{i}$ is the proportion of elements assigned to each class i. As the disparity measure, we use (1 − cosine) since the cosine values of the citation relations among the aggregated IPC were used for constructing the base map of three and four digits. Jaffe (1986, at p. 986) proposed the cosine between the vectors of classifications as a measure of “technological proximity”. Using the file “rao.dbf” in Excel, the development of the (Rao-Stirling) diversity over time can be plotted. Can the development of diversity perhaps be used as a measure of technological change? (e.g., Anderson and Tushman 1990).
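The definition in Eq. (1) with (1 − cosine) as the disparity can be made concrete in a few lines. This is a sketch under stated assumptions and not the code that writes “rao.dbf”: the sum is taken over all ordered pairs of classes with i ≠ j (some authors sum over unordered pairs, which halves the value), and the class labels, vectors, and function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two citation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rao_stirling(counts, vectors):
    """Rao-Stirling diversity: sum over i != j of p_i * p_j * (1 - cosine_ij).

    `counts` maps a class to its number of patents in the portfolio;
    `vectors` maps a class to its citation vector from the base map.
    """
    total = sum(counts.values())
    classes = [c for c, n in counts.items() if n > 0]
    delta = 0.0
    for i in classes:
        for j in classes:
            if i != j:  # d_ii = 0, so the diagonal contributes nothing
                p_i = counts[i] / total
                p_j = counts[j] / total
                delta += p_i * p_j * (1.0 - cosine(vectors[i], vectors[j]))
    return delta


# Toy portfolio: two maximally distant classes with equal shares.
counts = {"H01": 2, "B05": 2}
vectors = {"H01": [1.0, 0.0], "B05": [0.0, 1.0]}
print(rao_stirling(counts, vectors))
```

A portfolio concentrated in a single class, or spread over classes with identical citation profiles, yields a diversity of zero; the value rises with both the spread of the shares and the distance between the classes.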

The IPC-based maps of VOSviewer for the different years can be animated (e.g., in PowerPoint) given the base maps of the aggregate of citation relations among IPC classes of patents between 1975 and 2011. The overlays show the evolution in specific samples against a stable background. An example of such an animation for the 419 USPTO patents in terms of IPC3 is provided at http://www.leydesdorff.net/photovoltaic/cuinse2/cuinse2.ppsx. One can animate the webpages of the geo-maps in PowerPoint similarly using the add-on “LiveWeb” at http://skp.mvps.org/liveweb.htm.

Results

USPTO

We first discuss the results of the analysis using the 419 patents downloaded from USPTO with the search string “CPC/Y02E10/541”, and then turn to the larger set of 3,428 records downloaded with this CPC from PatStat for comparison.

Geographical diffusion

After proper editing of the html (e.g., webpage titles and insertion of one’s API code of Google Maps), one obtains a series of maps in which the node sizes are proportionate to the logarithm of the number of patents. [We used log(n + 1) in order to prevent cities with single patents from disappearing because log(1) = 0.] As noted, the node colors correspond to the quality of the patents in terms of their citedness (see Leydesdorff and Bornmann 2012). One can click on each node to find statistical details. (This statistical data is also stored in the file “geo.dbf” that is generated and overwritten in each run.) The links span a network of co-inventor relations among the patents.
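The effect of the log(n + 1) transformation on node sizes can be checked directly. The base of the logarithm is not stated in the text; the natural logarithm is assumed here, and the function name is hypothetical.

```python
import math


def node_size(n_patents):
    """Node size proportional to log(n + 1); natural logarithm assumed."""
    return math.log(n_patents + 1)


# A city with a single patent would vanish with log(n), but not with log(n + 1).
print(math.log(1), node_size(1))
for n in (1, 10, 100):
    print(n, round(node_size(n), 2))
```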

For example, Fig. 2 provides the set of USPTO patents in this class (Y02E 10/541) for the 5-year period 2000–2004. The numbers of patents are often too small for significance testing, but one can see at a glance that the US is dominant (green-colored nodes) in this set in terms of both numbers and quality. In addition to the US, Japan and Europe have developed their own networks. (One can zoom in on the map at http://www.leydesdorff.net/photovoltaic/cuinse2/index.html.) During this period, international co-inventorship between the three world regions was limited to transatlantic collaborations.

Fig. 2 Patent configuration during 2000–2004 for CuInSe2 material in PV Cells (Y02E10/541) in USPTO data; an interactive version of this map is available at http://www.leydesdorff.net/photovoltaic/cuinse2/index.html. See also at http://data2semantics.github.io/PatViz

One can animate the map online by repeatedly clicking on the button “next year” to the right of the arrows of Google Map or by clicking on the button entitled “[Animation]” at the bottom left. (Alternatively, one can enter http://www.leydesdorff.net/photovoltaic/cuinse2/animate.html into the browser.) The animations require the reloading of the html—using a “refresh”—after each year and therefore run most reliably under a light browser such as Google Chrome.

As noted, we took a further step on the basis of this exploration and generated a dynamic interface for users at http://data2semantics.github.io/PatViz. In addition to showing the dynamics for this case study (and for its equivalent using PatStat data; see below), the interface allows users to upload their own geo-coded output files (z*.txt in the case of USPTO data or pat*.txt in the case of PatStat data) and to generate the animations locally and/or on the Internet (Appendix).

Inspection of the animations informs us that patenting in this CPC class began in isolated centers in the USA, then spread first within the U.S. and thereafter also to some centers in Europe (e.g., 1983–1987). During the second half of the 1980s, Japanese and also isolated inventors in Europe began to patent in the USA. In 1990–1994, co-inventorship is found only in the local environments of Munich (Germany) and within Colorado. The latter network reflects that the National Renewable Energy Laboratory (NREL) of the US Department of Energy is based in Golden, Colorado. (NREL performs research on photovoltaics (PV) under the National Center for Photovoltaics.)

In the second half of the 1990s, there is also more co-invention in the USA and Japan, but within national boundaries. The technology increasingly becomes commercially viable during this period. The number of cities in Europe and Japan with USPTO patents increases, and transatlantic collaboration is resumed towards the end of the 1990s. Since 2003—the commercial phase—one sees co-invention between Japan and the USA, and within Europe. In the European context, France plays a role in addition to a recurrent collaboration between Germany and Spain. An address in the UK (Stirling in Scotland) joins the US networks in the final periods (2007–2011, 2008–2012). During 2008–2012, Europe is otherwise no longer represented in USPTO data.

In summary, collaborations within nations are more important than international collaborations, but the majority of the inventors do not collaborate beyond local environments. (The addresses on the patents can also be the home addresses of inventors.) How can the map in terms of IPC-classes add to our understanding of these geographical dynamics?

IPC classes

Figure 3 shows the IPC-based map (three digits) for the same set of patents as used in Fig. 2 (2000–2004). The technology originated during the 1970s in the category of “basic electric elements” and remained there during the next 15 years, but has spread during the 1990s into other domains of technology such as “spraying and atomizing” and machine techniques for making thin films in photovoltaic cells. This diffusion increases further during the 2000s. (An animation is provided at http://www.leydesdorff.net/photovoltaic/cuinse2/cuinse2.ppsx.)

Fig. 3 Map of USPTO patents in terms of IPC at the three-digit level for the period 2000–2004. A dynamic version of this map is available at http://www.leydesdorff.net/photovoltaic/cuinse2/cuinse2.ppsx

Figures 2 and 3 can be combined into Fig. 4 using frames in the html for the splitting of the screens (at http://www.leydesdorff.net/photovoltaic/cuinse2/dualmix.html). One can animate Fig. 4 precisely as Fig. 2. However, this animation taught us that dynamic changes in two different (split) screens are difficult to handle for an analyst. A user needs more control over the time steps when focusing on the differences between two dynamics. Therefore, we suggest another solution for studying the dynamics using split screens: by clicking on another year, one opens a new window in the browser with the same figures for this different year. A user is then able to compare among years using, for example, different time intervals (such as 5 or 10 years) by going back and forth between windows at one’s own pace.

Fig. 4 Map of USPTO patents in terms of both IPC (at the three-digit level) and geographical diffusion for the period 2000–2004; an interactive version of this map is available at http://www.leydesdorff.net/photovoltaic/cuinse2/dualmix.html

Note that the maps in terms of the IPC classes (in the bottom half) can be enlarged to the full breadth of the screen by clicking on the map. We do not provide software for all possible combinations, but one can keep the html relatively simple so that a user can adapt the system to one’s needs. The html of Fig. 4, for example, reads as follows (Table 2).

Table 2 Html code for the two maps shown in Fig. 4

On larger screens, one would be able to show four or even more depictions in parallel. Thus, one would be able to study transitions which are visible in one domain in terms of other domains synchronically or also using different time frames. As noted in the introduction, the visualization of asynchronicities and development in different directions is central to our longer-term research program (Leydesdorff and Ahrweiler in press; Leydesdorff et al. 2013).

Rao-Stirling diversity as a measure of technological change

The longitudinal development of Rao-Stirling diversity indicates a cyclic pattern (Fig. 5).

Fig. 5 The development of Rao-Stirling diversity in IPC (three and four digits) among 419 USPTO-patents with CPC Y02E10/541 (“CuInSe2 material PV cells”) during the period 1975–2012

Figure 5 suggests that the technology was developed in three cycles. Two of the valleys, i.e., the period of decreasing diversity in the late 1980s and the latest such period, correspond with breakthroughs in the efficiency of thin-film solar cells (Green et al. 2013). Combining the maps with split-screens of Fig. 4 for each consecutive year, we suggest specifying these cycles as follows (Shafarman and Stolt 2003):

  1. an early cycle during the 1980s which is almost exclusively American; after initial development of the technology at Bell Laboratories in the 1970s, Boeing further developed the solar cells using these materials;

  2. a second cycle during the 1990s that includes transatlantic collaboration and competition with Europe; the US, however, remains leading; and

  3. a third and current cycle (the commercial phase) in which American-Japanese collaboration, on the one side, and collaboration within Europe, on the other, prevail.

The volume of patents continued to increase more smoothly (Fig. 1), but with an increasing (above-exponential) rate during the most recent years. The pronounced articulation of these cycles in terms of Rao-Stirling diversity came as a surprise to us. As the material technology becomes mature, other technologies such as spraying the thin film on carrier materials may become crucial.

PatStat

We developed the same routines analogously for the patent data downloaded from PatStat. As noted, this data is an order of magnitude larger than in USPTO (Fig. 1), since PatStat collects patent data from offices in different countries and world regions. The geographical map for the same period as used above (2000–2004) is provided in Fig. 6. This figure can be animated in the same way as in the case of USPTO data in Fig. 2 above, that is, by clicking on the button entitled “[Animate]”. This animation is also implemented in the JavaScript-based program PatViz at http://data2semantics.github.io/PatViz or http://www.leydesdorff.net/patviz (see “Appendix”).Footnote 14

Fig. 6 Patent configuration during 2000–2004 for “CuInSe2 material in PV Cells” (Y02E10/541) in PatStat data; an interactive version of this map is available at http://www.leydesdorff.net/photovoltaic/cuinse2.patstat/index.htm

The colors in Fig. 6 use a palette different from Fig. 2 because this data cannot be assessed in terms of citations. In this figure, “red” means hot, and “blue” cold in terms of relative numbers of patents at locations (Bornmann et al. 2011). Otherwise, the map is not very different from the one based on USPTO data (in Fig. 2). The PatStat network can also be considered as an extension of the USPTO network. For example, the Indian center in Chennai is added. This center is well connected to leading centers in Germany and France.

In order to make comparisons easier, we experimented with a split screen showing the USPTO data in the top screen and PatStat data for the same year(s) at the bottom (Fig. 7; available online at http://www.leydesdorff.net/photovoltaic/cuinse2.patstat/dualgeo.html). For the same reasons as above, we abstain from animating this double map, which would overload one’s mental map; instead, the option is provided to compare different years by opening new windows in the browser.

Fig. 7 Comparison of USPTO-based and PatStat-based global maps of patents classified as “CuInSe2 material PV cells” (CPC); an interactive version of this map is available at http://www.leydesdorff.net/photovoltaic/cuinse2.patstat/dualgeo.html

The juxtaposition of the geographical maps for USPTO and PatStat data for each year and over the years in separate windows enables an analyst to zoom into the differences and similarities. One can follow up with network analysis using the files in the Pajek format that are generated additionally by our routines. Figure 8 shows the largest network components during 2000–2004 in the two sets of patents classified with “CuInSe2 material in PV Cells” (Y02E10/541) and using the same data as in Figs. 2 and 6 above. In addition to spelling variants and misspellings in the PatStat database such as Rueil-Malmaison (France)—with or without hyphen—and Jülich (Germany)—with or without umlaut—the two graphs show the extension of the network in PatStat including non-US patents.

Fig. 8 Largest components of the co-inventor network in USPTO and PatStat data for patents in “CuInSe2” (Y02E10/541) during 2000–2004; 16 and 24 nodes, respectively. Coloring of the community structure is based on the algorithm of Blondel et al. (2008); Kamada and Kawai (1989) is used for the layout

In addition to this network of co-inventors between France and Germany, Fig. 9 shows other (separate) networks in this same period among German, Dutch, and Japanese inventors, and one network with German, Dutch, Belgian, and Estonian participants (in the upper left-side corner). Note that US inventors are not networked internationally during this period (2000–2004).

Fig. 9 Components other than the largest one (see Fig. 8) in the co-inventor network of patents in PatStat during 2000–2004

Indeed, one would find a poor representation of these national and regional networks using USPTO data (Fig. 2). When comparing the two overall networks for 2000–2004 in terms of various network parameters (e.g., De Nooy et al. 2005; Hanneman and Riddle 2005), the density value is significantly different: density in the USPTO data 2000–2004 is twice as high as for PatStat data in this same period. Thus, while the PatStat network is larger in size, it is less densely connected than the USPTO network among inventors. The USPTO network can be considered as a core set within the larger network of PatStat data. The number of communities in this PatStat data is 67 as against 32 in USPTO data. Although this seems to support the idea of showing niche markets (e.g., in India), 47 of these groups are isolates, and thus most likely local duplicates.
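The density comparison can be reproduced from the co-inventor links once the definition is fixed. Network-analysis programs offer several variants of density (for example, with or without loops); the loop-free, undirected variant used in this sketch is an assumption, and the toy networks are illustrative rather than taken from the data.

```python
def density(n_nodes, edges):
    """Density of an undirected network without loops.

    The number of distinct links is divided by the number of possible
    links, n * (n - 1) / 2.
    """
    distinct = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    possible = n_nodes * (n_nodes - 1) / 2
    return len(distinct) / possible if possible else 0.0


# Toy example: adding isolated nodes enlarges the network but lowers its density.
core_links = [("a", "b"), ("b", "c"), ("a", "c")]
print(density(3, core_links))  # three fully connected nodes
print(density(6, core_links))  # same links, three additional isolates
```

The toy example mirrors the pattern reported above: a larger network that contains many isolates is less densely connected than its core.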

Figure 10 shows the longitudinal development of Rao-Stirling diversity in the set of 3,428 patents downloaded from PatStat using the CPC of Y02E10/541. Note that Rao-Stirling diversity might be used as a rough first indicator of a possible “technological change,” but not as an actual measure of this complex phenomenon. However, one can distinguish the same three cycles of development as in Fig. 5, but less pronounced when compared with USPTO data. This accords with the expectation because PatStat includes national databases which may experience the various cycles with more delays than among patents in USPTO. The shift to a next generation of the technology is provoked by sharp competition in the US market, but not necessarily followed in more protected market environments in other nations or world regions. In other words, one can expect the diffusion patterns to develop more gradually using PatStat data because of this effect of averaging out among the different sources of patent data. New generations of patents may be delayed in the worldwide database of PatStat when compared with the more competitive environment of USPTO.

Fig. 10

The development of Rao-Stirling diversity in IPC (three and four digits) among 3,428 patents in PatStat during the period 1975–2010

Discussion about the longitudinal development of diversity

Let us further explore our conjecture that technological generations are made visible by time-series of Rao-Stirling diversity, using the next CPC category, that is, the class Y02E10/542 for “dye-sensitized solar cells” (DSSC). In Fig. 11, Rao-Stirling diversity is plotted at the four-digit level for both USPTO and PatStat data. The data suggest at least two cycles: a first one that ran out of steam during the 1980s, and a second one during the 1990s. Perhaps a third one can be distinguished as emerging in USPTO data during the most recent period, that is, since 2004.

Fig. 11

Development of Rao-Stirling diversity in Y02E10/542 (“Dye sensitized solar cells”) for USPTO and PatStat, respectively

The second wave (in the early 1990s) corresponds to the invention of the modern (second generation) version of DSSC, which was developed in the period 1988–1991. The first highly efficient DSSC (also known as the Grätzel cell) was published in 1991 (O’Regan and Grätzel 1991). A patent was filed at the World Intellectual Property Organization under the Patent Cooperation Treaty (PCT) in March 1993 (WO93/18532), and then also at USPTO in November 1993 (nr. 5,525,440 in USPTO; granted June 11, 1996). The École Polytechnique Fédérale de Lausanne is the assignee of this patent. Patenting, however, seems to have become broader in scope already a few years earlier (Schmookler 1962); and shortly after 1993, the diversity begins to decline. The plots in Figs. 5 and 11 provide us with heuristics for the reconstruction of the history of a technology from these data. In other words, informed questions can be raised and discussed with expert knowledge in the various domains.

Figures 5 and 11 may seem somewhat similar upon visual inspection (the Spearman rank correlations are 0.81 in the case of IPC3 and 0.48 in the case of IPC4), but in other cases we found significantly negative correlations, such as ρ = –0.71 (p < 0.01) between “microcrystalline silicon PV cells” (Y02E10/545) and “polycrystalline silicon PV cells” (Y02E10/546). An expert in PV research whom we consulted confirmed that these are very different technologies (Van Sark, personal communication, 7 January 2014). “Microcrystalline silicon PV cells” (Y02E10/545) are more similar to “amorphous silicon PV cells” (Y02E10/548) and “polycrystalline” (Y02E10/546) is more similar to “monocrystalline” (Y02E10/547). The Spearman rank correlation between these last two time-series, for example, is 0.69 (p < 0.01).
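The rank correlations reported above compare two diversity time-series year by year. A minimal sketch of such a comparison is given below; it assumes that both series cover the same years in the same order, the function names and the toy values are hypothetical, and significance testing (the p-values in the text) is not included.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for a constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("both series must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical yearly diversity values for two technologies.
series_a = [0.10, 0.15, 0.22, 0.30, 0.28]
series_b = [0.40, 0.35, 0.30, 0.20, 0.25]
rho = spearman(series_a, series_b)
print(round(rho, 2))
```

Using ranks rather than raw values makes the comparison insensitive to differences in the absolute level of diversity between two technologies, so that only the ordering of high and low years is compared.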

Figure 12 shows the results for an extension to the nine material technologies classified as Y02E10/54*. These results merit further investigation and interviews for validation with experts in the respective fields.

Fig. 12

The development of Rao-Stirling diversity in USPTO-patents of nine material technologies for photovoltaic cells

Conclusions

The maps of patents in different dimensions are instrumental to understanding the complex dynamics of innovation by providing different projections of these dynamics. In this study, we distinguished between IPC-based maps that show the technological organization of the patents in a vector space, geographic maps as overlays to Google Maps, and social networks that can be overlaid on the geographic map, but can also be studied in themselves using graph-theoretical instruments such as spring-embedded layouts (e.g., Kamada and Kawai 1989; see Fig. 8 above).

The user, or more generally the discourse of innovation studies, can reflexively bring together the insights that can be harvested from the different perspectives. The maps provide the footprints of the development, but they can make the historical narrative evidence-based. We elaborated this for the case of CuInSe2 as a material technology for photovoltaic cells. At the theoretical level, we thus aim to address what Griliches (1994) called “the computer paradox,” but from a methodological angle: ever more data (nowadays, one would say “big data”) are stored in ever larger repositories. The logic of these repositories is institutional, whereas the logic of innovation is based on the transversal recombination of functions at interfaces (e.g., supply and demand). The relabeling using the Y-tag in CPC, however, provides an opportunity to follow delineated technologies within and across databases: recent agreements of EPO and USPTO with the Chinese, Korean, and Russian patent offices to also use CPC in the near future show an increased awareness of the need to coordinate the data in a networked mode.

The advantage of developing instruments is provided by the direct relation between instruments such as visualization and the empirical operationalization (McGrath et al. 2003). Middle-range theorizing can guide this process of developing “instrumentalities” (de Solla Price 1984) as heuristics (Geels 2007). The systems perspective adds the evolution of these functions over time in terms of technological trajectories and regimes (Arthur 2009). Empirical studies of innovation need to allow for the appreciation of changes of perspective because innovations can be developed, or can unintentionally diffuse, in different directions: geographical, economic, and technical. In our opinion, the bottleneck of innovation studies has been the development of instruments that keep pace with the (re)combinations possible in terms of the data fluxes.

Dynamic overlays that can be accessed interactively on the internet provide the user with options to trace technological developments and develop new perspectives reflexively. The use of Rao-Stirling diversity in this study can be considered as a case in point: the literature pointed us to considering variety versus the loss of variety in shake-out phases as central to techno-economic developments (Anderson and Tushman 1990; He and Fallah 2011), but the data allowed us to operationalize this in relation to the new instruments. The extension beyond two maps to be recombined follows as a progressive research agenda for quantitative innovation studies (Rotolo et al. in preparation).