1 Introduction

The adoption of genetically modified (GM) varieties of soybean and cotton seed has become nearly ubiquitous in the U.S. Since their introduction in 1994 and 1995, respectively, these varieties have grown to account for over 90 percent of acres planted in each crop type (Wechsler, 2017). The rapid expansion of the market for GM crop varieties, coupled with frequent merger and acquisition (M&A) activity, has prompted concerns about concentration in the product market as well as in the level of innovative activity (Moschini, 2010; Maisashvili et al., 2016; MacDonald, 2017; Deconinck, 2020; Clapp, 2018; Régibeau & Rockett, 2021). Using data on field trial applications (FTA) for GM soybean and cotton varieties, we estimate the level of concentration in research and development (R&D) activity by agricultural biotechnology firms, both with and without accounting for M&A activity through 2010.

Figure 1 reveals increasing adoption of GM cotton and soybean varieties since their introduction in the mid-1990s. The increasing adoption was accompanied by an increase in the concentration of R&D activity: concentration of R&D in GM cotton seed peaked in the early 2000s, and concentration of R&D in GM soybean seeds peaked in the mid-2000s. Subsequent levels of R&D concentration have decreased for both crops, with soybean seed markets remaining more concentrated than those for cotton seed. Anderson and Sheldon (2017) show that when R&D investments lead to increases in product quality, the concentration in R&D activity is bounded from below. These results—which Anderson and Sheldon use to estimate the level of concentration in GM corn seed markets—imply that increases in market size increase R&D activity by existing firms rather than permit entry by new competitors. Consistent with Lence and Hayes (2005), the overall welfare effects of this activity depend upon product quality, which can be thought of as the cost-saving to farmers of improved GM varieties, and the nature of product market competition.

Fig. 1 R&D Concentration Ratios and Adoption of GM Cotton and Soybean

We expand upon Anderson and Sheldon (2017) by considering two GM crop types: soybeans and cotton. Unlike the market for GM corn seed, which is dominated by its use in the U.S., the international market for GM varieties of soybean and cotton is substantial, with global adoption rates of 78% and 64%, respectively (ISAAA, 2016). If the financial returns to R&D activity in GM crop varieties extend beyond the domestic market, then we would be overestimating the level of R&D concentration along two dimensions. First, if the market share of foreign competitors abroad is substantial, then the actual level of R&D concentration would be overestimated. Second, if the relevant market size for domestic firms includes international markets, then the fitted lower bound to R&D concentration would be “flatter” and the theoretical lower bound for large markets would be overestimated.

Prior to the release of GM varieties for R&D purposes in the U.S., firms, non-profits, and government organizations must file an application with the Animal and Plant Health Inspection Service (APHIS). Using FTA from the introduction of GM varieties in the late 1980s through 2010, we construct measures of R&D concentration across agro-climatic geographic regions and across time for each of the crop varieties. We use this two-dimensional variation, geographic and intertemporal, to estimate the lower bounds to R&D concentration for GM soybean and cotton varieties with and without adjusting for M&A activity. Using cluster analysis, we partition U.S. states into non-overlapping submarkets in order to allow for geographic spillovers in R&D investment between similar regions. The submarkets that we define are largely consistent with the cotton production regions identified in Larson and Meyer (1996) and the soybean production regions identified in Schaub et al. (1988).

We use a two-step procedure to estimate the lower bounds to R&D concentration. In the first step, we fit a lower bound by solving a linear programming problem using the simplex algorithm and bootstrapping the standard errors. In the second step, we examine the distribution of the first-stage residuals. If increases in market size lead to increased R&D activity by existing firms, the non-zero first-stage residuals should fit a two-parameter Weibull distribution. Conversely, if market size increases are accompanied by firm entry and constant (or decreasing) concentration of R&D activity, then the residuals should fit a three-parameter Weibull distribution. These distributions can be estimated via maximum likelihood and tested using a likelihood ratio test.

Our results imply that, as market size increased, existing firms increased R&D activity in both soybean and cotton seed submarkets rather than permit additional market entry. Regional submarkets in soybean seed are characterized by greater variation in R&D concentration, lower estimated R&D concentration for current market sizes, and higher theoretical R&D concentration relative to cotton seed. The results suggest there is potential for additional concentration in soybean R&D activity, but that additional concentration in cotton seed R&D activity should be viewed critically as current levels of concentration are approaching the theoretical lower bound. Accounting for previous M&A activity significantly changes the lower bounds to R&D concentration for both cotton and soybean seed by increasing the levels of observed concentration in small and medium-sized submarkets. Although we are unable to estimate the impact of the recent wave of mergers and acquisitions between the largest competitors in the seed and agrochemical industries (MacDonald, 2017, 2019), we conclude with a discussion of how this M&A activity and the divestment mandated by U.S. regulators might impact R&D concentration in GM cotton and soybean seed markets.

2 Theoretical Justification and Empirical Model of R&D Concentration

Sutton (1991, 1998, 2007) develops a model of market structure in which market entry and advertising and/or R&D investment decisions are jointly determined. When firms can vertically differentiate their products by investing in advertising or R&D, the equilibrium number of entrants, hence the concentration of firms in the market, remains bounded away from perfectly competitive levels even as the size of the market becomes large. Entry costs are considered “endogenous” in the sense that both product quality and the number of market entrants are jointly determined via investments in advertising or R&D. This contrasts with the case in which products are sufficiently homogeneous, or in which product quality is non-increasing in advertising or R&D expenditures, such that all entrant firms offer symmetric, “minimum” quality. In this case, firms enter the market “exogenously” such that the number of entrants (market concentration) is strictly increasing (decreasing) as market size increases.

2.1 Theoretical Model of Lower Bounds to Concentration

Sutton (1998) derives empirically testable hypotheses with regard to the lower bounds to market concentration and the R&D-to-sales ratio that would be observed in each case. Sutton illustrates that the market share \({C}_{1,m}\) of the firm that offers the highest level of quality in submarket \(m\) is bounded from below under endogenous entry costs by:

$$C_{1,m} \ge \alpha \left( {\sigma ,\beta } \right) \cdot h_{m} ,$$
(1)

where the parameter \(\alpha \left( {\sigma ,\beta } \right)\) depends upon the degree of product substitutability \(\sigma\) and the elasticity of R&D costs \(\beta\); and \(h_{m}\) is a measure of product homogeneity in submarket \(m\).

Under exogenous entry costs, the market share for all entrants is symmetric and given by:

$$C_{1,m} = \frac{1}{{N_{m} }},$$
(2)

where \({N}_{m}\) is the number of entrants when all firms invest in minimum quality. When Eq. (1) is binding, firms respond to an increase in the size of the submarket by escalating product quality rather than permitting entry by additional firms. Equation (2) implies that an increase in submarket size will result in entry by additional firms such that the concentration ratio is strictly decreasing. Although these two conditions are stated separately, it is important to note that a single industry may be characterized by either endogenous or exogenous entry costs, depending upon the underlying parameters. Equation (2) is more likely to arise when: products are more homogeneous or closer substitutes; R&D costs are lower; and submarket sizes are larger. The extent that R&D and/or advertising investments jointly determine product quality and the number of entrant firms can be tested empirically via cross-industry analysis (Robinson & Chiang, 1996; Sutton, 1998) or by comparing within an industry across submarkets \(m\) (Anderson & Sheldon, 2017; Berry & Waldfogel, 2010; Dick, 2007; Ellickson, 2007; Latcovich & Smith, 2001; Marin & Siotis, 2007; Sutton, 1991).

Anderson and Sheldon (2017) show that under the same conditions that are identified in Sutton (1998), the lower bound to concentration in R&D expenditures can also be derived and empirically tested. When R&D investments and market entry decisions are endogenous, then the firm that invests in the market-leading level of quality in submarket \(m\) will have a share of R&D expenditures \(R_{1,m}\) that is bounded from below by:

$$R_{1,m} \ge \left[ {\alpha^{2} \left( {\sigma ,\beta } \right) \cdot h_{m}^{2} - \alpha \left( {\sigma ,\beta } \right) \cdot h_{m} \left( {\frac{{F_{0} }}{{S_{m} y_{m} }}} \right)} \right],$$
(3)

where \({F}_{0}\) is the fixed setup cost associated with entry; \({S}_{m}\) is the number of consumers in submarket \(m\); and \({y}_{m}\) is the industry sales revenue per consumer in submarket \(m\). Conversely, R&D concentration is bounded from above by:

$$R_{1,m} \le \frac{1}{{N_{m} }}.$$
(4)

Since \(\alpha \left(\bullet \right)\in \left[0,1\right]\) and \({h}_{m}\in \left[0,1\right]\), the lower bound to concentration in R&D expenditure when firms make quality-enhancing R&D investments with entry is less than the lower bound to output market concentration. As the size of the submarket increases, in terms of the number of consumers (\({S}_{m}\)) or the industry sales revenue per consumer (\({y}_{m}\)), the lower bound to R&D concentration increases. Therefore, larger submarkets are more likely to be concentrated relative to smaller submarkets. For markets in which entry costs are exogenous, the level of market concentration forms an upper bound on the level of R&D concentration.
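The two properties stated above can be checked directly from Eqs. (1) and (3). As a short worked step, writing \(\alpha\) as shorthand for \(\alpha \left( {\sigma ,\beta } \right)\) and treating \(S_{m} y_{m}\) as a single market-size variable, the right-hand side of Eq. (3) is non-decreasing in market size:

$$\frac{\partial }{{\partial \left( {S_{m} y_{m} } \right)}}\left[ {\alpha^{2} h_{m}^{2} - \alpha h_{m} \left( {\frac{{F_{0} }}{{S_{m} y_{m} }}} \right)} \right] = \alpha h_{m} \frac{{F_{0} }}{{\left( {S_{m} y_{m} } \right)^{2} }} \ge 0,$$

and its limit for large markets lies weakly below the bound in Eq. (1) because \(\alpha h_{m} \in \left[ {0,1} \right]\):

$$\mathop {\lim }\limits_{{S_{m} y_{m} \to \infty }} \left[ {\alpha^{2} h_{m}^{2} - \alpha h_{m} \left( {\frac{{F_{0} }}{{S_{m} y_{m} }}} \right)} \right] = \alpha^{2} h_{m}^{2} \le \alpha h_{m} .$$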

2.2 Empirical Specification

The bounds to R&D concentration, which are reflected in Eqs. (3) and (4), can be estimated with the use of maximum likelihood when the concentration ratio is characterized by a Weibull distribution (Anderson & Sheldon, 2017). We follow the empirical estimation strategy that is developed in Sutton (1991) based upon Smith (1985, 1994) and the simplex methodology of Giorgetti (2003). In order to derive the empirically testable equations, the R&D concentration ratio must be transformed such that the predicted concentration measures lie between 0 and 1. We first monotonically shift \({R}_{1,m}\) by −0.0001 to address submarkets with only a single entrant and then transform the concentration measure according to:

$$\tilde{R}_{1,m} = \ln \left( {\frac{{R_{1,m} }}{{1 - R_{1,m} }}} \right).$$
(5)
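As an illustration of the transformation in Eq. (5), the following Python sketch applies the 0.0001 shift described in the text, takes the logit, and then inverts both steps. The function names are hypothetical and are not drawn from the original estimation code.

```python
import math

SHIFT = 0.0001  # shift stated in the text; keeps a share of exactly 1 finite


def transform(r, shift=SHIFT):
    """Shift the raw share down by `shift`, then apply the logit in Eq. (5)."""
    r_shifted = r - shift
    if not 0.0 < r_shifted < 1.0:
        raise ValueError("shifted share must lie strictly between 0 and 1")
    return math.log(r_shifted / (1.0 - r_shifted))


def inverse_transform(r_tilde, shift=SHIFT):
    """Map a transformed value back to a concentration ratio."""
    return 1.0 / (1.0 + math.exp(-r_tilde)) + shift
```

A submarket with a single applicant (a raw share of 1) maps to a finite value of roughly 9.21 rather than to infinity, which is the purpose of the shift.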

In order to estimate the lower bounds to R&D concentration, the transformed concentration measure for each submarket \(m\) is normalized by the degree of product homogeneity in the submarket such that the functional form for the estimation is:

$$\frac{{\tilde{R}_{1,m} }}{{h_{m}^{2} }} = \theta_{0} - \theta_{1} \left( {\frac{1}{{h_{m} \ln \left( {S_{m} y_{m} /F_{0} } \right)}}} \right) + \varepsilon_{m} .$$
(6)

Estimates of \(\theta_{0}\)—the theoretical lower bound to market concentration for large markets—and of \(\theta_{1}\)—the slope parameter for changes in the lower bound as market size changes—can be obtained via a linear programming problem such that the residuals \({\varepsilon }_{m}\) are non-negative. The fitted residuals follow a Weibull distribution such that:

$$F\left( \varepsilon \right) = 1 - \exp \left[ { - \left( {\frac{\varepsilon - \mu }{\delta }} \right)^{\gamma } } \right], \gamma > 0, \delta > 0,$$
(7)

where \(\varepsilon \ge \mu\). The Weibull distribution is characterized by three parameters \(\left(\mu ,\delta ,\gamma \right)\), which reflect the “shift”, “scale”, and “shape” of the distribution. The shift parameter \(\mu\) represents the degree of horizontal shift of the distribution such that when \(\mu =0\), it corresponds to a two-parameter Weibull distribution. The scale parameter \(\delta\) reflects the dispersion of the Weibull distribution, and the shape parameter \(\gamma\) captures the degree of clustering around the lower bound.

The estimation of the lower bound to R&D concentration involves a two-step procedure. First, we obtain consistent estimates of the lower bound parameters \({\widehat{\theta }}_{0}\) and \({\widehat{\theta }}_{1}\) by solving a linear programming problem with the use of the simplex algorithm under the constraint that the model residuals are non-negative such that:

$$\begin{array}{*{20}c} {\mathop {\min }\limits_{{\left\{ {\theta_{0} ,\theta_{1} } \right\}}} \mathop \sum \limits_{m = 1}^{M} \left[ {\frac{{\tilde{R}_{1,m} }}{{h_{m}^{2} }} - \left( {\theta_{0} - \theta_{1} \left( {\frac{1}{{h_{m} \ln \left( {S_{m} y_{m} /F_{0} } \right)}}} \right)} \right)} \right]} \\ {subject\; to \;\frac{{\tilde{R}_{1,m} }}{{h_{m}^{2} }} \ge \theta_{0} - \theta_{1} \left( {\frac{1}{{h_{m} \ln \left( {S_{m} y_{m} /F_{0} } \right)}}} \right),\forall m} \\ \end{array} ,$$
(8)

with standard errors that can be calculated via bootstrapping. Since there are two first-stage parameters, there will be \(M-2\) positive fitted residuals \({\widehat{\varepsilon }}_{m}\) from the first stage. These residuals can be used to estimate the Weibull distribution parameters \(\left(\mu ,\delta ,\gamma \right)\) via the maximization of the log pseudo-likelihood function:

$$\begin{array}{*{20}c} { \mathop {\max }\limits_{{\left\{ {\mu ,\delta ,\gamma } \right\}}} \mathop \sum \limits_{m = 1}^{M - 2} \ln \left[ {\left( {\frac{\gamma }{\delta }} \right)\left( {\frac{{\hat{\varepsilon }_{m} - \mu }}{\delta }} \right)^{\gamma - 1} \exp \left[ { - \left( {\frac{{\hat{\varepsilon }_{m} - \mu }}{\delta }} \right)^{\gamma } } \right]} \right]} \\ \end{array}$$
(9)

with standard errors that can be estimated according to the asymptotic distributions defined by Smith (1994).
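The first step in Eq. (8) can be sketched in a few lines of Python. Because the problem has only two parameters, an optimum lies on a line that passes through two observations, so this sketch enumerates pairs of observations instead of running the simplex algorithm that the text describes; it is a substitute technique that is practical only for a small number of submarkets. The names `fit_lower_bound`, `x`, and `y` are hypothetical, with `y` standing for \(\tilde{R}_{1,m} /h_{m}^{2}\) and `x` for \(1/\left( {h_{m} \ln \left( {S_{m} y_{m} /F_{0} } \right)} \right)\).

```python
from itertools import combinations


def fit_lower_bound(x, y, tol=1e-9):
    """Return (theta0, theta1, residuals) for the bound y >= theta0 - theta1 * x.

    Brute-force stand-in for the simplex step: try every line through two
    observations, keep the feasible line with the smallest residual sum.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as theta0 - theta1 * x
        slope = (y[j] - y[i]) / (x[j] - x[i])  # equals -theta1
        intercept = y[i] - slope * x[i]        # equals theta0
        resid = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
        if min(resid) < -tol:
            continue  # line lies above some observation: constraint violated
        total = sum(resid)
        if best is None or total < best[0]:
            best = (total, intercept, -slope, resid)
    if best is None:
        raise ValueError("need at least two observations with distinct x values")
    _, theta0, theta1, resid = best
    return theta0, theta1, resid
```

With four observations in general position, two residuals are zero (the observations that define the bound) and the remaining \(M-2\) are positive, matching the count used in the second step.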

The shift parameter estimate \(\widehat{\mu }\) can be used to test for the validity of the three-parameter Weibull distribution (\(\widehat{\mu }>0\)) against the restricted, two-parameter Weibull distribution (\(\widehat{\mu }=0\)). Failure to reject the three-parameter Weibull distribution implies that we are unable to reject that R&D expenditures are exogenous since the transformed measures of R&D concentration are shifted away from the lower bound.
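A rough Python sketch of the second step follows. It fits the two-parameter Weibull distribution by bisection on the profile score for the shape parameter and compares it with a three-parameter fit obtained by a crude grid search over candidate shift values. The grid search, the bisection bracket, and all function names are assumptions made for illustration; the sketch does not reproduce the asymptotic results of Smith (1994), and no critical value is implied for the resulting statistic.

```python
import math


def weibull_loglik(eps, mu, delta, gamma):
    """Log-likelihood in Eq. (9); every residual must be strictly above mu."""
    ll = 0.0
    for e in eps:
        z = (e - mu) / delta
        ll += math.log(gamma / delta) + (gamma - 1.0) * math.log(z) - z ** gamma
    return ll


def fit_scale_shape(eps, mu=0.0, lo=0.01, hi=50.0):
    """MLE of (delta, gamma) with the shift held fixed at mu.

    Assumes the shape lies inside the bracket [lo, hi] (illustrative choice).
    """
    z = [e - mu for e in eps]
    mean_log = sum(math.log(v) for v in z) / len(z)

    def score(g):
        # Profile score for the shape; it is increasing in g.
        num = sum(v ** g * math.log(v) for v in z)
        den = sum(v ** g for v in z)
        return num / den - 1.0 / g - mean_log

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    gamma = 0.5 * (lo + hi)
    delta = (sum(v ** gamma for v in z) / len(z)) ** (1.0 / gamma)
    return delta, gamma


def likelihood_ratio(eps, mu_grid):
    """2 * (unrestricted - restricted) log-likelihood over a grid of shifts."""
    d0, g0 = fit_scale_shape(eps, 0.0)
    ll_restricted = weibull_loglik(eps, 0.0, d0, g0)
    ll_unrestricted = ll_restricted  # mu = 0 is always a candidate
    for mu in mu_grid:
        if not 0.0 <= mu < min(eps):
            continue  # shift must stay below the smallest residual
        d, g = fit_scale_shape(eps, mu)
        ll_unrestricted = max(ll_unrestricted, weibull_loglik(eps, mu, d, g))
    return 2.0 * (ll_unrestricted - ll_restricted)
```

Only the strictly positive first-stage residuals should be passed in, since the log-likelihood is undefined at zero.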

3 Data and Descriptive Statistics

The lower bound to R&D concentration in soybean and cotton seed markets can be estimated according to Eq. (8) with the use of data for each crop type at the submarket level. We exploit variation in the adoption and prevalence of GM crop varieties across two dimensions: (i) suitability of GM traits to agro-climatic conditions that vary geographically; and (ii) intertemporal variation in the adoption and expansion of GM crop varieties across geographic submarkets. An estimation of the lower bound to R&D concentration in these seed markets requires data at the firm level on R&D investment \({R}_{1,m}\) for each crop type and every submarket. We aggregate state-level data on FTA of GM crops into geographically distinct submarkets as a proxy for firm-level R&D investment. In addition to the firm-level data that are used to calculate the degree of R&D concentration, the lower bounds to R&D concentration also depend upon industry-level data on submarket size \({S}_{m}{y}_{m}\), the degree of product homogeneity in the submarket \({h}_{m}\), and the minimum setup costs \({F}_{0}\) that a firm must incur in order to enter a submarket (see Eq. 3).

3.1 Geographic Submarket Cluster Analysis

In order to estimate the lower bounds to R&D concentration in a single industry, we first must identify distinct submarkets. Previous industry-level analyses of lower bounds to concentration have largely focused on retail industries, which can be separated spatially. These include examinations of retail banking (Dick, 2007), supermarkets and barbers/beauty salons (Ellickson, 2007), and newspapers and restaurants (Berry & Waldfogel, 2010). Unlike a retail environment in which firms incur advertising expenditures in each submarket, R&D investment in GM traits faces a greater potential for spillovers across submarkets. The potential for spillovers across submarkets rules out the possibility of using patent applications as a proxy for R&D activity since these occur at the national level and are equally applicable to all submarkets.

In order to identify the relevant subnational geographic markets for seed varieties, we make a critical identifying assumption that R&D investments in GM seed varieties are recouped within a particular geographic submarket only. Specifically, we assume that if a firm wishes to market its existing GM seed in a different geographic submarket, then it first must test those varieties in the submarket that it wishes to enter. This assumption motivates us to characterize geographic submarkets for soybean and cotton seed according to observable agricultural and climatic differences.

Cluster analysis permits us to partition states into regional clusters that follow a “natural structure” of observable agricultural and climate characteristics. We assume a “prototype-based” framework such that every state in a cluster is more similar to the prototype state for its own submarket than it is to the prototype state of any other submarket. We utilize a K-means approach by defining the number K of submarket clusters for each crop type and minimizing the Euclidean distance between each state and the centroid of its cluster. For robustness, we consider alternate numbers of clusters K for each crop type as well as minimizing the absolute distance function (Footnote 1).
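The K-means step can be illustrated with a minimal Python sketch of Lloyd's algorithm. The feature vectors, the choice of the first K points as starting centroids, and the function name are all hypothetical; the text does not list the exact agricultural and climate variables or the initialization that was used.

```python
def kmeans(points, k, max_iter=100):
    """Assign each point to the nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # assumed initialization
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for p in points:
            # Squared Euclidean distance from the point to each centroid
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical standardized (temperature, precipitation) values for six states
states = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(states, k=2)
```

Replacing the squared Euclidean distance with the sum of absolute differences gives an absolute-distance assignment step of the kind mentioned as a robustness check, although the centroid update would then normally use medians.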

The results of the cluster analysis are reported in Table 1, along with the corresponding market shares of U.S. production and the number of field trial applications for each submarket.

Table 1 GM Seed Submarkets by Crop Type

3.2 Measuring R&D Concentration

The ideal data for measuring R&D concentration in GM seed markets would be R&D expenditures for each product line in each submarket for every active firm. Although this level of detail on R&D expenditures is unavailable for GM crop varieties, there are publicly available data on FTA that capture an intermediate stage of the R&D process. The Biotechnology Regulatory Services (BRS), which is a division of the Animal and Plant Health Inspection Service (APHIS), mandates that all importation, interstate movement, and release of GM organisms be reported by firms and organizations. BRS publishes this database of permits, notifications, and petition applications, which includes information on: the applicant institution; the status of the application; the plant (or “article”) type; the dates the application was received, granted, and applicable; the states in which the crops will be released, transferred to, or originated from; and the crop phenotypes and genotypes. Our data cover 1985 through 2010 and consist of 33,440 permits or notifications of release across all crop types. We restrict the sample to for-profit firms and to applications that pertain to the release of GM soybean and cotton varieties.

Figure 2 plots the annual number of field trial applications, the number of firms that file an application for a field trial each year, and the average number of applications per firm by year for both GM cotton and soybean varieties. Although the number of individual firms peaked in the early 1990s for both cotton and soybeans, the total number of applications did not peak until the early 2000s for cotton and the late 2000s for soybeans. The number of applications per firm reflects this increased intensity of R&D, with the intensity of cotton research peaking in the late 1990s and early 2000s and the intensity of soybean applications peaking in the mid to late 2000s.

Fig. 2 Field Trial Applications and Firms in GM Cotton and Soybean Seed Markets

We aggregate the number of FTA for each firm at the geographic submarket level across five-year intervals in order to derive a measure of R&D concentration that varies across submarkets and time. We account for geographic spillovers by aggregating applications up to the submarket level, which consists of states with similar agricultural and climatic characteristics. We aggregate across multiple years to account for the nature of the R&D process, in which year-to-year fluctuations are secondary to long-run trends. Summary statistics for the geographic submarket concentration in FTA, as well as the other variables included in our empirical analysis, are included in Table 2.
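The concentration measures described above can be sketched in Python as shares of field trial applications within one submarket and one five-year interval. The firm labels and counts below are invented for illustration and do not come from the APHIS data.

```python
def concentration_ratio(applications_by_firm, n=1):
    """Share of applications filed by the n most active firms."""
    counts = sorted(applications_by_firm.values(), reverse=True)
    total = sum(counts)
    if total <= 0:
        raise ValueError("submarket has no applications")
    return sum(counts[:n]) / total


# Hypothetical application counts for one submarket and one five-year window
fta = {"Firm A": 50, "Firm B": 30, "Firm C": 10, "Firm D": 5, "Firm E": 5}
r1 = concentration_ratio(fta, n=1)  # single-firm ratio
r4 = concentration_ratio(fta, n=4)  # four-firm ratio
```

Adjusting for M&A activity would amount to relabeling the applications of an acquired firm under its acquirer before the counts are summed.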

Table 2 Summary Statistics by Crop Type

3.3 Industry-level Data on Market Size, Product Homogeneity, and Setup Costs

In order to estimate the lower bounds to R&D concentration, we still require submarket size, a measure of product homogeneity at the submarket level, and minimum R&D setup costs. Our primary measure of submarket size is a proxy for total industry sales that we construct using annual data from the June Agriculture Survey and the Agricultural Resource Management Surveys (ARMS). We obtain total acres planted and harvested at the crop level from acreage reports from the June Agriculture Surveys for each state and aggregate within submarkets. The Economic Research Service (ERS) of the USDA computes yearly seed costs for each crop type based upon ARMS data. After adjusting for inflation, we multiply annual seed costs by total acres planted to obtain our proxy for industry sales at the submarket level.

As a robustness check, we consider a definition of industry sales at the submarket level for GM seed varieties only. We combine estimates on adoption of GM seed varieties by Fernandez-Cornejo and McBride (2002) for 1996–1999 with estimates provided by the June Agriculture Surveys for 2000–2010 in order to obtain a proxy for submarket-level industry sales of GM crop varieties (Footnote 2). These rates of adoption are also used to construct the degree of product homogeneity for each crop type at the submarket level. By definition, the product homogeneity index is meant to capture the percentage of industry sales of the largest product group. We consider product groups as broadly defined: conventionally-bred varieties; insect resistant (IR) varieties; herbicide tolerant (HT) varieties; and “stacked” varieties that consist of both IR and HT traits. Since the lower bound estimations explore the market share of the leading firm in each submarket, the product homogeneity index \({h}_{m}\) is calculated as the percentage of acres planted with the largest product group, as specified in Sutton (1998) and Anderson and Sheldon (2017) (Footnote 3).
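The homogeneity index described above reduces to the acreage share of the largest product group. A minimal Python sketch, with invented acreage figures and a hypothetical function name:

```python
def homogeneity_index(acres_by_group):
    """Share of planted acres held by the largest product group (h_m)."""
    total = sum(acres_by_group.values())
    if total <= 0:
        raise ValueError("no acres planted in this submarket")
    return max(acres_by_group.values()) / total


# Hypothetical acres (in thousands) planted by product group in one submarket
acres = {"conventional": 100, "IR": 150, "HT": 250, "stacked": 500}
h_m = homogeneity_index(acres)
```

A value near 1 indicates that one product group dominates the submarket, while four equally sized groups would give a value of 0.25.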

Finally, we estimate the minimum setup costs \({F}_{0}\) that are associated with entry into the product market for each crop type. We sum the total number of public “scientist years” (SY), as reported by the State Agricultural Experiment Stations (SAES) and the Agricultural Research Service (ARS), and divide this sum by the total number of reported projects to obtain an average SY per crop. Using data from Frey (1996) and Traxler et al. (2005), we multiply the average SY by the private industry cost per SY ($148,000) and adjust for inflation (Footnote 4).
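To show how these industry-level inputs enter the estimation, the following Python sketch builds the market-size regressor that appears in Eq. (6). The input values and function name are hypothetical, only the $148,000 cost per SY is taken from the text, and the inflation adjustment is omitted for brevity.

```python
import math

COST_PER_SY = 148_000.0  # private industry cost per scientist year (from the text)


def market_size_regressor(seed_cost_per_acre, acres_planted, avg_sy, h):
    """Compute 1 / (h * ln(S*y / F0)), the regressor in Eq. (6)."""
    sales = seed_cost_per_acre * acres_planted  # proxy for S_m * y_m
    setup_cost = avg_sy * COST_PER_SY           # proxy for F_0
    ratio = sales / setup_cost
    if ratio <= 1.0:
        raise ValueError("market size must exceed setup costs for a positive log")
    return 1.0 / (h * math.log(ratio))


# Hypothetical submarket: $100 per acre, 1 million acres, 2 SY per project, h = 0.5
x_m = market_size_regressor(100.0, 1_000_000, 2.0, 0.5)
```

The regressor shrinks toward zero as sales grow relative to setup costs, so the intercept of the fitted bound describes the large-market limit.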

4 Empirical Results and Discussion

The unadjusted measures of R&D concentration, both for the market-leading firm in each submarket and for the largest four firms in each submarket, are plotted against the size of each submarket in Fig. 3 for cotton seeds and Fig. 4 for soybean seeds. The raw data reveal a considerable amount of concentration across submarkets and across time with the four-firm concentration ratios in cotton seed exceeding 0.75 in every submarket and the four-firm concentration ratios in soybean seed exceeding 0.60 in every submarket. The single-firm R&D concentration ratios for both GM cotton and soybean seeds also appear to be non-decreasing in market size.

Fig. 3 R&D Concentration and Market Size in GM Cotton Seed

Fig. 4 R&D Concentration and Market Size in GM Soybean Seed

In order to estimate the lower bounds to R&D concentration, the raw data that are presented in Figs. 3 and 4 are transformed according to Eq. (5) and the lower bounds are estimated controlling for the degree of product homogeneity in each submarket. The baseline, two-stage estimation results are reported in Table 3, and the estimated lower bounds are illustrated in Figs. 5 and 6 for cotton and soybean seeds, respectively. Direct interpretation of the coefficients on the lower bound estimations can be difficult due to the logit transformation of the measure of R&D concentration. However, the first-stage intercept estimate \({\widehat{\theta }}_{0}\), when adjusted for product homogeneity and transformed by the inverse logit function, is equivalent to the theoretical lower bound to R&D concentration as market size becomes large. Since the dependent variable has been transformed, the null hypothesis for the interpretation of the estimated coefficient on the intercept term should be adjusted. The appropriate null hypothesis in this case is that the lower bound to R&D concentration converges to zero as market size becomes large when products are homogeneous. Under this hypothesis, the coefficient on the intercept term is approximately −9.210. Therefore, our test of statistical significance is not whether the estimated coefficient equals zero, but whether it is statistically different from −9.210. The coefficient on adjusted market size \({\widehat{\theta }}_{1}\), after being adjusted for product homogeneity and fixed setup costs, informs us as to whether R&D concentration is increasing (a negative parameter), decreasing (a positive parameter), or independent (insignificant parameter) in the submarket size.
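As a brief worked step, the benchmark value of −9.210 appears to be the transformation in Eq. (5) evaluated at a concentration ratio equal to the 0.0001 shift, which is our reading rather than a statement from the text:

$$\ln \left( {\frac{0.0001}{1 - 0.0001}} \right) = \ln \left( {\frac{0.0001}{0.9999}} \right) \approx - 9.210.$$

Similarly, since the market-size term in Eq. (6) vanishes as \(S_{m} y_{m}\) becomes large, the implied large-market bound can be recovered from the intercept by inverting the logit, up to the 0.0001 shift. The symbol \(\hat{R}_{1}^{\infty }\) is introduced here only for illustration:

$$\hat{R}_{1}^{\infty } = \frac{1}{{1 + \exp \left( { - \hat{\theta }_{0} h_{m}^{2} } \right)}}.$$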

Table 3 Lower Bound Estimations for GM Cotton and Soybean Seed

Fig. 5 Lower Bound Estimations for R&D Concentration in GM Cotton Seed

Fig. 6 Lower Bound Estimations for R&D Concentration in GM Soybean Seed

The first-stage estimates reveal that there exists a lower bound to R&D concentration that does not converge to zero as the market size becomes large. Moreover, the increasing lower bound is consistent with an industry in which increases in market size are accompanied by escalations in R&D in order to improve product quality and to block entry by additional firms. The results from Table 3 imply that the largest firm, in an infinitely large submarket, would account for 49.2% of the R&D in cotton seed and 78.6% of the R&D in soybean seed. Although R&D in the largest submarket in the latest time period (2006–2010) is already substantially concentrated in cotton, with a fitted R&D share for the largest firm of 42.6%, it is much less concentrated in soybeans at 26.7%, especially when compared to the theoretical predictions. This suggests that the substantial consolidation of R&D activity in soybean seed that followed the end of the sample in 2010 would be consistent with the estimated lower bounds and is not necessarily indicative of anticompetitive actions.

From the likelihood ratio tests of the second-stage results, we fail to reject the hypothesis that the first-stage residuals fit a two-parameter Weibull distribution for both cotton and soybean seeds. That is, we fail to reject the hypothesis that \(\mu =0\), so there is no evidence of the horizontal shift of the distribution that would arise from a poor fit of the residuals under exogenous R&D costs. The estimated shape parameter \(\widehat{\gamma }\) is less than two for both cotton and soybean seeds, which confirms the appropriateness of Smith’s (1985, 1994) two-step procedure and indicates a fair amount of clustering of observations around the lower bound to R&D concentration. The estimated scale parameter \(\widehat{\delta }\) is also consistent with a relatively narrow dispersion of first-stage residuals.

4.1 Robustness Checks

In order to test the validity of our estimations of the lower bounds to R&D concentration in cotton and soybean seed markets, we consider four robustness checks. First, we consider an alternative definition of submarket size that is based solely upon an estimate of the number of acres that were planted with GM varieties. Second, we explore the role of the product homogeneity index in our results by assuming homogeneous products, which would bias our results towards finding R&D costs to be exogenous. Third, we consider an alternative definition of minimum setup costs that is based upon public sector SY, which exceed private sector SY. The greater minimum setup costs should again bias our estimations towards failing to find endogenous R&D costs. Finally, we consider an alternative functional form for market size as proposed by Dick (2007), which allows for the lower bound to R&D concentration to change non-monotonically in market size.

The robustness checks, which are reported in Table 4, generally support our findings of endogenous R&D investments in GM cotton and soybean seed markets. The results reported in columns labeled “GM Market Size” differ from our baseline estimations in two dimensions. We limit the submarket size to only those acres that were planted with GM varieties and also restrict our sample to observations between 1996 and 2010 since GM varieties were not commercially available between 1991 and 1995. The results from these estimations confirm our baseline estimations of endogenous R&D investments in both sign and magnitude. The theoretical lower bounds to concentration implied by the first-stage estimates increase for GM cotton seeds from 0.492 to 0.545 as well as for GM soybean seeds from 0.786 to 0.953, with both fitted lower bounds increasing more rapidly under the alternate submarket definition.

Table 4 Robustness Checks on Lower Bound Estimations

In the second robustness check, we explore the measurement of the product homogeneity index upon our estimations of endogenous R&D investments. These results, which are reported in the “Homogeneity” columns, confirm the importance of firms’ being able to differentiate their products in order to capitalize upon investments in quality. When the product differentiation channel is “turned off”, the estimation results are consistent with exogenous R&D investments with the theoretical lower bounds to R&D concentration decreasing to 0.259 for cotton seeds and 0.099 for soybean seeds. The “Setup Costs” columns report results in which the minimum setup costs are assumed to be consistent with the public sector cost of R&D, which exceeds the private sector costs. Neither the theoretical nor the fitted lower bounds to R&D concentration substantially change for either seed market, which implies that the measurement of minimum setup costs is not driving the endogenous R&D investment results.

Finally, we consider an alternate specification to permit a nonlinear relationship between the lower bound to R&D concentration and submarket size. These results, presented in the “Quadratic Market Size” columns in Table 4, confirm the estimates of GM soybean seeds being characterized by endogenous R&D costs. Concentration levels within an infinitely-large market remain comparable across models. However, the parameter estimates for GM cotton seed in the quadratic model imply that the fitted lower bound is decreasing in submarket size, which is consistent with exogenous R&D costs.

4.2 Impacts of Mergers and Acquisitions

To examine the impact of M&A upon R&D investment and concentration, we adjust the FTA data to account for changes in ownership of intellectual property. If M&A activity results in intellectual property assets becoming more concentrated among a small number of firms, it is possible that the lower bound to R&D concentration increases not due to the presence of endogenous R&D costs, but instead due to this consolidation activity. We utilize company histories and Lexis-Nexis news releases to identify M&A activity and the effective merger dates, which we use to construct a measure of R&D concentration that accounts for ownership changes. Although completed independently, our list of changes in corporate ownership corresponds to the activity reported in Fuglie et al. (2011).
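One possible reading of this adjustment can be sketched as follows. The firm names, dates, and application counts are hypothetical, and the rule used here (attributing each application to its applicant's ultimate owner as of a reference date) is an assumption made for illustration rather than a description of the exact procedure applied to the FTA data.

```python
from collections import Counter
from datetime import date

# Hypothetical ownership changes: (acquired firm, acquiring firm, effective date).
CHANGES = [
    ("FirmC", "FirmB", date(1996, 1, 15)),
    ("FirmB", "FirmA", date(1998, 6, 1)),
]


def ultimate_owner(firm, as_of, changes):
    """Follow ownership changes effective on or before `as_of` up the chain."""
    current = firm
    visited = {firm}
    moved = True
    while moved:
        moved = False
        for acquired, acquirer, effective in changes:
            if acquired == current and effective <= as_of and acquirer not in visited:
                current = acquirer
                visited.add(acquirer)
                moved = True
                break
    return current


def single_firm_ratio(applicants, as_of=None, changes=()):
    """Share of field trial applications held by the leading firm in one
    submarket and period; ownership is adjusted only if `as_of` is given."""
    if not applicants:
        raise ValueError("no applications in this submarket and period")
    if as_of is not None:
        applicants = [ultimate_owner(f, as_of, changes) for f in applicants]
    counts = Counter(applicants)
    return max(counts.values()) / len(applicants)


# Hypothetical submarket: ten applications filed by three firms.
applicants = ["FirmA"] * 5 + ["FirmB"] * 3 + ["FirmC"] * 2
unadjusted = single_firm_ratio(applicants)                              # 0.5
adjusted = single_firm_ratio(applicants, date(2000, 12, 31), CHANGES)   # 1.0
```

In this toy example, the leading firm's share doubles once both ownership changes are applied, even though no additional R&D has taken place, which is the consolidation channel that the adjusted estimations are meant to separate from endogenous R&D costs.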

The estimations of the lower bounds to R&D concentration adjusted for M&A activity are reported for cotton and soybean seeds in Table 5. We continue to reject the null hypothesis of exogenous R&D costs for GM cotton and soybean seed markets, as the estimated parameters imply a theoretical lower bound to R&D concentration of 0.411 for cotton seed and 0.509 for soybean seed; both are significantly different from zero. However, as is illustrated in Figs. 7 and 8 for cotton and soybean seed, respectively, the fitted lower bounds to R&D concentration after accounting for M&A activity are decreasing in market size. This inverse relationship is a necessary, but not sufficient, condition for the markets to be characterized by exogenous fixed-cost investments in R&D.

Table 5 Lower Bound Estimations for GM Cotton and Soybean Seeds (Mergers and Acquisitions Adjusted)
Fig. 7 Lower Bound Estimations for R&D Concentration in GM Cotton Seed Adjusted for Mergers and Acquisitions

Fig. 8 Lower Bound Estimations for R&D Concentration in GM Soybean Seed Adjusted for Mergers and Acquisitions

Equivalence tests between the parameter estimates with and without adjusting for changes in intellectual property ownership are reported in Table 6. We find that accounting for M&A activity significantly decreases the fitted values for both the intercept and the relationship between adjusted market size and concentration in both cotton and soybean seed markets, which runs contrary to the results for corn seed from Anderson and Sheldon (2017). Whereas the decrease in the slope estimate implies a downward-sloping relationship between market size and R&D concentration, the decrease in the intercept implies that the theoretical lower bound to R&D concentration inclusive of M&A activity is lower than it is without accounting for this consolidation. Figures 7 and 8 reveal that the lower theoretical bound is more consistent with the observed concentration levels, such that the estimations that account for M&A activity are likely to be a better description of the relationship between market size and R&D concentration.

Table 6 Impact of Mergers and Acquisitions upon R&D Concentration

Additional inspection of Figs. 7 and 8 reveals that the smaller- and medium-sized submarkets were more likely to be affected, in terms of the concentration of R&D activity, by the M&A activity prior to 2010. The resulting increase in concentration more closely aligned the levels of R&D concentration in these submarkets with those in the largest submarkets. These results imply that M&A activity had minimal effect upon the largest submarkets, but may have adversely affected the competitive fringe of firms that engaged in R&D in the smaller submarkets.

Although it is not addressed directly in our estimations, the recent wave of mergers and acquisitions in the seed and agrochemical industries is worth considering in light of our econometric results: the merger between Dow and DuPont (completed in September 2017), the acquisition of Syngenta by ChemChina (completed in June 2017), and the acquisition of Monsanto by Bayer (completed in June 2018). It should first be noted that ChemChina’s acquisition of Syngenta would not affect our results since ChemChina was not engaged in releasing GM varieties for field trials in the U.S. during this time period.

Second, the U.S. Department of Justice (DOJ) Antitrust Division did not raise concerns with regard to the Dow-DuPont merger with respect to their assets in seed markets in the U.S. Although DuPont conducted a small number of field trials in cotton seed in the early years that followed the introduction of GM varieties, accounting for their merger with Dow would affect only the first time period (1988–1995) and would not change the R&D concentration ratio of the market-leading research firm in any submarket. For GM soybean varieties, the “western core” submarket between 1988 and 1995 would experience an increase in R&D concentration for the leading firm by 2.2% from this merger, and the “mid-Atlantic” submarket in the 2006–2010 period would experience an increase of 15.9%. Given the small number of submarkets in which the single-firm R&D concentration ratio would change, it is unlikely that the estimated lower bounds to R&D concentration in soybean seeds would be affected by the Dow-DuPont merger in 2017.

On the other hand, the acquisition of Monsanto by Bayer raised concerns in the DOJ with respect to both cotton and soybean seeds. As is summarized in MacDonald (2019), Bayer was required to divest its soybean seed business and intellectual property assets—as well as the majority of its cotton seed business—to BASF in order to gain the DOJ’s approval.

If these divestments had not been required, and if we applied the acquisition retroactively to field trial applications, we would observe an increase in the single-firm R&D concentration ratio across cotton submarkets for both the 2001–2005 and 2006–2010 sample periods. Although some of these increases would have been nominal, the largest submarket for cotton seed (Texas) would have experienced an increase in its single-firm R&D concentration ratio from 48.6% in 2001–2005 (52.5% in 2006–2010) to 69.2% (86.9%) following the acquisition. Similar increases in concentration would be observed in the southeastern, Delta, western, and mid-Atlantic submarkets. Although the increase in soybean seed R&D concentration would not have been as substantial, it would have been concentrated in the largest submarkets.
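The Texas figures can be read as a simple pro forma addition of shares. As a rough sketch, assume that the leading firm in the submarket is one of the two merging parties and that the parties' shares of field trial applications simply add; the symbols $C_1$ and $s$ are notation introduced here for illustration and do not appear in the estimations.

```latex
\begin{align*}
C_1^{\text{post}} &= C_1^{\text{pre}} + s^{\text{partner}} \\
\text{2001--2005:}\quad s^{\text{partner}} &= 69.2\% - 48.6\% = 20.6 \text{ percentage points} \\
\text{2006--2010:}\quad s^{\text{partner}} &= 86.9\% - 52.5\% = 34.4 \text{ percentage points}
\end{align*}
```

Under these assumptions, the other merging party would have accounted for roughly one fifth of Texas cotton field trial applications in the earlier period and roughly one third in the later period.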

The completed divestiture to BASF—a firm without a substantial R&D presence in these crop varieties—implies that the estimated lower bounds to R&D concentration that accounted for both the acquisition and divestment would not be significantly different from those that are reported here.

5 Conclusion

Using geographic variation in R&D activity, we analyze the markets for genetically modified (GM) cotton and soybean seed varieties in order to determine whether increases in market size lead to additional firm entry or whether existing firms can preclude additional entry by escalating R&D in order to improve product quality. With the use of field trial applications (FTA) data (field trials are an intermediate stage in the R&D process for GM crops), we estimate the lower bounds to R&D concentration for cotton and soybean seeds. Our results imply that both GM cotton and soybean markets are characterized by endogenous R&D investments; our results are robust to alternate definitions of market size and setup costs. Accounting for merger and acquisition activity, which is argued to increase the concentration of ownership of intellectual property assets, increases the fitted lower bounds to R&D concentration for both cotton and soybean seeds as expected, but reduces the theoretical lower bounds. These results suggest that the consolidation that occurred prior to 2010 was consistent with the empirical predictions of concentration in R&D activity.

Our empirical results are of interest in the face of continuing concerns with regard to concentration in agricultural inputs—especially in light of the recent acquisition of Monsanto by Bayer and the merger between Dow Chemical and DuPont. Although we do not estimate the effects of this consolidation directly, an examination of the underlying concentration data reveals that—absent the divestment of Bayer’s cotton and soybean seed businesses and assets to BASF—the observed R&D concentration following Bayer’s acquisition of Monsanto would have been substantially higher in the largest submarkets. Conversely, the merger between Dow and DuPont—which were the second and third largest firms in terms of field trials between 2006 and 2010—would not affect our lower bound estimations as their combined market share of R&D activity remained below that of Monsanto (Bayer).

This assessment of the likely effects of the recently completed M&As in the agrochemical and seed industries does not consider the implications for the global competitive landscape in agricultural inputs. Although the European Union (Coublucq, Kovo, and Valletti, 2023) and Brazil (Lenzi, 2023) also imposed structural remedies as a condition for approving the recent consolidation, the effects of this consolidation are likely to be felt in other international markets for GM cotton and soybean seeds, especially as China relaxes its policies towards GM crops and products.

A key identifying assumption of our empirical model is that the R&D investment that is associated with the field trials that are conducted within the U.S. (and that are represented by the FTA) can be recouped solely from the U.S. market. If the R&D investments from the largest submarkets are recouped across a larger international market, the results that identify R&D investments as being endogenous would be unaffected. The major caveat would arise if R&D investments that are made in smaller, less concentrated markets were recouped internationally, so that the predicted lower bound to R&D concentration “switches” from increasing to decreasing and we would be unable to reject R&D costs as exogenous. Given the regulatory hurdles to gaining approval across different countries, which imply that the minimum setup costs are also likely to be higher than measured, this scenario is unlikely.