Introduction

It is widely believed that efficient regular production systems (e.g. firms or economies) are characterised by the division of labour. Indeed, this principle is easily communicated when looking at the alternative: a subsistence economy, where each person produces the goods they need without trading. Although the absence of trade reduces transaction costs, these gains are certainly outweighed by the losses from not reaping the benefits of specialisation; in particular, learning effects and accumulation.

Although this principle seems natural for the production of regular goods and services, there is little agreement whether this insight can be simply transferred to the production of “scientific goods”. There are many arguments to support the scepticism. Some of them are philosophical (for example, the postulated “unity of research and teaching” by Wilhelm von Humboldt, one of the most important university reformers at the beginning of the nineteenth century in Germany), while some are at least implicitly based on efficiency arguments. For example, it is sometimes assumed that economies of scope exist (Johnes 1997; Cohn et al. 1989) which would render specialisation inefficient.

Many authors have since analysed the question of increasing returns to scale (IRS) and economies of scope in science (among many others, Lloyd et al. 1993; Johnes 1997; de Groot et al. 1991; Dundar and Lewis 1995). Although the results with respect to economies of scope are mixed, the majority of analyses, at least at the level of universities, demonstrate the existence of IRS. However, these analyses suffer from two limitations. First, most of these studies use aggregated university-level data. Second, the methodologies employed build on the estimation of parametrically specified cost functions, which not only impose potentially restrictive functional form assumptions but also assume away the possibility that some research units may be inefficient. This may lead to estimation biases.

So far, only a few papers have analysed production returns at lower levels of aggregation. One example is the paper by Bonaccorsi and Daraio (2005), who use local regression techniques and do not find clear-cut evidence of IRS. Brandt and Schubert (2013) use parametric regression techniques and find increasing returns at the university level and decreasing to constant returns at the level of the research group with respect to publication output. This analysis is particularly interesting because it tries to disentangle the production returns at a micro and a macro level of analysis. However, their approach is limited not only by the parametric specification, but also by the assumption that scientific output is unidimensional, taking publications as an adequate proxy.

The latter two problems can be avoided by the use of non-parametric envelopment estimators. Further, we will show that creative use of these estimators can identify returns to scale characteristics of the production function. Therefore, in this paper, we integrate the neoclassical or parametric understanding of scale economies into a non-parametric framework of efficiency estimation. We analyse whether IRS exist or not.

If IRS are present, two implications for efficient organisation follow. First, (at least some) research groups would benefit from an increase in size. Second, the specialisation of research groups in certain tasks becomes desirable, because a division of labour leads to increased output of the overall system.

The remainder of this article is organised as follows: in “The nature of scientific production” section we review the literature on returns to scale in science. We then show under which conditions specialisation is an optimal strategy. “Methodology” section describes our methodology. “Results” section presents the estimation results concerning the returns to scale of scientific production functions. “Conclusions and limitations” section concludes.

The nature of scientific production

Returns to scale in scientific production

The literature on returns to scale is largely based on aggregated university level data, where most authors find (at least for some outputs) IRS. This means that groups with IRS can produce disproportionately more if their inputs are increased.

For example, Worthington and Higgs (2011) find ray economies of scale up to 120 % of the mean in a multi-input, multi-output setting. Comparable results are found by de Groot et al. (1991), Sav (2004), Laband and Lentz (2003), Johnes et al. (2008), as well as Koshal and Koshal (1995). Glass et al. (1995a, b) observe ray economies, but also find product-specific economies of scale for undergraduate teaching. Johnes (1999) and Izadi et al. (2002) do not detect ray economies of scale but product-specific economies of scale for undergraduate teaching, postgraduate teaching and research. This is in line with Brandt’s and Schubert’s (2013) result that IRS exist at the level of the university.

However, Brandt and Schubert (2013) also analysed the returns to scale at the level of the research group and found no evidence of IRS. Indeed, the results at the micro level are less clear, probably in part because of the divergence in methods and datasets, as such analyses are necessarily based on survey data or case studies.

van Tunzelman et al. (2003), who reviewed the existing literature on the level of research groups for size effects on research group productivity, conclude as follows: Evidence across different studies indicates that there appears to be a critical mass threshold for group size, at least in some scientific fields, which hovers around six to eight people. This ‘critical mass’ threshold may differ among major subject fields, as individual studies show, but no comprehensive picture has emerged so far. A study by Carayol and Matt (2004), focusing on 80 laboratories of the Louis Pasteur University, comes to similar conclusions. With regard to the relationship between research group size and the size of the respective department, empirical findings indicate that research groups of sufficient size are able to function well regardless of the size of the department or the university they are affiliated with (van Tunzelman et al. 2003). This latter result, in our terms, indicates the absence of agglomeration effects, while there may be IRS for very low levels of input that turn into decreasing returns to scale (DRS), if inputs increase. The latter argument would, for example, result from an s-curved cost function. This is congruent with Johnston (1994) to some extent, who, on the level of universities, finds economies of scale for low output levels and diseconomies of scale for high output levels. The results are mixed, at this level in particular. Adams and Griliches (2000) find constant returns to scale (CRS), which implies that size does not matter at all. The same conclusion is drawn by Narin and Hamilton (1996) and by Bonaccorsi and Daraio (2005) for Italian CNR units.

However, all of the analyses were based either on a cost function or a production function approach. This has the advantage that relatively simple regression techniques can be employed, but the disadvantage that these techniques suffer from parametric assumptions (with the exception of Bonaccorsi and Daraio (2005), who use local regression techniques) and eliminate any potential inefficiency that the groups might exhibit (with the exception of Johnes (1999) and Izadi et al. (2002), who use parametric frontier estimation). Both assumptions can lead to severe estimation bias if they are not true. We therefore reinvestigate the topic of IRS at the research group level using more flexible non-parametric frontier estimators, which allow the research groups to display inefficiency in their use of resources.

As argued, the importance of IRS in scientific production derives from two aspects. First, and relatively obviously, groups with IRS can produce disproportionately more if their inputs are increased. More precisely, if all inputs are increased by a factor λ > 1, then the outputs increase by a factor δ > λ (cf. Brandt and Schubert 2013 for details). Thus, such groups would benefit from growth. More subtly, if IRS exist among research groups (and total inputs over all groups are fixed), the aggregate outputs of all research groups would increase if they specialised in certain outputs, implying, for example, that there are graduate teaching- or publication- or transfer-oriented research groups. Without going into too much technical detail, we will explain this latter point using an illustrative example.
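As a worked illustration of this definition, consider a single-input power function. This functional form is our assumption for exposition only and is not taken from the cited literature. Let output be \( f(x) = x^{a} \) with \( a > 0 \). Scaling the input by \( \lambda > 1 \) gives

\[ f(\lambda x) = (\lambda x)^{a} = \lambda^{a} f(x), \qquad \text{so that } \delta = \lambda^{a}. \]

Hence \( \delta > \lambda \) (IRS) if and only if \( a > 1 \), while \( a = 1 \) gives \( \delta = \lambda \) (CRS) and \( a < 1 \) gives \( \delta < \lambda \) (DRS). For instance, with \( a = 2 \), doubling the input quadruples the output.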

Specialisation in science

Many authors highlight that scientific production is a process in which manifold inputs (e.g. capital equipment, trained scientists, etc.) are transformed into various outputs (e.g. publications, patents, knowledge transfer, etc.) (Rousseau and Rousseau 1997; Nagpaul and Roy 2003; Warning 2004; Johnes 2006). This is corroborated for example in Jansen et al. (2007), Schmoch et al. (2010) and Schubert (2009), who show that distinct profiles of production are present in scientific production. In particular, these authors find that research groups fall into characteristic clusters which focus on publication, graduate teaching or transfer activities, which closely resemble the three missions of universities. Thus, from a descriptive point of view, scientific research groups tend to specialise in certain activities.

From a normative point of view, of course, the question is whether specialisation is a desirable feature in the sense that it makes the best possible use of the available resources. It turns out that the question of optimality is closely linked to the characteristics of the returns to scale.

To enhance the understanding of this question of specialisation, we present a very simple illustrative framework consisting of just two scientific units which have one input and can use it to produce either of two outputs. Each output is produced according to the same production function. The units are identical in their technology and input equipment. The question asked is the following: Should the first unit, UNIT 1, produce Output 1 but not Output 2 (and UNIT 2 vice versa), or should each unit produce a bit of both, i.e. should they specialise or not?

Looking at Fig. 1, the answer to the question of the optimality of specialisation crucially depends on the shape of the production function. Here, it is assumed that the production functions display IRS (they are bowed towards the y-axis). In this case, increasing the input by a constant factor increases the output by a strictly larger factor (see also “Non-parametric efficiency estimators” section). For example, if it is assumed that UNIT 1 and UNIT 2 each decide to spend 50 % of their input on producing Output 1 and 50 % on producing Output 2, then the aggregate production of Output 1 and Output 2 is \( Y_{11}^{US} + Y_{12}^{US} \) and \( Y_{21}^{US} + Y_{22}^{US} \), respectively. If, instead, UNIT 1 specialises in Output 1 and UNIT 2 in Output 2, then the aggregate production would be \( Y_{11}^{S} \) for Output 1 and \( Y_{22}^{S} \) for Output 2. Since the production functions are convex, we have \( Y_{11}^{US} + Y_{12}^{US} < Y_{11}^{S} \) and \( Y_{21}^{US} + Y_{22}^{US} < Y_{22}^{S} \). Therefore, specialisation would increase aggregate output. Conversely, if the production functions were concave, then specialisation would be uniformly detrimental, calling for a “generalist” strategy. If the production functions were linear, then the specialisation strategy would not have any effect.
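The two-unit argument can be checked numerically with a minimal sketch. The three production functions below (quadratic, linear, square root) and the unit endowment of 1 are illustrative assumptions chosen by us, and the function names are hypothetical.

```python
def aggregate_outputs(f, endowment=1.0):
    """Aggregate (Output 1, Output 2) for two identical units.

    Returns a pair: (unspecialised totals, specialised totals).
    """
    half = endowment / 2.0
    # Unspecialised: both units split their input 50/50 across the outputs,
    # so each output is produced twice at the half-input level.
    unspecialised = (2 * f(half), 2 * f(half))
    # Specialised: UNIT 1 devotes all input to Output 1, UNIT 2 to Output 2.
    specialised = (f(endowment), f(endowment))
    return unspecialised, specialised


def irs(x):
    return x ** 2      # convex: increasing returns (assumed form)


def crs(x):
    return 2.0 * x     # linear: constant returns (assumed form)


def drs(x):
    return x ** 0.5    # concave: decreasing returns (assumed form)


for name, f in [("IRS", irs), ("CRS", crs), ("DRS", drs)]:
    unspecialised, specialised = aggregate_outputs(f)
    print(name, "unspecialised:", unspecialised, "specialised:", specialised)
```

Under the convex function the specialised totals exceed the unspecialised ones, under the linear function they coincide, and under the concave function they fall short, which mirrors the three cases described above.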

This illustrative model is obviously quite simplified. But it should be clear that the results are fairly robust with respect to several generalisations. First, if the production technologies used by each unit are not identical, then this affects only the question of who should specialise in which dimension (but not whether they should specialise at all). Second, it is unimportant whether only two or a multitude of units exist. Third, it is unimportant whether we have two or more outputs. Fourth, it does not matter whether the units have identical input endowments.

To summarise, units should specialise in those outputs where their production functions exhibit IRS. With CRS, any specialisation strategy is equally efficient, while decreasing returns argue against specialisation. In “Results” section we will show that the production functions are characterised either by increasing returns or CRS. Before we do so, we outline the theory about returns to scale in a neoclassical and non-parametric understanding, where the focus will be on the neoclassical one because of the results of this subsection.

The hypotheses

The guiding question of this paper is whether the scientific production of research groups is characterised by IRS in the neoclassical sense. This is the case when the production possibility set is non-convex. Because it was concluded in “Specialisation in science” section that scientific units should pursue a specialisation strategy if there are IRS in at least some regions and at least CRS everywhere else, a sufficient condition is that the production possibility set is strictly non-convex at some points and weakly non-convex everywhere else. We thus hypothesise that:

H0 (decreasing or constant returns to scale everywhere)

The production set is weakly convex everywhere.

H1 (increasing returns somewhere and constant returns everywhere else)

The production set is strictly non-convex at least at some points and weakly non-convex everywhere else.

Methodology

If H1 is true, IRS exist. We test this hypothesis using non-parametric efficiency estimators. These are explained below.

Non-parametric efficiency estimators

The frontier model is explained in the following using the case with one input and one output. This is for expositional reasons only, since the general frontier model can handle technology frontiers with arbitrary dimensionality. One of the most prominent estimators is the data envelopment analysis (DEA) estimator, which was originally proposed by Charnes et al. (1978). The major limitation of this estimator is that it can only deal with situations where the production function is characterised by either CRS or DRS. A production function that is either CRS or DRS can also be said to exhibit non-increasing returns to scale (NIRS). If the production function exhibits IRS as in Fig. 1, the DEA estimator is biased. Then the free disposal hull (FDH) estimator must be used. The latter estimator is less efficient but unbiased irrespective of the returns to scale. This property allows us to use it to test our hypotheses H0 versus H1.

Fig. 1 Specialisation benefits and curvature of the production function

We now introduce the DEA estimator alongside the general frontier model. We leave out mathematical formulations as much as possible. DEA is the most commonly used estimator in non-parametric efficiency estimation. The idea behind efficiency analysis is that a so-called decision-making unit (firms, persons, regions, or research groups) commands a certain set of inputs to produce certain outputs. The efficiency model assumes that, given a certain amount of input, there is a technological limit to the production of outputs, i.e. a unit cannot produce more than this output. The union of all these maximum points that correspond to a specific input amount is called the theoretical frontier. Units falling short of this theoretical frontier are inefficient. Inefficiency is usually quantified by radial measures. If the theoretical frontier was always observed, the estimation of inefficiency would be a trivial task. However, this is usually not the case. Instead, only a given number of units are observed for which we have sample values of inputs and outputs. Using these data points, DEA is one way of estimating the theoretical frontier from the observed data.

In particular, DEA constructs the estimated frontier as the smallest convex hull “enveloping” all the data points in a sample. In fact, there are several variants, which will be explained later on, but for expositional reasons we start with the so-called VRS DEA frontier (bold line). Consider the one-input, one-output case depicted in Fig. 2, where the true frontier is given as \( y = \sqrt x \). The input \( x \) is taken to be non-random, and inefficiency is generated by \( y^{\text{obs}} = y \cdot \exp \left(- {\left| u \right|} \right) \) with \( u \sim N(0,1) \). The small circles mark observed sample coordinates for the units. The smallest convex hull that envelops all the data points is the DEA frontier. Obviously, the DEA frontier does not coincide with the theoretical frontier, but if more and more units are observed, the DEA frontier will converge to it (Kneip et al. 1998). Using this estimated frontier, it is easy to define a measure of inefficiency. With regard to input, this is simply the input used divided by the input needed to provide this output level. Looking at Fig. 2, and focusing on the inefficient point D, this is given by the ratio of the length of line segment AD divided by the length of the line segment AC, whereas the true but unobserved inefficiency measure is \( \left| {AD} \right|/\left| {AC} \right| \).

Fig. 2 Efficiency and productivity estimation in frontier models

Three things are important to note. First, this input-inefficiency measure may take values of 1 and above. Second, a value of 1 indicates that the unit is efficient, because then the unit is on the frontier. Any value >1 indicates inefficiency. Thus low values are desirable. Third, note that DEA also works for the multiple-input, multiple-output case. The interpretation of the measures remains the same.

In fact, there is not just one DEA estimator but several variants. The estimator represented by the bold line in Fig. 2 is called the VRS estimator. The VRS estimator is the most general estimator. The most restricted estimator is the CRS estimator, which is just a straight line passing through the origin and the outermost observation. In Fig. 2 it is indicated by the dotted line. It is also clear that it largely overestimates the true frontier if this exhibits IRS as in our case.

The NIRS estimator (dash-dotted line) is like the VRS estimator but additionally includes the origin. Effectively, this makes it a compromise between the VRS and the CRS estimators. It is identical to the latter up to the first observation on the frontier that defines the CRS frontier, and beyond that it is identical to the VRS frontier.
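The three DEA variants can be sketched for the one-input, one-output case using the data-generating process described for Fig. 2. This is a minimal illustration under stated assumptions: the input grid, the seed and all function names are hypothetical, and a brute-force search over pairs of observations stands in for the linear programme that DEA software would normally solve. The variant that additionally includes the origin corresponds to non-increasing returns to scale (NIRS).

```python
import math
import random


def convex_min_input(y0, points):
    """Smallest input yielding at least y0 on the convex hull of `points`.

    In two dimensions the optimum uses at most two observations, so it is
    enough to check single points and pairs that bracket y0.
    """
    best = float("inf")
    for xj, yj in points:
        if yj >= y0:
            best = min(best, xj)
    for xj, yj in points:
        for xk, yk in points:
            if yj < y0 < yk:
                t = (y0 - yj) / (yk - yj)
                best = min(best, xj + t * (xk - xj))
    return best


def efficiency_scores(points):
    """Input scores (input used / input needed) as (VRS, NIRS, CRS) per unit."""
    best_ratio = max(y / x for x, y in points)   # slope of the CRS ray
    with_origin = points + [(0.0, 0.0)]          # NIRS: hull also spans the origin
    scores = []
    for x, y in points:
        vrs = x / convex_min_input(y, points)
        nirs = x / convex_min_input(y, with_origin)
        crs = x / (y / best_ratio)
        scores.append((vrs, nirs, crs))
    return scores


rng = random.Random(0)                           # arbitrary seed (assumption)
inputs = [float(x) for x in range(1, 26)]        # non-random inputs (assumed grid)
# True frontier y = sqrt(x); inefficiency enters as exp(-|u|) with u ~ N(0, 1).
sample = [(x, math.sqrt(x) * math.exp(-abs(rng.gauss(0.0, 1.0)))) for x in inputs]

for (x, y), (vrs, nirs, crs) in zip(sample, efficiency_scores(sample)):
    print(f"x={x:4.0f}  y={y:5.2f}  VRS={vrs:6.2f}  NIRS={nirs:6.2f}  CRS={crs:6.2f}")
```

Because the CRS technology contains the NIRS technology, which in turn contains the VRS technology, each unit's CRS score is at least as large as its NIRS score, which is at least as large as its VRS score, and all scores are at least 1.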

Testing for IRS using frontier estimators

All DEA estimators are too restricted to deal with IRS production functions, because the production possibility set must be convex for DEA to be applicable. The more general estimator which can also handle non-convex production possibility sets is the FDH estimator. Now, under H0, both the FDH and the NIRS-DEA estimator are consistent, while only the FDH estimator is consistent under H1. This implies that, under H0, the two estimators should not differ markedly, while, if H1 were true, they would. Thus our hypothesis from “The hypotheses” section implies:

H0a

Both the FDH estimator and the NIRS-DEA estimator are consistent.

H1a

Only the FDH estimator is consistent.
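The intuition behind H0a and H1a can be illustrated with noise-free points on two assumed frontiers, a concave one (\( y = \sqrt x \)) and a convex one (\( y = x^{2} \)). The sketch below is only an illustration of why the two estimators agree in one case and diverge in the other. It is not the bootstrap test used in this paper, which is described in the Appendix, and all names are hypothetical.

```python
def fdh_score(unit, points):
    """FDH input score: own input over the smallest input among
    observed units that produce at least as much output."""
    x0, y0 = unit
    return x0 / min(x for x, y in points if y >= y0)


def nirs_score(unit, points):
    """NIRS-DEA input score in the one-input, one-output case,
    computed on the convex hull of the data plus the origin."""
    x0, y0 = unit
    hull_points = points + [(0.0, 0.0)]
    candidates = [x for x, y in points if y >= y0]
    for xj, yj in hull_points:
        for xk, yk in hull_points:
            if yj < y0 < yk:
                t = (y0 - yj) / (yk - yj)
                candidates.append(xj + t * (xk - xj))
    return x0 / min(candidates)


def mean_gap(frontier):
    """Average difference between NIRS-DEA and FDH scores for
    efficient units placed exactly on the given frontier."""
    points = [(float(x), frontier(float(x))) for x in range(1, 11)]
    gaps = [nirs_score(p, points) - fdh_score(p, points) for p in points]
    return sum(gaps) / len(gaps)


def concave_frontier(x):
    return x ** 0.5    # no IRS region (assumed form)


def convex_frontier(x):
    return x ** 2      # IRS everywhere (assumed form)


print("mean gap, concave frontier:", mean_gap(concave_frontier))
print("mean gap, convex frontier: ", mean_gap(convex_frontier))
```

On the concave frontier both estimators classify every unit as efficient, so the gap vanishes. On the convex frontier the NIRS-DEA hull cuts across the non-convex region and labels efficient units as inefficient, while FDH does not.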

If H1a is corroborated, then the production function displays IRS at least somewhere. However, this does not mean that it has IRS everywhere. For example, it may also be characterised by DRS in other regions. This would be the case if the production function had IRS for low output levels and DRS for high output levels, as claimed, for example, by Johnston (1994). Under these circumstances, the finding that there are regions with IRS would imply practically nothing with respect to the optimal size of the research groups or their specialisation. We thus have to rule out that there are regions with DRS. If such regions can be ruled out as well, then the production function is characterised everywhere by either IRS or at least CRS.

Therefore, in the parlance of frontier estimation, we have to additionally show that the production possibility set is weakly non-convex (CRS) in the regions where it is not strictly non-convex (IRS), i.e. the efficient boundary is described by CRS, in which case specialisation is at least not detrimental:

H0b

Both the CRS–DEA estimator and the NIRS–DEA estimator are consistent.

H1b

Only the NIRS–DEA estimator is consistent.

In summary, based on H0a–H1b, we can distinguish four cases. First, when H0a is rejected and H0b is not rejected, then the production possibility set is strictly non-convex at least somewhere, but there is no evidence that it is strictly convex in other places. Consequently, specialisation will increase overall output. Second, when neither H0a nor H0b is rejected, there is no evidence to refute the hypothesis that the efficient boundary exhibits CRS everywhere. In this case, both the degree of specialisation and group size are irrelevant. Third, when H0a is not rejected but H0b is, then there is evidence that the production possibility set is strictly convex at least somewhere and exhibits CRS elsewhere. In this case, specialisation strategies lower overall output. Fourth, if both H0a and H0b are rejected, there is evidence that the production possibility set has regions where it is strictly non-convex and others where it is strictly convex. In this case, general recommendations cannot be derived with respect to size and specialisation.
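The four cases can be written as a small lookup, which may help when reading the test results. This is a sketch only: the function name and the wording of the returned labels are ours, while the mapping itself follows the case distinction above.

```python
def interpret(reject_h0a, reject_h0b):
    """Map the two test decisions to the case described in the text."""
    if reject_h0a and not reject_h0b:
        return "IRS somewhere, no evidence of DRS: specialisation increases overall output"
    if not reject_h0a and not reject_h0b:
        return "CRS everywhere not refuted: specialisation and group size are irrelevant"
    if not reject_h0a and reject_h0b:
        return "DRS somewhere, CRS elsewhere: specialisation lowers overall output"
    return "IRS and DRS regions coexist: no general recommendation"


for h0a in (True, False):
    for h0b in (False, True):
        print(f"reject H0a={h0a!s:5}  reject H0b={h0b!s:5}  ->  {interpret(h0a, h0b)}")
```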

The data

In this analysis we use original data from a large online survey (data from 2007) conducted as part of a research project funded by the German Research Foundation (DFG). The sample consists of 473 research units from the disciplinary fields of astrophysics, nanotechnology, biotechnology, and economics. This corresponds to a return rate of ~25 %, as 1908 research units received a questionnaire. This selection of fields guaranteed the inclusion of basic research fields from the natural sciences (astrophysics), applied disciplines from the natural sciences (biotechnology and nanotechnology), and a field which has both applied and basic research characteristics from the social sciences (economics). Astrophysics makes up about 7 % of the sample, nanotechnology about 42 %, biotechnology 22 % and economics 29 %, which corresponds roughly to the shares in the population. The main objective of the survey was to determine the effects of different university governance models on research efficiency. Because the low sample share of astrophysics units does not allow reliable estimation, we excluded this group from our analysis.

The survey includes information on the inputs and outputs of a research group, as well as its organisational or governance setting. This paper focuses on the input and output data.

Against the background of the multidimensionality of outputs, we collected a variety of different activity indicators, which we regarded as scientific outputs. In total, we collected the following 11 measures: SCI publications per scientist, citations per publication, conference articles per scientist, fraction of international co-publications, professorial job offers per scientist, expert reports for companies per scientist, cooperation with companies per scientist, membership in advisory boards per scientist, number of doctoral theses per scientist, number of state doctoral theses per scientist, editorships per scientist.

Definition of the production possibility set

As argued by Jansen et al. (2007), Schmoch et al. (2010), Schmoch and Schubert (2009), and Schubert (2009), scientific outputs should at least consider the dimensions of knowledge generation, graduate teaching, and knowledge transfer.

We assume that each of these dimensions can be appropriately represented by a single indicator. Specifically, the following indicators were chosen: number of SCI (bio and nanotech) or SCOPUS (economics) publications (knowledge generation), number of completed doctoral theses (graduate teaching), and number of advisory services for companies plus cooperation with companies (knowledge transfer). It was also assumed that the only relevant input is the number of scientists. Thus our production possibility set is four dimensional (1 input and 3 outputs).

A major advantage of non-parametric efficiency estimators is that they can easily deal with multidimensionality, because the assumed production possibility set is not restricted with respect to the included inputs and outputs. However, for us there are at least two relevant problems concerning this class of estimators.

Firstly, because non-parametric estimators make few assumptions, which could help identification, they suffer from the curse of dimensionality. This means that convergence becomes very slow as the dimensionality of the production possibility set increases. Thus, if we include multiple outputs, we also lower the precision of estimation drastically. To deal with this, apart from an analysis where all the outputs (see next section) are considered simultaneously, we also run the analyses for each dimension separately. This also provides a more detailed picture of the production possibility set.

Secondly, non-parametric efficiency estimators are very sensitive to measurement error, outliers, and model specification. Since the general classification of outputs into knowledge generation, graduate teaching, and knowledge transfer (see “Specialisation in science” section) provides only some insights into a sensible choice of indicators, but is far from being a clear-cut definition, we used two alternative sets of indicators.

The second set includes the number of citations of the SCOPUS/SCI/SSCI publications, the number of completed habilitation theses, and the number of memberships in scientific advisory boards, where knowledge transfer more commonly refers to policy making than to industry.

Even though the first set is a better choice, at least in the authors’ opinion, if the results stemming from the second are not too different, then the results can be deemed quite robust.

In order to enhance the readability of this paper, we discuss the exact testing procedures in the Appendix. The methodologies build on complex bootstrap algorithms and have been partly developed and described in Simar and Wilson (2001, 2002). We now turn to the results.

Results

Some summary statistics for the core variables are presented in Table 1.

Table 1 Core input and output variables

The estimation results are presented in Table 2. We see that H0a is rejected and H0b is not rejected for every discipline. This conclusion also holds with the alternative operationalisation of the input–output set. Thus, we can be more confident that our results are not only due to the specific model specification and should also hold in a wider context.

Table 2 Test results of the shape of the production set (full output list)

According to Table 3, this means that the production possibility set is strictly non-convex somewhere and weakly non-convex elsewhere; or in the language of economics, the efficient boundary exhibits IRS somewhere and at least CRS elsewhere. Therefore, specialisation will increase overall output. It also means that larger research groups can make better use of their resources.

Table 3 Case definition and recommendations

When separating this analysis by dimension (Tables 4, 5), and taking a closer look at our primary definition, we find a comparable structure, especially in the case of transfer (H0a is rejected, but H0b is not). This also holds for graduate teaching, except for the case of biotechnology, where H0a is not rejected. In knowledge generation, H0a is rejected only for biotechnology but not for the other disciplines. The latter finding is in line with the results obtained in Brandt and Schubert (2013), who showed that, in the case of knowledge generation as measured by publications, IRS cannot be detected on the group level.

Table 4 Results of the shape of the production set for H0a compared with H1a (by output dimension)
Table 5 Results of the shape of the production set for H0b compared with H1b (by output dimension)

Looking at the interesting cases where H0a is not rejected (remember that this implies there is no evidence for regions along the efficient boundary which exhibit IRS), Table 5 indicates that there is also no evidence for DRS. This means that, although specialisation will not increase output, it will at least not decrease it.

However, using the second definition to test for robustness, we also see some differences in Table 4. For example, in the case of economics, we can no longer detect IRS, even though we found them in the full model. This may be partly due to the fact that Table 2 presents a joint test on all the output dimensions, while Table 4 shows three separated tests and therefore has lower power. Further, the chosen indicators on graduate teaching and knowledge transfer are probably not a very good choice, because they are comparatively rare events. Looking at the other two research fields as well, we find that, although IRS are found for the other disciplines, the dimensions along which they occur are not always the same as in the tests based on the primary definitions.

Yet, and this should be stressed, the general result that there are local IRS but no evidence for local DRS remains the same. So, again, we observe the robustness of this general result.

Summarising the results from Tables 4 and 5, we find cases where the efficient boundary exhibits IRS at least somewhere and CRS elsewhere, which calls for specialisation. We also have cases where the efficient boundary is characterised by CRS everywhere, which implies that specialisation is at least not detrimental. Thus, for the analysed research fields, we find strong evidence for the existence of local IRS, while we cannot detect local DRS. Taken together, this calls for the increased specialisation of scientific research groups.

Conclusions and limitations

This paper deals with the optimality of specialisation in science. The empirical methodology used is much more robust than in older research in this field because no parametric production function was “forced upon” the data. Instead, flexible non-parametric estimation techniques were used. To make this approach feasible, it was necessary to integrate the parametric notion of returns to scale and that of non-parametric efficiency analysis. By doing so, it could be shown that the economic intuition which calls for a division of labour can be transferred from the traditional production of goods and services to scientific goods as well. Specialisation in science will increase aggregate output. We also found that larger research groups will make more efficient use of their resources. In terms of policy recommendations, this paper suggests that specialisation should be increased and that resources should be concentrated in larger research groups. This could be achieved, for example, by creating appropriate incentives in the resource allocation system (e.g. indicator-based funding schemes).

Two qualifiers are necessary with respect to the major implications. First, this result only holds for graduate teaching and transfer. It does not carry over to knowledge generation, which is in line with what has been found in earlier studies (cf. Brandt and Schubert 2013). Thus the general claim of the increased efficiency of larger units only holds for a subset of the activities. Second, the observation that there are IRS might be due to reverse causality for the following reason: better units are typically better able to attract new funding, so inherently more efficient units might be more likely to grow. Thus, unobserved heterogeneity could be the driver of this result, rather than actual IRS. While this is definitely a limitation of this study, it should be a problem of small samples where there are groups which are efficient and small. This will be true, for example, when the groups are still young, implying that they have not yet grown strongly. Since there are few relatively young units in the sample, it seems reasonable to assume that reverse causality issues have limited impact. Nonetheless, it would only be possible to fully control for this effect if panel data were used to capture the full dynamics. Since the dataset used here is purely cross-sectional, this remains an issue for future research.