Introduction

Determining an optimal institution size requires knowing which returns-to-scale properties the research and/or teaching technology of universities or university departments exhibits. Such an optimum exists, for example, when returns to scale initially increase and later decrease as the size of a decision making unit (DMU) grows. At the minimum efficient scale, the average cost per unit of scientific benefit is minimal (Gutenberg 1983). However, an unambiguous optimum does not always exist: under constant returns to scale, for example, all DMU sizes are optimal. In such cases, given constant factor-input ratios and prices, the average cost of scientific benefit is independent of the size of a DMU. This means, for example, that the choice of a particular department size is then irrelevant to research efficiency, unlike in the case of an unambiguous minimum efficient scale.
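
To make the distinction concrete, the following sketch compares the average cost per unit of output under two hypothetical cost functions. The first has a fixed cost component, so its average cost curve is U-shaped and has a unique minimum efficient scale. The second has constant returns to scale, so its average cost does not depend on size. Both cost functions, their parameters and the output measure `q` are illustrative assumptions, not estimates for any university.

```python
def average_cost(total_cost, q):
    """Average cost per unit of (scientific) output at output level q."""
    return total_cost(q) / q


def cost_u_shaped(q):
    # Hypothetical: fixed cost 100 plus convex variable cost, so average cost
    # first falls (increasing returns) and then rises (decreasing returns).
    return 100.0 + 2.0 * q + 0.25 * q ** 2


def cost_constant(q):
    # Hypothetical: proportional cost, i.e. constant returns to scale.
    return 5.0 * q


sizes = range(1, 51)
ac_u = {q: average_cost(cost_u_shaped, q) for q in sizes}
ac_c = {q: average_cost(cost_constant, q) for q in sizes}

# Minimum efficient scale: the size with the lowest average cost.
mes = min(ac_u, key=ac_u.get)
print(mes, ac_u[mes])        # 20 12.0
print(set(ac_c.values()))    # {5.0}: every size is equally good
```

Under constant returns the minimum is not unique, which is exactly the case in which the choice of department size does not affect efficiency.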

In principle, there are plausible arguments for every type of returns to scale in universities or university departments (e.g. Cohen 1980; Kyvik 1995; Fandel 2007). Increasing returns to scale may arise, for example, from synergy effects induced by economies of scale and scope. Such effects can be explained by opportunities for cooperation and joint research projects as well as by a more even allocation of teaching assignments and administrative tasks. Furthermore, successful university research may require a minimum scale, consisting of an adequate number of departments or professorships together with a suitable infrastructure of personnel and equipment as well as a certain number of students. Conversely, decreasing returns to scale are also conceivable, for instance due to innovation-averse regulations and routines or increased coordination requirements.

Empirical studies that examine the correlation between the size of universities or departments and their research productivity do not yield unambiguous results across countries and disciplines (e.g. Johnston 1994; von Tunzelmann et al. 2003). Rather, authors observe increasing, decreasing, and constant returns to scale. In this respect, Bonaccorsi et al. (2006, p. 393) state that “there is lack of consensus on the existence of economies of scale in scientific production and higher education”. Because individual scientific fields are heterogeneous, investigations of returns to scale should be carried out for specific scientific disciplines only (Abramo et al. 2012). Moreover, since the organizational structures and research objectives of universities vary between countries, such analyses can only be properly interpreted for institutions within a single country.

For German Business Administration departments, or German business schools (hereafter: BuS), Dyckhoff et al. (2009) examine the relation between the size of a BuS and its research output. They show that BuSs with between 8 and 16 professors exhibit approximately constant returns to scale in their research production. Because only a few BuSs have more than 16 professorships, however, the authors are not able to derive results for that cluster. Their study is based on research data acquired by the Centre for Higher Education (CHE) in 2004; i.e. the respective CHE input and output indicators refer to the years 2001 to 2003.

The non-profit organization CHE conducts comprehensive evaluations of academic research in Germany, differentiated by academic discipline. It acquires and evaluates extensive data on German universities’ research and teaching activities. Almost every academic discipline is analysed in a 3-year cycle in order to create transparency for interested stakeholders. Evaluation results are published on the CHE’s website as well as through its media partner, the German weekly Die Zeit. CHE data acquisition and evaluation of business research attract considerable attention in the German business academic community. German governmental bodies and university managements in particular use the CHE data and results to compare research across BuSs. Based on these assessments, several governmental institutions and universities derive objectives and performance agreements, thus co-opting the results of the CHE research ranking (Clermont and Dirksen 2015).

In the 2004 CHE datasets used by Dyckhoff et al. (2009), the research effectiveness of a BuS is represented by three output criteria: the publication points of nationally visible publications, the number of PhD dissertations, and the amount of expended third-party funds. The CHE measures the research efficiency of a BuS by relating each of these output criteria to a measure of staff assignment. Since 2004, CHE data acquisition for BuS research evaluation has been conducted twice more (2007 and 2010), yielding datasets that cover the consecutive period from 2001 to 2009. Over time, however, the CHE has refined the design and acquisition of several research indicators in order to account more strongly for qualitative aspects of research performance. For instance, it has added the number of international publications as an output indicator.
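
As a rough illustration of this ratio-based view of efficiency, the sketch below divides each absolute output of a single BuS by one staff count. The indicator values are hypothetical, and using a single staff figure as the denominator is an assumption made for illustration; it is not a statement of the exact CHE procedure.

```python
def per_staff_ratios(outputs, staff):
    """Relate each absolute output indicator to a staff assignment quantity."""
    if staff <= 0:
        raise ValueError("staff must be positive")
    return {name: value / staff for name, value in outputs.items()}


# Hypothetical 3-year totals for one BuS (not CHE data).
outputs = {"PP-nat": 120.0, "PhD": 18, "TPF": 900_000.0}
ratios = per_staff_ratios(outputs, staff=12)
print(ratios)  # {'PP-nat': 10.0, 'PhD': 1.5, 'TPF': 75000.0}
```

Each ratio is a one-dimensional partial productivity; combining them into a single efficiency statement still requires some weighting of the outputs.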

Accordingly, the CHE provides an extensive, regularly acquired and refined database that is also recognized by the relevant scientific community, which makes it a suitable basis for empirical analyses. With respect to returns to scale, the 2007 and 2010 data acquisitions now provide better and more extensive data on the research activities of German BuSs, so that their returns-to-scale properties can be analysed over a consecutive period of 9 years. From an empirical perspective, the question therefore arises whether Dyckhoff et al.’s finding of constant returns to scale in German BuS research production is robust across alternative time periods and evaluation criteria, or whether it is a one-time result caused by the specific design and procedure of data acquisition in the chosen reference period. Hence, the objective of this paper is to analyse the returns to scale of BuS research production at German universities over the period 2001 to 2009 and to derive the respective implications.

Our paper is structured as follows. In the next section, we give a brief overview of the state of the art in empirical analyses of returns to scale in research production. We then introduce the design of our study, covering both the datasets used and the methodology applied. Based on this method, we present and interpret our findings. In addition, we conduct comprehensive robustness and sensitivity analyses to check the validity of the results. We conclude by addressing implications and limitations and by outlining questions for further research.

State of the art

A first overview of international empirical studies examining returns to scale in research is provided by the reviews of Cohen (1991), Johnston (1994) and von Tunzelmann et al. (2003). Cohen consistently finds proportional relations between research group size and performance, while the other two reviews give more varied, and even inconsistent, insights. In general, Johnston likewise observes a linear relationship between the size of scientific institutions and their research performance. However, he also gives substantial evidence that an optimal institution size exists (Johnston 1994, p. 32). In contrast, von Tunzelmann et al. (2003, p. 8ff) see an optimal institution size in research for groups or teams, but not for higher-level organizational units such as departments or whole universities.

More recent studies also report heterogeneous results, although most of them find a constant ratio between the inputs used and the outputs produced (as in Abbott and Doucouliagos 2003; Laband and Lentz 2003; Bonaccorsi and Daraio 2005; Bonaccorsi et al. 2006; Abramo et al. 2012, 2014). By contrast, Longlong et al. (2009) derive different findings on returns to scale for Chinese universities: they observe either increasing or decreasing returns to scale between a university’s size and its research and teaching performance. This, in turn, implies that an optimal institution size exists for the evaluated universities. Van der Wal et al. (2009) and Brandt and Schubert (2013), however, only find decreasing returns to scale for research groups.

With regard to the relation between BuS size and research productivity in Germany, the study by Dyckhoff et al. (2009) is so far essentially the only one. Using Data Envelopment Analysis (DEA), they construct and analyse the empirical production function generated by the CHE research data of German BuSs for 2001 to 2003, with particular attention to its returns-to-scale properties. Based on multi-dimensional total productivity measurements, they show that BuSs employing between 8 and 16 professors produce research output with nearly constant returns to scale. Ahn et al. (2007) had already found a similar result as a side conclusion of efficiency analyses of the earlier CHE research data for 1998 to 2000.

Overall, the studies mentioned above differ mainly in how the evaluated organizational units are defined (individual scientists, research groups, departments or universities) and in their choice of indicators. Since we focus on the relation between research productivity and the size of German BuSs, we evaluate and extend the results of Dyckhoff et al. (2009). Because the German university system has changed, it is questionable whether their results also hold for subsequent years. Since the CHE data acquisition in 2004, Business Administration research in Germany has changed decisively with regard to numerous influencing factors. First, owing to diverse funding programs of the federal government and the state administrations, for instance the so-called “Excellence Initiative”, many BuSs have grown in size (measured by the number of employed scientists). Moreover, raising third-party funds and, in particular, placing articles in international top journals have increasingly become necessary conditions for young scientists even to be considered when they apply for vacant professorships (Schrader and Hennig-Thurau 2009, p. 202). Owing to performance-based salary reforms in Germany and performance-based management by objectives, the pressure to raise funds and to publish research results is increasing even for scientists who already hold professorships (Bort and Schiller-Merkens 2010, p. 340). As the CHE has incorporated these developments into the design and acquisition of its research indicators, we use CHE data from 2004 to 2009 as the basis for our study. In addition, we also use the (already analysed) data from 2001 to 2003 in order to obtain robust findings over a longer, consecutive period.

From a methodological point of view, the studies outlined above discuss and apply different approaches to examining returns to scale. Most of them use stochastic methods to deduce functional relationships between the input and output values of the underlying data points, usually by means of one-dimensional and/or multi-dimensional regression analyses. These approaches estimate a function that is supposed to reflect the average transformation of inputs into outputs. Such methods, however, implicitly assume that no inefficient universities exist in the dataset, although it does not seem plausible that every university considered produces efficiently.

If the data are assumed to be deterministic and comparable with each other, dominance relations between the inputs and outputs of the universities considered can be analysed. This allows for the possibility that inefficient universities exist in the dataset. By definition, a production function describes only the efficient frontier of the underlying production technology (Farrell 1957). Because we focus on the returns-to-scale properties of the research production of German BuSs, the efficient frontier, or more precisely the production function, is our main subject. DEA identifies such a (best-practice) production function by constructing an efficient frontier from the observed input and output data.
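
The dominance idea can be illustrated with a short sketch. One unit dominates another if it uses no more of any input, produces no less of any output, and is strictly better in at least one dimension. Dominated units cannot lie on the efficient frontier. The unit names, the choice of two inputs and two outputs, and all numbers are hypothetical.

```python
def dominates(a, b):
    """True if unit a dominates unit b.

    Each unit is a pair (inputs, outputs) of equal-length sequences.
    """
    in_a, out_a = a
    in_b, out_b = b
    weakly_better = (all(x <= y for x, y in zip(in_a, in_b))
                     and all(x >= y for x, y in zip(out_a, out_b)))
    strictly_better = (any(x < y for x, y in zip(in_a, in_b))
                       or any(x > y for x, y in zip(out_a, out_b)))
    return weakly_better and strictly_better


def dominated_units(units):
    """Names of all units dominated by at least one other observed unit."""
    return sorted(name for name, u in units.items()
                  if any(dominates(v, u) for other, v in units.items() if other != name))


# Hypothetical units: inputs = (professors, assistants), outputs = (publications, PhDs)
units = {
    "A": ((10, 20), (40, 6)),
    "B": ((12, 20), (35, 6)),   # more input and less output than A
    "C": ((8, 25), (30, 9)),    # not comparable with A or B
}
print(dominated_units(units))  # ['B']
```

This pairwise check only compares observed units; DEA extends the comparison to convex (or conical) combinations of units, which can dominate a unit even when no single observed unit does.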

In this regard, the scientific literature contains numerous articles on efficiency analyses of universities that use DEA. A query of the Web of Science returns 1674 journal articles for the search terms “Data Envelopment Analysis” or “DEA” in connection with “research efficiency” or “teaching efficiency” (as of November 25, 2014). In principle, such articles can be classified according to whether they develop aspects or models of DEA further and then validate them with application examples (e.g., Cook and Zhu 2007), or whether they emphasize the empirical question of university efficiency and answer it with previously established DEA models (e.g., Fandel 2007). Empirical applications frequently use the fundamental DEA models (e.g., Ahn et al. 1988); however, newer models, such as DEA for hierarchical output structures or Balanced DEA, are also used to analyse university performance (see Meng et al. 2008; Dyckhoff et al. 2013). Apart from model selection, applications also differ in whether they consider teaching (e.g., Johnes 2006; Ray and Jeon 2008), research (e.g., Johnes and Johnes 1993), or both, either integratively (e.g., Tomkins and Green 1988) or separately (e.g., Beasley 1995). They further differ in how the production system is demarcated, i.e. whether the focus is on entire universities (e.g., Breu and Raab 1994), departments (e.g., Madden et al. 1997; Kao and Hung 2008) or individual professorships or researchers (e.g., Gutierrez 2007). The input and output factors used also vary; in research efficiency analyses, publication and third-party funding factors are frequently employed (e.g., Johnes and Johnes 1993; Abramo et al. 2011). What all these approaches have in common is that they determine and analyse efficiency scores of the organizational units of the universities under investigation.

By contrast, analysing the returns to scale of research productivity with DEA is a distinct area of research that, to our knowledge, has been addressed in only a few articles. Given that DEA is well established in efficiency analyses of universities, and that Dyckhoff et al. (2009) demonstrated its usefulness for analysing the returns to scale of German BuSs, we use this method in our paper. As the description of the methodological approach in the next section will make clear, although our research design differs from that of pure efficiency analyses, analysing returns to scale may help in the conceptual design of DEA efficiency analyses. This is because in most DEA applications, the user determines ex ante which returns-to-scale properties the production system under consideration exhibits. A prior analysis of returns to scale could therefore provide essential support for selecting appropriate DEA models in subsequent efficiency analyses.

Research design

Datasets: CHE research data on German BuSs

Before analysing the relation between the size of a BuS and its research productivity, we first have to define how the research production of a BuS is to be construed and operationalized. In general, quantitatively measurable indicators are used to describe research activities, especially bibliometric measures such as the number of publications and citations.Footnote 1 As introduced above, a comprehensive evaluation of BuS research production in Germany, including the acquisition of an approved and accepted database on the research production of German BuSs, is carried out by the non-profit organization CHE. To date, CHE research data are available for the acquisition years 2001, 2004, 2007 and 2010. The data for a given acquisition year always correspond to the inputs used and outputs achieved in the preceding 3 years; e.g., the 2004 dataset represents research production for the years 2001 to 2003.

Since the data acquisition in 2007, the CHE has operationalized research outputs by using the following four absolute indicatorsFootnote 2:

  • Publication points of nationally visible publications (PP-nat),

  • Number of internationally visible publications (Pub-int),

  • Number of PhD dissertations (PhD),

  • Amount of expended third-party funds (TPF).

The value of the indicator PP-nat is determined from the publications of the post-doctoral scientists (including professors) of each BuS. For each acquisition period, the BuSs send the CHE lists of their post-doctoral scientists’ names. The CHE then identifies the respective publications by querying the literature database WISO, which covers English- and German-language literature in Economics and Business Administration. The identified publications are converted into publication points by taking into account the number of authors, the number of pages and, for journal articles, the perceived quality of the journal, based on the journal ranking JOURQUAL2 of the German Academic Association for Business Research (Schrader and Hennig-Thurau 2009).Footnote 3 Since journal articles are becoming steadily more relevant in German Business Administration, the CHE introduced a new indicator in 2007, namely internationally visible publications. Its value is determined by a query of the literature database Web of Science (WoS), which covers scientific literature with a focus on English-language papers. The resulting number of publications, acquired by the bibliometric team at Forschungszentrum Jülich, is, however, not converted into weighted publication points.Footnote 4 The amount of third-party funds expended during an evaluation period and the number of PhD dissertations per semester, covering the six semesters of the 3-year survey period, are collected by surveying the BuSs.

The validity of these preselected CHE output indicators depends on the primary objective of scientific research, which is to produce new, usually publicly available, knowledge about the world (Chalmers 1990, p. 23). This knowledge is mainly disseminated and discussed via written articles. Indicators based on publications are therefore generally accepted and well established in research and practice for evaluating the research of BuSs. By weighting journal articles according to the reputation of the publication medium and by introducing the additional indicator Pub-int (from 2007 onwards), the CHE has also taken qualitative aspects into account. Accordingly, publications (in terms of PP-nat and Pub-int) are appropriate output measures of BuS research.

PhDs primarily serve the objective of educating and training young scientists. Corresponding indicators therefore measure, in the first place, a desired secondary objective of scientific research. However, since every PhD degree is inevitably tied to the generation and publication of new scientific findings in a published dissertation, the number of PhD dissertations can also indicate research achievements of a BuS.

Compared with the indicators outlined so far, the classification of expended third-party funds as a benefit category is less clear-cut. Whereas governmental research policy increasingly uses third-party funds as a performance indicator, their suitability for measuring research is controversial in the scientific literature. This is because third-party funds can be regarded either as a resource effort and therefore an input, or as a desired research achievement, i.e. a proxy indicator associated with the raising of funds or with predictions of their use, and thus an output.Footnote 5 Whether third-party funds are treated as inputs or outputs in research productivity analyses therefore depends on the decision-making context and the underlying objectives.Footnote 6 However, given their positive perception by governmental research policy in general and by university administrations in particular, treating TPF as a desired output appears acceptable.

Depending on the academic discipline, different resources constitute the decisive inputs of research activities. Whereas in several disciplines production factors such as machinery and laboratories are crucial research inputs, Business Administration research requires such facilities only to a limited extent. Such factors therefore probably have little impact on the shape of the efficient frontier of BuS research production.Footnote 7 Instead, staff, in terms of professors and research assistants and their human capital, constitute the fundamental research inputs. As input factors (and to define the size of a BuS), we therefore use the number of staffed positions of BuS professors and research assistants. The CHE obtains this information directly from the BuSs. In the CHE surveys, the BuSs were explicitly asked to report only positions financed from state funds and not from third-party funding.

Consequently, the CHE indicator design appears quite reasonable from the perspective of German Business Administration. This is also supported by the formation and consultation of a scientific advisory board of well-known German experts in the field. The relevance and acceptance of the CHE data is further apparent from their use by German governmental bodies and university managements. The CHE methodology has also been assessed positively in the scientific literature (e.g. Tavenas 2004; Usher and Savino 2006; Marginson and van der Wende 2007). For instance, in an analysis of 25 European data acquisition and appraisal methods for university evaluation, Stolz et al. (2010) rate the CHE best.

In summary, the CHE has built up core competencies in acquiring and evaluating university research indicators since the early 2000s. With respect to the indicator Pub-int, the CHE also has, in the bibliometric team at Forschungszentrum Jülich, an expert partner for publication indicator design and acquisition from a scientometric perspective. Not least, the CHE data acquisitions are carried out regularly and will be continued, which ensures that current datasets remain available for future empirical analyses. Even though the CHE indicator design and acquisition have been criticized in the literature (in general, see, e.g., Frey 2007; Jarwal et al. 2009; Kieser 2012; for the CHE research ranking specifically, see Clermont and Dirksen 2015), we initially accept these disadvantages and analyse which returns-to-scale properties German Business Administration research production exhibits when these preselected research indicators are used as inputs and outputs. The effects of expedient and feasible modifications of the research indicators on the validity of our results are investigated in robustness and sensitivity analyses.

However, the dataset of the first data acquisition in 2001 (inputs and outputs of 1998 to 2000) can only be regarded as a pilot dataset, which served as a basis for successive enhancements of indicator design and acquisition method in the following years. Because of the acquisition procedure, as well as for reasons of validity and consistency, this dataset is of only limited use for empirical analyses (Gilles 2005, p. 130ff; Dyckhoff et al. 2009, p. 27f). We therefore use the acquisition years 2004, 2007 and 2010, i.e. research data from 2001 to 2009.

These datasets were provided to us by the CHE and, for internationally visible publications, by the bibliometric team of Forschungszentrum Jülich. The resulting datasets are given in Tables 3, 4 and 5 of the Appendix, which include only those BuSs with complete input and output data. All values are the totals of the respective absolute indicator values over the 3 years of a survey period; annualized input and output values can therefore be obtained by dividing the totals by three. In addition, the 10 BuSs with the highest research reputation in each acquisition year are highlighted in bold. Research reputation was determined on the basis of a supplementary CHE survey in which, in each acquisition year, all professors of Business Administration in Germany were asked to name up to 5 BuSs that they perceive as notably strong in research.
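
Because the appendix tables report 3-year totals, annualized values are simply the totals divided by three. The sketch below applies this to one row; the field names and values are hypothetical and do not reproduce the actual columns or entries of Tables 3 to 5.

```python
SURVEY_YEARS = 3  # each CHE survey period covers 3 years


def annualize(totals, years=SURVEY_YEARS):
    """Convert 3-year indicator totals into annual averages."""
    return {name: value / years for name, value in totals.items()}


# Hypothetical 3-year totals for one BuS (not taken from the appendix tables).
row = {"professors": 36.0, "assistants": 90.0, "PhD": 27, "Pub-int": 15}
print(annualize(row))
# {'professors': 12.0, 'assistants': 30.0, 'PhD': 9.0, 'Pub-int': 5.0}
```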

Methodological approach: DEA and scale efficiencies

From the perspective of DEA, the BuSs represent DMUs, and the previously discussed input and output indicators constitute their research technology. In analogy to Dyckhoff et al. (2009), we make the following assumptions:

  • Within each investigated timeframe, the BuSs considered are comparable with each other, i.e. there are no atypical BuSs.

  • The data are deterministic and unbiased.

  • All (conceivable) convex combinations of data points form new fictitious, but realizable, activities of virtual BuSs that also belong to the technology of German Business Administration research.

The third assumption corresponds to a data envelopment of all observed data points. The efficient frontier of this five-dimensional (2004 dataset) or six-dimensional (2007 and 2010 datasets) convex polyhedron constitutes the best-practice empirical production frontier. DEA is an established instrument for determining returns-to-scale properties, since it has proven particularly suitable for multidimensional analyses of the teaching and research efficiency of scientific institutions (see the state of the art). In general, multidimensional productivities are combined into total productivities: all inputs and outputs are weighted with specific factors, aggregated into a single input and a single output, and the quotient of the two is formed. Using the underlying research indicators, DEA enables us to determine the multidimensional relative (empirical) efficient frontier of the research technology and to identify its properties. Apart from its multidimensional perspective, the decisive advantage of DEA is that no ex-ante weighting of the diverse (research) objectives is required, because the weights are determined endogenously. Furthermore, unlike Stochastic Frontier Analysis (SFA), DEA does not require a parametric type of production function to be assumed a priori.Footnote 8
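
The aggregation into a total productivity, and why fixing the weights ex ante matters, can be sketched as follows. The two BuSs, their indicator values and both weight vectors are hypothetical. The point is only that a different, equally defensible choice of output weights can reverse a ranking, which DEA avoids by determining the weights for each unit endogenously.

```python
def total_productivity(outputs, inputs, u, v):
    """Weighted sum of outputs divided by weighted sum of inputs."""
    virtual_output = sum(ui * yi for ui, yi in zip(u, outputs))
    virtual_input = sum(vi * xi for vi, xi in zip(v, inputs))
    return virtual_output / virtual_input


# Hypothetical BuSs with identical inputs (professors, assistants)
# and different outputs (publication points, PhDs).
inputs = [10, 20]
bus_a = [30, 6]
bus_b = [50, 2]
v = [2, 1]  # hypothetical input weights

for u in ([1, 10], [1, 1]):  # two hypothetical output weightings
    tp_a = total_productivity(bus_a, inputs, u, v)
    tp_b = total_productivity(bus_b, inputs, u, v)
    print(u, round(tp_a, 2), round(tp_b, 2))
# [1, 10] 2.25 1.75
# [1, 1] 0.9 1.3
```

With the first weighting, BuS A appears more productive; with the second, BuS B does, although neither the data nor the units have changed.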

DEA spans a multidimensional polyhedron based on the input and output indicators of all analysed BuSs. The efficient frontier of this polyhedron is determined by the efficient BuSs, while all remaining (inefficient) BuSs lie inside the polyhedron. Among the fundamental DEA models,Footnote 9 four model types can be distinguished, resulting from two basic ex-ante assumptions made by the user. First, either constant or variable returns to scale can be postulated (CCR vs BCC models). Second, input and output orientation must be distinguished. In input-oriented models, the efficiency score is determined by the maximal possible proportional (or radial) reduction of inputs without decreasing any output; conversely, in output-oriented models, it is determined by the maximal achievable proportional expansion of outputs without increasing any input. The method works by equiproportionally projecting each BuS onto a facet of the efficient frontier of the polyhedron, which is spanned by nonnegative linear combinations (CCR model) or convex combinations (BCC model) of efficient BuSs. The larger the distance of a BuS from the efficient frontier, the lower its efficiency score. With an output (input) orientation, for example, if each output can be increased equiproportionally by 100 % without increasing any input (each input decreased by 50 % without decreasing any output), the efficiency score is 50 %.Footnote 10
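
To make the output-oriented models concrete without an LP solver, the sketch below restricts DEA to one input (e.g. annualized professors) and one output. This is a deliberate simplification of the five- and six-dimensional setting used in the paper. In this special case the CCR frontier is the ray through the unit with the best output/input ratio, and the BCC linear program has an optimal basic solution that mixes at most two units, so enumerating single units and pairs is exact. Scores are reported as 1/φ, where φ is the maximal equiproportional output expansion. The data are hypothetical.

```python
from itertools import combinations


def ccr_output_efficiency(x, y, i):
    """Output-oriented CCR score of unit i (one input x, one output y)."""
    # Under constant returns the frontier is the ray with the best output/input ratio.
    best_ratio = max(yj / xj for xj, yj in zip(x, y))
    # Expansion factor phi = x[i] * best_ratio / y[i]; the score is 1 / phi.
    return (y[i] / x[i]) / best_ratio


def bcc_frontier(x, y, x0):
    """Max output of a convex combination of units whose input does not exceed x0.

    x0 is assumed to be an observed input level, so at least one unit qualifies.
    With one input and one output, an optimal basic solution mixes at most two
    units, so checking single units and all pairs is exact.
    """
    best = max(yj for xj, yj in zip(x, y) if xj <= x0)
    for j, k in combinations(range(len(x)), 2):
        if x[j] == x[k]:
            continue
        lam = (x0 - x[k]) / (x[j] - x[k])  # weight on j so the mix uses exactly x0
        if 0.0 <= lam <= 1.0:
            best = max(best, lam * y[j] + (1.0 - lam) * y[k])
    return best


def bcc_output_efficiency(x, y, i):
    """Output-oriented BCC score of unit i (one input, one output)."""
    return y[i] / bcc_frontier(x, y, x[i])


# Hypothetical data: annualized professors (input) and one aggregated output.
x = [2, 4, 6, 8, 10]
y = [2, 6, 7, 12, 13]
for i in range(len(x)):
    print(x[i], round(ccr_output_efficiency(x, y, i), 3),
          round(bcc_output_efficiency(x, y, i), 3))
# 2 0.667 1.0
# 4 1.0 1.0
# 6 0.778 0.778
# 8 1.0 1.0
# 10 0.867 1.0
```

In this toy example the unit with 6 professors is technically inefficient but has equal CCR and BCC scores, whereas the units with 2 and 10 professors are BCC-efficient yet lie below the CCR ray. With several inputs and outputs, as in the paper, both models would instead be solved as linear programs.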

To gain insights into the returns-to-scale properties of the empirically observed production technology, we adopt the concept of scale efficiency, defined as the quotient of the CCR-efficiency score and the BCC-efficiency score (Banker et al. 1984, p. 1088f). If, in a specific size range, the CCR-efficiency scores under constant returns to scale equal the respective BCC-efficiency scores under variable returns to scale, the slopes of the empirically determined CCR and BCC production functions are identical. This, in turn, implies that locally constant returns to scale exist in this size range (of staff assignment). Here, the convex BCC production technology is always a subset of the (conical) CCR production technology. As a result, the BCC-efficiency scores are always greater than or equal to the CCR-efficiency scores, and CCR-efficient BuSs are always BCC-efficient. Therefore, scale efficiencies necessarily take values between 0 and 100 %.
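
Given CCR and BCC scores, scale efficiency is a simple quotient. The sketch below also checks the ordering CCR ≤ BCC ≤ 1 implied by the nested technologies, and uses a tolerance to flag locally constant returns to scale. The tolerance value and the score pairs are illustrative assumptions.

```python
def scale_efficiency(ccr, bcc, eps=1e-9):
    """Scale efficiency SE = CCR score / BCC score, in (0, 1]."""
    if not (0.0 < ccr <= bcc + eps) or bcc > 1.0 + eps:
        raise ValueError("expected 0 < CCR <= BCC <= 1")
    return ccr / bcc


def locally_constant_rts(ccr, bcc, tol=1e-6):
    """True if CCR and BCC scores coincide up to tol, i.e. SE is about 1."""
    return abs(1.0 - scale_efficiency(ccr, bcc)) <= tol


# Hypothetical (CCR, BCC) score pairs for three BuSs.
pairs = [(0.72, 0.80), (0.85, 0.85), (1.0, 1.0)]
for ccr, bcc in pairs:
    print(round(scale_efficiency(ccr, bcc), 3), locally_constant_rts(ccr, bcc))
# 0.9 False
# 1.0 True
# 1.0 True
```

The second pair shows that a BuS can be scale efficient while still being technically inefficient; scale efficiency only measures the gap between the two frontiers.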

Returns to scale of Business Administration research over time

Analysis of all BuSs

In the following, we conduct separate scale efficiency analyses for the three CHE datasets of the acquisition years 2004, 2007 and 2010. Fig. 1 plots the resulting output-oriented scale efficiencies of all BuSs against the annualized number of professorships.Footnote 11 Data points for 2004 are shown as rhombuses, those for 2007 as triangles and those for 2010 as circles.Footnote 12 The 10 BuSs with the highest research reputation in the respective acquisition year are highlighted in black.

Fig. 1

Output-oriented scale efficiencies according to the number of professorships over the three time periods

It becomes evident that, in all three periods, only a few BuSs employ fewer than 8 or more than 16.5 professors. But whereas the number of small BuSs (fewer than 8 professors) remains consistently low over time, the number of larger BuSs (more than 16.5 professors) rises over the periods, and many reputable BuSs belong to this group. Based on the dataset of 2004, Dyckhoff et al. (2009) found approximately constant returns to scale for a size range of 8–16.5 professors. In this range, all BuSs in their analysis had output-oriented scale efficiencies of at least 90 %, most of them close to 100 %. To facilitate a comparison with these findings, two dotted lines in Fig. 1 divide the diagram into four sectors (I–IV): one line marks a scale efficiency of 90 %, the other the (previously) identified size of 16.5 employed professors. When considering all three periods, it becomes apparent that some BuSs in the size range of 8 up to 16.5 professors have scale efficiencies below 90 %. These BuSs therefore prevent an analogous (unrestricted) transfer of the previous empirical conclusion (of constant returns to scale) to the data of 2007 and 2010.Footnote 13 Remarkably, however, the majority of all BuSs are located within sectors I and III, while sector II contains only two BuSs (2010) and sector IV merely a few, mostly smaller BuSs (of 2007 and 2010). If we further consider the ten most reputable BuSs of each acquisition year, thus drawing on the surveyed professors’ judgement of the research-strongest BuSs in Germany, these are located almost entirely within sectors I and III.
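The sector logic of Fig. 1 can be expressed as a simple classification rule. The assignment of the numerals I to IV to the four quadrants is our reading of the text (sector I holds the medium-sized BuSs with high scale efficiency, sector III the larger ones with lower scale efficiency), and the treatment of points lying exactly on a dotted line is an arbitrary assumption.

```python
def sector(professors, scale_eff, size_cut=16.5, se_cut=0.90):
    """Assign a BuS to one of the four sectors (I-IV) of Fig. 1.

    Hypothetical mapping inferred from the text:
      I   = size <= size_cut and SE >= se_cut
      II  = size >  size_cut and SE >= se_cut
      III = size >  size_cut and SE <  se_cut
      IV  = size <= size_cut and SE <  se_cut
    """
    large = professors > size_cut
    high = scale_eff >= se_cut
    if high:
        return "II" if large else "I"
    return "III" if large else "IV"
```

Applied to a list of (professors, scale efficiency) pairs, counting the returned labels reproduces the kind of sector tallies discussed above.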

Generally, most BuSs lie within sector I. All inefficient BuSs in this sector are radially projected onto segments of the BCC-efficient frontier, with the BuS-specific output mix determining the direction of the projection. The (virtual) outputs at these projection points, in turn, amount to at least 90 % of the outputs of the corresponding reference point on the CCR-efficient frontier. In this region, the BCC and CCR production functions are therefore at most 10 % of the outputs apart (in the case of an equiproportional projection). Accordingly, even under constant returns to scale, these BuSs could raise their outputs by at most a factor of 100/90, i.e. by roughly 11 %. The assumption of (approximately) constant returns to scale in this size range therefore appears plausible.
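The 11 % bound follows directly from the output-oriented definition. Writing φ ≥ 1 for the maximal radial expansion factor of a BuS with output vector y (our notation, not the paper's), the efficiency score is E = 1/φ, and the projections onto the CCR and BCC frontiers are φ_CCR·y and φ_BCC·y, respectively:

```latex
\mathrm{SE}
  = \frac{E_{\mathrm{CCR}}}{E_{\mathrm{BCC}}}
  = \frac{1/\varphi_{\mathrm{CCR}}}{1/\varphi_{\mathrm{BCC}}}
  = \frac{\varphi_{\mathrm{BCC}}}{\varphi_{\mathrm{CCR}}},
\qquad
\mathrm{SE} \ge 0.9
  \;\Longrightarrow\;
  \frac{\varphi_{\mathrm{CCR}}\,y}{\varphi_{\mathrm{BCC}}\,y}
  = \frac{1}{\mathrm{SE}} \le \frac{1}{0.9} \approx 1.111
```

Because both projections are equiproportional, the same factor applies to every output component.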

In contrast, the larger BuSs in sector III exhibit lower scale efficiencies. Here, the BCC and CCR frontiers differ quite substantially from one another, which indicates decreasing returns to scale for larger BuSs. Admittedly, such conclusions must be drawn with caution, because only a few BuSs are located in sector III. However, a high fraction of these are highly research-reputable BuSs, precisely those that the surveyed experts consider to be “strong in research”. This is why the observation potentially carries greater weight.

So far, only the input factor professorships has been used to define the size of a BuS. However, research assistants also make a crucial contribution to the research production of a BuS, which is why the CHE collects data on them as well and why they enter our DEA analyses as a second input factor. In order to analyse how the number of research assistants influences the respective scale efficiency and the resulting returns to scale, Fig. 2 relates the two input factors to each other: the number of professorships remains on the abscissa, and the number of research assistants is now on the ordinate. In Fig. 2, we abstract from the precise degree of scale efficiency and only distinguish between scale efficiencies greater than or equal to 90 % (grey) and those below 90 % (white).

Fig. 2

Relationship between professorships, research assistants and scale efficiency

The numbers of Business Administration professorships and research assistants appear to be roughly linearly related, as reflected in relatively high correlation coefficients (between 0.62 and 0.75, depending on the acquisition year, significant at the 1 % level). Although the relationship is not perfectly linear, the two inputs largely move together within a certain range. We can infer from Fig. 2 that the assertion of constant returns to scale with respect to professors holds independently of the number of research assistants assigned to them. By contrast, we cannot derive clear assertions about properties of returns to scale based on the number of research assistants alone.Footnote 14 For this reason, the number of research assistants is not included in the definition of the size of a BuS in the following analyses.
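Correlation coefficients of this kind can be computed with the textbook Pearson formula, and their significance is commonly checked with a t test on n − 2 degrees of freedom. The text does not state which test was used, so the t test below is an assumption; the critical value for the 1 % level would still have to be looked up for the actual number of BuSs.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for H0: rho = 0 with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

For the toy samples `[1, 2, 3, 4, 5]` and `[2, 1, 4, 3, 5]`, r is 0.8 and t is about 2.31.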

In order to analyse the aforementioned conclusions of constant returns to scale for medium-sized BuSs and decreasing returns to scale for larger BuSs in more detail, Fig. 3 shows the output-oriented scale efficiencies as a function of the number of professors for each acquisition year separately. Because there are so few smaller BuSs, conclusions about them are necessarily problematic. In contrast, deriving findings for the group of larger BuSs is more reasonable, because this group grows over time and includes many reputable BuSs. For these reasons, we take a closer look at the size range between 8 and 28.7Footnote 15 professors (reputable BuSs are again highlighted in black). Next, in order to mitigate the impact of single outliers on the overall conclusion, we approximate the calculated data points by a quadratic function estimated with a linear (least-squares) regression.Footnote 16 Fig. 3 shows the resulting functional relations for each of the three timeframes.
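A quadratic regression is linear in its parameters and can therefore be fitted by ordinary least squares. The pure-Python sketch below solves the normal equations for y ≈ a + b·x + c·x² by Gaussian elimination; in practice a library routine such as `numpy.polyfit` would do the same job. The function name and test data are illustrative and do not reproduce the study's regressions.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T y for design rows [1, x, x^2].
    s = [sum(x ** k for x in xs) for k in range(5)]            # power sums
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]  # augmented

    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]

    # Back substitution.
    beta = [0.0] * 3
    for i in range(2, -1, -1):
        rest = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - rest) / m[i][i]
    return tuple(beta)
```

On data generated exactly from 2 + 3x − 0.5x², the fit recovers the coefficients (2, 3, −0.5) up to rounding.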

Fig. 3

Regressions of the output-oriented scale efficiencies according to the number of professorships over the three time periods

If we operationalize approximately constant returns to scale as scale efficiencies of more than 90 %, the point at which the regression curve intersects the 90 % scale efficiency line determines the crucial transition region. It marks the annualized number of employed professors up to which the assumption of constant returns to scale can be confirmed for the respective timeframe. Supporting the results of Dyckhoff et al. (2009) for 2004, we identify a size range of nearly constant returns to scale from 8 up to 16.7 professors. This size range narrows slightly for 2007 (from 8 up to 15.2 professors), whereas it widens for 2010 (from 8 up to 18.9 professors).Footnote 17
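Locating the transition point amounts to solving the fitted quadratic for the 90 % level. The sketch assumes a fit of the form a + b·x + c·x² and keeps only roots inside the analysed size range of 8 to 28.7 professors; the function name is hypothetical, and the coefficients in the example are made up rather than taken from the study.

```python
import math


def crossing_points(a, b, c, level=0.9, lo=8.0, hi=28.7):
    """Sizes x in [lo, hi] at which a + b*x + c*x**2 equals `level`."""
    qa, qb, qc = c, b, a - level          # solve qa*x^2 + qb*x + qc = 0
    if qa == 0:                           # degenerate (linear or constant) fit
        roots = [] if qb == 0 else [-qc / qb]
    else:
        disc = qb * qb - 4 * qa * qc
        if disc < 0:                      # curve never reaches `level`
            return []
        sq = math.sqrt(disc)
        roots = [(-qb - sq) / (2 * qa), (-qb + sq) / (2 * qa)]
    return sorted(r for r in roots if lo <= r <= hi)
```

With the illustrative coefficients a = 0.936, b = 0.016 and c = −0.001 (a parabola peaking at 100 % for 8 professors), the 90 % line is crossed at 18 professors.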

Because of the aforementioned convexity assumption for the data envelopment (in the context of BCC production functions), approximately locally constant returns to scale in the respective size ranges imply, ceteris paribus, that smaller BuSs with fewer than 8 professors must exhibit increasing returns to scale, and larger BuSs with more than 16.7 (2004), 15.2 (2007) or 18.9 (2010) professors decreasing returns to scale. In practical applications with many medium-sized but comparatively few small and large DMUs, observed size ranges of constant or approximately constant returns to scale are typically rather small. Hence, the relatively large size ranges of approximately constant returns to scale across the consecutive timeframes investigated are remarkable.

As mentioned before, drawing conclusions about the properties of returns to scale for larger BuSs in the acquisition year 2004 is problematic because of their limited number. For 2007 and 2010, however, the number of larger BuSs increases substantially, which enables us to make statements about their returns to scale. The declining course of the regression curves in Fig. 3, i.e. the lower scale efficiencies of larger BuSs, appears to confirm the tendency towards decreasing returns to scale for (too) large BuSs. Furthermore, an optimal BuS size with respect to research productivity seems to exist, beyond which the marginal gains of additional professorships diminish. In order to examine these conclusions more precisely by analysing particular BuSs individually, the next section examines the temporal development of the returns to scale of research-reputable BuSs. Since their reputation rests on the judgements of the German Business Administration community about its own peer group, a more detailed analysis of these BuSs promises particularly informative results.

Analysis of research-reputable BuSs

Figure 4 shows a section of Fig. 1, focusing on the intertemporal movement patterns of scale efficiencies for seven reputable BuSs. Specifically, it includes all BuSs that rank among the 10 most research-reputable BuSs in each of the three consecutive acquisition years. The depicted paths trace the development of these BuSs’ size and (resulting) scale efficiency over the three acquisition years.

Fig. 4

Movement patterns of reputable BuSs according to output-oriented scale efficiencies and number of professorships over the three time periods

The movement patterns confirm the results discussed above. On the one hand, some reputable BuSs remain within sector I of constant returns to scale in all periods; moderate expansions or reductions in their number of professors do not lower their scale efficiencies enough to move them out of this sector. On the other hand, for other reputable BuSs, more considerable expansions are accompanied by declining scale efficiencies, especially once they exceed 16.5 professors. For instance, BuS 55 (Münster) grows steadily in size, accompanied by a slight decrease in scale efficiency from 100 % (2004) to 93 % (2007) and finally to 89 % (2010). A similar pattern emerges for BuS 70 (Vallendar): while this BuS is still scale-efficient in 2004 and 2007, its scale efficiency falls to a mere 87 % in 2010 after a considerable increase in the number of professors. This trend is particularly evident for BuS 50 (Mannheim), the most research-reputable BuS in Germany in all three surveys. Still scale-efficient in 2004, its continuous enlargement is accompanied by a decrease in scale efficiency to 81 % (2007) and ultimately 69 % (2010). In contrast, BuS 52 (LMU München), the second most reputable BuS in Germany in all three surveys, shows the opposite movement pattern: starting from low scale efficiencies in 2004 and 2007, it reaches a scale efficiency of 100 % in 2010 while reducing its number of professorships.Footnote 18

Altogether, the reputable BuSs considered tend to grow in size over time, accompanied by decreasing scale efficiencies. It should be noted that this section has investigated only the movement patterns of research-reputable BuSs. Given the high research reputation of each of them, these BuSs can hardly be regarded as particularly unproductive, so their declining scale efficiencies are unlikely to reflect weak research performance; this, in turn, gives greater weight to our observation. Accordingly, both main findings derived earlier, regarding the size ranges of constant returns to scale and the decreasing returns to scale beyond them, are confirmed by this detailed investigation of the temporal movement patterns of reputable BuSs.

Robustness and sensitivity analyses

The empirical findings are valid within the framework of production theory chosen ex ante, for the underlying datasets, and under the assumptions made. In particular, the present examinations of returns to scale are founded on production theory and rest on the fundamental relationship between the production function and the concept of efficiency, and hence of scale efficiency. Efficiency is always relative in a twofold sense: first, it depends on the underlying preference relation, i.e. the choice of performance indicators and the values attributed to them; second, it depends on the production possibility set spanned by the data, i.e. the potentially realizable activities in the form of real or virtual BuSs. Changes in either of these two respects therefore affect the derived production function and the corresponding properties of returns to scale. In the following, we therefore analyse the robustness of the obtained results by means of sensitivity analyses with respect to these two criteria.

Sensitivity analyses with respect to the production possibility set

When selecting DMUs for productivity and efficiency analyses, their comparability is an elementary prerequisite. Only if the production technologies as well as the inputs and outputs considered are homogeneous across the respective DMUs, i.e. if they pursue the same objectives, do productivity analyses within this peer group make sense at all. Since we base our analyses on departments of the same academic discipline, a certain degree of homogeneity appears to be ensured. However, each BuS is embedded in a university-specific environment characterized by heterogeneous external parameters and differing stakeholder interests, which results in individual department structures and possibly diverse objectives, so deviations are quite plausible.Footnote 19 In order to account for such characteristics, we now exclude BuSs that turn out to be untypical and analyse the resulting modified efficiency scores.

When analysing partial productivities, i.e. focusing exclusively on one input/output combination at a time, the efficient frontiers of the resulting partial production functions are usually determined to a large extent by just a few BuSs with exceptionally high outputs or low inputs. Single erroneous or untypical data points on the efficient frontier can strongly influence the (in)efficiency scores of other BuSs and thus also the results regarding properties of returns to scale. In contrast, inefficient BuSs do not shape the efficient frontier and therefore only affect their own efficiency scores. We thus focus on efficient outliers. A close inspection of all partial productivities for each acquisition year reveals several conspicuous BuSs that considerably outperform the others in absolute terms. They are therefore removed from the datasets for the following sensitivity analyses.
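The study identifies untypical BuSs by inspecting partial productivity plots; it does not state a numeric rule. Purely as an illustration, the sketch below screens one input/output pair with a hypothetical criterion: the unit with the highest output per input is flagged only if both its partial productivity and its absolute output exceed those of every other unit by a chosen margin (1.5 here, an arbitrary value). The function name, the margin and the data are assumptions.

```python
def flag_partial_outlier(inputs, outputs, ids, margin=1.5):
    """Return the id of a conspicuous unit for one input/output pair, or None.

    Hypothetical screening rule (not the study's procedure): the unit with
    the highest partial productivity is flagged only if both its partial
    productivity and its absolute output are at least `margin` times the
    largest corresponding value among all other units.
    """
    ranked = sorted(
        ((y / x, y, i) for x, y, i in zip(inputs, outputs, ids)), reverse=True
    )
    top_prod, top_out, top_id = ranked[0]
    other_prod = max(p for p, _, _ in ranked[1:])
    other_out = max(y for _, y, _ in ranked[1:])
    if top_prod >= margin * other_prod and top_out >= margin * other_out:
        return top_id
    return None
```

Running such a screen for every input/output pair and acquisition year yields a candidate list that would still need a substantive check, as in the manual inspection the study describes.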

The process is illustrated in Fig. 5, which shows the partial productivities of the BuSs with respect to internationally visible publications (ordinate) and the number of professorships (abscissa) for 2010. BuS 52 clearly stands out from the other BuSs with regard to this partial productivity, and especially with respect to its absolute number of publications. This BuS therefore largely determines the partial efficient BCC-frontier of the empirically identified partial production function and may thus also influence multidimensional efficiency analyses using DEA. Hence, BuS 52 is identified as an outlier and excluded in the following.

Fig. 5

Two-dimensional partial productivity referring to the indicator Pub-int 2010

Similar arguments apply to BuS 70 in 2004 concerning the indicator PP-nat, to BuSs 53, 56 and 71 in 2007 with reference to TPF, and to BuS 56 in 2010 regarding PhD dissertations. All six of these BuSs (BuS 52 and the five just named) are BCC-efficient; except for BuSs 53 and 56 in 2007, they are also CCR-efficientFootnote 20; moreover, all six serve repeatedly as benchmarking partners for other BuSs. They may therefore have considerably influenced the previously determined properties of returns to scale. If we classify these six BuSs as untypical and exclude them from our DEA analyses, the individual efficiency scores do increase, while the corresponding output-oriented scale efficiencies remain at a very high level (see Table 1).

Table 1 Averaged output-oriented scale efficiencies in the case of untypical BuSs being excluded

Comparing the resulting (modified) scale efficiencies with the original calculations, especially within the identified range of constant returns to scale, the scale efficiencies of 2004 and 2007 barely differ. Only in 2010 do they decline more noticeably (by 5 %), but they still average 91.2 %. The conclusion of constant returns to scale for medium-sized BuSs therefore still appears valid. The differences between the scale efficiency levels of all BuSs and those within the interval of constant returns to scale result primarily from the strongly decreasing scale efficiencies of larger BuSs described above; thus, our earlier conclusion of decreasing returns to scale for large BuSs is still confirmed.

Sensitivity analyses with respect to the preference relation

In order to assess the influence of individual research outputs on the properties of returns to scale, we now vary the set of output indicators. For this purpose, the indicator Pub-int is excluded from each DEA analysis together with one further output indicator.Footnote 21 As Table 2 shows, excluding particular output indicators has only a minor influence on our empirical results. In the previously identified size intervals of constant returns to scale, the averaged four-dimensional output-oriented scale efficiencies are considerably higher than 90 % across all sensitivity analyses and timeframes. The lower scale efficiency values for the full set of BuSs are, again, caused by (partly strongly) decreasing scale efficiencies of larger BuSs. However, the averaged scale efficiencies in the size range of constant returns to scale clearly decrease over the periods investigated when both publication indicators (Pub-int and PP-nat) are excluded simultaneously. At 91.9 % in 2010, this variant shows the lowest value in the ‘constant’ interval of all variants considered. Hence, the results react more sensitively to a simultaneous exclusion of both publication indicators.
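The variation scheme can be written out explicitly. Assuming the four output indicators named in this section (Pub-int, PP-nat, TPF and PhD dissertations, the last abbreviated here as "PhD") and the two inputs used above (professorships and research assistants), each variant drops Pub-int plus one further output, leaving two inputs and two outputs, i.e. the four-dimensional setting mentioned in the text. The labels are shorthand, not CHE variable names.

```python
OUTPUTS = ["Pub-int", "PP-nat", "TPF", "PhD"]  # "PhD" = PhD dissertations
INPUTS = ["professorships", "research assistants"]


def exclusion_variants(outputs, always_dropped="Pub-int"):
    """List (dropped indicators, remaining outputs) for each sensitivity run."""
    variants = []
    for other in outputs:
        if other == always_dropped:
            continue
        kept = [o for o in outputs if o not in (always_dropped, other)]
        variants.append(((always_dropped, other), kept))
    return variants
```

Each remaining output set would then be passed, together with both inputs, to the same CCR and BCC computations as in the main analysis.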

Table 2 Averaged output-oriented scale efficiencies in the case of the indicator mix being varied

As noted in the earlier description of the indicators’ design and acquisition, the nationally visible publications are weighted by the number of authors and the perceived quality of the publication medium, whereas the CHE applies no such weighting to the internationally visible publications. To treat both indicators symmetrically, we therefore also investigate whether the properties of returns to scale change when the internationally visible publications are weighted analogously. We weight for the number of authors following the procedure the CHE uses for PP-nat. For the quality weighting, however, we do not use JOURQUAL2 but a citation-based measure, which takes into account that the Pub-int indicator is acquired differently, via a WoS query. In general, citations are used to operationalize the impact of research on scientific advancement and the dissemination of knowledge, including knowledge spillovers. Citations therefore represent a proxy measure of the value of output and are well suited for weighting. However, the citations of the articles compiled in the Pub-int indicator follow a strongly skewed distribution, and numerous articles were still uncited at the time of inquiry. For that reason, we do not use the articles’ actual citations for weighting. The WoS journal impact factor offers an alternative weighting option that captures journal quality in particular. We therefore weight each article in the internationally visible publications with the WoS journal impact factor at the respective time of inquiry.Footnote 22 As the “Pub-int weighted” row of Table 2 shows, however, this modification has no decisive impact on the revealed properties of returns to scale, so our prior assertions also hold when internationally visible publications are weighted in this way.
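The weighting can be sketched as a sum over articles. The impact-factor weight follows the text; the author weight, however, is an assumption: the CHE's PP-nat procedure is not spelled out in this section, so the sketch uses simple fractional counting (1/n for n authors) as a hypothetical stand-in that may differ from the CHE's actual scheme.

```python
def weighted_pub_int(articles, author_weight=lambda n: 1.0 / n):
    """Sum of impact-factor-weighted, author-weighted articles.

    `articles` holds (impact_factor, n_authors) pairs. The default author
    weight 1/n is a hypothetical stand-in for the CHE's PP-nat procedure.
    """
    return sum(jif * author_weight(n) for jif, n in articles)
```

For three articles with impact factors 2.0, 1.5 and 3.0 and two, one and four authors, the weighted count is 1.0 + 1.5 + 0.75 = 3.25; passing `author_weight=lambda n: 1.0` applies the impact-factor weighting alone.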

In addition, we analyse how the properties of returns to scale change if third-party funds are regarded as a research input instead of an output. In this case, the averaged output-oriented scale efficiencies still exhibit high values and show characteristics similar to those identified in the previous sensitivity analyses.

The fitted regression functions, calculated as before, also confirm this observation: the intersections are similar to those in Fig. 3, although the size range of constant returns to scale often shrinks slightly, ceteris paribus. Only excluding both publication indicators shifts the intersections considerably. This once again confirms the sensitivity of the results to, and the relevance of, the two publication indicators.

One further (possible) modification concerns the orientation of the DEA model employed. Assuming an input orientation when measuring scale efficiencies via DEA reveals the empirical findings presented and discussed in the previous sections less distinctly. However, this effect has little empirical substance, because it stems from the conspicuously weak output performance of many BuSs relative to the inputs recorded by the CHE. Numerous BuSs show relatively low output indicator values in this case. When their personnel inputs are reduced equiproportionally while their low output levels are maintained, these BuSs are projected onto straight-line segments of the BCC-efficient frontier with fewer than 8 professors, where returns to scale are increasing, in some cases strongly. This, however, permits no conclusions about other segments of the BCC production function with more than 8 professors.

Implications, limitations and future prospects

Using the non-parametric method of DEA, we analyse which properties of returns to scale German BuSs exhibit in research production over the nine-year period from 2001 to 2009, thereby re-examining and extending the analyses by Dyckhoff et al. (2009) for the period 2001 to 2003. We use the CHE research data to generate valid datasets of relevant input and output criteria. Despite considerable changes over time in the number of BuSs included, their sizes, and the design and acquisition of the evaluation criteria, we obtain clear and robust results. We find that, in all consecutive periods considered, medium-sized BuSs with 8 up to 16.7 (2004), 15.2 (2007) and 18.9 (2010) averaged annualized professorships exhibit approximately constant returns to scale in research. For larger BuSs beyond these size ranges, returns to scale decrease (quite strongly in some cases), indicating that, at least from a research productivity point of view, BuSs should not increase their academic staff beyond a certain (critical) level. This, in turn, implies that an optimal size of BuSs, in terms of productivity, exists in a medium-sized range. Various sensitivity analyses show that our findings also hold under different modifications of influencing parameters.

Our study is, of course, subject to some limitations. They are either empirical, owing to the use of the CHE data, or methodological, owing to the application of DEA. Apart from the general considerations regarding the characteristics and adequacy of the input and output indicators discussed above, the CHE’s indicator design and data acquisition should be scrutinized. For instance, data biases may occur because the BuSs classify their Business Administration professorships themselves, and the definition and delimitation of Business Administration professorships from their academic environment are not clear-cut. The numbers of professors reported by the same BuSs vary over time, which suggests that the delimitation from related and overlapping academic disciplines, e.g. Business Informatics, might not always have been made consistently (for a detailed analysis from an accounting perspective, cf. Clermont and Dirksen 2015).

Data biases might also occur for the output indicators. This especially applies to the acquisition of publication data via queries in literature databases. Clermont and Dyckhoff (2012a, b) show that none of the common literature databases completely covers all relevant business and economics journals for German Business Administration researchers. Altogether, however, given the constraints of practical implementation, the indicators considered seem acceptable, not least because of the CHE’s diverse and continuously improved quality assurance procedures. Against this background, the distinctiveness and robustness of our empirical results (especially across the different consecutive timeframes) are particularly noteworthy.

Moreover, the research situations of BuSs are characterized by differing environments and teaching workloads (for instance, at public versus private universities), which the underlying datasets do not capture. Therefore, even in the present multidimensional case, we ultimately analyse only a partial productivity rather than the comprehensive academic productivity of BuSs. Nor are interdependencies between teaching and research activities taken into account. In principle, both positive and negative effects of teaching activities on research productivity are plausible: on the one hand, teaching is time-consuming; on the other hand, discussions with students might also stimulate research.

Further limitations result from the DEA method applied. Because of the convexity postulated for the data envelopment (when generating the BCC production functions), the empirical production functions are inevitably concave, so that returns to scale can only be increasing for low input quantities and decreasing for large ones. Hence, for a substantial interval of constant returns to scale to be deducible at all, the BCC-efficient BuSs must include small and large ones that are adequately positioned in the multidimensional data envelopment. For this reason, the size ranges of constant or nearly constant returns to scale detected in practical applications are typically less extensive when there are many medium-sized but comparatively few small and large DMUs. The relatively extensive size range of (approximately) constant returns to scale that we observe over a period of 9 years is therefore particularly remarkable. For this interval, as well as for the observation of decreasing returns to scale for larger BuSs, our results are quite reliable, while statements about the properties of returns to scale for smaller BuSs are problematic due to their limited number.

Because the results are conditioned by the defined production possibility set and the chosen preference relation, both of which depend on the available (and used) data, as well as by the diverging interests of different stakeholders (or decision-makers), the informative value of this study for concrete policy recommendations is limited. Adopting our findings uncritically, for example to justify funding decisions for universities in Germany, might result in wrong or misguided incentives. Moreover, our study takes a partial perspective on just one of a university’s tasks, namely research, and does not cover all conceivable facets of research production. Therefore, our study should certainly not serve as the sole basis for decision-making.

Since specific, theoretically founded explanations are lacking, we can merely speculate about the reasons for our empirical findings. Substantial research synergies in the form of economies of scale and scope between German Business Administration professorships within the same BuS hardly seem to exist. On the contrary, administrative and coordination efforts seem to grow over-proportionally from a certain number of employed professors onwards. In the range of medium-sized BuSs, the research outputs of individual professorships (including their research assistants) essentially add up. This conclusion is plausible given that in the German academic discipline of Business Administration, professors usually work individually. Although some projects are in fact undertaken cooperatively, such cooperation often takes place within a professorship or between professorships of different universities. Cooperation between different professorships of the same BuS or university is observed less frequently.

Cooperation within a BuS could be promoted, for instance, through the joint supervision of PhD students by multiple professors in postgraduate programs, a concept some universities launched several years ago. A crucial prerequisite for fostering joint research projects between professors of the same BuS is a thematically coherent profile with a focus on corresponding research clusters, as initiated by the German federal government within the “Excellence Initiative”. Such clusters could foster similarities in the content and/or methodology of research. Likewise, several universities have initiated a trend towards specialized BuSs and closely cooperating research clusters (or research areas), in many cases in conjunction with the introduction of Bachelor and Master programs in Germany. In BuSs, such programs and the convergence of research topics could produce synergies among the participating professorships. Of course, higher coordination requirements arising from administrative processes, for instance in the form of meetings and workshops, could have a negative impact. The nature and extent of the actual effects, however, still require further research. Accordingly, the impact of such developments on the properties of returns to scale cannot yet be assessed, which offers an interesting starting point for future research.

Further research perspectives arise from the limitations of our study discussed above. In this paper, we have assumed the data to be deterministic, which is a necessary presumption when applying DEA. Alternatively, SFA, which allows datasets with stochastic components to be analysed, could be used instead of DEA. The results of an SFA analysis could then be used to check our findings and, if necessary, to validate them further. A disadvantage of SFA, however, is that a specific production function must be assumed ex ante. Within DEA, by contrast, the orientation of the applied models can be varied. In this paper, we have assumed that only the output can be changed (in the short to medium term); in the sensitivity analyses, we also investigated which assertions result from varying only the inputs. Research into scale efficiencies under simultaneous changes to inputs and outputs, on the other hand, remains to be conducted; it would require unoriented DEA models to calculate scale efficiencies.

As previously described in the discussion of limitations, the shape of the research production function, and therefore the resulting returns to scale, depends on the input and output factors used in the corresponding analyses. For example, qualitative aspects of research could be taken into closer consideration in future analyses. In this regard, the third-party funds indicator could be differentiated so that funds granted by different funders are separated and/or weighted differently. One reason is that the German Research Foundation (DFG), for example, applies high standards of scientific rigour when assessing third-party funding proposals (Joerk and Wambach 2013).
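As a minimal formalization of such a differentiated indicator, assume hypothetical funder groups g = 1, ..., G (for example, DFG and all other funders) and hypothetical non-negative weights w_g. A weighted third-party funds output of BuS k could then be written as

$$
F_k^{w} = \sum_{g=1}^{G} w_g \, F_{kg}, \qquad w_g \ge 0,
$$

where F_{kg} denotes the third-party funds that BuS k receives from funder group g. Setting all w_g = 1 corresponds to a simple unweighted sum of funds, while a larger weight for DFG funds would reflect its stricter review standards. Keeping the components separate instead would mean entering each F_{kg} as an individual output in the DEA model. In either case, the choice of groups and weights is itself a value judgement that would need to be justified.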

Until now, the CHE has in principle used quantitative measures to compile its indicator set, particularly with regard to publication activity; that is, published articles are counted (partially quality-weighted). Including such indicators in the performance assessment of BuSs could create disincentives insofar as research results may be divided and published in the smallest possible fragments, thereby inflating the number of publications. This effect could be intensified in the underlying CHE 2007 and 2010 datasets, given that two quantitative indicators are used there to measure publication activity. To reduce these effects, information about the impact or outcome of the publications would be desirable. In this respect, the citations a research article receives are a frequently discussed indicator in scientometrics (van Raan 1996). Nosek et al. (2010), for instance, point out that citations represent an impact indicator that is “valid, relatively objective, and, with existing databases and search tools, straightforward to compute.” It therefore remains to be investigated what impact the inclusion of citations has on the properties of returns to scale of BuSs’ research production.

As previously outlined, the research productivity analysed here generally constitutes only one part of the comprehensive spectrum of academic activities, which also includes teaching and administration. In this sense, the datasets used could be expanded by adding indicators that represent teaching and administration outputs in order to analyse properties of returns to scale with respect to these additional outputs. With respect to teaching, Johnes (2006), for example, uses two indicators: the graduation marks of alumni of a course of study as an output and the students’ school marks at the time of applying for a course of study as an input. Other qualitative measures, such as evaluation results, dropout rates, supervisory relationships or the diversity of the teaching portfolio, would also be conceivable in corresponding analyses. It should be noted, however, that, unlike for research, there is to date no recognized set of productivity and performance indicators for teaching. This is primarily because teaching and student learning are closely interwoven in a co-productive process.

That our results are not random is supported, on the one hand, by the inclusion of different consecutive periods and, on the other hand, by multiple robustness and sensitivity analyses. As described in the introduction, analyses of returns to scale in research production processes are subject to particular country- and discipline-specific environmental conditions. If these do not deviate (significantly) from each other, our results can essentially be transferred to other objects of research, for instance other social sciences in Germany, such as economics. However, this does not apply to equipment-intensive disciplines, such as engineering, which, due to funding for machinery or experimental setups, are characterized by considerably higher third-party funding and therefore, where applicable, by induced economies of scale and scope. Regarding national specifics, other European university systems are the most comparable with the German one and should therefore exhibit similar returns to scale in research. To that effect, Bonaccorsi and Daraio (2005) also found constant returns to scale for Business Administration research production in Italy. However, it is questionable whether our results are applicable, for example, to Business Administration research in the United States of America, where the university system is characterized by a substantial degree of (highly competitive) privatisation. Whether similar results can be observed for other academic disciplines and/or countries in particular cases requires further research.