Introduction

In recent years, various nations have placed increasing emphasis on evaluating the production efficiency of research activity in universities and public research organizations. This has created a need for improved methods of research evaluation. Over the years there has been “a convergence of methods towards peer informed, metrics based, departmental level evaluation” (Hicks 2009). The peer review approach remains central, and within this, bibliometric analysis provides useful support for the assembled panels of experts (van Raan 2005; Rinia et al. 1998).

One of the advantages of the bibliometric approach, which is readily applicable to the hard sciences,Footnote 1 is the possibility of measuring labor productivity, a fundamental indicator of research efficiency. This factor is not measurable through peer review, in which costs and time constraints limit the evaluation to a partial segment of the entire scientific production. As a consequence, peer-review approaches can assess only the quality of the research output submitted for evaluation. Although a certain level of correlation between output quality and productivity has been demonstrated (Abramo et al. 2009a), direct measures of productivity would offer much more than a rough approximation.

Bibliometric measurement of productivity, however, presents two main obstacles. The first is reconciling the different ways in which authors affiliated with the same organization report that affiliation in the address field. The second is unequivocally associating publications with their true authors. It is not surprising that the literature offers few analyses, all of which are limited to a restricted number of scientific disciplines and research institutions (Macri and Dipendra 2006; Kalaitzidakis et al. 2003; Pomfret and Wang 2003). To the best of our knowledge, only Abramo et al. (2008a) have achieved comparative bibliometric measures of research productivity for all the hard sciences at all the universities of a national system.

Other obstacles to equitable comparison of research productivity involve the factors of production: comparison of labor productivity among various research units should be conducted at parity of other production factors and economic rents. But the factors of production, with the exception of labor, do not always permit ready measurement and accurate attribution to individual production units. Even the labor factor is difficult to measure in hours, since the time that scientists dedicate to research varies within single universities, among institutions, and certainly between those employed at universities and those in public research institutes. It is also difficult to measure capital and certain factors beyond merit (such as geographic location,Footnote 2 or the accumulated experience and knowledge of the scientists belonging to an institution), and to assign such measurements to individual research units with subsequent normalization, even though these factors directly and indirectly affect the output of research activity. In fact, the quality of scientific production, as measured by national peer review assessment exercises, is itself influenced by these same variables.Footnote 3

Limiting our attention to the relatively “more measurable” labor input, we must still consider that research staffs are composed of different academic ranks, which receive different salaries. Various scholars have examined the relationship between scientific productivity and academic rank, and their studies show a significant differential in productivity with variation in rank (Prpic 1996; Zainab 1999; Bordons et al. 2003). As early as 1978, Blackburn et al. (1978), in a study sample of American academics, showed that full professors publish at a higher average rate than associate professors. Dickson (1983) and Kyvik (1990) captured the same effect in their respective studies of Canadian and Norwegian universities. There has been less study of the relationship between quality of output and academic rank. Bordons et al. (2003) analyze the impact of publications by Spanish Research Council scientists by gender and professional category, in two specific areas: Natural resources and Chemistry. They show that the average impact factor of journals in which full professors publish their articles is higher than that for publications by the lower academic ranks. Abramo et al. (2009c) extend the analysis to all the hard sciences and demonstrate that Italian full professors average more publications than associate professors (and these more than assistant professors), and also publish in journals with a higher impact factor. A further study by Abramo et al. (2009d) demonstrates a strong correlation between productivity and impact, meaning that the scientific production of the most productive scientists is also, on average, of greater quality. Ben-David (2009) showed that Israeli economists with the rank of professor receive on average more citations than their colleagues of lower ranks. These studies confirm the expectation that quality of output reflects academic rank.

A consequence is that university rankings based on productivity or on quality per uniform labor unit will clearly favor organizational units with a greater concentration of higher ranks. If national research assessment exercises do not take this effect into account, leaving the resulting distortions in their rankings, there could be harmful effects on the allocation of public funds and on the image of the institutions observed. This is the case with the first and only Italian national research evaluation exercise, the VTR, and with the subsequent allocation of the portion of public financing that is partially based on VTR rankings: these rankings did not account for the varying presence of staff ranks among different universities.

The present study intends to measure the extent of distortions in national performance rankings of research institutions when academic rank, and the relevant salaries, are not taken into account. We do not expect such distortions to be very high on average, for two main reasons. The first is that the concentrations of academic ranks are similar across universities, with few possible exceptions, especially among younger universities. The second is that academic salaries in Italy are fixed at the national level and depend only on rank and seniority, not on merit.

Using bibliometric techniques, we compare two different rankings of research productivity in Italian universities: one which considers the labor factor as homogeneous and one which considers the differing academic ranks of the research staff. We carry out such comparisons at two different levels: at the detailed level of scientific sectors and at the more aggregated level of disciplines. In each discipline, and in each scientific sector within the discipline, we measure the changes between the two rankings and provide the relevant statistics. Considering that data on the academic ranks and salary ranges of Italian university personnel are available, and also that the proportions of such personnel in organizational units, although similar, are not the same, we propose that in this case the comparison of research productivity by “unit of cost” is more equitable than comparison by unit of labor, all other limitations of productivity measurement remaining the same.

The following section of this paper describes the field of observation for the study, the dataset and the methodology used. Section 3 presents the results of the analysis. The final section offers a discussion of the results and the authors’ concluding considerations.

Methodological approach

Research activity is an input–output production process in which the inputs consist of human and financial resources, scientific instruments, materials, etc., and where outputs have a complex character of both tangible nature (publications, patents, conference presentations, etc.) and intangible nature (personal knowledge, consulting activity, etc.). The knowledge production function has a multi-input and multi-output character. This in turn creates a multi-faceted problem when it comes to measuring the scientific productivity of labor, and requires scholars to make precise choices in methodology.

In this work, which measures the scientific productivity of Italian universities in the hard sciences, we first consider as input only the number of researchers involved, and subsequently also consider their relative cost.

Concerning output, there are multiple forms of codification for the new knowledge produced by research activity. Having limited the field of analysis to the hard sciences, we choose scientific publications as a proxy for research output, a choice which certainly finds support in the literature (Moed 2005). The research productivity of individual scientists is not normalized to their actual hours of research time or to other productive factors, since there is a complete lack of such data attributable at the level of individual scientists.

Dataset

The data used in the study are obtained from the Observatory on Public Research in Italy (ORP), a bibliometric database maintained by the authors and derived from Thomson Reuters’ Web of Science (WoS). The ORP provides a census of the WoS-indexed scientific production, since 2001, of all research institutions situated in Italy. Beginning from the ORP data, this study extracted all publications (articles and reviews) authored by researchers at Italian universities for the period 2004–2006. This was followed by a reconciliation of the different name forms used for the same universities.Footnote 4 Finally, using a complex algorithm for disambiguation of the precise identity of the authors, each publication was attributed to the university scientists who wrote it.Footnote 5

In the Italian university system, each researcher is assigned to a single official scientific disciplinary sector (SDS). For the hard sciences, there are 183 SDSs,Footnote 6 grouped into 8 disciplinary areas (UDAs): Mathematics and computer sciences; Physics; Chemistry; Earth sciences; Biology; Medicine; Agricultural and veterinary sciences; and Industrial and information engineering.Footnote 7 The census by author name permits attribution of measures of output to individual researchers, and then, by aggregation, to the SDS and UDA of a university. The methods used overcome considerable obstacles and provide levels of accuracy not previously attained in large-scale studies in the literature. When one observes large populations of scientists, the number of homonyms among their names is very high (in the Italian academic system 12% of the 60,000 scientists have homonymous names), and the task of disambiguating them within acceptable margins of error is formidable. This is why bibliometrics-based studies have generally been carried out at aggregated levels of analysis, such as at the level of entire universities. When they are conducted at the level of single scientists or research groups, they are limited to one or a few organizations or scientific disciplines, in which case it is possible to disambiguate manually. Disambiguation cannot be done manually in the case of an evaluation of an entire national research system, where an enormous quantity of data is involved. However, this step is required in order to avoid distortions in productivity measurement caused by several factors: (i) the differing distribution of resources among the various scientific areas of each university; (ii) the varying degrees of publication and citation “fertility” among scientific disciplines; (iii) variation in the data source in terms of its differing coverage of the range of journals published in each disciplinary area; and (iv) the fact that researchers generally publish in more than one subject category.

For the 2004–2006 triennium, this study concerns the 69 Italian universities active in the 183 hard science SDSs, representing a total of 34,000 research staff with over 81,000 publications. The official database of the Ministry of Education, Universities and Research (MIUR)Footnote 8 was used to provide a census of all university research personnel and their ranks. This ministry is responsible for the recognition of university status, allocation of regular operating funding, and the control and evaluation of university function.

Data concerning salary costs for research personnel were obtained from the DALIAFootnote 9 database, which is also maintained by the MIUR. The current Italian university system assigns research personnel to three ranks: full professors, associate professors and assistant professors. Definitive confirmation of an individual’s rank arrives after a three-year “probationary” appointment, following an examination of the individual’s performance. The university system also includes a small number of “research assistants”, a rank which is being eliminated and which resembles that of assistant professor. Table 1 shows the numbers, total costs and average cost per rank of these personnel, for the triennium.

Table 1 Data concerning Italian university personnel, mean values 2004–2006

Full professors compose 29.5% of university personnel but represent 40.6% of total personnel costs. Assistant professors compose the largest portion of personnel, at 37.7%, but represent only 27.4% of the entire cost. The last column of Table 1 presents the average cost per academic rank, which is used in the subsequent calculations of productivity on the basis of cost.

Indicators

For each publication in the dataset, the study considers an indicator of quality, the Article Impact Index, measured on a 0–100 percentile scale according to the citationFootnote 10 distribution for publications of the same type and year falling in the same ISI subject category.Footnote 11 A value of 90 indicates that 90% of the articles (or reviews) of the same year, falling in the same ISI category, have a lower number of citations than the article (or review) considered. In this way, distortions in quality measurement due to the different citation fertility of subject categories are limited.
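The following is a minimal sketch, not the authors’ implementation, of how such a percentile index could be computed for one article, assuming the citation counts of all publications of the same type, year and subject category are available; the function and variable names are hypothetical.

```python
from bisect import bisect_left

def article_impact_index(citations: int, peer_citations: list[int]) -> float:
    """Percentage of same-type, same-year, same-category publications
    with strictly fewer citations than the publication considered."""
    peers = sorted(peer_citations)
    below = bisect_left(peers, citations)   # count of peers with fewer citations
    return 100.0 * below / len(peers) if peers else 0.0

# Example: an article with 12 citations among peers cited [0, 1, 3, 5, 12, 30]
print(article_impact_index(12, [0, 1, 3, 5, 12, 30]))  # -> 66.7 (4 of 6 peers below)
```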

The indicator for evaluation of the bibliometric output of the researchers in the various university SDSs is Fractional Scientific Strength. This is given by the sum of the publications achieved by the researchers of a single university SDS, with each publication weighted according to its Article Impact Index and normalized according to the number of organizations to which the co-authors belong. With this method it is possible to consider all dimensions relevant to output: the quantitative (through the number of publications), the qualitative (through the Article Impact Index) and the dimension of contribution (through the count of co-authorships).
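As an illustration, the sketch below computes the Fractional Scientific Strength of a single university SDS under the definition just given; the record layout and names are assumptions for the example, not the authors’ code.

```python
from dataclasses import dataclass

@dataclass
class Publication:
    impact_index: float    # Article Impact Index, 0-100 percentile
    n_organizations: int   # distinct organizations among the co-authors

def fractional_scientific_strength(pubs: list[Publication]) -> float:
    # Each publication contributes its impact weight, fractionalized over
    # the number of co-authoring organizations.
    return sum(p.impact_index / p.n_organizations for p in pubs)

# Example: one single-institution paper and one paper shared by three institutions
pubs = [Publication(80.0, 1), Publication(95.0, 3)]
print(fractional_scientific_strength(pubs))  # 80 + 95/3, about 111.7
```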

The productivity of a particular university SDS is given by the ratio of Fractional Scientific Strength to the input factor for the same SDS. For the productivity per labor unit (LP), the input factor considered is simply the number of scientists present in the SDS, while for the calculation of productivity per unit of cost (CP) the input factor considered is the overall cost of research staff at the SDS, derived from the parameters indicated in the last column of Table 1.
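A minimal sketch of the two productivity indicators follows, assuming the Fractional Scientific Strength of the SDS has already been computed as above; the average costs per rank are placeholders standing in for the Table 1 figures.

```python
# Hypothetical average annual cost per rank (placeholders for the Table 1 values)
AVG_COST = {"full": 100_000, "associate": 75_000, "assistant": 55_000}

def lp_and_cp(fss: float, staff_by_rank: dict[str, int]) -> tuple[float, float]:
    """Productivity per labor unit (LP) and per unit of cost (CP) of one SDS."""
    n_staff = sum(staff_by_rank.values())
    total_cost = sum(AVG_COST[rank] * n for rank, n in staff_by_rank.items())
    return fss / n_staff, fss / total_cost

lp, cp = lp_and_cp(fss=450.0, staff_by_rank={"full": 2, "associate": 3, "assistant": 5})
print(lp, cp)  # LP in output per scientist, CP in output per euro of staff cost
```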

Continuing from the level of the SDS, the productivity values for an entire university UDA are then obtained by aggregation, after standardization and weighting. The productivity measures of each university in each SDS are standardized to the national mean for the same SDS. This standardization serves to eliminate bias due to the different publication and citation rates of the SDSs within a single UDA. The weighting instead takes account of the variation in representativity, in terms of personnel numbers and costs, of the SDSs within each UDA (Abramo et al. 2008b). For a generic university we thus have:

$$ LP_{j} = \sum\limits_{s = 1}^{n_{j}} \left( \frac{LP_{s}}{\overline{LP_{s}}} \cdot \frac{\text{Add}_{s}}{\text{Add}_{j}} \right) $$

where:

  • \( LP_{j} \) = productivity per labor unit in UDA j,

  • \( LP_{s} \) = productivity per labor unit in SDS s,

  • \( \overline{LP_{s}} \) = national mean of productivity per labor unit in SDS s,

  • \( \text{Add}_{s} \) = number of scientists in the university considered in SDS s,

  • \( \text{Add}_{j} \) = number of scientists in the university considered in UDA j,

  • \( n_{j} \) = number of SDSs in the university considered in UDA j.

Analogously:

$$ CP_{j} = \sum\limits_{s = 1}^{n_{j}} \left( \frac{CP_{s}}{\overline{CP_{s}}} \cdot \frac{\text{Add}_{s}}{\text{Add}_{j}} \right) $$

where:

  • \( CP_{j} \) = productivity per unit of cost in UDA j,

  • \( CP_{s} \) = productivity per unit of cost in SDS s,

  • \( \overline{CP_{s}} \) = national mean of productivity per unit of cost in SDS s.
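The aggregation expressed by the two formulas above could be implemented as in the following sketch, which applies equally to LP and CP values; the SDS codes and figures are purely illustrative.

```python
def uda_productivity(sds_values: dict[str, float],
                     national_means: dict[str, float],
                     staff: dict[str, int]) -> float:
    """Aggregate SDS-level productivity to the UDA level: each SDS value is
    standardized to its national mean and weighted by the SDS's share of the
    university's staff in the UDA (the Add_s / Add_j term of the formulas)."""
    total_staff = sum(staff[s] for s in sds_values)            # Add_j
    return sum((sds_values[s] / national_means[s]) * (staff[s] / total_staff)
               for s in sds_values)

# Example with two hypothetical chemistry SDSs
lp_j = uda_productivity({"CHIM/01": 1.8, "CHIM/03": 0.9},   # university LP by SDS
                        {"CHIM/01": 1.5, "CHIM/03": 1.2},   # national mean LP by SDS
                        {"CHIM/01": 10, "CHIM/03": 20})     # staff by SDS
print(lp_j)  # -> 0.9
```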

Results

As described above, ratings of productivity for Italian universities were calculated per labor unit and per unit of cost, for the 2004–2006 triennium, and then used to obtain rankings. In the following, the changes in rankings when switching from productivity per labor unit to productivity per unit of cost are shown at the UDA and SDS levels. Table 2 presents the variations under the two methods of ranking, as recorded for each UDA in each university.Footnote 12

Table 2 Variations in ranking when switching from measures of productivity per labor unit (LP) to unit of cost (CP), for Italian universities, by university disciplinary area (UDA), 2004–2006 data

Table 3 presents further statistics concerning the distribution of the rankings under the two different methods, by UDA, for the field of observation. As expected, there is a very high correlation between the two rankings in all areas (last column of Table 3). The coefficient of correlation varies from a minimum of 0.972 for Biology to a maximum of 0.996 for Agricultural and veterinary sciences. At the same time, however, the variations in ranking between the two methods are quite substantial: the share of universities for which the ranking changes under the two methods ranges from a maximum of 86.4% for Physics to a minimum of 36.5% for Agricultural and veterinary sciences. This last UDA shows the strongest correlation between the two rankings: 33 of the 52 universities maintain a constant ranking under the two methods. It also presents the lowest values for the other statistics presented in Table 3: the greatest shift is only 3 positions, seen at 3 distinct universities (Sassari, Teramo and Udine), while the average shift in rank is less than one (0.615) and the median is zero. The highest mean change in ranking is seen in the Biology UDA (2.667), followed by Industrial and information engineering (2.258), Physics (2.237) and Chemistry (2.207). The Chemistry UDA offers the extreme case of a university that shifts 17 positions under the two methods of ranking. Other wide jumps in ranking occur in Physics, where the University of Reggio Calabria “Mediterranean” gains 15 places under the CP classification with respect to its ranking for LP. In Industrial and information engineering there is a shift of the same magnitude: in this case the University of Rome “Foro Italico” loses 15 positions under the CP classification compared to its LP ranking. In Biology, the maximum variation in ranking is 13 positions, and concerns three universities: the University of Teramo gains 13 positions, while the universities of Milan “Vita-Salute San Raffaele” and Venice “Ca’ Foscari” lose the same number. A shift of the same extent occurs in Earth sciences for the University of Trent, which loses 13 positions when classified by CP as compared to LP.

Table 3 Variations in ranking statistics when switching from measures of productivity per labor unit (LP) to unit of cost (CP), by university disciplinary area (UDA), 2004–2006 data
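For readers wishing to reproduce this kind of statistic, the sketch below shows one possible way to derive rank shifts and a rank correlation from two sets of productivity scores. The data are invented, and since the type of correlation coefficient behind Table 3 is not restated here, a Spearman computation on the ranks is shown only as an example.

```python
from scipy.stats import spearmanr

def ranks(productivity: dict[str, float]) -> dict[str, int]:
    """Rank universities by descending productivity (1 = most productive)."""
    ordered = sorted(productivity, key=productivity.get, reverse=True)
    return {u: i + 1 for i, u in enumerate(ordered)}

lp = {"Univ A": 1.4, "Univ B": 1.1, "Univ C": 0.7}   # invented LP values
cp = {"Univ A": 1.2, "Univ B": 1.3, "Univ C": 0.6}   # invented CP values
r_lp, r_cp = ranks(lp), ranks(cp)
shifts = {u: r_lp[u] - r_cp[u] for u in lp}          # positive = gains positions under CP
rho, _ = spearmanr([r_lp[u] for u in lp], [r_cp[u] for u in lp])
print(shifts, rho)
```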

Table 4 presents data on the calculation and ranking of productivity for universities active in the Chemistry UDA, as an in-depth example of one of the areas presenting greater shifts in rankings under the two methods. In this UDA, 42 of the 58 universities show a different ranking under the classification by LP and by CP. Of these 42, 39 show variations in ranking with absolute values less than or equal to 4. The maximum shift is 17 positions, as noted above, for the University of Teramo: this rather young university jumps from 40th position under LP to 24th under CP. Its staff complement consists of 4 scientists (averaged over the triennium) with an average cost of €64,400, the lowest among all the universities active in the UDA, since there are no full professors present. The situation is similar for the University of Cassino, which places in 40th position under LP but rises to 24th position under CP. The trend is the opposite for the University of Catania, whose heavy concentration of top-ranked personnel among its 107 scientists (mean cost per scientist: €94,700) contributes to its loss of 7 positions under the classification by CP compared to that for LP. Only the International School for Advanced Studies of Trieste shows a higher mean cost per scientist, at €98,900. In general, there is a significant negative correlation (−0.739) between the variation in LP and CP ranking and the mean cost per member of research staff in each university active in this UDA.

Table 4 Comparison between productivity per labor unit (LP) and unit of cost (CP) for Italian universities, for the Chemistry UDA, 2004–2006 data
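The correlation just mentioned, between the LP-to-CP variation in ranking and the mean cost per researcher, could be computed as in this small sketch; the figures are invented (the real ones come from Table 4), and a Pearson coefficient is used purely as an example.

```python
from scipy.stats import pearsonr

# Invented values per university: variation in ranking (LP rank minus CP rank,
# positive = improves under CP) and mean cost per researcher in thousands of euros.
rank_variation = [16, 3, 0, -2, -7]
mean_cost_keur = [64.4, 70.1, 82.5, 88.0, 94.7]
r, _ = pearsonr(mean_cost_keur, rank_variation)
print(r)  # negative: universities with costlier staffs tend to lose positions under CP
```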

At the SDS level, Table 5 presents data on the calculation and ranking of productivity for the 45 universities active in the Pharmacology SDS of the Biology UDA, as an example of the variation that may be observed at a more detailed level. The shifts in ranking are smaller than at the UDA level: the mean absolute shift is 1.33, with a median of 1. The maximum variation is seen for the University of Milan “Vita-Salute San Raffaele”, which drops from 5th position for LP to 13th for CP. The maximum shift in the positive direction is seen for the Second University of Naples, which gains 4 positions, moving from 38th for LP to 34th for CP. In total, eight universities show increases in ranking equal to or greater than 3 places under CP, while 12 universities do not show any change in position.

Table 5 Comparison between productivity per labor unit (LP) and per unit of cost (CP) for Italian universities, for the Pharmacology SDS, 2004–2006 data

Conclusions

Bibliometric techniques permit the measurement of the research productivity of universities and public research institutions. Comparative measures of labor productivity should be conducted under parity of other factors of production, but these factors are difficult to measure and to attribute to individual scientists. The first and only Italian national research evaluation exercise, based on peer review, treated the labor factor as uniform, meaning that the comparative quality of organizational units was not normalized to take account of variations in the distribution of academic rank. This may occur again, and in other countries as well. The current study illustrates the number and extent of the distortions that occur when the labor factor is treated as uniform in the Italian university system. Other literature on the topic indicates that there is a significant difference in average productivity among academic ranks, which, when the labor factor is considered uniform, results in more favorable evaluations for universities with a greater concentration of full professors.

The proposed study compared rankings of productivity for Italian universities per labor unit and per unit of cost. The analysis was conducted from the bottom up, beginning with the identification of the authorship of over 81,000 publications by all 34,000 university scientists working in the hard sciences, then aggregating at the level of the scientific disciplinary sectors of individual universities and at the further level of disciplinary areas. At both these levels there is a strong correlation between the two measures of productivity, but also some variation in rankings, especially for a number of outliers that show substantial shifts in rank for “cost” productivity as compared to labor productivity. This occurs for universities where the personnel complement is notably imbalanced in favor of higher or lower academic ranks, and which are therefore unavoidably favored or penalized by an assessment methodology that does not take account of the composition of research staff by academic rank.

The measurements proposed do not take account of variations in the time dedicated to research by the staff members, although teaching load and other institutional duties are not necessarily equally divided. Nor does the methodology consider the capital available to the organizational units under observation, or other factors external to merit that could impact on quantity and quality of scientific production.

Even with these cautionary notes, the study provides a useful indication of how to proceed towards research assessments that are more robust and exhaustive than the current state of the art. In particular, the study proposes an improvement in the measurement of labor productivity that should be useful in decision-support systems for those who, at various levels, are responsible for the management and evaluation of research institutions and research systems. While ranking distortions due to overlooking academic rank prove negligible on average at aggregated levels of analysis, such as the discipline level, they should be more noticeable at the level of single scientists or research groups. The authors intend to investigate this in the future, to the benefit of those universities that implement incentive systems based on research performance.