1 Introduction

The reaction of scientific communities against the evaluation of research is almost entirely concentrated in SSH disciplines. STEM disciplines seem to have accepted that research evaluation, as is often stated, “is here to stay”. Objections are sometimes raised in Mathematics or Clinical medicine, in particular against some bibliometric practices (e.g., the Impact Factor), but overall these arguments constitute a minority opinion.

Why is this the case? I suggest that the reasons cannot be purely sociological, or related to the way in which scientific communities organise their work and communicate their results. Nor can they be political or ideological: the evidence that political or ideological opinions differ significantly by discipline is scattered and not robust. Something different, or deeper, must be at work. Since scholars are motivated more by the intrinsic logic of their scientific work than by external incentives (although incentives matter a lot), the explanation must be found at the epistemic level.

By “epistemic” I mean the way in which scientific communities produce valid knowledge, or the procedures, criteria and practices by which they recognise inter-subjectively the value and validity of the knowledge produced by others, and in this way submit themselves to the same rules. In this perspective, the inter-subjective dynamics of communication and validation are not separated from the internal dynamics of knowledge, or the intrinsic persuasiveness of the knowledge exchanged (Ziman 1978, 2000).

In this sense epistemic is not the same as epistemological, since the latter requires a second-level abstract and professional reflection on the rules of scientific work. Not all scientists are also philosophers of science (indeed, only very few), but all good scientists have a solid mastery of a series of rules that are used to discriminate knowledge claims according to their purported validity.

Nor is it the same as sociological, since at this level the main interest is the way in which communities build up their agreement (or disagreement), irrespective of the content of knowledge. Sociological studies of science are mainly interested in the way in which socially defined actors, like scientists, set the boundaries of scientific vs non-scientific knowledge (Gieryn 1983, 1995, 1999; Taylor 1996), define scientific disciplines (Lenoir 1997; Abbott 2001), reach agreement or disagreement about claims (Knorr Cetina 1999), use material infrastructure and laboratory facilities to build up shared meanings (Latour and Woolgar 1979), create the conditions for repeatability of experiments (Collins 1975, 1985, 1999) or balance scientific power relations (Frickel and Moore 2006). For programmatic reasons, sociological studies do not deal directly with the epistemic content of knowledge, as separated (or separable) from the social interactions associated with it (Barnes and Edge 1982; Mulkay 1991; Barnes et al. 1996; Yearley 2005).

While I will use materials from epistemology as well as sociology of science, the main focus will be on the epistemic level, as elaborated by authors such as Ziman (1978, 2000).

In this chapter I address the following questions:

  • What are the epistemic differences between STEM and SSH that may explain the differences in the orientation towards research evaluation?

  • Are there epistemic differences across disciplines in SSH that may explain intra-SSH differences in the orientation towards research evaluation?

  • Are there research quality criteria on which communities in SSH may converge? Or, is it possible to address epistemic differences with procedural fairness?

2 Epistemic Differences Between STEM and SSH

An influential stream of literature, inspired by logical positivism, argued that the difference between STEM and SSH is very simple: the former are scientific disciplines, the latter are not (Steinmetz 2005).

By scientific discipline was meant a body of knowledge that could, at least in principle, formulate causal propositions. The formulation of causal relations requires a number of conditions that are found in the natural sciences, but not fully in the social sciences, and even less in disciplines that deal with language. In the natural sciences it is possible to assume the invariance of the object, so that controlled experiments can be carried out.

This view has dominated the scientific literature for decades after the Second World War. It is still maintained by some authors.

However, it is no longer the dominant theory, particularly after the developments in philosophy of science and social studies of science in the 1960s. The impact of Kuhn (1962) has been crucial here: the reason why scientists may formulate causal propositions is not that they control each of them in isolation, but that these propositions are consistent with an overall paradigm, whose foundations do not have the same level of controllability. In addition, scientists produce a large variety of propositions, not only causal ones, referring to their experimental apparatus, the concrete rules of operation in the laboratory setting, or the practices of exchanging results.

Post-positivistic accounts of modern science admit a larger range of propositions as scientifically valid. This opens the way for asking to what extent disciplines in SSH may be defined as scientific as well.

It is possible to summarise this issue separately for Social Sciences and Humanities.

In Social Sciences the issue of the scientific validity of propositions has a long history, starting with the foundations of classical political economy and sociology in the eighteenth and nineteenth centuries. In the thought of classical authors such as Weber and Durkheim, knowledge produced in Social Sciences may well be defined as scientific, not in the sense of producing invariant causal propositions (explanation) but in the sense of producing propositions that make the behaviours of social actors intelligible by referring to their motivations (interpretation). Social Sciences are no less scientific than natural sciences, to the extent to which they submit their propositions to the same kind of rigorous control, not through the use of experiments (which by definition cannot be carried out) but by establishing some level of stability in the relation between reasons for action (motivation) and observed action.

To what extent can these disciplines be defined as “scientific”, and what are the differences with respect to STEM? In what follows I reject the notion that SSH disciplines are not scientific and investigate instead which epistemic differences can be identified.

First, researchers in STEM aim at discoveries, while researchers in SSH have only occasional discoveries (a new archaeological site, document, text, manuscript…) but most often aim at new interpretations of existing texts. The focus on discoveries means that scientists are in competition amongst themselves. Science is competitive because researchers fight to be the first to publish discoveries and receive the credit.

Second, research in STEM is cumulative, because scientists build upon the contributions of others, either in the past or from current competition. Science is a collective undertaking, not an individual enterprise. Science is both competitive and collaborative at the same time. There is a sharp difference here with respect to SSH, in which cumulativeness is much lower (Walliser 2009a, b).

Third, because of competition for discoveries and cumulativeness, the appropriate communication channel is the journal article (Lindsay 1978; Bazerman 1988; Cronin 1984, 2005). The scientific journal is serial or periodical, it offers researchers all over the world the opportunity to be updated regularly on discoveries, the format of the article is suitable for communicating discoveries, and the peer review system is efficient in solving issues of information asymmetries on the attribution of priority. The scientific journal system follows the competitive structure of science (Dasgupta and David 1994). Over time, the competitive dynamics generate a hierarchical system based on a cumulative process of reputation building: scientific journals that have published important discoveries are read more frequently; consequently authors compete to be published in them; the increase in the number of submissions makes it possible to raise the rejection rate, making the quality of journals even higher and attracting more readers, and so on. It is this structure of scientific activity that makes it possible to build quantitative measures of research quality. In particular, once the role of citations is clarified in an unambiguous way, and the set of journals for which scientists compete is sufficiently large, then the very competitive dynamics generate a system in which the underlying quality is reflected in the relative measures of citations applied to papers, authors, institutions, while the impact factor of journals is considered a reliable measure of their average quality. Therefore there is a strong connection between the nature of scientific activity in discovery-driven fields, the overall system of academic publishing, and the reliability of quantitative measures of research quality based on citations.
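As an illustration of the kind of quantitative measure discussed above, the following sketch computes a journal impact factor under the standard two-year definition: the citations received in a given year by items the journal published in the two preceding years, divided by the number of citable items it published in those two years. The function name, the argument names and the figures are hypothetical; they are introduced here only to show the arithmetic and do not describe any real journal.

```python
def two_year_impact_factor(citations_to_prior_two_years, citable_items_prior_two_years):
    """Return the two-year impact factor for one journal and one census year Y.

    citations_to_prior_two_years: citations received in year Y by items
        the journal published in years Y-1 and Y-2.
    citable_items_prior_two_years: number of citable items the journal
        published in years Y-1 and Y-2.
    """
    if citable_items_prior_two_years <= 0:
        raise ValueError("the journal must have published at least one citable item")
    return citations_to_prior_two_years / citable_items_prior_two_years


# Hypothetical figures, invented purely for illustration.
example_if = two_year_impact_factor(
    citations_to_prior_two_years=300,
    citable_items_prior_two_years=120,
)
print(example_if)  # 2.5
```

The measure is an average over all items in a journal, which is why the text above speaks of the impact factor as an indicator of the average quality of a journal rather than of any single paper.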

By contrast, in SSH the journal article is not a suitable medium, because new interpretations require long explanations that are best suited to the book format (Baldi 1998; Brooks 1985, 1986).

Fourth, because of competition for priority, cumulativeness, and the workings of the scientific journal system, citations are an essential element of scientific communication. In STEM citations have the unambiguous meaning of credit assigned to the authors who made the previous discoveries. As originally discussed by Merton, Garfield and De Solla Price, and more recently formalised by Dasgupta and David, citations to the previous literature are a necessary ingredient of scientific publishing (Bornmann and Daniel 2008). This necessity is neither ethical nor practical; it is functional. By functional is meant, according to Merton’s sociological approach, that individuals are forced to use a citation system that complies with the collective rules of the scientific community, irrespective of their individual willingness.

In order to be credited for a discovery, the authors must demonstrate that their contribution is new with respect to the state of the art. In the absence of citations, the burden of checking carefully whether there is anything new would fall on the readers, clearly a very inefficient solution. Thus the overall system of scientific journals is based on referees who directly check the credibility of the authors’ statements, acting as agents on behalf of the scientific community. In doing so, they force authors to list all relevant citations. Furthermore, due to the cumulativeness of scientific discoveries there is no need to cite authors from the distant past, but only the papers published in the last few years, which include all the relevant knowledge. This is a striking feature of scientific papers: only a few scientific authorities of the past are cited, not because they are ignored, but because their contribution is embedded in the citations of more recent authors.

In SSH, on the contrary, researchers quote authors from a distant past, very often classical authors, and produce works that are not cumulative but complementary, segmented or even alternative to each other. While the segmentation into scientific fields and sub-fields is largely agreed in STEM, and is usually not the outcome of individual decisions, in SSH part of the activity of the most creative authors is the definition of new fields or new segmentations. The existence of progress, i.e., the fact that some works are not worth citing because their contribution has been subsumed into others’ contributions, is usually recognised very late, often after the authors cease their activity or die. Consequently, there is a need for a different theory of citations in SSH. Citations serve different purposes and should be classified accordingly.

Finally, there is a different role of paradigmatic pluralism. In STEM there is most often a dominant paradigm, sometimes with one or a few minority positions. Due to the cumulative nature of science and limited paradigmatic diversity, competition is open. On the one hand, within disciplinary boundaries all researchers compete fiercely for discoveries, without internal segmentations that may protect against competitors. On the other hand, since peer review is (generally speaking) a blind process, the past reputation and academic status of authors are not relevant to the probability of being published. This means that incumbents, or people with a recognised academic status, do not enjoy monopolistic positions in the long run. New entrants like junior researchers and authors with unorthodox views are easily recognised. Under these conditions, it is not possible for a single author or group of authors to monopolise the citations or to manipulate the reputational indicators.

This is not the case in SSH, in which paradigmatic pluralism is not the exception but the rule. On the one hand, there are internal segmentations that are not due to disciplinary differences but rather to paradigmatic options (rooted in the choice of object, methodologies and techniques) and also to value-laden positions (academic schools and traditions, ideological positions, political affiliations and attitudes, cultural orientations).

Competition is not completely open but segmented. Scholars sometimes have a two-layered choice: first, with which paradigm they want to be affiliated; second, how to compete within the paradigm chosen. In some important sense, there is competition among paradigms, but each of them is organised into its own scientific and academic structure (often with dedicated journals, conferences, scientific societies). Competition within the paradigm is not open but controlled by the leaders who contributed to its creation. The relationship between paradigms is a matter of academic power, or maybe of paradigmatic change in the long run. On the other hand, in SSH peer review is not universally adopted. The identity of authors is generally known by those who make editorial decisions in journals and book series. Since books are the most important source, editorial decisions are more easily controlled than in journals. This makes competition among researchers even more restricted.

As appears from the above discussion, this situation has clear counterparts in the field of industrial organisation in economics. The kind of competition experienced in science is similar to the situation of competitive markets, in which entry is open, incumbents never get a monopoly position, and it is not possible for an incumbent to manipulate strategic variables to its own advantage. This is why, in my opinion, scientists are not Foucauldian (see below). They find that the representation of commensuration as a form of hidden power is not appropriate for the way in which science works in their fields. It is not a matter of lack of reflexiveness, or pragmatic orientation, as opposed to the kind of critical work advocated in social sciences. Even scientists acutely aware of the social implications of their activity never subscribe to a Foucauldian argument. Simply put, competition is so harsh and the rate of knowledge production so overwhelming that no power coalition is stable.

This is not necessarily the case in SSH, where the competition is more of a monopolistic type, or even collusive oligopolistic. In other words, due to the fragmentation of disciplines and paradigmatic pluralism, the possibility of controlling a discipline for long periods is not negligible.

3 Epistemic Differences Within SSH

Yet this picture is still incomplete. On the one hand there are disciplines in SSH that have historically emulated STEM disciplines. On the other hand there are internal differences within SSH that also have implications for the orientation towards research evaluation. Thus we are faced with the challenge of examining differences within SSH disciplines.

In recent years the methodological foundations of social sciences (Sayer 1992; King et al. 1994; Goertz 2006; Moses and Knutsen 2007; Della Porta and Keating 2008; Brady and Collier 2010; Goertz and Mahoney 2012) and the position of social sciences with respect to general issues raised in the philosophy of science (Sayer 2000; Delanty 2005; Delanty and Strydom 2003; Benton and Craib 2011; Steele and Guala 2011) have been investigated thoroughly. A few cross studies (Steinmetz 2005; Walliser 2009a, b; Camic et al. 2011) have examined the differences across disciplines, while some other studies deal with the impact of social sciences in society (Flyvbjerg 2001; Brewer 2013; Bastow et al. 2014).

In parallel, a similar process started in Humanities, though somewhat less articulated, and partly as a response to the academic decline of these disciplines (Kernan 1997; Bate 2011; Belfiore and Upchurch 2013; Small 2013; Brooks 2014). Here a few historical comparative studies are also available (Bod 2013).

From this methodological and comparative literature, associated with related disciplinary studies, I have obtained a clear picture of the main epistemic problems addressed by various disciplines in SSH. In a recently published book (Bonaccorsi 2015) I presented a quite detailed reconstruction of the epistemic debate on four disciplines in SSH (history, political science, anthropology and English literature), combining historical material on the process of institutionalisation of each discipline in the academic system with an analysis of the main theoretical and methodological controversies. I strongly believe that this is a promising direction for research. Comparative studies that combine epistemic issues with institutional details will illuminate the way in which valid scientific knowledge is created. By taking disciplines as the object of analysis I recognise that there are also internal distinctions within disciplines (Becher 1989; Abbott 2001) and try to take them into account.

This approach is not only useful to address the controversial issue of evaluation. It is my contention that entering into the epistemic black box of disciplines in SSH is also the only way to build up rigorous arguments to defend them vis-à-vis other disciplines, funding agencies and policy makers. There is a need to build up an argument about the scientific nature of SSH, based on a thorough recognition of the way in which they build up valid knowledge, though with epistemic processes that are completely different from the ones used in STEM. The damage to SSH generated by the wave of theorising that has suggested that they are just another way of producing texts instead of a truly scientific endeavour is currently underestimated. It is not enough to underline the pragmatic value of SSH in society. What is needed is a demonstration of the intrinsic validity of the knowledge produced by SSH scholars.

In this section I will sketch the main results of the detailed analysis carried out in Bonaccorsi (2015) and add other prominent disciplines in SSH. I review the four disciplines discussed at length in the book (history, political science, anthropology and English literature) and add other large disciplines in Humanities (philology, art history, psychology) and Social sciences (economics). The discussion below will be very concise. Interested readers are referred to references quoted in Table 1 and to the extended discussion and long reference list in the book.

Table 1 Epistemic differences across disciplines in SSH

I suggest that the orientation towards research evaluation is a function of four constructs, which combine historical factors with epistemic dimensions:

  (a) History of the academic institutionalisation of the discipline

  (b) Main methodological orientation

  (c) Position with respect to neo-positivism after the Second World War

  (d) Position with respect to post-structuralism in the 1960s and 1970s.

Let me explain the building blocks of the model. By academic institutionalisation I mean the way in which a discipline comes to be separate from others, receives an academic label and is taught at universities in a separate way, while academic positions are created and learned societies are formed. The institutionalisation process may be very long, taking decades (Becher 1989; Abbott 2001; Hyland 2012). A discriminant factor is whether a discipline is recognised from the beginning or is separated from previously existing disciplines. In the latter case, newly created disciplines maintain the “memory” of their institutionalisation by keeping existing disciplines at a distance. They even challenge existing disciplines, either methodologically or substantively.

Thus for example English literature in the US academic system is the outcome of a separation process which took most of the nineteenth century to be completed. Literary studies were initially compressed between philology, which was the dominant discipline in US universities that followed the German educational model, and low level literature reading courses (Baldick 1983; Court 1992). The institutional separation took place in the early twentieth century and was associated with a deep reflection on the epistemic status of the discipline, as opposed to philology (Abrams 1997). Criticism was the solution to this differentiation process, but a consequence of this process was the need to discriminate between authentic literary works and other works. The formation of a canon was not initially a discriminatory practice but rather an epistemic necessity.

At the other extreme, history came to be institutionalised in universities shortly after its epistemic rationalisation. Its methodological foundations have been discussed for decades but within strongly held disciplinary boundaries, with limited need to differentiate from other disciplines.

The second building block is methodological orientation. A classical distinction is between a nomothetic orientation, or the formulation of law-like generalisations that subsume a large empirical reality under general laws, and an idiographic orientation, or the tendency to offer a detailed qualitative description of unique cases or situations. The methodological orientation is not unique in most SSH, although in some of them one can recognise a dominant paradigm and a few minority positions. In most cases a plurality of methodologies coexists, usually associated with higher level choices like the choice of the level of analysis or the value-laden presuppositions.

To give two opposite examples again, in economics the orientation taken by the mainstream, neoclassical school is that propositions take the form of law-like generalisations. The power to infer general conclusions from the examination of individual observations comes from the adoption of a modelling methodology, framed in a mathematical language, associated with rigorous inferential techniques from statistics and econometrics. At the other extreme, anthropologists reject altogether the notion of law-like generalisations and instead offer long descriptions (Geertz 1973, 1983), or rich, extended, articulated descriptions of unique empirical settings, from which theoretical conclusions can be derived (but without the support of inferential techniques) (Eriksen and Nielsen 2001).

The methodological spectrum between nomothetic and idiographic approaches does not do full justice to the methodological issue. Another dimension is whether scientific disciplines have developed a core of methodological rules that are commonly accepted and socialised in the discipline from the earliest days of careers (typically, at the doctoral level). History, for example, has a well-developed syllabus of methodological texts and exercises that are built around the discipline of archival sources (Farge 1989; Potin 2013). Surprisingly, art history has a common core of methodological rules centred around the notion of attribution: the attribution of a piece of art to an author, a period or a style is a strict discipline that summarises a large variety of technical skills and methodological rules. Another discipline with a strong core of methodological rules is archaeology. Here we find a quasi-discovery kind of human science: archaeological excavations are similar to scientific discoveries, and the explanation of findings mobilises a large set of logical (mainly abductive) rules of reasoning.

The third and fourth building blocks examine the way in which the discipline has addressed two major philosophical and epistemological challenges of the twentieth century: the rise of neo-positivism in the 1930s and its diffusion in the academic environment after World War II, and the turbulent advent of post-structuralism, associated with the French school of Foucault, Derrida, Barthes, Lyotard, Baudrillard and with authors like Hayden White and Stanley Fish.

The neo-positivist position challenged non-STEM disciplines to demonstrate their scientific status (Bryant 1985). This pressure was felt strongly in the post-Second World War academic environment in the US, and found wide acceptance in economics, political science and psychology (see for example Lerner and Lasswell 1951; Bell 1982; Hilgard 1987), and, to a lesser extent, sociology (Bernstein 1983; Ross 1991; Platt 1996).

First let’s examine economics. On the one hand, in economics the issue of scientific validity has been addressed (solved, many would say) by combining two elements: the language of mathematics and the axiomatic approach to motivation, or to the reasons for action of reflexive agents. The formalisation of economic variables in mathematical language ensures the controllability of propositions, as derived logically from formal premises. The adoption of a mathematical language has made economics a somewhat separate social science. The axiomatic approach addresses the fundamental problem posed by Max Weber: social agents are not like inanimate objects, whose behaviour can be examined objectively. Agents have reasons for action (motivations) and representation of reality (beliefs). To interpret their behaviour we need a theory of their reasons for action and their representation. But by definition, reflexive agents may modify their motivations and/or representation in response to the modelling exercise by social scientists. This creates a circle that cannot be closed in the same way as in natural sciences. Modern economic theory does not address these issues, but relies on the assumption that agents behave according to a set of abstract criteria described by the theory of rational choice. The axiomatic foundation rests on a powerful philosophical and logical base, which gives plausibility and prestige to the assumptions.

On the other hand, psychology has made somewhat the opposite move. Motivations and cognitions are not assumed at an axiomatic level but are made themselves observable. By developing a powerful experimental apparatus, modern psychological research has reduced all issues of human behaviour to the observable level. In this sense psychology reacted to the neo-positivist challenge in a different way, by accepting the experimental method and/or qualifying quasi-experimental or naturalistic methods in a truly causal perspective. As Hilgard (1987) qualifies it, psychology is committed to “find regularities within limited domains” (p. 803). Economics and psychology are, however, two exceptions. Other disciplines in Humanities, such as history or anthropology, literature or art history, and in Social Sciences, such as sociology or (to a certain extent) political science, have followed a different path. In these disciplines the neo-positivist challenge has been, generally speaking, plainly rejected.

In turn, the post-structuralist challenge originated in the French tradition of human sciences as a reaction against Lévi-Strauss’s structuralism and linguistics (Gellner 1985, 1992; Lamont 1987). This tradition dissolved the distinction between scientific knowledge and folk knowledge by deconstructing the texts in which all kinds of knowledge are embedded. Contrary to the old tradition of philology, which aimed at reconstructing the true meaning of the texts, post-structuralists emphasise the radical indetermination of the meaning of texts and the interaction between the text and the readers (following the lesson of hermeneutics after Gadamer and Ricoeur, but also after the theory of reception of literature by Iser 2012 and Jauss 1978). Consequently, there is no distinction between texts of different kinds: for example, texts written by historians could not claim superior validity to texts of fiction (Rosenau 1992; Sarup 1993).

Two opposite reactions are worth examining. In history, the influential work of Hayden White (White 1973, 1978, 1987) tried to demonstrate that the notion of “historical truth”, which had been the backbone of the profession since the eighteenth century (Carr 1961), was void of content and ideological. In a number of brilliant books he made the point that historical reconstructions cannot claim any validity beyond what can be obtained rhetorically for any kind of text (Nelson et al. 1987). Several authors refined his arguments and developed a post-structuralist theory of writing history (Appleby et al. 1994; Megill and McCloskey 1987; Jenkins 2003). Interestingly, the scientific community reacted negatively. Momigliano (1984) vigorously defended the peculiar notion of truth that is the normative goal of professional historians. Several books were written to reject this theory (Windschuttle 1996; Evans 1997, 2001; Iggers 1997), and its arguments are in practice no longer discussed in the community of historians. A reaction against this approach was also developed in philosophy (Boghossian 2006).

Several steps in the methodological evolution of the discipline in the twentieth century help to understand this fierce reaction. First, the discipline had already addressed the issue of the subjective role of historians in selecting archival sources, after the seminal works of Maurice Halbwachs (1925, 1950). It had also addressed the epistemological issue of the nature of historical “proof”, after the ambitious formulation of the “paradigma indiziario” by Carlo Ginzburg (1986) and the methodological programme of microhistory. Second, historians rejected the defence that authors like Derrida offered for Paul de Man, a Belgian author who migrated to the US, whose early work was found to have supported Hitler’s theses. Derrida and others argued that authorship is a collective enterprise and that responsibility was to be assigned to the context, not to the author. Professional historians strongly rejected this line of argumentation. Finally, historians in Europe and the US had to face the wave of revisionist writers, who claimed to be legitimate academic historians but denied the Holocaust. On this occasion, academic historians prohibited these authors from giving seminars in university departments, with the argument that revisionism has no scientific grounds. Summing up, history has developed a deep and articulated epistemic approach by building up a sophisticated methodological toolbox and by reacting vigorously to challenges, either from within the discipline or from outside.

The story is different for English literature. In this discipline the post-structuralist call for breaking the authority of authorship and deconstructing texts was embraced with enthusiasm. An influential argument was that all value judgments are contingent (Herrnstein-Smith 1988). The diffusion of new curricula in US universities about minority literature is a consequence of the critical approach to the formation of the canon (Bloom 1994), or the list of academically admissible texts, to be studied by students and read in the classroom (von Hallberg 1983; Bérubé and Ruth 2015). An entire new disciplinary field was created, labelled Cultural Studies, in which the methodological tools do not come from philology or literary criticism but from human and social sciences.

Summing up, there are visible differences in the way in which disciplines in SSH have developed, institutionally and epistemically, since their foundation in the academic context. A comparative historical and epistemic analysis sheds light on these differences. It is my contention that they may explain the approaches taken by these disciplines with respect to research evaluation. I now provide a sketch of this model, based on considerable international literature (reviewed extensively in Bonaccorsi 2015), official documents of learned societies in Europe and elsewhere, and recent Italian experience. Table 2 summarises the main argument.

Table 2 Explanatory model of orientation towards research evaluation in selected SSH disciplines

The definition and measurement of orientation towards the evaluation of research is an interesting issue that deserves further research. For the time being I summarise the evidence by illustrating a spectrum of positions, as follows.

  (a) Bibliometric orientation: acceptance of quantitative evaluation based on indexed journals and citation measures.

  (b) Positive orientation with extensive consensus on research quality criteria to be used in peer review.

  (c) Positive orientation with controversies on research quality criteria.

  (d) Negative orientation.

Economics and psychology are among the few disciplines in SSH that have accepted bibliometric evaluation, since their epistemic evolution has led to the acceptance of journals as the main communication channel. Their nomothetic orientation makes the comparability of results easier. They basically accepted the neo-positivist challenge to demonstrate their scientific status (although with different solutions) and rejected altogether the challenge of post-structuralism.

I predict a positive orientation towards research evaluation when disciplines have built up strong epistemic foundations and a shared methodological core. This feature can be found in history (archival work), philology (textual analysis), art history (attribution), archaeology (excavation) or anthropology (extensive description). All these disciplines rejected both neo-positivism and post-structuralism. A distinction can be made among them on the basis of the importance of epistemic pluralism. In disciplines such as history, philology and art history there is a large consensus on a number of criteria of research quality, even across differences in approaches (e.g., for historians of different political orientation).

There is a strong belief in the possibility of evaluating the quality of research work even if it is carried out by authors with an opposite overall orientation. In anthropology or political science the epistemic differences are somewhat more problematic. Political scientists taking the rational choice orientation are in conflict with those assuming the historical-comparative approach and vice versa (Apter 2001). In anthropology there is a strong core of methodological criteria, but there is also a tradition of critical thinking that emphasises the importance of dissent, minority positions and political activism. In addition, a school of study adopted hermeneutics as the main methodology (Clifford 2005; Clifford and Marcus 1986), not without conflicts (D’Andrade 1995). One would classify anthropology somewhere between positive and negative orientation (Barnard 2000). These issues do not lead these disciplines to reject evaluation altogether (although some authors in anthropology are among the most active against it: see Power 1997; Strathern 2000; Amit 2000; Dahler-Larsen 2012), but they raise a series of fundamental questions about the preservation of pluralism and procedural fairness.

Finally, I predict a negative orientation if a discipline was born in conflict with other, more established and “scientific” ones, rejects the notion of a core of methodological foundations and instead accepts the post-structuralist claims. My analysis here is limited to English literature, for which the historical track record is extensive. In this discipline most authors subscribe to the notion that research evaluation is just another way of establishing domination in the academic world and limiting academic freedom. The linkage between evaluation, with its request for a set of agreed research criteria, and the formation of the literary canon is, according to these authors, too strong to go unnoticed. Since the rejection of the canon (as defended by Bloom 1994) is one of the most recent foundational steps in the evolution of the discipline, it is no surprise to observe a rejection of evaluation as well.

There are other possible candidates for this negative position, including some schools in anthropology, sociology, philosophy (mainly Continental), or literature in various European countries (in addition to Italy, I would mention in particular France and Germany) (see for example Citton 2010 or Eagleton 2015). But to conclude in this direction would require a large scale comparative and historical analysis, similar to the one already carried out in the book.

My results are broadly consistent with the findings of Lamont (2009), one of the few comparative studies on differences in evaluation criteria of SSH disciplines. While Lamont is more interested in procedural fairness in the evaluation process, I reach convergent observations starting from an epistemic perspective.

4 Taking into Account Epistemic Differences and Evaluating with Fairness

The discovery of large differences within SSH disciplines, of course, creates a major problem for research evaluation. As noted by the historian of architecture Carlo Olmo (in Bonaccorsi 2015), what is needed is a theory of reception of evaluation, a theory that takes into account the epistemic issues of various scientific communities.

Is such an effort feasible? Before answering this question it is necessary to review the arguments, put forward in the international debate, that cast doubt on the feasibility and merit of evaluation.

In recent decades a critical movement has put forward arguments based on the works of Michel Foucault (Foucault 1978a, b). His analysis of the modern institutions in medicine, public health, psychiatry, sexuality and prisons had an enormous influence. He in fact achieved the Holy Grail of the explanatory power of social sciences: to show that, underlying the observable reality and against all easily available evidence, there lies an order that is constructed by social actors following implicit rules of behaviour. These rules are not explicit to actors but are instead hidden behind apparently neutral and objective devices, like classifications, categories, numbers, and standards. These social devices shape reality in such a way that they make compliance the only rational behaviour, and deviance from the rules an anomaly.

More recently, in parallel with the surge of evaluation systems and the construction of indicators, the arguments from Foucault have been applied to the fields of higher education and science. According to a number of authors, evaluation realises the kind of surveillance identified by Foucault (Foucault 1966, 1975; Dean 1990) as the dominant trait of modern societies. The activities of universities, which have traditionally been entrusted with autonomy and academic freedom, are increasingly subject to inspection, measurement and evaluation by bodies external to academia (Power 1997; Strathern 2000; Amit 2000). These bodies incorporate instrumental rationality, by asking universities to behave as producers of identifiable objects, like publications, and not as critical social actors. Their rationality is inevitably associated with technical instrumentation, which is ideologically dangerous because it hides the manipulation behind apparently neutral technical indicators, which are presumed to be value-free and objective. The social acceptance that is given to authorities that use numbers is just a form of subordination to a new form of power. Thus quantification of social reality is just a form of power, dressed in the clothes of objectivity.

These arguments are fascinating. The interpretive power of Foucault’s work is large, particularly in the studies based on meticulous philological work. Yet, I believe we have to resist this fascination. Like Ulysses with the sirens, we should listen to these arguments while being tightly tied to the mast. In my understanding, the mast is empirical research. Good social sciences should open the way to replicability of empirical findings in different settings and contexts, in space and time. And the findings should be subject to the kind of control that is required to give claims a scientific status.

It is true that commensuration is a form of power. Following Bourdieu (1984), universities are producers of cultural capital, a scarce resource that is distributed unevenly in society and consolidates power asymmetries. But the interesting question is whether this power is compatible with modern democracy, or is inevitably associated with manipulation and control.

To start with, the application of numbers to social reality is not the product of industrial capitalism, but goes back to the seventeenth century. It was in this period that Pascal and Bernoulli laid the foundations for what is now the theory of probability, by giving a mathematical foundation to the notion of uncertainty. Their ideas were soon applied to social events, such as insurance against damages, or mortality tables of the population of large cities. Over time, the need of governments to raise taxes and to create welfare institutions required a standardisation of the collection of data across regions and countries, making increasing use of social statistics (Anderson 1988; Anderson and Fienberg 1999; Bulmer et al. 1991).

Patriarca (1996) has shown that in nineteenth-century Italian kingdoms the application of statistics to social reality was just an extension of ideas from the Enlightenment, against traditional sources of power. Indeed, statistics were a powerful tool to extend the domain of controllable knowledge, against the claim that social phenomena could only be examined using a tacit, non-articulated, comprehensive kind of knowledge, as was the case in traditional societies.

Classical studies in the history and sociology of statistics, such as Desrosières (1993, 2008a, b, 2014), Porter (1995) and Stigler (1999), have done a wonderful job in showing that social statistics are not the objective kind of knowledge that the public believes them to be, but are inextricably linked to political power and its goals.

I argue that these contributions do not necessarily lead us to reject commensuration (Espeland and Stevens 1998). The transformation of social reality into numbers is a fundamental way to extend social control over reality itself (Dudley Duncan 1984; Crosby 1997). It is a profoundly democratic process, although one that, contrary to other institutions of modern democracies, is more difficult to understand and is bound up with more technical details. The fact that the public tends to trust numbers without questioning their origin and meaning, and therefore is subject to manipulation, is indeed true but does not detract from the importance of using numbers. Other institutions of modern societies, such as the media, are highly vulnerable to manipulation, but no serious scholar of modernity would deny their role. The fact is that commensuration, unlike communication, entertainment, or journalism, requires a scientific approach. The scientific approach is not intuitive and does not conform to common sense. Rather, it is highly counterintuitive and requires hard discipline and control of the reasoning process. People are not intuitive statisticians, just as they are not intuitive scientists. Therefore commensuration tends to be cultivated only in small circles of experts whose mission is to devise ways to collect data, transform them into information, and process information with the use of indicators and other tools in order to produce meaningful knowledge.

Commensuration is therefore an intrinsic part of modernity. The critical attitude towards commensuration, based on Foucault and Bourdieu, and more recently on the psychoanalytic and psychotherapeutic movement (see for example Gori and co-authors: Gori 2011, 2013; Abelhauser et al. 2011), does not serve democracy in society well.

Having stated this general point, let us turn to the question of whether SSH research may be subject to commensuration. SSH scholars have their own quality criteria. When they read a book, they are able to formulate a qualitative judgment about the merit of the underlying research. These judgements are robust with regard to ideological, methodological and political differences, even of a strong nature. The challenge is whether qualitative judgments can be reliably transformed into quantitative measures, and whether these measures are comparable.

New results in social sciences support the view that this is indeed possible. On the one hand, Michèle Lamont (2009) has persuasively shown that scientific communities in SSH have their own research quality criteria, expressed in qualitative terms but firmly held by the members. She also shows that communities that do not commit to the elaboration of such criteria, due to the fragmentation of the discipline, ideological conflicts, or the weakness of methodological bases, are communities that suffer from a loss in reputation, cohesiveness and attractiveness for students.

On the other hand, recent developments in decision theory (Balinski and Laraki 2010) make it possible to conclude that qualitative judgements can be reliably transformed into measurements, even without imposing the unrealistic assumptions associated with rational choice theory. Furthermore, the aggregation and comparability of measurements do not necessarily require the even more unrealistic assumptions needed to avoid Arrow’s famous impossibility theorem (List and Pettit 2011). What is needed is not commensurability (or the existence of a common measurement), which is too demanding, but comparability, which is, according to philosopher Ruth Chang, always possible (Chang 1997, 2002). What is needed is just the moral and political willingness to compare, for the purpose of achieving socially beneficial goals (Bagnoli 2006).
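As an illustration only, the transformation of qualitative judgements into an ordering can be sketched with a median-based rule in the spirit of the majority judgment associated with Balinski and Laraki. The five-level scale, the item names and the function names below are hypothetical and are not taken from the works cited; the sketch shows one possible aggregation rule, not the procedure adopted in any actual evaluation exercise.

```python
# Hypothetical ordinal scale, from worst to best (an assumption, not from the source).
GRADES = ["poor", "acceptable", "good", "very good", "excellent"]


def _sorted_ranks(grades):
    """Map verbal grades to their positions on the scale, sorted ascending."""
    return sorted(GRADES.index(g) for g in grades)


def majority_grade(grades):
    """Return the lower-median grade given by the evaluators."""
    ranks = _sorted_ranks(grades)
    return GRADES[ranks[(len(ranks) - 1) // 2]]


def majority_value(grades):
    """Repeatedly take the lower median and remove it.

    The resulting sequence is compared lexicographically, so that items
    sharing the same median grade are separated by their next medians.
    """
    ranks = _sorted_ranks(grades)
    value = []
    while ranks:
        value.append(ranks.pop((len(ranks) - 1) // 2))
    return value


def rank_items(judgements):
    """Order items from best to worst; items with identical grades stay tied."""
    return sorted(judgements,
                  key=lambda item: majority_value(judgements[item]),
                  reverse=True)


# Hypothetical judgements by five evaluators on three pieces of research.
judgements = {
    "monograph_A": ["good", "good", "excellent", "poor", "very good"],
    "monograph_B": ["good", "acceptable", "very good", "good", "good"],
    "monograph_C": ["acceptable", "poor", "excellent", "acceptable", "good"],
}

for item in rank_items(judgements):
    print(item, majority_grade(judgements[item]))
```

The evaluators only ever express verbal grades; no numerical scores are averaged, and a single extreme grade (the "poor" given to monograph_A) does not overturn the judgement of the majority.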

Summing up, an epistemic approach to research evaluation is promising. It recognises the large differences in the epistemic procedures of disciplines in SSH (an argument extensively discussed by Ochsner, Hug and Daniel 2016) and opens the way to understanding their reception of evaluation. At the same time it firmly confirms that SSH disciplines may converge on a core set of discipline-specific quality criteria without violating epistemic pluralism, academic freedom, and the right to dissent. Once these core criteria are discursively established, people will produce qualitative judgements, not quantitative ones, on pieces of research produced by peers. Recent results in decision theory and social psychology confirm that it is possible to transform these qualitative judgements into an ordering (not necessarily a ranking), without violating personal preferences of the evaluators.

A necessary condition for evaluation is procedural fairness. This requires the adoption of a mix of procedural devices, such as transparency in the selection of experts, self-candidatures, rotation of experts, duplication of roles in the presence of severe antagonism among schools, and short terms of office. A permanent dialogue with scientific communities should be kept open. Quality criteria must be published and regularly updated. Detailed and continuous work on the drafting and wording of the questions to be adopted during the peer review process is also needed. All these solutions (and others) are important to generate trust among evaluated researchers. It is a long process.

Under these conditions, research evaluation in SSH will not be accepted as a necessary evil, but as an occasion to re-open, or sometimes to establish from scratch, a self-reflexive exercise on research quality criteria. It is my contention that this exercise is valuable not only for SSH disciplines, but also for society.