The article “Blurring the distinction between empirical and normative legitimacy? A methodological commentary on ‘Police legitimacy and citizen cooperation in China’” by Jonathan Jackson and Ben Bradford (2019) criticizes one alternative way of defining, measuring, and modeling the concept of police legitimacy, that of Sun et al. (2018). The most valuable contribution of their essay is its warning against adopting a measuring device without carefully considering its theoretical implications. One of their criticisms is well taken: confirmatory factor analysis is not a good adjudication tool for differentiating possible sources of legitimacy from constituent components of legitimacy. Their essay also raises a more fundamental question: How should social scientists conduct their research? In the process, however, Jackson and Bradford make several sweeping claims, and the broad and absolute tone of those claims prompts this article. In this rejoinder, we argue that a theory should and must be tested across cultures with a priori assumptions, and that both approaches to measuring police legitimacy have merits: the one adopted by Sun et al. (2018) and by Tankebe (2013), and the one adopted by scholars such as Huq et al. (2017), Tyler (1990), and Tyler and Jackson (2014). Neither is complete or perfect. Ultimately, we argue that many methodological issues are, in fact, unresolved or unresolvable at this point in the study of legitimacy.

Legitimacy and its Test across Cultures

The study of legitimacy is flourishing these days, as legitimacy is a hallmark of all major contemporary societies. Both democratic governments and authoritarian regimes claim to have legitimacy. Tyler’s influential book Why People Obey the Law (1990) has played a critical and defining role in the popularity of studying police legitimacy.

Legitimacy, however, is a “complicated and contested concept” (Cao and Wu 2019, p. 4) and is “elusive and multifaceted” (Bottoms and Tankebe 2012, p. 168). According to Tyler and Jackson (2014), legitimacy is an abstract and unobservable psychological construct. Even a brief discussion of its complexity is beyond the scope of this rejoinder. We thus focus only on the definition of legitimacy in the context of policing.

Building on the extant literature, Jackson (2018, p. 147) uses legitimacy to describe whether “those that are subject to authority confer legitimacy on their authority” and notes that legitimacy “concerns the normative justification of power in the eyes of those who have to abide by that power structure.” In the current article, Jackson and Bradford relabeled what they previously called “popular legitimacy” (Jackson 2018; Tyler and Jackson 2014) as “empirical legitimacy.” The term is indistinguishable from Tyler’s (2002) early concept of “subjective legitimacy.” The term “subjective legitimacy” was used by the Committee to Review Research on Police Policy and Practices. Tyler urges researchers to “utilize subjective legitimacy as a criteria of police legitimacy” (2002, p. 71). He argues that

the belief that it is important both for the police to be legitimate, in the sense that they bring their actions into line with the law and the norms of appropriate police conduct, and for the police to be seen as legitimate by the residents of the communities they protect. It is not enough to focus on the actual quality of police performance, since police agencies may execute their job duties effectively and constitutionally and still find themselves without community support. (p. 72).

Other scholars focus on one aspect of legitimacy—confidence in the police—and develop the line of inquiry (Cao et al. 1996; Cao and Wu 2019; Ren et al. 2005; Sampson and Bartusch 1998; Taylor and Lawton 2012; also see Jackson and Bradford 2010; Tyler 2011). These authors regard legitimacy as a matter of public reception of police activities. In this sense, legitimacy is germane to confidence in the police because the final judgment of police actions is in the eyes of the public whom police officers swear to serve and protect (Cao and Wu 2019; Tyler 2002).

Citing David Beetham (1991) and Jean-Marc Coicaud (2002), who built upon Max Weber (1968), Bottoms and Tankebe (2012, p. 147) conclude that Jackson, Tyler, and their colleagues’ definition of police legitimacy, which stresses principally the reactions of citizens to the decisions and rules made by the police, is in accord with the central components of what is labeled audience legitimacy (also see Nagin and Telep 2017: perception of legitimacy). They (2012, p. 147) argue that audience legitimacy “covers most of the ground in answering Tyler et al.’s important question about what factors create and sustain audience legitimacy.”

In their article, Jackson and Bradford (2019) cite the definition of political philosophers, who divide legitimacy into empirical legitimacy, as studied by social scientists, and normative legitimacy, as judged by a group of outside experts (see Hinsch 2010). A similar dual approach is referred to in anthropology as the emic-etic dilemma (Davidson et al. 1976): the emic approach gains insight from the perspective of individuals within a group, and the etic approach gains insight from the perspective of an outside observer. Anthropology values both approaches, and there is no doubt that criminology does as well.

Jackson and Bradford (2019) argue that “the content of legitimation (i.e., the bases on which legitimacy is justified or contested) are an empirical question.” Therefore, it cannot be studied with an a priori definition. Put differently, they argue that police legitimacy is place-specific and culture-specific.

Their overarching argument, we contend, is too broad and too absolute. It is not entirely wrong because it is consistent with a phenomenological perspective, in which meaningful human action is open only to qualitative appreciation. However, as social scientists, Jackson and Bradford (2019) justify their lambasting of Sun et al.’s approach with the phrase “cultural sensitivity” (see footnote 1).

Consequently, Jackson and Bradford (2019) raise a fundamental question: how should a researcher conduct an empirical test of a theory? The root of this debate is the theory-observation nexus (Cao 2004; Gibbs 1985; Merton 1968). More specifically, can a theory be general enough that it should be tested across cultures? Put more critically and formally, is theoretical impregnation of observation circular?

There is undoubtedly a connection between the content (the unit of analysis, the variables, and the hypothesized relationship) and context. Space is important (Sampson 2013), and so is culture (Ferrell et al. 2004; Young 2011). Above all, context is important as it has consequences (Cao 2004; Lilly et al. 1995). The context delimits the time as well as the area of application of a theory. Indeed, we do not oppose the general idea that content or culture, especially vernacular culture, depends on the context. On the other hand, the theoretical concept of legitimacy is general, and one criterion of a good concept or theory is its applicability to different contexts (Akers 1997; Cao 2004; Farrington 2015; Tittle 1985). Following the classic argument of Popper (1968), a theory is an a priori judgment, and testing a theory across cultures must rest on such a priori assumptions. After all, one purpose of testing a theory is to verify the regularity of relationships in different locations and contexts (see Liu 2009, 2017) and to identify the boundary conditions under which the theory does or does not hold (Farrington 2015, p. 387). This is the case if the test results of a theory are stated in the “ceteris paribus” form.

According to Jackson and Bradford (2019), Sun et al. (2018) imposed a definition of legitimacy developed in England on the Chinese public and, as a result, got what they intended to get. Jackson and Bradford advocate that empirical legitimacy must be tied to a local culture and that, when testing legitimacy in a new context, one must not assume any prior concept for the locals. According to their alleged cultural sensitivity approach, legitimacy can only be built from the bottom up and can only be studied culture by culture, because each culture may weight the components of legitimacy differently.

We disagree. In doing research, we are trained to learn theory first and then to deduce from it the hypotheses to be tested. Then, we collect data in different locations to test the theory. A theory is a deductively connected set of empirical generalizations, and testing a theory’s generalizability is a hypothetico-deductive process (Cao 2004; Merton 1968; Popper 1968). It seems to us that this is what Sun et al. (2018) did in their research. They did have an a priori definition (theory), and they attempted to test whether the data are consistent with the expectation from the theory. The crime they are accused of is that “it imposes an Anglo-Saxon perspective under the smokescreen of empirical discovery” (Jackson and Bradford 2019). If such conduct is to be condemned, most of our tests of theory are guilty of it, especially tests of theories developed in the West in a different culture or vice versa.

From our perspective, Jackson and Bradford’s introduction of political philosophers’ normative legitimacy has outgrown the original definition of Tyler’s legitimacy. Legitimacy in criminological research, we argue, is, first of all, audience legitimacy, not political philosophers’ legitimacy. Second, the idea of police legitimacy may have originated in England. Manning (2010, p. 90) argues that “Anglo-American policing is democratic policing,” and nowadays policing by consent has traveled to all corners of contemporary societies (see footnote 2). Even in authoritarian societies like China, most people in today’s global world seem to know what the police should and should not do. To test whether they, in fact, know this is genuinely empirical, not “the smokescreen of empirical discovery” (Jackson and Bradford 2019). Bottoms and Tankebe (2012, p. 145), citing Beetham’s (1991) claim, note that audience legitimacy is “common to all societies.” The legitimacy of police norms, ipso facto, has been forming internationally. It is, therefore, nearly universal.

In this section, we have discussed several definitions of legitimacy as well as one of our key disagreements with Jackson and Bradford (2019): the fundamental issue of whether a theory is general enough to be tested with an a priori definition in cross-cultural settings. We believe that testing a theory in a new context is an important empirical activity and that a good theory transcends the context in which it was developed.

The Measurements of Legitimacy

In addition to the theoretical and conceptual complications, there is no consensus on the measurement of police legitimacy. Much of the survey research in this area has operationalized legitimacy inconsistently (Mazerolle et al. 2013), and few evaluations have examined the construct validity of existing scales (for exceptions, see Gau 2011, 2014; Reisig et al. 2007). The perfunctory attention to whether measured variables reflect the theoretical construct is partially responsible for the current debate.

Central to Tyler’s (1990) original measurement of legitimacy is “the perceived obligation to obey the law and as support for legal authorities” (p. 45). The source of legitimacy, in this account, is procedural justice (also see Tyler and Jackson 2014; Huq et al. 2017). Earlier attempts at measuring legitimacy (Sunshine and Tyler 2003; Tyler and Huo 2002) show that legitimacy can have as many as four subscales: obligation to obey the law, cynicism about the law, institutional trust, and feelings about legal authorities. More critically, when legitimacy is measured as a separate construct, the measures of procedural justice, distributive justice, effectiveness, and lawfulness can be used to predict legitimacy. Under this scheme, procedural justice, distributive justice, effectiveness, and lawfulness are considered legitimation. That is, they are the bases on which legitimacy is justified or contested.

In contrast to the above scheme, Sun et al.’s (2018) measurement of legitimacy combines procedural justice, distributive justice, effectiveness, and lawfulness. That is, legitimacy is composed of procedural justice, distributive justice, effectiveness, and lawfulness. How did they arrive at this point in their research? This decision was made, as we see it, on the basis of three pieces of information. First, it was inspired by Bottoms and Tankebe’s (2012) theoretical analysis of legitimacy, in which the authors concluded that Tyler’s definition of legitimacy does not cover all possible components of audience legitimacy. Second, the measurement comes directly from the prior study conducted by Tankebe (2013) and published in the top journal of the field, Criminology. In that article, Tankebe argues against the conflation of legitimacy with the cognate concepts of “trust” and of “obligation to obey the law” (see footnote 3). Third, Sun et al.’s adoption of the measure was assisted by their own analysis of the data. That is, confirmatory factor analysis (CFA) indeed supports this scaling empirically. We will come back to this point in a later section.

What is the logical ground for Jackson and Bradford’s opposition to Tankebe’s (2013) and Sun et al.’s (2018) approach to defining, measuring, and modeling legitimacy? Jackson and Bradford base their objection not on a clarification of the concept of legitimacy but on traditional/charismatic authority. They cite Tyler and his colleagues’ works as “the standard approach to studying empirical legitimacy,” forgetting their own statement that “Legitimacy is an abstract and unobservable psychological construct, and there are numerous ways to operationalize the perceived right to power, aside from the standard ways of institutional trust and/or normative alignment and/or obligation to obey (Tyler and Jackson 2013)” (Jackson and Bradford 2019).

As noted above (Cao 2004; Merton 1968), in testing a theory, researchers deduce propositions from the theory, formulate hypotheses, and test them against the data. The skepticism principle of science urges a scientist to think outside the box by questioning everything in an attempt to determine the validity of an argument. It seems to us that Sun et al. (2018) followed these scientific procedures by extrapolating legitimacy to China. Bottoms and Tankebe’s analysis of legitimacy (Bottoms and Tankebe 2012) leads to the conclusion that neither trust nor perceived obligation to obey the law can be straightforwardly equated with legitimacy, and they called for experimentation with fresh ways of measuring legitimacy. They specifically suggest that the new approach will necessarily “incorporate rather than supplant Tyler’s procedural justice argument, since its two dimensions—quality of decision-making and quality of treatment—are embraced with the notion of shared values” (Bottoms and Tankebe 2012, p. 166). As a result, people can disagree with the approach and should raise their concerns about the logical deductions, but they cannot reject or “falsify” it simply on the basis of authority status. After all, the preferred legitimate authority is the rational-legal type (Weber 1968).

In fact, we believe that Jackson and Bradford’s uneasiness with the new approach probably also overlooks the persistent gap between a concept and its measurements (Cao 2004). This is a long-standing unresolved, and potentially unresolvable, barrier for social science as a soft science. We can call for a movement toward standardization, as Cao did in his 2004 book, but if we use our authority status to declare any measure the standard, it will become a cause célèbre and invite animated disagreements.

Again, we are not here to pass final judgment on this issue but to point out that the two measures are derived from two different understandings of the meaning of police legitimacy and, furthermore, that there is a gap between the concept and its measurement. Merton’s aphorism of 50 years ago (Merton 1968, p. 494) is as relevant today as it was then: “empirical research exerts pressure for clear concepts.” We must be attentive to the dangers of conceptual vagueness. Further, Miller (1994, p. ix) argues that “logic is used only to probe, never to prove.” More importantly, “Measurement is not operationalization but a transformation” (Pawson 1989, p. 122).

We also hasten to add that we are doing social science research, in which there are very few things about which we can be so absolute. Watkins (1968, p. 269) declared half a century ago that “The hope which originally inspired methodology was the hope of finding a method of enquiry which would be both necessary and sufficient to guide the scientific unerringly to truth. This hope has died a natural death.” Similarly, Miller (1994, p. 2) writes that “almost no one now supposes that empirical statement, even simple observational ones, can be established with certainty.” In concluding their debate on the measuring instruments and logical deduction, Gibson et al. (2002, p. 806) stated that “We state in no uncertain terms that our aim has not been to show that we are right and those in disagreement with us are wrong; at this juncture, empirically, there are precious few absolute rights or wrongs in criminology.”

In sum, concepts without standard measures are a serious problem that criminologists must confront. Many scholars (Cao 2004; Cullen et al. 2019; Graham 2018; Mazerolle et al. 2013) call for attention to the problems of conceptualization and measurement. Furthermore, Cao (2004) argues that the lack of standardization of key concepts has resulted in the poignant performance of scientific criminology. One of the solutions he suggested is to make concerted efforts toward standardization, not to arbitrarily establish or rush to impose a standard.

Statistical Analyses of Legitimacy

In addition to the conceptual and theoretical critiques, Jackson and Bradford (2019) find fault with the logic of relying on CFA. This is a good point, and it can be taken as a word of caution. In fact, we agree that the CFA modeling is not a good adjudication tool to differentiate possible sources of legitimacy and constituent components of legitimacy.

Therefore, one important point Jackson and Bradford (2019) made is that researchers should decide whether a measurement actually captures the concept on the basis of the extant literature and theory instead of relying entirely on statistical techniques, no matter how sophisticated those techniques are. Put differently, statistical techniques can only assist and/or support theory, not the other way around. We cannot be held hostage by statistical techniques. Advanced and sophisticated statistical modeling makes our life easier, but it cannot replace human logic and it cannot make decisions for us (Cao 2004). Critical decisions should remain in the hands of the researcher. Confirmatory factor analysis tells us how well a given scaling fits the data but does not tell us whether we should scale in this way or in another way.

For that matter, although the point is valid and relevant, we do not see that Sun et al. (2018) based their choice of measurement solely on CFA. As noted above, they were inspired by the theoretical insight of Bottoms and Tankebe (2012), and they followed Tankebe’s empirical lead (2013) in their research. Therefore, the question is NOT whether CFA can or cannot adjudicate an operational decision, but whether CFA can or cannot assist a researcher in reaching a decision. The answer is an affirmative one, writ large.

Jackson and Bradford (2019) oppose Sun et al.’s (2018) approach to defining, measuring, and modeling police legitimacy, which they argue would leave little or no possibility of assessing which, if any, is the most important component of legitimacy. This is a strange statement, and they repeat a similar point in their footnote one. Sun et al. (2018) could clearly assess which component of their legitimacy was most relevant within their models. Indeed, Sun et al. (2018, p. 288) revealed that “With the largest absolute value of the factor coefficient, lawfulness stands out as the most important variable in calculating the component of legitimacy, followed by distributive justice, procedural justice and finally effectiveness.”

Actually, Jackson and Bradford (2019) have done a service to the field of criminology by providing external validity for the legitimacy measure advanced by Sun et al. (2018) and Tankebe (2013). They found that legitimacy measured in this way is valid in all nations under their study. As we know, external validity is always an empirical question and it cannot be assumed a priori (Taylor 1994).

Finally, Jackson and Bradford speculate that Sun et al. (2018) may start “a trend in criminology, where researchers in a novel context use the same approach to defining and measuring police legitimacy.” Whether such an approach will become a fad or a trend is unknown, because the statement concerns the future.

Concluding Remarks

Jackson and Bradford (2019) have advanced our understanding of the possible use and misuse of CFA. Unintentionally, they also provided us with empirical evidence of the external validity of the measure of legitimacy proposed by Sun et al. (2018) and by Tankebe (2013) in thirty nations. In the process, they have raised some fundamental issues in conducting research, namely whether we can test the generalizability of a theory in a different culture and how we should do it. Their criticisms of Sun et al.’s (2018) work are important because they help sharpen our awareness of the gap between the concept and its measurement, thus making us ponder harder about the challenges that we face. Concepts and their measures are far from being a happy couple that goes together like a horse and carriage. They are separable and sometimes incompatible. Although the current perfunctory attention to this gap is deplorable, the debate, as we see it, is unlikely to be settled. The jury is still out. We hope that the readers of this article will be the ultimate amicus curiae because we are not dealing with a completely right-or-wrong issue. Like legitimacy, it is in the eyes of the beholder.

Although we have many disagreements, we applaud Jackson and Bradford (2019) for bringing their ideas on these topics to the notice of the discipline. In addition, we agree on the following two points: first, CFA modeling is not a good adjudication tool to differentiate possible sources of legitimacy and constituent components of legitimacy; and second, “There is space for alternative approaches to measuring legitimacy” (Jackson and Bradford 2019). Therefore, we echo Bottoms and Tankebe’s call (Bottoms and Tankebe 2012, p. 166) for continued “experimentation with fresh ways of measuring legitimacy.” We also hope that imposing an a priori (top-down) definition to test legitimacy in a different culture AND uncovering the variation of legitimacy from the bottom-up can BOTH be considered a part of “numerous ways” to capture a more complete and more inclusive picture of legitimacy. Finally, as researchers, it is legitimate for us to vacillate between these two opposing, yet not necessarily contradictory, approaches.