1 Introduction

Schneider and Wagemann (2006) propose separating conditions into two distinct groups—remote and proximate—and analyzing the impact of these conditions on the outcome in a stepwise manner with qualitative comparative analysis (QCA). This procedure has been labeled two-step QCA and pursues two mutually non-exclusive goals: mitigating the problem of so-called limited diversityFootnote 1 and explicitly modeling the effect of contextual factors on the phenomenon of interest.

While the general logic of the two-step protocol seems to resonate with a broad range of scholars,Footnote 2 it has so far rarely been (successfully) applied. Maggetti (2009) analyzes the role of independent regulatory agencies; Sager and Andereggen (2012) apply a two-step QCA in the field of evaluation review; Schneider (2009) investigates which political-institutional configurations lead to the consolidation of democracy in different societal contexts; and, along similar lines, Tomini and Wagemann (2018) probe the conditions for democratic breakdown and regression in diverse settings. Sedelmeier (2016) investigates the compliance record of the EU’s post-communist members. Toots and Lauri (2015) use two-step QCA in order to investigate the quality assurance policies in civic and citizenship education. And Mannewitz (2011), though not applying two-step QCA, provides further clarifications to the protocol.

One reason why the two-step approach has not yet been put to use more often in applied research is, so I argue, the nature of the first step.Footnote 3 Schneider and Wagemann (2006) conceptualize the first step as an analysis of sufficiency with low consistency values. Not only has this always stood on shaky set-relational grounds, it has also partially undermined the goals of reducing limited diversity and providing clarity about the contextual effects on the outcome. I therefore argue that the first of the two steps in the protocol should be redefined as an analysis of necessity and only step 2 understood as an analysis of sufficiency. While already implicit in its original formulation, this crucial feature of the two-step QCA approach has largely been overlooked and its analytical consequences have not been spelled out in sufficient detail.

First, finding necessary context conditions in step 1 limits the choice of logical remainders for counterfactual claims during step 2. Here the paper makes use of recent writings on easy and difficult counterfactuals (Ragin 2008) and untenable assumptions (Schneider and Wagemann 2012). Hence, while the original version of the two-step protocol was already presented as a partial remedy against limited diversity, one important reason for this claim is only spelled out in this paper. Second, in applied QCA it is the rule rather than the exception not to find any single (remote) condition that is necessary for the outcome. Yet, recent set-theoretic writings have highlighted the role of so-called SUIN conditions (Mahoney et al. 2009), and new software developments (Dusa 2018) are making analyses of SUIN conditions easier to implement. In short, by revising the two-step approach, this paper makes it stand on firmer set-theoretic ground, links it to newer writings in the set-theoretic literature, and makes it applicable to a broader range of data.

The paper proceeds as follows. Section 2 contains a brief primer on the two-step approach. I first summarize its general logic and rationale and then highlight the weaknesses of its present form, after which the notion of SUIN conditions is spelled out in Sect. 3. Section 4 presents the updated version of the two-step QCA approach, outlining the details of the updated step 1 and how to combine it with step 2. In Sect. 5, I illustrate the updated two-step QCA protocol with an applied example. Section 6 summarizes the argument and briefly discusses how the two-step QCA approach relates to other set-theoretic approaches that also deal with data structures similar to that of remote and proximate conditions.

2 The old two-step QCA protocol—a primer and problems

I first summarize the basic rationale of the two-step approach and then highlight the problematic conception of the first step as an analysis of sufficiency.

2.1 The rationale of the two-step

The backbone of the two-step approach is the distinction between so-called remote and proximate conditions.Footnote 4 Many theories in the social sciences—explicitly or implicitly—make such distinctions (Kitschelt 2003), which, depending on the theory at hand, can be based on various, mutually non-exclusive dimensions: temporal, spatial, causal distance to the outcome, and/or the purposeful (non-)malleability by actors.

Remote factors originate farther back in time and are often located farther away in space. Mostly because of this, remote factors tend to be stable, cannot be subjected to purposeful changes and are, instead, given to actors. This is why remote factors are often also referred to as the context within which processes unfold and actors act. Remote factors are therefore usually theorized as being causally more distant conditions. They do not directly produce the outcome but provide the context within which proximate conditions unfold their effect on the outcome (Thomann and Manatschal 2016; Blatter and Haverland 2012). Another related property of remote factors is that they often are not attributes of the unit of analysis itself. Proximate factors, in contrast, originate closer to the outcome, both in time and space, are more volatile, are often subject to conscious manipulations by actors, and denote characteristics of the unit of analysis. They are also theorized as being causally closer to the outcome than remote factors. In the two-step approach, they are not conceptualized as effects of remote conditions.

The two-step QCA approach translates this analytic distinction into a protocol for empirical analysis. In a first step, only the remote factors are analyzed. Schneider and Wagemann (2006, p. 761) invoke the notion of ‘outcome-enabling conditions’: the analysis of only remote conditions is not meant to fully explain the outcome but simply to identify the circumstances under which the outcome is made possible.

In order to achieve this, the old two-step protocol stipulates that step 1 is run as an analysis of sufficiency, with a purposefully low raw consistency threshold, and that the most parsimonious solution is produced. For step 2, only those remote conditions that have passed the empirical hurdles set in step 1 are analyzed together with the proximate factors. The goal in step 2 consists in identifying the remote contexts within which combinations of proximate factors produce the outcome. According to the old protocol, the raw consistency threshold in step 2 should be set at a high level and the conservative solution be chosen.Footnote 5

In addition to providing a fairly specific methodological protocol on how to implement the pervasive theoretical notion of remote and proximate conditions and thus identify the contextual effects on the outcome, the two-step approach has also claimed to mitigate the problem of so-called logical remainders. As Schneider and Wagemann (2006) show, compared to a one-step QCA, the number of potential logical remainders is drastically reduced when splitting the group of conditions into two, as is done in the two-step approach. For instance, with eight conditions, a one-step QCA will usually involve over 200 logical remainders, whereas in the old two-step QCA, in the best-case scenario (an even split of remote and proximate conditions), this number cannot be higher than around two dozen.Footnote 6
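The arithmetic behind this claim can be sketched in a few lines. The 22-case count below is a hypothetical example (Python is used here purely for illustration):

```python
# Back-of-the-envelope arithmetic behind the remainder-reduction claim.
# The 22-case count is purely illustrative.

def min_remainders(n_conditions, n_cases):
    """Lower bound on logical remainders: even if every case occupies its
    own truth-table row, 2**k - n rows stay empty."""
    return max(0, 2 ** n_conditions - n_cases)

# One-step QCA with all 8 conditions at once:
print(min_remainders(8, 22))  # 234, i.e. "over 200" remainders

# Two-step QCA with an even 4/4 split: each step works on a truth table
# of only 2**4 = 16 rows, so both steps together involve at most 32 rows,
# remainders included.
print(2 * 2 ** 4)  # 32
```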

2.2 Problems with the old two-step protocol

The crucial shortcoming of the old two-step protocol is the unclear set-theoretic status of the solution formula generated by step 1. By requiring low raw sufficiency consistency, step 1, in essence, proposes as contextual conditions for the outcome factors that are neither sufficient nor necessary.Footnote 7 Consistency was suggested to be low because otherwise there would be nothing left to be explained by the proximate conditions in step 2. Schneider and Wagemann (2006) further suggest the parsimonious solution in step 1. This was done in order to keep the number and complexity of remote contexts at bay. Both low consistency and the parsimonious solution are research-practical rather than methodological suggestions.

Apart from the fact that it is unconvincing to apply QCA, a method designed for detecting set relations, in order not to find such relations, several other disturbing consequences have followed from the advice to run an analysis of sufficiency with a low raw consistency threshold. First, more often than not, many remote conditions turned out to be outcome-enabling and were thus moved into step 2. As a consequence, the old two-step protocol has tended to produce a (much) higher number of sufficient terms than a one-step approach does. This debilitates the researcher’s capacity to provide a theoretically sound (and succinct) interpretation of the results.

A second analytic pitfall caused by the low consistency criterion has been that remote conditions could, and often did, disappear from the solution formula after step 2.Footnote 8 This is a rather troubling phenomenon, not only because in step 1 those conditions had been declared outcome-enabling and therefore should be part of all sufficient terms. It is also troubling because the very purpose of the two-step QCA approach is to identify combinations of remote and proximate conditions. If, however, the old two-step procedure is prone to yield sufficient terms exclusively composed of proximate conditions, then this signals a weakness of that protocol.Footnote 9 Notice that the disappearance of remote conditions in step 2 of the old protocol is not a sign that the distinction between remote and proximate factors is necessarily wrong for the data at hand. Instead, it is a methodological artifact of a flawed protocol.Footnote 10 As I discuss below (Sect. 4), one advantage of the updated two-step QCA protocol is that it spells out explicit (empirical) criteria under which the two-step QCA approach is not applicable in a given study and should be abandoned.

In defense of the old two-step protocol one could argue that it is not a disadvantage if some paths leading towards the outcome do not include any contextual condition—if this more adequately represents the structure behind the occurrence of the outcome. To this I would respond that, yes, of course, not all explanations of social phenomena must consist of remote-proximate conjunctions of conditions. However, the use of the two-step QCA approach is motivated by the goal of identifying precisely such remote-proximate conjunctions. The logic of two-step QCA is such that it assumes the existence of contextual factors, and the goal is to specify which contextual factors these are. Whereas the updated protocol (see below) stays true to this basic feature of the two-step approach, the old protocol, with its in-built possibility of disappearing contexts, does not. As a consequence, if a researcher has convincing reasons to dismiss contextual factors in at least some of the sufficient terms leading to the outcome, she is better served by simply performing a one-step QCA. The results thus obtained may contain some sufficient terms with and some without remote factors involved.

Fig. 1

Graphical representation of old two-step protocol

Figure 1 provides a graphical representation of the old two-step protocol. Step 1, depicted on the left-hand side, aims at identifying an inconsistent set relation. Having an inconsistent sufficiency relation means that there are cases with remote condition A or BFootnote 11 that are outside of outcome set Y. More problematically, short of perfect coverage of the inconsistent sufficient relation based on remote conditions, there are also instances of outcome Y without any context, i.e. cases outside of \(A+B\) but with Y.Footnote 12 In step 2, depicted on the right, researchers then obtain various proximate sufficient terms P by performing separate step 2 sufficiency analyses for each context identified in step 1.Footnote 13 Some of them contain the remote context identified in step 1 (\(P_1\) and \(P_2\) in Fig. 1), but others might not (\(P_3\)). The former two-step protocol does not rule out this possibility of sufficient terms that exclusively consist of proximate conditions.Footnote 14

3 SUIN conditions and the two-step approach

As mentioned, already in its original version, Schneider and Wagemann (2006) define the goal of step 1 as the identification of outcome-enabling conditions, i.e. the context within which the outcome can occur. The purpose of step 1 is thus, in essence, to identify necessary conditions for the outcome. Perceiving remote conditions as necessary is also in line with the notions (a) that they are causally more distant than proximate factors and (b) that these conditions alone are not supposed to fully explain the occurrence of the outcome.

Given that the interpretation of step 1 as an analysis of necessity is straightforward, why has this not been articulated before? One reason seems to be the sufficiency bias in QCA-based research, which was even more pronounced a decade ago. A second and related reason might be that Schneider and Wagemann (2006) themselves left room for ambiguity by conceptualizing remote conditions not only as outcome-enabling, but also as ‘fostering’ and ‘enhancing’. As Mannewitz (2011) rightly points out, the latter two terms suggest a relation of sufficiency rather than necessity. A third reason why step 1 has not been perceived in terms of a necessity analysis could be that, until some years ago, research on necessary conditions tended to be confined to the analysis of single conditions. Because oftentimes there simply are no single necessary conditions to be found in the data, the entire two-step approach would often not have been applicable if step 1 had consisted of a necessity analysis of single conditions.

For some time already, the literature on so-called functional equivalents or macro-variables (Berg-Schlosser et al. 2008; Rokkan 1999) has argued that either one condition (A) or another condition (B) can represent a higher-order concept (C), i.e. \(C = A + B\). Yet, it is only since Mahoney et al. (2009) that the notion of so-called SUIN conditions has been systematically spelled out in the context of set theory. SUIN conditions in necessity analyses are akin to INUS conditions in sufficiency analyses. SUIN is the acronym for “a sufficient but unnecessary part of a factor that is insufficient but necessary for an outcome” (Mahoney et al. 2009, p. 126). For example, if empirically \(A + B\) forms a non-trivial, consistent superset of Y \((A + B \leftarrow Y)\) and A and B are conceptualized as functionally equivalent attributes of a higher-order concept C \((C = A + B)\), then both A and B are SUIN conditions for Y.
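The set-relational core of such a SUIN claim can be sketched with fuzzy-set membership scores. The snippet below is a minimal illustration with invented memberships; it uses the standard formula for consistency of necessity, the sum of min(X, Y) over the sum of Y:

```python
# Sketch: testing whether the disjunction A + B is a consistent superset of
# outcome Y, the set-relational core of a SUIN claim. All fuzzy membership
# scores below are invented for illustration.

def necessity_consistency(x, y):
    """Consistency of X as a superset (necessary condition) of Y:
    sum(min(x_i, y_i)) / sum(y_i)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(y)

A = [0.9, 0.2, 0.1, 0.8, 0.3]  # hypothetical condition A
B = [0.1, 0.8, 0.7, 0.2, 0.4]  # hypothetical condition B
Y = [0.8, 0.7, 0.6, 0.9, 0.3]  # hypothetical outcome Y

# The disjunction A + B is the element-wise fuzzy maximum.
A_or_B = [max(a, b) for a, b in zip(A, B)]

print(round(necessity_consistency(A, Y), 2))       # 0.67: A alone falls short
print(round(necessity_consistency(B, Y), 2))       # 0.58: so does B
print(round(necessity_consistency(A_or_B, Y), 2))  # 0.97: the disjunction passes
```

With these invented values, neither atomic condition is a consistent superset of Y, but their disjunction is, which is exactly the situation in which a SUIN interpretation becomes attractive.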

Mahoney et al. (2009) develop their notion of SUIN conditions in the framework of historical explanations, with a specific focus on elaborating historical sequences in, supposedly, a small number of cases. However, SUIN conditions can, and I think should, also be in the focus of scholars working with static data and a larger number of cases. More specifically, SUIN conditions are a useful notion for step 1 of the two-step protocol. Extending the search for supersets to disjunctions increases the chances of finding consistent enough supersets of the outcome in step 1.

Yet, and importantly, these consistent supersets must also pass the hurdles of empirical relevance and theoretical soundness. Creating supersets of the outcome by combining ever more conditions via logical OR is an easy task [just as it is easy to create subsets by combining ever more conditions via the logical AND operator, see e.g. Braumoeller (2016)]. The search for supersets must therefore be constrained; not all supersets that can be found are automatically meaningful necessary conditions for the outcome. That means that, in addition to passing the consistency threshold, a disjunction must also pass the coverage (Ragin 2006) and relevance of necessity (RoN; Schneider and Wagemann 2012) thresholds, and researchers need to name the higher-order construct that is represented by the disjuncts that together form the superset. Without fulfilling these requirements, no meaningful claim of necessity can be made (Schneider 2018). In practice, this means that while there are many consistent disjunctions that are supersets of the outcome, only few, and sometimes perhaps none, pass the more thorough test of being meaningful necessary conditions. In the latter case, the two-step QCA approach is not feasible for the data at hand and should be abandoned.

4 The updated two-step QCA protocol

This section spells out the updated protocol for the new two-step QCA approach and highlights its differences to the old protocol.

4.1 Step 1—identifying necessary contexts

Just as before, in step 1, only remote conditions are analyzed. Instead of an analysis of sufficiency, the updated step 1 now consists of an analysis of necessity.Footnote 15 This analysis should include the search for disjunctions if no atomic condition is found to be a consistent enough superset of the outcome. The consistency threshold should be set at a high level.Footnote 16 If consistency is lower than 1, it needs to be checked whether the inconsistent cases are deviant cases consistency in kind (Schneider and Rohlfing 2013). If so, one should rethink whether to declare this remote context necessary, because the existence of such cases means that the outcome can be achieved outside of any of the contexts identified by the researcher. This not only contradicts the claim of necessity. It also creates the danger that the sufficiency solution found after step 2 will display terms without any context condition.
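This check can be sketched in a few lines: a deviant case consistency in kind is one that is a member of the outcome (Y > 0.5) but not of the proposed necessary context (X < 0.5). Case labels and membership scores below are invented:

```python
# Sketch: flagging deviant cases consistency in kind for a necessity claim.
# Case labels and membership scores are invented.

cases = {            # case: (membership in context X, membership in outcome Y)
    "case_1": (0.8, 0.9),
    "case_2": (0.3, 0.7),  # member of Y but not of X: deviant in kind
    "case_3": (0.6, 0.2),
    "case_4": (0.9, 0.6),
}

deviant = [c for c, (x, y) in cases.items() if y > 0.5 and x < 0.5]
print(deviant)  # any hit means the outcome also occurs outside the context
```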

In addition to these consistency tests, all consistent supersets that are empirically trivial need to be refuted. Here, the two parameters of coverage (Ragin 2006) and RoN (Schneider and Wagemann 2012) should be used. Especially when dealing with disjunctions, RoN provides the more sensitive measure because it takes into account the size of the disjunction not only vis-à-vis the outcome, but also vis-à-vis the negation of this disjunction. This is important because disjunctions tend to be big sets, sometimes so big that many, or even all, cases are members of the context.Footnote 17
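Both parameters follow standard fuzzy-set formulas: coverage of necessity is the sum of min(X, Y) over the sum of X, and RoN is the sum of (1 - X) over the sum of (1 - min(X, Y)). The sketch below, with invented memberships, shows how a near-universal superset can still post acceptable coverage while RoN exposes its trivialness:

```python
# Sketch: the two trivialness parameters for a necessary condition, with
# invented fuzzy membership scores.

def necessity_coverage(x, y):
    """Coverage of necessity: sum(min(x_i, y_i)) / sum(x_i)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y)) / sum(x)

def relevance_of_necessity(x, y):
    """RoN: sum(1 - x_i) / sum(1 - min(x_i, y_i)). Values near 0 flag a
    near-universal, i.e. trivial, superset."""
    return sum(1 - xi for xi in x) / sum(1 - min(xi, yi) for xi, yi in zip(x, y))

Y = [0.8, 0.7, 0.6, 0.9, 0.3]
trivial = [0.95, 0.9, 1.0, 0.95, 0.9]    # almost every case is a member
plausible = [0.9, 0.8, 0.7, 0.8, 0.4]    # tracks the outcome more closely

# The trivial superset still posts decent coverage (0.70) but a very low
# RoN (0.18); the plausible one passes both hurdles (0.89 and 0.78).
print(round(necessity_coverage(trivial, Y), 2), round(relevance_of_necessity(trivial, Y), 2))
print(round(necessity_coverage(plausible, Y), 2), round(relevance_of_necessity(plausible, Y), 2))
```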

When dealing with disjunctions, a further selection criterion for ruling out consistent supersets is whether or not a theoretically meaningful concept can be formulated of which the disjuncts are functional equivalents. The logic of SUIN conditions dictates that, without this higher-order concept explicitly spelled out by the researcher, no meaningful claim of necessity is possible (Ragin 2000, p. 209).Footnote 18

The left-hand side of Fig. 2 graphically displays the updated step 1. Unlike in the old protocol, step 1 now ends with the identification of a consistent superset C, which in Fig. 2 is composed of the two SUIN conditions \(A + B\). There are, thus, no cases that are members of outcome Y that are not also members of remote context C, whereby C is composed of \(A+B\). Further, because of the requirement of high coverage and RoN values, trivial supersets are ruled out as meaningful contexts. This means the space outside of Y but inside C is kept small, which, in turn, lowers the number of cases that are members of context C but not of outcome Y.

All remote conditions that pass the empirical and theoretical hurdles in step 1—either as single necessary conditions or as SUIN conditions—are transferred to the sufficiency analysis in step 2.Footnote 19 If no set of conditions can be identified that passes all these theoretical and empirical hurdles, then the two-step QCA approach cannot be applied for the data at hand. Researchers may then want to simply apply a standard one-step QCA or turn to one of the approaches with similar purposes mentioned in the concluding Sect. 6.

Fig. 2

Graphical representation of updated two-step protocol

4.2 Step 2—identifying remote-proximate sufficient terms

Step 2 consists of an analysis of sufficiency. It includes all proximate conditions plus all those remote factors that have been identified as necessary contexts in step 1. In the case of disjunctions, all SUIN conditions need to be introduced into the step 2 analysis. In other words, if step 1 revealed \(A + B \leftarrow Y\) and \(A + B\) represents the higher-order concept C (\(C = A + B\)), then both remote conditions A and B need to be part of the analysis in step 2.

Alternatively, one could create condition C, add it to the data set, and use C instead of A and B in step 2. While in line with the logic according to which A and B are functional equivalents of C, this strategy has disadvantages, but also one advantage. It prevents researchers from revealing which proximate factors line up with which of the SUIN conditions to form a sufficient term for the outcome. It also prevents a comparison of the empirical importance of the SUIN conditions in terms of the (raw) coverage of the sufficient terms they are involved in. The advantage of this strategy is that it provides an even stronger reduction of the number of conditions to be analyzed in step 2. In short, using aggregate context conditions in step 2 is an option if and when the primary purpose of using the two-step approach consists in reducing limited diversity; it is less appealing if the goal is to unravel the different contexts in which the outcome occurs.

In the updated two-step protocol, there is only one analysis of sufficiency in step 2 and this analysis includes all the remote context conditions that have been found in step 1. Having just one sufficiency analysis greatly simplifies the analytic procedure and the complexity of the results it produces. In addition, it reduces the number of remainder rows involved in the logical minimization. Introducing all context conditions into one single sufficiency analysis in step 2 is possible because with the updated two-step protocol it is ensured that no remote context conditions disappear or jointly appear in the same sufficient term in a manner that would contradict the findings from step 1.

Schneider and Wagemann (2006) suggest obtaining the conservative solution in step 2. In the updated approach, the choice of solution type should follow the analytic goal. If the two-step approach is employed in order to identify mutually exclusive types in step 2, then no logical minimization should be performed and the sufficient truth table rows should simply be used for substantive interpretation. This ensures that no overlap between sufficient terms (‘types’) exists.Footnote 20 If mutually exclusive types are not the analytic goal, then any solution type (conservative, intermediate, or most parsimonious) can be used in step 2.

If the intermediate or most parsimonious solution is chosen, researchers must make sure to first block all those logical remainders from being included into the logical minimization that would contradict the statement of necessity made in step 1. For instance, if step 1 revealed the necessary remote context C with \(C = A + B\), then in step 2 all logical remainders must be blocked that are subsets of \(\lnot C = \lnot (A + B) = \lnot A * \lnot B\). In other words, in step 2 researchers need to produce the enhanced most parsimonious (or intermediate) solution formula [ESA, Schneider and Wagemann (2012)].
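This blocking rule can be sketched as follows, with hypothetical condition names (two remote SUIN disjuncts A and B plus two proximate conditions P1 and P2):

```python
from itertools import product

# Sketch: enumerating which remainders ESA blocks when step 1 established
# the necessary context C = A + B. Condition names are hypothetical: two
# remote SUIN disjuncts (A, B) and two proximate conditions (P1, P2).

conditions = ["A", "B", "P1", "P2"]
rows = [dict(zip(conditions, bits)) for bits in product([0, 1], repeat=len(conditions))]

# A row is a subset of ~C = ~A * ~B exactly when it negates every disjunct,
# so using it as a counterfactual would contradict A + B <- Y.
blocked = [r for r in rows if r["A"] == 0 and r["B"] == 0]
print(len(rows), len(blocked))  # 16 truth-table rows, 4 of them barred
```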

4.3 Benefits of the updated protocol

Applying ESA in the context of the updated two-step QCA approach has various benefits. First, no untenable assumptions contradicting the statement of necessity are made. Second, and related, by excluding some remainders, their number is reduced. Reducing the number of logical remainders has always been one argument in favor of the two-step QCA approach. With the updated two-step approach, this remainder-reducing effect is increased and the reduction stands on solid grounds: those remainders are excluded that would lead to logically untenable claims. Third, the number of remainders is also reduced because there is now only one sufficiency analysis in step 2 rather than as many as there are remote contexts.

Fourth, and still related to the exclusion of untenable assumptions, the updated two-step approach ensures that all sufficient terms that are obtained in step 2 consist of combinations between remote and proximate conditions. If the remote necessary context consists of a disjunction, then each sufficient term in step 2 contains at least one SUIN disjunct. As the right-hand side of Fig. 2 shows, all of the proximate sufficient terms (\(P_1 - P_3\)) fall within not only outcome Y (due to the consistency threshold), but also within remote context C. That is, the design of the updated two-step protocol ensures that each proximate term contains the remote context.

Critics might argue that the updated two-step protocol seems to do away with two useful features of the old protocol, namely that different contexts could be identified, rather than, seemingly, just one with the new protocol; and that each context could consist of combinations of remote conditions. With regard to the first critique, notice that the new protocol allows for different contextual conditions in the form of different SUIN conditions. Rather than being conceptually independent, as in the old protocol, the new protocol simply requires these SUIN conditions to conceptually relate to a higher-order concept. This does not prevent researchers from detecting different SUIN conditions being linked to different proximate terms. In Fig. 2, the proximate term \(P_1\) is combined with a different SUIN context than proximate term \(P_2\), whereas \(P_3\) requires both SUIN contexts to be present. With regard to contexts that consist of logical AND combinations of remote conditions, it should be noted that in principle nothing prevents a SUIN condition from consisting of two or more single remote factors combined by logical AND.Footnote 21 In practice, it is a matter of the data at hand whether any conjunction of context factors (as SUIN or by itself) manages to pass the empirical and theoretical hurdles for qualifying as a necessary context.

Notice that in applied QCA it can happen that—despite the use of ESA—a sufficient term does not contain a remote context condition. If this occurs, it is not due to untenable assumptions (which have been barred by ESA). Instead, it happens because among the truth table rows that pass the consistency threshold some contain the necessary condition while others contain its negation. In the process of logical minimization, the necessary context condition is then minimized away. This problem is avoided if a high consistency threshold is chosen in step 1 and the existence of deviant cases consistency in kind is kept to a minimum or completely avoided.Footnote 22

In sum, the major innovation of the updated two-step QCA approach is that step 1 is designed as an analysis of necessity. This triggers several (beneficial) consequences. First, step 1 stands on sound set-theoretic grounds; second, by including the notion of SUIN conditions, the chance of identifying necessary remote contexts, and thus the viability of the two-step approach, is increased; third, by applying ESA in step 2, the number of remainders is further decreased, and for reasons that follow a clear logic; fourth, due to ESA, it is ensured that all sufficient terms consist of a combination between remote and proximate factors, just as intended when using the two-step approach; fifth, the fact that step 2 now consists of one single sufficiency analysis simplifies the analysis, further reduces the number of remainder rows, and contributes to finding results that are theoretically more penetrable; sixth, there is a clear empirical criterion as to when the use of this approach is (not) warranted: simply whether or not a necessary remote context can be identified in step 1. Table 1 provides a summary of the key differences between the former and the updated two-step approach.

Table 1 Comparison of updated and old two-step protocol

5 An empirical illustration

In this section, I briefly illustrate the updated two-step protocol using data from Haesebrouck (2015). Since the focus is on methodological issues and because I significantly alter the original data for presentational purposes,Footnote 23 no attempt is made to contribute to the substantive literature, nor is any critique of Haesebrouck’s substantive analysis intended. Furthermore, the empirical example uses fuzzy sets, but the principles and practices of two-step QCA can be applied to crisp sets or multi-value sets [e.g. Sager and Andereggen (2012)] as well (Mannewitz 2011).

Haesebrouck (2015) uses two-step QCA in order to identify the remote (international) and proximate (domestic) factors that jointly explain the democratic participation in UN peacekeeping operations.Footnote 24 The fuzzy-set QCA is based on 22 cases and there are four remote and four proximate conditions. For the occurrence of the outcome, Haesebrouck identifies two remote contexts and a total of three sufficient terms (intermediate solution).

Haesebrouck’s application of the two-step QCA approach can be considered ideal. Most of the problems that tend to occur with the two-step approach do not manifest themselves in his study. Compared to most other two-step QCA applications, the number of sufficient terms identified is unusually low; all terms contain the remote context conditions; and (meaningful) remote context conditions could be identified to begin with.

Because it applies the two-step QCA approach in its original form, however, the study suffers from unavoidable disadvantages. The set-theoretic underpinning of step 1 is unclear. In principle, step 1 is meant to identify contexts that enable the outcome. In practice, the two-step protocol in its old form reveals conditions that are (inconsistently) sufficient at best. This unavoidably leads to claims that are difficult to sustain. For illustration, Haesebrouck identifies two allegedly outcome-enabling contexts, both of which consist of conjunctions of two remote conditions (\(\lnot MS * MC\) and \(\lnot MS * PI\), respectively). If ‘enabling’ is to be understood in terms of being required, or necessary, for the outcome, then these conjunctions would need to pass the necessity test. Since, however, none of the conjuncts passes this test (\(\lnot MS\) comes close but is dismissed by Haesebrouck), the conjunctions cannot be considered necessary. If, instead, the notion of context is understood in terms of sufficiency, then a simple one-step QCA seems preferable because the old two-step protocol does not reveal sufficient contexts either. Another shortcoming of applying the old two-step protocol is that the unique coverage parameters are biased towards values that are too high. This is the result of performing several separate sufficiency analyses in step 2 on the same cases without taking this into account when reporting coverage.Footnote 25

5.1 Updated step 1

For the empirical illustration, I am using R packages QCA 3.3 (Dusa 2018) and SetMethods 2.4 (Oana and Schneider 2018). The replication material is available on the author’s Dataverse at https://dataverse.harvard.edu/dataverse/cqs.

In step 1, the four remote factors (MC, GP, PI, and MS) are subjected to an analysis of necessity for the outcome, large military personnel contribution (LC). For this, we can use the function superSubset in the package QCA 3.3. If there are single conditions that pass the consistency and coverage thresholds, they are identified by this function. If there are no such single conditions, the minimal set of disjunctions is revealed instead.

In line with the protocol, we choose high consistency (0.9), coverage (0.6), and RoN (0.5) thresholds. As Table 2 shows, this reveals the SUIN conditions \(MC + PI\): military capacity (MC) and/or prior peacekeeping involvement (PI) jointly form an empirically non-trivial, sufficiently consistent superset of the outcome large personnel contribution (LC). For the sake of the methodological argument, let \(MC + PI\) stand for the higher-order construct ’military-focused country’, M. It is M that is necessary for LC, whereas MC and PI, respectively, are functional equivalents of M.

Table 2 Step 1—analysis of necessity
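The three necessity parameters reported in Table 2 follow the standard set-theoretic definitions. The following Python snippet is a minimal sketch of those formulas with invented membership scores; it is not a replacement for superSubset and does not use the study’s actual data:

```python
# Necessity parameters for a disjunction X = max(MC, PI) against outcome Y (LC).
# All membership scores below are invented for illustration only.

def necessity_params(x, y):
    """Consistency, coverage, and Relevance of Necessity (RoN) for Y <= X."""
    overlap = sum(min(xi, yi) for xi, yi in zip(x, y))
    cons = overlap / sum(y)   # Cons(Y <= X): how consistently X is a superset of Y
    cov = overlap / sum(x)    # Coverage: how much of X is "used up" by Y
    # RoN guards against trivially large supersets:
    ron = sum(1 - xi for xi in x) / sum(1 - min(xi, yi) for xi, yi in zip(x, y))
    return cons, cov, ron

mc = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3]   # hypothetical memberships in MC
pi = [0.4, 0.8, 0.3, 0.2, 0.6, 0.9]   # hypothetical memberships in PI
lc = [0.8, 0.7, 0.6, 0.1, 0.9, 0.7]   # hypothetical memberships in outcome LC

disjunction = [max(a, b) for a, b in zip(mc, pi)]  # fuzzy OR: MC + PI
cons, cov, ron = necessity_params(disjunction, lc)
print(cons >= 0.9, cov >= 0.6, ron >= 0.5)  # prints: True True True
```

With these invented scores the disjunction clears all three thresholds; with real data, the same quantities are what the necessity analysis evaluates for each candidate superset.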

As mentioned in Sect. 4, researchers should also check whether there are deviant cases consistency in kind. A graphical representation in the form of an XY plot can be used, which has the additional benefit of also visually checking the degree of skewedness and thus the empirical relevance of the disjunction. Figure 3 shows that there are no deviant cases consistency in kind (i.e. the upper-left quadrant is void of cases) and that, while many cases are members of the disjunction (\(MC + PI\)), several cases have low membership in it. Step 1 thus concludes with the identification of the disjunction \(M = MC + PI\) as a meaningful necessary context for outcome LC. Countries only contribute large military personnel (LC) if they are military-focused (M), and such focus expresses itself through either high military capacity (MC) and/or prior peacekeeping involvement (PI).

Fig. 3 YX plot for remote SUIN context conditions
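The quadrant check just described reduces to a simple filter: a case is deviant consistency in kind for a necessity claim if it is a member of the outcome but not of the alleged necessary condition. The sketch below uses invented case labels and membership scores, not the cases of the study:

```python
# Flag deviant cases consistency in kind for a necessity claim Y <= X:
# cases with outcome membership Y > 0.5 but membership X < 0.5 in the
# necessary disjunction, i.e. the upper-left quadrant of the plot.
# Case labels and scores are invented for illustration.

cases = {
    "A": (0.9, 0.8),   # (membership in MC + PI, membership in LC)
    "B": (0.8, 0.7),
    "C": (0.2, 0.1),
    "D": (0.6, 0.4),
}

deviant = [name for name, (x, y) in cases.items() if y > 0.5 and x < 0.5]
print(deviant)  # an empty list means no deviant cases consistency in kind
```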

5.2 Updated step 2

In step 2, the four proximate conditions LE, LP, PV, and ED are analyzed together with the two remote SUIN conditions MC and PI in one single analysis of sufficiency for outcome LC.

As Table 6 in the Appendix shows, remainder rows 1–16 are all instances of the negation of the necessary remote context (i.e. \(\lnot M = \lnot (MC + PI) = \lnot MC * \lnot PI\)). According to ESA, all these remainder rows need to be set to \(OUT=0\) in order to prevent them from being used for producing the most parsimonious or intermediate solution. Otherwise, they would constitute untenable assumptions.
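The exclusion logic of ESA can be made concrete with a short sketch. Assuming the condition order (MC, PI, LE, LP, PV, ED), the snippet enumerates all 64 truth table rows and collects those inside \(\lnot MC * \lnot PI\); in an actual analysis this exclusion is handled by the row-exclusion facilities of the QCA package rather than by hand:

```python
from itertools import product

# Sketch of the ESA exclusion step, assuming the six conditions are
# ordered (MC, PI, LE, LP, PV, ED). Every truth table row with MC = 0
# and PI = 0 contradicts the necessary context M = MC + PI and is
# barred from minimization by setting OUT = 0.

conditions = ("MC", "PI", "LE", "LP", "PV", "ED")
excluded = [row for row in product((0, 1), repeat=len(conditions))
            if row[0] == 0 and row[1] == 0]   # rows inside ~MC * ~PI

print(len(excluded))  # prints: 16 -- the untenable remainders among 64 rows
```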

In line with Haesebrouck, we opt for the intermediate solution in step 2. This produces the results shown in Table 3. There are three sufficient terms, all of which consist of remote and proximate conditions. Remote SUIN context MC needs to be combined with either the proximate conjunction \(LE * LP * ED\) or \(LE * \lnot PV * ED\), whereas remote SUIN PI leads to the outcome in combination with the proximate factors \(LE * PV * ED\).Footnote 26

Table 3 Step 2—sufficiency solution formula

If we check the easy counterfactuals made for producing this enhanced intermediate solution (see Table 4), we see that none of them contradicts the statement of necessity formulated in step 1, i.e. no easy counterfactual is a subset of \(\lnot MC * \lnot PI\), just as intended by ESA. Furthermore, because only one sufficiency analysis is performed in step 2, the coverage parameters are not biased.

Table 4 Easy counterfactuals used for enhanced intermediate solution
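The subset check on the easy counterfactuals is equally mechanical: a counterfactual would violate the step-1 necessity claim only if it had \(MC = 0\) and \(PI = 0\). The rows below are invented for illustration (condition order MC, PI, LE, LP, PV, ED) and are not those of Table 4:

```python
# Verify that no easy counterfactual contradicts the step-1 necessity claim:
# a counterfactual row is untenable iff it lies inside ~MC * ~PI.
# Rows (MC, PI, LE, LP, PV, ED) are invented for illustration.

easy_counterfactuals = [
    (1, 0, 1, 1, 0, 1),
    (1, 1, 1, 0, 0, 1),
    (0, 1, 1, 0, 1, 1),
]

untenable = [row for row in easy_counterfactuals
             if row[0] == 0 and row[1] == 0]
print(untenable)  # prints: [] -- no counterfactual lies inside ~MC * ~PI
```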

6 Concluding remarks

This article has aimed at improving the formal coherence and research practicability of the two-step QCA approach as originally formulated by Schneider and Wagemann (2006). The key to this, so I have argued, consists in reformulating step 1 of the protocol and applying recent innovations in set-theoretic methods. I propose that step 1 should aim at identifying conceptually meaningful, empirically consistent, and relevant necessary context conditions. Step 2 then identifies the proximate conditions that, jointly with the remote context, are sufficient for the outcome. Table 5 summarizes the updated two-step QCA protocol.

Table 5 Summary of updated two-step protocol

The virtues of reframing step 1 as an analysis of necessity are manifold: (a) it clarifies the set-theoretic status of step 1; (b) it increases the chances of being able to apply the two-step approach by making it more likely to find necessary remote contexts; (c) at the same time, it lowers the number of remote-proximate sufficient terms, thus increasing theoretical interpretability; (d) it further reduces the number of logical remainders produced; (e) it ensures that all sufficient terms in step 2 contain the necessary contexts found in step 1; and (f) it provides more straightforward empirical criteria for when the two-step approach is not warranted for the data at hand, even if there are good theoretical priors that suggest a differentiation between remote and proximate conditions: two-step QCA cannot be applied if no remote context passes the empirical and theoretical hurdles formulated in the literature on necessary conditions.

In the past years, several innovations in QCA and set-theoretic methods more broadly have been formulated that, in one way or another, aim at handling data structures similar to the remote-proximate structure in two-step QCA and which in given circumstances might offer superior alternatives to the two-step approach. For instance, Baumgartner (2009) introduced cna as, among other things, a tool for unraveling the causal order among conditions. One crucial conceptual difference is that two-step QCA does not investigate causal dependencies between remote and proximate conditions, whereas cna focuses precisely on such relations. If causal chain-like arguments are the goal, in which remote factors cause proximate conditions, which, in turn, lead to the outcome, then approaches like cna (or sequence elaboration (Mahoney et al. 2009) or comparative multilevel analysis (Thomann and Manatschal 2016)) are to be preferred over two-step QCA. Further, instead of directly using two-step QCA, researchers might want to turn to a diagnostic tool offered in package SetMethods 2.4 (Oana and Schneider 2018) and first proposed by García-Castro and Arino (2016). Function cluster, in essence, reveals whether the sufficiency solution revealed by a one-step QCA holds for sub-populations of the cases. Such sub-populations can be created based on contexts as defined by the researcher. If sufficiency solutions do not differ across contexts, then two-step QCA seems superfluous. If the solution does not fit all contexts, though, then the use of two-step QCA seems justified. The difference between the cluster diagnostic tool and the two-step approach is that in the former, the necessity of remote contexts is neither required nor tested, whereas that test is at the heart of the updated two-step QCA approach.

Future work on analytic strategies for handling remote-proximate distinctions could elaborate whether different protocols are needed depending on whether this distinction is based on a temporal, spatial, or other dimension. The two-step QCA approach currently treats all such differences alike. Along similar lines, one could investigate under which specific circumstances the old two-step protocol might be the better choice, for instance, when the contextual conditions are considered not outcome-enabling (i.e. necessary) but rather outcome-triggering (i.e. sufficient). Researchers interested in analyzing outcome-triggering contexts could try to apply the old two-step protocol. They would then need to solve the shortcomings identified in this paper, though. Alternatively, and more promisingly, they might simply apply a standard one-step QCA with both remote and proximate conditions and interpret each sufficient term in light of the remote-proximate distinction.

In conclusion, even in its updated version, the choice of two-step QCA must rest on a strong theoretical argument that a general distinction between contextual and proximate conditions is sensible and feasible, and that the outcome is best modeled by the combination of these factors. If such theoretical priors exist, then the two-step QCA approach in its updated form is a promising research strategy within applied QCA.