Despite nearly a century of critiques published about their improper use (e.g., Becker, 2005; Breaugh, 2006; Burks, 1926; Meehl, 1971; Spector & Brannick, 2016), control variables continue to be found in almost every paper in the organizational sciences (Atinc, Simmering, & Kroll, 2012), as well as in papers from other fields that rely on nonexperimental methods such as surveys. The main issue is that control variables are used in an automatic way, in the hope that including a handful of them will somehow eliminate the influence of extraneous factors, rendering estimates of focal relationships more accurate. What is too often ignored is the nature of the underlying assumptions and the implicit model that is tested when control variables are added to an analysis, and the fact that if that model is incorrect (likely in most cases), inferences based on analyses including control variables can be wrong. This paper discusses what the testing of models using control variables represents and the nature of inferences that can be drawn from such model tests. I will present detailed instructions for making the most of control variables by using the systematic and theory-driven hierarchical iterative control (HIC) approach.

Beginning with Becker (2005), a series of papers in the organizational sciences has discussed methodological limitations in the way control variables are typically used. These papers are consistent in noting that control variables should not be included blindly but should instead be based on sound theoretical justification (Becker et al., 2016; Bernerth & Aguinis, 2016; Breaugh, 2008; Carlson & Wu, 2012; Spector & Brannick, 2016). Further, they agree that it is important to provide complete reporting of results involving control variables. The HIC approach builds on these ideas by providing a comprehensive strategy for the inclusion of control variables in an investigation, one that rests on a sound foundation and involves detailed reporting of results.

The Nature of Control Variables

Control means incorporating specific features into an investigation to eliminate alternative explanations for observed relationships among variables, in the simplest case a correlation between X and Y. An observed correlation between measures (operationalizations) is assumed to reflect a relationship between underlying theoretical constructs that typically are not directly observable, such as internal psychological states (e.g., attitudes or perceptions). What is always uncertain, however, is why those observed variables are related, and whether those relationships might be due to extraneous factors. Spector and Brannick (2011) distinguished extraneous factors that affect measures from extraneous factors that affect the constructs themselves (illustrated in Fig. 1). The X and Y circles in the figure represent theoretical constructs, each assessed with its corresponding x or y measure shown in the squares. The two-headed dashed arrow between x and y indicates that there is an observed correlation between them, but the reason is uncertain. C1 represents a biasing factor shared between measures x and y, which is a source of common method variance that can produce a distorted correlation relative to the true relationship between the underlying constructs X and Y. C2 represents an extraneous factor that affects the constructs themselves and can produce a spurious relationship between X and Y. In both cases, extraneous factors account for observed relationships, representing threats to inferences about the existence of, and reason for, underlying relationships among the theoretical constructs.

Fig. 1 Illustration of extraneous variables that can affect constructs or measures

Control variables can be introduced in an analysis (e.g., multiple regression or SEM) for the purpose of eliminating the effects of extraneous factors on the results. They are routinely included in the hope that controlling one or more extraneous factors will yield a more accurate (Spector & Brannick, 2011, refer to this as the purification principle), or at least a more conservative, estimate of a relationship. After all, the reasoning goes, if a variable in a regression model is still significant after controlling for a host of control variables, there must really be an underlying relationship among the constructs of interest. Thus, researchers often feel that including control variables is preferable because it appears to enhance confidence in findings. In many cases, reviewers will insist that authors add controls to analyses in the belief that doing so leads to better inference.

An important consideration is that the inclusion of controls is based on implicit underlying assumptions and that including controls in a tested model changes the nature of that model. In the simplest case, we begin with an observed X-Y relationship. We might assume that one variable (X) is the cause of the other (Y). The first step in studying this possibility is to show that there is a relationship (Shadish, Cook, & Campbell, 2002). If we introduce a control variable, C, we test the model that both X and C relate to Y, and we generally hope to show that X is still statistically significant in the presence of C. However, the testing of this model only makes sense if we assume that C is a driver of both X and Y, at either the construct or measurement level. If it is, then the test of X-Y, conditional on C also being in the model, is a test of whether X explains incremental variance in Y once the common cause C is controlled. It can provide evidence about whether C is a feasible explanation for the X-Y relationship.

A problem with blindly including a given control variable is that there are other connections among X, C, and Y that can produce the same pattern of results. As noted by MacKinnon, Krull, and Lockwood (2000), for example, the same pattern of results will occur if C is a confounding variable driving both X and Y or is a mediator that explains the X-Y relationship. If X→C→Y is correct, then controlling for C when testing the X-Y relationship will lead to the incorrect conclusion that X and Y are unrelated. They are in fact related, and C helps explain the underlying mechanism. It should be kept in mind that there are many potential underlying mechanisms that might produce the same pattern of results (Meehl, 1971; Spector, Zapf, Chen, & Frese, 2000b). As Meehl (1971) noted, blindly including control variables is not conservative but rather reckless, as it can lead to erroneous inference.
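To make the MacKinnon et al. (2000) point concrete, the following minimal simulation (Python with numpy and statsmodels; the variable names and effect sizes are invented for illustration) generates data under a confounder model and under a mediator model. In both cases, X is clearly related to Y when entered alone and its coefficient collapses toward zero once C is controlled, so the regression results alone cannot distinguish the two mechanisms.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5000

# Confounder model: C drives both X and Y; X has no direct effect on Y.
C1 = rng.normal(size=n)
X1 = 0.7 * C1 + rng.normal(size=n)
Y1 = 0.7 * C1 + rng.normal(size=n)

# Mediator model: X drives C, which in turn drives Y.
X2 = rng.normal(size=n)
C2 = 0.7 * X2 + rng.normal(size=n)
Y2 = 0.7 * C2 + rng.normal(size=n)

def coef_of_x(y, x, c=None):
    """Coefficient for x from OLS of y on x (and optionally c)."""
    preds = [x] if c is None else [x, c]
    design = sm.add_constant(np.column_stack(preds))
    return sm.OLS(y, design).fit().params[1]

for label, x, c, y in [("confounder", X1, C1, Y1), ("mediator", X2, C2, Y2)]:
    print(f"{label}: b(X alone) = {coef_of_x(y, x):.2f}, "
          f"b(X controlling C) = {coef_of_x(y, x, c):.2f}")
```

Under both data-generating models the printout shows the same signature: a sizable coefficient for X alone that drops to roughly zero with C controlled, even though the correct substantive conclusions differ completely.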

The Nature of Inference in Nonexperimental Studies

There are several types of inference that might be drawn from a study, depending upon its design. Kraemer, Stice, Kazdin, Offord, and Kupfer (2001) discuss three kinds of inference that one might reasonably make from a given study. A correlate is a variable that merely relates to another. Most of our studies using cross-sectional and even longitudinal designs limit reasonable inference to the conclusion that two (or more) variables are related and thus are correlates (Spector, 2019). A proxy variable is one that can predict a future variable. This does not mean that one variable is just assessed prior to the other, but that one variable can forecast something that has not yet happened (Spector & Meier, 2014). For example, we use pre-employment assessments to predict subsequent job performance after hiring. Performance has not yet happened at the time of the assessment, but it can be forecasted by a test taken prior to hiring. In this case, the variable is considered a proxy because it is not clear why it was able to successfully forecast the future. Was it because the construct reflected by our proxy variable is an actual cause of the outcome, or is it merely something that is associated with the cause? If we use a personality test to predict future performance, is that personality characteristic the actual cause, or is it something else related to personality? Merely knowing that X predicts Y does not tell us that X is the cause of Y.

Finally, a causal variable is a variable that, when manipulated, will result in a subsequent effect on another variable. For example, we find that when employees are exposed to an intervention, their performance improves. Kraemer et al. (2001) refer to this as a cause merely because a manipulation of X reliably leads to Y. Of course, as with the proxy case, it is possible that your manipulation affected not only X but also other things that are the real cause of Y (Spector, 2019, elaborates on this point in building a causal case). Nevertheless, from a pragmatic perspective, knowing that you can achieve an outcome with a particular manipulation is important because it enables you to achieve desired results such as improved performance, even if the reasons are not what you think they are. Obviously, it is important from a scientific perspective to figure out what the real driver of an outcome might be, as it can lead to more effective interventions.

Ultimately, the long-term goal of our research streams is to understand the causal connections that underlie the phenomena we study. No single study can provide conclusive evidence for such conclusions. Rather, each provides a small piece that adds to a broader case. Control variables are important tools for providing comparative tests that can rule in or rule out the possibility that observed relationships are due to extraneous factors, such as showing reasons that proxy variables are able to forecast outcomes. Nor is the control variable approach the only method used to shed light on causal processes. The instrumental variable method, for example, has been used in a variety of fields to isolate the causal effect of one variable upon another through the introduction of a third (or more) variable (Crits-Christoph, Gallop, Gaines, Rieger, & Connolly Gibbons, 2018; Malone & Lusk, 2018). The underlying logic, however, is similar to that of control variables as discussed below: a number of assumptions are made that, if correct, would lead to an expected pattern of results, but finding that pattern does not mean the assumptions are correct.

To make the logic of control variables more concrete, let us consider the relationship between stressors and strains. It has been suggested that negative affectivity (NA) or neuroticism contaminates the assessment of stressors and strains, producing an artifactual relationship between their measures (Watson, Pennebaker, & Folger, 1986). In other words, stressor-strain relationships are due to bias at the assessment level and not to relationships among underlying constructs. If this idea is correct, then when we test for the relationship between stressors and strains while including NA as a control variable, we should find that stressors have no effect (in the statistical sense). The underlying logic is of the form

“If A (NA bias is correct) then B (no stressor-strain correlations when NA is controlled).”

This logic works fine if we know for a fact that NA is a common cause of stressors and strain assessment (or the underlying constructs), but we do not know for certain whether or not that is the case. At best, we can say that our results are consistent with a common method variance or spuriousness explanation, but in the absence of additional evidence, we do not know if we are correct. This is because the reverse logic does not follow, that is,

“If A (NA bias is correct) then B (no stressor-strain correlations when NA is controlled).”

does not logically imply

“If B (no stressor-strain correlations when NA is controlled) then A (NA bias is correct).”

If entering NA as a control variable in an analysis accounts for the stressor-strain relationship, it does not follow that NA is therefore a bias in the assessment of stressors and strains. It might or might not be. This would be an untested assumption. At best, we can say “If B then maybe A,” but there are a lot of other possibilities (for alternative NA mechanisms in stress studies, see Spector, Zapf, et al., 2000b). To reach a conclusion that the relationship between our X and Y might (or might not) be due to C is not very satisfying to authors, editors, peer reviewers, or readers, so what more can we do? Actually, there is a lot, but it requires a programmatic approach to deeply investigate why X and Y are related.

The HIC Method

Science is about collecting evidence and building cases to support claims and conclusions. Ideally, evidence is accumulated in a hierarchical and iterative manner that allows different forms of evidence to support inferences about the nature of relationships among variables. The typical use of blind controls that are just thrown into analyses as an afterthought contributes little to our understanding of why a particular X and Y (or series of Xs and Ys) are connected. We need to avoid the faulty logic that finding X and Y are related in the presence of control variables means we can conclude that our theoretical model is correct. As noted earlier, the logic of model testing

If A (my model is correct) then B (X and Y will be related)

does not imply the reverse

If B (X and Y are related) then A (my model is correct)

Nor does it imply

If A (X and Y are related in the presence of C) then B (my model is correct)

To build a convincing case for claims about connections among underlying X and Y constructs, we need to approach our research in a much more programmatic manner. This means conducting analyses and a program of studies in a hierarchical and iterative manner that first finds evidence that X and Y are connected and then seeks evidence to rule in or rule out a series of potential alternative explanations that run across multiple studies, multiple papers, and multiple research teams utilizing multiple methods. This can be approached using the following 7-step procedure (see Table 1 for a summary).

Table 1 Seven steps to the hierarchical iterative control method

Step 1: Generate a Research Question

It goes without saying that the first step in research is to decide on the purpose or research question to be addressed. In the simplest case, the question might be whether a given X and Y are related to one another. Such questions might already be answered in the existing literature (step 2). With purely exploratory/inductive research efforts, questions can be quite general, such as what factors might be drivers of a given outcome. Deductive/theory-testing efforts would begin by deriving from theory specific hypotheses that are to be tested. When X-Y relationships are already well established, the research questions and hypotheses might focus on explaining why the X-Y relationship exists, so the focus from the initial step would be on potential control variables.

Step 2: Conduct a Background Literature Review

An essential part of all research efforts is to identify what is already known about X and Y. This step can help establish that a baseline relationship exists between the target variables. It also can be helpful in identifying what does and does not relate to the X and Y variables. In some cases, there might be a rich literature from which to draw ideas about what might be driving the X-Y relationship, both in terms of serving as possible sources of method variance and in serving as potential causes that might produce spurious relationships.

Although most of the time control variables that might be sources of extraneous variance will have empirical support (i.e., they relate to the focal variables of interest), there might be times when the choice of control variables is based purely on theory. This can occur in new areas where the empirical background is limited, or when ideas from one discipline are brought into another. In such cases, it will be important to provide evidence that these purely hypothetical control variables are in fact related to the Xs and Ys in question.

Step 3: Establish the Baseline Relationship

Once the background work is complete, empirical testing can begin. Even in cases where it has been shown in prior studies that X and Y are related, it is important to verify that what is expected from the literature can be reproduced. It is not unusual for even established relationships to become elusive and fail to replicate. The empirical testing might begin inductively through exploratory research that indicates interesting patterns among variables. It might begin deductively by generating a hypothesis about variables from an existing theory. In either case, one must show that an X-Y relationship exists using the chosen measures and procedure and ideally that it exists reliably by showing that it is observed across multiple samples and/or studies.

When research is an exploratory effort to determine which of a potentially large set of variables relates to a target outcome, careful steps need to be taken to avoid over-interpreting results that can be due to type 1 errors. This means being systematic and pre-planning analyses to avoid p-hacking. Exploratory analysis means investigating which of a set of predictors might relate to a criterion. P-hacking, on the other hand, is performing a series of analyses for the purpose of making a particular relationship statistically significant. Pre-planning means deciding in advance the analyses to be used, such as correlating each predictor with the criterion. Those that come out significant would need cross-validation to rule out type 1 errors. With large sets of potential predictors, k-fold cross-validation with more than one cross-validation subsample should be used to add greater confidence (see the sketch below). What should be avoided is an iterative process in which repeated testing is conducted on nonsignificant correlations, for example, by dropping cases or collecting more data, exploiting what Simmons, Nelson, and Simonsohn (2011) refer to as researcher degrees of freedom. Of course, research reports should explain in detail the series of analyses that were conducted to provide the reader with a context from which to interpret results.
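As one hedged illustration of such a pre-planned analysis (Python with scipy and scikit-learn; the data, predictor count, and significance cutoff are all hypothetical), the screening correlations are specified in advance and the surviving predictors are then checked with k-fold cross-validation:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, k = 300, 20
X = rng.normal(size=(n, k))                   # 20 candidate predictors
y = 0.4 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)  # only 2 are real

# Pre-planned step 1: correlate each candidate with the criterion.
screened = [j for j in range(k) if pearsonr(X[:, j], y)[1] < 0.05]

# Pre-planned step 2: k-fold cross-validation of the screened set.
# (Strictly, the screening should be redone inside each training fold
# to avoid leakage; it is done once here to keep the sketch short.)
scores = cross_val_score(LinearRegression(), X[:, screened], y,
                         cv=5, scoring="r2")
print("screened predictors:", screened)
print(f"out-of-fold R2: {scores.mean():.2f} (+/- {scores.std():.2f})")
```

If the screened predictors were mostly type 1 errors, the out-of-fold R² would hover near zero; stable out-of-fold prediction is what justifies carrying a predictor forward.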

At the beginning stages of survey research, it probably makes the most sense to use efficient cross-sectional designs to establish relationships. It is wasteful to use resource-intensive research designs early merely to establish whether variables are related. Once it is shown that a given relationship can be reliably found, it is time to move on to the next steps that address why measures of X and Y are observed to be related. This includes the effects of extraneous factors on both measurement and underlying constructs.

Step 4: Identify Potential Mechanisms and Control Variables

Method Variance

One obvious possibility is that X and Y are related due to the action of shared biases at the measurement level that inflate correlations through common method variance (Podsakoff, MacKenzie, & Podsakoff, 2012). For example, as suggested by Watson et al. (1986), suppose individuals who are high on NA rate their stressors and strains as high regardless of the actual levels. In this case, the NA effect is purely on measurement and not on the underlying constructs, and because it affects reports of both stressors and strains, it will tend to inflate their correlation. If, on the other hand, NA affects one but not the other, the correlation will be deflated because of the action of uncommon method variance (Spector, Rosen, Richardson, Williams, & Johnson, 2019). If those high on NA report strains (but not stressors) to be high regardless of actual level, the strain measure will contain extra error variance due to NA. The action of common and uncommon method variance can distort (inflate or deflate) observed relationships through their impact on measurement and can serve as an explanation for an observed X-Y relationship.

Uncommon method variance that affects only one variable can obscure relationships among underlying constructs. Sources of uncommon method variance can act as suppressor variables and, when controlled, will result in larger observed relationships. It is therefore possible to observe a nonsignificant relationship between X and Y at step 3 due to contamination of X or Y. Although such cases might be rare, there are times when a researcher would have theoretical reasons to expect the action of uncommon method variance and use controls to investigate whether that might be the case. This would be reflected in an increased relationship between X and Y when the proposed biasing factor is included as a control.
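The distorting action of both kinds of method variance is easy to see in a small simulation (Python with numpy; the construct-level correlation and bias loadings are invented for illustration). A bias contaminating both measures inflates the observed correlation, whereas a bias contaminating only one measure deflates it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# True constructs X and Y, correlated .30 at the construct level.
X = rng.normal(size=n)
Y = 0.3 * X + np.sqrt(1 - 0.3**2) * rng.normal(size=n)
bias = rng.normal(size=n)     # e.g., an NA-like response tendency

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

# Common method variance: the bias contaminates BOTH measures.
x_both, y_both = X + 0.6 * bias, Y + 0.6 * bias
# Uncommon method variance: the bias contaminates only the y measure.
x_one, y_one = X, Y + 0.6 * bias

print(f"construct-level r:      {r(X, Y):.2f}")          # about .30
print(f"bias in both measures:  {r(x_both, y_both):.2f}")  # inflated
print(f"bias in one measure:    {r(x_one, y_one):.2f}")    # deflated
```

With these placeholder values, the shared bias pushes the observed correlation well above .30, while the one-sided bias pulls it below .30, even though the construct-level relationship never changes.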

Spuriousness

Spuriousness occurs when the relationship between two (or more) variables is due to a third (or more) variable. The relationship between X and Y means that they are correlates in the Kraemer et al. (2001) sense, but the reason for that relationship is not the effect of X on Y (or the reverse). Rather, X and Y are related because they share a common cause or causes. For example, we know from a meta-analysis of nearly 100 studies that measures of workload correlate negatively with measures of job satisfaction (Bowling, Alarcon, Bragg, & Hartman, 2015). It is tempting to conclude that heavy workloads lead to low employee job satisfaction, but is workload really a driver, or is this relationship due to other factors? One feasible explanation is that skill in performing work tasks leads to both perceptions of workload and job satisfaction. As illustrated in Fig. 2, individuals who are skilled will be able to manage their workloads efficiently and likely see them as lighter than do individuals who lack skill and struggle to complete tasks. At the same time, skill leads to good job performance, which is likely recognized and rewarded, and those rewards might lead to job satisfaction. All of these connections represent alternative explanations to the typical assumption that stressors lead to strains. In a particular investigation in which we have measures of workload level and job satisfaction, it is difficult to know why we observed that relationship. This makes it necessary to generate and then rule in or out alternative mechanisms such as the one illustrated in this example.

Fig. 2 Alternative mechanism for the workload-job satisfaction relationship

Generating Hypothetical Mechanisms

Generating potential explanations for why a given X and Y are related involves both empirical and theoretical work. This means specifying potential effects on measures (method variance) and effects on underlying constructs (spuriousness). It begins in the simple case with an X-Y relationship, but more complex cases involving additional variables and entire models are also possible. Keep in mind, however, that thoroughly investigating complex models can mean testing for multiple mechanisms simultaneously. It would be more manageable initially to break complex models into simpler components to attack piece by piece before tackling an entire complex model.

The best way to generate a series of mechanisms is by doing a thorough conceptual/theoretical analysis of why a given X-Y relationship might exist. This can be a creative process of synthesizing what is known about the variables in question, the literature on assessment and research methodology, and existing theory. It involves brainstorming ideas to generate a list of potential mechanisms that can be considered as alternative explanations for a given relationship. Those mechanisms will identify specific control variables that can be targets for the HIC sequence of tests to rule them in or out.

A good starting point for conceptual analysis comes from a thorough literature review (step 2) that includes both empirical and theoretical sources. What is of direct interest is identifying which variables have been empirically shown to relate to the target variables, and which have not. Variables that relate to the target variables are potential candidates to explain their relationship. Variables that do not relate are probably not relevant. Meta-analyses can be particularly helpful for identifying candidates for further consideration, showing not only mean correlations but also the degree of variability across studies and, in some cases, the effects of moderators.

The general literature on assessment can be particularly helpful in identifying potential sources of method variance (for a detailed list of potential sources, see Podsakoff et al., 2012). For example, there is an extensive literature on the potential biasing effect of social desirability on self-reports (Crowne & Marlowe, 1964), which reflects the tendency for individuals to respond to survey items in a socially desirable direction, making it a potential candidate for a control variable. It should be noted, however, that Moorman and Podsakoff (1992) found little evidence that social desirability related to most of the organizational variables they investigated. Other potential candidates for further study include mood (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003), negative affectivity (Watson et al., 1986), and neutral objects satisfaction (Weitz, 1952), the general tendency to be satisfied with everyday things in life, which some have noted might bias self-reports (Judge, 1993).

Finally, there might be theories that describe relevant mechanisms underlying relationships between target variables. Some offer alternative explanations for why target variables are related. For example, it has been observed that older workers tend to have higher job satisfaction than younger workers (Clark, Oswald, & Warr, 1996). Is this because older people are just happier in life, or is something else going on? Wright and Hamilton (1978) considered several potential mechanisms that might explain the aging effect. One, proposed by Quinn, Staines, and McCullough (1974), is that older workers are more satisfied, not because they are older, but because they have better jobs due to experience, or what Wright and Hamilton called the Job Change Hypothesis. Although experience in a particular job to some extent comes with age, it really comes from tenure in the occupation. Of course, it is difficult to have long tenure at a young age, but not all older workers have long tenure in an occupation. Tenure comes from years since entering a field, and people might begin a career at almost any age.

Theory might suggest potential control variables that are known to relate to the target Xs and Ys. On the other hand, it might suggest variables for which few data exist. In that case, it will be important to establish that these purely theoretical control variables are related to X and Y at initial stages of their investigation. If they are not related, the logical conclusion would be that the theoretical thinking is incorrect.

Once a literature review has been conducted to identify relationships of the target variables with potential control variables, as well as potential theoretical mechanisms, those findings need to be integrated and hypotheses generated. This is a brainstorming process in which feasible explanations are devised and evaluated. The best way to do this is by writing each hypothesis along with a brief rationale. Feedback from others should be sought in a variety of ways:

  • Brainstorming sessions: This would involve members of a research team who are working on the project. It could also involve colleagues and friends who would not be part of the research team but would be willing to provide feedback.

  • Crowdsourcing: It can be advantageous to seek ideas beyond the research team and immediate colleagues to expand the perspectives that are represented. One approach could be to use a crowdsourcing procedure (Boughzala, de Vreede, Nguyen, & de Vreede, 2014) to solicit input from individuals willing to volunteer their ideas. An example using a limited and targeted population was conducted by Seeber et al. (2020), who invited a sample of collaboration scientists to complete a survey where they could weigh in on using artificial intelligence (AI) as members of a team. A broader approach might sample a cross-section of working people to provide ideas about why target variables of interest might be related. This could be done through an online survey or through more labor-intensive methods such as focus groups or interviews.

  • Conference presentations: The ideas generated could be part of a presentation at a conference. Comments and feedback would be provided by reviewers of the submission, and could be sought from the audience, either at the session or afterwards.

  • Conceptual article publication: If the ideas form the basis of a conceptual or theoretical article, journal reviewers would provide feedback on the paper. Once it is published, additional feedback could be invited from colleagues.

The generation of hypothetical explanations should be considered an ongoing activity and should not end once an initial set of mechanisms is generated. That should be considered a starting point, but as empirical testing proceeds, findings should inform rejection of some proposed mechanisms and possibly modification of others. Additional mechanisms might be proposed based on findings that suggest new avenues to explore.

Step 5: Empirical Testing of Hypothetical Mechanisms

Once potential mechanisms are generated, the HIC approach to empirical testing can begin. This is where control variables and control strategies come into play, but in a comparative way. A sequence of tests proceeds in which controls are introduced hierarchically in order to rule them in or out as potential explanations for an observed X-Y relationship. These controls can be included as measured variables whose statistical control can be added or removed to determine the effects on analyses. Controls can also involve design features that allow comparisons of relationships when data are collected using different methods.

Including Controls as Measured Variables

Most discussions of control variables concern including additional measured variables in an analysis to remove the effects of extraneous factors on observed relationships among variables of interest. It is here that control variables are often used inappropriately without consideration of their specific role. The HIC approach begins by establishing relationships of interest as a baseline and then proceeds to consider the effects on analyses of one or more additional control variables, chosen after careful analysis in the prior steps. These hierarchical tests (comparing results with and without controls) can be done within a single study, or across repeated (iterative) studies. An iterative approach is recommended because the inclusion of a variety of control variables increases the possibility that some observed results are due to type 1 or type 2 errors. Being able to replicate an effect, or lack thereof, provides confidence that this was not the case.

Once the baseline X-Y relationship is established, add controls one by one to rule them in or out as potential explanations. In most cases, hypothesized explanations concern individual control variables and not combinations; that is, hypotheses suggest that various control variables will affect measures of interest, not that one control variable's effects will be conditional on other control variables.

There are times, however, when multiple control variables are entered into an analysis together, but the choice to do so should be based on the theoretical mechanisms one is proposing to test. One should avoid the temptation to try many different subsets in an attempt to find the combination of variables that yields the “best” results. Hierarchical does not mean conducting lots of analyses to find the most significant results, because doing so will capitalize on chance and increase the likelihood of type 1 errors and problems with replication. Analyses should be carefully chosen and pre-planned. In some cases, this will mean first testing each control variable separately and then combining them in a single analysis.
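A minimal sketch of this one-at-a-time sequence (Python with pandas and statsmodels; the function and all column names are hypothetical) might look like the following, reporting how the coefficient for X changes as each pre-specified control enters:

```python
import pandas as pd
import statsmodels.formula.api as smf

def hic_sequence(df, x, y, controls):
    """Regress y on x alone, then y on x plus each control, one at a time."""
    rows = []
    base = smf.ols(f"{y} ~ {x}", data=df).fit()
    rows.append(("baseline", base.params[x], base.pvalues[x]))
    for c in controls:
        m = smf.ols(f"{y} ~ {x} + {c}", data=df).fit()
        rows.append((f"+ {c}", m.params[x], m.pvalues[x]))
    return pd.DataFrame(rows, columns=["model", f"b({x})", "p"])

# Hypothetical usage with a survey data frame:
# print(hic_sequence(df, x="workload", y="satisfaction",
#                    controls=["na", "mood", "tenure"]))
```

The point of the output table is comparative: each row shows the same focal coefficient under a different, pre-specified control, which is the hierarchical comparison the HIC approach calls for.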

There are important issues with introducing too many correlated controls into an analysis, particularly when the sample size is not large: low statistical power and inability to replicate results. As more correlated predictors are added to an analysis, the standard errors of the regression parameter estimates become larger. This is true for the target X variables as well as for the parameter estimates for control variables. Entering several correlated control variables can affect the significance test for each one (reduced power) and can produce sample-specific patterns; that is, which control variables are significant versus nonsignificant can vary from sample to sample. This can be a particular concern with complex model testing, where sample sizes are often less than optimal. Sample size should therefore be a consideration when deciding how many variables to include in an analysis.
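The standard-error inflation is easy to demonstrate (Python with numpy and statsmodels; the sample size and correlations are invented). Here the coefficient for X is estimated with and without three mutually correlated controls:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150                                    # a modest sample size
shared = rng.normal(size=n)
# Three controls that correlate highly with one another and with X.
controls = np.column_stack([shared + 0.3 * rng.normal(size=n)
                            for _ in range(3)])
X = shared + 0.5 * rng.normal(size=n)
y = 0.3 * X + 0.3 * shared + rng.normal(size=n)

fit0 = sm.OLS(y, sm.add_constant(X)).fit()
fit3 = sm.OLS(y, sm.add_constant(np.column_stack([X, controls]))).fit()
print(f"SE of b(X), no controls:           {fit0.bse[1]:.3f}")
print(f"SE of b(X), 3 correlated controls: {fit3.bse[1]:.3f}")
```

With these placeholder values, the standard error for X roughly doubles once the correlated controls enter, which is exactly the loss of power and sample-to-sample instability described above.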

The simplest analytical tool for investigating control variables is multiple regression because it lends itself to an easy hierarchical approach. You can see whether control variables are significant in the analysis and what happens to the standardized regression coefficient for the target predictor variable (X) in the presence of the controls. Conclusions should not be drawn merely from whether the coefficient for X loses statistical significance. Rather, the interest is in how much of a reduction occurs. Certainly, significance tests can be useful tools to indicate whether the reduction in X's predictability is significantly greater than zero. For example, Clogg, Petkova, and Shihadeh (1992) provide a t test for comparing the regression coefficient for a predictor with and without a control variable added. It should be kept in mind, however, that statistical significance is affected by sample size, so with a large sample, a relatively small reduction might be significant, whereas with a small sample, a relatively large reduction might be nonsignificant. Interpretation of significance should therefore be combined with the effect size (the extent to which coefficients change). A significant change in a regression coefficient from .40 to .38 should have a different interpretation from one that changes from .40 to .18.
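The Clogg et al. (1992) test has a closed form; for readers who prefer a resampling route, a hedged alternative (Python with pandas and statsmodels; the function and column names are hypothetical, and this is a bootstrap substitute, not the Clogg formula itself) is to estimate the drop in the coefficient directly, along with an interval that conveys effect size:

```python
import numpy as np
import statsmodels.formula.api as smf

def boot_coef_change(df, x, y, c, n_boot=2000, seed=0):
    """Bootstrap the drop in b(x) when control c is added to the model."""
    rng = np.random.default_rng(seed)
    drops = np.empty(n_boot)
    for i in range(n_boot):
        s = df.sample(len(df), replace=True,
                      random_state=int(rng.integers(2**31 - 1)))
        b_without = smf.ols(f"{y} ~ {x}", data=s).fit().params[x]
        b_with = smf.ols(f"{y} ~ {x} + {c}", data=s).fit().params[x]
        drops[i] = b_without - b_with
    lo, hi = np.percentile(drops, [2.5, 97.5])
    return drops.mean(), (lo, hi)   # average drop and its 95% CI

# Hypothetical usage:
# drop, ci = boot_coef_change(df, x="workload", y="satisfaction", c="na")
```

Reporting the drop with its interval keeps the focus on how much the coefficient changed, consistent with the .40-to-.38 versus .40-to-.18 distinction above.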

It should be kept in mind, however, that adding control variables does not always result in decreased regression coefficients. When a variable is related only to X or only to Y, it acts as uncommon method variance that can affect the results of an analysis. Adding a control variable that relates to X but not Y (the suppressor variable case) will result in an increase in the parameter estimate for X but a decrease in estimation precision, as the coefficient's standard error increases. A control variable that relates to Y but not X will result in no change in the parameter estimate for X but an increase in precision, as the coefficient's standard error decreases. Failing to include a variable that produces uncommon method variance in X but not Y will result in an underestimate of the effect size for X. Failing to include a variable that produces uncommon method variance in Y will reduce precision but not bias the estimate of effect size.
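Both patterns can be verified in a short simulation (Python with numpy and statsmodels; the constructs, biases, and effect sizes are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 50_000
true_x = rng.normal(size=n)               # the construct behind measure x
y = 0.3 * true_x + rng.normal(size=n)

def b_and_se(yv, *preds):
    fit = sm.OLS(yv, sm.add_constant(np.column_stack(preds))).fit()
    return fit.params[1], fit.bse[1]      # estimate and SE for 1st predictor

# Case 1: a bias contaminates the x measure only (suppressor case).
bias_x = rng.normal(size=n)
x = true_x + bias_x
b0, se0 = b_and_se(y, x)
b1, se1 = b_and_se(y, x, bias_x)
print(f"bias in x: b {b0:.2f} -> {b1:.2f} (up), "
      f"SE {se0:.4f} -> {se1:.4f} (up)")

# Case 2: a bias contaminates the y measure only.
bias_y = rng.normal(size=n)
y_biased = y + bias_y
b0, se0 = b_and_se(y_biased, x)
b1, se1 = b_and_se(y_biased, x, bias_y)
print(f"bias in y: b {b0:.2f} -> {b1:.2f} (unchanged), "
      f"SE {se0:.4f} -> {se1:.4f} (down)")
```

In case 1, controlling the x-side bias raises the coefficient while widening its standard error; in case 2, controlling the y-side bias leaves the coefficient alone while tightening the standard error, matching the four claims in the paragraph above.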

To illustrate how control variable analysis might work, I chose the question of how mood and negative affectivity (NA) as a personality trait might affect relationships between job stressors and job satisfaction. I extracted the correlation matrix from O'Connell (1991), based on a sample of 108, that included measures of these variables. As noted earlier, it has been suggested that mood and NA might bias the assessment of job stressors and strains, including job satisfaction. Included were the stressors of role ambiguity, role conflict, organizational constraints, and interpersonal conflict, all of which correlated significantly and negatively with job satisfaction, consistent with what has been repeatedly found in the literature. These relationships support the idea that high levels of stressors are associated with low levels of job satisfaction. The next step was to regress job satisfaction on each stressor separately (four analyses), then on each stressor plus either mood or NA, and then on each stressor, mood, and NA. The results can be seen in Table 2. Shown first for each stressor is the standardized regression coefficient when the stressor was entered alone. Next are the analyses with each control variable entered separately. As can be seen, in all cases, the standardized coefficient for the stressor declines when either control variable is entered. For role ambiguity and role conflict, the stressor coefficient remains statistically significant; for organizational constraints and interpersonal conflict, it does not. This clearly shows that there is overlap among the stressors, mood, and NA. Particularly for constraints and interpersonal conflict, we could not rule out the possibility that the stressor-strain connection is due to mood or NA.

Table 2 Illustration of using control variables with multiple regression

Given the suggestion that NA effects might be due to mood rather than personality (Spector, Chen, & O'Connell, 2000a), this is a case in which there is good reason to enter both control variables together. As shown in Table 2, when they are entered together, the NA standardized coefficient is reduced considerably and in all cases is nonsignificant. These results are consistent with the view that NA effects are due to mood, at least when both are assessed concurrently. Of course, we cannot be certain based solely on this pattern of results that NA is a driver of stressors and strains, only that results for two of the stressors are consistent with that possibility. They are also consistent with other potential mechanisms, as discussed by Spector, Zapf, et al. (2000b).
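Analyses of this kind can be computed directly from a published correlation matrix, because the standardized betas equal the inverse of the predictor intercorrelation matrix times the predictor-criterion correlations. The sketch below (Python with numpy) shows the mechanics; the correlation values are placeholders for illustration, not the O'Connell (1991) figures:

```python
import numpy as np

# Order: stressor, mood, NA (predictors), job satisfaction (criterion).
# All values below are invented placeholders, NOT O'Connell's correlations.
R = np.array([
    [ 1.00,  0.40,  0.35, -0.30],   # stressor
    [ 0.40,  1.00,  0.60, -0.45],   # mood
    [ 0.35,  0.60,  1.00, -0.35],   # NA
    [-0.30, -0.45, -0.35,  1.00],   # job satisfaction
])

def std_betas(R, pred_idx, crit_idx):
    """Standardized betas for the criterion regressed on the predictors."""
    Rxx = R[np.ix_(pred_idx, pred_idx)]
    rxy = R[np.ix_(pred_idx, [crit_idx])]
    return np.linalg.solve(Rxx, rxy).ravel()

print("stressor alone:      ", std_betas(R, [0], 3))
print("stressor + mood:     ", std_betas(R, [0, 1], 3))
print("stressor + mood + NA:", std_betas(R, [0, 1, 2], 3))
```

Running the three models reproduces the hierarchical comparison in the text: the stressor's standardized coefficient shrinks as each correlated control enters, and the relative sizes of the control betas can be inspected directly.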

There are a variety of statistical analyses beyond multiple regression that can be used to test for the effects of control variables with measured variables. Assuming all variables are continuous, the alternatives most often used are structural equation modeling (SEM) and, when data fit a multilevel structure, multilevel modeling (MLM). Multiple regression is useful when investigating the impact of one or more control variables on a given X-Y relationship. SEM has the advantage of including both a measurement model and a structural model in the analysis. It also allows for the exploration of control variable effects on a multi-stage model at different points. With a complex model that involves several variables arrayed over three or more stages, the action of a potential control variable can be modeled at many different points. This means the background work would have to identify which of the connections between pairs of variables in a model are expected to be affected. For example, if some variables reflect fairly factual and objective features of a job, one would not expect social desirability to be an issue. It might be an issue, however, with questions concerning potentially sensitive personal information.
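For nested data, the same hierarchical comparison can be run in a multilevel model. A minimal sketch with statsmodels' MixedLM (simulated data; all column names and effect sizes are invented for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_teams, per_team = 30, 10
team = np.repeat(np.arange(n_teams), per_team)
team_effect = rng.normal(scale=0.5, size=n_teams)[team]  # shared team level
na = rng.normal(size=n_teams * per_team)
workload = 0.4 * na + rng.normal(size=n_teams * per_team)
satisfaction = (-0.3 * workload - 0.2 * na + team_effect
                + rng.normal(size=n_teams * per_team))
df = pd.DataFrame(dict(team=team, na=na, workload=workload,
                       satisfaction=satisfaction))

# Baseline model, then the same model with the pre-specified control added;
# both include a random intercept for team.
m0 = smf.mixedlm("satisfaction ~ workload", df, groups="team").fit()
m1 = smf.mixedlm("satisfaction ~ workload + na", df, groups="team").fit()
print(f"b(workload) without na: {m0.params['workload']:.2f}")
print(f"b(workload) with na:    {m1.params['workload']:.2f}")
```

The comparison of the two fixed-effect estimates plays the same role as the with-and-without-control comparison in ordinary regression, while the random intercept keeps team-level clustering from distorting the standard errors.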

Introducing Controls as Design Features

It should be kept in mind that whereas control variables are normally discussed in the context of statistical control, there are design features that can be introduced in order to control potential extraneous factors that might affect observed relationships. These should not be overlooked as they allow for the testing of relationships using alternative methods that can provide additional confidence to conclusions about the actions of extraneous variables. For example, alternative sources of data provided by coworkers, supervisors, or others are sometimes used to control for common method variance. A given X-Y correlation can be compared between a case with all self-reports versus a case where X and Y are assessed by different sources, say, X from a coworker and Y from a supervisor.

Introducing an element of time into a design can be useful in controlling for some extraneous factors. If X and Y are separated in time, the effects of occasion factors, such as mood or consistency biases, can be reduced. With relatively short time frames, those occasion factors would have to be transitory. As time frames increase, a greater number of potential occasion factors are controlled, but this is only viable if the variables of interest are stable over time. If the levels of the underlying constructs change from time 1 to time 2, the lag is likely to obscure rather than illuminate the relationship between the X and Y constructs.

A third means of controlling some extraneous variables is the use of multiple trained judges. The use of multiple judges whose assessments are combined helps reduce the impact of idiosyncratic personal biases on the overall measurement. Of course, this assumes that the judges do not share biases. Training and providing rubrics can increase the accuracy of assessments and can also help reduce individual biases.
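Why pooling judges helps can be shown in a small simulation (Python with numpy; the bias magnitudes are invented, and the sketch assumes the judges' biases are unshared, the same assumption noted above):

```python
import numpy as np

rng = np.random.default_rng(4)
n_targets, n_judges = 200, 5
truth = rng.normal(size=n_targets)            # true levels being judged
# Each judge's rating mixes truth with idiosyncratic bias and noise.
ratings = truth[:, None] + rng.normal(scale=0.8,
                                      size=(n_targets, n_judges))

single = ratings[:, 0]
pooled = ratings.mean(axis=1)                 # combine across judges
print(f"r(truth, single judge): {np.corrcoef(truth, single)[0, 1]:.2f}")
print(f"r(truth, pooled mean):  {np.corrcoef(truth, pooled)[0, 1]:.2f}")
```

Averaging shrinks the idiosyncratic component by roughly the number of judges, so the pooled ratings track the true levels more closely than any single judge does; if the judges shared a bias, averaging would not remove it.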

Experimental approaches can be helpful in isolating extraneous factors. This becomes feasible when the variables of interest can be manipulated. Experiments can be conducted to investigate some sources of method variance, such as survey design features (Spector & Nixon, 2019). They can also be used to experimentally study the connection between certain features of jobs and an outcome, such as job performance. You might begin with a self-report study and then compare those results to a study in which the job feature is experimentally manipulated. This would be comparing a correlate to a causal variable, using the Kraemer et al. (2001) classification. It would show whether the observed relationship between X and Y is likely or unlikely to be due to the action of X on Y. Inferences would be based on the assumption that the job feature in question was in fact manipulated and that it was the same feature that was assessed with the self-report. This means conducting research that provides construct validity evidence for the self-report measure and for the efficacy of the intervention used to manipulate the job feature. Quite likely, the self-report measure would be used to provide validation evidence for the intervention, relying on the principle of converging operations. If, after the intervention, employee perceptions of the job feature change as expected, one would conclude that the intervention had the desired effect. There is no guarantee that this is in fact the case, as there can be alternative reasons for convergence. One is that the item assessing the job feature reflects something positive about the job: the intervention was seen by employees as a pleasant break from work that put them in a good mood, and it was that mood and those good feelings about the job that influenced the self-reports, not the content of the job.

Step 6: Interpret Results and Reconsider Potential Mechanisms

Once planned analyses are conducted, results are interpreted, and prior theoretical thinking is re-evaluated and extended. The purpose is to inform future research, not to serve as the basis for a fishing expedition with the data at hand to see if different analyses might produce more statistically significant patterns. Even with exploratory research, analyses should be pre-planned, and new ideas that arise after data are analyzed should ideally be tested on new data. This is a stage where results might be presented at conferences or submitted to journals, and where feedback is sought for additional ideas. Brainstorming within the research team can generate new mechanisms to consider in light of results from step 5. Support for a control variable as an explanation is not conclusive based merely on a pattern of statistical results. Additional research needs to focus on why the control variable might have related to X and Y strongly enough that it can explain, at least statistically, the X-Y relationship. This can lead to new thinking about potential mechanisms. Examples of this sort of HIC thinking can be seen in Meehl (1971), who discussed several reasons that socioeconomic status would relate to social participation and schizophrenia, and in Spector, Zapf, et al. (2000b), who provided a series of mechanisms through which NA would relate to job stressors and job strains. The output of step 6 will inform the design of a subsequent study or series of studies to test this modified set of mechanisms, explaining both the original X-Y relationship and the reason that a control variable in step 5 might relate to X and Y.

Step 7: Conduct More Empirical Research

It is essential to both replicate and extend results from the original study, not only from step 5, but also from background studies, as noted earlier. We need to be sure that earlier findings can replicate, and we need to test additional mechanisms proposed in step 6. Such studies might continue to utilize low-cost designs as first steps. To thoroughly rule in or out potential mechanisms, we need to introduce additional control features into the design to go beyond statistically controlling measured variables within a study.

From this point, the research process iterates between step 6 (reconsidering old mechanisms and deriving new ones) and step 7 (further testing). For some researchers, this process might unfold over dozens of studies spanning many years. For others, the process might stop as new ideas become elusive and interests shift to other topics. In the long run, a systematic evaluation and re-evaluation of feasible alternative mechanisms will help build convincing cases that some X-Y relationships are solid, that the underlying constructs we have in mind are reflected in their operationalizations, and that X is a likely driver of Y. In other cases, we might conclude that the observed X-Y relationship is due to something other than what we initially believed. Over time, this HIC approach will help us develop a deeper understanding of the underlying phenomena of interest.

An Example of Using the HIC Approach

The following hypothetical example illustrates how the HIC approach would be used in practice. Suppose my research team and I are interested in the impact of supervisor style on employee productivity. We would apply the 7-step procedure as follows:

  1. Research question. We begin with a general question about the impact of leadership style. This needs refining into something more specific and manageable, so we sharpen the focus to the impact of transformational leadership on sales performance.

  2. Literature review. We would want to know the literature linking supervisor style and employee productivity, particularly the connection between transformational leadership and sales performance. We would also want to know if there is literature on factors that might affect the assessment of these variables or might serve as common causes of both. To accomplish this, we would conduct a thorough literature review, using keywords transformational leadership and sales performance.

  3. Establish baseline. Once it is decided to study transformational leadership and sales performance, we must establish that they are related. We might find correlations in the literature review from step 2, or we might collect data in order to determine the baseline relationship. This could be a small pilot study, or we might piggy-back the assessment of our two variables of style and performance onto a larger study of something else.

  4. Potential controls. Here we would spend time considering what is already known about transformational leadership and sales performance, with a particular eye toward things that might serve as extraneous factors to control. We would do this within the research team in one or more brainstorming sessions, and we might ask colleagues who study leadership and those who study sales for input. What comes immediately to mind, if we are using employee reports, is that ratings of leadership are likely influenced by how much the subordinate likes the supervisor. If self-reports of performance are used, they might be influenced by self-enhancement motives and social desirability. If supervisor reports are used, self-ratings of leadership might also be affected by self-enhancement motives and social desirability. Ratings of employee performance might be affected by liking for the employee, though it should be kept in mind that liking of the employee can be influenced by performance as well (Lefkowitz, 2000).

  5. Empirical testing. We would design an investigation that would include measures of transformational leadership, sales performance, and the controls identified in the prior step. Quite likely, we would begin with a cross-sectional single-source study with all variables assessed via self-reports. We also might design a series of studies that use a variety of methods. Ultimately, we want to assess style and performance, and perhaps control variables, in multiple ways. Transformational leadership could be assessed via self-reports of the supervisor and reports from subordinates, peers, and supervisors. Observers could also be used who might spend time with the supervisors and report on how they interact with subordinates. Sales performance data can be collected from employees, supervisors, customers, and records.

  6. Interpret. Once the studies of step 5 are completed, the results would be interpreted. Likely, some X-Y relationships will differ in the presence of some, but perhaps not all, control variables. The patterns would need to be considered and new studies planned to test ideas generated at this stage. We would want to present the results from step 5 at a conference and submit them to a journal, paying particular attention to reviewer feedback.

  7. Conduct more research. Efforts to this point will certainly raise more questions than they answer, so new studies would be planned to help explain the results. Suppose we find that when we control for supervisor liking of subordinates, there is little relationship between supervisor style and performance, and this occurs for both subordinate and supervisor ratings of supervisor style. This suggests that there is likely a connection between style and liking that goes beyond employee idiosyncratic perceptions. Follow-up research could be conducted to understand why this complex pattern of results was found and to determine temporal precedence. This might be done with newly hired employees to trace their experience as relationships develop with supervisors and whether initial interaction patterns and liking lead to subsequent performance, or the reverse.

The HIC Approach in Practice

The HIC approach requires a programmatic mindset that approaches research questions from a systematic point of view. The main goal is to identify interesting and potentially useful relationships among observed variables and then, through a series of empirical tests, collect evidence that leads to an explanation of why those variables are related. This can be part of a unified inductive-abductive-deductive approach to research (Spector, 2017) that explores new phenomena (induction), comes up with theoretical explanations for why observed variables are related (abduction), and tests those theoretical explanations (deduction). Such endeavors cannot be adequately accomplished in a single study, or even by a single research team. Furthermore, results are rarely entirely clean, and conflicting findings will inevitably arise. The availability of results from multiple studies using multiple approaches can help in resolving conflicts that might be due to power issues that affect significance across studies, boundary conditions, or differences in samples and methods across studies. Often discrepancies become the impetus for studies designed to resolve them, which in turn deepen understanding of the underlying phenomena.

The HIC Approach at the Field Level

The HIC approach is not just for individual researchers and research teams. It is a mindset for the organizational sciences and other fields to adopt. It means considering the big picture and always asking why particular relationships were found and what our results really tell us. This leads to a HIC effort, spanning different researchers and research teams, of conducting a series of studies to rule in or out the effects of extraneous factors. There are two main advantages of this field-level approach.

Reliability and Replicability

Issues concerning replication are certainly not new, but in recent years, there has been considerable focus on the importance of making sure that published results are not just flukes. It is essential that researchers not consider their job done once a particular finding is published. As much as possible, researchers should repeat studies to be sure that observed relationships can be replicated. It should not be assumed that the same result will be found in a new sample. Replication can also be done with large samples through a cross-validation strategy. If samples are sufficiently large, a k-fold approach can be used to break samples randomly into multiple parts to be sure that findings replicate. Additional samples drawn from different populations of workers (e.g., different industries or occupations) are needed to test the generalizability of findings. The need for replication applies not only to the baseline X-Y relationships but also to the analyses that test the effects of control variables. A final conclusion should not be reached until all findings can be shown to replicate.

Even if a researcher or research team can replicate a finding, it is still important that the finding is independently replicated by a different researcher/research team. This is necessary to demonstrate that the effect is not due to some bias or errors inadvertently introduced by the original team. Of course, from a research integrity standpoint, independent replication adds confidence that a given finding is not the result of research misconduct.

Fresh Eyes

It is easy for a researcher and even a research team to get locked into an approach and mode of thinking about a problem. People can become invested in an idea, especially when that idea pays off in terms of tangible and intangible rewards. A drawback of too much emphasis on theory is that it can limit our objectivity as we seek confirmation of the theory and overlook disconfirming evidence. Having the same problem attacked by different individuals working independently can help overcome these problems as individuals with different points of view and preconceived ideas might look at a problem in different ways. Younger researchers and those new to a particular problem can bring fresh eyes because they have not yet developed a vested interest in whether a particular approach, idea, or theory is correct.

A New Paradigm for Considering What Is Important

The organizational sciences, like many scientific fields, value novelty over rigor. Detecting a novel phenomenon by showing an interesting pattern of results is likely to be viewed by editors and reviewers as a contribution to knowledge. Merely ruling out that a feasible mechanism was responsible for those results can be met with disdain because it does not address a novel phenomenon. A study that adds confidence to an original, tentative conclusion that might otherwise be due to a variety of extraneous factors can be seen as unimportant. From a HIC perspective, the ruling in or ruling out of feasible alternative explanations can be just as important as the original novel finding.

Many scientific fields are in the midst of a replication crisis in part because the top journals define significant contribution in a very narrow way, in terms of novelty. This discourages researchers from investing time in replications, and even extensions, that are unlikely to be good candidates for top journal publication. A HIC mindset would focus not just on novelty but on increasing certainty about findings through replication and the repeated testing of alternative explanations. A systematic approach would ensure that findings are replicable and potentially provide an expanding explanation of the mechanisms that produced observed results.

In a real sense, what is in the top journals is tentative because science is tentative. Findings might not be replicable, and even if they are, there is a great deal of uncertainty about why those findings occurred. It is difficult to argue that papers that include results with higher novelty and lower certainty are important contributions, whereas papers with lower novelty and higher certainty are not. The HIC approach might begin with a novel finding, but it does not end there. It means following up to both replicate the original findings and then iteratively consider the possibility that extraneous factors can explain those findings. This is done through the systematic use of control variables.

Making the Most of Control Variables

Controls are used in science to isolate effects concerning variables of interest from the influence of extraneous factors. A control variable is one that we believe might be influencing either the assessment of a theoretical construct or the construct itself. If ignored, such a variable might produce a pattern of relationships that could be misinterpreted. But including control variables haphazardly, just because we have them on hand or because it is common practice in a given domain, is not likely to be helpful. In fact, such blind controls have the potential to obscure results and lead to faulty conclusions. Such use might be considered little more than scientific rigor theater that provides a false sense of security about conclusions.

A better way to utilize control variables is with the HIC approach that involves identifying potential control variables based on empirical findings, theory, and even intuition and then testing the impact of those variables through a series of comparative tests. This process is hierarchical (as in the multiple regression sense) because it involves comparison of results from analyses that are conducted sequentially without and then with controls. It is iterative because it involves a series of repeated steps of generating potential mechanisms and empirically testing them. Results from each empirical step inform reconsideration of potential mechanisms that serve as the basis for further empirical work.

The HIC approach makes the most of control variables by treating them, not as a nuisance to be cast aside by including them in analyses, but as stars of investigations in their own right. Investigating control variables systematically not only provides a deeper understanding of the target variables and why they are related, but it also helps us understand underlying mechanisms that can affect measures and constructs. Keep in mind that one person’s control variable is another person’s target variable, and the distinction can be quite arbitrary. Making the most of our control variables is making the most of our research. HIC means taking a more systematic approach to our research, not only in how we use control variables, but also in how we design, carry out, and interpret our studies.