Introduction

Meta-analysis is a statistical procedure widely used by researchers to cumulate results from primary studies with the overall goal of drawing accurate inferences about relationships between constructs. Although meta-analysis is frequently conducted with data gathered at the individual level of analysis, relatively few studies have employed meta-analytic procedures with aggregated data to estimate relationships at the group, or business unit, level of analysis. Notable examples of group-level meta-analyses include Harter et al. (2002, 2010), Wallace et al. (2016), and Whitman et al. (2010, 2012). In most cases, group-level meta-analytic studies either have not corrected for measurement error in the predictor and criterion measures or have relied on assumed population predictor and criterion reliability information. As discussed by Raju et al. (1991), these forms of meta-analysis are not optimal when predictor and criterion reliability information is available from primary studies.

The use of suboptimal meta-analytic procedures is clearly understandable when reliability information is unavailable. We contend, however, that sample-based reliability in studies involving aggregated data may be available in the form of reported intraclass correlation coefficients (ICCs). At present, primary studies and published meta-analyses have not used this sample-based ICC information to correct group-level correlations for measurement error in the predictor or criterion measure and to account for sampling error in such reliability estimates. The current paper illustrates the logic and method by which available ICCs can be employed as reliability estimates within primary and meta-analytic studies when group-level phenomena are of interest. In addition, this paper provides illustrative comparisons of meta-analytic scenarios in which different types of group-level reliabilities (e.g., sample-based ICCs and literature-based, assumed population reliability estimates) are incorporated into tests of three research questions. We offer guidance on how to handle missing group-level reliabilities and, further, clarify how primary and meta-analytic study conclusions are affected by the type of reliability value used (sample-based versus assumed population) when such information is missing.

The structure of the current paper is as follows. First, we identify and discuss examples of prior meta-analytic approaches using aggregated data and explain how reliability information was handled in these cases. Next, a case is made for using available ICC information from primary studies as estimates of group-level predictor and criterion reliability. We then illustrate how ICC information can be used to correct correlations for unreliability in primary studies and to estimate the sampling variances (and standard errors) of individually corrected correlations. Sampling variance equations for individually corrected correlations, covering all possible situations related to the availability of reliability information, are presented in conjunction with the assumptions associated with each situation. Finally, we demonstrate how sample-based meta-analyses can be conducted when predictor and criterion reliability information is only sporadically available. Here, we contrast results from sample-based meta-analyses with findings from meta-analyses based on assumed population reliabilities obtained from the literature. In addition, we present supplemental meta-analyses that artificially restrict the availability of reliability information to further demonstrate how reductions in sample-based reliabilities affect meta-analytic parameter estimates.

Previous Meta-Analytic Approaches Using Aggregated Data

The organizational research literature is replete with meta-analyses conducted at the individual level of analysis. Markedly fewer studies, however, have applied meta-analytic procedures to data gathered at the group or organizational level of analysis. This is unfortunate given that conducting meta-analysis with aggregated data is important for understanding and making generalizations about group- and organizational-level phenomena. Of the studies that have employed meta-analytic procedures with group-level data, most have treated reliability estimates as assumed, population-based reliabilities (as opposed to sample-based reliability values). For instance, Harter et al. (2010) examined the causal impact of employee work perceptions on an organization’s bottom line and corrected correlations for measurement error based on aggregated test–retest reliability estimates. In another study by Harter et al. (2002), similar correction procedures using test–retest reliabilities were employed in examining relationships between employee satisfaction, employee engagement, and business outcomes. In each of the preceding meta-analyses, sampling error in the reliability estimates was not taken into account.

Researchers have also included ICC(2) values when estimating the reliability of group-level variables in meta-analyses. For example, in a study examining relationships between satisfaction, citizenship behaviors, and performance in work units, Whitman et al. (2010) corrected correlations for unreliability using ICC(2) values. Although these values were sample-based, Whitman et al. employed meta-analytic procedures that treated the sample-based ICC(2) values as assumed, population values. Whitman et al. (2012) applied the same meta-analytic procedures. Interestingly, Hong et al. (2013) corrected observed correlations for different types of sample-based reliability estimates including ICC(2)s, but they did not employ meta-analytic procedures that account for sampling error in the sample-based reliabilities.

While the above studies certainly advance our understanding of relationships between group-level variables, the specific treatment and assumptions concerning reliability estimates that were employed in most of these studies are not optimal. Relying on assumed reliability values can produce inaccurate results because the degree to which assumed values mirror those of the population is often unknown (Raju et al. 1989). Relatedly, when using sample-based reliability values, failing to account for sampling error in reliability estimates can lead to imprecision in the estimates of the mean and variance of corrected correlations (see Raju et al. 1991).

In sum, meta-analysts have corrected group-level correlations for unreliability using ICC(2) information. None of these studies, however, has accounted for sampling error in the sample-based reliability estimates. The current study departs from those described above by demonstrating how group-level meta-analyses can be improved by using all available reliability information from the primary group-level studies while simultaneously taking into account sampling error in the reliabilities reported within those studies.

ICCs as a Form of Group-Level Reliability

Two variations of ICCs, ICC(1) and ICC(2), are frequently used in organizational research. These coefficients are more generally interpreted as measures of the proportion of variance attributable to objects of measurement (McGraw and Wong 1996). The ICC(2) provides an estimate of the reliability of group means (Bartko 1976; Bliese 2000; Shrout and Fleiss 1979) and is calculated using the following formula (Footnote 1):

$${\text{ICC}}(2) = \frac{{{\text{MS}}_{\text{B}} - {\text{MS}}_{\text{W}} }}{{{\text{MS}}_{\text{B}} }}$$
(1)

where MSB is the mean square between groups and MSW is the mean square within groups.
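As a concrete illustration of Eq. 1, the following minimal Python sketch computes ICC(2) from one-way ANOVA mean squares (the function and variable names are ours, purely illustrative, not from the paper):

```python
def icc2(ms_between, ms_within):
    """Eq. 1: ICC(2), the reliability of group means, computed from
    the between-groups (MSB) and within-groups (MSW) mean squares."""
    return (ms_between - ms_within) / ms_between

# With MSB = 2.50 and MSW = 0.50: (2.50 - 0.50) / 2.50 = 0.80
```

Note that when within-group variance is small relative to between-group variance, ICC(2) approaches 1, mirroring the behavior of traditional reliability coefficients.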

ICC(2) values share several important features with traditional reliability indices. Reliability is an index of the variance of interest over the sum of the variance of interest plus error (Shrout and Fleiss 1979). Consistent with the preceding definition, ICCs have long been recognized as reliability estimates (Cronbach et al. 1972; Ebel 1951; Lahey et al. 1983). For ICCs, when within-subject variance is small relative to total variance, the coefficient is larger than when within-subject variance is relatively large. Importantly, this is also true of reliability coefficients (e.g., coefficient alpha) commonly used in meta-analysis conducted on individual-level phenomena.

From the preceding points, ICC(2) estimates are frequently considered indices of reliability for group-level data (Cronbach et al. 1972; Ebel 1951; Lahey et al. 1983; Stanley 1971) and are expressed similarly to traditional reliability indices (i.e., correlation coefficients). Bartko (1976) originally termed ICCs intraclass correlation reliability coefficients, and “use of the intraclass correlation (ICC) as an index of the reliability of ratings has been well documented and accepted in psychological research” (Lahey et al. 1983, p. 586). Most importantly, as discussed above, ICC(2)s are interpreted as reliability values in the organizational psychology literature (Hong et al. 2013; Whitman et al. 2010, 2012). Therefore, ICC(2) values may logically be used to correct observed, group-level correlations for measurement error.

Given ICC(2)s as estimates of group-level predictor and criterion reliability, the general equation for estimating the sampling variance of a corrected correlation (see Raju and Brand 2003; Raju et al. 1991) can be adjusted for use with ICC(2) values and other types of group-level reliability estimates (e.g., stability coefficients) whenever they are available for the predictor or criterion. We next discuss these equations and how they might be employed whenever group-level reliability information is available in primary studies.

Correcting Correlations for Sample-Based Unreliability

Despite the long-standing availability of procedures to correct correlations within primary studies for unreliability with appropriately defined standard errors (see Raju et al. 1991), there has not been a systematic application of such procedures using available ICC values. This section presents the relevant formulas for correcting a correlation for predictor and criterion unreliability, as well as the general formula and special cases for estimating the sampling variances of individually corrected correlations. We re-present equations from Raju et al. (1991, 2004) to illustrate their relevance to group-level studies and for corrections to correlations with ICC values and other group-level reliability estimates.

We begin by letting $r_{xy}$ represent the restricted and attenuated effect between the predictor (x) and criterion (y) in a sample, where $r_{xx}$ and $r_{yy}$ represent the sample-based predictor and criterion reliability values, respectively. In addition, the range restriction factor k is defined as 1/u, where u is the ratio of the unattenuated, restricted standard deviation on x to the unattenuated, unrestricted standard deviation on x (Raju and Brand 2003). With the application of classical test theory (Lord and Novick 1968), an estimate of the unrestricted and unattenuated population correlation ($\rho_{xy}$) can be written as:

$$\hat{\rho }_{xy} = \frac{{kr_{xy} }}{{\sqrt {r_{xx} r_{yy} - r_{xy}^{2} + k^{2} r_{xy}^{2} } }}$$
(2)

When all sample-based artifact information is available, the general sampling variance formula associated with the corrected correlation, as presented in Raju and Brand (2003), is:

$$\hat{V}\left( {\hat{\rho }_{xy} } \right) = \frac{{k^{2} r_{xx} r_{yy} \left( {r_{xx} - r_{xy}^{2} } \right)\left( {r_{yy} - r_{xy}^{2} } \right)}}{{\left( {n - 1} \right)\hat{W}^{3} }}$$
(3)

where

$$\hat{W} = r_{xx} r_{yy} - r_{xy}^{2} + k^{2} r_{xy}^{2}$$
(4)

As discussed below, Eq. 3 can be adjusted depending on the availability of group-level reliability information.
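To make Eqs. 2–4 concrete, the following Python sketch computes the corrected correlation and its sampling variance when sample-based reliabilities are available for both variables (function names are ours, purely illustrative; with no range restriction, k = 1):

```python
from math import sqrt

def w_hat(r_xy, r_xx, r_yy, k=1.0):
    """Eq. 4: W = r_xx * r_yy - r_xy^2 + k^2 * r_xy^2."""
    return r_xx * r_yy - r_xy**2 + (k**2) * r_xy**2

def corrected_correlation(r_xy, r_xx, r_yy, k=1.0):
    """Eq. 2: the disattenuated (and, if k != 1, range-corrected)
    correlation. With k = 1 this reduces to the familiar
    r_xy / sqrt(r_xx * r_yy) correction."""
    return (k * r_xy) / sqrt(w_hat(r_xy, r_xx, r_yy, k))

def sampling_variance_full_info(r_xy, r_xx, r_yy, n, k=1.0):
    """Eq. 3: sampling variance of the corrected correlation when
    both reliabilities are sample-based."""
    w = w_hat(r_xy, r_xx, r_yy, k)
    return (k**2 * r_xx * r_yy
            * (r_xx - r_xy**2) * (r_yy - r_xy**2)) / ((n - 1) * w**3)
```

For example, an observed correlation of .30 with ICC(2)s of .80 and .70 yields a corrected correlation of .30 / sqrt(.56), roughly .40.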

Availability of Group-Level Reliability Information

Individual studies vary in whether reliability information is provided for group-level variables. For instance, some studies report reliability (e.g., ICCs) information for the predictor (Chen et al. 2007; Dawson et al. 2008; Schmit and Allscheid 1995), criterion (Baer and Frese 2003), or both (Dietz et al. 2004; Salanova et al. 2005; Simons and Roberson 2003). In this section, we present sampling variance formulas, originally derived by Raju and Brand (2003), which apply to all possible situations concerning the availability of predictor and criterion reliabilities. In doing so, the assumptions and equations that allow for disattenuating group-level correlations and for conducting meta-analyses with sample-based reliabilities at the group-level of analysis are discussed.

The variance formulas presented below take into account sampling variance associated with $r_{xx}$, $r_{yy}$, $r_{xy}$, and their intercorrelations. It is important to note that for each formula, k will be fixed at one because we are not concerned with corrections for range restriction in the present discussion. As a result, $\hat{W}$ (i.e., Eq. 4) reduces to the product of $r_{xx}$ and $r_{yy}$.

Clarification of several conceptual terms is important before presenting the sampling variance formulas for special cases of Eq. 3. The term “assumption” relates to the treatment of reliability information based on its availability within a primary study. That is, when predictor or criterion reliability information is missing, the reliability value that is employed is treated as “assumed fixed,” meaning the reliability coefficient is treated as a parameter not having sampling error. In this case, the average reliability from the set of primary studies or a reliability value from other investigations can be used as the assumed reliability. In most cases, meta-analysts rely on an assumed population reliability obtained from the literature, such as the frequently employed criterion reliability value of .52 (see LeBreton et al. 2014). In contrast, the term “sample-based” means the reliability estimate is reported in the primary study and is, therefore, treated as an estimate subject to sampling error.

When the criterion reliability is missing, it is assumed fixed, and the sampling variance formula can be written as:

$$\hat{V}\left( {\hat{\rho }_{xy} } \right) = \frac{{r_{xx} r_{yy}^{2} \left( {r_{xx} - r_{xy}^{2} } \right)\left( {1 - r_{xy}^{2} } \right)}}{{\left( {n - 1} \right)\hat{W}^{3} }}$$
(5)

An assumed reliability value would need to be used for $r_{yy}$ in Eq. 5. In this case, a criterion reliability coefficient from prior investigations may be used as an estimate (Raju and Brand 2003), such as the .52 value noted above.

When the predictor reliability (e.g., ICC) is missing, it is assumed fixed. The sampling variance formula can be written as:

$$\hat{V}\left( {\hat{\rho }_{xy} } \right) = \frac{{r_{xx}^{2} r_{yy} \left( {1 - r_{xy}^{2} } \right)\left( {r_{yy} - r_{xy}^{2} } \right)}}{{\left( {n - 1} \right)\hat{W}^{3} }}$$
(6)

An assumed reliability value would need to be used for $r_{xx}$ in Eq. 6. In this case, a predictor reliability coefficient from prior investigations (e.g., .80) may be used as an estimate.

When both criterion and predictor reliabilities are missing from a primary study, they are assumed fixed, and the sampling variance formula for the corrected correlation can be written as:

$$\hat{V}\left( {\hat{\rho }_{xy} } \right) = \frac{{r_{xx}^{2} r_{yy}^{2} \left( {1 - r_{xy}^{2} } \right)^{2} }}{{\left( {n - 1} \right)\hat{W}^{3} }}$$
(7)

Assumed reliability values are required for $r_{xx}$ and $r_{yy}$ in Eq. 7. In this case, predictor and criterion reliability coefficients from prior investigations may be used as estimates (e.g., .80 and .52, respectively). The treatment of reliability information and sampling error in this situation is somewhat similar to how reliability and sampling error are handled within current group-level meta-analytic studies. That is, sampling error in the sample-based reliabilities is not taken into account.

The above sampling variance formulas cover all situations related to the availability of reliability information reported in primary studies. As noted in a previous section, ICC(2) values can logically be inserted for $r_{xx}$ and $r_{yy}$ whenever this reliability information is presented for the group-level predictor, criterion, or both. When ICC(2) values are inserted into the equations, based on assumptions associated with the availability of sample-based reliability information, the sampling variance formulas account for sampling error in the group-level reliability estimates.
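Taken together, Eqs. 3 and 5–7 amount to a single dispatch on which reliabilities are sample-based. A Python sketch of that dispatch follows (names are ours; k is fixed at one, so W reduces to the product of the two reliabilities, as noted earlier):

```python
def sampling_variance(r_xy, r_xx, r_yy, n,
                      xx_sample_based, yy_sample_based):
    """Select among Eqs. 3, 5, 6, and 7 according to whether the
    predictor (r_xx) and criterion (r_yy) reliabilities are
    sample-based or assumed fixed. With k = 1, W = r_xx * r_yy."""
    w = r_xx * r_yy
    if xx_sample_based and yy_sample_based:    # Eq. 3: both sample-based
        num = r_xx * r_yy * (r_xx - r_xy**2) * (r_yy - r_xy**2)
    elif xx_sample_based:                      # Eq. 5: r_yy assumed fixed
        num = r_xx * r_yy**2 * (r_xx - r_xy**2) * (1 - r_xy**2)
    elif yy_sample_based:                      # Eq. 6: r_xx assumed fixed
        num = r_xx**2 * r_yy * (1 - r_xy**2) * (r_yy - r_xy**2)
    else:                                      # Eq. 7: both assumed fixed
        num = r_xx**2 * r_yy**2 * (1 - r_xy**2)**2
    return num / ((n - 1) * w**3)
```

Only the terms involving a sample-based reliability contribute its sampling error; an assumed-fixed reliability enters the formula as a constant.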

We next illustrate how the preceding equations can be employed, with sporadically available reliability information, as part of sample-based meta-analyses to estimate the mean and variance of corrected correlations at the group level. As referenced within the literature, these procedures will be collectively referred to as the Raju–Burke–Normand–Langlois (RBNL) meta-analytic procedures (Footnote 2). When applying these procedures, we also illustrate how individually corrected correlations and their sampling variances, as well as meta-analytic findings, are affected by the use of available ICCs and other group-level reliability values versus the use of assumed population reliability values.

Illustrative Examples

In an effort to provide realistic, illustrative examples of scenarios concerning both the availability of reliability information and the use of literature-based (assumed population) reliabilities for missing reliabilities, we focus on the organizational climate–work group performance literature. We do so for two reasons. First, the organizational climate domain has the largest number of published studies with aggregated group-level data within the fields of organizational psychology and management (Wallace et al. 2016). Second, performance is one of the more important criterion variables of interest in organizational psychology (Austin and Villanova 1992). For purposes of illustrating the preceding procedures with varying amounts of reliability information reported in primary studies, we focus on three distinct components of the criterion domain: behavioral-oriented, productivity-oriented, and health/safety outcomes. Based on past research (see Christian et al. 2009), we expect a climate focused on concern for employees to relate positively with behavioral-oriented performance measures (RQ1) and productivity-oriented outcomes (RQ2), and we expect concern for employees to relate negatively with health/safety outcomes (RQ3).

Meta-Analytic Procedures

A recent meta-analysis on organizational climate–work group performance relationships (Wallace et al. 2016) provided the foundation for demonstrating the proposed procedures. The criteria for inclusion of primary studies were consistent with the approach of Wallace et al. such that to be included in the current meta-analyses, “we retained studies that (a) reported an effect size between one or more aggregated work climate variables and one or more performance indicators; (b) presented relationships for climate and criterion variables at the group, team, or unit levels; and (c) provided appropriate justification for aggregation of variables (or enough data to ascertain aggregation suitability)” (Wallace et al. 2016, pp. 847–848). With that said, not all studies in the Wallace et al. (2016) meta-analysis were included herein. For example, Wallace et al. (2016) examined relationships between multiple organizational climate variables and multiple criteria (i.e., worker attitudes, customer satisfaction). Although the Wallace et al. (2016) framework consisted of sorting climate into the higher-order factors of concern for customers and concern for employees, only the higher-order factor of concern for employees was considered for the purposes of the illustrative examples in this study. Also, because the purpose of the current study was to demonstrate a statistical procedure and for reasons of parsimony, only studies that reported relationships between concern for employees and performance were used.

Coding of Studies

Two of the authors independently coded all studies. Of the 62 samples, 16 did not fit the inclusion criteria. The reasons for excluding studies were that data were only reported at the individual level or the criterion was something other than performance (e.g., attitudes, customer satisfaction). In cases of disagreement, the coders arrived at consensus through an in-person discussion.

Group-level reliabilities (i.e., ICC(2)s and stability coefficients) were recorded when reported in primary studies. If only an ICC(1) was reported for the predictor, criterion, or both, it was converted into an ICC(2) using the Spearman–Brown formula to estimate the reliability of the group means (Bliese 2000). In the few instances where primary studies reported multiple climate or performance measures, a composite correlation was computed using formulas presented in Hunter and Schmidt (2004). Arguably, correlations based on composites are more construct valid than correlations based on a single measure (Thoreson et al. 2003).
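The ICC(1)-to-ICC(2) conversion noted above follows the Spearman–Brown step-up formula, where the group size plays the role of the number of "items" being averaged. A minimal Python sketch (names ours, illustrative only):

```python
def icc1_to_icc2(icc1, group_size):
    """Spearman-Brown step-up: the reliability of the mean of
    `group_size` ratings, given the single-rating reliability ICC(1)."""
    return (group_size * icc1) / (1 + (group_size - 1) * icc1)

# An ICC(1) of .20 with an average group size of 10 steps up to
# ICC(2) = (10 * .20) / (1 + 9 * .20) = 2.0 / 2.8, roughly .71
```

Note that ICC(2) grows with group size, so groups of different sizes yield different group-mean reliabilities even at a fixed ICC(1).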

Organizational climate was coded such that the lower-level factors of concern for employees were coded when available. In addition, we coded for other climate variables that constituted more specific concerns for employees such as bullying, justice, safety, and support, to name a few.

As previously discussed, the criterion domain was separated into three dimensions: behavioral-oriented performance, productivity-oriented outcomes, and health/safety-oriented outcomes. The behavioral performance dimension consisted of any ratings of overall performance, contextual performance, counterproductive performance/deviance, as well as service and task performance. The productivity performance dimension consisted of studies that reported financial (e.g., financial ratios, sales) or productivity outcomes (e.g., piece-rate, counts). Finally, the health/safety dimension of performance included measures of accidents and injuries.

Meta-Analytic Calculations

Following the RBNL procedures for tests of each research question, the mean and variance of corrected correlations were estimated as well as the random effects standard error for the mean corrected correlation (see Burke and Landis 2003). Given the focus in the current study on correcting individual correlations with available sample-based reliabilities, we also computed the individually corrected correlations and their sampling variances. The RBNL meta-analytic procedures were optimal for this study, as reliability information was occasionally missing for the predictor, criterion, or both, in studies examining group-level phenomena. Again, corrections for range restriction were not considered in light of the overall goals of the current investigation (Footnote 3).
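For orientation, the core bookkeeping of such a random-effects summary can be sketched as follows. This is a simplified illustration under the assumption of sample-size weighting, with names of our own choosing; the exact RBNL estimators (Raju et al. 1991) and the random-effects standard error of Burke and Landis (2003) differ in their details:

```python
def meta_summary(rhos, variances, ns):
    """Simplified random-effects summary of individually corrected
    correlations: the sample-size-weighted mean, the residual ("true")
    variance after subtracting the average sampling error variance,
    and one common random-effects standard error for the mean."""
    total_n = sum(ns)
    mean_rho = sum(n * r for n, r in zip(ns, rhos)) / total_n
    obs_var = sum(n * (r - mean_rho)**2 for n, r in zip(ns, rhos)) / total_n
    avg_sampling_var = sum(n * v for n, v in zip(ns, variances)) / total_n
    true_var = max(obs_var - avg_sampling_var, 0.0)   # floor at zero
    se_mean = (obs_var / len(rhos))**0.5              # SE over k studies
    return mean_rho, true_var, se_mean
```

The `variances` inputs are the per-study sampling variances from Eqs. 3 and 5–7, so studies with sample-based reliabilities carry the extra sampling error of those reliabilities into the summary.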

For analyses with sample-based reliabilities, meta-analyses were conducted with alternative means for estimating missing predictor and criterion reliabilities. In Scenario 1, the average of the available ICC values was used for missing predictor reliabilities and the average of the available criterion reliabilities was used for missing criterion reliability values. These values were .79 for organizational climate, .76 for behavioral-oriented performance, and .84 for both health/safety and productivity outcomes. In sum, Scenario 1 is a situation where averages are used in place of missing predictor and criterion reliabilities and basic sampling error due to sample size (N) along with sampling error in the available sample-based predictor and criterion reliabilities is taken into account.

In Scenario 2, we relied on assumed population reliability values for missing predictor and criterion reliabilities. The predictor reliability value was .80 for organizational climate. This value was the average group-level (test–retest) reliability for work perception measures reported in the large-scale studies by Harter et al. (2002, 2010). An assumed population reliability value of .52 was used for behavioral-oriented performance. This value was based, in part, on the average business-unit-level reliability of .528 for customer perceptions (which include perceptions of sales personnel performance) reported in Harter et al. (2002), and the common assumed population reliability value of .52 for job performance measures (see LeBreton et al. 2014). The assumed population reliability of .85 for both health/safety and productivity outcomes was based on the average reliabilities for business-unit outcomes in the Harter et al. studies, which ranged from .78 to .93. That is, .85 is approximately at the mid-point of this range in average reliabilities for business-unit outcomes (i.e., productivity outcomes, financial performance measures, etc.). In sum, Scenario 2 is a situation where assumed population reliabilities are used in place of missing predictor and criterion reliabilities and basic sampling error due to sample size (N) along with sampling error in the available sample-based predictor and criterion reliabilities is taken into account.

Finally, analyses in Scenario 3 were conducted with only assumed (literature-based) predictor and criterion reliability values for all primary studies. That is, all reliabilities were assumed, fixed population values and set to .80, .52, and .85 for organizational climate, behavioral-oriented performance, and health/safety/productivity outcomes, respectively. Scenario 3 is a situation where only sampling error due to sample size (N) is taken into account.

These three scenarios were created to cover typical situations for how reliability estimates are currently handled in meta-analyses. Together, the alternative analyses allow for comparisons of how individually corrected correlations and their sampling variances are affected by the use of available ICCs and other types of reliabilities versus assumed population reliabilities as well as how meta-analytic results are affected by the use of alternative means for estimating reliabilities.

Supplemental Meta-Analyses

In addition to presenting meta-analytic results from scenarios reflecting natural nuances of group-level data, supplemental meta-analyses were conducted that restricted the reliability data that entered the meta-analysis. These supplemental analyses were intended to further demonstrate how the degree of available reliability information affects parameter estimates. While many possible restrictions could be imposed on the sample reliability data, we chose a 50 % reduction in available reliability information from what was originally reported. Given that supplemental analyses were employed to illustrate how findings can change based on a restricted condition, the retained reliabilities were from the lower portion of the reliability distribution. This restriction was only imposed on the predictor and criterion reliability distributions for studies that provided data for Research Question 1, where there was sufficient reliability data within the primary studies to make a comparison between findings with all sample-based reliability data and a condition with restricted reliability data. This reduction in reliabilities for RQ1 resulted in a condition with roughly 40 and 15 % available predictor and criterion reliability information across all primary studies, respectively. These restricted reliability percentages are somewhat reflective of actual percentages of available reliability information for the relationships examined in RQ2 and RQ3.

The reductions in available reliabilities for the supplemental analyses resulted in adjustments to the sample-weighted reliability averages used when a study’s predictor and/or criterion reliability value was missing. For Scenario 1, where the average sample-based reliability was used for a missing reliability value, the new values were .58 and .55 for the organizational climate and behavioral performance, respectively. For Scenario 2 where assumed population values were used for missing reliabilities, the new assumed reliability value for organizational climate was .60. This value was chosen as it closely approximates the sample-weighted reliability from Scenario 1 (for restricted reliabilities) and it is reflective of group-level test–retest reliabilities for attitudinal measures reported in the literature (e.g., see Harter et al. 2002). Consistent with arguments made above, the criterion reliability value remained at .52 for behavioral performance, as it is both a literature-based value and one that is consistent with the sample-weighted reliability from Scenario 1 (for restricted reliabilities). For Scenario 3 where all reliabilities are assumed population values, .60 and .52 were employed for all predictor and criterion reliability values, respectively.

Primary and Meta-Analytic Results

Because the focal climate–performance research questions reflect situations wherein varying proportions of reliability information are reported, results are presented separately for each question. More specifically, a description of the treatment of reliability estimates (assumed fixed vs. sample-based) across the three scenarios and their resulting primary and meta-analytic outcomes will be presented for the test of each relationship. Organizing the results in this fashion is intended to clarify the type of reliability information reported in the primary studies for each meta-analysis and to illustrate how reliability was treated when estimating a particular organizational climate–work group performance relationship. For each research question, we highlight how individually corrected correlations and their sampling variances differ (or, by contrast, do not change) across the three scenarios. Subsequent to the presentation of primary results for each research question, we discuss the meta-analytic findings for that organizational climate–performance relationship.

Research Question 1

RQ1 focused on the relationship between concern for employees and behavioral-oriented performance. Of note, this relationship had the largest number of effect sizes (k = 37) relative to tests of the other expected relationships. Following the proposed correction procedures, Eq. 3 was employed when sample-based reliabilities were available for both the predictor and criterion. For Scenario 1, if only predictor reliability information ($r_{xx}$) was missing, then a sample-size-weighted reliability estimate was used assuming a fixed value (i.e., Eq. 6). Similarly, if only criterion reliability ($r_{yy}$) information was missing, then a sample-size-weighted reliability estimate was used (i.e., Eq. 5). If both predictor and criterion information were missing, the sample-size-weighted reliability estimates for both variables were assumed as fixed (i.e., Eq. 7). For Scenario 1, individually corrected correlations and their sampling variances for primary studies included in testing RQ1 are presented in Table 1. These statistics are also presented in Table 1 for Scenario 2 (use of assumed, literature-based reliabilities for missing reliability values) and Scenario 3 (use of assumed population reliabilities for all reliabilities).

Table 1 Primary study results related to Research Question 1: the relationship between concern for employees and behavioral-oriented outcomes

Table 1 indicates that the evaluation of RQ1 involves a situation with almost complete sample-based predictor reliabilities (81 %) and partially available criterion reliabilities (30 %). As shown in Table 1, the alternative means for estimating group-level reliabilities had a meaningful impact on the magnitudes of many corrected correlations and their sampling variances. These findings are important as they suggest that researchers could arrive at different conclusions about the relationship between organizational climate and workgroup performance. For instance, when confidence intervals are computed for each of the three scenarios for Primary Study # 2, these intervals do not overlap. Specifically, the confidence interval resulting from Scenario 1 was (.74, .86), whereas the confidence intervals for Scenarios 2 and 3 were (.87, 1.0). When these differences compound across all primary studies, conclusions at the meta-analytic level are likely affected.
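The nonoverlap comparison above rests on ordinary normal-theory intervals built from each corrected correlation and its estimated sampling variance. A minimal Python sketch (names ours, illustrative only):

```python
def ci95(rho_hat, var_hat):
    """95% confidence interval for a corrected correlation, using its
    estimated sampling variance (from Eqs. 3, 5, 6, or 7) and the
    normal approximation rho_hat +/- 1.96 * SE."""
    half_width = 1.96 * var_hat**0.5
    return (rho_hat - half_width, rho_hat + half_width)

# Two intervals fail to overlap when the lower bound of one exceeds
# the upper bound of the other.
```

Because larger sampling variances widen these intervals, accounting for sampling error in sample-based reliabilities tends to produce more conservative conclusions, as seen in the Scenario 1 results.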

Associated meta-analytic results for the three scenarios are presented in Table 2. As shown in Table 2, the relationship between concern for employees and behavioral-oriented performance was positive for each scenario (i.e., .44, .48, and .51 for Scenarios 1, 2, and 3, respectively) and statistically significant, with all confidence intervals excluding zero. Notably, not only did the magnitude of the climate–performance relationship differ depending on how reliabilities were estimated, but the confidence intervals also varied across the three scenarios. Use of sample-based reliabilities resulted in the most conservative estimate of the relationship.
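A stripped-down sketch of how individually corrected correlations combine into a mean corrected correlation and its confidence interval is shown below. This is a generic inverse-variance-weighted (fixed-weight) estimator for illustration only, not necessarily the exact estimator of Raju et al. used in the paper; the inputs are hypothetical:

```python
import math

def meta_mean(corrected_rs, variances):
    """Inverse-variance-weighted mean corrected correlation,
    its standard error, and a normal-theory 95% CI."""
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    m = sum(w * r for w, r in zip(weights, corrected_rs)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    return m, se, (m - 1.96 * se, m + 1.96 * se)

# Hypothetical studies: equal variances give a simple average of .45,
# with a CI that excludes zero (i.e., statistically significant).
m, se, ci = meta_mean([0.5, 0.4], [0.01, 0.01])
```

Smaller sampling variances, such as those from studies with sample-based reliabilities, pull the weighted mean toward those studies' corrected correlations, which is one route by which the three scenarios can yield different means and intervals.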

Table 2 Meta-analytic results for tests of relationships between concern for employees and criterion variables

Research Question 2

RQ2 focused on the relationship between concern for employees and productivity-oriented outcomes. This relationship had fourteen reported effects. Following the same procedures used to evaluate RQ1 for Scenario 1, the sample-size-weighted reliability indices were inserted into the sampling variance equations wherever reliability information was missing for the predictor, criterion, or both. Again, if predictor and criterion reliability values were available, then Eq. 3 was employed to estimate the sampling variance of the corrected correlation. If reliability information was missing only for the predictor, the predictor reliability was treated as fixed (i.e., Eq. 6). If only criterion reliability information was missing, the criterion reliability was treated as fixed (i.e., Eq. 5). Where both predictor and criterion reliability information was missing, both values were treated as fixed (i.e., Eq. 7). For Scenario 1, individually corrected correlations and their sampling variances associated with RQ2 are presented in Table 3. These statistics are also presented in Table 3 for Scenario 2 (use of assumed reliabilities for missing reliability values) and Scenario 3 (use of assumed population reliabilities for all reliabilities).
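The four-way dispatch on which reliabilities a study reports can be sketched as a single function. The fallback values `rxx_bar` and `ryy_bar` stand in for the sample-size-weighted means and are illustrative placeholders, not the paper's values; the equation labels in the returned string mirror the cases described above:

```python
import math

def corrected_r_and_case(r_xy, r_xx=None, r_yy=None,
                         rxx_bar=0.80, ryy_bar=0.70):
    """Correct r_xy using reported reliabilities when present,
    falling back on weighted-mean values treated as fixed."""
    if r_xx is not None and r_yy is not None:
        case = "both sample-based (Eq. 3)"
    elif r_xx is None and r_yy is not None:
        r_xx, case = rxx_bar, "predictor treated as fixed (Eq. 6)"
    elif r_xx is not None and r_yy is None:
        r_yy, case = ryy_bar, "criterion treated as fixed (Eq. 5)"
    else:
        r_xx, r_yy, case = rxx_bar, ryy_bar, "both treated as fixed (Eq. 7)"
    return r_xy / math.sqrt(r_xx * r_yy), case
```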

Table 3 Primary study results related to Research Question 2: the relationship between concern for employees and productivity-oriented outcomes

Table 3 illustrates a situation with a relatively large proportion (i.e., 79 %) of sample-based predictor reliabilities and a modest percentage of criterion reliabilities (i.e., 36 %). Consistent with findings from RQ1, the alternative approaches for estimating group-level reliabilities had a meaningful impact on the magnitudes of many corrected correlations and their sampling variances.

As shown in Table 2, the relationship between concern for employees and productivity-oriented outcomes was both positive and statistically significant across the three scenarios. These results suggest that greater concern for employees is related to greater workgroup or organizational productivity. Notably, the findings for RQ2 varied little between the scenarios, reflecting the fact that the average sample-based reliabilities were comparable in magnitude to the literature-based, assumed population reliabilities for both the predictor and criterion.

Research Question 3

RQ3 focused on the relationship between concern for employees and health/safety-oriented outcomes. Only eight effects were available for this analysis. As in the previous analyses for Scenario 1, the sample-size-weighted predictor reliability was inserted into the sampling variance equations whenever this information was missing; likewise, the sample-size-weighted reliability for health/safety outcomes was inserted when this value was missing from primary studies. For Scenarios 2 and 3, the same treatments of reliability described for RQ1 and RQ2 were applied in these analyses.

Individually corrected correlations and their sampling variances are presented in Table 4 for Scenarios 1, 2, and 3. Table 4 illustrates a situation with relatively fewer sample-based predictor (63 %) and criterion reliabilities (13 %) than either of the prior analyses. As with findings for RQ1 and RQ2, the corrected correlations and sampling variances differed according to how reliability information was handled across the three scenarios.

Table 4 Primary study results for Research Question 3: the relationship between concern for employees and health/safety-oriented outcomes

The meta-analytic results reported in Table 2 regarding RQ3 indicate that conclusions about statistical significance differ depending on whether reliabilities are sample-based or literature-based. For Scenarios 1 and 2, the confidence intervals for \(M_\rho\) include zero, indicating a statistically nonsignificant relationship between organizational climate and health/safety outcomes. In Scenario 3, however, where all reliabilities are assumed population reliabilities, the confidence interval excludes zero, indicating a statistically significant relationship. Again, we caution that the findings pertaining to RQ3 are based on a small number of studies, where there is likely to be more variability due to second-order sampling of studies.

Supplemental Meta-Analytic Findings

Results from the supplemental analyses for Research Question 1 with restricted reliabilities for Scenarios 1 through 3 are presented in Table 2. As expected, the magnitudes of \(M_\rho\) are substantially greater when reliability information is restricted to the lower portion of the reliability distribution. Moreover, for Scenario 1, the reduction in available reliability information greatly affects the standard error of \(M_\rho\) (i.e., the square root of the sampling variance of \(M_\rho\)), increasing it by 33 % from the original estimate of \(\mathrm{SE}_{M_\rho}\). As noted above, there are 50 % fewer original predictor and criterion reliability sampling variances included in the estimate of \(\mathrm{SE}_{M_\rho}\) with restricted reliability data. Notably, this reduction in sample-based reliabilities and their respective sampling variances not only produces the 33 % increase in \(\mathrm{SE}_{M_\rho}\), but also yields a 95 % confidence interval for \(M_\rho\) that only minimally overlaps the original 95 % confidence interval. Further reductions in sample-based reliabilities and their sampling variances, not reported here, produce even more marked changes in the original and revised estimates of \(\mathrm{SE}_{M_\rho}\) and in substantive conclusions about the relationship between organizational climate and behavioral performance at the group level of analysis (Footnote 4).
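The consequence of an inflated standard error for interval width follows directly from the arithmetic: the half-width of a symmetric 95 % confidence interval scales linearly with the standard error, so a 33 % larger standard error widens the interval by the same factor. The SE values below are hypothetical, chosen only to show the scaling:

```python
def ci_width(se, z=1.96):
    """Width of a symmetric normal-theory 95% confidence interval."""
    return 2 * z * se

# With hypothetical SEs of .030 (full reliability data) and .040
# (restricted data), the interval width grows by the ratio of the SEs.
w_full, w_restricted = ci_width(0.030), ci_width(0.040)
growth = w_restricted / w_full  # = .040 / .030, i.e. a ~33% increase
```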

For Scenarios 2 and 3 with restricted reliability data, the results are very similar to those for Scenario 1 with restricted reliability data. These findings are not surprising given that the primary difference between these scenarios was a somewhat lower assumed population predictor reliability, applied to missing reliabilities in Scenario 2 and to all predictor reliabilities in Scenario 3 (i.e., .6 vs. .8). However, a comparison of the original estimates of \(M_\rho\) and \(\mathrm{SE}_{M_\rho}\) with the respective estimates based on restricted reliability data for Scenarios 2 and 3 indicates meaningful differences. Together, these differences result in original and revised confidence intervals for \(M_\rho\) that are considerably divergent.

Discussion

This paper presented arguments for treating ICC(2) information in primary studies as an estimate of group-level reliability when correcting correlations for unreliability in measures of group- or business-unit-level phenomena, and for employing the sampling variance equations presented by Raju and colleagues (Raju and Brand 2003; Raju et al. 1991) to account for sampling error in group-level reliability estimates. Importantly, our illustrative analyses indicated how findings pertaining to both individually corrected correlations and meta-analytic results can change depending on assumptions about, and usage of, available ICCs versus assumed population reliabilities. These differences were evident in the magnitudes of corrected correlations and sampling variances in both primary studies and meta-analyses. Notably, the use of available ICCs and other types of group-level reliabilities tended to produce more conservative estimates of relationships between variables within primary studies. Although the differences in magnitude were less pronounced in the meta-analyses, they could still lead a researcher to draw different conclusions. For instance, conclusions about the strength of the relationship between organizational climate and behaviorally oriented performance, and about the statistical significance of the relationship between organizational climate and health/safety outcomes, differed depending on the use of sample-based versus literature-based (assumed population) reliabilities.

The point that conclusions about substantive relationships can change based on the availability of sample-based reliabilities was further illustrated in supplemental meta-analyses with restricted reliability data. Notably, a 50 % reduction in sample-based reliabilities produced an estimated organizational climate–behavioral performance relationship that not only differed considerably in magnitude from the original estimate, but also yielded confidence intervals for \(M_\rho\) with very minimal overlap. Importantly, these differences resulted solely from reducing the proportion of sample-based reliabilities from the original to the supplemental analyses. The differences between the original and supplemental analyses more conclusively illustrate how the availability and treatment of group-level reliabilities impact meta-analytic findings.

We note that while our paper is not directed toward the specific assumed reliability values to employ in the absence of sample-based group-level reliabilities, our demonstrations indicate the importance of using more conservative assumed reliability values when sample-based reliabilities are not reported for some studies or are missing altogether. In particular, for RQ1, the findings across the three scenarios suggest that researchers should rely on the average sample-based reliability (as opposed to an assumed literature-based reliability) for missing reliability values when meaningful percentages (e.g., 30 % or more) of the sample-based predictor and criterion reliabilities are available. That is, the findings from Scenario 1, which relied on the average sample-based reliability value for missing reliabilities, were more conservative than the findings for Scenarios 2 and 3, which relied on assumed population values.

Importantly, our illustrative analyses provide guidance to primary and meta-analytic researchers on how to correct group-level correlations for unreliability in the predictor, criterion, or both, whenever and in whatever proportions the artifact information is available. As such, these demonstrations are consistent with LeBreton et al.'s (2014) and Burke et al.'s (2014) calls for using more accurate and reasonable reliability estimates when correcting correlations for measurement error. Our work also suggests a need for primary researchers to attend more closely to estimating and reporting the reliability of aggregated measures to facilitate the estimation of corrected correlations. This need is particularly evident with regard to the reporting of criterion reliabilities in group-level studies, as the largest percentage of available criterion reliabilities for any meta-analysis included herein was 36 %. As noted above, when predictor and criterion reliability information is missing, our findings point to the need to consider using available (average) ICCs and other sample-based reliabilities rather than relying on assumed, literature-based population reliabilities. While our findings do not address the accuracy of parameter estimates, they do point to the possibly conservative nature of findings based on sample-based reliabilities in both primary and meta-analytic studies.

In conclusion, this investigation presented a rationale and procedures for using sample-based predictor and criterion reliability information to estimate relationships between group-level variables within primary and meta-analytic studies. Given that a lack of complete reliability data is a common problem in psychological and organizational research, a primary contribution of this study is its illustration of how to handle situations with varying levels of reported group reliability information, and of the implications of the assumptions made (i.e., treatment of reliabilities as assumed fixed vs. sample-based) when correcting correlations based on aggregated data. Our understanding of the accuracy of meta-analytic findings at the group level can be advanced through future simulation work examining different levels of availability of reliability information, different means of estimating population-level reliabilities for primary studies with missing reliability values, and different numbers of respondents per group. These recommendations are offered in the spirit of improving meta-analytic parameter estimates and our understanding of relationships between constructs at the group level of analysis.