Abstract
This study describes extended Anderson and Hauck procedures for equivalence testing of slope coefficients and mean responses in one and two regression lines. The general formulation of asymmetric equivalence ranges permits a wide variety of equivalence questions to be tested for a target magnitude or a negligible value. Specifically, the equivalence tests are useful for assessing negligible trend and similar response in a single regression line, and for evaluating unimportant interaction-moderation effect and comparable simple effect between two linear regression lines. The associated power functions and sample size procedures are also derived and compared under the random and fixed model settings. According to the analytic justification and empirical assessment, the exact approaches have a clear advantage over the approximate formulas for accommodating the full stochastic nature of both the response and predictor variables. Computer algorithms are also provided for conducting the proposed equivalence tests, power calculations, and sample size determinations in simple linear regressions.
1 Introduction
Many studies are designed explicitly to demonstrate the absence of effects across competing scenarios or theories. However, they sometimes base their findings on a failure to reject a null hypothesis rather than on confirming a hypothesis of equivalence. For comparison of treatment effects, the traditional hypothesis test of difference aims to determine whether the treatment effects differ from one another. In this setting, traditional difference tests are inappropriate for establishing equivalence, because failing to reject a no-difference hypothesis does not necessarily support the conclusion of equivalence. There has been a growing awareness of, and demand for, appropriate techniques for assessing equivalence and similarity in the behavioral and managerial literature. For example, related discussions of theoretical perspectives and practical issues can be found in Cashen and Geifer [1], Cortina and Folger [2], Edward and Berry [3], Frick [4], Rogers, Howard, and Vessey [5], Seaman and Serlin [6], Stanton [7], Stegner, Bostrom, and Greenfield [8], and Steiger [9], among others.
To assess whether an effect size is clinically negligible or practically unimportant, the recommended approach is to ascertain whether the effect size falls inside a selected equivalence range. Technical discussions and fundamental reviews of different types of mean equivalence tests were presented in Berger and Hsu [10], Meyners [11], and Schuirmann [12]. Although more powerful tests exist, two prominent procedures have received considerable attention in the literature: the two one-sided tests (TOST) method of Schuirmann [13] and Westlake [14], and the equivalence approach of Anderson and Hauck [15] and Hauck and Anderson [16]. Both procedures for mean equivalence are simple to apply, and their flexible settings allow generalizations to more complex experimental designs. Accordingly, Dixon and Pechmann [17] and Schmidt and Meyer [18] have extended the TOST to assess whether the linear trend is practically negligible in linear regressions. Also, Counsell and Cribbie [19] described an extension of the Anderson and Hauck procedure for comparing the slope coefficients of two regression lines.
Despite its conservative nature, the TOST maintains good control of the Type I error rate at the specified level. However, its actual Type I error rate can be substantially less than the nominal level, and its rejection region can be empty when the equivalence range is narrow, particularly with small sample sizes. Across the diverse research designs used for equivalence assessment, the chosen equivalence bounds and associated sample sizes may not be large. Under such circumstances, it is of methodological concern to consider alternative procedures with a proper rejection region and good Type I error control. On the other hand, the normal approximation used in Counsell and Cribbie [19] for p-value calculations is only one of the three possible methods proposed in Anderson and Hauck [15]. Following the results of an extensive simulation study, Anderson and Hauck [15] recommended the central-t approach instead of the normal approximation, which was the least accurate. In view of the lack of theoretical development and supporting techniques, it is desirable to properly generalize the Anderson and Hauck procedure for linear regression analysis.
The present article aims to contribute to the development of equivalence methodology for linear regressions in three aspects. First, using the central-t approximation, extended Anderson and Hauck procedures are presented for equivalence testing of slope coefficient and mean response in one and two regression lines. The general formulation of asymmetric equivalence ranges permits a wide range of equivalence questions to be tested. Consequently, they are useful for assessing negligible trend and similar response in a single regression line, and for evaluating unimportant interaction-moderation effect and comparable simple effect between two linear regression lines. Second, the associated power functions and sample size procedures are also derived and compared under the random and fixed model settings. According to the analytic justification and empirical assessment, the exact approaches have a clear advantage over the approximate formulas for accommodating the full stochastic nature of both the response and predictor variables. It should be noted that exact power and sample size calculations were not addressed in Counsell and Cribbie [19]. Third, the proposed equivalence techniques are not available in popular software packages. Computer algorithms are provided for critical value computations, power calculations, and sample size determinations of the extended Anderson and Hauck procedures. The suggested power and sample size calculations should be useful for planning equivalence studies about the much-discussed appraisals of interaction-moderation effect and simple effect in behavioral and management research.
2 Single Regression Line
The simple linear regression model is of the form
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \]
where Yi is the response score of the ith subject, β0 is the intercept, β1 is the slope coefficient, Xi is the predictor score of the ith subject, and the εi are iid N(0, σ²) random variables, i = 1, …, N. The least squares estimator \({\hat{\upbeta}}_{{1}}\) of the slope coefficient β1 has the distribution
\[ \hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{SSX}\right), \]
where \(SSX = \sum\nolimits_{{i = 1}}^{N} {(X_{i} - {\bar{X}})^{2} }\) and \({\bar{X}} = \sum\nolimits_{{i = 1}}^{N} {X_{i} /N}\). Also, \({\hat{\upsigma}}^{2} = SSE/\nu\) is the usual unbiased estimator of σ², where SSE is the error sum of squares and ν = N – 2. Moreover, V = SSE/σ² ~ χ²(ν), where χ²(ν) denotes the chi-square distribution with ν degrees of freedom.
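To detect a difference in the slope coefficient in terms of H0: β1 = β10 versus H1: β1 ≠ β10, the test statistic has the form
\[ T = \frac{\hat\beta_1 - \beta_{10}}{(\hat\sigma^2/SSX)^{1/2}}. \]
The null hypothesis is rejected at the significance level α if
\[ |T| > t_{\nu,\alpha/2}, \]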
where \(t_{\nu ,\alpha /2}\) is the 100(1 – α/2) percentile of t(ν) and t(ν) is a t distribution with degrees of freedom ν.
2.1 Equivalence Test of Linear Trend
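The primary focus of this article is the test of equivalence, for which the null and alternative hypotheses are expressed as
\[ H_0: \beta_1 \le \Delta_L \text{ or } \Delta_U \le \beta_1 \quad \text{versus} \quad H_1: \Delta_L < \beta_1 < \Delta_U, \]
where ΔL and ΔU are a priori constants that delimit the range within which the effect size is declared equivalent. Hypotheses with asymmetric equivalence thresholds can readily be rewritten in terms of symmetric equivalence bounds as
\[ H_0: |\beta_1^*| \ge \Delta \quad \text{versus} \quad H_1: |\beta_1^*| < \Delta, \]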
where \(\upbeta_{1}^{*}\) = β1 – ΔM, ΔM = (ΔL + ΔU)/2, and Δ = (ΔU – ΔL)/2. An important scenario is to detect a negligible trend by setting ΔU = Δ and ΔL = – Δ so that ΔM = 0 for a bound Δ.
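For a given value of the predictor quantity SSX, it is essential to note that
\[ T_S = \frac{\hat\beta_1 - \Delta_M}{(\hat\sigma^2/SSX)^{1/2}} \sim t(\nu, \lambda_S), \]
where t(ν, λS) is the noncentral t distribution with ν degrees of freedom and noncentrality parameter \(\lambda_S = (\beta_1 - \Delta_M)/(\sigma^2/SSX)^{1/2}\). To establish that the slope coefficient β1 lies within the interval (ΔL, ΔU), a natural rejection region for the null hypothesis is
\[ \tau_{SL} < T_S < \tau_{SU}, \]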
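where the two critical values τSL and τSU are chosen to attain the nominal Type I error rate simultaneously at both boundaries:
\[ P\{\tau_{SL} < T_S < \tau_{SU} \mid \beta_1 = \Delta_L\} = \alpha \quad \text{and} \quad P\{\tau_{SL} < T_S < \tau_{SU} \mid \beta_1 = \Delta_U\} = \alpha. \]
Following the properties of a noncentral t distribution as in Johnson, Kotz and Balakrishnan [20], it can be shown that the two conditions are simultaneously satisfied by the choice τSL = –τS and τSU = τS with τS > 0: at the two boundaries the noncentrality parameters are \(-\Delta/(\sigma^2/SSX)^{1/2}\) and \(\Delta/(\sigma^2/SSX)^{1/2}\), and t(ν, –λ) is the mirror image of t(ν, λ) about zero. Hence, the rejection region is of the form
\[ |T_S| < \tau_S, \]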
where τS is determined by the condition P{–τS < TS < τS | β1 = ΔL} = α or, equivalently, P{–τS < TS < τS | β1 = ΔU} = α. Because the error variance is generally unknown, the exact null distribution of TS cannot be specified. Following the suggestion in Anderson and Hauck [15], a feasible and accurate approach is to find the critical value τS through the approximation \(T_{S}\text{ } \dot\sim \text{ }T + {\hat{\lambda }}_{S}\), where \(T\sim t(\nu )\), \({\hat{\lambda }}_{S} = \Delta /({\hat{\upsigma }}^{2} /SSX)^{{{1}/{2}}}\), and
\[ P\{-\tau_S < T + \hat\lambda_S < \tau_S\} = \alpha. \]
Thus, the critical value τS can be computed by a simple iterative search. Note that τS is a function of α, Δ, N, \({\hat{\upsigma }}^{{2}}\), and SSX. It does not have an explicit analytic expression and requires a computer program to evaluate. An efficient algorithm is developed for computing the critical value and rejection region of the suggested procedure. Also, the p-value associated with the observed slope estimate \({\hat{\upbeta}}_{1O}\) can be calculated as
\[ p = P\{|T + \hat\lambda_S| < |T_O|\} = F_\nu(|T_O| - \hat\lambda_S) - F_\nu(-|T_O| - \hat\lambda_S), \]
where \(T_{O} = ({\hat{\upbeta}}_{{{1}O}} - \Delta_{M})/({\hat{\upsigma }}^{2}/SSX)^{{{1}/{2}}}\) and Fν(·) is the cumulative distribution function of t(ν). Because it requires only two evaluations of a central t distribution function, the p-value is computationally easier to obtain than the critical value.
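The central-t search for τS and the p-value above can be sketched in Python. This is a minimal, standard-library-only illustration under our own choices, not the authors' algorithm: probabilities of t(ν) are obtained by integrating the t density with Simpson's rule, and τS is found by bisection, which works because P{−τ < T + λ̂S < τ} increases in τ. The function names are hypothetical. Because only λ̂ and the degrees of freedom change, the same functions would also apply to the mean-response test (with λ̂M) and to the two-line tests of Section 3 (with νD).

```python
import math


def t_interval_prob(lo: float, hi: float, df: int, n: int = 400) -> float:
    """P{lo < T < hi} for a central T ~ t(df), via composite Simpson's rule."""
    if hi <= lo:
        return 0.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    half = (df + 1) / 2

    def pdf(x: float) -> float:
        return math.exp(log_c - half * math.log1p(x * x / df))

    h = (hi - lo) / n  # n must be even for Simpson's rule
    total = pdf(lo) + pdf(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(lo + i * h)
    return total * h / 3


def ah_critical_value(alpha: float, lam_hat: float, df: int) -> float:
    """Solve P{-tau < T + lam_hat < tau} = alpha for tau > 0 by bisection."""
    def size(tau: float) -> float:
        return t_interval_prob(-tau - lam_hat, tau - lam_hat, df)

    lo, hi = 0.0, 1.0
    while size(hi) < alpha:  # widen the bracket until it contains the root
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if size(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ah_p_value(t_obs: float, lam_hat: float, df: int) -> float:
    """p = F(|T_O| - lam_hat) - F(-|T_O| - lam_hat) for T ~ t(df)."""
    a = abs(t_obs)
    return t_interval_prob(-a - lam_hat, a - lam_hat, df)


# First training group of Section 2.3: slope test with Delta_M = 0.5, Delta = 0.25.
se = math.sqrt(70.5615 / 2014.0)           # (sigma_hat^2 / SSX)^(1/2)
lam_s = 0.25 / se                           # lambda_hat_S
tau_s = ah_critical_value(0.05, lam_s, 8)   # nu = N - 2 = 8
t_o = (0.4980 - 0.50) / se                  # observed T_O
p_s = ah_p_value(t_o, lam_s, 8)
print(f"tau_S = {tau_s:.4f}, T_O = {t_o:.4f}, p = {p_s:.4f}")
```

Run on the first training group of Section 2.3, this should give a τS close to the reported 0.1598 and a p-value far below 0.05, consistent with rejecting nonequivalence; small differences in the last digits are possible.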
Note that a similar discussion was given in Anderson and Hauck [15] for testing two-group mean equivalence. Because the p-value is computationally easy to obtain, they recommended the p-value approach for reaching a decision and did not address the calculation and implementation of the rejection region and the corresponding power function. Consequently, the sample size procedure for mean equivalence in Hauck and Anderson [16] is less transparent and cannot readily be adopted as a general tool in linear regressions. Moreover, like other more powerful tests, the Anderson and Hauck procedure has an unbounded rejection region. The counterintuitive possibility of rejecting nonequivalence with arbitrarily large values of the sample variance has been debated extensively in Berger and Hsu [10] and the discussions therein. As a constructive response, they proposed specifying an upper bound on the sample variance beyond which the null hypothesis is never rejected. On the other hand, the advantage of the Anderson and Hauck procedure over the TOST in Type I error protection for small sample sizes and tight equivalence bounds should also be taken into consideration. The contrasting behavior of the two test procedures is demonstrated in the subsequent numerical examples.
2.2 Equivalence Test of Mean Response
The equivalence appraisal can also be applied to the mean response μ = β0 + XFβ1 at a focal predictor value XF. The null and alternative hypotheses are
\[ H_0: \mu \le \Delta_L \text{ or } \Delta_U \le \mu \quad \text{versus} \quad H_1: \Delta_L < \mu < \Delta_U, \]
where ΔL and ΔU are a priori constants that represent the threshold range for declaring practical equivalence. With the least squares estimators \(({\hat{\upbeta}}_{0}, {\hat{\upbeta}}_{1})\) of (β0, β1), the linear estimator \(\hat{\upmu} = {\hat{\upbeta}}_{0} + X_{F}{\hat{\upbeta}}_{1}\) has the distribution
\[ \hat\mu \sim N(\mu, \sigma^2 H_M), \]
where \(H_{M} = 1/N + (X_{F} - \overline{X})^{2}/SSX\). It is useful to note that
\[ T_M = \frac{\hat\mu - \Delta_M}{(\hat\sigma^2 H_M)^{1/2}} \sim t(\nu, \lambda_M), \]
where the noncentrality parameter is \(\lambda_M = (\mu - \Delta_M)/(\sigma^2 H_M)^{1/2}\) and ΔM = (ΔL + ΔU)/2.
Following the same principle as for the slope coefficient, a potential rejection region for the null hypothesis is of the form
\[ |T_M| < \tau_M, \]
where the critical value τM is chosen to attain the nominal Type I error rate when μ = ΔL and μ = ΔU. The proposed approach is to find the critical value through the approximate condition
\[ P\{-\tau_M < T + \hat\lambda_M < \tau_M\} = \alpha, \]
where T ~ t(ν), \({\hat{\lambda }}_{M} = \Delta /({\hat{\upsigma }}^{2} H_{M} )^{1/2}\), and Δ = (ΔU – ΔL)/2. Note that the critical value τM is a function of α, Δ, N, \({\hat{\upsigma }}^{2}\), and HM. Moreover, an iterative algorithm is required to compute the critical value.
2.3 A Numerical Example
The numerical details of the equivalence tests of the slope coefficient and mean response are demonstrated with the training study data in Table 6.1 of Huitema [21], which relate the response variable (Y: achievement) to the predictor variable (X: aptitude) for three types of training program.
For the first training group with N = 10, the sample means of the predictor and response variables are \(\overline{X} = 52.00\) and \({\overline{Y}} = 30.00\), respectively. Moreover, the least squares estimates of the linear regression line between achievement and aptitude measurements are obtained as \(\{ {\hat{\upbeta}}_{0}, \, {\hat{\upbeta}}_{1}\} \, = \, \{ 4.1033, \, 0.4980\}\), and the sample variance of error is \({\hat{\upsigma }}^{{2}} { = 7}0.{5615}\). For illustration, an equivalence test of slope coefficient is performed in terms of H0: β1 ≤ 0.25 or 0.75 ≤ β1 versus H1: 0.25 < β1 < 0.75 (ΔM = 0.50 and Δ = 0.25). With SSX = 2014.00 and α = 0.05, the test statistic and critical value are computed as TS = –0.0106 and τS = 0.1598, respectively. Thus, the nonequivalence null hypothesis is rejected at the significance level 0.05. The conclusion indicates that the slope coefficient is essentially equivalent to 0.50 with no more than 0.25 difference.
The equivalence test of mean response can also be performed with the estimated mean response \(\hat{\upmu} = 29.0040\) at XF = 50. Using ΔM = 29 and Δ = 4, the equivalence test of mean response is conducted in terms of H0: μ ≤ 25 or 33 ≤ μ versus H1: 25 < μ < 33. For α = 0.05, the test statistic and critical value are TM = 0.0015 and τM = 0.1966, respectively. Hence, the nonequivalence null hypothesis is rejected at the significance level 0.05, and the analysis suggests that the mean response at XF = 50 lies within 4 units of 29. Moreover, it can be shown that the rejection regions of the TOST procedures are empty sets in both cases, so there is no chance of rejecting the nonequivalence null hypothesis for either the slope coefficient or the mean response. Evidently, the TOST approach may not be a reliable procedure when the sample size is small, especially for a tight equivalence range. This deficiency agrees with the explication of TOST for assessing mean equivalence in Schuirmann [12].
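The statement that the TOST rejection regions are empty can be checked directly. The TOST rejects nonequivalence only when ΔL + t_{ν,α}·SE < estimate < ΔU − t_{ν,α}·SE, where SE is the estimated standard error, and that interval is empty whenever Δ = (ΔU − ΔL)/2 does not exceed t_{ν,α}·SE. The sketch below hard-codes the one-sided 5% quantile t_{8,0.05} ≈ 1.8595 to stay within the Python standard library; the helper name is hypothetical.

```python
import math

T_8_05 = 1.859548  # upper 5% point of t(8), hard-coded (nu = N - 2 = 8)


def tost_region_empty(delta: float, se: float, t_crit: float) -> bool:
    """True if dL + t*se < est < dU - t*se has no solution, i.e. delta <= t*se."""
    return delta <= t_crit * se


# First training group (N = 10, sigma_hat^2 = 70.5615, SSX = 2014, mean X = 52).
se_slope = math.sqrt(70.5615 / 2014.0)                             # slope SE
se_mean = math.sqrt(70.5615 * (1 / 10 + (50 - 52) ** 2 / 2014.0))  # SE at X_F = 50

print(tost_region_empty(0.25, se_slope, T_8_05))  # slope, Delta = 0.25 -> True
print(tost_region_empty(4.0, se_mean, T_8_05))    # mean response, Delta = 4 -> True
```

The slope and mean-response standard errors are about 0.187 and 2.683, so t_{8,0.05}·SE is about 0.348 and 4.99, both larger than the corresponding Δ. The Anderson and Hauck region |T| < τ, by contrast, is never empty because τ > 0.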
2.4 Power and Sample Size Calculations
When a study is planned, the actual values of the continuous response and predictor measurements for each subject are available only after the observations are obtained. In addition to the randomness of the normal responses, the stochastic nature of the predictor variables has to be taken into account in power analysis under the random, unconditional setting of a linear regression study. A useful and convenient framework is to assume that the continuous predictor variables {Xi, i = 1, …, N} are independent and identically distributed as N(μX, \(\upsigma_{X}^{2}\)), as in Shieh [22, 23] within the context of ANCOVA.
Under this stochastic assumption on {Xi, i = 1, …, N}, it can readily be established that K = SSX/\(\upsigma_{X}^{2}\) ~ χ²(κ), where κ = N – 1. The power function of the equivalence procedure for the slope coefficient can be expressed as
\[ \Pi_S = P\{-\tau_S < T_S < \tau_S\}, \]
where the probability is taken with respect to the joint distribution of the responses and the predictors.
Note that the critical value τS depends on the two quantities \({\hat{\upsigma}}^{2}\) and SSX. With \({\hat{\upsigma}}^{2} = \upsigma^{2}(V/\nu)\) and \(H_{S} = 1/SSX = 1/(\upsigma_{X}^{2} K)\), the power function ΠS can be rewritten as
\[ \Pi_S = E_{(K,V)}\left[\Phi(B_S) - \Phi(A_S)\right], \]
where \(B_S = (\Delta_M - \beta_1)/(\sigma^2 H_S)^{1/2} + \tau_S(V/\nu)^{1/2}\), \(A_S = (\Delta_M - \beta_1)/(\sigma^2 H_S)^{1/2} - \tau_S(V/\nu)^{1/2}\), Φ(⋅) is the cumulative distribution function of the standard normal distribution, and the expectation E(K,V) is taken with respect to the chi-square distributions of K and V.
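The expectation form can be verified by conditioning on (K, V). Given the predictors, \(Z = (\hat\beta_1 - \beta_1)/(\sigma^2 H_S)^{1/2}\) is standard normal and independent of V, so it is also independent of (K, V); writing \(\hat\sigma^2 = \sigma^2 V/\nu\), the event |TS| < τS becomes an interval for Z:

```latex
\begin{aligned}
|T_S| < \tau_S
&\iff -\tau_S(\hat\sigma^2 H_S)^{1/2} < \hat\beta_1 - \Delta_M < \tau_S(\hat\sigma^2 H_S)^{1/2} \\
&\iff \frac{\Delta_M - \beta_1}{(\sigma^2 H_S)^{1/2}} - \tau_S\Big(\frac{V}{\nu}\Big)^{1/2}
      < Z <
      \frac{\Delta_M - \beta_1}{(\sigma^2 H_S)^{1/2}} + \tau_S\Big(\frac{V}{\nu}\Big)^{1/2}.
\end{aligned}
```

Hence P{|TS| < τS | K, V} = Φ(BS) − Φ(AS). The critical value τS is fixed once (K, V) is fixed, and averaging over (K, V) gives ΠS.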
Under the random predictor framework, the normality assumption implies that
\[ T_X = \frac{\overline{X} - X_F}{(\hat\sigma_X^2/N)^{1/2}} \sim t(\kappa, \lambda_X), \]
where \({\hat{\upsigma}}_{X}^{2} = SSX/\upkappa\) and \(\lambda_{X} = (\mu_{X} - X_{F})/(\sigma_{X}^{2}/N)^{1/2}\). Also, the power function of the equivalence procedure for mean response is of the form
\[ \Pi_M = P\{-\tau_M < T_M < \tau_M\}. \]
In this case, the critical value τM depends on the two terms \({\hat{\upsigma}}^{2}\) and HM. With \({\hat{\upsigma}}^{2} = \upsigma^{2}(V/\nu)\) and \(H_{M} = 1/N + T_{X}^{2}/(\upkappa N)\), it follows that the power function ΠM can be expressed as
\[ \Pi_M = E_{(T_X,V)}\left[\Phi(B_M) - \Phi(A_M)\right], \]
where \(B_M = (\Delta_M - \mu)/(\sigma^2 H_M)^{1/2} + \tau_M(V/\nu)^{1/2}\), \(A_M = (\Delta_M - \mu)/(\sigma^2 H_M)^{1/2} - \tau_M(V/\nu)^{1/2}\), and E(TX,V) is taken with respect to the joint distribution of TX and V.
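The expression of HM in terms of TX follows from SSX = κσ̂X² and the definition of TX, which together give \((\overline{X} - X_F)^2 = T_X^2\,\hat\sigma_X^2/N\):

```latex
H_M = \frac{1}{N} + \frac{(X_F - \overline{X})^2}{SSX}
    = \frac{1}{N} + \frac{T_X^2\,\hat\sigma_X^2/N}{\kappa\,\hat\sigma_X^2}
    = \frac{1}{N} + \frac{T_X^2}{\kappa N}.
```

Under normal predictors, the sample mean and SSX are independent, which is what makes TX a noncentral t variable with κ degrees of freedom.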
The power functions ΠS and ΠM for the slope coefficient and mean response involve a mixture of noncentral t distributions through the distributions of K and TX of the predictor variables, respectively. Because of this computational complexity, it is appealing to simplify these power functions. Under the normal assumption N(μX, \(\upsigma_{X}^{2}\)) for the predictors {Xi, i = 1, …, N}, standard results show that \(E[\overline{X}] = \upmu_{X}\) and E[SSX] = \(\upkappa \upsigma_{X}^{2}\). Hence, an approximate unconditional distribution can be obtained for the test statistic, \(T_{S} \dot\sim t(\nu, \lambda_{SA})\), where \(\lambda_{SA} = (\beta_1 - \Delta_M)/(\sigma^2 H_{SA})^{1/2}\) and \(H_{SA} = 1/(\upkappa \upsigma_{X}^{2})\). It yields a simplified power function for the equivalence test of linear trend
Moreover, following similar arguments, the test statistic of mean response has the approximate distribution \(T_{M} \dot\sim t(\nu, \lambda_{MA})\), where \(\lambda_{MA} = (\mu - \Delta_M)/(\sigma^2 H_{MA})^{1/2}\) and \(H_{MA} = 1/N + (\mu_X - X_F)^2/(\upkappa \upsigma_{X}^{2})\). Then, an approximate power function for the equivalence test of mean response is denoted by
The approximate power functions of the equivalence procedures provide computational shortcuts to the exact formulas. The simple formulations can be readily implemented with the embedded probability functions of a noncentral t distribution in standard software systems. On the other hand, the prescribed analytic justifications provide statistical support for the exact power functions. An immediate application of the power functions is to compute optimal sample sizes needed for the equivalence procedure to attain the specified power under the designated model configurations. The fundamental discrepancy between the exact and simplified power and sample size calculations will be further assessed in the succeeding numerical investigations.
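The exact power function ΠS of Section 2.4 can be evaluated numerically by averaging Φ(BS) − Φ(AS) over simulated draws of K ~ χ²(κ) and V ~ χ²(ν). The sketch below is an illustration under that assumption, not the authors' program: it replaces whatever numerical integration they use with plain Monte Carlo averaging, solves for τS in each draw with a Newton iteration guarded by bisection, and uses hypothetical function names. A sample size search would increase N until the estimate first reaches the target power, with enough draws that Monte Carlo noise does not decide the crossing.

```python
import math
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def _t_log_const(df: int) -> float:
    """log of Gamma((df+1)/2) / (Gamma(df/2) * sqrt(df*pi))."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi))


def t_pdf(x: float, df: int) -> float:
    return math.exp(_t_log_const(df) - (df + 1) / 2 * math.log1p(x * x / df))


def t_interval_prob(lo: float, hi: float, df: int, n: int = 100) -> float:
    """P{lo < T < hi} for central T ~ t(df) by composite Simpson's rule."""
    h = (hi - lo) / n
    total = t_pdf(lo, df) + t_pdf(hi, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(lo + i * h, df)
    return total * h / 3


def ah_critical_value(alpha: float, lam_hat: float, df: int) -> float:
    """tau with P{-tau < T + lam_hat < tau} = alpha (Newton with bisection guard)."""
    def g(tau: float) -> float:
        return t_interval_prob(-tau - lam_hat, tau - lam_hat, df) - alpha

    lo, hi = 0.0, 1.0
    while g(hi) < 0:
        lo, hi = hi, 2.0 * hi
    tau = 0.5 * (lo + hi)
    for _ in range(100):
        val = g(tau)
        if val < 0:
            lo = tau
        else:
            hi = tau
        slope = t_pdf(tau - lam_hat, df) + t_pdf(tau + lam_hat, df)  # dg/dtau
        new = tau - val / slope if slope > 0 else 0.5 * (lo + hi)
        if not lo < new < hi:  # Newton step left the bracket: bisect instead
            new = 0.5 * (lo + hi)
        if abs(new - tau) < 1e-10:
            return new
        tau = new
    return tau


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_power_slope(n, beta1, sigma2, sigma2_x, delta_m, delta,
                      alpha=0.05, draws=2000, seed=1):
    """Monte Carlo estimate of Pi_S = E_(K,V)[Phi(B_S) - Phi(A_S)]."""
    rng = random.Random(seed)
    nu, kappa = n - 2, n - 1
    total = 0.0
    for _ in range(draws):
        k = rng.gammavariate(kappa / 2, 2.0)  # K = SSX / sigma_X^2 ~ chi2(kappa)
        v = rng.gammavariate(nu / 2, 2.0)     # V = SSE / sigma^2 ~ chi2(nu)
        sd_b = math.sqrt(sigma2 / (sigma2_x * k))  # (sigma^2 H_S)^(1/2), H_S = 1/SSX
        root = math.sqrt(v / nu)                   # (V / nu)^(1/2) = sigma_hat / sigma
        tau = ah_critical_value(alpha, delta / (sd_b * root), nu)
        c = (delta_m - beta1) / sd_b
        total += phi(c + tau * root) - phi(c - tau * root)  # Phi(B_S) - Phi(A_S)
    return total / draws


# Section 2.5 configuration with Delta_M = 0.5 and Delta = 0.4 (N chosen for illustration).
print(exact_power_slope(40, 0.4980, 70.5615, 223.7778, 0.5, 0.4, draws=500))
```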
2.5 Numerical Assessments
As an illustrative framework, the model configurations follow those of the training study of Huitema [21] described above. Accordingly, the sample estimates of the regression coefficients and error variance of the first training group are designated as the working configurations: {β0, β1} = {4.1033, 0.4980} and σ² = 70.5615. The mean and variance of the normal predictors are chosen as {μX, \(\upsigma_{X}^{2}\)} = {52.00, 223.7778}, where 223.7778 equals SSX/(N – 1) = 2014/9 for the first group. The equivalence thresholds (ΔL, ΔU) are defined as ΔL = ΔM – Δ and ΔU = ΔM + Δ, and various magnitudes of ΔM and Δ are evaluated. For the equivalence tests of linear trend, the selected values are ΔM = 0.5 with Δ = 0.2, 0.3, and 0.4. The equivalence tests of mean response are examined at XF = 50 with ΔM = 29 for μ = 29.0040 under three equivalence bounds Δ = 4, 5, and 6.
With these specifications, the required sample sizes of both the exact and approximate methods were computed for the target power 1 – β = 0.80 and significance level α = 0.05. The estimated sample sizes for the equivalence tests of linear trend and mean response are presented in Table 1. Note that the resulting sample sizes cover a reasonable range of magnitudes without being unrealistic or excessively large. More importantly, the sample sizes of the exact approach are consistently larger than or equal to those of the approximate procedure in all 6 cases. To facilitate comparison of the accuracy of the power functions, the estimated (attained) powers are also summarized in Table 1. Because sample sizes are integers, the estimated powers of both the exact and approximate procedures are marginally larger than the target power of 0.80 in all cases.
In the second stage, Monte Carlo simulation studies were performed to evaluate the performance of the power and sample size calculations. With the computed sample sizes, parameter configurations, and nominal alpha level, estimates of the true power were computed via Monte Carlo simulation of 10,000 independent data sets. For each replicate, N predictor values were generated from the selected normal distribution. The generated predictor values then determined the mean responses, from which normal responses were generated according to the specified linear regression model. Next, the equivalence test statistics were computed, and the simulated power was the proportion of the 10,000 replicates whose null hypothesis was rejected at the significance level 0.05. The adequacy of the approximate and exact sample size procedures is then judged by the error (= simulated power – estimated power) between the simulated power of the Monte Carlo study and the estimated power computed from the analytic power function. The simulated powers and errors are also presented in Table 1.
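A single-configuration version of this simulation can be sketched as follows. It is a hypothetical standard-library implementation, not the code behind Table 1: predictors are drawn from N(μX, σX²), responses from the linear model, and each replicate is decided with the Anderson and Hauck p-value of Section 2.1, which gives the same decision as comparing |TO| with τS because the p-value increases with |TO|. The sample size in the usage line is illustrative only, not a value from Table 1.

```python
import math
import random


def t_interval_prob(lo: float, hi: float, df: int, n: int = 200) -> float:
    """P{lo < T < hi} for central T ~ t(df) by composite Simpson's rule."""
    if hi <= lo:
        return 0.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    half = (df + 1) / 2
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        x = lo + i * h
        total += w * math.exp(log_c - half * math.log1p(x * x / df))
    return total * h / 3


def simulate_power(n, beta0, beta1, sigma2, mu_x, sigma2_x, delta_m, delta,
                   alpha=0.05, reps=10_000, seed=2024):
    """Proportion of simulated data sets in which nonequivalence is rejected."""
    rng = random.Random(seed)
    nu = n - 2
    sd_x, sd_e = math.sqrt(sigma2_x), math.sqrt(sigma2)
    rejections = 0
    for _ in range(reps):
        # Random predictors first, then normal responses around the regression line.
        x = [rng.gauss(mu_x, sd_x) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sd_e) for xi in x]
        xbar, ybar = sum(x) / n, sum(y) / n
        ssx = sum((xi - xbar) ** 2 for xi in x)
        b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
        b0 = ybar - b1 * xbar
        sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(sse / nu / ssx)        # (sigma_hat^2 / SSX)^(1/2)
        a = abs((b1 - delta_m) / se)          # |T_O|
        lam_hat = delta / se                  # lambda_hat_S
        # Anderson-Hauck p-value; p < alpha is the same decision as |T_O| < tau_S.
        if t_interval_prob(-a - lam_hat, a - lam_hat, nu) < alpha:
            rejections += 1
    return rejections / reps


# Section 2.5 configuration, Delta_M = 0.5 and Delta = 0.4, with an illustrative N.
print(simulate_power(30, 4.1033, 0.4980, 70.5615, 52.0, 223.7778, 0.5, 0.4,
                     reps=2000))
```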
The results reveal that the exact approaches are highly accurate: the associated errors of the 6 cases all lie within the small range of –0.0055 to 0.0075, so there is close agreement between the simulated and estimated powers of the exact approaches for these settings. On the other hand, the simulated powers for the approximate methods are consistently less than the estimated powers. Specifically, the resulting errors are {–0.0167, –0.0210, –0.0306} and {–0.0057, –0.0069, –0.0177} for the linear trend and mean response, respectively. Although some of the differences are not substantial, they imply that the approximate power functions do not give reliable results for small sample sizes. In short, the adequacy of the approximate power and sample size calculations varies with model configurations, whereas the exact techniques are more reliable and accurate than the approximate methods for all cases of linear trend and mean response considered here.
3 Two Regression Lines
The two-group nonparallel simple linear regression model is expressed as
\[ Y_{1i} = \beta_{01} + \beta_{11} X_{1i} + \varepsilon_{1i} \quad \text{and} \quad Y_{2j} = \beta_{02} + \beta_{12} X_{2j} + \varepsilon_{2j}, \]
where ε1i and ε2j are iid N(0, σ²) random variables, i = 1, …, N1, and j = 1, …, N2. Note that a traditional ANCOVA model assumes that the regression slopes are equal, β11 = β12. Accordingly, a test of slope equality is generally required to justify the use of ANCOVA.
Standard results show that the least squares estimators \({\hat{\upbeta}}_{11}\) and \({\hat{\upbeta}}_{12}\) of the slope coefficients β11 and β12 have the distributions
\[ \hat\beta_{11} \sim N\left(\beta_{11}, \frac{\sigma^2}{SSX_1}\right) \quad \text{and} \quad \hat\beta_{12} \sim N\left(\beta_{12}, \frac{\sigma^2}{SSX_2}\right), \]
where \(SSX_{1} = \sum\nolimits_{i = 1}^{N_{1}} (X_{1i} - \overline{X}_{1})^{2}\), \(SSX_{2} = \sum\nolimits_{j = 1}^{N_{2}} (X_{2j} - \overline{X}_{2})^{2}\), \(\overline{X}_{1} = \sum\nolimits_{i = 1}^{N_{1}} X_{1i}/N_{1}\), and \(\overline{X}_{2} = \sum\nolimits_{j = 1}^{N_{2}} X_{2j}/N_{2}\). The difference of the two slope estimators has the distribution
\[ \hat\beta_D = \hat\beta_{11} - \hat\beta_{12} \sim N(\beta_D, \sigma^2 H_{DS}), \]
where βD = β11 – β12 and HDS = 1/SSX1 + 1/SSX2. In this case, \({\hat{\upsigma }}^{{2}} = SSE/\nu_{D}\) is the usual unbiased estimator of σ2 and V = SSE/σ2 ~ χ2(νD) where SSE is the error sum of squares and νD = N1 + N2 – 4.
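To detect a difference between the two slope coefficients in terms of H0: βD = βD0 versus H1: βD ≠ βD0, the test statistic has the form
\[ T = \frac{\hat\beta_D - \beta_{D0}}{(\hat\sigma^2 H_{DS})^{1/2}}. \]
The null hypothesis is rejected at the significance level α if
\[ |T| > t_{\nu_D,\alpha/2}. \]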
3.1 Equivalence Test of Trend Effect
To conduct the equivalence test of trend effect, or slope difference, the null and alternative hypotheses are expressed as
\[ H_0: \beta_D \le \Delta_L \text{ or } \Delta_U \le \beta_D \quad \text{versus} \quad H_1: \Delta_L < \beta_D < \Delta_U, \]
where ΔL and ΔU are a priori constants that delimit the range within which the trend effect is declared equivalent. Under the model assumptions, it follows that
\[ T_{DS} = \frac{\hat\beta_D - \Delta_M}{(\hat\sigma^2 H_{DS})^{1/2}} \sim t(\nu_D, \lambda_{DS}), \]
where the noncentrality parameter is \(\lambda_{DS} = (\beta_D - \Delta_M)/(\sigma^2 H_{DS})^{1/2}\) and ΔM = (ΔL + ΔU)/2. To establish that the slope difference βD lies within the interval (ΔL, ΔU), a feasible rejection region for the null hypothesis is
\[ |T_{DS}| < \tau_{DS}, \]
where the critical value τDS is chosen to attain the nominal Type I error rate simultaneously when βD = ΔL and βD = ΔU. In practice, the exact distribution of TDS is unknown because it depends on σ², and the critical value τDS can be determined through the approximation
\[ P\{-\tau_{DS} < T + \hat\lambda_{DS} < \tau_{DS}\} = \alpha, \]
where T ~ t(νD), \(\hat{\lambda}_{DS} = \Delta/(\hat{\sigma}^{2} H_{DS})^{1/2}\), and Δ = (ΔU – ΔL)/2. The critical value τDS is a function of α, Δ, N1, N2, \({\hat{\upsigma}}^{2}\), and HDS. Although it does not have a closed-form expression, it can be computed by a simple iterative search.
As emphasized in Huitema [21], Kutner et al. [24], Rencher and Schaalje [25], and related texts of research methods, the traditional ANCOVA assumes that the slope coefficients associating the predictor variables with the response variables are the same for each treatment group. The assertion of homogeneous regression slopes implies a lack of interaction effects between a categorical moderator and a continuous predictor in moderation study. Note that the conventional difference test purports to show the regression lines are nonparallel. Hence, the suggested equivalence procedure for trend effect is more appropriate for supporting the equality or comparability of slope coefficients assumption in ANCOVA.
3.2 Equivalence Test of Simple Effect
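A related and practical scheme for comparing two regression lines is to assess the difference between two mean responses at a designated predictor value. The simple effect, or mean response difference between the two regression lines at XF, is defined as
\[ \mu_D = (\beta_{01} + X_F \beta_{11}) - (\beta_{02} + X_F \beta_{12}). \]
The equivalence test of simple effect is conducted under the null and alternative hypotheses
\[ H_0: \mu_D \le \Delta_L \text{ or } \Delta_U \le \mu_D \quad \text{versus} \quad H_1: \Delta_L < \mu_D < \Delta_U, \]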
where ΔL and ΔU are a priori constants that represent the minimal threshold for declaring essential equivalence.
Using the least squares estimators {\({\hat{\upbeta}}_{01}\), \({\hat{\upbeta}}_{11}\), \({\hat{\upbeta}}_{02}\), \({\hat{\upbeta}}_{12}\)} of the intercept and slope coefficients {β01, β11, β02, β12}, the estimated mean responses \(\hat{\upmu}_{1}\) and \(\hat{\upmu}_{2}\) for the mean values μ1 = β01 + XFβ11 and μ2 = β02 + XFβ12 at a specified value XF are
\[ \hat\mu_1 = \hat\beta_{01} + X_F \hat\beta_{11} \quad \text{and} \quad \hat\mu_2 = \hat\beta_{02} + X_F \hat\beta_{12}, \]
respectively. A natural unbiased estimator of μD is \(\hat{\upmu}_{D} = \hat{\upmu}_{1} - \hat{\upmu}_{2}\), and
\[ \hat\mu_D \sim N(\mu_D, \sigma^2 H_{DM}), \]
where HDM = 1/N1 + 1/N2 + (XF – \({\overline{X}}_{1}\))²/SSX1 + (XF – \({\overline{X}}_{2}\))²/SSX2. It is important to note that, under the model assumptions,
\[ T_{DM} = \frac{\hat\mu_D - \Delta_M}{(\hat\sigma^2 H_{DM})^{1/2}} \sim t(\nu_D, \lambda_{DM}), \]
where the noncentrality parameter is \(\lambda_{DM} = (\mu_D - \Delta_M)/(\sigma^2 H_{DM})^{1/2}\) and ΔM = (ΔL + ΔU)/2. To evaluate whether the simple effect μD lies within the interval (ΔL, ΔU), the suggested rejection region is
\[ |T_{DM}| < \tau_{DM}, \]
where the critical value τDM is chosen to attain the nominal Type I error rate simultaneously when μD = ΔL and μD = ΔU. It can be calculated through the approximation
\[ P\{-\tau_{DM} < T + \hat\lambda_{DM} < \tau_{DM}\} = \alpha, \]
where T ~ t(νD), \(\hat{\lambda}_{DM} = \Delta/({\hat{\upsigma}}^{2} H_{DM})^{1/2}\), and Δ = (ΔU – ΔL)/2. The critical value τDM is a function of α, Δ, N1, N2, \({\hat{\upsigma}}^{2}\), and HDM, and it must be calculated by an iterative search algorithm.
It should be noted that the equivalence analysis of the simple effect, or response difference between two linear regression lines, is closely related to the Johnson–Neyman problem of Johnson and Neyman [26] and Potthoff [27]. The Johnson–Neyman regions of significance and non-significance correspond to the predictor values at which the conventional hypothesis of no difference between mean responses is rejected or not rejected. Technical illustrations and implications can be found in Hunka [28], Rogosa [29], and Spiller, et al. [30], among others. In contrast, the proposed equivalence test of simple effect can be used to identify the regions of equivalence and nonequivalence, that is, the ranges of predictor values over which the simple effect is declared equivalent or nonequivalent.
3.3 An Application
The training study in Table 6.1 of Huitema [21] is used again to demonstrate the suggested equivalence tests of trend and simple effects between the first two treatments. In addition to the summary information of the first group, the second training group has N2 = 10, \(\overline{Y}_{2}\) = 39.0000, \({\overline{X}}_{2}\) = 47.0000, and SSX2 = 1798.00. The regression coefficient estimates are \(\{{\hat{\upbeta}}_{02}, {\hat{\upbeta}}_{12}\} = \{15.1863, 0.5067\}\), and the sample variance of error is \({\hat{\upsigma}}_{2}^{2}\) = 54.3025. It is readily obtained that \({\hat{\upbeta}}_{D} = {\hat{\upbeta}}_{11} - {\hat{\upbeta}}_{12} = -0.0087\) and that the pooled sample variance is \({\hat{\upsigma}}^{2}\) = 62.4320. The equivalence test of trend effect is formulated as H0: βD ≤ –0.25 or 0.25 ≤ βD versus H1: –0.25 < βD < 0.25 (ΔM = 0 and Δ = 0.25). For νD = 16 and α = 0.05, the test statistic is TDS = –0.0338 and the critical value is τDS = 0.1048. Hence, the nonequivalence null hypothesis is rejected at the significance level 0.05. This suggests that the two slope coefficients are virtually equivalent, with their difference within the range (–0.25, 0.25).
It is also of practical importance to assess the simple effect, or mean response difference between the two regression lines. At the predictor value XF = 50, the mean response difference is \(\hat{\upmu}_{D} = \hat{\upmu}_{1} - \hat{\upmu}_{2}\) = –11.5161. For illustration, the equivalence thresholds are set by ΔM = –11 and Δ = 5, and the equivalence test of simple effect is conducted for the hypotheses H0: μD ≤ –16 or –6 ≤ μD versus H1: –16 < μD < –6. With νD = 16 and α = 0.05, the test statistic and critical value are TDM = –0.1436 and τDM = 0.1684, respectively. Consequently, the nonequivalence null hypothesis is rejected at the significance level 0.05, and the mean response difference at XF = 50 is practically within 5 of –11. In view of the limited features of available software packages, computer programs are developed to facilitate the use of the proposed equivalence procedures for trend and simple effects.
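The summary quantities of this application can be recomputed from the reported group statistics with a few lines of Python. The sketch assumes that the pooled variance weights each group's error variance by its Ng − 2 = 8 error degrees of freedom (which reproduces σ̂² = 62.4320) and computes each mean response as Ȳg + β̂1g(XF − X̄g), which equals β̂0g + XFβ̂1g for least squares fits. The printed statistics agree with the reported TDS and TDM up to rounding of the published coefficients; the critical values τDS and τDM would then come from the iterative search of Sections 3.1 and 3.2 with νD = 16.

```python
import math

# Summary statistics of the first two training groups (Sections 2.3 and 3.3).
n1, n2 = 10, 10
xbar1, xbar2 = 52.0, 47.0
ybar1, ybar2 = 30.0, 39.0
ssx1, ssx2 = 2014.0, 1798.0
b11, b12 = 0.4980, 0.5067
s2_1, s2_2 = 70.5615, 54.3025       # within-group error variances
nu_d = n1 + n2 - 4                  # 16 error degrees of freedom

# Pooled error variance: each group contributes N_g - 2 = 8 degrees of freedom.
s2 = ((n1 - 2) * s2_1 + (n2 - 2) * s2_2) / nu_d

# Trend effect: Delta_M = 0, Delta = 0.25.
beta_d = b11 - b12
h_ds = 1 / ssx1 + 1 / ssx2
t_ds = (beta_d - 0.0) / math.sqrt(s2 * h_ds)
lam_ds = 0.25 / math.sqrt(s2 * h_ds)

# Simple effect at X_F = 50: Delta_M = -11, Delta = 5.
# Mean response = ybar + b1 * (X_F - xbar), identical to b0 + X_F * b1 for LS fits.
x_f = 50.0
mu_d = (ybar1 + b11 * (x_f - xbar1)) - (ybar2 + b12 * (x_f - xbar2))
h_dm = 1 / n1 + 1 / n2 + (x_f - xbar1) ** 2 / ssx1 + (x_f - xbar2) ** 2 / ssx2
t_dm = (mu_d - (-11.0)) / math.sqrt(s2 * h_dm)
lam_dm = 5.0 / math.sqrt(s2 * h_dm)

print(f"pooled s2 = {s2:.4f}, T_DS = {t_ds:.4f}, lambda_DS = {lam_ds:.4f}")
print(f"mu_D = {mu_d:.4f}, T_DM = {t_dm:.4f}, lambda_DM = {lam_dm:.4f}")
```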
3.4 Power and Sample Size Calculations
To elucidate the critical notion of accommodating the distributional properties of the predictor variables, the continuous covariates {X1i, i = 1, …, N1} and {X2j, j = 1, …, N2} are assumed to have the independent normal distributions N(μX1, \(\upsigma_{X1}^{2}\)) and N(μX2, \(\upsigma_{X2}^{2}\)), respectively. It can readily be established that K1 = SSX1/\(\upsigma_{X1}^{2}\) ~ χ²(κ1) and K2 = SSX2/\(\upsigma_{X2}^{2}\) ~ χ²(κ2), where κ1 = N1 – 1 and κ2 = N2 – 1.
Under the unconditional setting, the power function for trend effect is expressed as
\[ \Pi_{DS} = P\{-\tau_{DS} < T_{DS} < \tau_{DS}\}. \]
Note that the critical value τDS depends on the two statistics \({\hat{\upsigma}}^{2}\) and HDS. With \({\hat{\upsigma}}^{2}\) = σ²(V/νD) and HDS = 1/(\(\upsigma_{X1}^{2}\)K1) + 1/(\(\upsigma_{X2}^{2}\)K2), the power function ΠDS can be rewritten as
\[ \Pi_{DS} = E_{(K_1,K_2,V)}\left[\Phi(B_{DS}) - \Phi(A_{DS})\right], \]
where \(B_{DS} = (\Delta_M - \beta_D)/(\sigma^2 H_{DS})^{1/2} + \tau_{DS}(V/\nu_D)^{1/2}\), \(A_{DS} = (\Delta_M - \beta_D)/(\sigma^2 H_{DS})^{1/2} - \tau_{DS}(V/\nu_D)^{1/2}\), and E(K1, K2, V) is taken with respect to the joint distribution of K1, K2, and V.
Moreover, the normality assumptions of predictor variables imply that
where \(\hat{\upsigma}_{Xg}^{2}\) = SSXg/κg and λXg = (μXg – XF)/(\(\upsigma_{Xg}^{2}\)/Ng)1/2 for g = 1 and 2. Following the prescribed power function ΠDS, the power function for mean response difference is presented as
Note that the critical value τDM depends on the two terms \({\hat{\upsigma }}^{{2}}\) and HDM. With \({\hat{\upsigma }}^{{2}}\) = σ2(V/νD), HDM = 1/N1 + 1/N2 + \(T_{X1}^{2}\)/(κ1N1) + \(T_{X2}^{2}\) /(κ2N2), the power function has the alternative form
where BDM = (ΔM – μD)/(σ²HDM)\(^{1/2}\) + τDM(V/νD)\(^{1/2}\), ADM = (ΔM – μD)/(σ²HDM)\(^{1/2}\) – τDM(V/νD)\(^{1/2}\), and the expectation E(TX1, TX2, V) is taken with respect to the joint distribution of TX1, TX2, and V.
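The TXg form of HDM is the familiar variance factor 1/N1 + 1/N2 + Σg (X̄g − XF)²/SSXg in disguise. Since \(\hat{\upsigma}_{Xg}^{2}\) = SSXg/κg, we have \(T_{Xg}^{2}\) = Ng κg(X̄g − XF)²/SSXg, so \(T_{Xg}^{2}\)/(κgNg) = (X̄g − XF)²/SSXg. The short check below confirms the identity on simulated predictor values. The group sizes 10 and 12, the seed, and the use of the Section 3.5 means and variances are arbitrary illustrative choices, and the function names are hypothetical.

```python
import random


def h_dm_direct(x1, x2, x_f):
    """1/N1 + 1/N2 + sum over groups of (Xbar_g - XF)^2 / SSX_g."""
    total = 0.0
    for xs in (x1, x2):
        n = len(xs)
        xbar = sum(xs) / n
        ssx = sum((x - xbar) ** 2 for x in xs)
        total += 1.0 / n + (xbar - x_f) ** 2 / ssx
    return total


def h_dm_from_t(x1, x2, x_f):
    """Same quantity via T_Xg = (Xbar_g - XF) / (s_g^2 / N_g)^(1/2)."""
    total = 0.0
    for xs in (x1, x2):
        n = len(xs)
        kappa = n - 1
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / kappa   # SSX_g / kappa_g
        t_x = (xbar - x_f) / (s2 / n) ** 0.5
        total += 1.0 / n + t_x ** 2 / (kappa * n)
    return total


rng = random.Random(2024)
x1 = [rng.gauss(52.0, 223.7778 ** 0.5) for _ in range(10)]
x2 = [rng.gauss(47.0, 199.7778 ** 0.5) for _ in range(12)]
print(h_dm_direct(x1, x2, 50.0), h_dm_from_t(x1, x2, 50.0))
```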
It is also tempting to simplify the unconditional distributions of the equivalence test statistics for comparing slope coefficients and mean responses. A straightforward approach is to replace the two means {\({\overline{X}}_{{1}}\), \({\overline{X}}_{2}\)} and the sums of squares {SSX1, SSX2} with their expected values E[\({\overline{X}}_{{1}}\)] = μX1, E[\({\overline{X}}_{2}\)] = μX2, E[SSX1] = κ1\(\upsigma_{X1}^{2}\), and E[SSX2] = κ2\(\upsigma_{X2}^{2}\). Thus, an approximate power function for the equivalence test of trend effect is
where λDSA = (βD – ΔM)/(σ²HDSA)\(^{1/2}\) and HDSA = 1/(κ1\(\upsigma_{X1}^{2}\)) + 1/(κ2\(\upsigma_{X2}^{2}\)). Similarly, the approximate power function for the equivalence test of simple effect is
where λDMA = (μD – ΔM)/(σ²HDMA)\(^{1/2}\) and HDMA = 1/N1 + 1/N2 + (μX1 – XF)²/(κ1\(\upsigma_{X1}^{2}\)) + (μX2 – XF)²/(κ2\(\upsigma_{X2}^{2}\)). Empirical examinations are conducted below to demonstrate the critical differences between the exact and approximate power functions, which incorporate different levels of information about the predictor variables.
3.5 Numerical Investigations
The model configurations of the first two groups of the training study in Huitema [21] provide a convenient framework for the subsequent simulation study of trend effect and simple effect. For illustration, the key statistics of the response and predictor variables are treated as population parameters, representing potential settings for power calculations and sample size determinations in future investigations. Specifically, the regression coefficients are {β01, β11} = {4.1033, 0.4980} and {β02, β12} = {15.1863, 0.5067}, and the common error variance is σ² = 62.4320. The means and variances of the two predictor variables are {μX1, \(\upsigma_{X1}^{2}\)} = {52.00, 223.7778} and {μX2, \(\upsigma_{X2}^{2}\)} = {47.00, 199.7778}.
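With these parameter values, the approximate quantities HDSA, HDMA, λDSA, and λDMA of Section 3.4 reduce to simple arithmetic. The sketch below evaluates them for the slope difference βD = β11 − β12 and the mean response difference μD = −11.5161 reported for this configuration. It uses ΔM = 0 for the trend effect and ΔM = −11 at XF = 50 for the simple effect. N1 = N2 = 10 is an assumption made here for illustration. With that choice, λDMA essentially reproduces the TDM = −0.1436 of the earlier example, as one would expect if these parameter values are the sample statistics of that example. Function names are hypothetical.

```python
import math


def lambda_dsa(beta_d, delta_m, sigma2, var_x1, var_x2, n1, n2):
    """Approximate noncentrality for the trend effect (E[SSXg] = kappa_g * var_xg)."""
    h_dsa = 1.0 / ((n1 - 1) * var_x1) + 1.0 / ((n2 - 1) * var_x2)
    return (beta_d - delta_m) / math.sqrt(sigma2 * h_dsa)


def lambda_dma(mu_d, delta_m, sigma2, mu_x1, var_x1, mu_x2, var_x2, x_f, n1, n2):
    """Approximate noncentrality for the simple effect at X = x_f."""
    h_dma = (1.0 / n1 + 1.0 / n2
             + (mu_x1 - x_f) ** 2 / ((n1 - 1) * var_x1)
             + (mu_x2 - x_f) ** 2 / ((n2 - 1) * var_x2))
    return (mu_d - delta_m) / math.sqrt(sigma2 * h_dma)


sigma2 = 62.4320
beta_d = 0.4980 - 0.5067          # beta_11 - beta_12 = -0.0087
mu_d = -11.5161                   # reported mean response difference at XF = 50
lam_s = lambda_dsa(beta_d, 0.0, sigma2, 223.7778, 199.7778, 10, 10)
lam_m = lambda_dma(mu_d, -11.0, sigma2, 52.0, 223.7778, 47.0, 199.7778, 50.0, 10, 10)
print(round(lam_s, 4), round(lam_m, 4))
```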
As in the preceding scenario of linear trend and mean response, the numerical investigations consist of the determination of optimal sample sizes and a simulation study of power calculations. Throughout the empirical examinations, the Type I error rate and nominal power are fixed at α = 0.05 and 1 – β = 0.80, respectively. First, the trend effect, or the slope difference between the two regression lines, is βD = –0.0087. Because this difference is close to zero, the equivalence tests of trend effect use ΔM = 0 with thresholds Δ = 0.2, 0.3, and 0.4. Second, the mean responses of the two treatment levels at XF = 50 are μ1 = 29.0040 and μ2 = 40.5200, respectively, and their difference is μD = –11.5161. Accordingly, the equivalence tests of simple effect are performed for ΔM = –11 and Δ = 4, 5, and 6. The optimal sample sizes of both the exact and the approximate methods were determined for the chosen power and significance level under balanced and unbalanced designs with r = N1/N2 = 1 and 2. The computed sample sizes for the equivalence tests of trend effect and simple effect are presented in Tables 2 and 3, respectively. The results show a general pattern: the approximate formulas tend to give smaller sample sizes than the exact techniques. Balanced designs require fewer total observations to achieve the nominal power than unbalanced designs. Also, the computed sample size decreases as the threshold Δ increases.
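Determining an optimal sample size amounts to an iterative search for the smallest design whose power reaches the nominal level. The sketch below shows that search for a fixed allocation ratio r = N1/N2. The power function is passed in as a callable, so an exact power routine could be plugged in. The `toy_power` function is a hypothetical, monotone stand-in used only to exercise the search; it is not any power function from the article. The minimum N2 = 3 and the upper limit are also arbitrary assumptions.

```python
import math
from typing import Callable, Tuple


def smallest_sample_size(power: Callable[[int, int], float], ratio: int = 1,
                         target: float = 0.80, n2_min: int = 3,
                         n2_max: int = 100_000) -> Tuple[int, int]:
    """Smallest (N1, N2) with N1 = ratio * N2 whose power reaches the target."""
    for n2 in range(n2_min, n2_max + 1):
        n1 = ratio * n2
        if power(n1, n2) >= target:
            return n1, n2
    raise ValueError("target power not reached within n2_max")


def toy_power(n1: int, n2: int) -> float:
    """Hypothetical monotone stand-in for an exact power function."""
    return 1.0 - math.exp(-(n1 * n2 / (n1 + n2)) / 10.0)


print(smallest_sample_size(toy_power, ratio=1), smallest_sample_size(toy_power, ratio=2))
```

With this toy function, the balanced design needs 66 observations in total, (33, 33), while r = 2 needs 75, (50, 25). This mirrors, in a purely illustrative way, the pattern that balanced designs are more economical.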
To assess the accuracy of the sample size calculations, Monte Carlo simulation studies with 10,000 replications were conducted to obtain simulated powers, which are compared with the estimated powers at the optimal sample sizes. These power values and the associated errors are also presented in the tables. As can be seen from the reported deviations, the exact approaches for trend effect and simple effect maintain small errors in power computation. In contrast, the approximate methods are less accurate than their exact counterparts, and their performance deteriorates as the sample size decreases. Specifically, the two errors associated with the largest thresholds (Δ = 0.4 in Table 2 and Δ = 6 in Table 3) are {–0.0301, –0.0360} and {–0.0172, –0.0157}, respectively. The usefulness of the approximate methods is undermined by their tendency to underestimate sample sizes and overstate power levels. According to these findings, the exact power functions and sample size procedures are recommended for general use. The implementation of the suggested power evaluation and sample size determination requires specialized programs not currently available in prevailing statistical packages. Thus, the accompanying computer algorithms are provided for conducting the suggested power and sample size calculations.
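A simulated power of the kind used to validate the tables can be obtained by repeatedly generating data under the random-predictor model, refitting both lines, and recording how often the equivalence test rejects. The sketch below does this for the trend effect. It rests on several assumptions. The rejection rule is the Anderson and Hauck p-value F_ν(|T| − Δ/SE) − F_ν(−|T| − Δ/SE) < α, with νD = N1 + N2 − 4. The group sizes of 10 are illustrative. The parameter values are those of Section 3.5. It runs far fewer replications than the 10,000 used in the study, and the helper names are hypothetical.

```python
import math
import random


def t_prob(lo, hi, df, n=100):
    """P(lo < T < hi) for a central t variable, via composite Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def dens(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = (hi - lo) / n
    s = dens(lo) + dens(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * dens(lo + i * h)
    return s * h / 3


def fit_line(xs, ys):
    """Least-squares slope, error sum of squares, and SSX for one group."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    ssx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ssx
    sse = sum((y - ybar - b1 * (x - xbar)) ** 2 for x, y in zip(xs, ys))
    return b1, sse, ssx


def simulated_power_trend(groups, sigma2, delta_m, delta, alpha=0.05,
                          reps=2000, seed=1):
    """Share of replications in which the AH-type trend test rejects H0.

    groups: two (beta0, beta1, mu_x, var_x, n) tuples, one per regression line.
    """
    rng = random.Random(seed)
    sd_e = math.sqrt(sigma2)
    nu_d = groups[0][4] + groups[1][4] - 4          # assumed error df
    rejections = 0
    for _ in range(reps):
        fits = []
        for b0, b1, mu_x, var_x, n in groups:
            xs = [rng.gauss(mu_x, math.sqrt(var_x)) for _ in range(n)]  # random predictors
            ys = [b0 + b1 * x + rng.gauss(0.0, sd_e) for x in xs]
            fits.append(fit_line(xs, ys))
        (s1, sse1, ssx1), (s2, sse2, ssx2) = fits
        se = math.sqrt((sse1 + sse2) / nu_d * (1.0 / ssx1 + 1.0 / ssx2))
        a = abs((s1 - s2 - delta_m) / se)
        # Reject when F(|T| - delta/se) - F(-|T| - delta/se) < alpha.
        if t_prob(-a - delta / se, a - delta / se, nu_d) < alpha:
            rejections += 1
    return rejections / reps


groups = [(4.1033, 0.4980, 52.0, 223.7778, 10), (15.1863, 0.5067, 47.0, 199.7778, 10)]
print(simulated_power_trend(groups, 62.4320, 0.0, 0.3, reps=1000))
```

Comparing such simulated powers with the values from an exact power routine at the same design is the kind of check summarized by the error columns of Tables 2 and 3.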
4 Conclusions
The concept and theory of equivalence have been widely applied in pharmaceutical sciences and related medical fields. Equivalence testing procedures are also potentially useful in the behavioral and psychological sciences. The intuitive rationale and computational simplicity of TOST provide an important motivation to apply appropriate statistical tools for equivalence assessment, rather than traditional hypothesis tests that aim to detect whether treatment groups differ significantly from one another. Despite its ready applicability, the TOST is generally conservative, and its true Type I error rate can be substantially less than the nominal level for close equivalence bounds and small sample sizes. In contrast, the Anderson and Hauck procedure and other more powerful equivalence tests generally have rejection regions with reasonably controlled significance levels.
Within the context of linear regression, one and two regression lines represent two major scenarios in research appraising regression slopes. In particular, the TOST has been applied to assess whether a linear trend is practically negligible in ecological and environmental studies. In view of the potential limitations of TOST, this study presents extended Anderson and Hauck procedures for equivalence assessment in linear regression analysis. Specifically, equivalence tests are proposed for evaluating the linear trend and mean response of a single regression line, and the trend effect and simple effect between two regression lines. The hypotheses are constructed with asymmetric equivalence bounds and, therefore, can be readily applied to a wide variety of equivalence problems about regression slopes and mean responses.
Moreover, to enhance the usefulness of the suggested procedures, the advanced issues of power and sample size calculations are also investigated. The proposed power and sample size procedures are derived under the random regression framework and have the distinct feature of accounting for the embedded uncertainty of the predictor variables. It is essential to note that the recommended approaches involve statistical evaluations and iterative algorithms not currently available in statistical packages. A full set of computer programs has been developed for implementing the suggested equivalence tests, power calculations, and sample size determinations. These research findings expand the conceptual understanding and theoretical development of the Anderson and Hauck procedure for equivalence assessment in linear regression analysis.
Availability of Data and Materials
The data are presented in the article.
References
Cashen LH, Geiger SW (2004) Statistical power and the testing of null hypotheses: a review of contemporary management research and recommendations for future studies. Organ Res Methods 7:151–167. https://doi.org/10.1177/1094428104263676
Cortina JM, Folger RG (1998) When is it acceptable to accept a null hypothesis: no way, Jose? Organ Res Methods 1:334–350. https://doi.org/10.1177/109442819813004
Edwards JR, Berry JW (2010) The presence of something or the absence of nothing: increasing theoretical precision in management research. Organ Res Methods 13:668–689. https://doi.org/10.1177/1094428110380467
Frick RW (1995) Accepting the null hypothesis. Mem Cognit 23:132–138. https://doi.org/10.3758/bf03210562
Rogers JL, Howard KI, Vessey JT (1993) Using significance tests to evaluate equivalence between two experimental groups. Psychol Bull 113:553–565. https://doi.org/10.1037/0033-2909.113.3.553
Seaman MA, Serlin RC (1998) Equivalence confidence intervals for two-group comparisons of means. Psychol Methods 3:403–411. https://doi.org/10.1037/1082-989x.3.4.403
Stanton JM (2021) Evaluating equivalence and confirming the null in the organizational sciences. Organ Res Methods 24:491–512. https://doi.org/10.1177/1094428120921934
Stegner BL, Bostrom AG, Greenfield TK (1996) Equivalence testing for use in psychological and service research: an introduction with examples. Eval Program Plann 19:193–198. https://doi.org/10.1016/0149-7189(96)00011-0
Steiger JH (2004) Beyond the F test: effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychol Methods 9:164–182. https://doi.org/10.1037/1082-989x.9.2.164
Berger RL, Hsu JC (1996) Bioequivalence trials, intersection-union tests and equivalence confidence sets (with discussion). Stat Sci 11:283–319. https://doi.org/10.1214/ss/1032280304
Meyners M (2012) Equivalence tests – a review. Food Qual Prefer 26:231–245. https://doi.org/10.1016/j.foodqual.2012.05.003
Schuirmann DJ (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J Pharmacokinet Biopharm 15:657–680. https://doi.org/10.1007/bf01068419
Schuirmann DL (1981) On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics 37:617
Westlake WJ (1981) Response to T.B.L. Kirkwood: bioequivalence testing – a need to rethink. Biometrics 37:589–594
Anderson S, Hauck WW (1983) A new procedure for testing equivalence in comparative bioavailability and other clinical trials. Commun Stat Theory Methods 12:2663–2692. https://doi.org/10.1080/03610928308828634
Hauck WW, Anderson S (1984) A new statistical procedure for testing equivalence in two-group comparative bioavailability trials. J Pharmacokinet Biopharm 12:83–91. https://doi.org/10.1007/bf01063612
Dixon PM, Pechmann JHK (2005) A statistical test to show negligible trend. Ecology 86:1751–1756. https://doi.org/10.1890/04-1343
Schmidt BR, Meyer AH (2008) On the analysis of monitoring data: testing for no trend in population size. J Nat Conserv 16:157–163. https://doi.org/10.1016/j.jnc.2008.05.001
Counsell A, Cribbie RA (2015) Equivalence tests for comparing correlation and regression coefficients. Br J Math Stat Psychol 68:292–309. https://doi.org/10.1111/bmsp.12045
Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions, vol 2. Wiley, New York
Huitema B (2011) The analysis of covariance and alternatives: Statistical methods for experiments, quasi-experiments, and single-case studies, vol 608. Wiley, New York, NY
Shieh G (2017) On tests of treatment-covariate interactions: An illustration of appropriate power and sample size calculations. PLoS ONE 12:e0177682. https://doi.org/10.1371/journal.pone.0177682
Shieh G (2020) Power analysis and sample size planning in ANCOVA designs. Psychometrika 85:101–120. https://doi.org/10.1007/s11336-019-09692-3
Kutner MH, Nachtsheim CJ, Neter J, Li W (2005) Applied linear statistical models, 5th edn. McGraw Hill, New York, NY
Rencher AC, Schaalje GB (2007) Linear models in statistics, 2nd edn. Wiley, Hoboken, NJ
Johnson PO, Neyman J (1936) Tests of certain linear hypotheses and their application to some educational problems. Stat Res Mem 1:57–93
Potthoff RF (1964) On the Johnson-Neyman technique and some extensions thereof. Psychometrika 29:241–256. https://doi.org/10.1007/bf02289721
Hunka S (1995) Identifying regions of significance in ANCOVA problems having non-homogeneous regressions. Br J Math Stat Psychol 48:161–188. https://doi.org/10.1111/j.2044-8317.1995.tb01056.x
Rogosa D (1980) Comparing nonparallel regression lines. Psychol Bull 88:307–321. https://doi.org/10.1037/0033-2909.88.2.307
Spiller SA, Fitzsimons GJ, Lynch JG Jr, McClelland GH (2013) Spotlights, floodlights, and the magic number zero: simple effects tests in moderated regression. J Mark Res 50:277–288. https://doi.org/10.1509/jmr.12.0420
Funding
Open Access funding enabled and organized by National Yang Ming Chiao Tung University. This work was supported by a grant from the Ministry of Science and Technology (MOST-111–2410-H-A49-034-MY3).
Ethics declarations
Conflict of interest
The author declares that he has no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shieh, G. The Extended Anderson and Hauck Tests and Sample Size Procedures for Equivalence Assessment in Simple Linear Regressions. J Stat Theory Pract 18, 36 (2024). https://doi.org/10.1007/s42519-024-00382-7