The Extended Anderson and Hauck Tests and Sample Size Procedures for Equivalence Assessment in Simple Linear Regressions

Shieh, Gwowen

doi:10.1007/s42519-024-00382-7


Original Article
Open access
Published: 26 June 2024

Volume 18, article number 36, (2024)

Journal of Statistical Theory and Practice


Gwowen Shieh ORCID: orcid.org/0000-0001-8611-4495¹


Abstract

This study describes extended Anderson and Hauck procedures for equivalence testing of slope coefficients and mean responses in one and two regression lines. The general formulation of asymmetric equivalence ranges permits a wide variety of equivalence questions to be tested for a target magnitude or a negligible value. Specifically, the equivalence tests are useful for assessing negligible trend and similar response in a single regression line, and for evaluating unimportant interaction-moderation effect and comparable simple effect between two linear regression lines. The associated power functions and sample size procedures are also derived and compared under the random and fixed model settings. According to the analytic justification and empirical assessment, the exact approaches have a clear advantage over the approximate formulas for accommodating the full stochastic nature of both the response and predictor variables. Computer algorithms are also provided for conducting the proposed equivalence tests, power calculations, and sample size determinations in simple linear regressions.


1 Introduction

Many studies are designed explicitly to show that there is an absence of effects of competing scenarios or theories. However, they sometimes base their findings on failing to reject a null hypothesis rather than confirming a hypothesis of equivalence. For comparison of treatment effects, the traditional hypothesis test of difference aims to determine whether the treatment effects differ from one another. Under such conditions, the traditional difference tests are inappropriate to establish equivalence, because failing to reject a no-difference hypothesis does not necessarily support the conclusion of equivalence. There has been a growing awareness of, and demand for, appropriate techniques for assessing equivalence and similarity in the behavioral and managerial literature. For example, related discussions of theoretical perspectives and practical issues can be found in Cashen and Geifer [1], Cortina and Folger [2], Edward and Berry [3], Frick [4], Rogers, Howard, and Vessey [5], Seaman and Serlin [6], Stanton [7], Stegner, Bostrom, and Greenfield [8], and Steiger [9], among others.

To assess an observed effect size that is clinically negligible or practically non-important, the recommended equivalence test is to ascertain whether the observed effect size falls inside the selected equivalence range. The technical discussion and fundamental review of different types of mean equivalence tests were presented in Berger and Hsu [10], Meyners [11], and Schuirmann [12]. Although more powerful tests exist, two prominent procedures have received considerable attention in the literature. They are the two one-sided tests (TOST) method of Schuirmann [13] and Westlake [14] and the equivalence approach of Anderson and Hauck [15] and Hauck and Anderson [16]. These two procedures of mean equivalence admit a simple methodological reform for assessing equivalence. Their flexible settings allow generalizations to more complex experimental designs. Accordingly, Dixon and Pechmann [17], and Schmidt and Meyer [18] have extended the TOST to assess whether the linear trend is practically negligible in linear regressions. Also, Counsell and Cribbie [19] described an extension of the Anderson and Hauck procedure for comparing the slope coefficients of two regression lines.

Despite its conservative nature, TOST maintains good control of the Type I error rate at the specified level. However, the actual Type I error rate of TOST can be substantially less than the nominal level and the rejection region can be empty when the equivalence ranges are narrow, particularly with small sample sizes. Across the practical and diverse research designs for equivalence assessment, the undertaken equivalence bounds and associated sample sizes may not be all that large. Under such circumstances, it is of methodological concern to consider alternative procedures with a proper rejection region and good Type I error control. On the other hand, the normal approximation presented in Counsell and Cribbie [19] for p-value calculations is only one of the three possible methods proposed in Anderson and Hauck [15]. Following the results of an extensive simulation study, Anderson and Hauck [15] recommended the central-t approach, instead of the least accurate normal approximation. In view of the absence of vital clarification for theory development and supportive technique, it is desirable to properly generalize the Anderson and Hauck procedure for linear regression analysis.

The present article aims to contribute to the development of equivalence methodology for linear regressions in three aspects. First, using the central-t approximation, extended Anderson and Hauck procedures are presented for equivalence testing of slope coefficient and mean response in one and two regression lines. The general formulation of asymmetric equivalence ranges permits a wide range of equivalence questions to be tested. Consequently, they are useful for assessing negligible trend and similar response in a single regression line, and for evaluating unimportant interaction-moderation effect and comparable simple effect between two linear regression lines. Second, the associated power functions and sample size procedures are also derived and compared under the random and fixed model settings. According to the analytic justification and empirical assessment, the exact approaches have a clear advantage over the approximate formulas for accommodating the full stochastic nature of both the response and predictor variables. It should be noted that exact power and sample size calculations were not addressed in Counsell and Cribbie [19]. Third, the proposed equivalence techniques are not available in popular software packages. Computer algorithms are provided for critical value computations, power calculations, and sample size determinations of the extended Anderson and Hauck procedures. The suggested power and sample size calculations should be useful for planning equivalence studies about the much-discussed appraisals of interaction-moderation effect and simple effect in behavioral and management research.

2 Single Regression Line

The simple linear regression model is of the form

$$Y_{i} = {\upbeta }_{0} + X_{i} {\upbeta }_{{1}} + \varepsilon_{i},$$

(1)

where Y_i is the response score of the ith subject, β₀ is the intercept, β₁ is the slope coefficient, X_i is the predictor score of the ith subject, and ε_i are iid N(0, σ²) random variables, i = 1, …, N. The least squares estimator ${\hat{\upbeta}}_{{1}}$ of slope coefficient β₁ has the following distribution

$$\hat{\beta} _{1} \sim N(\beta _{1} ,\sigma ^{2} /SSX),$$

(2)

where $SSX = \sum\nolimits_{{i = 1}}^{N} {(X_{i} - {\bar{X}})^{2} }$ and ${\bar{X}} = \sum\nolimits_{{i = 1}}^{N} {X_{i} /N}$. Also, ${\hat{\upsigma}}^{2} = SSE/\nu$ is the usual unbiased estimator of σ² where SSE is the error sum of squares and ν = N – 2. Moreover, V = SSE/σ² ~ χ²(ν), where χ²(ν) denotes a chi-square distribution with ν degrees of freedom.

To detect the difference of slope coefficient in terms of H₀: β₁ = β₁₀ versus H₁: β₁ ≠ β₁₀, the test statistic has the form

$$T_{{S0}} = \frac{\hat{\beta}_{1} - \beta_{10}}{(\hat{\sigma}^2/SSX)^{1/2}}$$

(3)

The null hypothesis is rejected at the significance level α if

$$|T_{S0} | \, > t_{\nu ,\alpha /2}$$

(4)

where $t_{\nu ,\alpha /2}$ is the 100(1 – α/2)th percentile of t(ν) and t(ν) is a t distribution with ν degrees of freedom.

2.1 Equivalence Test of Linear Trend

The primary focus of this article is the test of equivalence, for which the null and alternative hypotheses are expressed as

$${\text{H}}_{0} :{\upbeta }_{{1}} \le \Delta_{L} {\text{ or }}\Delta_{U} \le {\upbeta }_{{1}} {\text{ versus H}}_{{1}} :\Delta_{L} < {\upbeta }_{{1}} < \Delta_{U},$$

(5)

where Δ_L and Δ_U are a priori constants that represent the minimal range for declaring equivalence effect size. The hypotheses with asymmetric equivalence thresholds can be readily rewritten in terms of symmetric equivalence bounds as

$${\text{H}}_{0} :\upbeta_{1}^{*} \le {-}\Delta {\text{ or }}\Delta \le \upbeta_{1}^{*} {\text{ versus H}}_{{1}} : \, {-}\Delta < \upbeta_{1}^{*} < \Delta ,$$

(6)

where $\upbeta_{1}^{*}$ = β₁ – Δ_M, Δ_M = (Δ_L + Δ_U)/2, and Δ = (Δ_U – Δ_L)/2. An important scenario is to detect a negligible trend by setting Δ_U = Δ and Δ_L = – Δ so that Δ_M = 0 for a bound Δ.
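As a quick worked instance of this reparameterization, using the bounds that appear in the numerical example of Section 2.3, the asymmetric range (Δ_L, Δ_U) = (0.25, 0.75) translates into

$$\Delta_{M} = \frac{0.25 + 0.75}{2} = 0.50,\qquad \Delta = \frac{0.75 - 0.25}{2} = 0.25,\qquad \beta_{1}^{*} = \beta_{1} - 0.50,$$

so that the statement 0.25 < β₁ < 0.75 is the same as –0.25 < β₁* < 0.25.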

For the given value of the predictor quantity SSX, it is essential to note that

$$T_{S} = \frac{\hat{\upbeta}_{1} - \Delta_M } { ({\hat{\upsigma}}^{2} /SSX)^{1/2} } \sim t(\nu ,\lambda_{S}),$$

(7)

where t(ν, λ_S) is the noncentral t distribution with degrees of freedom ν and noncentrality parameter λ_S = (β₁ – Δ_M)/(σ²/SSX)^1/2. To claim the slope coefficient β₁ is within the interval (Δ_L, Δ_U), a natural rejection region to the null hypothesis is

$$\{ \uptau_{SL} < T_{S} < \uptau_{SU} \} ,$$

where the two critical values τ_SL and τ_SU are chosen to simultaneously attain the nominal Type I error rate

$$P\{ \uptau_{SL} < T_{S} < \uptau_{SU} |{\upbeta }_{{1}} = \Delta_{L} \} \, = \alpha {\text{ and }}P\{ \uptau_{SL} < T_{S} < \uptau_{SU} |{\upbeta }_{{1}} = \Delta_{U} \} \, = \alpha .$$

Following the properties of a noncentral t distribution as in Johnson, Kotz and Balakrishnan [20], it can be shown that the two conditions can be simultaneously satisfied by the choice of critical values τ_SL = –τ_S and τ_SU = τ_S where τ_S > 0. Hence, the rejection region is of the form

$$AH_{S} = \, \{ {-}\uptau_{S} < T_{S} < \uptau_{S} \} ,$$

(8)

where τ_S is determined by the condition P{–τ_S < T_S < τ_S| β₁ = Δ_L} = α or P{–τ_S < T_S < τ_S| β₁ = Δ_U} = α. Note that the error variance is generally unknown and the exact distribution of T_S cannot be specified. Following the suggestion in Anderson and Hauck [15], a feasible and accurate approach is to find the critical value τ_S through the approximation $T_{S}\text{ } \dot\sim \text{ }T + {\hat{\lambda }}_{S}$ where $T\sim t(\nu )$ , ${\hat{\lambda }}_{S} = \Delta /({\hat{\upsigma }}^{2} /SSX)^{{{1}/{2}}}$, and

$$P\{ {-}\uptau_{S} < T + {\hat{\lambda }}_{S} < \uptau_{S} \} \, = \alpha.$$

(9)

Thus, the optimal quantity τ_S can be computed by a simple iterative search. Note that the critical value τ_S is a function of α, Δ, N, ${\hat{\upsigma }}^{{2}}$, and SSX. It does not have an explicit analytic expression and requires a computer program to calculate the actual value. An efficient algorithm is developed for computing the critical value and rejection region for the suggested procedure. Also, the p-value associated with the observed slope estimate ${\hat{\upbeta}}_{1O}$ can be calculated as
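The iterative search for τ_S in Eq. (9) can be sketched as follows. This is a minimal illustration rather than the article's own algorithm: the function names `t_interval_prob` and `ah_critical_value` are hypothetical, the search is a plain bisection, and the t probability is obtained by Simpson's rule after the substitution t = ν^{1/2} tan θ (a choice made here only to avoid external libraries). The inputs are the rounded values reported in Section 2.3, so the result should be close to, but not necessarily identical with, the critical value quoted there.

```python
import math

def t_interval_prob(a, b, nu, steps=400):
    """P{a < T < b} for T ~ t(nu): Simpson's rule after t = sqrt(nu) * tan(theta)."""
    lo, hi = math.atan(a / math.sqrt(nu)), math.atan(b / math.sqrt(nu))
    # Normalising constant of the transformed density c * cos(theta)**(nu - 1).
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    h = (hi - lo) / steps
    total = math.cos(lo) ** (nu - 1) + math.cos(hi) ** (nu - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (nu - 1)
    return c * total * h / 3

def ah_critical_value(lam_hat, nu, alpha=0.05, tol=1e-10):
    """Solve P{-tau < T + lam_hat < tau} = alpha for tau > 0 by bisection."""
    def coverage(tau):
        return t_interval_prob(-tau - lam_hat, tau - lam_hat, nu)
    lo, hi = 0.0, abs(lam_hat) + 1.0
    while coverage(hi) < alpha:        # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:               # coverage is increasing in tau
        mid = 0.5 * (lo + hi)
        if coverage(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Rounded inputs from the numerical example in Section 2.3.
sigma2_hat, ssx, delta, n = 70.5615, 2014.0, 0.25, 10
lam_hat = delta / math.sqrt(sigma2_hat / ssx)
tau_s = ah_critical_value(lam_hat, nu=n - 2)
print(f"lambda_hat = {lam_hat:.4f}, tau_S = {tau_s:.4f}")
```

Bisection is sufficient here because the probability in Eq. (9) increases monotonically from 0 to 1 as τ_S grows, so the root is unique.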

$$p{\text{-value }} = P\{ {-}\left| {T_{O} } \right| \, {-}{\hat{\lambda }}_{S} < T < \, \left| {T_{O} } \right| \, {-}{\hat{\lambda }}_{S} \},$$

(10)

where $T_{O} = ({\hat{\upbeta}}_{{{1}O}} {-}\Delta_{M} )/({\hat{\upsigma }}^{2} /SSX)^{{{1}/{2}}}$. It is apparent that the p-value is computationally easier to obtain than the critical value.
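The p-value in Eq. (10) needs only one evaluation of a central t probability, which the following sketch makes explicit. The helper names are hypothetical, the t probability is again computed by Simpson's rule instead of a library routine, and the inputs are the rounded estimates reported in Section 2.3.

```python
import math

def t_interval_prob(a, b, nu, steps=400):
    """P{a < T < b} for T ~ t(nu): Simpson's rule after t = sqrt(nu) * tan(theta)."""
    lo, hi = math.atan(a / math.sqrt(nu)), math.atan(b / math.sqrt(nu))
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    h = (hi - lo) / steps
    total = math.cos(lo) ** (nu - 1) + math.cos(hi) ** (nu - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (nu - 1)
    return c * total * h / 3

def ah_p_value(t_obs, lam_hat, nu):
    """Eq. (10): P{-|T_O| - lam_hat < T < |T_O| - lam_hat} with T ~ t(nu)."""
    return t_interval_prob(-abs(t_obs) - lam_hat, abs(t_obs) - lam_hat, nu)

# Rounded inputs from the numerical example in Section 2.3.
beta1_obs, delta_m, delta = 0.4980, 0.50, 0.25
sigma2_hat, ssx, n = 70.5615, 2014.0, 10

se = math.sqrt(sigma2_hat / ssx)       # estimated standard error of the slope
t_obs = (beta1_obs - delta_m) / se     # observed statistic T_O
lam_hat = delta / se                   # estimated shift lambda_hat_S
p_value = ah_p_value(t_obs, lam_hat, nu=n - 2)
print(f"T_O = {t_obs:.4f}, p-value = {p_value:.4f}")
```

Equivalence is declared when the p-value falls below α, which matches the decision based on |T_O| < τ_S because both compare the same central t probability with α.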

Note that a similar discussion was given in Anderson and Hauck [15] for testing two-group mean equivalence. Because of the computational ease of the p-value, they recommended the p-value approach for reaching the decision. Hence, they did not address the calculation and implementation issues of the rejection region and corresponding power function. Accordingly, the sample size procedure for mean equivalence in Hauck and Anderson [16] is less transparent and cannot be readily adopted as a general tool in linear regressions. Moreover, the Anderson and Hauck procedure has an unbounded rejection region, like other more powerful tests. The counterintuitive rejection of nonequivalence with arbitrarily large values of sample variance has been debated extensively in Berger and Hsu [10] and the discussions therein. As a constructive response, they proposed to specify an upper bound on the sample variance beyond which the null hypothesis will never be rejected. Moreover, unlike the TOST, the advantage of the Anderson and Hauck procedure in the Type I error protection for small sample sizes and tight equivalence bounds should also be taken into consideration. The contrasting behavior of the two test procedures is also demonstrated in the subsequent numerical examples.

2.2 Equivalence Test of Mean Response

The equivalence appraisal can also be applied to the mean response μ = β₀ + X_Fβ₁ at a focal predictor value X_F. The null and alternative hypotheses are presented as

$$\text{H}_0 :\upmu \le \Delta_{L} {\text{ or }}\Delta_{U} \le \upmu \;{\text{versus}}\;\text{H}_1 :\Delta_{L} < \upmu < \Delta_{U},$$

(11)

where Δ_L and Δ_U are a priori constants that represent the threshold range for declaring practical equivalence. With the least squares estimators $({\hat{\upbeta}}_{0} , \, {\hat{\upbeta}}_{1} )$ of (β₀, β₁), the linear estimator $\hat{\upmu } \, = \, {\hat{\upbeta}}_{0} + {X_{F}\hat{\upbeta}}_{1}$ has the distribution

$$\hat{\upmu }\sim N(\upmu ,\upsigma^{{2}} H_{M} ),$$

(12)

where $H_{M} = { 1}/N + \, (X_{F} {-}\overline{X})^{2} /SSX$. It is useful to note that

$$T_{M} = \frac{\hat{\upmu } - \Delta_{M}} { (\hat{\sigma}^2H_M ) ^{1/2}}\sim t(\nu ,\lambda_{M} ),$$

(13)

where the noncentrality parameter λ_M = (μ – Δ_M)/(σ²H_M)^1/2 and Δ_M = (Δ_L + Δ_U)/2.

Following the same principle for slope coefficient assessment, a potential rejection region to the null hypothesis is of the form

$$AH_{M} = \, \{ {-}\uptau_{M} < T_{M} < \uptau_{M} \} ,$$

(14)

where the critical value τ_M is chosen to attain the nominal Type I error rate when μ = Δ_L and Δ_U. The proposed approach is to find the critical value through the approximate evaluation

$$P\{ {-}\uptau_{M} < T + {\hat{\lambda }}_{M} < \uptau_{M} \} \, = \alpha$$

(15)

where T ~ t(ν), ${\hat{\lambda }}_{M} = \Delta /({\hat{\upsigma }}^{2} H_{M} )^{1/2}$, and Δ = (Δ_U – Δ_L)/2. Note that the critical value τ_M is a function of α, Δ, N, ${\hat{\upsigma }}^{2}$, and H_M. Moreover, an iterative algorithm is required to compute the critical value.

2.3 A Numerical Example

The numerical details for the equivalence tests of slope coefficient and mean response are demonstrated with the data of the training study described in Table 6.1 of Huitema [21] about the relation between the response variable (Y: achievement) and the predictor variable (X: aptitude) for three types of training program.

For the first training group with N = 10, the sample means of the predictor and response variables are $\overline{X} = 52.00$ and ${\overline{Y}} = 30.00$, respectively. Moreover, the least squares estimates of the linear regression line between achievement and aptitude measurements are obtained as $\{ {\hat{\upbeta}}_{0}, \, {\hat{\upbeta}}_{1}\} \, = \, \{ 4.1033, \, 0.4980\}$, and the sample variance of error is ${\hat{\upsigma }}^{2} = 70.5615$. For illustration, an equivalence test of slope coefficient is performed in terms of H₀: β₁ ≤ 0.25 or 0.75 ≤ β₁ versus H₁: 0.25 < β₁ < 0.75 (Δ_M = 0.50 and Δ = 0.25). With SSX = 2014.00 and α = 0.05, the test statistic and critical value are computed as T_S = –0.0106 and τ_S = 0.1598, respectively. Because |T_S| < τ_S, the nonequivalence null hypothesis is rejected at the significance level 0.05. The conclusion indicates that the slope coefficient is essentially equivalent to 0.50 with no more than 0.25 difference.

The equivalence test of mean response can also be performed with the estimated mean response $\hat{\upmu } = 29.0040$ at X_F = 50. Using Δ_M = 29 and Δ = 4, the equivalence test of mean response is conducted in terms of H₀: μ ≤ 25 or 33 ≤ μ versus H₁: 25 < μ < 33. The test statistic and critical value can be computed as T_M = 0.0015 and τ_M = 0.1966, respectively, for α = 0.05. Hence, the nonequivalence null hypothesis is rejected at the significance level 0.05. The analysis suggests that the mean response at X_F = 50 lies within a bound of 4 around 29. Moreover, it can be shown that the resulting rejection regions of the TOST procedures are empty sets and there is no chance to reject the nonequivalence null hypothesis of the slope coefficient and mean response. Apparently, the TOST approach may not be a reliable procedure when the sample size is small, especially for a tight equivalence range. Such deficiency agrees with the explication of TOST for assessing mean equivalence in Schuirmann [12].
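The emptiness of the TOST rejection regions can be checked by hand. The check assumes the standard TOST form, which is not written out in this article: reject when both $(\hat{\theta} - \Delta_{L})/SE > t_{\nu,\alpha}$ and $(\hat{\theta} - \Delta_{U})/SE < -t_{\nu,\alpha}$, where $\hat{\theta}$ is the estimate and SE its estimated standard error. In terms of the statistics used here, this is $|T| < \hat{\lambda} - t_{\nu,\alpha}$, which no value of T can satisfy when $\hat{\lambda} \le t_{\nu,\alpha}$. With the rounded estimates above and ν = 8,

$$\hat{\lambda}_{S} = \frac{0.25}{(70.5615/2014)^{1/2}} \approx 1.336,\qquad \hat{\lambda}_{M} = \frac{4}{(70.5615 \times 0.10199)^{1/2}} \approx 1.491,\qquad t_{8,\,0.05} \approx 1.860,$$

where $0.10199 \approx H_{M} = 1/10 + (50 - 52)^{2}/2014$. Both estimated shifts fall below 1.860, so under this form the TOST cannot reject for either hypothesis, whatever the observed estimate.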

2.4 Power and Sample Size Calculations

When planning and conducting a research study, the actual values of the continuous measurements of the response and predictor variables for each subject are available only after the observations are obtained. In addition to the randomness of normal responses, the stochastic nature of predictor variables has to be taken into account in power analysis under the random and unconditional context of a linear regression study. A useful and convenient framework is to assume the continuous predictor variables {X_i, i = 1, …, N} have the independent and identical normal distribution N(μ_X, $\upsigma_{X}^{2}$) as in Shieh [22, 23] within the context of ANCOVA.

Under the prescribed stochastic consideration of {X_i, i = 1, …, N}, it can be readily established that K = SSX/$\upsigma_{X}^{2}$ ~ χ²(κ) where κ = N – 1. The power function of the equivalence procedure for slope coefficient can be expressed as

$$\Pi_{S} = P\{ {-}\uptau_{S} < T_{S} < \uptau_{S} |\Delta_{L} < \upbeta_{{1}} < \Delta_{U} \} .$$

(16)

Note that the critical value τ_S depends on the two quantities ${\hat{\upsigma}}^{2}$ and SSX. With ${\hat{\upsigma}}^{2} = \upsigma^{{2}} (V/\nu )$ and $H_{S} = 1/SSX = 1/(\upsigma_{X}^{2} K)$, the power function Π_S can be rewritten as

$$\Pi_{S} = E_{(K,V)} [\Phi \left( {B_{S} } \right) \, {-}\Phi \left( {A_{S} } \right)],$$

(17)

where B_S = (Δ_M – β₁)/(σ²H_S)^1/2 + τ_S(V/ν)^1/2, A_S = (Δ_M – β₁)/(σ²H_S)^1/2 – τ_S(V/ν)^1/2, Φ(⋅) is the cumulative distribution function of the standard normal distribution, and the expectation E_(K,V) is taken with respect to the chi-square distributions of K and V.
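One way to make the expectation in Eq. (17) concrete is to average Φ(B_S) – Φ(A_S) over simulated draws of K and V, recomputing τ_S for every draw because it depends on both quantities. The sketch below does exactly that. Monte Carlo averaging is a stand-in chosen here for simplicity and is not necessarily the numerical integration used by the article's algorithms; the function names, the number of draws, and the sample size N = 30 are illustrative assumptions, while the remaining parameter values are those of Section 2.5.

```python
import math
import random

def t_interval_prob(a, b, nu, steps=200):
    """P{a < T < b} for T ~ t(nu): Simpson's rule after t = sqrt(nu) * tan(theta)."""
    lo, hi = math.atan(a / math.sqrt(nu)), math.atan(b / math.sqrt(nu))
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    h = (hi - lo) / steps
    total = math.cos(lo) ** (nu - 1) + math.cos(hi) ** (nu - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (nu - 1)
    return c * total * h / 3

def ah_critical_value(lam_hat, nu, alpha=0.05, tol=1e-8):
    """Solve P{-tau < T + lam_hat < tau} = alpha for tau by bisection."""
    cover = lambda tau: t_interval_prob(-tau - lam_hat, tau - lam_hat, nu)
    lo, hi = 0.0, lam_hat + 1.0
    while cover(hi) < alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cover(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mc_power_slope(n, beta1, sigma2, sigma2_x, delta_m, delta,
                   alpha=0.05, draws=200, seed=1):
    """Monte Carlo estimate of Pi_S = E_(K,V)[Phi(B_S) - Phi(A_S)]."""
    rng = random.Random(seed)
    nu, kappa = n - 2, n - 1
    total = 0.0
    for _ in range(draws):
        v = rng.gammavariate(nu / 2.0, 2.0)       # V ~ chi-square(nu)
        k = rng.gammavariate(kappa / 2.0, 2.0)    # K ~ chi-square(kappa)
        sd_slope = math.sqrt(sigma2 / (sigma2_x * k))   # (sigma^2 * H_S)^(1/2)
        scale = math.sqrt(v / nu)                       # sigma_hat / sigma
        tau = ah_critical_value(delta / (sd_slope * scale), nu, alpha)
        centre = (delta_m - beta1) / sd_slope
        total += norm_cdf(centre + tau * scale) - norm_cdf(centre - tau * scale)
    return total / draws

# Section 2.5 configuration; N = 30 is an arbitrary illustrative choice.
power = mc_power_slope(n=30, beta1=0.4980, sigma2=70.5615,
                       sigma2_x=223.7778, delta_m=0.5, delta=0.3)
print(f"estimated power: {power:.3f}")
```

A sample size search would wrap this function in a loop over N and stop at the smallest N whose estimated power reaches the target, ideally with many more draws than the small default used here.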

Under the random predictor framework, the normality assumption implies that

$$T_{X} = \frac{{{\bar{X}} - X_{F} }}{{{\mkern 1mu} ({\widehat{\sigma}}_{X}^{2} /N)^{{1/2}} }}\sim t(\kappa,\lambda _{X} ),$$

(18)

where ${{\hat{\upsigma }}}_{X}^{2} = SSX/\upkappa$ and $\lambda _{{\text{X}}} = (\mu _{{\text{X}}} - X_{{\text{F}}} )/(\sigma _{{\text{X}}}^{2} /N)^{{1/2}}$. Also, the power function of the equivalence procedure for mean response is of the form

$$\Pi_{M} = P\{ {-}\uptau_{M} < T_{M} < \uptau_{M} |\Delta_{L} < \upmu < \Delta_{U} \}.$$

(19)

In this case, the critical value τ_M depends on the two terms ${\hat{\upsigma}}^{2}$ and H_M. With ${\hat{\upsigma}}^{2} = \upsigma^{{2}} (V/\nu )$ and $H_{M} = \, 1/N + T_{X}^{2} /(\upkappa N)$, it follows that the power function Π_M can be expressed as

$$\Pi_{M} = E_{(TX,V)} [\Phi (B_{M} ) \, {-}\Phi (A_{M} )],$$

(20)

where B_M = (Δ_M – μ)/(σ²H_M)^1/2 + τ_M(V/ν)^1/2, A_M = (Δ_M – μ)/(σ²H_M)^1/2 – τ_M(V/ν)^1/2, and E_(TX,V) is taken with respect to the joint distribution of T_X and V.
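The expression for H_M in terms of T_X follows directly from the definitions in Eq. (18). Squaring T_X and substituting $\hat{\sigma}_{X}^{2} = SSX/\kappa$ gives

$$T_{X}^{2} = \frac{N(\bar{X} - X_{F})^{2}}{\hat{\sigma}_{X}^{2}} = \frac{\kappa N (\bar{X} - X_{F})^{2}}{SSX}\quad\Longrightarrow\quad \frac{(X_{F} - \bar{X})^{2}}{SSX} = \frac{T_{X}^{2}}{\kappa N},$$

so that $H_{M} = 1/N + (X_{F} - \bar{X})^{2}/SSX = 1/N + T_{X}^{2}/(\kappa N)$. The randomness of the predictors therefore enters H_M only through T_X.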

The prescribed power functions Π_S and Π_M for slope coefficient and mean response involve a mixture of noncentral t distributions through the distributions of K and T_X of the predictor variables, respectively. Because of this computational complexity, it is appealing to simplify these power functions. Under the normal assumption N(μ_X, $\upsigma_{X}^{2}$) for the predictors {X_i, i = 1, …, N}, the standard results show that $E[\overline{X}] = \upmu_{X}$ and E[SSX] = $\upkappa \upsigma_{X}^{2}$. Hence, an approximation of the unconditional distribution can be obtained for the test statistic $T_{S}\text{ } \dot\sim \text{ } t(\nu ,\lambda_{SA} )$ where λ_SA = (β₁ – Δ_M)/(σ²H_SA)$^{1/2}$ and H_SA = 1/($\upkappa \upsigma_{X}^{2}$). It yields a simplified power function for the equivalence test of linear trend

$$\Pi_{SA} = P\{ {-}\uptau_{S} < t(\nu ,\lambda_{SA} ) \, < \uptau_{S} \} .$$

(21)

Moreover, following similar arguments, the test statistic of mean response has the approximate distribution $T_{M} {\dot{\sim}} t(\nu ,\lambda _{{MA}} )$ where λ_MA = (μ – Δ_M)/(σ²H_MA)$^{1/2}$ and H_MA = 1/N + (μ_X – X_F)²/($\upkappa \upsigma_{X}^{2}$). Then, an approximate power function for the equivalence test of mean response is denoted by

$$\Pi_{MA} = P\{ {-}\uptau_{M} < t(\nu ,\lambda_{MA} ) \, < \uptau_{M} \} .$$

(22)

The approximate power functions of the equivalence procedures provide computational shortcuts to the exact formulas. The simple formulations can be readily implemented with the embedded probability functions of a noncentral t distribution in standard software systems. On the other hand, the prescribed analytic justifications provide statistical support for the exact power functions. An immediate application of the power functions is to compute optimal sample sizes needed for the equivalence procedure to attain the specified power under the designated model configurations. The fundamental discrepancy between the exact and simplified power and sample size calculations will be further assessed in the succeeding numerical investigations.

2.5 Numerical Assessments

As an exemplifying framework, the model configurations follow that of the prescribed training study in Huitema [21]. Accordingly, the sample estimates of regression coefficients and variance component of the first training group are designated as the working configurations: {β₀, β₁} = {4.1033, 0.4980}, and σ² = 70.5615, respectively. The mean and variance of the normal predictors are chosen as {μ_X, $\upsigma_{X}^{2}$} = {52.00, 223.7778}. The equivalence thresholds (Δ_L, Δ_U) are defined as Δ_L = Δ_M – Δ, Δ_U = Δ_M + Δ, and various magnitudes of Δ_M and Δ are evaluated. For the equivalence tests of linear trend, the selected values are Δ_M = 0.5 with Δ = 0.2, 0.3, and 0.4. The equivalence tests of mean response are examined at X_F = 50 with Δ_M = 29 for μ = 29.0040 under three equivalence bounds Δ = 4, 5, and 6.

With these specifications, the required sample sizes of both exact and approximate methods were computed for the chosen power value 1 – β = 0.80 and significance level α = 0.05. The estimated sample sizes for the equivalence tests of linear trend and mean response are presented in Table 1. Note that the resulting sample sizes cover a reasonable range of magnitudes without being unrealistic or excessively large. More importantly, the estimated sample sizes of the exact approach are consistently larger than or equal to those of the approximate procedure for all 6 cases. For ease of comparing the accuracy of the power functions, the estimated power or attained power is also summarized in Table 1. Because of the underlying metric of integer sample sizes, the estimated values of both exact and approximate procedures are marginally larger than the nominal level for all cases.

Table 1 Computed sample size, estimated power, and simulated power for the Anderson and Hauck test of linear trend and mean response at X_F = 50 when Type I error α = 0.05 and nominal power 1 – β = 0.80


In the second stage, Monte Carlo simulation studies were performed to evaluate the performance of the power and sample size calculations. With the computed sample sizes, parameter configurations, and nominal alpha level, estimates of the true power were computed via Monte Carlo simulation of 10,000 independent data sets. For each replicate, N predictor values were generated from the selected normal distribution. The realized predictor values were then used to determine the mean responses for generating the normal responses with the specified linear regression model. Next, the equivalence test statistics were computed and the simulated power was the proportion of the 10,000 replicates whose null hypothesis was rejected at the significance level 0.05. Accordingly, the adequacy of the approximate and exact sample size procedures is determined by the error (= simulated power – estimated power) between the simulated power of the Monte Carlo study and the estimated power computed from the analytic power function. The simulated power and error are also presented in Table 1.
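The simulation steps just described can be sketched for the linear trend test as follows. This is an illustrative reimplementation under stated assumptions, not the code used for Table 1: the function names are hypothetical, the number of replicates is cut from 10,000 to a few hundred to keep the run short, and N = 20 is an arbitrary sample size because the Table 1 values are not reproduced in the text.

```python
import math
import random

def t_interval_prob(a, b, nu, steps=200):
    """P{a < T < b} for T ~ t(nu): Simpson's rule after t = sqrt(nu) * tan(theta)."""
    lo, hi = math.atan(a / math.sqrt(nu)), math.atan(b / math.sqrt(nu))
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(math.pi)
    h = (hi - lo) / steps
    total = math.cos(lo) ** (nu - 1) + math.cos(hi) ** (nu - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (nu - 1)
    return c * total * h / 3

def ah_critical_value(lam_hat, nu, alpha=0.05, tol=1e-8):
    """Solve P{-tau < T + lam_hat < tau} = alpha for tau by bisection."""
    cover = lambda tau: t_interval_prob(-tau - lam_hat, tau - lam_hat, nu)
    lo, hi = 0.0, lam_hat + 1.0
    while cover(hi) < alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cover(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)

def simulate_power_slope(n, beta0, beta1, sigma2, mu_x, sigma2_x,
                         delta_m, delta, alpha=0.05, reps=200, seed=1):
    """Proportion of simulated data sets in which |T_S| < tau_S."""
    rng = random.Random(seed)
    nu, rejections = n - 2, 0
    for _ in range(reps):
        # Step 1: random normal predictors; step 2: normal responses given them.
        x = [rng.gauss(mu_x, math.sqrt(sigma2_x)) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, math.sqrt(sigma2)) for xi in x]
        # Step 3: least squares fit and the equivalence statistic.
        x_bar, y_bar = sum(x) / n, sum(y) / n
        ssx = sum((xi - x_bar) ** 2 for xi in x)
        b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
        b0 = y_bar - b1 * x_bar
        sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(sse / nu / ssx)
        t_s = (b1 - delta_m) / se
        tau = ah_critical_value(delta / se, nu, alpha)
        rejections += abs(t_s) < tau
    return rejections / reps

# Section 2.5 configuration; N = 20 is an arbitrary illustrative choice.
sim_power = simulate_power_slope(n=20, beta0=4.1033, beta1=0.4980, sigma2=70.5615,
                                 mu_x=52.0, sigma2_x=223.7778,
                                 delta_m=0.5, delta=0.3)
print(f"simulated power: {sim_power:.3f}")
```

Setting β₁ equal to Δ_L or Δ_U in the same routine gives a simulated Type I error rate, which is a convenient way to check the error control of the central-t approximation.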

The results reveal that the exact approaches are extremely accurate because the associated errors of the 6 cases are all within the small range of –0.0055 to 0.0075. Accordingly, there exists a close agreement between the simulated power and the estimated power of the exact approaches for these settings. On the other hand, the simulated powers for the approximate methods are consistently less than the estimated powers. Specifically, the resulting errors are {–0.0167, –0.0210, –0.0306} and {–0.0057, –0.0069, –0.0177} for the linear trend and mean response, respectively. Although some of the differences are not substantial, this implies that the approximate power functions do not give reliable results for small sample sizes. In short, the adequacy of the approximate power and sample size calculations varies with model configurations. It is clear that the exact techniques are more reliable and accurate than the approximate methods for all cases of linear trend and mean response considered here.

3 Two Regression Lines

The two-group nonparallel simple linear regression model is expressed as

$$Y_{{{1}i}} = \upbeta_{{0{1}}} + X_{{{1}i}} \upbeta_{{{11}}} + \varepsilon_{{{1}i}} {\text{ and }}Y_{{{2}j}} = \upbeta_{{0{2}}} + X_{{{2}j}} \upbeta_{{{12}}} + \varepsilon_{{{2}j}} ,$$

(23)

where ε_1i and ε_2j are iid N(0, σ²) random variables, i = 1, …, N₁, and j = 1, …, N₂. Note that a traditional ANCOVA model assumes that the regression slopes are equal, β₁₁ = β₁₂. Accordingly, a test of slope equality is generally required to justify the use of ANCOVA.

Standard results show that the least squares estimators ${\hat{\upbeta}}_{{{11}}}$ and ${\hat{\upbeta}}_{{{12}}}$ of slope coefficients β₁₁ and β₁₂ have the following distributions

$${\hat{\upbeta}}_{11}\sim N(\upbeta_{{{11}}} ,\upsigma^{{2}} /SSX_{{1}} ){\text{ and }}{\hat{\upbeta}}_{12}\sim N(\upbeta_{{{12}}} ,\upsigma^{{2}} /SSX_{{2}} ),$$

where $SSX_{1} = \sum\nolimits_{i = 1}^{N_{1}} (X_{1i} - \overline{X}_{1})^{2}$, $SSX_{2} = \sum\nolimits_{j = 1}^{N_{2}} (X_{2j} - \overline{X}_{2})^{2}$, $\overline{X}_{1} = \sum\nolimits_{i = 1}^{N_{1}} X_{1i}/N_{1}$, and $\overline{X}_{2} = \sum\nolimits_{j = 1}^{N_{2}} X_{2j}/N_{2}$. The difference of two slope estimators has the distribution

$${\hat{\upbeta}_D} = {\hat{\upbeta}}_{{{11}}} { - } {\hat{\upbeta}}_{{{12}}} \sim N\{ \upbeta_{D} ,\upsigma^{{2}} H_{DS} \} ,$$

(24)

where β_D = β₁₁ – β₁₂ and H_DS = 1/SSX₁ + 1/SSX₂. In this case, ${\hat{\upsigma }}^{{2}} = SSE/\nu_{D}$ is the usual unbiased estimator of σ² and V = SSE/σ² ~ χ²(ν_D) where SSE is the error sum of squares and ν_D = N₁ + N₂ – 4.

To detect the difference between two slope coefficients in terms of H₀: β_D = β_D0 versus H₁: β_D ≠ β_D0, the test statistic has the form

$$T_{DS0} = \frac{ {\hat{\upbeta}_D} - \upbeta_{D0}}{{{ (}{\hat{\upsigma }}^{{2}} {H}_{{{DS}}} {)}^{1/2} }}$$

(25)

The null hypothesis is rejected at the significance level α if

$$|T_{DS0} | \, > t_{\nu_{D} \text{, } \alpha /2}$$

(26)

3.1 Equivalence Test of Trend Effect

To conduct equivalence test of trend effect or slope difference, the null and alternative hypotheses are expressed as

$$\text{H}_0 :\upbeta_{D} \le \Delta_{L} \text{ or } \Delta_{U} \le \upbeta_{D} \text{ versus } \text{H}_1 :\Delta_{L} < \upbeta_{D} < \Delta_{U},$$

(27)

where Δ_L and Δ_U are a priori constants that denote the minimal magnitude for declaring equivalence for trend effect. Under the model assumption, it follows that

$$T_{{DS}} = \frac{ \hat{\beta }_{D} - \Delta _{M}} {{(\hat{\sigma }^{2} H_{{{DS}}} )^{{1/2}} }}\sim t(\nu _{D} ,\lambda _{{DS}} ),$$

(28)

where the noncentrality parameter λ_DS = (β_D – Δ_M)/(σ²H_DS)^1/2 and Δ_M = (Δ_L + Δ_U)/2. To justify the slope difference β_D is within the interval (Δ_L, Δ_U), a feasible rejection region to the null hypothesis is

$$AH_{{DS}} = \{ - \tau _{{DS}} < T_{{DS}} < \tau _{{DS}} \} ,$$

(29)

where the critical value τ_DS is chosen to simultaneously attain the nominal Type I error rate when β_D = Δ_L and Δ_U. In practice, the error variance is unknown, so the exact distribution of T_DS cannot be specified, and the critical value τ_DS can be determined through the approximation

$$P\{ - {\uptau }_{{DS}} < T + \hat{\lambda }_{{DS}} < {\uptau }_{{DS}} \} = \alpha ,$$

(30)

where T ~ t(ν_D), $\hat{\lambda }_{DS}$ = Δ/($\hat{\sigma }^{{2}}$ H_DS)^1/2, and Δ = (Δ_U – Δ_L)/2. The optimal quantity τ_DS is a function of α, Δ, N₁, N₂, ${\hat{\upsigma }}^{2}$, and H_DS. Although the critical value does not have a closed-form expression, it can be computed by a simple iterative search.
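The ingredients of the trend-effect test, namely T_DS, the estimated shift Δ/(σ̂²H_DS)^1/2, and ν_D, can be assembled from two samples as sketched below. The function names and the toy data are hypothetical and serve only to show how the pooled error variance and H_DS enter the statistic; the critical value τ_DS would then follow from Eq. (30) by an iterative search.

```python
import math

def fit_line(x, y):
    """Least squares fit of one line; returns (slope, SSX, SSE)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, ssx, sse

def slope_difference_statistic(x1, y1, x2, y2, delta_l, delta_u):
    """Return (T_DS, lambda_hat_DS, nu_D) for the equivalence range (delta_l, delta_u)."""
    b11, ssx1, sse1 = fit_line(x1, y1)
    b12, ssx2, sse2 = fit_line(x2, y2)
    nu_d = len(x1) + len(x2) - 4                 # two intercepts and two slopes
    sigma2_hat = (sse1 + sse2) / nu_d            # pooled error variance estimate
    h_ds = 1.0 / ssx1 + 1.0 / ssx2
    delta_m = (delta_l + delta_u) / 2.0
    delta = (delta_u - delta_l) / 2.0
    se = math.sqrt(sigma2_hat * h_ds)
    return (b11 - b12 - delta_m) / se, delta / se, nu_d

# Toy data (hypothetical): slopes 1.0 and 0.5 with identical residual patterns.
x = [0.0, 1.0, 2.0, 3.0]
y_group1 = [1.0, 0.0, 1.0, 4.0]
y_group2 = [1.0, -0.5, 0.0, 2.5]
t_ds, lam_hat_ds, nu_d = slope_difference_statistic(x, y_group1, x, y_group2, -1.0, 1.0)
print(f"T_DS = {t_ds:.4f}, lambda_hat_DS = {lam_hat_ds:.4f}, nu_D = {nu_d}")
```

Pooling the two error sums of squares reflects the common-variance assumption of model (23); with unequal variances this statistic would no longer have the stated t distribution.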

As emphasized in Huitema [21], Kutner et al. [24], Rencher and Schaalje [25], and related texts of research methods, the traditional ANCOVA assumes that the slope coefficients associating the predictor variables with the response variables are the same for each treatment group. The assertion of homogeneous regression slopes implies a lack of interaction effects between a categorical moderator and a continuous predictor in moderation study. Note that the conventional difference test purports to show the regression lines are nonparallel. Hence, the suggested equivalence procedure for trend effect is more appropriate for supporting the equality or comparability of slope coefficients assumption in ANCOVA.

3.2 Equivalence Test of Simple Effect

A related and practical scheme for comparing two regression lines is to assess the difference between two mean responses at a designated predictor value. The simple effect or the mean response difference between two regression lines at X_F is defined as

$$\upmu_{D} = \upmu_{{1}} {-}\upmu_{{2}} = \, (\upbeta_{{0{1}}} {-}\upbeta_{{0{2}}} ) \, + X_{F} (\upbeta_{{{11}}} {-}\upbeta_{{{12}}} )$$

(31)

The equivalence test of simple effect is conducted under the null and alternative hypotheses:

$${\text{H}}_{0} :\upmu_{D} \le \Delta_{L} {\text{ or }}\Delta_{U} \le \upmu_{D} {\text{ versus H}}_{{1}} :\Delta_{L} < \upmu_{D} < \Delta_{U} ,$$

(32)

where Δ_L and Δ_U are a priori constants that represent the minimal threshold for declaring essential equivalence.

Using the least squares estimators {${\hat{\upbeta}}_{{{01}}}$, ${\hat{\upbeta}}_{11}$, ${\hat{\upbeta}}_{{{02}}}$, ${\hat{\upbeta}}_{12}$} of the intercept and slope coefficients {β₀₁, β₁₁, β₀₂, β₁₂}, the estimated mean responses $\hat{\upmu }_{{1}}$ and $\hat{\upmu }_{{2}}$ for the mean values μ₁ = β₀₁ + X_Fβ₁₁ and μ₂ = β₀₂ + X_Fβ₁₂ at a specified value X_F are

$$\hat{\upmu }_{1} = {\hat{\upbeta}}_{{{01}}} { + }X_F{\hat{\upbeta}}_{{11}} \, \text{and}\, \hat{\upmu }_{2} = {\hat{\upbeta}}_{02} { + }X_F{\hat{\upbeta}}_{{12}} $$

respectively. A natural and unbiased estimator of μ_D is $\hat{\upmu }_{{D}} = \hat{\upmu }_{1} - \hat{\upmu }_{2}$ and

$$\hat{\upmu }_{D} \sim N(\upmu_{D}, \, \sigma^{2} H_{DM} ),$$

(33)

where H_DM = 1/N₁ + 1/N₂ + (X_F – ${\overline{X}}_{{1}}$)²/SSX₁ + (X_F –${\overline{X}}_{{2}}$)²/SSX₂. It is important to note under the model assumption that

$$T_{{DM}} = \frac{\hat{\mu}_D - \Delta_M} {(\hat{\sigma }^2 H_{DM} )^{1/2} } \sim t(\nu _{D} ,\lambda _{{DM}} ),$$

(34)

where the noncentrality parameter λ_DM = (μ_D – Δ_M)/(σ²H_DM)^1/2 and Δ_M = (Δ_L + Δ_U)/2. To evaluate whether the simple effect μ_D is within the interval (Δ_L, Δ_U), the suggested rejection region is

$$AH_{DM} = \, \{ {-}\uptau_{DM} < T_{DM} < \uptau_{DM} \} ,$$

(35)

where the critical value τ_DM is chosen to simultaneously attain the nominal Type I error rate when μ_D = Δ_L and Δ_U. The assessments can be calculated through the approximation

$$P\{ {-}\uptau_{DM} < T + {\hat{\lambda }}_{DM} < \uptau_{DM} \} \, = \alpha ,$$

(36)

where T ~ t(ν_D), ${\hat{\lambda }}_{DM}$ = Δ/(${\hat{\upsigma }}^{2}$H_DM)^1/2, and Δ = (Δ_U – Δ_L)/2. The optimal quantity τ_DM is a function of α, Δ, N₁, N₂, ${\hat{\upsigma }}^{2}$, and H_DM, and it needs to be calculated by an iterative search algorithm.
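The iterative search for the critical value in (36) can be sketched in Python as follows. This is a minimal illustration that uses only the standard library and is not the author's supplementary program: the function names `t_pdf`, `t_interval_prob`, and `ah_critical_value` are hypothetical, the probability is evaluated by Simpson's rule on the central t density, the root is located by bisection, and the inputs in the final call are illustrative values rather than figures from the article.

```python
import math


def t_pdf(x, nu):
    """Density of the central Student t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_interval_prob(a, b, nu, n=400):
    """P(a < T < b) for T ~ t(nu), by composite Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = t_pdf(a, nu) + t_pdf(b, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, nu)
    return total * h / 3


def ah_critical_value(alpha, lam_hat, nu, tol=1e-10):
    """Solve P(-tau < T + lam_hat < tau) = alpha for tau, with T ~ t(nu)."""
    def excess(tau):
        # The event -tau < T + lam_hat < tau is -tau - lam_hat < T < tau - lam_hat.
        return t_interval_prob(-tau - lam_hat, tau - lam_hat, nu) - alpha

    lo, hi = 0.0, 1.0
    while excess(hi) < 0:          # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:           # the probability increases with tau
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative inputs only: alpha = 0.05, plug-in lambda of 1.0, 16 degrees of freedom.
tau = ah_critical_value(alpha=0.05, lam_hat=1.0, nu=16)
print(round(tau, 4))
```

Because the probability in (36) is increasing in τ, bisection is sufficient here; only the plug-in value of λ and the degrees of freedom need to change from one test to another.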

It should be noted that the equivalence analysis of simple effect or response difference between two linear regression lines is closely related to the Johnson–Neyman problem of Johnson and Neyman [26] and Potthoff [27]. The Johnson–Neyman regions of significance and non-significance are identified with the conclusion to reject or the failure to reject the conventional hypothesis of no difference between mean responses. Technical illustrations and implications can be found in Hunka [28], Rogosa [29], and Spiller et al. [30], among others. In contrast, the proposed equivalence test of simple effect can be used to identify the regions of equivalence and nonequivalence, that is, the ranges of predictor values for which the simple effect is equivalent or nonequivalent.

3.3 An Application

The training study example in Table 6.1 of Huitema [21] is used to demonstrate the suggested equivalence tests of trend and simple effects between the first two treatments. In addition to the summary information of the first group, the second group of training type has N₂ = 10, $\overline{Y}_{2}$ = 39.0000, ${\overline{X}}_{2}$ = 47.0000, and SSX₂ = 1798.00. The regression coefficient estimates are $\{ {\hat{\upbeta}}_{02}, {\hat{\upbeta}}_{12} \} = \{ 15.1863, 0.5067 \}$ and the sample variance of error is ${\hat{\upsigma }}_{2}^{2}$ = 54.3025. It is readily obtained that ${\hat{\upbeta}}_{D} = {\hat{\upbeta}}_{11} - {\hat{\upbeta}}_{12} = -0.0087$ and the pooled sample variance is ${\hat{\upsigma }}^{2}$ = 62.4320. The equivalence hypothesis testing of trend effect is presented as H₀: β_D ≤ –0.25 or 0.25 ≤ β_D versus H₁: –0.25 < β_D < 0.25 (Δ_M = 0 and Δ = 0.25). For ν = 16 and α = 0.05, the test statistic is T_DS = –0.0338 and the critical value is τ_DS = 0.1048. Because |T_DS| < τ_DS, the nonequivalence null hypothesis is rejected at the significance level 0.05. This suggests that the two slope coefficients are practically equivalent and that their difference is within the range (–0.25, 0.25).

It is also of practical importance to assess the simple effect or the mean response difference between the two regression lines. At the particular predictor value X_F = 50, the mean response difference is computed as $\hat{\upmu }_{D} = \hat{\upmu }_{1} - \hat{\upmu }_{2}$ = –11.5161. For illustration, the equivalence thresholds are set as Δ_M = –11 and Δ = 5, and the equivalence test of simple effect is conducted for the hypotheses H₀: μ_D ≤ –16 or –6 ≤ μ_D versus H₁: –16 < μ_D < –6. With ν = 16 and α = 0.05, the test statistic and critical value are T_DM = –0.1436 and τ_DM = 0.1684, respectively. Because |T_DM| < τ_DM, the nonequivalence null hypothesis is rejected at the significance level 0.05, and the mean response difference at X_F = 50 is practically –11 within a margin of 5. In view of the limited features of available software packages, computer programs are developed to facilitate the usage of the proposed equivalence procedures for trend and simple effects.
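The two test statistics in this application can be re-derived from the summary figures with a few lines of Python. The sketch below is illustrative rather than the author's program. The values for the second group, the pooled variance, the estimates of β_D and μ_D, and the critical values are those reported in the text. The first-group quantities N₁ = 10, a predictor mean of 52, and SSX₁ = 9 × 223.7778 are assumptions of this sketch, inferred from ν = 16 and from the predictor statistics quoted in Section 3.5.

```python
import math

# Reported in the text.
sigma2_hat = 62.4320                  # pooled sample variance of error
n2, xbar2, ssx2 = 10, 47.0, 1798.00   # second group
beta_d_hat = -0.0087                  # estimated slope difference
mu_d_hat = -11.5161                   # estimated mean response difference at x_f
x_f = 50.0
tau_ds, tau_dm = 0.1048, 0.1684       # reported critical values

# Assumed for this sketch (inferred, not stated in this section).
n1, xbar1 = 10, 52.0
ssx1 = (n1 - 1) * 223.7778

# Trend effect: T_DS = (beta_D_hat - Delta_M) / sqrt(sigma2_hat * H_DS), Delta_M = 0.
h_ds = 1 / ssx1 + 1 / ssx2
t_ds = (beta_d_hat - 0.0) / math.sqrt(sigma2_hat * h_ds)

# Simple effect: T_DM = (mu_D_hat - Delta_M) / sqrt(sigma2_hat * H_DM), Delta_M = -11.
h_dm = (1 / n1 + 1 / n2
        + (x_f - xbar1) ** 2 / ssx1
        + (x_f - xbar2) ** 2 / ssx2)
t_dm = (mu_d_hat - (-11.0)) / math.sqrt(sigma2_hat * h_dm)

# Equivalence is declared when the statistic falls inside (-tau, tau).
reject_trend = abs(t_ds) < tau_ds
reject_simple = abs(t_dm) < tau_dm
print(round(t_ds, 4), reject_trend)
print(round(t_dm, 4), reject_simple)
```

Under these assumptions the computed statistics agree with the reported T_DS and T_DM to about three decimal places; small discrepancies are expected because the inputs are rounded.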

3.4 Power and Sample Size Calculations

In order to elucidate the critical notion of accommodating the distributional properties of the predictor variables, the continuous covariate variables {X_1i, i = 1, …, N₁} and {X_2j, j = 1, …, N₂} are assumed to follow the independent normal distributions N(μ_X1, $\upsigma_{X1}^{2}$) and N(μ_X2, $\upsigma_{X2}^{2}$), respectively. It can be readily established that K₁ = SSX₁/$\upsigma_{X1}^{2}$ ~ χ²(κ₁) and K₂ = SSX₂/$\upsigma_{X2}^{2}$ ~ χ²(κ₂), where κ₁ = N₁ – 1 and κ₂ = N₂ – 1.

Under the unconditional setting, the power function for trend effect is expressed as

$$\Pi_{DS} = P\{ {-}\uptau_{DS} < T_{DS} < \uptau_{DS} |\Delta_{L} < \upbeta_{D} < \Delta_{U} \}.$$

(37)

Note that the critical value τ_DS depends on the two statistics ${\hat{\upsigma }}^{2}$ and H_DS. With ${\hat{\upsigma }}^{2}$ = σ²(V/ν_D) and H_DS = 1/($\upsigma_{X1}^{2}$K₁) + 1/($\upsigma_{X2}^{2}$K₂), the power function Π_DS can be rewritten as

$$\Pi_{DS} = E_{(K_{1}, K_{2}, V)} [\Phi \left( B_{DS} \right) - \Phi \left( A_{DS} \right)],$$

(38)

where B_DS = (Δ_M – β_D)/(σ²H_DS)^1/2 + τ_DS(V/ν_D)^1/2, A_DS = (Δ_M – β_D)/(σ²H_DS)^1/2 – τ_DS(V/ν_D)^1/2, and E_{(K1, K2, V)} is taken with respect to the joint distribution of K₁, K₂ and V.
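The expectation in (38) can be approximated by simulating K₁, K₂, and V and averaging the bracketed term. The Python sketch below is a hedged illustration, not the author's algorithm: it uses plain Monte Carlo averaging, all function names are hypothetical, the critical value is recomputed for every draw from the central t approximation used for τ, and ν_D = N₁ + N₂ – 4 is an assumption inferred from ν = 16 with ten observations per group.

```python
import math
import random


def t_pdf(x, nu):
    """Central Student t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_interval_prob(a, b, nu, n=100):
    """P(a < T < b) for T ~ t(nu) by composite Simpson's rule (n even)."""
    h = (b - a) / n
    total = t_pdf(a, nu) + t_pdf(b, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, nu)
    return total * h / 3


def critical_value(alpha, lam_hat, nu, tol=1e-6):
    """Bisection for tau in P(-tau < T + lam_hat < tau) = alpha."""
    lo, hi = 0.0, 1.0
    while t_interval_prob(-hi - lam_hat, hi - lam_hat, nu) < alpha:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_interval_prob(-mid - lam_hat, mid - lam_hat, nu) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def exact_power_trend(n1, n2, beta_d, delta_l, delta_u, sigma2,
                      var_x1, var_x2, alpha=0.05, draws=300, seed=1):
    """Monte Carlo estimate of E[Phi(B_DS) - Phi(A_DS)] over (K1, K2, V)."""
    rng = random.Random(seed)
    nu = n1 + n2 - 4                       # assumed error degrees of freedom
    delta_m = (delta_l + delta_u) / 2
    delta = (delta_u - delta_l) / 2
    total = 0.0
    for _ in range(draws):
        # A chi-square(k) variate is Gamma(shape=k/2, scale=2).
        k1 = rng.gammavariate((n1 - 1) / 2, 2.0)
        k2 = rng.gammavariate((n2 - 1) / 2, 2.0)
        v = rng.gammavariate(nu / 2, 2.0)
        h_ds = 1 / (var_x1 * k1) + 1 / (var_x2 * k2)
        sigma2_hat = sigma2 * v / nu
        tau = critical_value(alpha, delta / math.sqrt(sigma2_hat * h_ds), nu)
        centre = (delta_m - beta_d) / math.sqrt(sigma2 * h_ds)
        spread = tau * math.sqrt(v / nu)
        total += norm_cdf(centre + spread) - norm_cdf(centre - spread)
    return total / draws


# Parameter values quoted in the article's example, with ten observations per group.
print(exact_power_trend(10, 10, -0.0087, -0.25, 0.25, 62.4320, 223.7778, 199.7778))
```

The number of draws is kept small so that the sketch runs quickly; a larger value reduces the Monte Carlo error, and a numerical integration scheme could replace the simulation altogether.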

Moreover, the normality assumptions of predictor variables imply that

$$T_{Xg} = \frac{{\overline{X}}_g - X_F } { ({\hat{\upsigma }}_{X_g}^{2} /N_g)^{1/2} } \sim t(\upkappa_g ,\lambda_{X_g} )$$

(39)

where $\hat{\upsigma}_{Xg}^{2}$ = SSX_g/κ_g and λ_Xg = (μ_Xg – X_F)/($\upsigma_{Xg}^{2}$/N_g)^1/2 for g = 1 and 2. Following the prescribed power function Π_DS, the power function for mean response difference is presented as

$$\Pi_{DM} = P\{ {-}\uptau_{DM} < T_{DM} < \uptau_{DM} |\Delta_{L} < \upmu_{D} < \Delta_{U} \} .$$

(40)

Note that the critical value τ_DM depends on the two terms ${\hat{\upsigma }}^{{2}}$ and H_DM. With ${\hat{\upsigma }}^{{2}}$ = σ²(V/ν_D), H_DM = 1/N₁ + 1/N₂ + $T_{X1}^{2}$/(κ₁N₁) + $T_{X2}^{2}$ /(κ₂N₂), the power function has the alternative form

$$\Pi_{DM} = E_{(T_{X1}, T_{X2}, V)} [\Phi \left( B_{DM} \right) - \Phi \left( A_{DM} \right)],$$

(41)

where B_DM = (Δ_M – μ_D)/(σ²H_DM)^1/2 + τ_DM(V/ν_D)^1/2, A_DM = (Δ_M – μ_D)/(σ²H_DM)^1/2 – τ_DM(V/ν_D)^1/2, and E_{(TX1, TX2, V)} is taken with respect to the joint distribution of T_X1, T_X2 and V.
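As a brief check of the expression for H_DM in terms of T_X1 and T_X2, note that the definitions in (39) with $\hat{\upsigma}_{Xg}^{2}$ = SSX_g/κ_g give, for g = 1 and 2,

$$\frac{(X_{F} - {\overline{X}}_{g})^{2}}{SSX_{g}} = \frac{({\overline{X}}_{g} - X_{F})^{2}}{\upkappa_{g} \hat{\upsigma}_{Xg}^{2}} = \frac{1}{\upkappa_{g} N_{g}} \cdot \frac{({\overline{X}}_{g} - X_{F})^{2}}{\hat{\upsigma}_{Xg}^{2} /N_{g}} = \frac{T_{Xg}^{2}}{\upkappa_{g} N_{g}},$$

so that the two predictor-dependent terms of H_DM in (33) become $T_{X1}^{2}$/(κ₁N₁) and $T_{X2}^{2}$/(κ₂N₂).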

It is also tempting to simplify the unconditional distributions of the equivalence test statistics for comparing slope coefficients and mean responses. Conceivably, a straightforward approach is to replace the two means {${\overline{X}}_{1}$, ${\overline{X}}_{2}$} and the sums of squares {SSX₁, SSX₂} with the corresponding expected values E[${\overline{X}}_{1}$] = μ_X1, E[${\overline{X}}_{2}$] = μ_X2, E[SSX₁] = κ₁$\upsigma_{X1}^{2}$, and E[SSX₂] = κ₂$\upsigma_{X2}^{2}$. Thus, an approximate power function for the equivalence test of trend effect is

$$\Pi_{DSA} = P\{ {-}\uptau_{DS} < t(\nu ,\lambda_{DSA} ) \, < \uptau_{DS} \} ,$$

(42)

where λ_DSA = (β_D – Δ_M)/(σ²H_DSA)$^{1/2}$ and H_DSA = 1/(κ₁$\upsigma_{X1}^{2}$) + 1/(κ₂$\upsigma_{X2}^{2}$). Similarly, the approximate power function for the equivalence test of simple effect is expressed as

$$\Pi_{DMA} = P\{ {-}\uptau_{DM} < t(\nu ,\lambda_{DMA} ) \, < \uptau_{DM} \} ,$$

(43)

where λ_DMA = (μ_D – Δ_M)/(σ²H_DMA)$^{1/2}$ and H_DMA = 1/N₁ + 1/N₂ + (μ_X1 – X_F)²/(κ₁$\upsigma_{X1}^{2}$) + (μ_X2 – X_F)²/(κ₂$\upsigma_{X2}^{2}$). Empirical examinations will be conducted to demonstrate the critical differences between the exact and approximate power functions using different levels of information of predictor variables.
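The approximate power in (42) is a noncentral t probability, which can be evaluated without special libraries by writing t(ν, λ) = (Z + λ)/(V/ν)^1/2 with Z standard normal and V ~ χ²(ν) independent, and integrating over V. The Python sketch below is illustrative only: the function names are hypothetical, ν = N₁ + N₂ – 4 is an assumption, and the critical value τ is passed in as a fixed number, which is a simplification of this sketch rather than a statement of how the article's programs treat τ.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def chi2_pdf(v, k):
    """Chi-square density with k degrees of freedom (0 at v <= 0, valid for k > 2)."""
    if v <= 0:
        return 0.0
    return math.exp((k / 2 - 1) * math.log(v) - v / 2
                    - (k / 2) * math.log(2) - math.lgamma(k / 2))


def noncentral_t_interval(tau, lam, nu, n=2000):
    """P(-tau < t(nu, lam) < tau) by Simpson's rule over V ~ chi-square(nu), nu > 2."""
    upper = nu + 12 * math.sqrt(2 * nu) + 20   # mass beyond this point is negligible
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        v = i * h
        s = math.sqrt(v / nu)
        inner = norm_cdf(tau * s - lam) - norm_cdf(-tau * s - lam)
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * inner * chi2_pdf(v, nu)
    return total * h / 3


def approx_power_trend(n1, n2, beta_d, delta_m, sigma2, var_x1, var_x2, tau):
    """Approximate power with SSX_g replaced by its expectation (N_g - 1) * var_xg."""
    nu = n1 + n2 - 4                            # assumed error degrees of freedom
    h_dsa = 1 / ((n1 - 1) * var_x1) + 1 / ((n2 - 1) * var_x2)
    lam = (beta_d - delta_m) / math.sqrt(sigma2 * h_dsa)
    return noncentral_t_interval(tau, lam, nu)


# Example values from the article; tau = 0.1048 is the critical value reported
# in the application and is held fixed here for illustration.
print(approx_power_trend(10, 10, -0.0087, 0.0, 62.4320, 223.7778, 199.7778, tau=0.1048))
```

The approximate power for the simple effect in (43) follows the same pattern with H_DMA and λ_DMA in place of H_DSA and λ_DSA.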

3.5 Numerical Investigations

The model configurations of the first two groups of the training study in Huitema [21] provide a convenient framework for the subsequent simulation study of trend effect and simple effect. For illustration, the key statistics of response and predictor variables are treated as population parameters as potential settings of future investigations for power calculations and sample size determinations. Specifically, the regression coefficients are {β₀₁, β₁₁} = {4.1033, 0.4980}, {β₀₂, β₁₂} = {15.1863, 0.5067}, and common error variance σ² = 62.4320. The means and variances of the two predictor variables are {μ_X1, $\upsigma_{X1}^{2}$} = {52.00, 223.7778} and {μ_X2, $\upsigma_{X2}^{2}$} = {47.00, 199.7778}.

Similar to the prescribed scenario of linear trend and mean response, the numerical investigations comprise the determination of optimal sample sizes and a simulation study of power calculations. Throughout the empirical examinations, the Type I error rate and nominal power are fixed as α = 0.05 and 1 – β = 0.80, respectively. First, the trend effect or the slope difference between two regression lines is β_D = –0.0087. Thus, the equivalence tests of trend effect have Δ_M = 0 and Δ = 0.2, 0.3, and 0.4 for the equivalence bounds. Second, the mean responses of the two levels of treatment at X_F = 50 are μ₁ = 29.0040 and μ₂ = 40.5200, respectively, and their difference is μ_D = –11.5161. Accordingly, the equivalence tests of simple effect are performed for Δ_M = –11 and Δ = 4, 5, and 6. The optimal sample sizes of both the exact approach and the approximate method were determined for the chosen power value and significance level with balanced and unbalanced structures r = N₁/N₂ = 1 and 2. The computed sample sizes for the equivalence tests of trend effect and simple effect are presented in Tables 2 and 3, respectively. The results suggest the general pattern that the approximate formulas tend to give smaller sample sizes than the exact techniques. Balanced designs require fewer observations to achieve the nominal power than the unbalanced structures. Also, the computed sample size decreases with increasing threshold bound Δ.

Table 2 Computed sample size, estimated power, and simulated power for the Anderson and Hauck test of trend effect when Type I error α = 0.05 and nominal power 1 – β = 0.80

Table 3 Computed sample size, estimated power, and simulated power for the Anderson and Hauck test of simple effect when X_F = 50, Type I error α = 0.05 and nominal power 1 – β = 0.80

To elucidate the accuracy of the sample size calculations, Monte Carlo simulation studies of 10,000 replications were conducted to obtain the simulated powers, which are compared with the estimated powers for the optimal sample sizes. These power values and the associated errors are also presented in the tables. As can be seen from the reported deviations, the exact approaches for trend effect and simple effect maintain small errors in power computations. In contrast, the approximate methods are not as good as the exact counterparts and their performance deteriorates as the sample size decreases. Specifically, the two errors associated with Δ = 0.4 are {–0.0301, –0.0360} and {–0.0172, –0.0157} in Tables 2 and 3, respectively. The overall usefulness of the approximate methods is affected by the undesirable properties of underestimation of sample sizes and over-calculation of power levels. According to the findings, the exact power functions and sample size procedures are recommended for general use. The implementation of the suggested power evaluation and sample size determination involves specialized programs not currently available in prevailing statistical packages. Thus, the accompanying computer algorithms are presented for conducting the suggested power and sample size calculations.

4 Conclusions

The concept and theory of equivalence have been widely practiced in pharmaceutical sciences and related medical fields. Equivalence testing procedures are also potentially useful in behavioral and psychological sciences. The technical intuition and computational simplicity of TOST provide an important motivation to apply appropriate statistical tools for equivalence assessment, rather than the traditional hypothesis tests that purport to detect whether treatment groups significantly differ from one another. Despite the ready applicability, the TOST is generally conservative and the true Type I error rate can be substantially less than the nominal level for close equivalence bounds and small sample sizes. In contrast, the Anderson and Hauck procedure and other more powerful equivalence tests always have a rejection region with reasonably controlled significance level.

Within the context of linear regressions, one and two regression lines represent two major scenarios of regression slope appraisal research. Accordingly, the TOST has been applied to assess whether the linear trend is practically negligible in ecological and environmental issues. In view of the potential limitation of TOST, this study presents extended Anderson and Hauck procedures for equivalence assessment in linear regression analysis. Specifically, equivalence tests are proposed for evaluating the linear trend and mean response of a single regression line, and the trend effect and simple effect between two regression lines. The hypotheses are constructed with asymmetric equivalence bounds and can therefore be readily applied to all equivalence problems about regression slopes and mean responses.

Moreover, to enhance the usefulness of the suggested procedures, the advanced issues of power and sample size calculations are also investigated. The proposed power and sample size procedures are derived under the random regression framework and have the distinct feature of accounting for the embedded uncertainty of the predictor variables. It is essential to note that the recommended approaches involve statistical evaluations and iterative algorithms not currently available in statistical packages. A full set of computer programs is developed for implementing the suggested equivalence tests and sample size determinations. These research findings expand the conceptual understanding and theoretical development of the Anderson and Hauck procedure for equivalence assessments in linear regression analysis.

Availability of Data and Materials

The data are presented in the article.

References

1. Cashen LH, Geiger SW (2004) Statistical power and the testing of null hypotheses: a review of contemporary management research and recommendations for future studies. Organ Res Methods 7:151–167. https://doi.org/10.1177/1094428104263676
2. Cortina JM, Folger RG (1998) When is it acceptable to accept a null hypothesis: no way, Jose? Organ Res Methods 1:334–350. https://doi.org/10.1177/109442819813004
3. Edwards JR, Berry JW (2010) The presence of something or the absence of nothing: increasing theoretical precision in management research. Organ Res Methods 13:668–689. https://doi.org/10.1177/1094428110380467
4. Frick RW (1995) Accepting the null hypothesis. Mem Cognit 23:132–138. https://doi.org/10.3758/bf03210562
5. Rogers JL, Howard KI, Vessey JT (1993) Using significance tests to evaluate equivalence between two experimental groups. Psychol Bull 113:553–565. https://doi.org/10.1037/0033-2909.113.3.553
6. Seaman MA, Serlin RC (1998) Equivalence confidence intervals for two-group comparisons of means. Psychol Methods 3:403–411. https://doi.org/10.1037/1082-989x.3.4.403
7. Stanton JM (2021) Evaluating equivalence and confirming the null in the organizational sciences. Organ Res Methods 24:491–512. https://doi.org/10.1177/1094428120921934
8. Stegner BL, Bostrom AG, Greenfield TK (1996) Equivalence testing for use in psychological and service research: an introduction with examples. Eval Program Plann 19:193–198. https://doi.org/10.1016/0149-7189(96)00011-0
9. Steiger JH (2004) Beyond the F test: effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychol Methods 9:164–182. https://doi.org/10.1037/1082-989x.9.2.164
10. Berger RL, Hsu JC (1996) Bioequivalence trials, intersection-union tests and equivalence confidence sets (with discussion). Stat Sci 11:283–319. https://doi.org/10.1214/ss/1032280304
11. Meyners M (2012) Equivalence tests-a review. Food Qual Prefer 26:231–245. https://doi.org/10.1016/j.foodqual.2012.05.003
12. Schuirmann DJ (1987) A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J Pharmacokinet Biopharm 15:657–680. https://doi.org/10.1007/bf01068419
13. Schuirmann DL (1981) On hypothesis testing to determine if the mean of a normal distribution is contained in a known interval. Biometrics 37:617
14. Westlake WJ (1981) Response to T.B.L. Kirkwood: bioequivalence testing–a need to rethink. Biometrics 37:589–594
15. Anderson S, Hauck WW (1983) A new procedure for testing equivalence in comparative bioavailability and other clinical trials. Communications in Statistics-Theory and Methods 12:2663–2692. https://doi.org/10.1080/03610928308828634
16. Hauck WW, Anderson S (1984) A new statistical procedure for testing equivalence in two-group comparative bioavailability trials. J Pharmacokinet Biopharm 12:83–91. https://doi.org/10.1007/bf01063612
17. Dixon PM, Pechmann JHK (2005) A statistical test to show negligible trend. Ecology 86:1751–1756. https://doi.org/10.1890/04-1343
18. Schmidt BR, Meyer AH (2008) On the analysis of monitoring data: testing for no trend in population size. J Nat Conserv 16:157–163. https://doi.org/10.1016/j.jnc.2008.05.001
19. Counsell A, Cribbie RA (2015) Equivalence tests for comparing correlation and regression coefficients. Br J Math Stat Psychol 68:292–309. https://doi.org/10.1111/bmsp.12045
20. Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions, vol 2. Wiley, New York
21. Huitema B (2011) The analysis of covariance and alternatives: Statistical methods for experiments, quasi-experiments, and single-case studies, vol 608. Wiley, New York, NY
22. Shieh G (2017) On tests of treatment-covariate interactions: An illustration of appropriate power and sample size calculations. PLoS ONE 12:e0177682. https://doi.org/10.1371/journal.pone.0177682
23. Shieh G (2020) Power analysis and sample size planning in ANCOVA designs. Psychometrika 85:101–120. https://doi.org/10.1007/s11336-019-09692-3
24. Kutner MH, Nachtsheim CJ, Neter J, Li W (2005) Applied linear statistical models, 5th edn. McGraw Hill, New York, NY
25. Rencher AC, Schaalje GB (2007) Linear models in statistics, 2nd edn. Wiley, Hoboken, NJ
26. Johnson PO, Neyman J (1936) Tests of certain linear hypotheses and their application to some educational problems. Stat Res Mem 1:57–93
27. Potthoff RF (1964) On the Johnson-Neyman technique and some extensions thereof. Psychometrika 29:241–256. https://doi.org/10.1007/bf02289721
28. Hunka S (1995) Identifying regions of significance in ANCOVA problems having non-homogeneous regressions. Br J Math Stat Psychol 48:161–188. https://doi.org/10.1111/j.2044-8317.1995.tb01056.x
29. Rogosa D (1980) Comparing nonparallel regression lines. Psychol Bull 88:307–321. https://doi.org/10.1037/0033-2909.88.2.307
30. Spiller SA, Fitzsimons GJ, Lynch JG Jr, McClelland GH (2013) Spotlights, floodlights, and the magic number zero: simple effects tests in moderated regression. J Mark Res 50:277–288. https://doi.org/10.1509/jmr.12.0420

Funding

Open Access funding enabled and organized by National Yang Ming Chiao Tung University. This work was supported by a grant from the Ministry of Science and Technology (MOST-111–2410-H-A49-034-MY3).

Author information

Authors and Affiliations

Department of Management Science, National Yang Ming Chiao Tung University, Hsinchu, 300093, Taiwan
Gwowen Shieh

Corresponding author

Correspondence to Gwowen Shieh.

Ethics declarations

Conflict of interest

The author declares that he has no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 92 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shieh, G. The Extended Anderson and Hauck Tests and Sample Size Procedures for Equivalence Assessment in Simple Linear Regressions. J Stat Theory Pract 18, 36 (2024). https://doi.org/10.1007/s42519-024-00382-7

Accepted: 12 May 2024
Published: 26 June 2024
DOI: https://doi.org/10.1007/s42519-024-00382-7
