Introduction

Two-factorial studies are common in nutrition and other biomedical research, as evidenced by papers published in influential journals, including Amino Acids (e.g., Jobgen et al. 2009a; Willoughby et al. 2007), Journal of Nutrition (e.g., Eller and Reimer 2010; Vicario et al. 2007), and Journal of Nutritional Biochemistry (e.g., Liu et al. 2010; Mochizuki et al. 2001). This is primarily because scientists are interested not only in the main effects of two independent factors (e.g., amino acids and fats; protein and age; food intake and exercise) but also in their interaction (Fu et al. 2010). Take the study of amino acids by Jobgen et al. (2009a, b) as an example. These authors reported that dietary l-arginine (Arg) supplementation reduced white-fat gain in diet-induced obese (DIO) rats (Jobgen et al. 2009b) and modulated gene expression in adipose tissue (Jobgen et al. 2009a). In their study, both diet (high or low fat) and amino acid supplementation (with or without Arg) are factors that affect white-fat gain in DIO rats, and whether the two factors act additively or the effect of one depends on the level of the other is determined by whether an interaction effect exists.

Neter et al. (1996) outlined a basic strategy for the analysis of two-factorial studies. Here, we present this method as a flowchart (Fig. 1) for a simple, parsimonious explanation. It is widely recognized that, when the interaction of the factors is statistically significant, the analysis of factor effects is based on treatment means and involves multiple comparisons of treatment means (e.g., Neter et al. 1996; Steel et al. 1997). However, when the two factors do not interact in analysis of variance (ANOVA), the classic statistics textbooks generally do not indicate multiple comparisons of treatment means (e.g., Steel et al. 1997). Consequently, a common belief among biologists is that comparisons among treatment means cannot or should not be made when the interaction of two factors is statistically nonsignificant (e.g., Eller and Reimer 2010; Liu et al. 2010; Mochizuki et al. 2001; Willoughby et al. 2007; Vicario et al. 2007). In the present article, we bring this misconception to the attention of biomedical and life science researchers. In addition, we indicate which comparisons among treatment means can be performed when the interaction between two factors is nonsignificant.

Fig. 1

Strategy for analysis of two-factorial studies. When the interaction of the factors is statistically significant, the analysis of factor effects is based on treatment means and involves multiple comparisons of treatment means. However, when the two factors do not interact in analysis of variance, the classic statistics textbooks generally do not indicate multiple comparisons of treatment means.

General considerations

When factors do not interact

Based on Fig. 1, the first step in two-way ANOVA is to validate the assumptions of normality and constant variance for the experimental data; constant variance can be checked with a test such as Levene's test (Levene 1960) or the Brown-Forsythe test (Brown and Forsythe 1974). If the data do not have homogeneous variance, they should be transformed (e.g., by a logarithmic transformation) to meet the assumptions of ANOVA so that appropriate statistical inferences can be drawn (Fu et al. 2010). When the two factors do not interact (e.g., P > 0.05) in ANOVA, the main effects of the two factors should be determined, with the effect of one factor averaged across the levels of the other. Similar strategies for the analysis of two-factor studies have been proposed by other authors (e.g., Milliken and Johnson 1984). In all these methods, the analysis focuses on the main effects when the interaction is statistically nonsignificant. However, scientists may be interested in comparing treatment means even when the two factors do not interact. In the nutritional study of Jobgen et al. (2009b), for example, although there is no significant interaction between the two factors on white-fat mass, it is still interesting and important to determine whether the high-fat (HF) diet with Arg supplementation has the same effect on adiposity as the low-fat (LF) diet alone.
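The constant-variance check named above can be sketched in plain Python. This is a minimal sketch, not the procedure used in any of the cited studies: the function name `levene_statistic` is hypothetical, each group is assumed to be given as a list of raw observations, and the P value, which comes from the F distribution with the returned degrees of freedom, is left to a statistics package. Setting `center="median"` gives the Brown-Forsythe variant.

```python
from statistics import mean, median


def levene_statistic(groups, center="mean"):
    """Levene's W (center="mean") or the Brown-Forsythe W (center="median").

    groups: list of lists, the raw observations of each treatment group.
    Returns (W, df1, df2); under constant variance, W is approximately
    F-distributed with (df1, df2) degrees of freedom.
    """
    centre_of = mean if center == "mean" else median
    # Absolute deviation of every observation from its own group centre.
    z = []
    for g in groups:
        c = centre_of(g)
        z.append([abs(y - c) for y in g])

    k = len(z)
    n_total = sum(len(g) for g in z)
    z_means = [mean(g) for g in z]
    z_grand = sum(sum(g) for g in z) / n_total

    # Between-group spread of the deviations ...
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    # ... relative to their within-group spread (assumed non-zero).
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)

    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k
```

If W is large relative to the F(df1, df2) critical value, one option is to log-transform every observation (assuming all values are positive) and re-run the check before proceeding to the ANOVA.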

Let the two factors be called A and B, and suppose each has two levels: A1, A2 and B1, B2. If there is no evidence of a statistically significant interaction between A and B, the analysis of factor effects usually involves only the factor level means. Inferences about factor A can be made by comparing the mean responses at the levels of A averaged over the levels of B (and likewise for B). This is possible because, when A and B do not interact, the difference in response between the levels of A is the same at every level of B.
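The factor level means and the test for interaction can be made concrete with a balanced two-way ANOVA. The sketch below is illustrative only: the function name is hypothetical, it assumes equal replication n in every cell (unbalanced designs need a different sums-of-squares decomposition), and it reports F ratios and degrees of freedom without P values.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells[i][j] holds the n replicate observations for level i of A and
    level j of B; every cell must contain the same number n of observations.
    Returns SS, df and F for A, B and AB, the error SS and df, and the
    factor level means of A (averaged over B) and of B (averaged over A).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_means = [[mean(c) for c in row] for row in cells]
    grand = mean(m for row in cell_means for m in row)  # valid for balanced data
    a_means = [mean(row) for row in cell_means]
    b_means = [mean(cell_means[i][j] for i in range(a)) for j in range(b)]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    # Interaction: departure of each cell mean from the additive prediction.
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_e = sum(
        (y - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for y in cells[i][j]
    )

    df_a, df_b, df_ab, df_e = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
    ms_e = ss_e / df_e
    return {
        "A": (ss_a, df_a, ss_a / df_a / ms_e),
        "B": (ss_b, df_b, ss_b / df_b / ms_e),
        "AB": (ss_ab, df_ab, ss_ab / df_ab / ms_e),
        "error": (ss_e, df_e),
        "a_means": a_means,
        "b_means": b_means,
    }
```

When the AB ratio is small relative to F((a − 1)(b − 1), ab(n − 1)), the analysis moves on to the returned `a_means` and `b_means`, which are exactly the factor level means described above.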

Figure 2a plots the treatment means for the different levels of A and B when there is no interaction between A and B. The solid line joins the mean treatment responses μ11 and μ21 of groups (A1, B1) and (A2, B1), respectively; the dashed line joins the mean treatment responses μ12 and μ22 of groups (A1, B2) and (A2, B2), respectively. The two lines are parallel when the two factors do not interact, so μ11 − μ12 = μ21 − μ22. Hence, we can examine the differences between the levels of A averaged across the levels of B, and vice versa.
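The parallel-lines property can be written out under the standard additive (no-interaction) model. The parameterization below, with an overall mean μ and level effects αi and βj, is the usual textbook one and is an assumption, not notation used elsewhere in this article:

```latex
\mu_{ij} = \mu + \alpha_i + \beta_j \quad (i, j = 1, 2)
\;\Longrightarrow\;
\mu_{11} - \mu_{12} = \beta_1 - \beta_2 = \mu_{21} - \mu_{22},
\qquad
\tfrac{1}{2}(\mu_{21} + \mu_{22}) - \tfrac{1}{2}(\mu_{11} + \mu_{12})
  = \alpha_2 - \alpha_1 = \mu_{21} - \mu_{11} = \mu_{22} - \mu_{12}.
```

So the difference between the two marginal means of A equals the A difference at each level of B, which is why averaging over B loses nothing when there is no interaction.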

Fig. 2

Treatment means when the two factors do not (a) or do (b) interact significantly. The solid lines join the mean treatment responses μ11 and μ21 of groups (A1, B1) and (A2, B1), respectively. The dashed lines join the mean treatment responses μ12 and μ22 of groups (A1, B2) and (A2, B2), respectively.

When the interactions of factors are significant

If there is evidence of a significant interaction between A and B, inferences about differences in the mean responses across the levels of A must be made separately for each level of B, because these differences may depend on the level of B. Figure 2b plots the treatment means for the different levels of A and B when there is an interaction. The two lines are not parallel when the interaction is significant, so μ11 − μ12 ≠ μ21 − μ22. Hence, comparisons among the treatment means can be constructed to answer the particular questions posed by the researchers. Another salient point in Fig. 2b is that μ11 + μ12 = μ21 + μ22 and μ11 + μ21 = μ12 + μ22, indicating that neither main effect is significant. In such a case, it is justified to report the statistically significant interaction, which may have a highly relevant biological interpretation. This case is also interesting and important from a statistical point of view: although μ11 − μ12 ≠ μ21 − μ22, we have μ11 − μ12 = −(μ21 − μ22), which means that the effect of B at level A1 is the opposite of its effect at level A2.
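The identity μ11 − μ12 = −(μ21 − μ22) follows from the two sum conditions in a few lines of algebra:

```latex
\begin{aligned}
&\mu_{11} + \mu_{12} = \mu_{21} + \mu_{22}, \qquad \mu_{11} + \mu_{21} = \mu_{12} + \mu_{22} \\
&\text{Subtracting: } \mu_{12} - \mu_{21} = \mu_{21} - \mu_{12} \;\Rightarrow\; \mu_{12} = \mu_{21} \\
&\text{Adding: } 2\mu_{11} + \mu_{12} + \mu_{21} = \mu_{12} + \mu_{21} + 2\mu_{22} \;\Rightarrow\; \mu_{11} = \mu_{22} \\
&\text{Hence } \mu_{11} - \mu_{12} = \mu_{22} - \mu_{21} = -(\mu_{21} - \mu_{22}).
\end{aligned}
```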

Comparisons of treatment means when factors do not interact

Factor A has two levels, factor B has two levels

As noted previously, the general analysis of factor effects usually does not suggest any comparison among treatment means when the two factors do not interact. However, this does not mean that such comparisons cannot or should not be made. For simplicity, we first develop our strategy for the case in which both A and B have two levels, and then extend it to the general case in which A has a levels and B has b levels. From the preceding discussion, it is clear that detailed comparisons of μ11 versus μ21 and μ12 versus μ22 are unnecessary, because they are covered by the analysis of the main effect of A; likewise, μ11 versus μ12 and μ21 versus μ22 are covered by the analysis of the main effect of B. The only comparisons not covered by the main-effect analysis are μ12 versus μ21 and μ11 versus μ22. In this section, we discuss whether post hoc tests comparing μ12 with μ21, and μ11 with μ22, can be performed when there is no significant interaction.
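The bookkeeping above can be checked by enumerating every pairwise comparison of cell means and labelling it by which factor changes. This is an illustrative sketch with a hypothetical function name; cells are indexed (i, j) for level i of A and level j of B, and the labels assume no interaction, so that a difference at a fixed level of one factor equals the corresponding difference between that factor's level means.

```python
from itertools import combinations


def classify_pairs(a=2, b=2):
    """Label each pairwise comparison of cell means mu_ij in an a x b design.

    Assumes no interaction, so a difference at a fixed level of one factor
    equals the corresponding difference between that factor's level means.
    """
    cells = [(i, j) for i in range(1, a + 1) for j in range(1, b + 1)]
    labels = {}
    for (i, j), (k, l) in combinations(cells, 2):
        if j == l:
            labels[(i, j), (k, l)] = "A only: answered by the main effect of A"
        elif i == k:
            labels[(i, j), (k, l)] = "B only: answered by the main effect of B"
        else:
            labels[(i, j), (k, l)] = "A and B: not covered by the main effects"
    return labels
```

For the 2 × 2 design, four of the six pairs are labelled as answered by a main effect, and the two remaining pairs are (μ11, μ22) and (μ12, μ21), the diagonal comparisons discussed in the text.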

In two-factor studies with two levels of each factor, four different situations may occur when the two factors do not interact (Table 1). In the first case, there is no main effect of either A or B; no comparisons of μ12 with μ21 or μ11 with μ22 are needed, because the four treatment means do not differ. In the second case, there is a main effect of A but not of B; again no such comparisons are needed, because it follows logically that μ11 ≠ μ22 and μ21 ≠ μ12. The third case, with a main effect of B but not of A, is analogous to the second. The fourth case, with main effects of both A and B, is the most complicated, because the main effects settle only one of the two diagonal comparisons. Specifically, suppose that the response is higher at B2 than at B1 (μ11 < μ12 and μ21 < μ22). If the treatment means also increase from A1 to A2 at all levels of B (Fig. 2a), then μ11 < μ12 < μ22, so μ11 < μ22 is already established and only the comparison between μ12 and μ21 is needed. Similarly, if the treatment means decrease from A1 to A2 at all levels of B, then μ21 < μ11 < μ12, so μ12 > μ21 and only the comparison between μ11 and μ22 is needed.
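The four cases can be condensed into a small decision rule. The function below is a hypothetical helper, not part of any statistics package; it assumes no interaction and takes the significance of each main effect and the direction of each effect as inputs.

```python
def diagonal_comparison_needed(main_a, main_b, a_increases=True, b_increases=True):
    """Return the one cell-mean comparison still needed in a 2 x 2 design
    with no interaction, or None if the main effects already settle it.

    main_a / main_b: whether the main effect of A / B is significant.
    a_increases: response rises from A1 to A2; b_increases: from B1 to B2.
    """
    if not (main_a and main_b):
        # Cases 1-3: either all four means are equal, or every diagonal
        # pair differs through the single significant factor.
        return None
    if a_increases == b_increases:
        # Same direction: mu11 and mu22 are the extreme cells,
        # so only mu12 versus mu21 remains open.
        return ("mu12", "mu21")
    # Opposite directions: mu12 and mu21 are the extremes.
    return ("mu11", "mu22")
```

Only the relative direction of the two effects matters, which is why the two direction flags enter the rule solely through their equality.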

Table 1 Treatment means when the two factors do not interact

Factor A has a levels, factor B has b levels

In two-factorial studies, a factor sometimes has more than two levels. Suppose factor A has a levels and factor B has b levels, and let μij be the treatment mean for the ith level of A and the jth level of B. Following a procedure similar to that in the previous section, the following situations are possible. If there is no main effect of either factor, no multiple comparisons among treatment means are needed. If there is a main effect of only one factor, we may conclude only that at least two of the level means of this factor differ significantly; multiple comparisons among these level means are then required to identify which specific means differ. If there are main effects of both A and B, suppose, as in the two-level case, that the response increases from B1 to Bb at all levels of A. If the treatment means also increase from A1 to Aa at all levels of B, then μij must be compared with μkl for k = 1, …, i − 1 and l = j + 1, …, b, for i = 2, …, a and j = 1, …, b − 1. Similarly, if the treatment means decrease from A1 to Aa at all levels of B, then μij must be compared with μkl for k = i + 1, …, a and l = j + 1, …, b, for i = 1, …, a − 1 and j = 1, …, b − 1.
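For the general a × b case, the comparisons left open are exactly the pairs of cells in which the step in A and the step in B move the response in opposite directions. The sketch below enumerates them; it assumes an additive model whose level means change monotonically in the stated directions, and the function name is hypothetical.

```python
from itertools import combinations


def ambiguous_pairs(a, b, a_increases=True, b_increases=True):
    """Cell pairs ((i, j), (k, l)) whose order is not implied by the main
    effects of an additive a x b model with monotone level effects.

    A pair is open when moving from one cell to the other changes A and B
    in directions that push the response opposite ways.
    """
    s = (1 if a_increases else -1) * (1 if b_increases else -1)
    cells = [(i, j) for i in range(1, a + 1) for j in range(1, b + 1)]
    return [
        (p, q)
        for p, q in combinations(cells, 2)
        if s * (q[0] - p[0]) * (q[1] - p[1]) < 0
    ]
```

With both effects increasing, `ambiguous_pairs(2, 2)` returns only the pair (μ12, μ21), matching the two-level case; in general the number of open pairs is a(a − 1)/2 × b(b − 1)/2, the same count as the index ranges given above.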

Example for multiple comparisons of treatment means when two factors do not interact

We use the experimental data of Jobgen et al. (2009b) to illustrate our strategy for performing multiple comparisons of treatment means when two factors do not interact. Rats were fed an LF or HF diet, and those fed the HF diet became obese. After a 15-week period of LF or HF feeding, the rats were either unsupplemented or supplemented with Arg. Thus, there were four treatment groups: LF − Arg (LF without Arg), LF + Arg (LF with Arg), HF − Arg (HF without Arg), and HF + Arg (HF with Arg), corresponding to two levels of dietary fat (low vs. high) and two levels of Arg (− vs. +). The outcome variable is the relative weight of white adipose tissue (% of body weight), and we are interested in whether the response is the same in the HF + Arg and LF − Arg groups, which, if true, would indicate that Arg supplementation prevents HF-induced obesity in adult rats.

Two-way ANOVA shows no significant interaction (P = 0.69), and the main effects of dietary fat and Arg supplementation are both significant (P < 0.01). Figure 3 shows the mean relative weights of mesenteric adipose tissue in the four groups of rats. The treatment means clearly increase from LF to HF at both levels of Arg and decrease with Arg supplementation at both levels of dietary fat. The only comparison not resolved by the main effects is that between LF − Arg and HF + Arg; the differences between all other means are significant. Therefore, we use a t test to determine whether the treatment means of the LF − Arg and HF + Arg groups differ. This is exactly the question of interest, and the post hoc t test shows no statistically significant difference between these two treatment means (P = 0.997). These results indicate that supplementing adult obese rats with Arg can effectively reduce white-fat mass to the level observed in lean rats that are not supplemented with Arg (Jobgen et al. 2009b). This finding has important implications for preventing and treating obesity in both humans and animals (McKnight et al. 2010; Wu et al. 2009).
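The article does not state which form of t test was used. One common choice for a single planned comparison after a two-way ANOVA, sketched below as an assumption, divides the difference between the two cell means by a standard error built from the pooled within-cell variance of all four groups (the ANOVA error mean square), with N − 4 degrees of freedom. Because Python's standard library has no t distribution, the two-sided P value is obtained by numerically integrating the t density; both function names are hypothetical.

```python
import math
from statistics import mean


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P value for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even); adequate for illustration, not a library routine.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3  # P(0 < T < |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * area))


def contrast_t(groups, first, second):
    """t test of two cell means using the pooled within-cell variance.

    groups: dict mapping each treatment name to its raw observations; all
    cells of the design are included, so the variance estimate is the ANOVA
    error mean square with N - (number of cells) degrees of freedom.
    """
    ss_within = 0.0
    for obs in groups.values():
        m = mean(obs)
        ss_within += sum((y - m) ** 2 for y in obs)
    df = sum(len(obs) for obs in groups.values()) - len(groups)
    mse = ss_within / df

    g1, g2 = groups[first], groups[second]
    se = math.sqrt(mse * (1 / len(g1) + 1 / len(g2)))
    t = (mean(g1) - mean(g2)) / se
    return t, df, t_two_sided_p(t, df)
```

Called as `contrast_t(groups, "LF-Arg", "HF+Arg")` with a dict holding the raw observations of all four groups, it returns (t, df, P). No data from Jobgen et al. (2009b) are reproduced here, and the group labels are whatever keys the caller chooses.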

Fig. 3

Mean relative weights of mesenteric adipose tissue in rats fed high- or low-fat diets and either unsupplemented or supplemented with Arg. Data are adapted with permission from Jobgen et al. (2009b)

In the study of Jobgen et al. (2009b), the Tukey multiple comparison test was performed to identify which specific means differed. This approach is statistically sound and has been adopted in subsequent studies (Lassala et al. 2010; Satterfield et al. 2010; Tan et al. 2010). Note, however, that in the work of Jobgen et al. (2009b), the only comparison needed is that between LF − Arg and HF + Arg. Therefore, a simple post hoc t test is sufficient to obtain the P value, and it provides greater statistical power because no adjustment for multiple comparisons is required. This strategy can be used in future studies involving comparisons of treatment means when the interaction between two factors is nonsignificant.
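The power argument can be illustrated by comparing critical values. The sketch below uses a Bonferroni correction over all six pairwise comparisons of four groups as a simple stand-in for a multiplicity adjustment; this is not Tukey's procedure, whose studentized-range critical value for all pairwise comparisons is typically smaller than Bonferroni's but still larger than the unadjusted t value. Function names are hypothetical.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P value for Student's t (Simpson integration of the density)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even
    total = pdf(0.0) + pdf(abs(t))
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return min(1.0, max(0.0, 1.0 - 2.0 * total * h / 3))


def t_critical(alpha, df, hi=50.0):
    """|t| at which the two-sided P value equals alpha, found by bisection.

    Assumes the critical value lies in [0, hi]; raise hi for very small
    alpha or very few degrees of freedom.
    """
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def single_vs_bonferroni(df, n_groups=4, alpha=0.05):
    """Critical |t| for one planned comparison versus all pairwise comparisons
    with a Bonferroni correction (a stand-in, not Tukey's procedure)."""
    n_pairs = n_groups * (n_groups - 1) // 2  # 6 for four groups
    return t_critical(alpha, df), t_critical(alpha / n_pairs, df)
```

With 20 error degrees of freedom, the unadjusted critical |t| is about 2.09, whereas the Bonferroni-adjusted one is roughly 2.9, so a single planned t test detects smaller differences at the same nominal level.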

Conclusion

Comparison among treatment means when there is no interaction is meaningful in some specific situations. When the main effects of the two factors are analyzed, no comparison among treatment means is needed if neither factor has a main effect. If only one factor has a main effect, multiple comparisons among the level means of this factor are required to identify which specific means differ. If both factors have main effects and each factor has two levels, a comparison of either μ12 with μ21 or μ11 with μ22 is needed, depending on the directions of the two effects. Similar conclusions hold for the general case in which A has a levels and B has b levels. In a two-factorial study, therefore, the basic principles of statistical analysis allow comparisons among treatment means when the two factors do not interact. This clarification should help biomedical and life science researchers analyze their experimental data and answer specific scientific questions.