1 Introduction

When researchers use regression models, they typically assume that the regression coefficients are constant and that the same regression equation can satisfactorily describe all the population members. More often than not, this assumption of homogeneous data structures does not hold (Wedel et al. 1999). If not addressed adequately, heterogeneity leads to inaccurate results and inappropriate conclusions (Jedidi et al. 1997). To account for heterogeneity, researchers usually rely on observable characteristics that allow for partitioning the data into subsets, or they introduce interaction terms to capture group-related effects. However, in light of the manifold limitations of those approaches, especially in the context of model-based clustering (Wedel and Kamakura 2000; Jedidi et al. 1997), research interest during the last decade has been devoted to uncovering unobserved heterogeneity. Numerous researchers emphasize the importance of accounting for unobserved heterogeneity and call for a routine application of appropriate analysis techniques (e.g., Hutchinson et al. 2000; Grewal et al. 2010).

In this context, mixture regression models have gained increasing prominence. They help researchers uncover and treat unobserved heterogeneity and decide on a suitable number of segments to retain from the data. The method clusters observations into a pre-determined number of segments while simultaneously estimating the segment-specific regression parameters, thus avoiding the well-known biases that occur when segment-specific models are estimated independently (e.g., DeSarbo and Cron 1988; McLachlan and Peel 2000). Not surprisingly, studies using mixture regression models are abundant in marketing research and other business disciplines (Andrews et al. 2007; Grewal et al. 2008; Mantrala et al. 2007). The increasing dissemination of finite mixture conjoint models (Andrews et al. 2002a, b; Jagpal et al. 2007) and the finite mixture partial least squares approach (Hahn et al. 2002; Sarstedt and Ringle 2010), both of which build on the mixture regression concept, has further bolstered their popularity. With recent research supporting mixture regression models' ability to reveal segmentation structures in markets and the nature of regression relationships within segments (Andrews et al. 2010), their importance for empirical research is expected to increase further.

In the application of mixture regression models, a fundamental challenge is the selection of the number of segments to retain from the data. In most empirical applications, this number is unknown a priori, yet it has a substantial effect on the interpretation of the results. A misspecified number of segments results in under- or oversegmentation, which easily leads to inaccurate management decisions regarding, for example, customer targeting, product positioning, or determining the optimal marketing mix (Andrews and Currim 2003a). Specifically, if managers underestimate the number of customer segments in a market, they fail to identify distinct segments. If certain segments, such as lucrative niches, are ignored, companies may miss the opportunity to gain revenues from these customers due to their inability to address them separately and to satisfy their varying needs more precisely (Boone and Roehm 2002). In contrast, oversegmentation is likely to prompt marketing activities that target non-relevant segments, resulting in a misallocation of resources. Given that companies spend considerable sums developing and targeting market segments, the costs of overestimating the appropriate number of segments can be substantial.

In order to determine the number of segments to retain from the data, researchers can use a broad range of segment retention criteria (i.e., information and classification criteria) to compare different segmentation solutions in terms of their model fit (based on likelihood or entropy measures; Claeskens and Hart 2009; Hawkins et al. 2001). Previous research has provided simulation studies that compare the efficacy of different segment retention criteria at identifying a pre-specified number of segments in situations that could potentially affect the criteria's performance (e.g., Hawkins et al. 2001). In this line of research, Andrews and Currim (2003b) provide the most comprehensive study by investigating the performance of seven segment retention criteria. Their results suggest that researchers should resort to AIC3 (Akaike's Information Criterion with a penalty factor of 3; Bozdogan 1994), as it yields the highest success rate and only minor underfitting across a wide variety of data constellations.
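To make the penalty-factor logic concrete, these information criteria share a common structure and differ only in how strongly they penalize model complexity. Using the standard definitions (which the cited papers do not spell out), with $\ln L$ the maximized log-likelihood, $k$ the number of estimated parameters, and $n$ the sample size:

$$\mathrm{AIC} = -2\ln L + 2k, \qquad \mathrm{AIC3} = -2\ln L + 3k, \qquad \mathrm{BIC} = -2\ln L + k \ln n$$

The solution with the smallest criterion value is retained; AIC3 simply replaces AIC's penalty factor of 2 with 3, which makes it less prone to overestimating the number of segments.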

A key aspect that neither Andrews and Currim (2003b) nor other studies in this research stream (e.g., Hawkins et al. 2001; Sarstedt 2008) have considered is how collinearity affects the retention criteria's performance. Researchers working with regression-based marketing models routinely face collinearity between two or more predictor variables. Collinearity leads to analytical problems such as unstable estimates of the regression coefficients and inflated standard errors (Mason and Perreault 1991; Ofir and Khuri 1986). Mixture regression models intensify these collinearity effects, since estimating the regression model in each segment relies on fewer observations than estimating it at the aggregate data level (DeSarbo et al. 2004). However, collinearity not only affects the mixture regression coefficients and the estimates' standard errors, which is problematic in studies that aim at identifying segment structures, but also the likelihood of the model and the observations' probabilities of segment membership, because collinearity between the predictor variables masks some of the population heterogeneity (DeSarbo et al. 2004). Since information criteria primarily build on the model's likelihood values, increased levels of collinearity impair their ability to determine the underlying number of segments. The same holds true for classification criteria, which primarily rely on the entropy, a function of the segment membership probabilities (McLachlan and Peel 2000). Consequently, in the presence of collinearity, segment retention criteria are likely to misspecify the number of segments, inviting inaccurate management implications.
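To illustrate the entropy argument, classification criteria typically build on a normalized entropy statistic of the posterior segment membership probabilities $p_{ik}$ of observation $i$ in segment $k$. A common normalized form (a sketch of the usual definition, not a formula taken from the studies cited above) is

$$E_K = 1 - \frac{\sum_{i=1}^{N}\sum_{k=1}^{K} -p_{ik}\ln p_{ik}}{N \ln K},$$

which approaches 1 when observations are unambiguously assigned to one segment and 0 when the membership probabilities are uniform. Because collinearity blurs the separation between segments, it pushes the $p_{ik}$ toward uniformity and thereby erodes exactly the information that entropy-based criteria exploit.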

Against this background, we provide the first study on the effect of collinearity in mixture regression models. Specifically, we analyze several segment retention criteria and the impact that increased collinearity levels between the predictor variables have on their performance. The results show that collinearity has a substantial impact on the segment retention criteria’s performance and identify four particularly well-performing criteria, as well as a critical level of collinearity for their effective application. Our study’s findings contribute to the knowledge of mixture regression models and their application in the presence of collinearity levels that marketing researchers and practitioners are likely to encounter in empirical applications.

2 Study and simulation design

Drawing on prior studies (e.g., Claeskens and Hart 2009; Hawkins et al. 2001), we identify and select nine information criteria (i.e., AIC, AIC3, AIC4, BIC, CAIC, HQ, ICOMP, MDL2, and MDL5) and eight classification criteria (i.e., AWE, CLC, EN, ICL-BIC, PC, PE, NFI, and NEC) for deciding on the number of segments in mixture regression models. To evaluate their performance, we systematically manipulate seven data characteristics (i.e., factors) on different levels. The selection of factors and factor levels draws primarily on Andrews and Currim's (2003b) seminal study on the performance of segment retention criteria. Specifically, we consider the following factors and levels: number of segments [2; 3; 4], sample size [5 × 100; 10 × 100; 5 × 300; 10 × 300], explained variance R² [40 %; 60 %; 80 %], mean separation between (standardized) segment-specific coefficients [0.2; 0.3; 0.4], and relative segment size [balanced; unbalanced; very unbalanced]. In line with Mason and Perreault's (1991) study on the effects of collinearity in regular regression models, we consider four continuous predictor variables and one dependent variable.

Most importantly, our research considers the effects of collinearity between the predictor variables on the segmentation outcome. We generate data with different collinearity levels, following Mason and Perreault's (1991) study on regular regression models and Grewal et al.'s (2004) study on structural equation models. In these approaches, data is generated on the basis of a pre-specified correlation matrix of the four independent variables in the regression model. The higher the correlation between the independent variables, the more severe the collinearity (Kim et al. 2013). In line with Mason and Perreault (1991), we use the variance inflation factor (VIF) to characterize the level of collinearity. Our study considers the situation without collinearity and eight levels of collinearity, representing levels that marketing researchers and practitioners regularly encounter in (mixture) regression studies. In particular, the pre-specified correlation matrices of these nine situations, which we use for data generation, result in VIF values of [1.00; 1.35; 1.80; 2.40; 3.00; 3.80; 5.50; 7.30; 10.70].
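The following is a minimal sketch of this data-generating step for a single segment, assuming an equicorrelated predictor structure; the correlation value, regression coefficients, and error variance are illustrative and do not reproduce the study's actual design matrices.

```r
# Generate collinear predictors from a pre-specified correlation matrix
# and check the resulting collinearity level via the VIF.
library(MASS)  # mvrnorm()
library(car)   # vif()

set.seed(1)
n <- 300; p <- 4; r <- 0.8                       # r = 0.8 is an illustrative value
Sigma <- matrix(r, p, p); diag(Sigma) <- 1       # pre-specified correlation matrix
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)   # four collinear predictors
beta <- c(0.4, 0.3, 0.2, 0.1)                    # hypothetical segment coefficients
y <- as.vector(X %*% beta) + rnorm(n, sd = 0.5)  # error scaled to some target R-squared

df <- data.frame(y = y, X)                       # columns y, X1, ..., X4
vif(lm(y ~ X1 + X2 + X3 + X4, data = df))        # empirical VIFs, roughly 3.8 at r = 0.8
```

With four equicorrelated predictors, a pairwise correlation of 0.8 implies a VIF of about 3.8, which falls within the range of levels the study manipulates.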

In addition, we consider whether or not the segments exhibit the same correlation matrix in terms of the independent variables. Usually, heterogeneity between segments is represented in terms of the regression coefficients' heterogeneity and, therefore, in terms of different correlations between the independent and the dependent variables. However, heterogeneity can also occur in the correlations between the independent variables, resulting in different correlation matrices across segments. This situation is not unlikely, because the entire covariance matrix between all the variables can be segment-specific, not only the covariance between the dependent and the independent variables (Marcoulides et al. 2012). To account for this phenomenon, we distinguish between two situations: consistent versus inconsistent between-segment correlation matrices [consistent; inconsistent]. In the first condition, the correlation matrix is the same across all segments, independently of the true segment differences. Thus, two independent variables (e.g., X1 and X2) always exhibit the same correlation, determined by the collinearity level, in all segments. In contrast, in the second condition, the correlation matrix differs across segments, because it is aligned with the pre-specified, segment-specific differences; consequently, high correlations occur between those variables that separate the segments more strongly. Hence, two specific independent variables (e.g., X1 and X2) could have a high correlation in one segment and a low correlation in another. Even though we use the same level of collinearity (i.e., the same correlation pattern, but assigned to different independent variables) in both situations (i.e., consistent and inconsistent), we expect that identifying the segments is more difficult in the second condition, leading to a decline in the segment retention criteria's performance.

The full factorial design of the study results in 2 × 3⁴ × 4 × 9 = 5,832 different combinations of the design factors. We created a program in the statistical software R (R Core Team 2014) to generate datasets for the pre-specified factor levels. To analyze the simulated data, we use the R package FlexMix (Grün and Leisch 2008), which employs an expectation-maximization (EM) algorithm for maximum likelihood estimation. To ensure the robustness of the results, 30 datasets per factor combination are generated and analyzed. Moreover, since the EM algorithm does not always converge to the global optimum, we select the best log-likelihood result of ten FlexMix runs with different starting values per dataset (Wedel and Kamakura 2000).
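As a sketch of this estimation step, FlexMix's stepFlexmix() routine automates repeated EM runs per candidate number of segments and keeps the best log-likelihood solution; the formula, the range of candidate segment numbers, and the data frame df (carried over from the sketch above) are illustrative. FlexMix reports AIC, BIC, and ICL out of the box, whereas AIC3 has to be computed manually from the log-likelihood.

```r
# Fit mixture regressions for candidate segment numbers, retaining the
# best of ten EM runs (different random starts) for each k.
library(flexmix)

set.seed(1)
fit <- stepFlexmix(y ~ X1 + X2 + X3 + X4, data = df,
                   k = 1:5,     # candidate numbers of segments
                   nrep = 10)   # ten EM runs per k; best log-likelihood kept

# AIC3: replace AIC's penalty factor of 2 with 3
aic3 <- sapply(fit@models, function(m) {
  ll <- logLik(m)
  -2 * as.numeric(ll) + 3 * attr(ll, "df")
})
names(which.min(aic3))  # number of segments AIC3 retains
```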

3 Results

3.1 Performance of segment retention criteria

Table 1 shows the average success (S), underestimation (U), and overestimation (O) rates of the information and classification criteria. According to these results, AIC4, for example, identifies the correct number of segments in 82 % of all simulation runs in which the number of segments is held constant at two (Factor 1) and all the other factors are varied according to the factor levels described above. Similarly, at this factor level, AIC4 underestimates (overestimates) the correct number of segments in 11 % (6 %) of all simulation runs.

Table 1 Performance results

Overall, our results are similar to those of Andrews and Currim (2003b) when no collinearity is present (i.e., VIF = 1.0). For instance, Andrews and Currim (2003b) find that AIC identifies the pre-specified number of segments in 59 % of the cases, while it underestimates (overestimates) this number in 10 % (30 %) of the cases. These results compare well with ours: without collinearity, AIC's success, underestimation, and overestimation rates are 52 %, 3 %, and 44 %, respectively. More pronounced differences emerge for the Bayesian information criterion (BIC) and the consistent Akaike information criterion (CAIC).

Taking the full range of design factors into account, the results show that AIC3, AIC4, HQ, and ICOMP are the best-performing criteria, with overall success rates of 57 to 58 %. With regard to deviations from the pre-specified number of segments, the information complexity criterion (ICOMP) exhibits both over- and underestimation tendencies, whereas AIC3, AIC4, and HQ (the Hannan-Quinn criterion) show clear underestimation tendencies. AIC, BIC, and CAIC achieve success rates of between 43 and 47 %, whereas all the other criteria have overall success rates of only 38 % or lower (Table 1).

The classification criteria that do not allow for computing a one-segment solution (i.e., NEC, PE, EN, NFI, and PC) also achieve relatively high overall success rates for two segments, which is not surprising, as no underestimation is possible in this situation. However, these criteria's success rates for three and four segments decline strongly, regardless of the other data characteristics. Hence, it is usually not advisable to apply these criteria in mixture regression model analyses to determine the number of segments.

In the following, we will focus our analysis of the collinearity effects on the best-performing information and classification criteria in this study; that is, AIC3, AIC4, ICOMP, and HQ. Our study reveals two major results. First, the segment retention criteria’s overall performance declines significantly at higher levels of collinearity. Second, higher levels of collinearity substantially increase the criteria’s underestimation rates, while their overestimation rates generally remain unaffected.

On average, the success rates of the best-performing criteria (i.e., AIC3, AIC4, ICOMP, and HQ) decrease by 21, 32, and 47 percentage points when the VIF increases from 1.0 to low (VIF = 2.4), intermediate (VIF = 3.8), and large (VIF = 10.7) values, respectively. At the highest collinearity level, the success rates of AIC3 and ICOMP decrease by 44 and 45 percentage points, while those of AIC4 and HQ decline by as much as 50 percentage points. These declines in the success rates go hand in hand with pronounced increases in the criteria's underestimation tendencies. Thus, when collinearity is present, the criteria offer only limited guidance in terms of clearly pinpointing the number of segments to extract; rather, given their underestimation tendency, the indicated number of segments should be treated as a lower bound for the actual number of segments.

In addition, we analyze the effect of inconsistent versus consistent correlation matrices across segments. When the correlation matrices are inconsistent across segments, the segment retention criteria's ability to detect the pre-specified number of segments is considerably reduced. On average, the success rate of the best-performing criteria (i.e., AIC3, AIC4, HQ, and ICOMP) declines by 18 percentage points when the correlation matrix is inconsistent instead of consistent across segments (Table 1).

Moreover, we find a strong interaction effect between collinearity and the consistency of the correlation matrix across segments (Fig. 1). Correlation matrices that are inconsistent across segments reinforce collinearity's negative effect on the success rates, while correlation matrices that are consistent across segments only moderately affect the success rates. A possible explanation for collinearity's strong effect in situations with inconsistent correlation matrices across segments is that these models are not properly identified (Hennig 2000). One of the identification conditions for mixture regression models requires that the independent variable matrix in each segment is not singular (DeSarbo et al. 2007). Moreover, to derive consistent estimates, the predictor variables must not only be non-collinear, but must also not come too close to collinearity (Hennig 2000).

Fig. 1 Success rates for different levels of collinearity and consistent vs. inconsistent correlation matrices (four best-performing criteria)

To shed further light on the simulation design factors' effects on the criteria's performance, we meta-analyzed the results of the four best-performing criteria (AIC3, AIC4, ICOMP, and HQ) by fitting multinomial logistic regression models to the success data. We coded the dependent variable as −1 if the criterion underestimated the correct number of segments, 0 if it identified the correct number of segments (reference category), and 1 if it overestimated the correct number of segments. The meta-analysis confirms the above results: increasing the number of segments or the level of collinearity significantly increases the log-odds of underestimating the correct number of segments, while increasing the sample size, the explained variance, or the mean separation between the segment-specific coefficients significantly reduces these log-odds. Moreover, an inconsistent correlation matrix across segments significantly increases the log-odds of underestimating the correct number of segments.
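A minimal sketch of this meta-analysis is given below, assuming a results table with one row per simulation run and criterion; the data frame results and all of its column names are hypothetical.

```r
# Multinomial logit of the retention outcome (-1 = under, 0 = correct,
# 1 = over) on the simulation design factors.
library(nnet)  # multinom()

results$outcome <- relevel(factor(results$outcome), ref = "0")  # correct = reference
meta <- multinom(outcome ~ n_segments + sample_size + r_squared + separation +
                   segment_sizes + vif + matrix_consistency,
                 data = results)
summary(meta)  # log-odds of under-/overestimation relative to correct retention
```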

3.2 Quality of correctly extracted segments

The next step of the analysis involves assessing the clustering accuracy of the identified segments. A well-known criterion for this purpose is the adjusted Rand index (ARI; Hubert and Arabie 1985), which quantifies the agreement between the clustering of the identified segments and the pre-specified groups. An ARI value of 0 corresponds to the expected value of random (uniform) clustering solutions, whereas an ARI value close to its maximum of 1 indicates that the segmentation solution closely matches the expected data grouping and thus has a high clustering accuracy (Hubert and Arabie 1985).
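Computing the ARI is straightforward once a solution has been fitted. The sketch below uses mclust's implementation (one of several available in R) and carries over the hypothetical objects fit and df from the earlier sketches, with true_segment as a hypothetical column holding the pre-specified memberships.

```r
# Compare a fitted solution's hard segment assignments with the
# pre-specified (true) memberships via the adjusted Rand index.
library(flexmix)  # clusters(), getModel()
library(mclust)   # adjustedRandIndex()

best <- getModel(fit, which = "BIC")           # e.g., the BIC-preferred solution
adjustedRandIndex(clusters(best), df$true_segment)
```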

Table 2 shows the average ARI values of the four best-performing segment retention criteria across all the factor levels analyzed in this study. We computed ARI values for (1) the number of segments that each segment retention criterion indicated and (2) the pre-specified number of segments. The ARI values of the pre-specified number of segments serve as benchmarks for the criteria's ARI values, as these values would result if the correct number of segments were chosen. Analogous to the success rates, AIC3, AIC4, ICOMP, and HQ show the best overall performance, with ARI values between 0.465 and 0.471, which are close to the benchmark value of 0.486.

Table 2 Adjusted Rand index for the four best-performing criteria

Our analyses show that those factors that negatively affect the success rates of the segment retention criteria also negatively affect their ARI values. Most notably, we find that at higher levels of collinearity, the ARI declines considerably. For example, the ARI value of AIC3 declines from 0.657 in situations without collinearity (VIF = 1.0) to 0.322 in situations with very high collinearity (VIF = 10.7). On average, the ARI values of the best-performing criteria (i.e., AIC3, AIC4, ICOMP, and HQ) show an overall decline (i.e., the difference between the situation without collinearity and the highest collinearity level) of 0.336. Furthermore, settings with inconsistent correlation matrices entail considerably lower ARI values.

However, collinearity also affects the benchmark ARI values that one would obtain if the correct, pre-specified number of segments were chosen. This result implies that even if one identified the correct number of segments, the collinearity between the predictors would negatively influence the mixture regression results. Hence, collinearity not only negatively influences segment retention, but also the quality of the mixture regression results in general.

We also ran separate ANCOVAs for each of the four best-performing criteria, using the ARI values as dependent variables. The results are highly consistent across AIC3, AIC4, ICOMP, and HQ, showing that all factors have a significant impact on the ARI. Moreover, the vast majority of the two-way interaction effects between the factors are also significant (p ≤ 0.01) but of minor relevance (their partial η² is generally below 0.05), with two exceptions: the interaction between the number of segments and the consistent vs. inconsistent correlation matrix (partial η² of around 0.065), and the interaction between the VIF and the consistent vs. inconsistent correlation matrix (partial η² of around 0.17). This result further underlines the importance of the interaction between the level of collinearity and the consistency of the correlation matrix across segments, which Fig. 1 also shows.
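A sketch of one such ANCOVA appears below, with partial η² computed directly from the sums of squares; the data frame results and its columns remain hypothetical names, and the design factors are assumed to be coded as R factors.

```r
# ANCOVA on the ARI values with the design factors and the two substantive
# interactions; partial eta^2 = SS_effect / (SS_effect + SS_residual).
fit_aov <- aov(ari ~ n_segments * matrix_consistency + vif * matrix_consistency +
                 sample_size + r_squared + separation + segment_sizes,
               data = results)
ss <- summary(fit_aov)[[1]][, "Sum Sq"]
partial_eta2 <- ss[-length(ss)] / (ss[-length(ss)] + ss[length(ss)])  # last entry: residuals
round(partial_eta2, 3)
```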

3.3 Parameter recovery

To further investigate the effect of collinearity on the mixture regression results, we examine the parameter recovery errors that collinearity induces by comparing the estimated mixture regression coefficients with their pre-specified values. Table 3 shows the mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE) of the parameter estimates.
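The three measures follow their standard definitions, stated here for clarity (the text itself does not spell them out); with $\hat{\beta}_j$ an estimated coefficient, $\beta_j$ its pre-specified value, and $J$ the number of coefficient estimates:

$$\mathrm{ME} = \frac{1}{J}\sum_{j=1}^{J}\left(\hat{\beta}_j - \beta_j\right), \qquad \mathrm{MAE} = \frac{1}{J}\sum_{j=1}^{J}\left|\hat{\beta}_j - \beta_j\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{J}\sum_{j=1}^{J}\left(\hat{\beta}_j - \beta_j\right)^2}$$

The ME captures systematic bias (errors with opposite signs cancel out), whereas the MAE and RMSE capture the magnitude of the estimation error, with the RMSE weighting large errors more heavily.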

Table 3 Parameter recovery accuracy

We find a slightly negative ME (a systematic negative bias) across all factor levels, indicating an underestimation of the mixture regression coefficients. In terms of collinearity effects, the ME remains relatively stable across all levels of collinearity but increases considerably in size when the correlation matrices are inconsistent rather than consistent. Furthermore, we find that the MAE and RMSE, which indicate the magnitude of the estimation error, increase considerably with stronger levels of collinearity. Specifically, the MAE (RMSE) increases from 0.027 (0.035) in situations without collinearity to 0.140 (0.190) in situations with very high collinearity (VIF = 10.7). This result implies that the variance of the estimated regression coefficients strongly increases with increased levels of collinearity, as one would also expect in the standard regression case (Mason and Perreault 1991). While the difference between the MAE and RMSE is relatively small at low levels of collinearity, it increases at higher levels. Hence, at higher levels of collinearity, the likelihood of obtaining segmentation results that differ substantially from the true population results increases.

4 Conclusion and recommendations

This study extends the research on mixture regression models, most notably the study by Andrews and Currim (2003b), by examining the effect of collinearity on the performance of a broad range of segment retention criteria. When using mixture regression models, researchers should not apply (popular) segment retention criteria blindly—even at collinearity levels much smaller than those that popular textbooks on research methods judge as critical (e.g., a VIF value of 10; Hair et al. 2010). In particular, our analysis reveals that VIF levels as low as 2.4 affect the segment retention criteria’s performance by dramatically increasing their underestimation tendencies. Our results also show that collinearity effects differ clearly across the criteria. Collinearity affects some criteria, such as AIC3, AIC4, BIC, CAIC, ICOMP, and HQ, more strongly than it does others (e.g., AIC, MDL5, and CLC).

The segment retention criteria’s underfitting tendencies in the presence of high collinearity are particularly problematic when the segment retention criteria indicate a one-segment solution, although the group-specific regression coefficients differ significantly across the segments. In these cases, ignoring heterogeneity and analyzing the data on the aggregate level can have adverse consequences for any conclusions drawn from such marketing models.

The negative effect of collinearity is even more pronounced if the correlation matrices are inconsistent across segments (i.e., when high correlations occur between those variables that separate the segments more strongly). This kind of situation is not unlikely, as the entire covariance matrix between all the variables can be segment-specific, not only the covariance between the dependent and the independent variables (Marcoulides et al. 2012). The overall sample covariance (correlation) matrix might not reveal the strong underlying group-specific collinearity, because the correlations cancel out when the groups are combined. When conducting a mixture regression analysis, researchers should therefore not only analyze the collinearity at the aggregate data level, but also analyze the group-specific correlations to assess whether collinearity threatens the segment retention. Only if the collinearity diagnostics suggest that multicollinearity is an issue neither at the aggregate data level nor in the segment-specific results should researchers further interpret the mixture regression results and draw conclusions from them (Grewal et al. 2013).

It seems common practice for authors to justify their segment decision solely on the basis of segment retention criteria (e.g., Cortiñas et al. 2010; Dubois et al. 2005). Our results show that such an approach is not without problems and suggest three main recommendations for working with mixture regression models. First, given the criteria's increased underestimation tendencies even in the presence of low collinearity levels, researchers are well-advised to consider solutions with more segments than the criteria actually indicate, even if all criteria point to the same number of segments. This especially holds true when the criteria point to a one-segment solution, thus suggesting that heterogeneity is not a problem. In such a situation, researchers should carefully evaluate whether using a segment-specific model provides plausible insights and an increased predictive ability (Andrews et al. 2007). Second, the risk of collinearity increases especially for survey-based research studies that do not use an orthogonal (experimental) design (e.g., cross-sectional studies with complex models that feature three- or four-way interactions), as higher-order interaction terms correlate increasingly with lower-order interaction terms and the main effects (Grewal et al. 2013). Hence, in mixture regression models with interaction terms (e.g., in moderated regression analyses), researchers should be extra cautious about increasing levels of collinearity. Third, the criteria should be perceived as providing a reasonable range of segment solutions that researchers can evaluate by following guidelines such as those that Kotler and Keller (2012) provide (i.e., the segments should be actionable, differentiable, and substantial). In doing so, researchers need to separately evaluate the collinearity between the predictors in each segment when comparing different segment solutions (see the sketch below). The corresponding results (e.g., the VIF values of each segment for different segment solutions) need to be routinely reported, along with the segment retention criteria values, to increase confidence in the findings.
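A minimal sketch of this per-segment collinearity check follows, again carrying over the hypothetical objects best and df from the earlier sketches:

```r
# Compute VIFs within each hard-assigned segment of a fitted solution;
# high segment-specific VIFs flag collinearity the aggregate check can miss.
library(flexmix)  # clusters()
library(car)      # vif()

seg <- clusters(best)  # hard segment assignments
by(df, seg, function(d) vif(lm(y ~ X1 + X2 + X3 + X4, data = d)))
```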

This research is subject to limitations, which future studies should address. First, researchers should develop segment retention criteria that perform well under varying levels of those design factors that are relevant for mixture regression models, including collinearity. Second, our study only provides evidence of collinearity effects on the performance of segment retention criteria in the multivariate normal mixture case. Future research could examine how collinearity affects their performance in other types of exponential family mixture models. Third, examining the effects of collinearity not only on parameter recovery but also on statistical power would be a fruitful research endeavor. Replicating Mason and Perreault's (1991) work in a mixture regression context might be a promising first step toward calibrating the conditions under which collinearity affects the interpretation of mixture regression results. Fourth, recent publications have presented alternative methods for applying mixture regression models that are more robust against irregularities in the data, such as strong collinearity between the predictor variables (Kim et al. 2012, 2013). However, future research should assess the effect of collinearity on the segment retention capabilities of such novel approaches in greater detail. For example, extant research on these approaches relies solely on the log marginal likelihood (LML) for model selection; future studies should more broadly examine the efficacy of the LML under different model conditions (including collinearity) and, if necessary, develop alternative model selection heuristics.