1 Introduction

Market segmentation is a critical building block of strategic marketing (Iacobucci 2013) and “essential for marketing success” (Lilien and Rangaswamy 2003, p. 61). Conceptually, there are two possible approaches to market segmentation. Segments can be defined by using one single segmentation variable. For example, profitability can be used to split existing customers into high, medium, and low profit potential segments. These three market segments can then be profiled using descriptor variables such as benefits sought from the product, socio-demographics, or media behavior. This approach has been referred to as a priori (Myers and Tauber 1977; Mazanec 2000; Wedel and Kamakura 2000), convenience-group (Lilien and Rangaswamy 2003), or commonsense (Dolnicar 2004) segmentation.

Alternatively, multiple segmentation variables can be used. For example, benefits people seek when buying food in a fast food restaurant (save time, save money, keep kids happy, …) may have been collected in a survey. The full set of benefits is used to extract market segments. As opposed to segmentations based on one variable, it is therefore not known in advance what the defining features of each of the market segments may be. Once the segments have been extracted from the data, they also need to be profiled in detail using descriptor variables, just like the high, medium, and low profit potential segments in the previous example. This approach where multiple segmentation variables are used is referred to as a posteriori (Mazanec 2000), response-based (Myers and Tauber 1977), post-hoc (Wedel and Kamakura 2000) or data-driven market segmentation (Dolnicar 2004). Throughout the manuscript, we will use the terms commonsense and data-driven segmentation because they are most intuitive in terms of what each of those concepts means.

When commonsense segmentation is conducted, it is typically obvious from the start which the most attractive target market will be. If profitability is used as the segmentation criterion, the high profit potential segment is undoubtedly the most attractive and should be chosen as the target segment.

When data-driven market segmentation is conducted, however, the decision which market segment to choose as the target segment is not at all obvious. The state of the art approach to data-driven market segmentation involves the following steps (Wedel and Kamakura 2000; Lilien and Rangaswamy 2003): First, a managerial decision is made about which set of variables will be used as segmentation variables. Second, these segmentation variables are collected, frequently by means of a survey study. Third, the empirical data forms the basis of extracting market segments. A wide range of distance- or model-based methods is available to achieve this. At this stage, it is common that segmentation solutions for a range of numbers of segments are calculated to determine which of these global market segmentation solutions (each containing multiple segments) performs best on statistical criteria. The best performing solution is selected. Next, all the market segments contained in this particular market segmentation solution are described in detail using both the segmentation variables and additional descriptive variables. Finally, based on this information, a target segment is selected using criteria such as how similar segment members are to one another, how distinct the segment is with respect to the segmentation variables, whether it is large enough, whether it matches the firm’s strengths, whether it is identifiable, and whether it can be reached with the tools of the marketing mix (Wedel and Kamakura 2000; Lilien and Rangaswamy 2003; McDonald and Dunbar 2012).

This state of the art approach is prone to making one critical mistake: selecting a global market segmentation solution which does not contain the most attractive individual segment or segments. This can happen because the statistical criteria used to select the global market segmentation solution are not aimed at identifying the most attractive individual market segments contained in the global solution.

This problem is exacerbated by the fact that data-driven market segmentation analysis, irrespective of the algorithm used, leads to different results if repeated (Dolnicar and Leisch 2010). As a consequence, experts in multivariate analysis for marketing research recommend analyzing data more than once with more than one algorithm. Iacobucci (2013, p. 15), for example, explicitly suggests: “Choose one of these algorithms (and play with more than one).” This advice reflects the fact that data-driven market segmentation analysis, whether it is done using distance-based methods such as cluster analysis or model-based methods such as finite mixture models, is essentially exploratory in nature: these methods are nothing more than sophisticated fishing rods. But they can make mistakes. This paper presents an approach that prevents the mistake of choosing a global market segmentation solution that does not contain any attractive market segments. Two illustrations with consumer data show how the most attractive target segments would indeed have gone undetected if segment level criteria had not been inspected. To stick with the fishing analogy: the segment level stability measures proposed here make fishing less random. They ensure that no baby fish get caught when really we are after a big fat salmon. To date, no other approach has been proposed that can achieve this aim.

2 Traditional global criteria

Different indices can be used to assess the goodness of fit of the global market segmentation solution. Most cluster algorithms using Euclidean distance try to optimize a functional of between- and within-cluster sum of squares (Everitt et al. 2011; Kaufman and Rousseeuw 1990). Let T be the total scatter matrix of a data set of size n in p dimensions, that is, the covariance matrix multiplied by n−1. Let W be the within-cluster scatter matrix, and B be the between-cluster scatter matrix, such that T = W + B. Examples for target functions of cluster algorithms include trace and determinant of W, or trace of \(BW^{-1}\) (Everitt 1974). The most common target is the sum of squares within clusters SSW = trace(W). Some also consider the sum of squares between clusters SSB = trace(B). To choose a specific number of clusters, one can either search for an elbow in the within sum of squares criterion, or use more refined criteria. The seminal paper by Milligan and Cooper (1985) lists over a dozen indices which can be used. Most search for minima, maxima, or elbows in functionals of the above, such as SSW/k (Ball and Hall 1965), \((\text{SSB}/(k-1))/(\text{SSW}/(n-k))\) (Calinski and Harabasz 1974), or log(SSB/SSW) (Hartigan 1975), where k denotes the number of clusters. Calinski-Harabasz performs best in the simulations by Milligan and Cooper (1985). There are more recent additions to the list, but the classic indices are still most popular. For model-based clustering procedures, information criteria such as AIC or BIC relate the goodness of fit measured by the likelihood to the number of estimated parameters, which are a function of the number of clusters (Fraley and Raftery 1998).
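To make the sum-of-squares quantities concrete, the following is a minimal sketch in plain Python, not the authors' R implementation. It computes SSW, SSB, and the Calinski-Harabasz ratio with the usual n − k denominator for a given partition. The function names `sum_of_squares` and `calinski_harabasz` and the toy data are illustrative assumptions.

```python
def sum_of_squares(points, labels):
    """Return (SSW, SSB) for a list of equal-length tuples and their cluster labels."""
    n = len(points)
    p = len(points[0])
    grand_mean = [sum(x[d] for x in points) / n for d in range(p)]

    # Group the observations by cluster label.
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)

    ssw = 0.0  # trace(W): squared distances of points to their cluster centroid
    ssb = 0.0  # trace(B): size-weighted squared distances of centroids to the grand mean
    for members in clusters.values():
        size = len(members)
        centroid = [sum(x[d] for x in members) / size for d in range(p)]
        ssw += sum((x[d] - centroid[d]) ** 2 for x in members for d in range(p))
        ssb += size * sum((centroid[d] - grand_mean[d]) ** 2 for d in range(p))
    return ssw, ssb


def calinski_harabasz(points, labels):
    """(SSB / (k - 1)) / (SSW / (n - k)); assumes 1 < k < n and SSW > 0."""
    n = len(points)
    k = len(set(labels))
    ssw, ssb = sum_of_squares(points, labels)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Toy example (hypothetical data): two tight groups far apart.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
toy_ssw, toy_ssb = sum_of_squares(toy_points, toy_labels)
toy_ch = calinski_harabasz(toy_points, toy_labels)
```

In this toy case SSW and SSB add up to the total sum of squares around the grand mean, which mirrors the decomposition T = W + B on the level of traces.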

Recent simulation studies have compared cluster indices on large simulation designs. Most focus on correct identification of the number of clusters (Chiang and Mirkin 2010; Steinley and Brusco 2011), others on validity or stability of cluster solutions (Brock et al. 2008; Steinley 2008; Vinh et al. 2010). Another line of research explores using resampling methods to evaluate cluster stability and choosing the right number of clusters (Dudoit and Fridlyand 2002; Grün and Leisch 2004; Lange et al. 2004; Tibshirani and Walther 2005). Stability of segmentation solutions has been proposed as a key evaluation criterion (Breckenridge 1989, 2000; Dolnicar and Leisch 2010; Putler and Krider 2012) where high levels of stability are interpreted as indicative of existing data structure, be it actual cluster structure or any other kind of data structure which enables similar clusters to be identified across repeated computations. Solutions with high global stability are preferable.
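Global stability is typically operationalized as the agreement between partitions obtained from repeated computations. The text does not name the agreement measure, so the sketch below assumes the adjusted Rand index, one widely used choice; it is an illustration and not necessarily the measure used in the studies cited. The function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement of two labelings of the same n >= 2 data points."""
    n = len(labels_a)
    # Pairs of points placed together in both partitions, in A, and in B.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2        # upper bound of the raw index
    if max_index == expected:
        # Degenerate case, e.g. both partitions put all points in one cluster.
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


# Segment labels are arbitrary, so a pure relabelling counts as perfect agreement.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Under this assumption, a global stability value for a given number of segments could be summarized as the median of such agreement values over many pairs of replicate partitions.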

The difficulty with all these traditional criteria is that they evaluate the global segmentation solution, thus potentially discarding segmentation solutions that are globally suboptimal, but may contain the single most interesting segment for a particular organization.

To assess the current practice of evaluating alternative segmentation solutions, twenty-nine applied data-driven market segmentation studies published after 2006 across different disciplines were reviewed (reference list available upon request). Because applied market segmentation studies conducted by organizations are not accessible, all applied segmentation studies published in academic journals in the last decade which could be found were included. The search was very wide and included studies segmenting wine customers, tourists, households, green consumers, generation Y females, shoppers, mothers, university students, primary care patients, smokers and entrepreneurs. Studies which conducted data-driven market segmentation were included. A detailed inspection of the methodology indicates that, in all studies, the segmentation solution was chosen based on a global assessment. A segment level assessment was never undertaken before one specific segmentation solution was chosen. Of the studies, 17 % based the decision on one single cluster analysis; 21 % ran a hierarchical analysis, used the dendrogram to choose the number of clusters and then ran one run of a partitioning algorithm; 45 % ran one computation for a range of cluster numbers; 10 % reran computations both across and within certain numbers of clusters; and 7 % provided insufficient explanation. Criteria mentioned for the selection of the global segmentation solution include overall interpretability of the solution, overall distinctness of segments contained, size of all segments contained or a combination of statistical criteria and visual inspection of all segment profiles.

3 Segment level criteria

We propose two new criteria: segment level stability within solutions with the same number of segments (SLS W ) and segment level stability across solutions with different numbers of segments (SLS A ). Both have in common that the entity being evaluated is the segment, not the global segmentation solution. The following analogy illustrates the key benefit derived from those new criteria: traditional measures assess the quality of a haystack (the global overall segmentation solution containing a number of segments). But, as we will demonstrate, the nicest haystack may not contain the sharpest needles. The benefit of the newly proposed evaluation criteria is that they enable the assessment of needles within haystacks and thus allow data analysts and managers to focus on what really matters: finding one or a small number of good individual target segments.

3.1 Segment level stability across solutions with different numbers of segments (SLS A )

SLS A measures the persistence of a segment reoccurring across segmentation solutions with different numbers of clusters. Higher SLS A values point to a higher likelihood of the segment representing a natural as opposed to an artificially constructed market segment (Dolnicar and Leisch 2010). Segment solutions containing one or more segments with high SLS A should not be discarded.

Let \(P_1, P_2, \ldots, P_m\) be a series of m partitions with numbers of clusters \(k_1 < k_2 < \ldots < k_m\). SLS A can be quantified using an entropy measure (Shannon 1948). Entropy is defined as \(-\sum {p_{j}\log p_{j}}\) and can be interpreted as the uncertainty in a discrete probability distribution \(p_1, \ldots, p_k\). In our case, the \(p_j\) are the percentages of data points any given segment in \(P_{i+1}\) obtains from each segment in \(P_i\). Segments that have high entropy recruit a large number of members from different segments from the segmentation solution with fewer segments. In order to get a standardized measure for segment stability, entropy values are standardized by dividing them by the maximum possible entropy that would occur in the case of equal distribution of all segment members across all old segments; that is, \(- \sum (1/k)\log (1/k) = \log (k)\). The SLS A measure of stability is

$$\text{SLS}_A = 1 - \frac{-\sum{p_{j}\log p_{j}}}{\log{k}} = 1 + \frac{\sum{p_{j}\log p_{j}}}{\log{k}} $$

As such, the values lie between a minimum (undesirable) SLS A value of 0 and a maximum (desirable) value of 1.
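The computation can be sketched for two consecutive solutions as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two label vectors refer to the same data points, k is taken to be the number of segments in the solution with fewer segments, the standardized entropy is subtracted from 1, and the function name `sls_a` is hypothetical.

```python
import math
from collections import Counter


def sls_a(labels_prev, labels_next):
    """Return {segment in the finer solution: stability value in [0, 1]}.

    labels_prev: segment labels in the solution with fewer segments (needs >= 2 segments).
    labels_next: segment labels of the same data points in the next solution.
    """
    old_segments = sorted(set(labels_prev))
    k = len(old_segments)
    flows = Counter(zip(labels_next, labels_prev))  # members moving from old to new
    sizes = Counter(labels_next)

    result = {}
    for seg, size in sizes.items():
        entropy = 0.0
        for old in old_segments:
            count = flows[(seg, old)]
            if count > 0:  # terms with p = 0 contribute nothing to the entropy
                p = count / size
                entropy -= p * math.log(p)
        # Divide by the maximum entropy log(k), then flip so that 1 is desirable.
        result[seg] = 1.0 - entropy / math.log(k)
    return result


# Hypothetical example: "x" and "z" draw all members from one old segment,
# while "y" is recruited in equal parts from both old segments.
example = sls_a(["a", "a", "a", "b", "b", "b"],
                ["x", "x", "y", "y", "z", "z"])
```

A segment fed by a single old segment has zero entropy and therefore reaches the maximum value of 1; a segment recruited evenly from all old segments reaches the minimum of 0.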

An SLS A plot is shown in Fig. 1. Each vertical column of circles represents one global segmentation solution; circles represent individual segments contained in those solutions. Lines between circles illustrate how many segment members stay in the same segment when more segments are extracted. High SLS A segments are depicted by a thick, dark blue connecting line across solutions and no other lines branching off. Note that plotting SLS A requires segment number relabelling; a relabelling algorithm is provided in the Appendix. Low SLS A segments have many thin, light grey branches feeding into and running out of them. The thickness of the lines indicates the absolute number of segment members flowing into a segment; the color indicates the number of segments these members have been sourced from. Because one of the most critical selection criteria for a target segment in practice is that of segment profitability or future profit potential of a segment, the coloring of the circles in the SLS A plot indicates the profitability or profit potential of each segment. The ability of SLS A to identify naturally occurring market segments has been tested using artificial data sets with known structure.

Fig. 1 Segment level stability for the guest survey data across solutions (SLS A) with four to nine segments shown in columns, stability shown by the thickness of lines, and profitability reflected in the color of the nodes

3.2 Segment level stability within solutions with the same number of segments (SLS W )

SLS W measures how often—across multiple computations of the segmentation solution with the same number of clusters—a segment with the same key characteristics is identified. This criterion has been proposed as an evaluation criterion for global segmentation solutions (Dolnicar and Leisch 2010). Following Hennig (2007), we show how it can be applied at the segment level; technical details are provided in the Appendix. High SLS W segments are attractive because they are likely to represent natural segments. Segmentation solutions containing high SLS W segments should not be discarded.

To compute SLS W , several bootstrap samples are drawn from the data set for each number of clusters of interest. Then, agreement between the original partition and each bootstrap partition is computed. Hennig (2007) defines maximum agreement as the stability of a segment in this bootstrap replica. SLS W across all bootstrap replicates can be visualized using a boxplot as shown in Fig. 2. Maximum SLS W is indicated by a horizontal line located at the top of the chart. Low SLS W is shown by a low median reproducibility and/or a high level of dispersion around the median. The ability of SLS W to identify naturally occurring market segments has been tested using artificial data sets with known structure.
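The agreement step for a single bootstrap replicate can be sketched as follows. The technical details are deferred to the Appendix, so this sketch assumes the Jaccard index used in Hennig's (2007) cluster-wise stability approach, computed only on data points that occur in both partitions. The function names `jaccard` and `segment_stability` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets: size of the intersection over size of the union."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def segment_stability(original, replicate):
    """Maximum agreement of each original segment with any replicate segment.

    original, replicate: dicts mapping a data point id to its segment label.
    Only data points present in both dicts are compared, because a bootstrap
    sample usually omits some of the original observations.
    """
    common = set(original) & set(replicate)

    def groups(assignment):
        # Collect the ids of the common data points by segment label.
        grouped = {}
        for point in common:
            grouped.setdefault(assignment[point], set()).add(point)
        return grouped

    original_groups = groups(original)
    replicate_groups = groups(replicate)
    return {
        seg: max(jaccard(members, other) for other in replicate_groups.values())
        for seg, members in original_groups.items()
    }


# Hypothetical example: replicate labels are arbitrary, only the groupings matter.
stability = segment_stability(
    {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"},
    {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 2},
)
```

Repeating this for every bootstrap replicate yields one distribution of agreement values per segment, which is the kind of input a per-segment boxplot summarizes.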

Fig. 2 Segment level stability (SLS W) within the five and nine segment solutions for the guest survey data

4 Illustration with empirical data

4.1 Austrian national guest survey

Austrian National Guest Survey data from 1994 and 1997 from 11,378 tourists are used. The segmentation base contains 21 travel motives, such as “On holidays I want to rest and relax.” Answer options were as follows: applies to me greatly (1), mostly (2), slightly (3), and not at all (4). The motives are very distinct from each other, e.g., 14 principal components are needed to explain 80 % of the variance.

We compute segmentation solutions for between four and twenty segments using the k-means algorithm with 30 random starts for each number of clusters. All computations have been done in R (R Development Core Team 2016) using package flexclust (Leisch 2006). The traditional Calinski-Harabasz index gives no indication of a good choice of number of clusters at all; the index values decrease smoothly displaying no local minimum or elbow. The overall stability of the segmentation solutions for four to nine clusters leads to the conclusion that the five-segment solution should be chosen because it produces the highest median stability (0.84) across 100 replications.

Figure 1 plots SLS A . Each column in this plot represents one segmentation solution ranging from four to nine clusters. At the far left is the four-cluster solution, at the far right the nine-cluster solution. As can be seen, segments 1 and 9 (in the nine-cluster solution) have the highest SLS A values, with their membership changing only marginally even when the number of segments doubles. Unfortunately, both these segments represent response styles rather than distinctly profiled segments and therefore cannot be considered attractive target segments. Response styles are systematic tendencies of responding to survey questions independent of question content which are consistent over time and across survey contexts (Paulhus 1991).

Using the segment numbering from the nine segment solution (right side of plot), segments 3, 7, and 8 have low SLS A . They are artificially created as the number of segments increases. Segments 2, 4, 5, and 6 demonstrate reasonably high SLS A . These segments emerge initially in the five- and six-segment solutions and then reappear in all solutions containing higher numbers of segments. They are not response style segments and therefore represent potentially interesting target segments. Moving to the analysis of descriptor variables, segments 4 and 6 are of particular interest because they display the highest profitability (indicated by red color in the plot) as assessed by their daily expenditures during the holiday on which they have been surveyed. More specifically, members of segments 4 and 6 spend roughly 40 to 60 % more money on holiday than the least profitable segment 2 (73 Euros and 80 Euros per person per day, respectively, as opposed to only 51 Euros in segment 2). This difference is highly statistically significant (Kruskal-Wallis test p value < 2.2e-16). Members of segment 6 have another interesting feature: they are first-time visitors to Austria much more frequently than other tourists. As such, they could be described as light users as opposed to heavy users who are regular visitors to Austria. Heavy versus light usage is a valuable variable in market segmentation. It can be used as the segmentation variable in commonsense segmentation resulting in light, medium, and heavy usage segments which can then be described in detail before targeting. It can also be used as a descriptor in a data-driven market segmentation study, as is the case in our example.

Figure 2 shows SLS W for the travel motive data. The five segment solution is shown because it represents the segmentation solution recommended by the global stability criterion. The nine cluster solution is shown because it contains the segments with high SLS A (segments 4 and 6). As can be seen from Fig. 2, all segments in the five-cluster solution are indeed highly stable, explaining why the global measure recommends this solution. However, this solution does not contain what later becomes the highly profitable, high SLS A segment 4. Selecting this solution on the basis of global criteria would lead to segment 4 being irretrievably lost. Good haystack, no needle.

When inspecting the SLS W for the nine-cluster solution, the two highly profitable, high SLS A segments both emerge as high SLS W as well. Segment 4 is the highest SLS W segment (median 0.742), closely followed by segments 9 (0.738) and 5 (0.709). Segment 3 (which is not present in the five-cluster solution) has the lowest SLS W . Segments 4, 5, and, to a lesser degree, 6, are not response-style segments, yet they display high SLS W , thus warranting further assessment.

Who are segments 4 and 6? Segment 6 can be described as an adventure segment, and segment 4 as a health segment. This health segment is particularly interesting for the Austrian tourism context, which has many hot spring resorts and for which this segment is excellently suited. Members of this segment, as opposed to other segments, are slightly older, spend significantly more money when on vacation and engage in a different set of activities: they are more into relaxing and swimming, less into hiking, sightseeing, going to museums, and riding bicycles. The health tourist segment represents an attractive target segment, which—using overall quality criteria for assessing segmentation solutions—would not have been detected.

4.2 Fast food restaurant image data

In this second illustration, image data about a fast food restaurant (Subway) which were collected in 2009 are used. A total of 1453 respondents assessed—using a binary response format—the following attributes: yummy, fattening, greasy, fast, cheap, tasty, expensive, healthy, disgusting, convenient, and spicy.

Segmentation solutions for three to nine segments using the k-means algorithm with 30 random starts for each number of clusters were generated. The traditional Calinski-Harabasz index again gives no indication of a good choice of number of clusters at all; the index values are almost constant over the whole range of clusters, so any partition could be chosen. For this data set, assessing global segmentation solution stability leads to no recommendation whatsoever in terms of which solution to choose. The options are, then, to either randomly choose a solution or inspect segment level criteria.

Figure 3 plots segment level stability across numbers of clusters. To illustrate the versatility of this plot, the node color in this particular plot reflects the frequency of eating at Subway (instead of profitability, which was plotted for the guest survey data). This allows simultaneous inspection of stability across numbers of clusters and heavy versus light user segments. As can be seen in Fig. 3, segments 3, 6, and 9 in the right column (the nine cluster solution) are heavy users of Subway. As can also be seen, segment 9 (bottom row in Fig. 3) is extremely stable across numbers of clusters. It first emerges in the four cluster solution and then remains virtually unchanged until the nine cluster solution. Segment 6 first emerges in the five cluster solution and also stays practically unchanged until the nine cluster solution. Based on the insights gained from Fig. 3, both market segments 6 and 9 represent very attractive candidates for target segments.

Fig. 3 Segment level stability for the fast food data across solutions (SLS A) with three to nine segments shown in columns, stability shown by the thickness of lines, and user status (heavy versus light) reflected in the color of the nodes

In terms of stability within cluster numbers, Fig. 4 shows that segments 6 and 9, which both have a median stability level of 0.89, also outperform all other segments on this criterion.

Fig. 4 Segment level stability (SLS W) within the nine segment solution for the fast food data

Inspecting the profile of these segments reveals that members of segment 9 perceive Subway, more so than the other segments, as yummy, fast, cheap, tasty, healthy, and convenient, but not as fattening, greasy, expensive, disgusting, or spicy. Segment 6 largely shares this perception, with the one exception that its members do perceive Subway as spicy. Both segments are attractive market segments. They could both be targeted in a differentiated marketing strategy. Alternatively, Subway could choose to position itself in a certain “spiciness position” and focus on one of those segments only.

Importantly, had the global four segment solution been chosen initially, segment 6 would have gone unnoticed. The ability to inspect segment level stability has been crucial in being able to detect its existence.

5 Conclusions

Market segmentation has greatly contributed to understanding consumer behavior in the marketplace (Roberts 2000) and has therefore been widely adopted by industry. Despite its popularity, some aspects of segmentation analysis that may be statistically satisfactory do not provide optimal market insights for marketing managers. One such case is the selection of a market segmentation solution based on global criteria assessing an overall segmentation solution instead of assessing the segments contained therein. This is despite the fact that most organizations require only one or a small number of well-chosen, attractive (e.g., profitable) target segments to ensure survival and competitive advantage. The statistical properties of the global segmentation solution have the potential of distracting data analysts away from alternative solutions with worse global criteria values but containing individual highly attractive market segments. The excitement about a beautiful haystack may leave needles unnoticed.

This paper presents and demonstrates the usefulness of two new assessment criteria aiming at identifying segmentation solutions which contain interesting segments rather than being globally optimal. Interesting segments demonstrate high SLS A (segment level stability across segmentation solutions with different numbers of clusters) and high SLS W (segment level stability across repeated computations with the same number of clusters; reproducibility; replicability). The key advantage of the proposed criteria is that they protect segmentation solutions containing one or only a few very attractive market segments from being prematurely discarded. The two new criteria help data analysts to spot great needles in ugly haystacks.

The effectiveness of the proposed approach has been demonstrated using two empirical data sets. In both cases, using traditional selection criteria for market segmentation solutions would have failed in guiding the data analyst to select a good segmentation solution, either because traditional criteria resulted in no recommendation at all or because they resulted in a suboptimal recommendation. Using the two proposed segment level criteria allowed more detailed insights into the nature of segments emerging across different segmentation solutions and, in so doing, pointed to particularly attractive market segments. Pinpointing those segments made it possible to select a good market segmentation solution for further profiling and selection of one or more target segments.