1 Introduction

Ipsative measures are frequently used in research on organizational values. One reason is that ipsative measures may lower the risk of social desirability response bias (Meglino and Ravlin 1998). However, the analysis of ipsative data is problematic, because standard statistical analyses yield biased results (Guilford 1952; Cleman 1966; Hicks 1970). Therefore, authors are discouraged from using this type of measurement (Baron 1996; De Vries 2007).

Nevertheless, studies in which ipsative scores are applied keep recurring in the literature. A good example is the measurement of organizational culture and values by means of the competing values framework (CVF), and its accompanying ‘organizational culture assessment instrument’ (OCAI), see Cameron and Quinn (1999, 2006, 2011). The OCAI still is a very popular questionnaire; the latest manual (Cameron and Quinn 2011) had 2,349 citations on Google Scholar, at the end of January 2014, and an internet search suggested that the questionnaire is also widely used outside academia. Unfortunately, up until now, the majority of the applied statistical methods do not take into account the ipsative nature of the OCAI results. There is a need for an alternative ‘pragmatic empiricist approach’ that is also acceptable for ‘rigorous statisticians’, to use the terminology of Kolb and Kolb (2005, p. 11).

The goals of this paper are threefold: (1) To present a condensed literature review on ipsative measurement; (2) To provide an example of the use of ipsative measures in the domain of organizational culture and values (Cameron and Quinn 1999, 2006, 2011); and (3) To provide and illustrate an alternative method to analyze and test the results of ipsative measures, that is inspired by compositional data analysis methods that are widely used in geology (Aitchison 1982, 1986, 2003; Billheimer et al. 2001; Pawlowsky-Glahn et al. 2007).

2 Ipsative measurement

2.1 A condensed literature review

The use of ipsative measures in psychology has a long tradition. According to Martinussen et al. (2001), ipsative measures were initially reported by Marston (1999, original paper published in 1928), who developed the so-called dominance, influence, steadiness, compliance (DiSC) model, which describes four primary emotions and associated behaviors. To measure these constructs, Marston used ipsative scales. Since then, forced-choice, self-report questionnaire formats have been further developed (Stephenson 1935; Allport 1937; Cattell 1944), and used in all sorts of psychological testing: For instance, the measurement of personality (Gordon 1951; Johnson et al. 1988; Matthews and Oddy 1997; Bowen et al. 2002), performance evaluation (Sisson 1948; Berkshire and Highland 1953; Sharon 1970), the measurement of learning styles (Kolb 1976, 1984; Loo 1999), the assessment of human and organizational values and attitudes (Rokeach 1970; Quinn and Rohrbaugh 1983; Kamakura and Mazzon 1991), personnel selection (Johnson et al. 1988; Meade 2004; Christiansen et al. 2005), and vocational choice (Strong 1935; Callis et al. 1954; Campbell 1966).

The term ‘ipsative’ literally means ‘of the self’ and is derived from the Latin ipse (Jackson and Alwin 1980). “An ipsative measure is individually (or self-)referenced rather than norm referenced” (Plotkin Group 2007, p. 1); ipsative data are person centered (Gaylin 1989). Ipsative scales are defined as: “any set of variables that sum to a constant for individual cases, regardless of the value of the constant” (Horst 1965, pp. 290–291), or as a set of scores with the property that “the sum of the scores over the attributes for each individual equals a constant” (Cleman 1966; Cheung and Chan 2002, p. 58; Jackson and Alwin 1980, p. 218). According to Chan (2003, p. 99), a scale is ipsative when: “an attribute of an individual is measured relatively to his/her scores on other attributes.” In purely ipsative measurement, each choice is scored. In fixed or constant-sum scales, items are grouped in item sets, and respondents must compare options instead of selecting the most desirable alternative, as is the case with normative scores (cf., Likert scales). In ipsative measurement “the total score across the measures is equal to each of the other subjects” (Loo 1999, p. 149). Hicks (1970, p. 167) stated that: “each score for an individual is dependent on his own scores on other variables, but is independent of, and not comparable with, the scores of other individuals.” Ipsative scores differ from normative scores in that they assess relative instead of absolute values (Brown and Bartram 2009). As a consequence, only intra-individual comparisons are possible, not inter-individual ones (Cattell 1944; Hicks 1970; Closs 1976; Fedorak and Coles 1979; McClean and Chissom 1986; Baron 1996; Closs 1996).

Perfect or purely ipsative scores may be expressed in either percentages or ranks, as long as their sums add to a fixed constant. Ipsative data are produced by forced-choice response formats, either rankings or ratings (Alwin and Krosnick 1985). Meade (2004) called this type ‘forced-choice ipsative data’ (FCID) in case of ratings, and ‘ordinal ipsative data’ (OID) in case of rank-ordered scales. Ipsative data can also be the result of transformation of normative data, which is called ‘ipsatized data’ (Cattell 1944; Bartram 1996), or ‘Additive Ipsative Data’ (AID) (Chan and Bentler 1993). According to Meade (2004) these respective data types have distinct psychometric characteristics.

The use of ipsative measures in psychology is controversial (Meglino and Ravlin 1998). Problems have been described by many authors. Apart from the problematic statistical analysis, respondents report great difficulty in filling out ipsative scales (Steenkamp and Baumgartner 1998; Baumgartner and Steenkamp 2001). Hicks (1970, p. 181) concluded: “ipsative scores should be used only in situations where it has been demonstrated that (a) significant response bias exists; (b) this bias reduces validity; and (c) an ipsative format successfully diminishes bias and increases validity to a greater extent than do nonipsative controls for bias.” The best advice would be to avoid this type of measure wherever possible (Baron 1996; De Vries 2007). A fundamental criticism is that scores on a multi-scale measure are statistically interdependent on both item and covariance levels (Meade 2004): true and error scores are contaminated across scales, so reliability is difficult to assess. Because of these features, conventional correlation-based methods—such as factor analysis, regression analysis, and LISREL—are not allowed (Guilford 1952; Cleman 1966; Massy et al. 1966; Jackson and Alwin 1980; Chan and Bentler 1993; Dunlap and Cornwell 1994; Cornwell and Dunlap 1994). Also, the interpretation of results may be problematic, since the correlations between constant-sum scales and between ipsative factors often turn out to be spuriously negative (Cleman 1966; Hicks 1970; Johnson et al. 1988; Baron 1996; Brown and Bartram 2009). In general, reliability of ipsative data is lower than of normative data (Saville and Willson 1991; Bartram 1996).

Ipsative measurement not only has limitations, but also some advantages. Acquiescence responding, halo effects, faking good or impression management, and social desirability bias are better controlled by means of ipsative methods than by normative methods (Cunningham et al. 1977; McClean and Chissom 1986; Gurwitz 1987; Cheung and Chan 2002; Bowen et al. 2002; Cheung 2006). Ipsative data usually have a higher operational validity (Christiansen et al. 2005; Bartram 2007). Also, a greater differentiation of scores within a multi-variate profile is seen as a positive outcome (Baron 1996; Brown and Bartram 1999). In general, rankings show greater differentiation than ratings (Alwin and Krosnick 1985).

In the psychological literature, several authors have posed strong criticisms against the parametric statistical analysis of ipsative data (Guilford 1952; Cleman 1966; Hicks 1970). Bartram (1996, p. 26) summarized these criticisms as follows:

  • “They cannot be used to compare individuals on a scale-by-scale basis,

  • Correlations among ipsative scores cannot legitimately be factor-analyzed in the usual way,

  • Reliabilities of ipsative tests overestimate, sometimes severely, the actual reliability of the scales; in fact the whole idea of error is problematic,

  • For the same reason, and others, validity of ipsative scores overestimates their utility,

  • Means, standard deviations and correlations derived from ipsative test scales are not independent and cannot be interpreted and further utilized in the usual way”.

Several authors have sought a solution for the interdependency dilemma by leaving out one of the attributes from the item set, i.e., dropping one score or category (Tatsuoka 1971; Anderson et al. 1975; McClean and Chissom 1986; Steenkamp et al. 2001; Leeuwen and Mandabach 2002), or by using special tools i.e., the Chan and Bentler 1993/1996 (CB) method or the direct estimation (DE) method (Cheung and Chan 2002; Cheung 2004) in a confirmatory factor analysis (CFA) model. Some authors have claimed that, in practice, differences in the analyses of both ipsative and normative scores are not as big as expected (Saville and Willson 1991; Loo 1999). However, other authors have strongly disagreed (Cleman 1966; Hicks 1970; Martinussen et al. 2001). Ipsative data seem to produce comparable results to normative data when the number of scales is greater than 30 and there are low correlations between the scales (Bartram 1996; Baron 1996; Chan and Bentler 1993, 1996, 1998).

2.2 Related work: compositional data analysis

Apart from psychology, ipsative or compositional data are dealt with in many other disciplines, such as demography, population genetics, marketing, software engineering, and geology (Butler et al. 2005). We have particularly looked at software engineering and geology. In software engineering, Rinkevics and Torkar (2013) published a systematic review of 180 studies that, since 2000, had applied a particular software requirements prioritization method called ‘cumulative voting’. Cumulative voting is a tool that produces compositional data, and is frequently used in software release planning and cost-value analysis. On the basis of this systematic review they concluded that no more than 22 of the 180 studies had actually analyzed some data, and that: “only one study uses compositional data analysis methods” (p. 280). Several studies had used inadequate statistical methods for analysis. Rinkevics and Torkar (2013) suggested an alternative method, based on compositional data analysis, called Equality of Cumulative Votes, which is incompatible with our approach, since it focuses on hierarchical item sets.

In geology, petrology and sedimentology, the experience with ipsative data, called compositional data, is ubiquitous (Aitchison 1982, 1986, 2003; Billheimer et al. 2001; Butler et al. 2005; Buccianti et al. 2006; Pawlowsky-Glahn et al. 2007). The analysis of the structure and composition of sediments requires a ‘constant-sum’ approach. For us, it came as a surprise to discover the unique solutions for the statistical analysis of ipsative data that had been developed, in quite another discipline. We have been very much inspired by those solutions.

2.3 Application domain: organizational culture and values

The second goal of the paper is to provide an example of the use of ipsative measures in the domain of organizational culture and values (Cameron and Quinn 1999, 2006, 2011). According to Hartnell et al. (2011, p. 677), organizational culture and values is a productive theme with more than 4,600 articles, since 1980. It is an important domain of study in which ipsative measurement frequently is used. According to Meglino and Ravlin (1998), this is because of at least four reasons: (a) “the locus of values is within the individual” (p. 353); (b) values are “less than totally conscious” (p. 360); (c) values are “hierarchically structured” (p. 360); and (d) values are “socially desirable phenomena” (pp. 354/360). Therefore, it is not precise enough to let people normatively rate values; they should make deliberate choices by using rankings or forced choices. According to Van Leeuwen and Mandabach (2002, p. 89), “because the choice nature of ranking fits with the view of values as innately comparative and competitive, researchers have argued for using ranking techniques in value research (Alwin and Krosnick 1985; McCarty and Shrum 2000).”

One of the most elaborate approaches in the domain of organizational culture and values is the CVF. The heart of this framework consists of a comprehensive theoretical model, the competing values model (CVM), which is related to organizational effectiveness. It was originally constructed on the basis of three bi-polar shared-value dimensions, called “recognized dilemmas” (Quinn and Rohrbaugh 1983, p. 370), “preferences” (Howard 1998, p. 234) or “competing values” (Quinn and Rohrbaugh 1981). These three basic dimensions were derived from Campbell’s (1977) review, listing some 30 organizational effectiveness indicators, and distilled in two panels of academics from various disciplinary backgrounds, in two successive rounds, by means of multi-dimensional scaling (Quinn and Rohrbaugh 1981, 1983). These value dimensions are: orientation towards change (stability and structural control versus flexibility and discretion); organizational perspective (internal stakeholders’ focus and integration versus external stakeholders’ focus and differentiation); and preferred objectives and processes (means versus ends), c.f. Quinn and Rohrbaugh (1983, p. 369); Cameron and Quinn (1999, pp. 13–40).

The first two dimensions span four quadrants, representing four historical theoretical paradigms: c.f., internal process, rational goal, human relations, and open systems (Quinn and Rohrbaugh 1983, p. 371; Quinn 1988), which are indicative for four organizational culture types, c.f.: (1) ‘hierarchy’ (Weber 1947): do things right, control (dominant values: control and internal focus); (2) ‘market’ (Williamson 1975; Ouchi 1979, 1984): do things fast, compete (dominant values: control and external focus); (3) ‘clan’ (Ouchi 1981; Wilkins and Ouchi 1983, pp. 472–474): do things together, collaborate (dominant values: flexibility and internal focus); and (4) ‘adhocracy’ (Mintzberg 1983): do things first, create (dominant values: flexibility and external focus), c.f., Quinn and Kimberly (1984); Cameron (2004); Cameron and Quinn (2006, p. 28). The third dimension represents the preferred objectives and processes in each quadrant (Quinn and Rohrbaugh 1983, p. 369): ad (1) hierarchy-culture means are: information management and communication; hierarchy-culture ends are: control and stability; ad (2) market-culture means are: planning and goal setting; market-culture ends are: productivity and efficiency; ad (3) clan-culture means are: cohesion and morale; clan-culture ends are: human resource development; ad (4) adhocracy-culture means are: flexibility and readiness; adhocracy-culture ends are: growth and resource acquisition (Quinn 1988). In each quadrant, two roles were also distinguished (Quinn 1988): i.e., ad (1) hierarchy: monitor and coordinator; ad (2) market: director and producer; ad (3) clan: facilitator and mentor; ad (4) adhocracy: innovator and broker.

Because of the holistic character of organizational values (Zammuto 1988), an organization scores in all four quadrants of the CVM, which creates complex organizational culture profiles that are typical for different types of companies in different industrial sectors (Quinn 1988; Cameron and Quinn 1999, 2006, 2011), or life cycles (Quinn and Cameron 1983).

Several authors have found consistent evidence for the validity of the CVM (Quinn and Spreitzer 1991; Zammuto and Krakower 1991; Denison and Mishra 1995; Howard 1998; Cameron and Quinn 1999; Kalliath et al. 1999; Lamond 2003; Kwam and Walker 2004; Ralston et al. 2006; Rhee and Moon 2009). To measure the four culture quadrants in a sample of 796 executives, Quinn and Spreitzer (1991) used an ipsative questionnaire, which is a forerunner of the OCAI (Cameron 1978), and contrasted it with a normative counterpart, based on Likert-type scales (Quinn, undated, not published). The results of a multi-trait/multi-method analysis and of a multi-dimensional scaling procedure show convergent, discriminant, and nomological validity for both ipsative and normative measures of the CVM. The authors also reported high reliability coefficients for both ipsative and normative instruments (Cronbach’s Alpha), which are, however, questionable in case of the ipsative ones (c.f., Bartram 1996; Leeuwen and Mandabach 2002; Meade 2004).

The OCAI is a research questionnaire that was developed on the basis of the CVM model (Cameron 1978; Quinn and Cameron 1983; Quinn 1988; Cameron and Quinn 1999, 2006, 2011). The OCAI contains 24 statements, linked to the four culture types with six dimensions each: (1) Dominant organizational characteristics; (2) Leadership style; (3) Management of employees; (4) Organizational glue; (5) Strategic emphasis; and (6) Criteria for success (Cameron and Quinn 1999, pp. 31–40; Berrio 2003).

The CVM model and associated OCAI instrument have been widely used for assessing and profiling organizational cultures in a variety of organizations (Quinn and Cameron 1983; Cameron and Quinn 1999, 2006, 2011; Cameron 2004): for instance, health care (Kalliath et al. 1999); veterans health administration (Helfrich et al. 2007); construction firms (Oney-Yazic et al. 2006); libraries (Kaarst-Brown et al. 2004; Stanton 2004; Varner 1996), schools and universities (Cameron 1978; Zammuto and Krakower 1991; Berrio 2003; Kwam and Walker 2004); manufacturing companies (Zammuto and O’Connor 1992; Sousa-Poza et al. 2001; Braunscheidel et al. 2010); courier express delivery (Chan 1997); engineering and project management services (Igo and Skitmore 2006); civil engineering division of a Ministry (Schepers and Berg 2007); and public utility/administration organizations (Quinn and Spreitzer 1991; Talbot 2008).

The CVM model and associated OCAI instrument have been applied in different countries, including the USA (Cameron and Quinn 1999, 2006, 2011; Howard 1998; Sousa-Poza et al. 2001; Oney-Yazic et al. 2006; Helfrich et al. 2007; Braunscheidel et al. 2010); Switzerland and South Africa (Sousa-Poza et al. 2001); Australia (Igo and Skitmore 2006); Estonia, Japan, Russia, Czech, Finland, Germany and Slovakia (Übius and Alas 2009); China (Kwam and Walker 2004; Übius and Alas 2009); Korea (Rhee and Moon 2009); and the Netherlands (Schepers and Berg 2007).

The advantages of the CVF were summarized by Yu and Wu (2009, p. 40) as follows: “few dimensions but broad implications (...), empirically validated in cross-cultural research (...), most extensively applied in the context of China (...), most succinct (...).” According to Cameron (2009, p. 2): “The robustness of the framework is one of its greatest strengths. In fact, the framework has been identified as one of the 40 most important frameworks in the history of business”. Hartnell et al. (2011) conducted a meta-analytical study of the CVF. They analyzed 84 non-ipsative studies, and concluded (p. 688): “The results provide broad-based support for the CVM’s assertion that culture types are associated with important effectiveness criteria. The study’s findings, however, provide only mixed support for the CVM’s underlying theoretical suppositions. Given the moderately small association between the CVM’s culture types and effectiveness, fertile research opportunities exist to extend culture research by considering unexplored moderators, mediators, and culture configurations that further elucidate the veracity of culture’s relationship with effectiveness criteria.”

Researchers who have applied the CVF used a variety of statistical analyses: Q-methodology and multi-dimensional scaling analysis (Howard 1998); canonical correlation analysis (Sousa-Poza et al. 2001); ANOVA (Übius and Alas 2009; Schepers and Berg 2007); confirmatory factor analysis and multiple regression analysis (Rhee and Moon 2009); and structural equation modeling (Braunscheidel et al. 2010). It is questionable whether these analyses are permitted, since most studies used purely ipsative scales.

2.4 OCAI measurement scales

CVF makes use of a standardized questionnaire (the OCAI) consisting of six items representing the before-mentioned six organizational culture dimensions, while each item has four statements. For the typical OCAI answer format, see Table 1.

Table 1 Typical response format of an OCAI item

Respondents are asked to divide a hundred points to evaluate four statements that are indicative of the four organizational culture types. As an example, Table 2 shows the four statements of the first of six items pertaining to the culture dimension ‘dominant organizational characteristics’; plus an admissible response. Respondents are asked to give their answers for both the current and preferred situations. Let respondents be indexed by \(i (i=1,\ldots ,N)\), and let items be indexed by \(j (j=1,\ldots ,6)\). We use the term parts to denote the categories of each item. For example, in Table 1, the parts are hierarchy culture, market culture, clan culture, and adhocracy culture. Let the parts be indexed by \(d (d=1,\ldots ,4)\). In addition, let \(X_{ijd} \) denote the score of respondent \(i\) on part \(d\) on item \(j\); superscripts \(c\) and \(p\) are added when it is required to distinguish between scores on the current situation \((X_{ijd}^c )\) and preferred situation \((X_{ijd}^p)\). Because the data are ipsative, \(X_{ijd} \ge 0\) for all \(i,\; j,\; d\); and \(\sum _{d} X_{ijd} =100\) for all \(i,\; j\). The vector \(\mathbf{X}_{ij} =(X_{ij1} ,X_{ij2} ,X_{ij3} ,X_{ij4} )\), containing respondent \(i\)’s four scores on item \(j\), is called the individual dimensional profile (IDP).

Table 2 OCAI item, and individual dimensional profiles for both current and preferred situations

IDPs may be averaged over all six items (one per organizational culture dimension), which results in an individual full profile (IFP). For an example, see Table 3, left-hand panel. The IFP is denoted as \(\mathbf{X}_{i+} =(X_{i+1} ,X_{i+2} ,X_{i+3} ,X_{i+4} )\). The “+” in the subscript indicates the part-wise arithmetic mean; hence, \(X_{i+d} =\frac{1}{6}\sum _{j=1}^6 X_{ijd} \). Taking the part-wise means of the IDPs and IFPs over the respondents yields collective dimensional profiles (CDP) and collective full profiles (CFP) for the total sample or for subsamples, which are denoted \(\mathbf{X}_{+j} =(X_{+j1} ,X_{+j2} ,X_{+j3} ,X_{+j4} )\) and \(\mathbf{X}_{++} =(X_{++1} ,X_{++2} ,X_{++3} ,X_{++4} )\), respectively.
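The averaging scheme defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the two respondents and their scores are hypothetical, and the helper names `is_ipsative` and `partwise_mean` are introduced here only for the example.

```python
# Hypothetical OCAI responses (not data from the paper).
# scores[i][j] is the IDP X_ij of respondent i on item j (four parts each).
scores = [
    [(40, 30, 20, 10), (25, 25, 25, 25), (50, 20, 20, 10),
     (30, 30, 20, 20), (40, 20, 30, 10), (25, 25, 30, 20)],
    [(10, 20, 30, 40), (20, 20, 30, 30), (10, 40, 40, 10),
     (25, 25, 25, 25), (0, 50, 25, 25), (15, 35, 20, 30)],
]


def is_ipsative(profile, total=100):
    """Check X_ijd >= 0 for all d and sum_d X_ijd == total."""
    return all(x >= 0 for x in profile) and sum(profile) == total


def partwise_mean(profiles):
    """Part-wise arithmetic mean of a list of equal-length profiles."""
    n = len(profiles)
    return tuple(sum(column) / n for column in zip(*profiles))


# Every IDP must satisfy the sum constraint before averaging.
assert all(is_ipsative(idp) for respondent in scores for idp in respondent)

n_items = len(scores[0])
ifps = [partwise_mean(respondent) for respondent in scores]             # X_{i+}
cdps = [partwise_mean([r[j] for r in scores]) for j in range(n_items)]  # X_{+j}
cfp = partwise_mean(ifps)                                               # X_{++}

print("IFPs:", ifps)
print("CDPs:", cdps)
print("CFP: ", cfp)
```

Because every respondent answers the same six items, averaging the IFPs over respondents and averaging the CDPs over items give the same CFP, and all averaged profiles still sum to 100.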

Table 3 OCAI organizational culture profiles and their averages: the individual dimensional profiles and individual full profile (left-hand panel), and N = 226 individual dimensional profiles and collective full profile (right-hand panel)

As an example, for the first respondent, the left-hand panel of Table 3 shows the six IDPs \((\mathbf{X}_{1,1} ,\ldots ,\mathbf{X}_{1,6})\) and the resulting IFP \((\mathbf{X}_{1+} )\). The right-hand panel of Table 3 shows the six CDPs \((\mathbf{X}_{+,1} ,\ldots ,\mathbf{X}_{+,6} )\) and the resulting CFP \((\mathbf{X}_{++})\), for all respondents. OCAI-based CDPs and CFPs usually are displayed in a four-axes graph, in which each axis represents a part. For example, Fig. 1 shows the CFPs for both the current and preferred situations.

Fig. 1 Typical graphical display of OCAI-based collective full profiles (CFPs) for both the current and preferred situations

2.5 Basic problems with the analysis of ipsative or compositional data

On the basis of the above-mentioned reviews, we will distinguish two problems of ipsative data, both related to the sum constraint. For a more rigorous discussion of these problems, we refer to, for example, Aitchison (1986). First, correlations and covariances between parts cannot be used. The correlation matrix and covariance matrix of ipsative data are singular. This means that one cannot readily apply data-analysis methods that require the inverse of the covariance or correlation matrix. Moreover, the covariances and correlations are negatively biased (for a mathematical proof, see, e.g., Aitchison 1986, pp. 53–54). This implies that statistical independence does not yield zero correlations. If data are completely randomly distributed over the \(D\) parts of a profile, the expected correlation is not equal to zero but equal to \(({-1})/({D-1})\). Table 4 illustrates the negativity bias with a profile that consists of two parts: If part A increases, part B, by definition, decreases, so the correlation between the two parts is unrelated to the scores and equals \(-1\) by definition. As a result, values of covariances and correlations cannot be interpreted and, therefore, cannot be used to investigate the internal structure of ipsative data. This also precludes data-analysis methods that use correlations or covariances as input, such as factor analysis, reliability analysis, principal component analysis, or regression analysis. Such an analysis is either technically impossible because the covariance matrices are singular, or technically possible but meaningless in its outcome.
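The negative bias can be checked numerically. The following Python sketch uses hypothetical scores (not the entries of Table 4). It first computes the correlation for two-part profiles, and then treats "completely random" as one specific assumption: every ordering of a fixed set of four scores is equally likely. The function `pearson` is a plain textbook implementation written for this example.

```python
from itertools import permutations
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Two-part profiles (hypothetical): part B is always 100 minus part A.
part_a = [10, 30, 50, 70, 90]
part_b = [100 - a for a in part_a]
r_two = pearson(part_a, part_b)

# Four-part profiles: all 24 orderings of one hypothetical set of scores,
# so no part is systematically related to any other part.
profiles = list(permutations((40, 30, 20, 10)))
r_four = pearson([p[0] for p in profiles], [p[1] for p in profiles])

print(r_two)   # correlation forced by the sum constraint, D = 2
print(r_four)  # compare with -1 / (D - 1) for D = 4
```

Under these assumptions the two correlations agree with \(({-1})/({D-1})\) for \(D=2\) and \(D=4\), although the scores were constructed without any substantive relation between the parts.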

Table 4 Small dataset illustrating that for two-part profiles the correlation between the parts equals \(-1\) by definition

The second problem is that ipsative data cannot be used for normative measurement; that is, the values of the parts cannot be compared between respondents, or between organizations. For example, if organizations A and B have scores 20 and 50, respectively, on clan, then one cannot conclude that B has a higher clan culture than A. This phenomenon can be explained as follows. Suppose that employees could freely assign scores to an organization’s levels of hierarchy culture, market culture, adhocracy culture, and clan culture on scales from 1 to 100. Also, suppose that organization A is very successful and manages to do things right, fast, first, and often together. The employees may award organization A with scores in the first row of Table 5. Suppose organization B is not as successful and manages to do things quite often together but hardly ever right, fast, and first, which may result in scores in the second row of Table 5. Comparing the non-ipsative scores shows that organization A outperforms organization B on all four culture types. However, if respondents have to respond on an ipsative scale, the profile for organization A becomes 27, 26, 20, 27, because the 300 awarded points now have to be substituted by 100 points; the profile for organization B has to be multiplied by 2 to award 100 points. Due to the sum constraint, the culture profiles falsely suggest that organization B has more clan culture than organization A.
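The effect of the sum constraint can be made concrete with a small Python sketch. The free-range scores below are hypothetical and are not the entries of Table 5; they were chosen only so that organization A sums to 300 and yields the ipsative profile 27, 26, 20, 27 quoted above, and so that organization B must be multiplied by 2. The part order (hierarchy, market, clan, adhocracy) and the helper `close` are likewise assumptions made for this example.

```python
PARTS = ("hierarchy", "market", "clan", "adhocracy")  # assumed part order


def close(scores, total=100):
    """Rescale non-negative scores so that they sum to `total`."""
    s = sum(scores)
    return tuple(total * x / s for x in scores)


# Hypothetical free-range scores (each part rated on its own scale).
free_a = (81, 78, 60, 81)  # sums to 300
free_b = (10, 10, 25, 5)   # sums to 50

ipsative_a = close(free_a)
ipsative_b = close(free_b)

clan = PARTS.index("clan")
print("free-range clan:", free_a[clan], "vs", free_b[clan])
print("ipsative clan:  ", ipsative_a[clan], "vs", ipsative_b[clan])
```

In the free-range scores, A exceeds B on every part, including clan. After rescaling to 100 points the order on clan is reversed, which is why part values should not be compared between organizations.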

Table 5 Free-range scores (upper panel) and corresponding ipsative scores (lower panel) show that ipsative scores cannot be used normatively, for details see text

Parts of ipsative data can only be interpreted meaningfully relative to other parts in the profile. Interpretation can be in terms of dominance (e.g., “The amount of market culture is larger than the amount of adhocracy culture in organization B”), or in terms of ratios (e.g., “The amount of market culture is two times larger than the amount of adhocracy culture in organization B”). First, note that the interpretation in terms of ratios is stronger than the interpretation in terms of dominance. Second, note that both interpretations are also possible for the non-ipsative data.

Aitchison (2003, p. 2), also see Pawlowsky-Glahn et al. (2007), noted that three basic conditions should be fulfilled for any data-analysis method for ipsative data: (1) Scale invariance; (2) Permutation invariance; and (3) Subcompositional coherence:

  (1) Scale invariance means that “statistical inferences about compositional data should not depend upon the scale used” (Butler et al. 2005, p. 4). So, analyzing compositions \(\mathbf{X}=\left( {X_1 ,X_2 ,\ldots ,X_D } \right) \) and \(c\mathbf{X}=\left( {cX_1 ,cX_2 ,\ldots ,cX_D} \right) \) yields the same results for all \(c>0\). Ipsative data are scale invariant. For example, a particular respondent may have scores 40, 40, 10, and 10 on a particular OCAI item, for clan, adhocracy, market, and hierarchy, respectively. These scores can be reported on another scale, for example in proportions (0.40, 0.40, 0.10, and 0.10) (i.e., \(c=0.01\)) without any loss of information. The scale of ipsative scores can be changed by multiplying the scores by a positive constant. A statistical method should be devised such that the scale of the ipsative scores does not affect the results of the statistical analysis. The most natural way to do so is to base the statistical analysis on ratios of the ipsative scores because ratios are the same irrespective of the scale of the scores. For example, for the above-mentioned respondent, the ratio of clan and market equals 4, irrespective of the scale. Scale invariance reinforces the idea that ipsative data parts only provide information relative to other parts from the same profile.

  (2) Permutation invariance means that the analysis gives equivalent results when the order of the parts is changed (Aitchison 1992). For example, if in one analysis, the parts are ordered traditionally (i.e., clan, adhocracy, market, and hierarchy) and in a second analysis, the parts are ordered alphabetically (i.e., adhocracy, clan, hierarchy, market), then both analyses should provide equivalent results. A classic way to get rid of the sum-constraint problem is to remove one of the parts from the data. This classical way is not permutation invariant because data-analysis results often depend on which part has been removed.

  (3) Subcompositional coherence is the property that results should be the same for components in the full composition as in any subcomposition (e.g., Greenacre 2011). A subcomposition is a profile with one or more parts deleted. For example, a researcher may be interested only in the parts clan and adhocracy and removes the parts market and hierarchy from the data. The vector \(\mathbf{S}_{ij} =(X_{ij1} ,X_{ij2})\), containing respondent \(i\)’s scores on clan and adhocracy for item \(j\), is a subcomposition of the individual culture profile. Pawlowsky-Glahn et al. (2007, p. 9) stated about subcompositional coherence that “subcompositions should behave as orthogonal projections in conventional real analysis”. An important aspect of subcompositional coherence is that if one or more parts are removed from the profile, this should not affect the statistical results for the remaining parts. For example, if two researchers use the same data, but researcher 1 is interested in profiles consisting of all four cultures, and researcher 2 is interested in profiles consisting only of clan, adhocracy, and market (and hence removes the part hierarchy from the data), then the results from data analysis with respect to clan, adhocracy, and market should be the same.

In consequence, our suggestions for alternative methods for the statistical analysis and test of ipsative data should meet the above-mentioned conditions.
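As an informal check of the three conditions, the following Python sketch shows that the ratio of two parts is unchanged by rescaling, by reordering the parts, and by deleting a part and rescaling the remainder to 100. It uses the example respondent from condition (1); the dictionary representation and the helpers `close` and `ratio` are assumptions made for illustration only.

```python
from math import isclose


def close(profile, total=100.0):
    """Rescale the parts of a profile so that they sum to `total`."""
    s = sum(profile.values())
    return {part: total * x / s for part, x in profile.items()}


def ratio(profile, part_a, part_b):
    """Ratio of two parts within the same profile."""
    return profile[part_a] / profile[part_b]


idp = {"clan": 40, "adhocracy": 40, "market": 10, "hierarchy": 10}

# (1) Scale invariance: report the same profile in proportions (c = 0.01).
rescaled = {part: 0.01 * x for part, x in idp.items()}

# (2) Permutation invariance: order the parts alphabetically.
reordered = {part: idp[part] for part in sorted(idp)}

# (3) Subcompositional coherence: drop hierarchy, rescale to 100 points.
sub = close({part: x for part, x in idp.items() if part != "hierarchy"})

for label, profile in [("original", idp), ("rescaled", rescaled),
                       ("reordered", reordered), ("subcomposition", sub)]:
    print(label, ratio(profile, "clan", "market"))

same = all(isclose(ratio(p, "clan", "market"), 4.0)
           for p in (idp, rescaled, reordered, sub))
```

The sketch only covers a single ratio for a single profile; it illustrates why ratio-based statistics are natural candidates, and it does not prove that any particular method satisfies the conditions.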

3 An alternative approach for the analysis of ipsative data

3.1 Alternative statistical methods

The third goal of this paper is to provide and illustrate alternative statistical methods to analyze ipsative measures resulting from the OCAI (Cameron and Quinn 1999, 2006, 2011). The suggested statistical methods are either based on parametric statistical methods such as: (a) The closed geometric mean which is often used in geology (e.g., Aitchison 1982, 1986; Egozcue et al. 2003), or nonparametric methods such as: (b) The nonparametric bootstrap test (e.g., Efron and Tibshirani 1993), and: (c) The permutation test (Fisher 1966; Odén and Wedel 1975). All presented statistical methods fulfill the three conditions for ipsative data analysis: scale invariance, permutation invariance, and subcompositional coherence (Aitchison 2003, p. 2; Pawlowsky-Glahn et al. 2007).

Table 6 Constructing average CDP profiles using arithmetic means (left), and closed geometric means (right), for details see text

Cameron and Quinn (2011) advocated the use of part-wise arithmetic means to compute IFP profiles, \(\mathbf{X}_{i+} =(X_{i+1} ,X_{i+2} ,X_{i+3} ,X_{i+4} )\); CDP profiles, \(\mathbf{X}_{+j} =(X_{+j1} ,X_{+j2} ,X_{+j3}, X_{+j4} )\); and the CFP profile, \(\mathbf{X}_{++} =(X_{++1} ,X_{++2} ,X_{++3} ,X_{++4} )\), from the IDP profiles \(\mathbf{X}_{ij} =(X_{ij1} ,X_{ij2} ,X_{ij3} ,X_{ij4} )\). A small example in Table 6 (left-hand panel) shows that part-wise arithmetic means do not preserve the ratios of the parts and are not subcompositionally coherent. All profiles and subcompositions are scaled in percentages. In the first IDP profile, \(\mathbf{X}_{1j} = ({40,\;10,\;10,\;40})\), there is four times as much clan culture as there is adhocracy culture, whereas in the second IDP profile, \(\mathbf{X}_{2j} = ({40,\;40,\;10,\;10})\), the amounts of clan and adhocracy culture are equal. So, on average (the geometric mean of the ratios 4 and 1), there is twice as much clan culture as adhocracy culture. Taking the arithmetic means of the two IDP profiles (Table 6, top left) produces the CDP profile \(\mathbf{X}_{+j} = ({40,\;25,\;10,\;25})\), that is, 40 % for clan and 25 % for adhocracy. These percentages do not indicate that, on average, the ratio of clan to adhocracy is 2:1. Furthermore, if we only considered clan and adhocracy, then the subcompositions would be \(\mathbf{S}_{1j} = ({80,\;20})\) and \(\mathbf{S}_{2j} = ({50,\;50})\), respectively (Table 6, bottom left). Taking the part-wise arithmetic means produces \(\mathbf{S}_{+j} = ({65,\;35})\). The clan–adhocracy ratio obtained with the subcomposition (i.e., \(\frac{65}{35}=1.86\)) differs from the ratio obtained with the four-part profile (i.e., \(\frac{40}{25}=1.6\)). Hence, leaving out parts affects the results, and averaging profiles by taking the part-wise arithmetic means is not subcompositionally coherent.
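The arithmetic in this example can be checked with a few lines of code. The following is a minimal Python sketch using only the two IDP profiles from the text; the helper names `close` and `arithmetic_mean` are hypothetical and are introduced here for illustration only.

```python
def close(profile, total=100.0):
    """Rescale a profile so that its parts add up to `total`."""
    s = sum(profile)
    return [total * x / s for x in profile]


def arithmetic_mean(profiles):
    """Part-wise arithmetic mean of equally long profiles."""
    n = len(profiles)
    return [sum(parts) / n for parts in zip(*profiles)]


# Two IDP profiles from the text: clan, adhocracy, market, hierarchy
idp = [[40, 10, 10, 40], [40, 40, 10, 10]]

# Average of the full four-part profiles
full_mean = arithmetic_mean(idp)

# Keep only clan and adhocracy, rescale to 100, then average
sub = [close(p[:2]) for p in idp]
sub_mean = arithmetic_mean(sub)

# Clan-to-adhocracy ratio under both routes
ratio_full = full_mean[0] / full_mean[1]
ratio_sub = sub_mean[0] / sub_mean[1]

print(full_mean, ratio_full)
print(sub_mean, ratio_sub)
```

The two ratios differ (1.6 versus roughly 1.86), which is the lack of subcompositional coherence described above.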

(a) Closed geometric mean. Whereas the arithmetic mean is the natural center for unconstrained data, the closed geometric mean is the natural center for ipsative data (Aitchison 1997). Therefore, we advocate the closed geometric mean as a better alternative to the arithmetic mean, because it preserves subcompositional coherence. Let a center dot indicate the geometric mean. Using part-wise geometric means, rather than part-wise arithmetic means, for averaging profiles produces IFP profiles \(\mathbf{X}_{i\cdot } =(X_{i\cdot 1} ,X_{i\cdot 2} ,X_{i\cdot 3} ,X_{i\cdot 4} )\), where \(X_{i\cdot d} =\sqrt[6]{\prod _{j=1}^6 X_{ijd} }\); CDP profiles \(\mathbf{X}_{\cdot j} =(X_{\cdot j1} ,X_{\cdot j2} ,X_{\cdot j3} ,X_{\cdot j4} )\), where \(X_{\cdot jd} =\sqrt[n]{\prod _{i=1}^n X_{ijd} }\); and CFP profiles \(\mathbf{X}_{\cdot \cdot } =(X_{\cdot \cdot 1} ,X_{\cdot \cdot 2} ,X_{\cdot \cdot 3} ,X_{\cdot \cdot 4} )\), where \(X_{\cdot \cdot d} =\sqrt[n]{\prod _{i=1}^n X_{i\cdot d} }\). To facilitate interpretation, the constructed profiles should be rescaled so that they add up to the same total as the individual culture profiles (this is known as closing, hence the term closed geometric means). A closed profile \(\mathbf{X}\) is denoted as \(C (\mathbf{X})\). Table 6 (right-hand panel) shows an example. To compute the part-wise geometric means of the two IDP profiles, consider the part adhocracy. The geometric mean of the two values 10 and 40 equals \(\sqrt{10\times 40}=20\). The resulting vector \(\mathbf{X}_{\cdot j} =( {40,20,10,20})\) adds up to 90 instead of 100, so all parts are multiplied by \(c=\frac{10}{9}\) to scale them back to percentages, yielding \(C(\mathbf{X}_{\cdot j})= ({44.4,22.2,11.1,22.2})\).

If the geometric mean procedure is applied to IDP profiles that consist of clan and adhocracy only (Table 6, bottom-right panel), using a subcomposition has no effect: in the resulting vector \(C( {\mathbf{S}}_{\cdot j})=(66.7,33.3)\), clan is again twice as large as adhocracy, just as in the closed four-part profile (44.4 versus 22.2). Computation and interpretation become problematic for data containing zeros because ratios are either 0 or infinity, and the geometric mean is zero by definition. In the context of the OCAI questionnaire, if a part has a score of zero, then the respondent did not see any traces of the culture in his or her organization. To circumvent the problem we have replaced profiles that contained \(z>0\) zeros as follows: Part \(X_{ijd} \) was replaced by \(\delta \) if \(X_{ijd} =0\), and by \(\left( {1-z\delta } \right) X_{ijd} \), if \(X_{ijd} >0\) (see Martín-Fernandez et al. 2003). We set \(\delta =0.5\), halfway between score 0, the total absence of a culture, and score 1, the smallest score that acknowledges the existence of a culture.
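The closing operation and the treatment of zeros can be sketched in a few lines of Python. This is an illustrative sketch, not the R syntax used for the analyses; the function names are hypothetical. One assumption is made explicit in the code: the shrink factor for the nonzero parts is written as \(1 - z\delta / 100\) rather than \(1 - z\delta\), so that a profile expressed in percentages still adds up to 100 after the replacement.

```python
import math


def close(profile, total=100.0):
    """Rescale a profile so that its parts add up to `total`."""
    s = sum(profile)
    return [total * x / s for x in profile]


def closed_geometric_mean(profiles, total=100.0):
    """Part-wise geometric mean of strictly positive profiles, then closed."""
    n = len(profiles)
    gm = [math.exp(sum(math.log(x) for x in parts) / n)
          for parts in zip(*profiles)]
    return close(gm, total)


def replace_zeros(profile, delta=0.5, total=100.0):
    """Multiplicative zero replacement.

    Assumption: delta is divided by `total` in the shrink factor so that
    the profile keeps its original sum.
    """
    z = sum(1 for x in profile if x == 0)
    shrink = 1.0 - z * delta / total
    return [delta if x == 0 else shrink * x for x in profile]


idp = [[40, 10, 10, 40], [40, 40, 10, 10]]
full = closed_geometric_mean(idp)                       # four-part profile
sub = closed_geometric_mean([close(p[:2]) for p in idp])  # clan, adhocracy

print([round(x, 1) for x in full])
print([round(x, 1) for x in sub])
print(replace_zeros([50, 50, 0, 0]))
```

In both the four-part profile and the two-part subcomposition the clan-to-adhocracy ratio equals 2, which illustrates the subcompositional coherence of the closed geometric mean.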

(b) Nonparametric bootstrap test. We advocate the nonparametric bootstrap test (e.g., Efron and Tibshirani 1993) to construct 95 % confidence intervals for the parts of the CDP profiles and CFP profiles, in the following way. First, we drew 2,000 nonparametric bootstrap samples of size \(N\) from the data, yielding 2,000 sets of \(N\) individual profiles. Second, we computed the CDP (or CFP) profile for each of the bootstrap samples with the closed geometric means, yielding 2,000 bootstrap CDP (or CFP) profiles. Third, for each bootstrap CDP (or CFP) profile, we computed the Aitchison distance (Aitchison 1986, p. 193) between the bootstrap CDP (or CFP) profile and the CDP (or CFP) profile obtained from the original data, resulting in 2,000 distance measures. Let \(\mathbf{X}_{\cdot j}^*\) be a bootstrap CDP profile, then the Aitchison distance between \(\mathbf{X}_{\cdot j}^*\) and \(\mathbf{X}_{\cdot j} \) is defined as

    $$\begin{aligned} d_A \left( \mathbf{X}_{\cdot j}^*, \mathbf{X}_{\cdot j} \right) =\sqrt{\frac{1}{2D} \sum _{d=1}^D \sum _{e=1}^D \left( \ln \frac{X_{\cdot jd} }{X_{\cdot je} }-\ln \frac{X_{\cdot jd}^*}{X_{\cdot je}^*} \right) ^{2}}. \end{aligned}$$
    (1)

Fourth, we deleted the CDP (or CFP) profiles pertaining to the 5 % largest Aitchison distances, which can be regarded as the 5 % most extreme CDP (or CFP) profiles. Finally, the highest and lowest values for each component of the remaining 95 % of the bootstrap mean CDPs or CFPs determined the upper and lower bounds of each confidence interval, respectively. The 95 % confidence intervals were visualized by light-gray envelopes around the profiles.
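The bootstrap procedure described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the R syntax used for the analyses: the function names are hypothetical, the individual profiles are assumed to contain no zeros (or to have had their zeros replaced beforehand), and the random seed is fixed only to make the sketch reproducible.

```python
import math
import random


def closed_geometric_mean(profiles, total=100.0):
    """Part-wise geometric mean of positive profiles, rescaled to `total`."""
    n = len(profiles)
    gm = [math.exp(sum(math.log(x) for x in parts) / n)
          for parts in zip(*profiles)]
    s = sum(gm)
    return [total * g / s for g in gm]


def aitchison_distance(x, y):
    """Aitchison distance between two positive profiles (Eq. 1)."""
    D = len(x)
    acc = 0.0
    for d in range(D):
        for e in range(D):
            acc += (math.log(x[d] / x[e]) - math.log(y[d] / y[e])) ** 2
    return math.sqrt(acc / (2 * D))


def bootstrap_envelope(profiles, n_boot=2000, level=0.95, seed=0):
    """Return the observed mean profile and part-wise envelope bounds."""
    rng = random.Random(seed)
    observed = closed_geometric_mean(profiles)
    n = len(profiles)
    boot = []
    for _ in range(n_boot):
        # Step 1: resample n individual profiles with replacement
        sample = [profiles[rng.randrange(n)] for _ in range(n)]
        # Step 2: closed geometric mean of the bootstrap sample
        mean = closed_geometric_mean(sample)
        # Step 3: distance to the profile obtained from the data
        boot.append((aitchison_distance(mean, observed), mean))
    # Step 4: drop the profiles with the largest distances
    boot.sort(key=lambda pair: pair[0])
    kept = [mean for _, mean in boot[: int(round(level * n_boot))]]
    # Step 5: part-wise minimum and maximum of the remaining profiles
    lower = [min(parts) for parts in zip(*kept)]
    upper = [max(parts) for parts in zip(*kept)]
    return observed, lower, upper
```

Calling `bootstrap_envelope(profiles)` with a list of four-part profiles returns the observed profile together with the lower and upper bounds of the envelope for each part.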

(c) Permutation test. There was no statistical test readily available to test whether current and preferred CDP/CFP profiles are the same. Therefore, we devised a permutation test (e.g., Welch 1990) on the basis of Aitchison distances to test the null hypothesis that the current and preferred culture profiles in a group are equivalent. The testing procedure consists of three steps. First, we computed the original Aitchison distance, that is, the Aitchison distance (Eq. 1) between the current and preferred CDP profiles. Second, we constructed 1,000 pairs of permutation profiles. Each pair of permutation profiles was constructed as follows: We randomly assigned each respondent’s current and preferred IDP profiles to either condition 1 or condition 2; then we computed the average profile (using the closed part-wise geometric mean) in both conditions. These two average profiles constitute the pair of permutation profiles. Third, for each pair of permutation profiles, we computed the Aitchison distance between the two profiles, resulting in 1,000 Aitchison distances. If at least 95 % of these Aitchison distances were smaller than the original Aitchison distance, the null hypothesis was rejected. Using a similar rationale, we devised a permutation test to examine the null hypothesis that the current and preferred CFP profiles are equivalent.
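The three steps of the permutation test can be sketched as follows. This is again a minimal Python sketch with hypothetical function names, not the R syntax used for the analyses. It assumes that `current` and `preferred` are lists of equal length, ordered by respondent, with strictly positive parts; the fair coin flip per respondent is one way to implement the random assignment to the two conditions.

```python
import math
import random


def closed_geometric_mean(profiles, total=100.0):
    """Part-wise geometric mean of positive profiles, rescaled to `total`."""
    n = len(profiles)
    gm = [math.exp(sum(math.log(x) for x in parts) / n)
          for parts in zip(*profiles)]
    s = sum(gm)
    return [total * g / s for g in gm]


def aitchison_distance(x, y):
    """Aitchison distance between two positive profiles (Eq. 1)."""
    D = len(x)
    acc = 0.0
    for d in range(D):
        for e in range(D):
            acc += (math.log(x[d] / x[e]) - math.log(y[d] / y[e])) ** 2
    return math.sqrt(acc / (2 * D))


def permutation_test(current, preferred, n_perm=1000, seed=0):
    """Return (original distance, share of smaller distances, reject H0)."""
    rng = random.Random(seed)
    # Step 1: distance between the observed current and preferred profiles
    original = aitchison_distance(closed_geometric_mean(current),
                                  closed_geometric_mean(preferred))
    n_smaller = 0
    for _ in range(n_perm):
        # Step 2: per respondent, randomly assign the two profiles
        # to condition 1 and condition 2
        cond1, cond2 = [], []
        for cur, pref in zip(current, preferred):
            if rng.random() < 0.5:
                cur, pref = pref, cur
            cond1.append(cur)
            cond2.append(pref)
        # Step 3: distance between the two permutation profiles
        dist = aitchison_distance(closed_geometric_mean(cond1),
                                  closed_geometric_mean(cond2))
        if dist < original:
            n_smaller += 1
    share_smaller = n_smaller / n_perm
    return original, share_smaller, share_smaller >= 0.95
```

The decision rule in the last line mirrors the text: the null hypothesis is rejected if at least 95 % of the permutation distances are smaller than the original distance.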

The R software package ‘compositions’ (Van den Boogaart 2005; Van den Boogaart and Tolosana-Delgado 2008; Van den Boogaart et al. 2013) was used to compute the alternative current and alternative preferred CFPs. The nonparametric bootstrap test and the permutation test were also programmed in R (R Development Core Team 2007). The syntax is available from the first author.

3.2 Illustration

In order to illustrate the alternative approach for the statistical analysis of ipsative data, an existing dataset was used. The data were collected in April/May 2010 for an empirical study about the development of new values in a transnational company following a merger (Çeliksöz et al. 2010). Respondents were asked to fill out a questionnaire that included, among other things, the standardized OCAI. All 3,600 employees of the transnational company were invited to fill out the questionnaire with web-based software (Globalpark 2010; Unipark 2013); 661 employees (18.3 %) provided valid responses to the items about organizational culture in the survey. In order to focus strictly on the analysis of actual data, and not on corrections for missing data, only those 661 respondents were selected. In this study, we will refer to the scores of these 661 respondents as the ‘total dataset’. Also, we selected the respondents of Country 8 as a minimal, but meaningful subset (N = 20). Figure 2 shows both the current and the preferred CFP profiles for the total sample (left-hand panels) and the sample from Country 8 (right-hand panels). In the figure, the results of the original Cameron and Quinn (1999, 2006, 2011) method using part-wise arithmetic means (top panels), and the results of the alternative method using closed part-wise geometric means (bottom panels) are shown.

Fig. 2 Current CFP profiles and preferred CFP profiles for the total sample (left; N = 661), and for the subsample from Country 8 (right; N = 20). Upper: original Cameron and Quinn (1999, 2006, 2011) method, CFP profiles obtained using part-wise arithmetic means. Lower: alternative method, CFP profiles obtained using closed part-wise geometric means, with 95 % bootstrap confidence envelopes
Figure 2 shows that the profiles resulting from the two methods are highly comparable (relative emphasis on market culture in the current situation and a small emphasis on clan culture in the preferred situation), but the parts are slightly different for both the total sample (original method: current market culture = 28.0; preferred clan culture = 29.6; alternative method: current market culture = 28.3; preferred clan culture = 30.3), and for Country 8 (original method: current market culture = 31.3; preferred clan culture = 27.1; alternative method: current market culture = 32.9; preferred clan culture = 27.1). Also, the CFP profiles for the total sample and for the sample from Country 8 are very similar in shape, for both methods. Figure 2 also shows 95 % bootstrap confidence intervals, which the original method does not provide. For the total sample, the 95 % bootstrap confidence intervals are rather small, so the estimated CFP profiles are probably close to the population values. Also, the intervals do not overlap. The Aitchison distance computed from the total sample was greater than all permutation distances; hence, the null hypothesis was rejected \((\textit{p} < 0.001)\). This provides strong evidence that current CFP profiles and preferred CFP profiles are not equivalent. For Country 8, the 95 % bootstrap confidence intervals are much larger and do overlap, so the data give no clear evidence that the current and preferred cultures differ. For example, the part current clan culture was estimated at 19.5, but due to sample fluctuations, the population value is probably somewhere in the range [15.1–24.2], while the preferred clan culture was estimated at 27.1, but due to sample fluctuations, the population value is probably somewhere in the range [22.0–32.0]. We advocate interpreting parts as unequal only if the confidence envelopes do not show any overlap, as is the case in Fig. 2, bottom left-hand panel (total sample).

4 Conclusions

This paper is about ipsative measurement and the analysis of organizational values, especially concerning the CVF framework (Cameron and Quinn 1999, 2006, 2011). Although Cameron and Quinn do not seem to acknowledge this in their 1999/2006/2011 publications, the calculation of arithmetic averages over respondents is formally not allowed for ipsative measures resulting from the OCAI. Therefore, any numerical difference between the culture types in current and preferred culture profiles is, in principle, uninterpretable. When using arithmetic means there is neither a meaningful way to compare the profiles, nor a way to find any significant differences between them. In this paper we suggest an alternative way to statistically compute and compare culture profiles by using closed part-wise geometric means, a nonparametric bootstrap test, and a permutation test.

In the original Cameron and Quinn method, arithmetic averages were calculated per respondent (IDP and IFP), and over respondents (CDP and CFP). As has been stated by many authors (Cattell 1944; Hicks 1970; Closs 1976; Fedorak and Coles 1979; McClean and Chissom 1986; Baron 1996; Closs 1996), this operation is not permitted with purely ipsative data. In this paper, we have suggested and illustrated the use of closed part-wise geometric means instead of part-wise arithmetic means, as an alternative parametric method to calculate IDPs, IFPs, CDPs, and CFPs. We have demonstrated the analysis of CFPs using data from the OCAI.

In the original Cameron and Quinn method, any numerical difference between current and preferred IFPs and CFPs had to be interpreted by visual inspection of the profiles only, because parametric tests of significance were not allowed. Also, there was no possibility to calculate meaningful variances. In this paper, we have suggested and illustrated nonparametric alternatives: the use of 95 % bootstrap confidence intervals and Fisher’s permutation test. We have suggested a permutation test based on Aitchison Distances to test the null hypothesis that the current and preferred CFPs are equivalent.

Technically speaking, our presented alternatives for the statistical analysis and testing of ipsative data resulting from the OCAI seem to be adequate. All statistical methods suggested in this paper satisfy the three conditions for ipsative data analysis: scale invariance, permutation invariance, and subcompositional coherence. We also think that there is a strong need for this alternative approach, since the CVF framework is very popular among practitioners.

Regarding content, the original and suggested methods may end up with comparable results – especially when the individual culture profiles are rather close to the neutral profile, as was the case in our illustration. But it is a matter of statistical rigor. As was already mentioned in the introduction to this paper, there is a need for an alternative ‘pragmatic empiricist approach’ that is also acceptable for ‘rigorous statisticians’ (Kolb and Kolb 2005, p. 11).

A real added value of our approach is the opportunity to statistically test hypotheses about differences between IFP and CFP profiles (current and preferred cultures).

5 Discussion

Differences between profiles computed with part-wise arithmetic means and profiles computed with closed part-wise geometric means tend to become smaller as the profiles become closer to the neutral profile \(\mathbf X _i =\left( {\frac{1}{D},\frac{1}{D},\ldots ,\frac{1}{D}} \right) \), which equals (25, 25, 25, 25) for four-part profiles. Because in our data the culture profiles were rather close to the neutral profile, we found small differences between profiles computed with closed part-wise geometric means, and profiles computed with part-wise arithmetic means.
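This tendency can be illustrated with a small numerical sketch in Python. The first pair of profiles is the Table 6 example; the second pair is a hypothetical pair with the same pattern but much closer to the neutral profile, chosen here only for illustration. The function names are likewise hypothetical.

```python
import math


def arithmetic_mean(profiles):
    """Part-wise arithmetic mean of equally long profiles."""
    n = len(profiles)
    return [sum(parts) / n for parts in zip(*profiles)]


def closed_geometric_mean(profiles, total=100.0):
    """Part-wise geometric mean of positive profiles, rescaled to `total`."""
    n = len(profiles)
    gm = [math.exp(sum(math.log(x) for x in parts) / n)
          for parts in zip(*profiles)]
    s = sum(gm)
    return [total * g / s for g in gm]


def largest_gap(profiles):
    """Largest part-wise difference between the two kinds of mean profile."""
    am = arithmetic_mean(profiles)
    gm = closed_geometric_mean(profiles)
    return max(abs(a - g) for a, g in zip(am, gm))


far = [[40, 10, 10, 40], [40, 40, 10, 10]]   # Table 6 example
near = [[26, 24, 24, 26], [26, 26, 24, 24]]  # hypothetical, near (25, 25, 25, 25)

gap_far = largest_gap(far)    # about 4.4 percentage points
gap_near = largest_gap(near)  # about 0.01 percentage points

print(round(gap_far, 2), round(gap_near, 2))
```

For the profiles far from the neutral profile the two methods differ by several percentage points, whereas for the near-neutral profiles the difference is negligible.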

Sample fluctuations may affect the estimates of the CFP profiles. Especially, for small samples such as the sample from Country 8, the profiles need not be very stable, but stability of the culture profiles has generally not been reported in studies that used the OCAI.

The use of ipsative measurement in social science research remains controversial; complex statistical analyses, problems with the correct understanding of the results, and complaints by respondents that it is very hard to fill out questionnaires containing forced-choice items are well-known dilemmas. In this paper, we have modestly tried to start solving the problem of the statistical analysis. In our view, this is an essential first step to put an end to the long-lasting controversy about the use of ipsative measures in scientific research.

With respect to the correct understanding of results, as far as we know, even the most ingenious statistical methods cannot solve the limitation that ipsative data only allow a relative interpretation. In case of the CVF framework, this is an interpretation in terms of ratios between the different culture types. We believe that these ratios can be interpreted meaningfully.

However, it should be emphasized that analyzing pure ipsative data only allows for intra-individual, not inter-individual, comparisons (Cattell 1944; Hicks 1970; Closs 1976; Fedorak and Coles 1979; McClean and Chissom 1986; Baron 1996; Closs 1996). In case of the CVF framework, this condition means that comparisons between current and preferred CFP profiles within the same organization are indeed allowed, yet comparisons of the CFP profiles of different organizations are not permitted. Future research might tackle this subject in considerably more detail.

Finally, the relative difficulty respondents experience when filling out forced-choice items remains a serious hurdle in practice: the low response rate of the research that was used as an illustration in this paper is no exception. Ipsative measures seem to be exclusively applicable to higher-educated employees.

In our view, the use of ipsative measurement is limited. Only when the above-mentioned shortcomings are accepted can the use of ipsative measurement have clear advantages in the domain of organizational culture and values. As Meglino and Ravlin (1998) stated, it forces people to make explicit choices, and by doing so, it might better capture their ‘true values’. Also, as was mentioned earlier in this paper, acquiescence responding, halo effects, faking good or impression management, and social desirability bias may be better controlled by using ipsative rather than normative methods (Cunningham et al. 1977; McClean and Chissom 1986; Gurwitz 1987; Cheung and Chan 2002; Bowen et al. 2002; Cheung 2006).

This paper ultimately aimed to provide and illustrate an alternative approach to the analysis and testing of ipsative measures resulting from the OCAI (Cameron and Quinn 1999, 2006, 2011). Further research might explore whether or not our proposed methods are suitable for the analysis and testing of ipsative data in other theoretical domains and application areas.