Introduction

Orthorexia nervosa was first described by Steven Bratman [1] as a pathological obsession with the consumption of foods an individual deems to be ‘healthy’. Since that initial description, orthorexia has continued to receive regular media coverage (e.g., Washington Post, Guardian, Triple J) [2, 3, 4], yet the research literature relating to the condition remains limited. Reflecting the dearth of knowledge regarding orthorexia, it has no formal diagnosis in the Diagnostic and Statistical Manual of Mental Disorders (5th ed.) [5], nor are there universally accepted criteria or a universally accepted definition for its diagnosis, although several have been suggested [6, 7, 8, 9].

To assess for the risk of orthorexia, Bratman and Knight [7] developed a test (commonly referred to as the “Bratman Test”) that contained 10 questions, such as ‘Do you spend more than 3 h a day thinking about healthy food?’, answered in a yes/no response format. As reported by Dunn and Bratman [8], the Bratman Test was designed as a screening tool and is not underpinned by any psychometric testing or suggested cut-offs, nor has it been validated against reference groups. In an effort to address some of these limitations, Donini et al. [10] developed the ORTO-15.

The ORTO-15 consists of six (of the 10) items from the Bratman Test and an additional nine items, such as ‘Do you think that the conviction to eat healthy food increases self-esteem?’. The yes/no response format was changed to a four-point version (always, often, sometimes, never). The ORTO-15 was evaluated using 525 participants randomly allocated into two samples (n = 404 and n = 121). Both samples completed the ORTO-15 and Scale 7 ‘Psychasthenia’ of the Minnesota Multiphasic Personality Inventory (MMPI) [11] to assess obsessive–compulsive symptoms.

The initial analyses involved splitting the 404 participants into four groups based upon scores on the ORTO-15 and Scale 7 of the MMPI: (1) normal eating behaviour and non-clinical Scale 7 score; (2) normal eating behaviour and clinical Scale 7 score; (3) “healthy” eating behaviour (lower 25th percentile scores on the ORTO-15) and non-clinical Scale 7 score; and (4) an ‘orthorexia’ group with “healthy” eating behaviour (lower 25th percentile scores on the ORTO-15) and clinical Scale 7 score. The authors found that the differences in ORTO-15 scores between groups (lower 25th percentile versus 26th percentile and above) were significant and, on this basis, tested three ORTO-15 cut-off scores (< 35, < 40, < 45), of which the < 35 cut-off had the highest specificity (94.2%) and a high negative predictive value (91.1%). Using the second set of participants (n = 121) to test the < 35 and < 40 cut-off scores, Donini et al. concluded that the most appropriate cut-off was 40, as it had strong predictive values (efficacy 75%, sensitivity 100%, specificity 73.6%, and positive and negative predictive values of 17.6% and 100%, respectively).

Since the development of the original Italian ORTO-15, it has been translated into several languages, including German [12], Hungarian [13], Portuguese [14, 15], Polish [16, 17, 18], Spanish [19, 20] and Turkish [21, 22, 23]; for a full review, see Dunn and Bratman [8]. Further scale development based on the original ORTO-15, including confirmatory factor analysis (CFA), has been conducted, with several studies resulting in 11-item [13, 23] or 9-item versions [12, 18]. It should be noted, however, that across the 9- and 11-item versions each study produced a different factor structure, with only items 3, 4, 7, 10, 11 and 12 retained across all versions. Despite the ORTO-15 and its derivatives being well used in Europe [10, 13, 17, 18, 20, 21, 22, 24, 25, 26, 27, 28, 29], only a handful of studies have utilised the ORTO in predominantly English-speaking countries [30, 31, 32, 33, 34], and further validation of the ORTO-15 in predominantly English-speaking cohorts is limited. One study evaluated the ORTO-15 in a U.S. college sample and found poor internal consistency (Cronbach’s α = 0.14) and a two-factor structure reflecting eating concerns/worry and the perceived benefits of healthy eating [30]. Another study, also based on a U.S. college sample, reported better but still poor internal consistency (Cronbach’s α = 0.62) [31]. Beyond its psychometric properties, the ORTO’s clinical utility is also questionable, given that prevalence rates are often reported in the 30–70% range [8].

As identified by Donini et al. [10], the ORTO-15 was developed on the proposition that orthorexia is a condition characterised by an obsessive personality that is excessively focused upon engaging in healthy eating habits. Several studies have identified that ORTO scores are strongly predicted by obsessions [e.g., Obsessive Compulsive Inventory Revised (OCI-R), Maudsley Obsessive–Compulsive Inventory (MOCI), Yale-Brown-Cornell Eating Disorder Scale (YBC-EDS)] [23, 27, 28, 32] and engagement in dysfunctional eating patterns [e.g., Eating Attitudes Test (EAT-26 or EAT-40 item versions)] [23, 25, 27, 28, 35]. Given the definition of orthorexia, the findings that obsessive compulsive symptoms and dysfunctional eating patterns are strong predictors of orthorexia scores, and the fact that both the EAT-26 and OCI-R are well-developed and clinically validated, the use of these scales together to classify orthorexic versus non-orthorexic individuals is likely to be more accurate than using an obsession scale alone.

The aim of this study was to conduct a series of confirmatory factor analyses on the 15-, 11- and 9-item versions of the ORTO and recommend cut-off scores in an English-speaking cohort.

Method

Participants

Five hundred and eighty-five participants completed the online survey. The study sample consisted of 103 men (17.6%) and 482 women (82.4%). The mean ages of female and male participants were 35.24 (SD = 11.52) and 32.14 (SD = 10.45) years, respectively. Overall, 29.7% were single, 49.7% worked full-time, and 34.9% reported being students. The average body mass index (BMI) for this sample was 24.42 (SD = 5.1), suggesting that, on average, the sample fell in the normal range (for more details see Table 1).

Table 1 Sample characteristics

Measures

ORTO-15

Orthorexia severity was measured using the ORTO-15 [10]. The ORTO-15 is a self-report, multiple-choice questionnaire consisting of 15 items (e.g., Are your eating choices conditioned by your worry about your health status?) that measure orthorexia on a 4-point Likert scale. As identified by Donini et al. [10], in the original ORTO-15, items 3, 4, 6, 7, 10, 11, 12, 14 and 15 were scored 1 = always, 2 = often, 3 = sometimes, and 4 = never; items 2, 5, 8 and 9 were reverse scored; and two items (1 and 13) were scored 2 = always, 4 = often, 3 = sometimes, and 1 = never. L. Donini indicated that the rationale for the scoring of items 1 and 13 was the belief that ‘central’ answers were more consistent with an orthorexic attitude (personal communication, November 22, 2017). In addition to the original scoring method for the ORTO-15, alternate corrected scoring (CS) versions were created using the standardised scoring procedure (i.e., 1 = always, 2 = often, 3 = sometimes, and 4 = never); these are referred to as the ORTO-15CS, ORTO-11CS and ORTO-9CS. Final scores are obtained by summing the scores for the 15 items, with lower scores reflecting a greater degree of orthorexia. To extend upon the original study, the current study derived cut-offs for defining orthorexia based upon two criteria: (1) a score of 20 or over on the EAT-26, indicating being ‘at risk’ of an eating disorder [36], and (2) a score of 21 or over on the OCI-R [37], which is indicative of probable obsessive compulsive disorder (OCD).
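The original ORTO-15 scoring scheme described above can be sketched in a few lines. This is a minimal illustration only: the function and constant names are hypothetical, responses are assumed to arrive as lowercase labels keyed by item number, and ‘reverse scored’ is taken to mean 4 = always through 1 = never.

```python
# Response-to-score maps for the original ORTO-15 scoring, as described in the text.
STANDARD = {"always": 1, "often": 2, "sometimes": 3, "never": 4}
REVERSED = {"always": 4, "often": 3, "sometimes": 2, "never": 1}  # assumed meaning of "reverse scored"
CENTRAL = {"always": 2, "often": 4, "sometimes": 3, "never": 1}   # items 1 and 13

# Build one lookup from item number to the scoring map that applies to it.
ITEM_MAP = {}
for item in (3, 4, 6, 7, 10, 11, 12, 14, 15):
    ITEM_MAP[item] = STANDARD
for item in (2, 5, 8, 9):
    ITEM_MAP[item] = REVERSED
for item in (1, 13):
    ITEM_MAP[item] = CENTRAL


def score_orto15(responses):
    """Sum item scores for a dict mapping item number (1-15) to a response label."""
    if set(responses) != set(ITEM_MAP):
        raise ValueError("responses must cover items 1 to 15 exactly")
    return sum(ITEM_MAP[item][answer] for item, answer in responses.items())
```

For instance, a respondent answering ‘always’ to every item would score 9 × 1 + 4 × 4 + 2 × 2 = 29 under this scheme.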

Obsessive–Compulsive Inventory-Revised (OCI-R)

OCD symptoms were measured using the OCI-R [37]. The OCI-R is an 18-item self-report questionnaire for assessing OCD symptoms. Each item (e.g., I am upset by unpleasant thoughts that come into my mind against my will) uses a 5-point Likert scale response format (0 = not at all, 4 = extremely), where higher scores indicate a greater degree of OCD symptoms. A final score is obtained by summing scores for each item. Scores of 21 or more indicate probable OCD.

Eating Attitudes Test (EAT-26)

Dysfunctional eating patterns were measured using the EAT-26 [36]. The EAT-26 is a widely used measure of dysfunctional eating, consisting of 26 items (e.g., I am terrified about being overweight) that comprise three subscales: Dieting, Bulimia and Food Preoccupation, and Oral Control. Each item uses a six-point Likert scale response format (always = 3, usually = 2, often = 1, sometimes = 0, rarely = 0, never = 0), where a higher score indicates a response that reflects a greater degree of dysfunctional eating. Final scores are obtained by summing responses from each item, with higher scores reflecting greater dysfunctional eating. Scores of 20 or more indicate the potential risk of an eating disorder.
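As an illustration of the EAT-26 scoring and cut-off just described, the following sketch sums item points and flags totals of 20 or more. The names are hypothetical, and the sketch follows only the response weights given in the text; the published instrument may treat individual items differently.

```python
# Points per response label, following the weights given in the text.
EAT26_POINTS = {
    "always": 3, "usually": 2, "often": 1,
    "sometimes": 0, "rarely": 0, "never": 0,
}
EAT26_CUTOFF = 20  # totals of 20 or more indicate potential risk


def score_eat26(responses):
    """Sum the points for a list of 26 response labels."""
    if len(responses) != 26:
        raise ValueError("the EAT-26 has 26 items")
    return sum(EAT26_POINTS[label] for label in responses)


def at_risk(total):
    """Return True when the total meets or exceeds the cut-off."""
    return total >= EAT26_CUTOFF
```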

Procedure

Participants were recruited via two methods. The study was first posted on an online portal that allowed university students to voluntarily participate. Links to the study were also posted on social media pages and shared by various Australian news-media publications. Individuals were asked to complete an online questionnaire via a URL, where they were informed that their participation in this study was voluntary and that they were free to withdraw at any time. Consent was implied by participants’ decision to complete the survey after being presented with informed consent information. Ethical approval for the study was obtained from the Swinburne University Human Research Ethics Committee (SUHREC).

Statistical analysis

Confirmatory factor analysis (CFA) was used to test whether the well-defined ORTO-9, ORTO-11 and ORTO-15 models fitted our data. Goodness-of-fit statistics, namely the Goodness of Fit Index (GFI), Tucker-Lewis index (TLI), Comparative Fit Index (CFI) and the root-mean-square error of approximation (RMSEA), were obtained. For a model to be considered acceptable, the goodness-of-fit statistics had to satisfy the following criteria suggested by Byrne [38]: GFI > 0.90, CFI > 0.90, TLI > 0.90 and RMSEA < 0.08.
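The acceptance rule above amounts to four simultaneous threshold checks. A minimal sketch follows; the function name and the example values in the usage note are hypothetical and are not results from this study.

```python
def acceptable_fit(gfi, cfi, tli, rmsea):
    """Check fit indices against the thresholds stated in the text.

    Returns (overall, per_index), where overall is True only when
    every individual criterion is met.
    """
    per_index = {
        "GFI": gfi > 0.90,
        "CFI": cfi > 0.90,
        "TLI": tli > 0.90,
        "RMSEA": rmsea < 0.08,
    }
    return all(per_index.values()), per_index
```

For example, `acceptable_fit(0.95, 0.96, 0.94, 0.05)` would pass all four checks, whereas a model with an RMSEA of 0.10 would fail regardless of its other indices.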

To identify an optimal cut-off value for predicting probable orthorexia, a binary logistic regression model was fitted with the sum of the selected ORTO item scores as the predictor and dysfunctional eating category as the binary outcome variable. Based on this model, the predicted probability of probable orthorexia was computed for each participant. A receiver-operating-characteristic (ROC) curve was then obtained, and sensitivity (the probability of correctly classifying an individual with orthorexia status) and 1 − specificity (where specificity is the probability of correctly classifying an individual free of orthorexia status) were computed. The total area under the ROC curve is a measure of the overall performance of a diagnostic test. An area of 0.5 indicates that the variable performs no better than chance, while an area of 1.0 indicates perfect discrimination [39]. As Fig. 1 shows, the area under the ROC curve was 0.96, indicating near-perfect discrimination.
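The construction of an ROC curve and its area can be illustrated without any statistical package. The sketch below is not the analysis pipeline used in this study: it ranks participants by a raw score rather than by a fitted probability, which gives the same curve for a single-predictor logistic model provided the direction of the score is matched. It assumes that higher scores point towards the positive class, and all names are hypothetical.

```python
def roc_curve_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per candidate threshold.

    Assumes higher scores point towards the positive class and that
    labels are booleans (True = condition present).
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]  # threshold above every score: nobody is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

With perfectly separated groups the area is 1.0, and with scores that carry no information (all tied) it is 0.5, matching the interpretation given above.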

Fig. 1 ROC space

Results

As shown in Table 2, based upon the originally recommended ORTO-15 scoring (see the scoring grid in [10]), the ORTO-15 was found to have an inadequate fit and a poor Cronbach’s α. Prior to further CFA, inter-item correlations were examined, and items 5 and 8 were reversed to be consistent with the other items in the scale, which yielded a Cronbach’s α of 0.82. As found with the original ORTO-15, the ORTO-15CS, ORTO-11CS and ORTO-9CS versions also had poor model fits. Consequently, a fourth model was developed using a combination of exploratory factor analysis (EFA) and CFA.

Table 2 Goodness of fit statistics for the well-defined ORTO-15, ORTO-11, ORTO-9 and ORTO-7 using confirmatory factor analysis

Development of the ORTO-7

Data were randomly assigned into two samples. The first sample (n = 295) was used for an exploratory factor analysis (EFA) using the principal axis factoring method with an oblimin rotation. The oblimin rotation method was used because any latent factors were expected to be correlated rather than orthogonal. The second sample (n = 290) was used for a CFA using maximum likelihood estimation. The initial eigenvalues showed that the first factor explained 46% of the variance and the second factor 11% of the variance. However, the pattern matrix and the scree plot for the initial eigenvalues suggested that only a single factor was appropriate. Horn’s Monte Carlo simulation method with 1000 iterations [40] confirmed this decision. Six items, “ORTO2-When you go in a food shop do you feel confused?”, “ORTO5-Is the taste of food more important than the quality when you evaluate food?”, “ORTO6-Are you willing to spend more money to have healthier food?”, “ORTO8-Do you allow yourself any eating transgressions?”, “ORTO14-Do you think that on the market there is also unhealthy food?”, and “ORTO15-At present, are you alone when having meals?”, were removed due to communalities below 0.2 or cross-loading. CFA using the second sample suggested that two of the remaining items, “ORTO10-Do you think that the conviction to eat only healthy food increases self-esteem?” and “ORTO12-Do you think that consuming healthy food may change your appearance?”, were redundant, and they were removed. The resulting seven-item, single-factor model had a strong and stable structure (see Table 3 for loadings). Following the EFA and CFA, Cronbach’s α for the scale was calculated to assess internal consistency; the value of α = 0.83 indicated that the scale demonstrated acceptable internal consistency.
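For readers unfamiliar with the internal consistency statistic reported above, Cronbach’s α for k items is k / (k − 1) × (1 − sum of item variances / variance of total scores). A minimal sketch using only the Python standard library follows; the function name and the row-per-respondent data layout are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in rows):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(n_items)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)
```

Items that rise and fall together perfectly give α = 1, while items that share little variance pull α towards zero.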

Table 3 Standardised factor loadings for the ORTO-7 factor model

Development of ORTO-7 cut-off scores

The EFA sample (n = 295) was reused to derive a cut-off value to diagnose orthorexia status using binary logistic regression and a receiver-operating-characteristic (ROC) curve. The obtained cut-off value was then validated using the CFA sample (n = 290).

A 2 (dysfunctional eating high/low, with high scores being > 19 on the EAT-26 [36]) by 2 (OCD symptoms high/low, with high scores being > 20 on the OCI-R [37]) factorial analysis of variance indicated that there was no interaction between dysfunctional eating and OCD categories (F(1, 581) = 0.70, p = 0.404, partial η2 = 0.01). However, there was a significant difference in mean orthorexia score between dysfunctional eating categories (F(1, 581) = 338.37, p < 0.001, partial η2 = 0.37) as well as between OCD categories (F(1, 581) = 7.97, p = 0.005, partial η2 = 0.01). As the partial effect for the OCD category was minimal compared to that for the dysfunctional eating category, this suggests that OCD was not useful in determining a cut-off value to define orthorexia.
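The relative size of the two main effects can be checked from the F statistics alone, since partial η2 = (F × df_effect) / (F × df_effect + df_error). The sketch below assumes 1 effect and 581 error degrees of freedom (585 participants minus four cells); the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Main effects reported in the text, under the assumed degrees of freedom.
eating_effect = partial_eta_squared(338.37, 1, 581)
ocd_effect = partial_eta_squared(7.97, 1, 581)
```

Rounded to two decimals, these give 0.37 for the dysfunctional eating category and 0.01 for the OCD category, in line with the values reported for the main effects.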

According to the Westin classification [39], the area under the ROC curve of 0.96 (Fig. 1) indicates near-perfect discrimination.

Table 4 shows the sensitivity, specificity, and positive and negative predictive values at orthorexia cut-off values from 17 to 21. As the orthorexia cut-off value increases, the sensitivity also increases but the specificity decreases; a trade-off between sensitivity and specificity should therefore be considered, taking into account the positive and negative predictive values. Table 4 shows that the preferred optimal value to diagnose orthorexia would be 19. To confirm the selection, the same cut-off values were applied to the validation sample. The results confirmed that the preferred cut-off value for probable orthorexia based on the ORTO-7 would be 19. This value suggests that individuals who self-report having a current eating disorder have a 98% chance of having probable orthorexia status.

A chi-square test of independence was performed to examine the association of probable orthorexia (yes, n = 199; no, n = 386) with the demographic variables outlined in Table 1. Significant differences across groups were found for gender [χ2(1) = 26.47, p < 0.001], current eating disorder [χ2(1) = 41.07, p < 0.001], and dieting [χ2(1) = 7.07, p = 0.008]. These results indicate that individuals who identified as female, who reported a current eating disorder, or who were currently dieting were significantly more likely to be in the probable orthorexia group. Table 5 shows that 38.6% of female participants were identified as having probable orthorexia compared with 11.2% of male participants, and that 87.1% of participants with a current eating disorder and 54.1% of participants on a diet were also identified as having probable orthorexia.
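The four quantities tabulated for each candidate cut-off can be computed from a simple tally of true and false positives and negatives. The sketch below is illustrative only: the names are hypothetical, and the direction of the rule (whether scores at or above, or at or below, the cut-off are flagged) is an assumption exposed as a parameter.

```python
def classification_metrics(scores, has_condition, cutoff, higher_is_positive=True):
    """Sensitivity, specificity, PPV and NPV for one cut-off value."""
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, has_condition):
        flagged = score >= cutoff if higher_is_positive else score <= cutoff
        if flagged and truth:
            tp += 1
        elif flagged and not truth:
            fp += 1
        elif not flagged and truth:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def sweep_cutoffs(scores, has_condition, cutoffs, higher_is_positive=True):
    """Metrics for each candidate cut-off, e.g. range(17, 22)."""
    return {
        c: classification_metrics(scores, has_condition, c, higher_is_positive)
        for c in cutoffs
    }
```

Calling `sweep_cutoffs(scores, has_condition, range(17, 22))` on a derivation sample and again on a validation sample would reproduce the kind of side-by-side comparison described above.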

Table 4 Sensitivity, specificity, predictive values for different cut-off values of ORTO-7
Table 5 Characteristics of the orthorexic subjects compared with the demographic variables
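The chi-square tests of independence reported above compare observed counts in a 2 × 2 table with the counts expected if the two variables were unrelated. A minimal sketch follows, assuming no continuity correction is applied (the text does not state whether one was used); the function name is hypothetical, and the p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A table whose rows have identical proportions gives a statistic of zero and a p-value of 1, while larger departures from the expected counts push the statistic up and the p-value down.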

Discussion

The growing interest in orthorexia, both in terms of its characteristics and outcomes, demands that the research tools used to explore these factors are psychometrically valid. To date, while the ORTO-15 and its derivatives (ORTO-11, ORTO-9) have been used in multiple studies [12, 13, 14, 20, 25, 26, 29, 34, 35], evidence of the psychometric validity of these measures in English-speaking contexts is lacking. Only two studies have examined the psychometric properties of the ORTO-15 within an English-speaking sample [30, 31, 32, 33, 34], with each demonstrating that the ORTO-15 had poor internal consistency. Given this, the aim of the current study was to conduct a series of CFAs on the 15-, 11- and 9-item versions of the ORTO and recommend cut-off scores in an English-speaking cohort.

A series of CFAs identified that none of the three reported versions of the ORTO (15-, 11- and 9-item) produced an acceptable model. Consequently, suggested cut-off scores to provide an indication of probable orthorexia could not be developed for these versions. Although based on a single large sample, these findings suggest that the 15-, 11- and 9-item versions of the ORTO should be used with caution unless first validated using CFA to ensure structural validity.

Based on an EFA and a CFA using separate independent samples, a seven-item version of the ORTO was developed (ORTO-7). The ORTO-7 was found to have a strong and stable factor structure. To evaluate potential cut-off scores, consistent with the original validation publication, scales assessing obsessive–compulsiveness and dysfunctional eating were utilised. The results indicated that while there was a significant difference in ORTO-7 mean score across both dysfunctional eating and OCD categories, the OCD category difference was minimal and was not judged to be statistically helpful in deriving cut-off values. Based on the ROC analysis, scores equal to 19 or more are likely to represent probable orthorexia. This score represented the optimal balance between sensitivity and specificity, as well as between positive and negative predictive values, across both the study and validation samples. This study finds that individuals who self-report having a current eating disorder have a 98% chance of also having probable orthorexia, based on the ORTO-7 cut-off. In contrast, those identified as engaging in dysfunctional eating patterns (EAT-26 scores of 20 or above) had only an 11% chance of being identified as having probable orthorexia. These findings suggest that there is a strong overlap between currently established eating disorders and orthorexia symptomology. Given this, further research is needed to establish whether orthorexia is in fact an eating disorder or a subtype or prodromal expression of established eating disorders.

While this is the first large study using an English-speaking cohort to explore and validate a measure of orthorexia based on the original ORTO-15, it is not without limitations. As the study was based on an online self-report cohort, sample-selection bias (e.g., over-representation of individuals with eating-based concerns) could not be eliminated. Determining cut-off values by optimizing either test sensitivity alone or test specificity alone will do so at the expense of the other parameter. Such methods also fail to examine the cost incurred from misdiagnoses or the effect of orthorexia prevalence on the frequency of false-positive or false-negative test results. This is especially pertinent given that previous research has highlighted that the ORTO is sensitive to health-oriented eating behaviours without distinguishing whether these behaviours are inherently pathological or clinically significant (i.e., causing interpersonal distress, health issues, or disruption to daily living) [8, 34]. Distinguishing between clinically significant and benign orthorexic symptoms is essential to establishing accurate prevalence rates, and the failure to make this distinction may underpin the high prevalence figures that are typically reported [8]. Given the recent publication of suggested diagnostic criteria for orthorexia [8], researchers should explore the potential predictive ability of the ORTO-7 against these diagnostic criteria. As this study was based on one dataset, future research is needed to evaluate the reliability of the recommended ORTO-7 factor structure. Further validation, including predictive/discriminant validity across eating disorder, disordered eating and non-disordered eating groups, is also warranted.

In conclusion, the ORTO-15 and its derivatives (11- and 9-item) were not found to be psychometrically strong in a large English-speaking cohort. A subsequent seven-item version, derived from the original ORTO-15 using a combination of EFA and CFA, was found to have a strong, stable factor structure. A score of 19 or more on this version is likely to represent probable orthorexia. Future research should replicate the current findings and explore how the cut-off score performs against diagnostic criteria.