Introduction

Alexithymia is a well-documented disturbance that affects the way individuals experience and express their emotions. It is characterized by a marked difficulty in consciously experiencing, identifying, and describing emotions, as well as by reduced introspection. This multifaceted construct is currently understood as a deficit in the cognitive processing of emotions with two major components: an affective component, which encompasses reduced emotional awareness, and a cognitive component, which encompasses a concrete and reality-based thinking style (Taylor, Bagby, & Parker, 1997). Alexithymia interferes with interpersonal relationships, because individuals have difficulty understanding and relating to the feelings of others as well as their own. However, the consequences of this emotional deficit extend beyond interpersonal difficulties. Alexithymia has been considered both harmful to health and a key factor in the expression of various psychosomatic and mental disorders (De Gucht & Heiser, 2003; Taylor et al., 1997).

Currently, the most widely used method to assess alexithymia is the 20-item Toronto Alexithymia Scale (TAS-20), developed by Bagby, Parker, and Taylor (1994). The English-language version has a three-factor structure that corresponds to the theoretical construct of alexithymia: difficulty identifying feelings (DIF), difficulty describing feelings (DDF), and externally oriented thinking (EOT). DIF refers to problems distinguishing between emotions and bodily sensations, as well as between different emotions. DDF concerns the inability to verbally express emotions to others. EOT denotes an impoverished imaginative life and a concrete, poorly introspective thinking style.

The TAS-20 has been cross-validated in different languages and cultures (cf. Taylor, Bagby, & Parker, 2003), and the three-factor structure has often been replicated, although support has not been universal. Several studies have failed to replicate this factor solution, suggesting that alternative factor structures may exist. These findings leave it unclear whether the factorial structure of the TAS-20 varies across samples and, in particular, whether patient populations have a different factor structure from non-patient populations. This issue is of major interest because the TAS-20 is widely used in clinical research and practice, although its factorial validity is not well established in medical and psychological disorders (Kooiman, Spinhoven, & Trijsburg, 2002).

Examining the construct validity of the TAS-20 is particularly important in the context of anorexia nervosa (AN). First, alexithymia is a core symptom of this disorder and has been implicated in the severity of eating symptoms (Courty, Godart, Lalanne, & Berthoz, 2015). Because alexithymia is considered a negative prognostic indicator (Speranza, Loas, Guilbaud, & Corcos, 2011), assessing it is assumed to help clinicians scrutinize emotion regulation deficits and choose a better-suited treatment (Lumley, Neely, & Burger, 2007; Speranza et al., 2011). Second, evidence for the validity of the TAS-20 could help improve treatment by making it possible to determine how far each alexithymia dimension is linked to the severity of eating symptoms and other clinical outcomes; notably, alexithymia dimensions can behave differently from the measure as a whole (Taylor et al., 1997). Finally, we do not know whether the emotional avoidance and control that typically characterize AN patients might affect their ability to self-assess the different facets of alexithymia (Oldershaw, Lavender, Sallis, Stahl, & Schmidt, 2015), thus yielding inaccurate self-reports.

Critical Examination of Factor Analytic Studies on the TAS-20 in Clinical Samples

The literature on the factorial validity of the TAS-20 in specific populations with medical conditions or psychiatric problems (hereinafter, clinical samples) has grown in recent years. However, these studies have not been summarized and critically examined in a way that allows conclusions to be drawn and recommendations to be made for future research. Here we update the findings of factor analytic studies of the TAS-20, last formally reviewed in 2003 (Taylor et al., 2003), considering only studies that used clinical samples (alone or in combination with non-clinical samples). In addition to this overview, we critically examine methodological issues that can affect the quality and comparability of the data. Specifically, we reflect on the data preparation process, factor analysis procedures, and sample characteristics.

The studies using clinical samples are briefly described in the supplementary material (Tables S1 and S2), divided into those that confirm (n = 10) or contradict (n = 10) the three-factor structure proposed by Bagby et al. (1994). The first conclusion concerns precisely the balance between these two categories: the variety of factor solutions emerging from these studies makes the original structure questionable in clinical groups. Some studies found that DIF and DDF merged into a single factor (Cleland, Magura, Foote, Rosenblum, & Kosanke, 2005; Guillén et al., 2014; Haviland & Reise, 1996; Kooiman et al., 2002), which may reflect the fact that communicating feelings to others requires introspection and the ability to identify those feelings in oneself (Bagby et al., 1994). Other researchers found a structure in which the EOT dimension split into two (Müller, Bühner, & Ellgring, 2003) or three factors (Pinaquy, Chabrol, & Barbe, 2002). Still other studies found no stable solution (Koch et al., 2015; Richards, Fortune, Griffiths, & Main, 2005). Moreover, most studies reported low internal consistency for the EOT factor (α < .70). Thus, three main questions remain to be clarified: whether (1) DIF and DDF comprise two independent dimensions, (2) clinical data exhibit the EOT factor, and (3) difficulties in the EOT structure and consistency are related to specific problematic items.

At the methodological level, several issues should be highlighted. In terms of data preparation, only 30% (n = 6) of the studies clearly reported that TAS-20 data had been examined for normality. Failure to meet the assumption of multivariate normality can inflate the chi-square statistic, in which case distribution-free methods are recommended (Powell & Schafer, 2001). We also found that 75% (n = 15) of the studies gave no information on the extent of missing data. Not reporting how missing data were handled limits the relevance of the findings (Jackson, Gillaspy Jr, & Purc-Stephenson, 2009).

In terms of factor analysis procedures, studies used exploratory factor analysis (EFA; n = 4), confirmatory factor analysis (CFA; n = 10), or both (n = 6). Within EFA, researchers used component analysis as the extraction method (n = 6) and varimax rotation (n = 3), although the use of these methods is discouraged (Costello & Osborne, 2005). Within CFA, choices must be made concerning the estimation method, fit indices, and cutoff criteria. We found that studies adopted different cutoff criteria for fit indices, and the interpretation of results can vary greatly depending on the stringency of the criteria employed. In addition, only three of the studies that used CFA compared the three-factor model with alternative solutions published in the literature (Koch et al., 2015; Meganck, Vanheule, & Desmet, 2008; Müller et al., 2003), despite the recommendation to include alternative, theoretically plausible models (Jackson et al., 2009).

Regarding sample characteristics, several studies used mixed diagnostic groups instead of specific patient groups, reinforcing the view that studies analyzing the factor structure of the TAS-20 in specific clinical samples are lacking. More research with specific psychiatric diagnoses is needed because emotional instability can vary across psychiatric disorders. This matters particularly because emotional lability affects the awareness and expression of emotions, which can make self-report, the most commonly used methodology, unsuitable for assessing the facets of alexithymia (Koch et al., 2015).

Assessment of Alexithymia in Anorexia Nervosa

Alexithymia is a predominant factor in AN (Bourke, Taylor, Parker, & Bagby, 1992; Torres et al., 2015). A robust body of literature documents that alexithymia levels are elevated in individuals with AN compared with healthy controls (for reviews see Caglar-Nazali et al., 2014; Nowakowski, McFarlane, & Cassin, 2013; Oldershaw et al., 2015; Torres, Guerra, Lencastre, Vieira et al., 2011; Westwood, Kerr-Gaffney, Stahl, & Tchanturia, 2017). Prevalence rates in this eating disorder (ED) are high (established cutoff for alexithymic cases: TAS-20 ≥ 61; Taylor et al., 1997), reaching up to 50% in the majority of studies (Nowakowski et al., 2013). These findings underlie the assumption that maladaptive eating behaviors can serve as a strategy to avoid or cope with feelings (e.g., Brockmeyer et al., 2012; Clinton, 2006; Wildes, Ringham, & Marcus, 2010). In fact, research data suggest that ED behaviors develop as a means to control emotional experiences (for review see Haynos & Fruzzetti, 2011), limiting the regulation of emotions through mental processes, as seen in alexithymia (Veríssimo, 2003). This evidence is in line with etiological models positing that emotional dysregulation is implicated in the development of AN (e.g., Hatch et al., 2010; Southgate, Tchanturia, & Treasure, 2005; Treasure, Corfield, & Cardi, 2012).

Alexithymia also appears to be a critical factor in the persistence of AN. Functional abnormalities associated with alexithymia might foster the broader emotion processing deficits often observed in AN patients. Neuroimaging studies found functional differences in brain systems during the processing of unpleasant body image-related words (Miyake et al., 2009) and negative words concerning interpersonal relationships (Miyake et al., 2012) as a function of alexithymia in AN. Alexithymia has proved to be closely linked to social difficulties, playing a key role in the relational isolation of these patients (Courty et al., 2015). Past research in AN has also suggested that alexithymia is strongly related to anxiety and depression (Li, Zhang, Guo, & Zhang, 2015; Lulé et al., 2014). Although depression is probably the variable that best accounts for the variance in alexithymia (Parling, Mortazavi, & Ghaderi, 2010), it does not fully explain the cognitive–affective disturbances in this ED (Torres et al., 2015). In AN, alexithymia seems to be either a trait feature or a consolidated change resulting from the illness. Specifically, no association was found between alexithymia and several state variables such as age, body mass index (BMI), medication status, illness duration, treatment duration (Torres et al., 2015), and weight restoration (Beadle, Paradiso, Salerno, & McCormick, 2013).

This stability in alexithymic characteristics may have implications for AN prognosis, including a poor response to psychological treatments, particularly those focusing on insight, emotional awareness, and a close alliance with a therapist (Lumley et al., 2007). Cognitive limitations in emotion regulation may also predispose individuals to use maladaptive eating behaviors in stressful situations, hindering recovery from AN (Speranza, Loas, Wallier, & Corcos, 2007).

Given the critical role of alexithymia in AN development and maintenance, the assessment of this construct is a common procedure in both clinical and research settings. However, we do not know if the TAS-20, as a self-report measure that requires insight, provides valid data in individuals exhibiting avoidance of emotions and poor reflective functioning, as is the case with AN (Oldershaw et al., 2015). By ascertaining the factorial validity of the TAS-20, we will better understand the validity of self-assessment of alexithymia as a multidimensional construct.

To date, three studies are known to have investigated the validity of the TAS in ED samples. The first was carried out by Troop, Schmidt, and Treasure (1995) in a mixed sample (N = 127) of patients with AN and bulimia nervosa (BN). These authors used a previous version of the instrument, the TAS-26 (Taylor, Ryan, & Bagby, 1985), and found a four-factor structure: Inability to Identify Feelings, Paucity of Fantasy, Non-communication of Feelings, and Concrete Thinking. In turn, Loas, Braun, Delhaye, and Linkowski (2001) used the more recent TAS-20, and their results supported the proposed three-factor model. Two factors, however, showed low internal consistency: DDF (α = .61) and EOT (α = .56). The clinical sample in this study was a mixed group of patients with ED or addictive behaviors (N = 659). Lastly, Guillén et al. (2014) investigated the performance of the TAS-20 in patients with ED (N = 103) and found a set of 13 clinically interpretable items, composed of DIF and DDF items plus item 8 from the EOT subscale, with a one-dimensional structure.

None of these studies conducted factor analyses separately by ED type. Despite a recently proposed transdiagnostic view of ED based on cross-diagnostic commonalities (Fairburn, Cooper, & Shafran, 2003), there are reasons to study the construct validity of the TAS-20 by diagnosis, including: (a) the nascent nature of the transdiagnostic model, which requires further investigation; (b) differences in emotion processing observed in previous studies between individuals with AN and BN (Gilboa-Schechtman, Avnon, Zubery, & Jeczmien, 2006; Pascual, Etxebarria, & Cruz, 2011; Sexton, Sunday, Hurt, & Halmi, 1998); (c) the possible impact of starvation and low BMI on emotional functioning (Westwood et al., 2017); and (d) the fact that individuals with AN usually seek treatment at younger ages (American Psychiatric Association, 2013), resulting in very different profiles in terms of cognitive and social maturity.

An additional issue to be explored is the factorial validity of the TAS-20 in light of the new AN diagnostic criteria. The transition from DSM-IV (American Psychiatric Association, 2000) to DSM-5 (American Psychiatric Association, 2013) resulted in several changes, including removal of the amenorrhea criterion, broadening of the weight criterion, and greater reliance on clinical impression, so that patients' fear of weight gain can be inferred when behaviors interfering with weight gain are observed. This broader definition allows the inclusion of atypical or subthreshold presentations (Brown, Holland, & Keel, 2014; Dahlgren & Wisting, 2016), whose differences in clinical picture and prognosis appear to increase phenotypic heterogeneity (Mustelin et al., 2016). It is therefore necessary to investigate whether the putative underlying structure of the alexithymia construct is reflected in the TAS-20 when the scale is applied to this more heterogeneous diagnostic category.

In sum, several reasons justify determining the factor structure of the TAS-20 in AN. Because impaired emotional functioning is an essential element in the genesis and prognosis of AN, this measure can help increase understanding of the mechanisms by which alexithymia influences clinical aspects of the condition. This line of research might benefit from using subscale scores rather than a global score, in keeping with the multidimensional nature of the construct. Given the lack of such research in AN, this study examined whether the original three-factor model of the TAS-20 could be replicated in a Portuguese sample of treatment-seeking patients with AN diagnosed according to the new DSM-5 criteria. In addition, we assessed whether this factor solution fits the scale better than alternative factor structures proposed in the literature (Koch et al., 2015; Meganck et al., 2008; Müller et al., 2003; Tsaousis et al., 2010; Zhu et al., 2007). We predicted that the original three-factor solution would provide the best fit to the data.

Method

Participants

A total of 125 female participants with AN, aged 13 to 40 years (M = 19.73 years; SD = 5.97), were recruited from six public hospitals and two private clinics. Exclusion criteria included past or present psychotic disorders and illicit substance use or alcohol abuse. At the time of data collection, all participants were in active treatment, either as inpatients (29.6%; n = 37) or as outpatients (70.4%; n = 88). Illness duration ranged from 3 to 168 months (M = 35.77; SD = 34.37; Median = 24). Seventy-six patients were diagnosed with the AN restrictive subtype (AN-R) and 49 with the AN binge–purge subtype (AN-B). Participants’ BMI ranged from 13.4 to 18.5 kg/m2 (M = 15.81; SD = 1.77).

Materials

Interview for the Diagnosis of Eating Disorders-IV

Participants’ diagnoses were established with the Interview for the Diagnosis of Eating Disorders-IV (IDED-IV; Kutlesic, Williamson, Gleaves, Barbin, & Murphy-Eberenz, 1998; Torres et al., 2008). The IDED-IV is a semi-structured interview developed for the differential diagnosis of ED based on DSM-IV-TR criteria (American Psychiatric Association, 2000). With the transition to DSM-5, the diagnostic threshold for AN was lowered, enabling the inclusion of atypical or subthreshold cases previously diagnosed as eating disorder not otherwise specified (EDNOS). Thus, to adhere to the new definition of AN, we reanalyzed all interviews of cases diagnosed with EDNOS and recoded them post hoc using the DSM-5 criteria (American Psychiatric Association, 2013). The IDED-IV begins with a semi-structured overview of the participant’s history of ED symptoms and current eating pattern. Descriptions of current eating patterns are obtained for both a typical day and days of dieting. Information is also obtained on medical problems associated with ED (Kutlesic et al., 1998). This overview, together with the questions in the diagnostic section, allows exploration of reasons for maintaining a low weight other than fear of weight gain (e.g., somatic complaints, extreme need for control) and of behaviors intended to avoid weight gain (e.g., skipping meals, substantial caloric restriction). To recode diagnoses, we used this information to infer fear of weight gain without re-interviewing participants, thus expanding Criterion B as defined in DSM-5. In addition, we set the maximum BMI threshold for low weight (Criterion A) to 18.5 kg/m2, in accordance with the WHO definition of underweight (WHO, 1995), and removed the amenorrhea criterion. No changes were made to Criterion C (body image disturbance, undue influence of weight or shape on self-evaluation, or denial of the seriousness of low weight). Two team members experienced in the diagnosis and treatment of ED recoded the interviews of EDNOS participants (n = 31). Agreement on AN diagnosis (yes/no) was evaluated with Cohen’s kappa coefficient (κ), and interrater agreement was high (κ = .83). The new sample of AN participants included all DSM-IV-TR cases (n = 103) as well as cases previously diagnosed as EDNOS (n = 22) that now met the DSM-5 criteria.
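Cohen’s kappa corrects raw percent agreement for the agreement expected by chance from each rater’s marginal totals, κ = (p_o − p_e)/(1 − p_e). The sketch below computes it from a square agreement table. The function name and the counts in the usage line are hypothetical illustrations, not the study’s recoding data.

```python
from __future__ import annotations


def cohens_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa for two raters from a k x k agreement table.

    table[i][j] counts cases that rater A placed in category i and
    rater B placed in category j (e.g., 0 = AN, 1 = not AN).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: products of the two raters' marginal proportions
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 2 x 2 table: 70% raw agreement, 50% expected by chance
print(round(cohens_kappa([[20, 5], [10, 15]]), 2))  # 0.4
```

Because p_e depends on the marginals, the same raw agreement can yield quite different κ values when one diagnostic category is much more frequent than the other.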

20-Item Toronto Alexithymia Scale (TAS-20)

The TAS-20 is a self-report scale comprising 20 items rated on a five-point Likert-type scale ranging from 1 (strongly disagree) to 5 (strongly agree). In the three-factor model, the first factor (DIF) consists of seven items assessing difficulty identifying feelings and distinguishing them from the somatic sensations that accompany emotional arousal. Factor 2 (DDF) consists of five items assessing difficulty describing feelings to others. Factor 3 (EOT) consists of eight items assessing externally oriented thinking. Five items are negatively keyed (Bagby et al., 1994). We used a validated Portuguese version of the TAS-20, which has the same number of items as the original version and good psychometric properties (Prazeres, Parker, & Taylor, 2000). This version underwent the gold-standard translation and back-translation process to establish cross-language equivalence, involving eight translators (three of them bilingual) and the authors of the original English version. Cross-validation of the factor structure in two non-clinical samples (normal adults and university students) replicated the three-factor model (Prazeres et al., 2000). This model was also confirmed by Veríssimo (2001) in Portuguese clinical samples (outpatients attending a routine general practice consultation and patients with inflammatory bowel disease).
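As a sketch of how subscale and total scores are typically derived, the function below assigns items to DIF, DDF, and EOT as in the three-factor model, rescores negatively keyed items as 6 minus the response, and applies the TAS-20 ≥ 61 cutoff cited in the Introduction. The set of reverse-keyed items (4, 5, 10, 18, 19) is an assumption taken from the standard English scoring key as we understand it and should be checked against the Portuguese version; the function name is hypothetical.

```python
from __future__ import annotations

# Item-to-subscale key of the three-factor model (Bagby et al., 1994)
SUBSCALES = {
    "DIF": [1, 3, 6, 7, 9, 13, 14],
    "DDF": [2, 4, 11, 12, 17],
    "EOT": [5, 8, 10, 15, 16, 18, 19, 20],
}
# Assumption: negatively keyed items as in the standard English scoring key
REVERSE_KEYED = {4, 5, 10, 18, 19}
CUTOFF = 61  # a TAS-20 total >= 61 classifies a case as alexithymic


def score_tas20(responses: dict[int, int]) -> dict[str, int | bool]:
    """Score one respondent; responses maps item number (1-20) to a 1-5 rating."""
    if sorted(responses) != list(range(1, 21)):
        raise ValueError("expected exactly one response for each item 1-20")
    if any(not 1 <= r <= 5 for r in responses.values()):
        raise ValueError("responses must lie on the 1-5 Likert scale")
    # Negatively keyed items are rescored so that higher always means more alexithymic
    keyed = {i: 6 - r if i in REVERSE_KEYED else r for i, r in responses.items()}
    scores: dict[str, int | bool] = {
        name: sum(keyed[i] for i in items) for name, items in SUBSCALES.items()
    }
    scores["total"] = sum(keyed.values())
    scores["alexithymic"] = scores["total"] >= CUTOFF
    return scores


# Hypothetical respondent who answers "strongly agree" to every item
print(score_tas20({i: 5 for i in range(1, 21)}))
```

Totals range from 20 to 100, and because the three subscales partition the 20 items, the total always equals DIF + DDF + EOT.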

Procedure

The study received ethical approval from all the hospitals and clinics involved in sample recruitment. Patients participated voluntarily after signing informed consent. For participants younger than 18, written informed consent was provided by a parent, and assent was obtained from the participant. All participants agreed to take part in the study.

Data Analysis and Tested Models

No missing data were found, and no data transformations were carried out. Confirmatory factor analyses (CFA) were conducted with structural equation modeling in EQS version 6.1. The four models tested were as follows:

(a) Model I: A one-factor model, where it is assumed that all items reflect a one-dimensional construct (Lambert et al., 1999);

(b) Model II (DIDF–EOT): A two-factor model with DIF and DDF forming one factor (DIDF: Items 1, 2, 3, 4, 6, 7, 9, 11, 12, 13, 14, and 17) and EOT as the second factor (EOT: Items 5, 8, 10, 15, 16, 18, 19, and 20; Cleland et al., 2005; Erni, Lötscher, & Modestin, 1997; Loas, Otmani, Verrier, Fremaux, & Marchand, 1996);

(c) Model III (DIF–DDF–EOT): the common three-factor solution reported by Bagby et al. (1994): DIF (Items 1, 3, 6, 7, 9, 13, and 14), DDF (Items 2, 4, 11, 12, and 17), and EOT (Items 5, 8, 10, 15, 16, 18, 19, and 20);

(d) Model IV (DIF–DDF–PT–LIE): a four-factor solution, with DIF, DDF, and EOT split into two factors: “pragmatic thinking” (PT; refers to a concrete and reality-based cognitive style; items 5, 8, and 20) and “lack of importance of emotions” (LIE; concerns the low importance placed on emotional experiences; items 10, 15, 16, 18, and 19; Müller et al., 2003).
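The four specifications can be written as item-to-factor maps, which also makes it easy to check that each model uses all 20 items once and to count degrees of freedom. The sketch assumes the identification choices described for these analyses (first loading of each factor fixed to 1, correlated factors, uncorrelated errors, no cross-loadings) and a covariance structure without means; the names are hypothetical, and the degrees of freedom are derived by counting parameters, not taken from Table 1.

```python
from __future__ import annotations

DIF = [1, 3, 6, 7, 9, 13, 14]
DDF = [2, 4, 11, 12, 17]
EOT = [5, 8, 10, 15, 16, 18, 19, 20]

MODELS: dict[str, dict[str, list[int]]] = {
    "I": {"G": sorted(DIF + DDF + EOT)},
    "II": {"DIDF": sorted(DIF + DDF), "EOT": EOT},
    "III": {"DIF": DIF, "DDF": DDF, "EOT": EOT},
    "IV": {"DIF": DIF, "DDF": DDF, "PT": [5, 8, 20], "LIE": [10, 15, 16, 18, 19]},
}


def cfa_df(model: dict[str, list[int]], dropped: set[int] | frozenset[int] = frozenset()) -> int:
    """Degrees of freedom of a simple-structure CFA with marker-variable scaling."""
    factors = {f: [i for i in items if i not in dropped] for f, items in model.items()}
    p = sum(len(items) for items in factors.values())  # observed items
    m = len(factors)  # latent factors
    moments = p * (p + 1) // 2  # unique variances and covariances
    free = (
        (p - m)  # loadings (first loading of each factor fixed to 1)
        + p  # error variances
        + m  # factor variances
        + m * (m - 1) // 2  # factor covariances
    )
    return moments - free


for name, model in MODELS.items():
    # Each model should use every one of the 20 items exactly once
    used = sorted(i for items in model.values() for i in items)
    assert used == list(range(1, 21)), name
    print(name, cfa_df(model))
```

Under these assumptions, `cfa_df(MODELS["III"], dropped={16, 20})` returns 132 for a three-factor model without items 16 and 20.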

In evaluating model fit, the following indices were considered: the Satorra–Bentler scaled chi-square (Satorra & Bentler, 2001) and the corresponding chi-square to degrees of freedom ratio, the Comparative Fit Index (CFI; Bentler, 1990), the root mean square error of approximation (RMSEA; Browne & Cudeck, 1993), and the standardized root mean square residual (SRMR; Bentler, 1995). The following criteria were used as standards of acceptable fit: a non-significant χ2 test (p > .05; Barrett, 2007), χ2/df < 2 (Wheaton, 1987), CFI > .90 (Byrne, 2010), RMSEA < .06, and SRMR < .08 (Hu & Bentler, 1999). Models were compared using Akaike’s information criterion (AIC; Akaike, 1987) and the consistent Akaike’s information criterion (CAIC; Bozdogan, 1987; Dayton, 2003); the latter was included because it adjusts for sample size, correcting the tendency of AIC to favor overly complex models (Acquash, 2013). The model with the lowest AIC and CAIC values is considered the most representative of the true model, or “the best approximation model among those being considered” (Dayton, 2003, p. 284).
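The point-estimate formulas behind these indices are short. The sketch below uses the common textbook forms of RMSEA and CFI and the chi-square-based forms of AIC and CAIC (χ2 − 2df and χ2 − (ln N + 1)df), which, to our understanding, are the forms EQS reports; likelihood-based forms differ from these only by a constant when models are fitted to the same observed variables. Plugging a Satorra–Bentler scaled χ2 into these formulas is a simplification (robust versions of RMSEA and CFI are computed somewhat differently), some programs use N rather than N − 1 in RMSEA, and all inputs in the usage lines are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index of a model relative to the independence (baseline) model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # d_m >= 0, so d_b >= 0 as well
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def aic_chi2(chi2: float, df: int) -> float:
    """Chi-square-based AIC: lower values indicate the preferred model."""
    return chi2 - 2 * df


def caic_chi2(chi2: float, df: int, n: int) -> float:
    """Chi-square-based consistent AIC: the complexity penalty grows with ln N."""
    return chi2 - (math.log(n) + 1) * df


# Hypothetical statistics for two models fitted to the same data set with N = 125
n = 125
candidates = {"A": (185.0, 132), "B": (175.0, 129)}
for name, (chi2, df) in candidates.items():
    print(name, round(chi2 / df, 2), round(rmsea(chi2, df, n), 3),
          round(aic_chi2(chi2, df), 1), round(caic_chi2(chi2, df, n), 1))
```

In this hypothetical pair, AIC prefers the less parsimonious model B, whereas CAIC, whose penalty scales with ln N, prefers model A; this is why reporting both criteria can be informative.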

Cronbach’s alpha was calculated to assess the internal consistency of each factor. Item homogeneity was also evaluated by calculating the mean inter-item correlation (MIIC), but only for the best fitting model. Values of α > .70 and MIIC between .20 and .40 were considered acceptable (Briggs & Cheek, 1986).
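A minimal sketch of both statistics, assuming complete data arranged as one list of scores per item; the helper names and toy responses are hypothetical. Alpha compares the sum of the item variances with the variance of the total score, and MIIC averages the Pearson correlations over all item pairs. Alpha also rises with the number of items, which is one reason to report MIIC alongside it.

```python
from __future__ import annotations

import math
from itertools import combinations
from statistics import fmean, variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """items[j] holds the scores of all respondents on item j."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson_r(x: list[float], y: list[float]) -> float:
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_inter_item_r(items: list[list[float]]) -> float:
    """Average Pearson correlation over all pairs of items."""
    return fmean([pearson_r(a, b) for a, b in combinations(items, 2)])


def acceptable(items: list[list[float]]) -> bool:
    """Apply the alpha > .70 and .20 <= MIIC <= .40 criteria used in this study."""
    return cronbach_alpha(items) > 0.70 and 0.20 <= mean_inter_item_r(items) <= 0.40


# Hypothetical two-item factor answered by four respondents
toy = [[1, 2, 3, 4], [2, 1, 4, 3]]
print(round(cronbach_alpha(toy), 2), round(mean_inter_item_r(toy), 2), acceptable(toy))
```

In the toy example, alpha is .75 but the MIIC of .60 exceeds the upper bound, so the items would be judged too redundant under the MIIC criterion.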

In all analyses, p values of < .05 were considered statistically significant.

Results

Preliminary Analyses

Descriptive statistics and item intercorrelations were calculated (supplementary material, Tables S3 and S4, respectively). The distribution of the data was tested with the Kolmogorov–Smirnov test. The test was significant, but closer inspection of the items’ skewness and kurtosis indicated that the deviation from normality was not problematic, as all absolute values were below 2.0 (Schumacker & Lomax, 2016).
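Item-level screening of this kind is easy to reproduce. The sketch below uses the simple moment estimators of skewness and excess kurtosis and the |value| < 2.0 rule cited above; statistical packages often report bias-adjusted versions, which differ slightly at this sample size. Function names and data are hypothetical.

```python
from __future__ import annotations

from statistics import fmean


def skewness_kurtosis(x: list[float]) -> tuple[float, float]:
    """Moment-based skewness g1 and excess kurtosis g2 (no small-sample correction)."""
    n = len(x)
    m = fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_bounds(x: list[float], limit: float = 2.0) -> bool:
    """True if both |skewness| and |excess kurtosis| are below the limit."""
    g1, g2 = skewness_kurtosis(x)
    return abs(g1) < limit and abs(g2) < limit


# Hypothetical item responses on the 1-5 scale
print(skewness_kurtosis([1, 2, 3, 4, 5]), within_bounds([1, 2, 3, 4, 5]))
print(within_bounds([1] * 50 + [5]))  # one extreme response among many 1s
```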

The sample comprised 59 adolescents (ages 13–17; 47.2%) and 66 adults (ages 18–40; 52.8%). We set the cutoff between age groups at 18 years, based on the definition of adolescence as the stage of physical and psychological development between puberty and legal adulthood (Dahl, 2004). The prevalence of alexithymia was 57.6% (n = 34) in adolescents and 66.7% (n = 44) in adults. According to t tests, there were no significant differences between age groups in TAS-20 mean scores, t(123) = −1.80, p = .074, d = −0.32, or in BMI, t(123) = 1.17, p = .234, d = 1.30. There were also no significant differences between the two AN subtypes (AN-R and AN-B) on the TAS-20, t(123) = 0.77, p = .444, d = 1.19, or in BMI, t(123) = 1.15, p = .252, d = 1.32.
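Assuming the t tests were standard pooled-variance independent-samples tests, Cohen’s d can be recovered from t and the group sizes as d = t·√(1/n1 + 1/n2). Applied to the age-group comparison on the TAS-20, the conversion gives the value reported there; the function name is hypothetical.

```python
import math


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d for an independent-samples (pooled-variance) t test."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Age-group comparison: 59 adolescents vs. 66 adults, df = 59 + 66 - 2 = 123
print(round(cohens_d_from_t(-1.80, 59, 66), 2))  # -0.32
```

The sign of d simply follows the order in which the groups enter the t test.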

Confirmatory Factor Analyses

Although the data were approximately normally distributed, the more conservative robust maximum likelihood estimation method was used (Ullman, 2006). All models were specified with reflective latent factors that were allowed to correlate (except in the one-factor model), and no correlations between indicators’ errors were allowed. To set the scale (metric) of each latent factor, the loading of its first indicator was fixed to 1 and all other loadings were freely estimated. No cross-loadings were specified. The fit indices of the four models are presented in Table 1 (see “Original models”). All models showed poor robust fit indices. Model I fit worst, followed by Model II. Models III and IV had more acceptable values, although they did not reach the cutoffs for the fit indices. Although the results for these two models were very similar, the AIC and CAIC values indicated that Model III fit the data better.

Table 1 Model fit indexes for confirmatory factor analysis (N = 125)

Because none of the four models reached acceptable fit, we hypothesized that this poor performance might be related to problematic items, as suggested by the critical examination of factor analytic studies presented in this paper. Inspection of the standardized factor loadings showed that items 16 and 20 repeatedly had low loadings (below .06). Thus, to clarify whether problematic items had a significant impact on the structure of the instrument, we retested the models after removing the items whose loadings the Wald test indicated could be fixed to zero. In Model I, the Wald test indicated that eight items could be removed without meaningfully affecting the chi-square (items 5, 8, 10, 15, 16, 18, 19, and 20; χ2 between .21 and 1.00, p > .05). Following the same procedure, items 16 and 20 were removed from the other three models (item 16: χ2 between .05 and .25; item 20: χ2 between .10 and 3.83; p > .05).
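For a single parameter, the Wald statistic is the squared ratio of the estimate to its standard error and is referred to a χ2 distribution with 1 df, so a loading can be fixed to zero without a significant loss of fit when p > .05. To our understanding, the EQS Wald procedure can also test sets of parameters multivariately in a stepwise fashion, so this univariate sketch is a simplification; the function names are hypothetical.

```python
import math


def wald_chi2(estimate: float, se: float) -> float:
    """Univariate Wald statistic for H0: parameter = 0."""
    return (estimate / se) ** 2


def p_chi2_1df(x: float) -> float:
    """Upper-tail p value of a chi-square with 1 df, i.e., P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))


# The largest statistic reported for item 20 (3.83) lies just below the .05 critical value (about 3.84)
p = p_chi2_1df(3.83)
print(round(p, 3), p > 0.05)
```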

The standardized factor loadings of the final models are presented in Table 2. Items 5 and 15 had very low factor loadings in every model. In Models II and III, item loadings on the EOT latent factor were relatively unbalanced (between .24 and .65), and five of the six items had loadings below .50, indicating that the latent dimension explained less than 25% of the variance in those items. Overall, removing the items indicated by the Wald test improved the fit of all models (see Table 1, “Final models”). Despite this improvement, only Models III and IV fit the data sufficiently. Regarding Model IV, however, the PT factor comprises only two indicators, fewer than the minimum of three per factor generally recommended in CFA, which may be a primary limitation to the model’s adequacy. The combined fit indices supported the three-factor structure tested in final Model III (SBχ2/df = 1.40; CFI = .90; RMSEA [HI95%] = .06 [.08]; SRMR = .08). This model also had the lowest AIC and CAIC values, indicating that it best represented the data of this sample.
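In a standardized solution in which each item loads on a single factor, the squared loading is the proportion of item variance explained by the factor, and one minus that value is the standardized measurement error; this is why loadings below .50 correspond to less than 25% explained variance. A minimal sketch using the EOT loading range reported above; the function names are hypothetical.

```python
def explained_variance(loading: float) -> float:
    """Share of an item's variance explained by its factor (squared standardized loading)."""
    return loading ** 2


def error_variance(loading: float) -> float:
    """Standardized measurement error for an item that loads on a single factor."""
    return 1 - loading ** 2


# Bounds of the EOT loading range in Models II and III, plus the .50 threshold
for lam in (0.24, 0.50, 0.65):
    print(lam, round(explained_variance(lam), 3), round(error_variance(lam), 3))
```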

Table 2 Confirmatory factor analysis: standardized factor loadings and measurement error for single items of the TAS-20 for final models

For final models, intercorrelations among factors were examined using Pearson’s r (Table 3). All correlation coefficients were positive and significant (p < .05). Strong associations were found between DIF and DDF (r = .72) and PT and LIE (r = .83). All other correlations ranged from weak to moderate (r between .27 and .33).

Table 3 Correlation values between final models’ latent factors

Reliability

In the original models, internal consistency was acceptable for Model I (α = .82). The subscales for difficulty identifying and describing feelings also showed acceptable reliability, whether the two factors were considered together (DIDF, α = .87) or separately (DIF, α = .84; DDF, α = .77). Cronbach’s alpha was unacceptably low for EOT (α = .46) in Models II and III, and even lower for the thinking-style factors in Model IV (α = .30 and α = .33 for PT and LIE, respectively).

Removing items did not substantially change internal consistency. Reliability remained adequate in the one-factor model (α = .82). Internal consistency remained poor for the PT (α = .19) and LIE (α = .43) factors. Although the EOT alpha increased to .49, it remained low. MIIC values were calculated for each factor of final Model III, and only EOT fell below the minimum accepted value (EOT, MIIC = .14; DIF, MIIC = .43; DDF, MIIC = .39). It should be taken into account that the reliability of the EOT subscale can vary with age (Loas et al., 2017). To explore this possibility in our sample, we calculated EOT internal consistency separately for each age group. The value was lower in adolescents (α = .39) than in adults (α = .55). This pattern was not observed for the DIF (adolescents, α = .87; adults, α = .79) and DDF (adolescents, α = .82; adults, α = .70) factors.

Discussion

In this study we tested four competing factor models of the TAS-20 by means of CFA in a clinical sample of patients with AN. Although the original three-factor solution (Model III) fit the data best, all models tested with the full scale fit poorly overall. Thus, the standard three-factor structure replicated in previous Portuguese validation studies (Prazeres et al., 2000; Veríssimo, 2001) was not supported in this sample. Inspection of the standardized factor loadings showed that some items repeatedly had very low values. We therefore tested non-nested models from which poorly performing items had been removed. Based on the Wald test, two items (16 and 20) were removed from Models II, III, and IV, and all EOT items were removed from Model I. The results showed that Model III without items 16 and 20 fit adequately and is the most suitable model.

Together, our results help clarify the three questions raised by the synthesis of factor analytic studies of the TAS-20 in clinical samples presented in this article, namely whether: (a) DIF and DDF comprise two independent dimensions, (b) clinical data exhibit the third factor, and (c) difficulties in the EOT structure and consistency are related to specific problematic items.

Regarding the first question, our data supported the independence of DIF and DDF in AN. In Model III, the best-fitting model, they are two separate, strongly correlated (r = .72) factors, as expected, but without multicollinearity. This result agrees with some previous studies (e.g., all studies described in Table S1), although contradictory findings also exist (e.g., the majority of studies described in Table S2). Notably, this solution, with DIF and DDF as independent factors, differs substantially from the one-dimensional structure reported by Guillén et al. (2014) in ED patients. The adequate reliability of DIF and DDF also reinforces the notion that these two concepts can be distinguished from each other, as indicated by other studies using clinical samples (Maggini & Raballo, 2004; Müller et al., 2003; Pinaquy et al., 2002). We therefore conclude that, even though verbalization and differentiation of feelings are interconnected, they do not necessarily act together in AN. In other words, individuals may be able to acknowledge their emotional states and yet find those emotions difficult to express. This incongruence between emotion expression and inner experience was documented by Gramaglia et al. (2016) and converges with Fox and Power’s (2009) position that the non-expression of feelings in ED is not entirely explained by reduced emotional awareness but often stems from an intense fear of dealing with emotions perceived as overwhelming to the self. Conversely, it is also plausible that individuals with high DIF scores are able to express emotions, as previously reported in AN (Torres, Guerra, Lencastre, Roma-Torres et al., 2011). In these cases, emotional expression relies on the differentiation between positive and negative emotions, without distinguishing the experiential dimension of each emotion.
This perspective is particularly relevant for understanding how emotions can be expressed by individuals who typically present poor interoceptive awareness, i.e., an impaired perception of both emotional states and sensations of hunger and satiety (Myers & Crowther, 2008).

With respect to the second question, whether clinical data exhibit the EOT dimension, we conclude that EOT is a salient component, as the CFA results clearly indicate the relative superiority of the three-factor structure. Regarding the third question, our study provides evidence that the well-documented problems with the structure and internal consistency of EOT might be partly due to the poor performance of some items. In our sample, item 16 (“I prefer to watch ‘light’ entertainment shows rather than psychological dramas”) yielded a very low parameter estimate, consistent with findings from other studies (Ling, Zeng, Yuan, & Zhong, 2016; Müller et al., 2003; Parker, Shaughnessy, Wood, Majeski, & Eastabrook, 2005; Simonsson-Sarnecki et al., 2000; Zhu et al., 2007). The same applies to item 20 (“Looking for hidden meanings in movies or plays distracts from their enjoyment”), whose low contribution to the EOT factor is consistent with previous research (Koch et al., 2015; Müller et al., 2003; Zech, Luminet, Rimé, & Wagner, 1999). For item 20, a low factor loading was also found in the Portuguese validation study of the scale with a non-clinical sample (Prazeres et al., 2000).

In practice, low factor loadings suggest that items are poorly related to the underlying construct. This can happen if item content does not represent the construct or if its interpretation is unclear. Items 16 and 20 may be problematic because of contextual complexity and polysemy (Kooiman et al., 2002; Moriguchi et al., 2007), particularly in item 20, and because of unfamiliar terms (Parker, Eastabrook, Keefer, & Wood, 2010), such as “psychological dramas” in item 16. We therefore believe that measurement of the EOT factor could be improved by revising items that have repeatedly shown weak loadings across studies, as suggested by Meganck et al. (2008). We recommend that these items be rewritten in simpler and clearer language to increase item comprehension. At a syntactic level, we suggest shorter sentences that neither require weighing two alternatives nor rely on double entendre. We believe these changes would benefit not only the use of the TAS-20 in AN but also its use in other settings, since the problems we observed with this third factor have been repeatedly reported elsewhere.

Rewording alone might not be enough to improve the quality of the factor. According to our results, even after the problematic items were removed, the internal consistency of the EOT factor remained below the optimal level (α = .49). This finding is consistent with other Portuguese validation studies, in which only one of five samples achieved an acceptable value (Prazeres et al., 2000; Veríssimo, 2001). This reliability problem is so widespread in the literature that it calls attention to the characteristics of the instrument itself. One of the most frequently cited criticisms of the TAS-20 is the disproportionate number of negatively keyed items designed to load onto the EOT factor (four out of five). Although negatively worded items help to avoid response-set bias, they increase complexity and thus make the scale vulnerable to response tendencies (Loas et al., 2001). This issue may be more salient with alexithymic respondents, who may have decreased mental flexibility (Kojima, Frasure-Smith, & Lespérance, 2001; Richards et al., 2005), and is therefore particularly pertinent for AN samples, in which a high prevalence of alexithymia is expected. Preliminary data support the presence of a method effect induced by these items (Meganck et al., 2008). However, much more research is required to elucidate the nature of wording effects associated with the measure. As has been done with other psychological measures, such as the Rosenberg Self-Esteem Scale, future studies should determine whether this measurement procedure contributes variance to scores beyond that attributable to the construct (Salerno, Ingoglia, & Lo Coco, 2017). It would also be pertinent to examine the stability of method effects over time using a longitudinal design (Marsh, Scalas, & Nagengast, 2010).
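The concentration of negatively keyed items in EOT can be made concrete with a small scoring sketch. It assumes the standard published TAS-20 scoring key (ratings from 1 to 5; items 4, 5, 10, 18, and 19 reverse-keyed; DIF = items 1, 3, 6, 7, 9, 13, 14; DDF = items 2, 4, 11, 12, 17; EOT = items 5, 8, 10, 15, 16, 18, 19, 20), and all function names are hypothetical. Counting reverse-keyed items per subscale reproduces the four-of-five pattern, and the optional `drop` argument shows how subscale sums change when items 16 and 20 are excluded, as in the final Model III.

```python
# Assumed standard TAS-20 scoring key (1-5 ratings); names are hypothetical.
REVERSE_KEYED = {4, 5, 10, 18, 19}
SUBSCALES = {
    "DIF": [1, 3, 6, 7, 9, 13, 14],
    "DDF": [2, 4, 11, 12, 17],
    "EOT": [5, 8, 10, 15, 16, 18, 19, 20],
}


def score_item(item, raw):
    """Return the keyed score for one item; reverse-keyed items map 1..5 to 5..1."""
    if not 1 <= raw <= 5:
        raise ValueError(f"item {item}: rating {raw} is outside 1-5")
    return 6 - raw if item in REVERSE_KEYED else raw


def subscale_scores(responses, drop=()):
    """Sum keyed scores per subscale; `responses` maps item number -> raw rating.

    Items listed in `drop` (e.g. (16, 20)) are excluded from every subscale.
    """
    return {
        name: sum(score_item(i, responses[i]) for i in items if i not in drop)
        for name, items in SUBSCALES.items()
    }


def reverse_keyed_per_subscale():
    """Count how many reverse-keyed items fall in each subscale."""
    return {name: sum(i in REVERSE_KEYED for i in items) for name, items in SUBSCALES.items()}


if __name__ == "__main__":
    print(reverse_keyed_per_subscale())  # {'DIF': 0, 'DDF': 1, 'EOT': 4}
    all_ones = {i: 1 for i in range(1, 21)}
    print(subscale_scores(all_ones))                 # {'DIF': 7, 'DDF': 9, 'EOT': 24}
    print(subscale_scores(all_ones, drop=(16, 20)))  # {'DIF': 7, 'DDF': 9, 'EOT': 22}
```

A respondent who answers "1" to every item (an acquiescence-free but extreme pattern) still obtains a high EOT sum, which illustrates how mixed keying makes EOT scores sensitive to response tendencies.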

Two additional explanations may account for the low reliability of the EOT subscale. The first relates to the inclusion of young participants in the sample. Although the standard three-factor model has been used with adolescents in previous studies, the reliability of the EOT subscale in particular has been found to be lower at younger ages (for review see Loas et al., 2017). Our findings corroborate this, as internal consistency was lower in the adolescent group (α = .39). Nevertheless, this argument can only partially explain the lack of item homogeneity in the EOT subscale, as adults still presented a suboptimal level of reliability (α = .55). The second explanation relates to the nature of the EOT construct. This factor concerns the tendency to leave emotions unanalyzed, and this avoidant attitude might be difficult to capture when it is not entirely conscious.

We consider, however, that the low internal consistency of the EOT factor does not preclude the validity of the three-factor model of the TAS-20 in our sample. It should be noted that “the factor structure of a set of indicators and the internal consistency of the factor scales consisting of the summation of a set of indicators within those factors are distinct (albeit related) issues” (Tsaousis et al., 2010, p. 447). In practice, this means that alexithymia in AN can be reliably assessed by means of self-report. Results from the factorial validity analyses provide some evidence that participants are able to reflect on their emotions. This perspective converges with previous research documenting that meta-emotional abilities are preserved in AN despite high levels of alexithymia (Torres, Guerra, Lencastre, Roma-Torres et al., 2011). However, some caution should be taken when applying the EOT scale. We recommend that investigation of the different facets of the construct be supported by other assessment tools, such as the Toronto Structured Interview for Alexithymia (TSIA), which has been found to be more sensitive in detecting EOT in adolescent patients with AN (Balottin, Nacinovich, Bomba, & Mannarini, 2014).

Limitations of the present study relate primarily to the sample. A sample size of 125 can be considered small by the standards applied to CFA in general populations. However, given the prevalence rate of AN, the sample is relatively large. Although we recognize that a larger sample could lead to more robust results, research on sample size determination for structural equation modeling, particularly within select clinical populations, has been inconclusive (e.g., MacCallum, Roznowski, & Necowitz, 1992; Tanaka, 1987). Various sample size guidelines have been proposed (e.g., 50 observations per variable, no fewer than 100 observations in total, 5–10 observations per parameter). Hamilton, Gagne, and Hancock (2003) even suggested that sample size may not bias parameter estimates to a substantive degree, and recommended samples of at least 100. Two other sample-related limitations concern the wide age range, which spans different levels of cognitive and emotional maturity, and the impossibility of testing measurement invariance between age groups or between diagnostic subtypes of AN. In terms of diagnosis, it is still undetermined whether the TAS-20 has a transdiagnostic factor structure in ED. Although this was not the focus of this paper, we suggest it be explored in future studies. Similarly, consideration should be given to the possibility that the presence or absence of binge-eating/purging behaviors is a critical variable differentiating ED types in alexithymia. Recent studies of emotion regulation deficits point in this direction (Mallorquí-Bagué et al., 2017; Weinbach, Sher, & Bohon, 2017). Comparing CFA results between AN and non-clinical samples would also be recommended.
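As a rough illustration of how the per-parameter guideline applies, the sketch below counts free parameters for a simple CFA specification. It assumes that each item loads on exactly one factor, residuals are uncorrelated, factors are allowed to correlate, and factor variances are fixed to 1 for identification; these are assumptions, and the counts will differ from the fitted models if those included additional parameters such as correlated residuals. Under these assumptions, an 18-item, three-factor model (Model III without items 16 and 20) has 39 free parameters and 132 degrees of freedom, so N = 125 corresponds to roughly 3.2 observations per parameter. Function names are hypothetical.

```python
def cfa_parameter_count(n_items, n_factors):
    """Free parameters in a simple-structure CFA (assumed specification).

    Assumes one loading and one residual variance per item, uncorrelated
    residuals, correlated factors, and factor variances fixed to 1.
    """
    loadings = n_items
    residual_variances = n_items
    factor_covariances = n_factors * (n_factors - 1) // 2
    return loadings + residual_variances + factor_covariances


def cfa_degrees_of_freedom(n_items, n_factors):
    """Unique observed variances/covariances minus free parameters."""
    observed_moments = n_items * (n_items + 1) // 2
    return observed_moments - cfa_parameter_count(n_items, n_factors)


def observations_per_parameter(n_obs, n_items, n_factors):
    """Sample size divided by the number of free parameters."""
    return n_obs / cfa_parameter_count(n_items, n_factors)


if __name__ == "__main__":
    # 18 items = TAS-20 without items 16 and 20; 3 factors = DIF, DDF, EOT.
    print(cfa_parameter_count(18, 3))                        # 39
    print(cfa_degrees_of_freedom(18, 3))                     # 132
    print(round(observations_per_parameter(125, 18, 3), 2))  # 3.21
```

The same functions applied to the full 20-item, three-factor model give 43 free parameters, showing that item removal also slightly eases the per-parameter requirement.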
Because we used a translated version of the TAS-20, such a comparison would make it possible to explore whether the poor fit of the factor structures with the full item set is due to specific characteristics of AN patients or to issues of cross-cultural equivalence in construct operationalization. Given the importance of cross-cultural issues for the alexithymia construct, this point bears directly on the clinical utility and implications of the present findings. Lastly, we note the absence of a more comprehensive study of construct validity, including convergent and discriminant validity assessment, which could have been more informative about the degree to which the EOT subscale, in particular, measures what it purports to measure.

Conclusions

Confirmatory factor analysis was used to determine the number and nature of the factors underlying the TAS-20 in a sample of patients diagnosed with AN according to DSM-5 criteria. The three-factor structure of the TAS-20 was confirmed. Its quality of measurement improves when two items designed to load onto the EOT factor, items 16 and 20, are removed. The internal consistency of EOT was lower than that of the other TAS-20 scales, in accordance with a large body of research. Difficulties in item comprehension may be at the root of these problematic items, not only in AN but also in other clinical and non-clinical samples. Unlike DIF and DDF, EOT is a dimension of alexithymia that can be problematic when respondents are young.