Introduction

Listed under “Other Specified Feeding or Eating Disorder” in DSM-5 [1], night eating syndrome (NES) is characterized by evening hyperphagia, nocturnal ingestions, and sleep and mood disturbances [2]. NES is prevalent in clinical populations (e.g., 12.3% in a psychiatric population [3]) and among college students (e.g., 9.5% among Turkish college students [4] and 2.8% among Chinese college students [5]). Moreover, NES is often associated with negative physical and psychological consequences, such as obesity [6], psychological distress [7], and functional impairment [8].

As the most widely used self-report instrument for screening NES [9, 10], the Night Eating Questionnaire (NEQ) covers the core behaviors and symptoms associated with NES [11] and provides a convenient global score of NES severity [12]. The NEQ consists of 14 items on NES symptoms, rated on a 5-point Likert scale, which form four factors: (1) nocturnal ingestions (i.e., waking at night to eat), (2) evening hyperphagia (i.e., overeating at night), (3) morning anorexia (i.e., not eating anything all morning), and (4) mood/sleep (i.e., mood and sleep disturbances). In the original development study of the English NEQ [13], the questionnaire showed acceptable reliability (Cronbach’s alpha = 0.70), good construct validity (e.g., a moderate correlation with the EDE global score, r = .37), and good discriminant validity (e.g., patients without NES scored significantly lower than night eaters); the NEQ was therefore considered an efficient and valid measure of NES severity.

To date, the English NEQ has been translated and validated in many languages, including Arabic (Egypt) [14], Hebrew (Israel) [15], Italian (Italy) [16], German (Germany) [9], Korean (Korea) [17], Portuguese (Brazil), Spanish (Spain) [18], Traditional Chinese (Taiwan) [10], and Turkish (Turkey) [19]. Findings from these validation studies indicate that the NEQ is a suitable tool for assessing NES across many different countries [9, 10, 15, 16, 18, 20].

Tu et al. [10] translated the English NEQ into traditional Chinese (C-NEQ) and found that the C-NEQ had strong reliability and validity in a general Taiwanese sample: Cronbach’s alpha was 0.82, 2-week test–retest reliability was 0.85, and the C-NEQ showed moderate-to-high correlations (rs = .37–.54) with eating disturbances measured by the Eating Disorder Examination Questionnaire (EDE-Q) [21].

However, the social, linguistic, and cultural contexts of mainland China and Taiwan differ substantially [22]. For example, traditional characters are used in Taiwan, Hong Kong, and Macau, whereas simplified characters are used in mainland China. Thus, psychological or behavioral instruments validated in traditional Chinese in Taiwan should not be used directly in mainland China without conversion into simplified Chinese, and, because of these social, linguistic, and cultural differences, the resulting simplified Chinese versions should be validated separately [23,24,25].

Moreover, previous studies have found gender differences in night eating (e.g., males being more likely than females to engage in night eating) [7, 26, 27]. It is therefore important to examine whether gender differences in item responses reflect true differences in the measured trait or merely result from males and females ascribing different meanings to the questionnaire items [28].

Accordingly, the present study aimed to examine the psychometric properties of the simplified Chinese version of the NEQ and to test its gender invariance in a large sample of college students in mainland China.

Method

Participants and procedure

Initially, 1462 university students from three provinces in mainland China (Hunan in central China, Shandong in northern China, and Shaanxi in western China) were invited to participate in this survey. Of these, 1237 (a response rate of 84.61%) agreed to participate and completed the consent form attached to the paper-and-pencil questionnaires.

Measures

Simplified Chinese version of Night Eating Questionnaire (SC-NEQ)

Developed by Allison et al. [13], the NEQ consists of 14 items rated on a 5-point Likert scale that assess four correlated night eating constructs (morning anorexia, evening hyperphagia, mood/sleep, and nocturnal ingestions), with the total score representing NES severity [13]. The English NEQ was recently translated into traditional Chinese (C-NEQ) [10]. The simplified Chinese version was derived from the C-NEQ through language conversion performed by two Chinese Ph.D. students, one majoring in clinical psychology and the other in Chinese linguistics. Beyond the differences between traditional and simplified characters, they also considered other differences in language usage between mainland China and Taiwan. The resulting preliminary version of the simplified Chinese NEQ (SC-NEQ) was then reviewed by an expert in eating behaviors, and after minor revisions the SC-NEQ was finalized for use in the current study.

Chinese version of Eating Disorder Inventory (C-EDI)

The EDI [29] consists of 64 items in eight subscales, rated on a 6-point Likert-type scale ranging from 1 (‘Never’) to 6 (‘Always’). Of the eight subscales, the current study used Bulimia, Body Dissatisfaction, and Drive for Thinness, because, according to Garner et al. [29], these three subscales measure eating disturbances, whereas the others are general psychological scales that are relevant to, but not specific to, eating disorders. The Chinese version of the EDI was previously translated and validated and showed good reliability and validity in a sample of Chinese college students [30]. In the current sample, Cronbach’s alpha was 0.86, 0.87, and 0.90 for the Bulimia, Body Dissatisfaction, and Drive for Thinness subscales, respectively.

Data analysis

The factor structure was explored with principal component analysis (PCA) in a randomly selected subsample of 600 college students, and the remaining participants (n = 637) were used for a confirmatory factor analysis (CFA) to validate the factor structure derived from the PCA.
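The random split into an exploratory and a confirmatory subsample can be sketched as follows. This is a minimal Python illustration, not the authors' procedure: the function name `split_sample`, the use of sequential participant IDs, and the fixed seed are assumptions added here for reproducibility.

```python
import random


def split_sample(participant_ids, n_exploratory=600, seed=2024):
    """Randomly assign participants to an exploratory (PCA) subsample of
    size n_exploratory and a confirmatory (CFA) subsample with the rest."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    ids = list(participant_ids)
    exploratory = set(rng.sample(ids, n_exploratory))
    confirmatory = [pid for pid in ids if pid not in exploratory]
    return sorted(exploratory), confirmatory


# Hypothetical sequential IDs for the 1237 participants.
pca_ids, cfa_ids = split_sample(range(1, 1238))
print(len(pca_ids), len(cfa_ids))  # 600 637
```

Keeping the two subsamples disjoint is what allows the CFA to act as an independent check on the structure found by the PCA.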

For the PCA, Promax rotation was used, and both the eigenvalue-greater-than-one rule [31] and parallel analysis [32] were used to determine the number of factors to extract. Items with loadings of 0.30 or greater were considered to load on a factor.

The data for items 10–14 were clearly non-normally distributed (see Table 2) because of the questionnaire's two stop criteria, which produced many zeros in the second half of the questionnaire: respondents who chose 0 for item 9 did not need to answer items 10–14, and respondents who chose 0 for item 12 did not need to answer items 13 and 14. To account for this violation of normality, and following previous literature [33,34,35,36], we used the weighted least squares mean and variance adjusted (WLSMV) estimator for the CFA rather than the commonly used maximum likelihood (ML) estimator, which assumes normality. Model fit was evaluated with the comparative fit index (CFI), the Tucker–Lewis index (TLI), and the root mean square error of approximation (RMSEA); the weighted root mean square residual (WRMR) was not reported because it is not accurate with very large sample sizes [37]. Good fit was defined as RMSEA below 0.08 and CFI and TLI of 0.95 or higher [38].
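The way the two stop criteria generate runs of zeros can be illustrated with a small recoding sketch. It assumes, as the description of the many zeros implies, that skipped items are scored 0; the function `apply_stop_criteria` and the dictionary layout are hypothetical.

```python
def apply_stop_criteria(responses):
    """Fill skipped SC-NEQ items with 0 according to the two stop criteria.

    `responses` maps item number (1-14) to a score from 0 to 4, or to None
    when the respondent skipped the item. This dict layout is hypothetical.
    """
    recoded = dict(responses)

    def fill_zero(items):
        for item in items:
            if recoded.get(item) is None:  # only fill genuinely skipped items
                recoded[item] = 0

    if recoded.get(9) == 0:   # item 9 == 0: items 10-14 are not asked
        fill_zero(range(10, 15))
    if recoded.get(12) == 0:  # item 12 == 0: items 13-14 are not asked
        fill_zero((13, 14))
    return recoded


# Hypothetical respondent who answered 0 on item 9 and skipped items 10-14.
example = {item: 1 for item in range(1, 10)}
example[9] = 0
example.update({item: None for item in range(10, 15)})
print(apply_stop_criteria(example))
```

Because every respondent who scores 0 on item 9 contributes five structural zeros, items 10–14 pile up at the floor of the scale, which is the non-normality that motivated the WLSMV estimator.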

Reliability was examined with both Cronbach’s alpha [39] and omega [40], with values of 0.70 or higher indicating good internal consistency. Although Cronbach’s alpha is the most widely used indicator of internal consistency, numerous psychometric deficiencies have been documented in the literature, and omega overcomes some of the fundamental problems inherent in the calculation of alpha [41]. We therefore report both coefficients to describe the reliability of the SC-NEQ more fully.
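For readers less familiar with the two coefficients, the sketch below computes Cronbach's alpha from raw item scores, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), and omega from standardized single-factor loadings, omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances). It is an illustrative one-factor version only: the psych package [44] used in this study provides several omega variants, and the text does not specify which one was used, so this sketch is not meant to reproduce the reported values. Function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores for the same
    n respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # scale total per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def omega_one_factor(loadings, error_variances):
    """Omega for a single-factor model with standardized loadings."""
    common = sum(loadings) ** 2
    return common / (common + sum(error_variances))


# Three hypothetical items with standardized loadings of 0.7 (error variance 0.51).
print(round(omega_one_factor([0.7] * 3, [0.51] * 3), 3))  # 0.742
```

Unlike alpha, omega does not assume that all items load equally on the common factor, which is one of the problems with alpha that omega addresses.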

Measurement invariance across gender was then examined by comparing configural, metric, and scalar invariance models in sequence, using ΔCFI (a value less than 0.010 indicates invariance) [42] and ΔRMSEA (a value less than 0.015 indicates invariance) [43]. The traditional indicator of invariance, Δχ², was not reported because it has been found to be overly sensitive to minor model differences in large samples [42]. All data analyses were carried out with SPSS 23.0 (IBM SPSS, Inc., Chicago, IL, USA), Mplus 7.4 (Muthén & Muthén, Los Angeles, CA, USA), and the R package psych [44].
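The sequential decision rule can be written as a small function. This sketch assumes that ΔCFI is the decrease in CFI and ΔRMSEA the increase in RMSEA when moving to the more constrained model; the `Fit` container and function names are hypothetical, and the fit values in the usage example are invented for illustration, not taken from Table 6.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    """Fit indices of one invariance model (hypothetical container)."""
    name: str
    cfi: float
    rmsea: float


def step_supported(less_constrained, more_constrained,
                   max_cfi_drop=0.010, max_rmsea_rise=0.015):
    """Invariance holds for a step if CFI drops by less than 0.010 and
    RMSEA rises by less than 0.015 (cut-offs from the text)."""
    cfi_drop = less_constrained.cfi - more_constrained.cfi
    rmsea_rise = more_constrained.rmsea - less_constrained.rmsea
    return cfi_drop < max_cfi_drop and rmsea_rise < max_rmsea_rise


def highest_supported_level(models):
    """models are ordered configural -> metric -> scalar; return the name of
    the most constrained model reached before a step fails."""
    supported = models[0].name
    for previous, current in zip(models, models[1:]):
        if not step_supported(previous, current):
            break
        supported = current.name
    return supported


# Invented fit values for illustration only.
models = [Fit("configural", 0.960, 0.060),
          Fit("metric", 0.955, 0.062),
          Fit("scalar", 0.950, 0.063)]
print(highest_supported_level(models))  # scalar
```

Stopping at the first failed step reflects the sequential logic: scalar invariance is only meaningful once metric invariance holds.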

Results

Descriptive analysis

Demographic information (gender, age, ethnicity, and self-reported height and weight) was collected from all 1237 participants. Of these, 54.2% were female, 84.2% were of Han ethnicity (vs. ethnic minorities), 31.4% were younger than 20 years (the median age), and 11.3% were overweight (BMI greater than 24 kg/m² [45]). The demographic characteristics of the sample are presented in Table 1.
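BMI is weight in kilograms divided by the square of height in meters. The sketch below applies the overweight threshold exactly as worded in the text (BMI greater than 24 kg/m²); the function names and the assumption that height is recorded in meters are illustrative.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_overweight(weight_kg, height_m, cutoff=24.0):
    """Apply the threshold as worded in the text: BMI greater than 24."""
    return bmi(weight_kg, height_m) > cutoff


print(round(bmi(70, 1.70), 2))   # 24.22
print(is_overweight(70, 1.70))   # True
print(is_overweight(60, 1.70))   # False
```

Because height and weight were self-reported, the computed BMI inherits any reporting bias in those two values.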

Table 1 Demographic characteristics of the total sample (n = 1237)

Descriptive statistics for each SC-NEQ item are presented in Table 2. Independent-samples t tests were then used to explore potential gender differences on each item and on the SC-NEQ total score; t tests were chosen because the normality assumption is not critical for two-group comparisons [46]. There were statistically significant gender differences on eight items (items 2, 5, 8, 9, 10, 11, 12, and 14) and on the SC-NEQ total score.
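The text does not state whether Student's or Welch's t test was used. As one common option when group sizes or variances differ, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library; obtaining a p-value would additionally require the t distribution (e.g., from SciPy), which is omitted here. The function name and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    se2_a = variance(group_a) / n_a  # squared standard error of mean A
    se2_b = variance(group_b) / n_b  # squared standard error of mean B
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Toy item scores (0-4) for two hypothetical groups.
t, df = welch_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), round(df, 1))  # -3.674 4.0
```

With equal variances and equal group sizes, as in the toy example, Welch's df reduces to the Student df of n_a + n_b - 2.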

Table 2 Distributions of SC-NEQ scores by each item

Principal component analysis

Bartlett’s test of sphericity was statistically significant (χ² = 2195.28, df = 78, p < .01) and the Kaiser–Meyer–Olkin value was 0.80, indicating that the 13 SC-NEQ items were suitable for factor analysis. PCA with Promax rotation revealed a four-factor structure explaining 60.37% of the total variance, and the four-factor solution was also supported by parallel analysis (Appendix A). The eigenvalues of the four extracted factors were 3.77, 1.77, 1.30, and 1.01 for nocturnal ingestions, mood/sleep, evening hyperphagia, and morning anorexia, respectively; the eigenvalue of the fifth factor, the first not extracted, was 0.97. The factor loadings and the correlations among the four factors are shown in Table 3.
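The retention rules can be checked against the reported numbers. The sketch applies the Kaiser criterion to the reported eigenvalues and computes the percentage of variance explained by the four retained components out of 13 (the total variance of a 13-item correlation matrix); the result, about 60.38%, is consistent with the reported 60.37% once rounding of the eigenvalues is taken into account. It also includes a minimal NumPy version of Horn's parallel analysis on PCA eigenvalues. The variant is an assumption: the 95th-percentile threshold, normally distributed random data, and the function names are illustrative choices and not necessarily those behind Appendix A, which was produced with SPSS and R rather than Python.

```python
import numpy as np


def kaiser_count(eigenvalues):
    """Number of components with eigenvalue > 1 (Kaiser criterion)."""
    return int(sum(ev > 1.0 for ev in eigenvalues))


def percent_variance(eigenvalues, n_items):
    """Percent of total variance (n_items for a correlation matrix)
    explained by the given eigenvalues."""
    return 100.0 * sum(eigenvalues) / n_items


def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
    """Horn's parallel analysis on PCA eigenvalues.

    data: (n_respondents, n_items) array. Components are retained, in order,
    while the observed eigenvalue exceeds the chosen percentile of eigenvalues
    from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    simulated = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.standard_normal((n, p))
        simulated[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)
    retained = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        retained += 1
    return retained, observed, threshold


# Reported eigenvalues: the four retained components and the first one not retained.
reported = [3.77, 1.77, 1.30, 1.01, 0.97]
print(kaiser_count(reported))                        # 4
print(round(percent_variance(reported[:4], 13), 2))  # 60.38
```

Note how close the fourth (1.01) and fifth (0.97) eigenvalues are: under the Kaiser rule alone the four-factor decision rests on a small margin, which is why the additional support from parallel analysis matters.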

Table 3 Rotated component matrix of PCA using Promax and factor correlation matrix (n = 600)

Confirmatory factor analysis

The CFA showed acceptable model fit (CFI = 0.96, TLI = 0.95, RMSEA = 0.09, 90% CI 0.09–0.10; probability RMSEA less than 0.05 = 0), indicating that the four-factor model derived from the PCA fitted the data acceptably in an independent subsample. For comparison, a one-factor model was also tested and showed unsatisfactory fit (CFI = 0.91, TLI = 0.89, RMSEA = 0.14, 90% CI 0.13–0.14; probability RMSEA less than 0.05 = 0), which further supports the four-factor structure derived from the PCA. To assess whether a total SC-NEQ score is appropriate, a second-order factor model was also examined and showed acceptable fit (CFI = 0.96, TLI = 0.95, RMSEA = 0.09, 90% CI 0.08–0.10; probability RMSEA less than 0.05 = 0). The factor loadings of the first-order and second-order models and the correlations among the factors are shown in Table 4.
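The fit cut-offs can be applied mechanically to the reported CFA indices. This sketch treats 0.95 as an inclusive lower bound for CFI and TLI and 0.08 as an exclusive upper bound for RMSEA; `check_fit` is a hypothetical helper. It makes explicit that the four-factor model meets the CFI and TLI criteria but not the RMSEA criterion, a discrepancy the Discussion attributes to non-normality.

```python
def check_fit(cfi, tli, rmsea, cfi_min=0.95, tli_min=0.95, rmsea_max=0.08):
    """Return which fit criteria a model meets (cut-offs from the text)."""
    return {
        "CFI": cfi >= cfi_min,
        "TLI": tli >= tli_min,
        "RMSEA": rmsea < rmsea_max,
    }


# Values reported for the CFA models.
print(check_fit(cfi=0.96, tli=0.95, rmsea=0.09))  # four-factor
print(check_fit(cfi=0.91, tli=0.89, rmsea=0.14))  # one-factor
```

Whether 0.95 is treated as inclusive matters here: the reported TLI of 0.95 sits exactly on the cut-off.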

Table 4 Factor loadings of CFA and factor correlation matrix (n = 637)

Reliability

For the whole scale, Cronbach’s alpha and omega were 0.70 and 0.83, respectively. For the four subscales (morning anorexia, evening hyperphagia, mood/sleep, and nocturnal ingestions, respectively), Cronbach’s alpha was 0.19, 0.41, 0.60, and 0.88, and omega was 0.30, 0.44, 0.69, and 0.91.

Correlations with BMI and three subscales of EDI

As shown in Table 5, SC-NEQ total scores were positively and significantly correlated with all four subscales: morning anorexia (r = .34, p < .01), evening hyperphagia (r = .55, p < .01), mood/sleep (r = .67, p < .01), and nocturnal ingestions (r = .59, p < .01). The correlation between the SC-NEQ and BMI was close to zero, whereas SC-NEQ total scores were positively and significantly correlated with EDI-Bulimia (r = .35, p < .01), EDI-Body Dissatisfaction (r = .09, p < .01), and EDI-Drive for Thinness (r = .16, p < .01).

Table 5 Correlations between night eating and BMI and eating disturbances (n = 1237)

Gender invariance tests

To test gender invariance, the correlated four-factor model was first fitted separately for males and females, and it fitted the data acceptably in both groups. Configural, metric, and scalar invariance models were then tested in sequence. Both the ΔCFI and ΔRMSEA between the configural and metric models and those between the metric and scalar models were below the cut-off values (0.010 for ΔCFI and 0.015 for ΔRMSEA), supporting both weak (metric) and strong (scalar) invariance across gender. The results of the gender invariance tests are shown in Table 6.

Table 6 Fit indices of gender invariance tests

Discussion

This study transformed the traditional Chinese version of NEQ (C-NEQ) into simplified Chinese (SC-NEQ), and empirically examined the psychometric properties and gender invariance of the SC-NEQ in a large sample of Chinese college students. The results showed that the SC-NEQ had generally good psychometric properties and it was also strongly invariant across gender groups.

In the current study, PCA yielded a four-factor structure that replicated those of the original English NEQ [13] and the traditional Chinese NEQ [10]. However, three items (items 1, 2, and 8) loaded on two factors. Cross-loadings are not unusual for the NEQ; they were also reported in previous validation studies [10, 13, 15, 18, 19]. Moreover, each of these three items loaded most strongly on its intended factor, and its secondary loadings were below 0.40 (one of the widely used cut-off values for factor loadings). These cross-loadings therefore do not undermine the structural similarity between the SC-NEQ and the original English NEQ. In addition, although the use of PCA to explore the factor structure of the SC-NEQ is in line with previous validation studies of the NEQ [9, 10, 13, 15, 18], PCA is not a true factor-analytic method [47]; future studies may therefore explore the factor structure with common factor methods (e.g., principal axis factoring) and test alternative factor structures of the NEQ derived from such methods.
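The two criteria used to judge the cross-loading items (the largest loading falls on the intended factor, and the secondary loadings stay below 0.40) can be expressed as a small check. Because the Table 3 loadings are not reproduced here, the loadings in the usage example are hypothetical, as are the function name and the item-to-factor mapping format; salient loadings are defined with the 0.30 threshold used for the PCA.

```python
def review_cross_loadings(loadings, intended, salient_min=0.30, secondary_max=0.40):
    """Check cross-loading items against the two criteria in the text.

    loadings: {item: [loading on factor 0, factor 1, ...]}
    intended: {item: index of the factor the item is meant to measure}
    Returns a report only for items with salient loadings (|loading| >=
    salient_min) on two or more factors.
    """
    report = {}
    for item, row in loadings.items():
        salient = [j for j, value in enumerate(row) if abs(value) >= salient_min]
        if len(salient) < 2:
            continue  # not a cross-loading item
        primary = max(range(len(row)), key=lambda j: abs(row[j]))
        secondary = [abs(row[j]) for j in salient if j != primary]
        report[item] = {
            "primary_is_intended": primary == intended[item],
            "secondary_below_cutoff": all(v < secondary_max for v in secondary),
        }
    return report


# Hypothetical loadings on four factors (not the values in Table 3).
example_loadings = {
    "item1": [0.65, 0.32, 0.05, 0.10],
    "item3": [0.70, 0.10, 0.05, 0.02],
}
print(review_cross_loadings(example_loadings, {"item1": 0, "item3": 0}))
```

An item that passes both checks still contributes to its intended factor, which is the basis for retaining it in the four-factor structure.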

Regarding the CFA, the CFI and TLI values met or exceeded the suggested cut-off of 0.95 [38], whereas the RMSEA values slightly exceeded the recommended maximum of 0.08 [38]. A previous simulation study of RMSEA performance with non-normally distributed data [48] found that RMSEA was inflated by non-normality, whereas CFI and TLI are less affected by non-normality [49]. Because of the NEQ's stop criteria, the SC-NEQ item data in the current study were non-normally distributed (see Table 2), which may have inflated the RMSEA values above 0.08. Given that CFI and TLI are more robust to non-normality than RMSEA [49], we believe the CFA models fitted the data adequately.

For the SC-NEQ as a whole, Cronbach’s alpha and omega were 0.70 and 0.83, respectively, both at or above the 0.70 threshold, indicating good reliability of the overall scale. Among the subscales, reliability was acceptable (0.60 or higher) for mood/sleep and nocturnal ingestions but unacceptable for morning anorexia and evening hyperphagia, whose alpha and omega values ranged from 0.19 to 0.44, with morning anorexia showing the lowest reliability. These findings are in line with previous studies reporting good reliability for the overall scale but not for certain subscales (e.g., morning anorexia) [9, 13, 17]. Therefore, in line with the recommendation of the authors who validated the Arabic NEQ [14], we recommend using the SC-NEQ total score rather than using the low-reliability subscales as standalone measures.

Consistent with previous studies [9, 16], morning anorexia was nearly unrelated to the other subscales in the current study (r ranging from .01 to .10) and showed the smallest correlation with the SC-NEQ total score. This finding supports the suggestion that morning anorexia contributes little to the construct of NES [50] and may be a descriptor rather than a core feature of NES [51].

Moreover, the correlation between BMI and SC-NEQ scores was close to zero, indicating almost no association between night eating and BMI in this sample. Previous findings are mixed: some studies found a significant positive association between NES and BMI [4, 26, 52], several others reported no relationship [16, 53, 54], and one study even reported that individuals with a history of being underweight were more likely to have NES [2]. The association, or lack thereof, between NES and BMI therefore remains unclear. From a different perspective, the negligible correlation between BMI and the SC-NEQ could be construed as evidence of discriminant validity (i.e., the absence of a relationship between two unrelated constructs).

Furthermore, the moderate correlation between the SC-NEQ and EDI-Bulimia (r = .35) is consistent with previous literature [8, 18]. For instance, Allison et al. [13] reported a correlation of 0.37 between NEQ total scores and the EDE global score, and Meule et al. [9] reported a similar correlation of 0.38 between NEQ scores and the EDE-Q total score. This moderate correlation therefore supports the construct validity of the SC-NEQ.

Some previous studies [9, 10] reported moderate-to-high correlations (0.36–0.54) between the NEQ and weight/shape concerns measured by the EDE-Q. In the current study, by contrast, only weak correlations were found between the SC-NEQ and EDI-Body Dissatisfaction (r = .09) and EDI-Drive for Thinness (r = .16). This inconsistency may be partly due to the use of the EDI, rather than the EDE-Q, to measure weight/shape-related concerns in the current study.

Furthermore, to the best of our knowledge, the current study is the first to test the gender invariance of the NEQ in a community sample, and the results showed that the SC-NEQ was strongly invariant across gender. This finding indicates that the SC-NEQ measures the same construct (i.e., night eating) in males and females, so the gender differences in night eating observed in the current study are unlikely to be artifacts of the measurement properties of the SC-NEQ. Future studies may use the SC-NEQ to explore which factors contribute to gender differences in NES among college students in mainland China.

Limitations

The current study had several limitations. First, the test–retest reliability of the SC-NEQ was not examined; future studies are therefore strongly encouraged to assess it. Second, the non-normality of the data may have affected the values of model fit indices (e.g., RMSEA); researchers seeking to further validate the SC-NEQ may consider data transformation or other methods to minimize the influence of non-normality on fit indices. Third, the sample consisted only of college students from mainland China, which limits the generalizability of the findings to other populations; future studies should replicate the current study in other populations in mainland China (e.g., patients with eating disorders). Fourth, the construct validity of the SC-NEQ was tested only against the C-EDI; future studies are encouraged to use other measures, such as a measure of night eating episodes, for a more comprehensive assessment of construct validity. Finally, Brown [36] recommended that group sizes in measurement invariance tests be as balanced as possible, because the results of invariance tests can be strongly affected by unequal group sizes. Given the markedly unequal group sizes for ethnicity, age, and BMI, we did not test measurement invariance across these demographic variables; future studies should examine whether the SC-NEQ is also invariant across them.

Conclusion

In conclusion, the simplified Chinese version of the NEQ (SC-NEQ) demonstrated adequate psychometric properties and strong measurement invariance across gender. These results suggest that the SC-NEQ is a useful instrument for assessing night eating among college students in mainland China.