Introduction

Self-determination theory (SDT; Deci and Ryan 1985, 2000, 2002) is one of the most influential motivational theories in the field of educational psychology. Although numerous studies have been conducted to investigate the motivation to learn a second language (L2) using the SDT framework (e.g., Hiromori 2006; Noels et al. 2000; Pae 2008; Tanaka 2013; Vandergrift 2005), there is no research that examines the motivation for vocabulary learning when studying English as a foreign language (EFL) from this perspective. As there is no SDT questionnaire focusing on EFL vocabulary learning motivation, this study aims to develop and evaluate an SDT questionnaire for EFL vocabulary learning using Rasch analysis.

Self-determination Theory

SDT (Deci and Ryan 2002) categorizes motivation into three broad categories: intrinsic motivation, extrinsic motivation, and amotivation. Intrinsic motivation refers to motivation to engage in an activity for the sake of one’s own enjoyment. Extrinsic motivation refers to motivation driven by external rewards. Amotivation is the absence of motivation. Extrinsic motivation is further classified into four types of regulation, three of which (identified, introjected, and external regulation) have been employed in empirical studies in the L2 motivation literature (e.g., Noels et al. 2000; Tanaka 2013). Identified regulation is a state regulated by the importance and value of learning. For example, learners with identified regulation study English vocabulary because they believe that English vocabulary is useful or important for accomplishing their life goals. Introjected regulation concerns the maintenance of a person’s self-worth. For instance, students study English vocabulary because they do not want their classmates to think that they are poor at English or slow in acquiring English vocabulary. External regulation is a state regulated by rewards or punishments. For example, learners with external regulation study English vocabulary because they want to get course credits, grades, or high test scores. According to SDT, these five types of motivation and regulation are ordered from intrinsic motivation to amotivation on a continuum (Fig. 1) and have a simplex-like structure. Theoretically, adjacent regulations on the continuum should be correlated more highly than regulations situated further apart.

Fig. 1

The self-determination continuum (Ryan and Deci 2000, p. 72)

Research Purposes

As discussed above, there is no questionnaire for EFL vocabulary learning motivation using the SDT framework. This study therefore aims to develop and evaluate an SDT questionnaire for EFL vocabulary learning using Rasch analysis. Specifically, it addresses the following five research questions:

  1. Does each item function properly?

  2. Is each of the five constructs reliable?

  3. Is each of the five constructs unidimensional?

  4. Is the 6-point rating scale psychometrically optimal?

  5. Do the five constructs form the simplex-like structure that SDT postulates?

Method

Creation of a Questionnaire for SDT Vocabulary Learning Motivation

An SDT questionnaire for English vocabulary learning motivation was developed drawing primarily from Tanaka (2013, 2014). The developed questionnaire consists of five constructs (intrinsic motivation, identified regulation, introjected regulation, external regulation, and amotivation for learning English vocabulary), with five items in each construct. The questionnaire is a 6-point Likert scale ranging from 1 (Strongly disagree) to 6 (Strongly agree). See Appendix for the English translation of the questionnaire.

Data and Sample

The data for this study comes from first-year science and engineering students (N = 179; mostly male students aged between 15 and 16) at a public technical college in Japan. They took five English classes per week (45 min per session) and vocabulary lists were assigned as weekly homework over the year. At the time of data collection, the students had completed approximately four years of compulsory English learning at secondary schools (i.e., at junior high school and college). The questionnaire was administered in Japanese in classes around the end of the 2012 academic year.

Data Analysis Procedures

Rasch analyses were performed using Winsteps 3.80.0 as follows. First, a Rasch fit analysis was conducted to examine items and the reliability of each construct measured by the questionnaire. Second, a Rasch principal components analysis (PCA) of item residuals was conducted to examine dimensionality of each construct. Third, the rating scale categories of each construct were assessed and optimized based on Linacre’s (2002) six criteria. In addition to these Rasch analyses, Pearson product-moment correlation coefficients were calculated among the five constructs using SPSS 19.0 to examine the simplex-like structure that SDT postulates.

Results

Items

First, a Rasch fit analysis was conducted to examine items designed to measure each construct in the questionnaire. The criteria for acceptable items were the infit and outfit mean square (MNSQ) statistics of 0.50–1.50 (Linacre 2012, p. 553). Table 1 shows the summary of the Rasch item fit statistics. All infit and outfit MNSQ statistics were between 0.50 and 1.50 (Max. = 1.38, Min. = 0.68 for infit MNSQ statistics; Max. = 1.48, Min. = 0.69 for outfit MNSQ statistics). The point-measure correlations of the items were adequately high (M = 0.82, Max. = 0.91, Min. = 0.75). Taken together, each item functioned properly and adequately contributed to measuring the intended construct.

Table 1 Rasch item fit statistics for the five SDT motivation items
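
The fit criterion used above can be illustrated with a minimal sketch. It assumes the conventional definitions of the two statistics (infit as the information-weighted mean square, outfit as the unweighted mean of squared standardized residuals). The function names and the toy responses are hypothetical and are not taken from the Winsteps output of this study.

```python
def mnsq_fit(observed, expected, variance):
    """Return (infit, outfit) mean-square statistics for one item.

    observed: responses to the item, one per person
    expected: model-expected responses for the same persons
    variance: model variance of each response
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Infit: sum of squared residuals divided by the sum of model variances.
    infit = sum(squared_residuals) / sum(variance)
    # Outfit: plain average of the squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variance)) / len(observed)
    return infit, outfit


def within_fit_range(mnsq, low=0.50, high=1.50):
    """Apply the 0.50-1.50 acceptance range cited in the text."""
    return low <= mnsq <= high


# Hypothetical dichotomous responses, used only to show the call.
infit, outfit = mnsq_fit(
    observed=[1, 0, 1, 1],
    expected=[0.5, 0.5, 0.5, 0.5],
    variance=[0.25, 0.25, 0.25, 0.25],
)
print(infit, outfit, within_fit_range(infit), within_fit_range(outfit))
```

Applied to the extremes reported above (0.68 and 1.38 for infit, 0.69 and 1.48 for outfit), `within_fit_range` returns `True` in every case.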

Reliability and Separation of Measures

The reliability of each construct was examined based on Rasch person reliability and separation estimates and Rasch item reliability and separation estimates. The criteria for person estimates are above 0.80 for reliability and above 2.0 for separation, as person reliability of 0.80 indicates the presence of 2 or 3 statistically distinct levels in the sample (Linacre 2012, p. 574). As shown in Table 2, while three constructs (intrinsic motivation, introjected regulation, and amotivation) satisfied the criteria, two constructs (identified and external regulation) showed person reliability and separation estimates slightly lower than the criteria (reliability: 0.7, separation: 1.81 and 1.82). Consequently, eight misfitting people (5 % of the total of 179 participants) were temporarily eliminated based on the analysis of the most unexpected responses, and the person reliability (separation) estimates were recalculated for identified and external regulation. As a result, person reliability (separation) improved to 0.82 (2.12) for identified regulation and 0.81 (2.05) for external regulation, satisfying the criteria. Given that the elimination of a very small number of misfitting people improved the reliability (separation) estimates, the person reliability of each construct was judged to be adequately high.

Table 2 Reliability and separation of the five constructs

With respect to item estimates, a reliability estimate above 0.90 and a separation above 3.0 are considered ideal values, as this confirms the item difficulty hierarchy (low, medium, and high difficulties) of the instrument (Linacre 2012, p. 575). Most of the item reliability (separation) estimates were very high, being above or very close to the ideal values of 0.90 (3.00). However, one construct (external regulation) showed low item reliability (0.67) and separation (1.47) estimates. When the eight misfitting people were temporarily eliminated, item reliability (separation) improved to 0.77 (1.82). Although these values are still lower than the ideal values, the reliability of 0.77 is not considered very problematic, as it is close to the value of 0.80 at which items are stratified into 2 or 3 levels of difficulty. However, some improvement in reliability and separation is recommended for a revised version. As low item reliability indicates “a narrow range of item measures, or a small sample” (p. 575), a revised version should have more items with a wider difficulty range or should be tested with a larger sample with wider ability variance.
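
Reliability and separation are two expressions of the same quantity, which is why the two criteria move together. The sketch below assumes the standard Rasch relation G = sqrt(R / (1 − R)) and the strata formula (4G + 1) / 3; the function names are illustrative. Because it starts from reliabilities rounded to two decimals, its results can differ slightly from the separations reported above.

```python
import math


def separation_from_reliability(reliability):
    """Separation index G implied by a reliability R (assumes 0 <= R < 1)."""
    return math.sqrt(reliability / (1.0 - reliability))


def reliability_from_separation(separation):
    """Inverse relation: R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


def strata(separation):
    """Number of statistically distinct levels implied by a separation G."""
    return (4.0 * separation + 1.0) / 3.0


# Reliabilities quoted in the text: the 0.80 criterion and the item
# reliability of external regulation after removing misfitting persons.
for label, r in [("criterion", 0.80), ("external regulation, items", 0.77)]:
    g = separation_from_reliability(r)
    print(f"{label}: R = {r:.2f}, G = {g:.2f}, strata = {strata(g):.2f}")
```

Under these assumptions, a reliability of 0.80 corresponds to a separation of 2.0, which is why the two person criteria are stated as a pair.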

Dimensionality

Dimensionality of each construct was examined using the Rasch PCA of item residuals. Construct unidimensionality is assessed in terms of variance explained and variance unexplained by the Rasch measures. Table 3 shows the results of the analysis. Four constructs (intrinsic motivation, identified regulation, introjected regulation, and amotivation) had an adequate amount of variance explained by the Rasch measures, as it exceeded half of the total variance. Concerning unexplained variance, the eigenvalue of the first contrast in the residuals should ideally be less than 2.0 (Linacre 2012, p. 353). In practice, however, the eigenvalue should be less than 3.0, as the strength of at least 3 items (i.e., eigenvalue of 3.0) is necessary to form a secondary dimension (p. 496). As shown in Table 3, all five constructs had eigenvalues less than 3.0 and thus satisfied the practical criterion. However, the unexplained variance in percentages appeared to be somewhat large. In particular, external regulation had a large amount of residual variance in the first contrast (30 %), which was greater than the variance explained by the item difficulties (20.1 %). The total unexplained variance (58.1 %) was also larger than the total variance explained by measures (41.9 %). Given that the total explained variance should ideally be four times larger than the total unexplained variance (p. 496), the residual variance in the construct of external regulation is very large. As such, item loadings were examined to explore a possible secondary dimension.

Table 3 Rasch PCA of item residuals of the five constructs
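
The percentages quoted for external regulation can be related to the eigenvalue criterion with a small calculation. This is a sketch under one stated assumption: that the total unexplained variance, expressed in eigenvalue units, equals the number of items (five here), as is usual for a PCA of standardized residuals. The function names are hypothetical, and the eigenvalue it produces is derived for illustration rather than read from Table 3.

```python
def first_contrast_eigenvalue(first_contrast_pct, total_unexplained_pct, n_items):
    """Convert the first contrast's share of variance into eigenvalue units.

    Assumes the total unexplained variance corresponds to n_items eigenvalue units.
    """
    return first_contrast_pct / total_unexplained_pct * n_items


def explained_to_unexplained_ratio(explained_pct, unexplained_pct):
    """Ratio that should ideally reach 4.0 according to the guideline cited in the text."""
    return explained_pct / unexplained_pct


# Percentages reported for external regulation.
eigenvalue = first_contrast_eigenvalue(30.0, 58.1, n_items=5)
ratio = explained_to_unexplained_ratio(41.9, 58.1)
print(f"first-contrast eigenvalue: about {eigenvalue:.2f} (practical limit 3.0)")
print(f"explained / unexplained:   about {ratio:.2f} (ideal 4.0)")
```

The derived eigenvalue stays below the practical limit of 3.0, consistent with the text, while the ratio falls well short of the ideal, which is what motivates the inspection of item loadings.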

As shown in Table 4, the five items were separated into two clusters. Whereas the two items with high positive loadings (EX4 and EX5) concern course credits, the three items with high negative loadings (EX1, EX2, and EX3) are grade-related. The disattenuated correlation between the person measures from these two clusters of items was only moderate (r = 0.53, p < 0.001).

Table 4 Positively and negatively loading items in the Rasch PCA of item residuals for external regulation
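
For readers unfamiliar with disattenuation, the correction divides the observed correlation between two sets of measures by the square root of the product of their reliabilities. The sketch below assumes that standard formula; the observed correlation and the cluster reliabilities are not reported in the text, so the inputs are hypothetical and the function name is illustrative.

```python
import math


def disattenuated_correlation(observed_r, reliability_x, reliability_y):
    """Correct an observed correlation for measurement error in both measures."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Hypothetical inputs, chosen only to show the direction of the correction.
corrected = disattenuated_correlation(0.40, reliability_x=0.80, reliability_y=0.70)
print(round(corrected, 2))
```

Because short item clusters tend to have low reliability, the corrected value is always at least as large as the observed one, so a moderate disattenuated correlation implies an even weaker observed relationship.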

Although some degree of multidimensionality is suggested for this construct, “[m]ultidimensionality always exists to a lesser or greater extent” (Linacre 2012, p. 497). Examination of the content of items is also important to determine unidimensionality. Linacre (p. 489) suggested the following guidelines for determining unidimensionality:

[L]ook at the content (wording) of the items. If those items are different enough to be considered different dimensions (similar to “height” and “weight”), then split the items into separate analyses. If the items are part of the same dimension (similar to “addition” and “subtraction” on an arithmetic test), then no action is necessary. You are seeing the expected co-variance of items in the same content area of a dimension.

The theoretical content of external regulation represents a “broad” motivational state regulated by external factors such as rewards and punishment, which include grades, scores, and credits. Although the removal of either grade- or credit-related items improves the Rasch unidimensionality of this construct, both clusters are part of the same external regulation. As such, it is not necessary to separate the items into two constructs.

Rating Scale Categories

The effectiveness of the original 6-point rating scale categories (1 = Strongly disagree, 2 = Disagree, 3 = Slightly disagree, 4 = Slightly agree, 5 = Agree, and 6 = Strongly agree) was examined and optimized based on Linacre’s (2002) six guidelines:

  1. Each category should have more than 10 observations;

  2. Each category should have a peak in the probability curve;

  3. The average category measures should progress with the rating scale categories;

  4. Outfit mean squares should be smaller than 2.0;

  5. Threshold calibration should progress with the rating scale category; and

  6. The category threshold should be between 1.4 and 5.0 logits apart.

Concerning the sixth criterion, the minimum threshold separation was assessed based on Wolfe and Smith’s (2007) criteria: 0.59, 0.81, 1.1, and 1.4 for a 6-, 5-, 4-, and 3-point scale, respectively. When the above six criteria were not satisfied, the rating scale categories were optimized by combining categories.

Table 5 shows the summary of the category structure for intrinsic motivation. In the 6-point rating scale, the separation between the first and second thresholds (τ1 = −2.01, τ2 = −1.71) was 0.30, which was well below the required 0.59 logits for a 6-point rating scale. In the 5-point rating scale, when categories 1 and 2 were combined, the separation between the thresholds became 1.09 (τ1 = −2.12, τ2 = −1.03), which was greater than the required 0.81 for a 5-point rating scale. The other criteria were also satisfied as all the categories had more than 10 observations; the outfit mean square statistics were below 2.0; the average category measures were ordered, progressing from −2.70 for category 1 to 2.08 for category 5; the shape of the probability curves was peaked for each category (Fig. 2). Thus, the 5-point rating scale was considered optimal for the construct of intrinsic motivation.

Table 5 Summary of the category structure for intrinsic motivation
Fig. 2

The 5-point rating scale performance for intrinsic motivation
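
The threshold-separation check reported for intrinsic motivation can be reproduced with a short sketch. It uses the minimum separations attributed to Wolfe and Smith (2007) and the 5.0-logit upper bound from the sixth guideline; the function names are hypothetical. Only the first two thresholds are passed in, because those are the only values quoted in the text.

```python
# Minimum threshold separation (logits) by number of categories, as cited in the text.
MIN_SEPARATION = {6: 0.59, 5: 0.81, 4: 1.1, 3: 1.4}
MAX_SEPARATION = 5.0


def threshold_gaps(thresholds):
    """Distances between consecutive thresholds, rounded to the reported precision."""
    return [round(upper - lower, 2) for lower, upper in zip(thresholds, thresholds[1:])]


def thresholds_acceptable(thresholds, n_categories):
    """True if every gap lies within the allowed range.

    A disordered pair yields a negative gap and therefore fails the check.
    """
    minimum = MIN_SEPARATION[n_categories]
    return all(minimum <= gap <= MAX_SEPARATION for gap in threshold_gaps(thresholds))


# Intrinsic motivation: original 6-point scale, then categories 1 and 2 combined.
print(threshold_gaps([-2.01, -1.71]), thresholds_acceptable([-2.01, -1.71], 6))
print(threshold_gaps([-2.12, -1.03]), thresholds_acceptable([-2.12, -1.03], 5))
```

The same function can be applied to the threshold pairs reported for the other constructs, changing only the number of categories.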

Table 6 shows the summary of the category structure for identified regulation. In the 6-point rating scale, threshold measures were disordered between categories 1 and 2. In the 5-point rating scale, in which these categories were combined, the threshold measures were ordered, but the separation between the first and second thresholds (0.35; τ1 = −1.40, τ2 = −1.05) was well below the required 0.81 for a 5-point rating scale. Consequently, categories 1 and 2 were combined again. The separation between the first and second thresholds (τ1 = −1.49, τ2 = −0.12) became 1.37, which was larger than the required 1.1 for a 4-point rating scale. The other criteria were also satisfied (see Table 6 and Fig. 3). Thus, the 4-point rating scale was considered optimal for the construct of identified regulation.

Table 6 Summary of the category structure for identified regulation
Fig. 3

The 4-point rating scale performance for identified regulation

Table 7 shows the summary of the category structure for introjected regulation. All the categories had more than 10 observations; the outfit mean square statistics were below 2.0; the average category measures were ordered, progressing from −3.47 for category 1 to 3.19 for category 6; and the smallest separation between the thresholds was 1.71 (τ5 = 2.17, τ6 = 3.88), which was greater than the required 0.59 logits for a 6-point rating scale. Moreover, the shape of the probability curves peaked for each category (Fig. 4). Thus, the 6-point rating scale was optimal for this construct.

Table 7 Summary of the category structure for introjected regulation
Fig. 4

The 6-point rating scale performance for introjected regulation

Table 8 shows the summary of the category structure for external regulation. Categories 1 and 2 were combined twice, as the separation between the first and the second thresholds was well below the required value of 0.59 logits for a 6-point rating scale and 0.81 logits for a 5-point rating scale. In the 4-point rating scale, the smallest separation between the thresholds (1.42, τ3 = 0.08, τ4 = 1.46) was greater than the required 1.1 for a 4-point rating scale. The other criteria were also satisfied (see Table 8 and Fig. 5). Thus, the 4-point rating scale was considered optimal for the construct of external regulation.

Table 8 Summary of the category structure for external regulation
Fig. 5

The 4-point rating scale performance for external regulation

Table 9 shows a summary of the category structure for amotivation. Categories 5 and 6 in the 6-point rating scale and categories 4 and 5 in the 5-point rating scale were combined, as the threshold measures were reversed. In the 4-point rating scale, the threshold measures were ordered. The other criteria were also satisfied (see Table 9 and Fig. 6). Thus, the 4-point rating scale was considered optimal for the construct of amotivation.

Table 9 Summary of the category structure for amotivation
Fig. 6

The 4-point rating scale performance for amotivation

Table 10 shows the results of rating scale optimization. The 6-point rating scale was retained only for introjected regulation. The scale was reduced to a 5-point rating scale for intrinsic motivation and to 4-point rating scales for identified regulation, external regulation, and amotivation.

Table 10 Summary of the rating scale optimization
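
The category combinations described in this section amount to a recoding of the original 6-point responses. The mapping below is a sketch inferred from the text (categories 1 and 2 combined once or twice at the low end, categories 5 and 6 and then 4 and 5 combined at the high end for amotivation). The construct keys and function names are hypothetical, and the source does not state how the recoding was actually implemented in the analysis software.

```python
# Original response (1-6) -> optimized category, per construct.
RECODE = {
    "intrinsic":   {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5},  # 1 and 2 combined once
    "identified":  {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4},  # lowest categories combined twice
    "introjected": {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6},  # 6-point scale retained
    "external":    {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4},  # lowest categories combined twice
    "amotivation": {1: 1, 2: 2, 3: 3, 4: 4, 5: 4, 6: 4},  # highest categories combined twice
}


def recode(construct, response):
    """Map an original 6-point response onto the optimized scale."""
    return RECODE[construct][response]


def n_categories(construct):
    """Number of categories remaining after optimization."""
    return len(set(RECODE[construct].values()))


for construct in RECODE:
    print(construct, n_categories(construct))
```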

The Theoretical Tenets of the Simplex-Like Structure of the SDT Scale

As discussed earlier, SDT (Deci and Ryan 2002) postulates a simplex-like pattern on the continuum of the five subscales, where adjacent regulations correlate more strongly and positively with each other than regulations situated further apart. The results of the correlation analysis showed that the five constructs have the simplex-like structure that SDT postulates (Table 11). As such, the measurement of the five constructs adequately represents SDT.

Table 11 Correlation matrix of the five constructs
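
One way to make the simplex-like pattern concrete is to check that, for every construct, correlations decrease strictly as the other construct lies further away on the continuum. The sketch below implements that check; it is one possible operationalization rather than the procedure used in this study. The correlation matrix is hypothetical and does not reproduce the values in Table 11.

```python
def is_simplex_like(corr):
    """True if correlations fall off strictly with distance along the continuum.

    corr is a symmetric matrix whose rows and columns follow the order of the
    continuum (intrinsic, identified, introjected, external, amotivation).
    """
    n = len(corr)
    for i in range(n):
        # Correlations with constructs to the right and left of i, nearest first.
        right = [corr[i][j] for j in range(i + 1, n)]
        left = [corr[i][j] for j in range(i - 1, -1, -1)]
        for side in (left, right):
            if any(nearer <= farther for nearer, farther in zip(side, side[1:])):
                return False
    return True


# Hypothetical correlations, for illustration only.
example = [
    [1.00, 0.70, 0.40, 0.10, -0.40],
    [0.70, 1.00, 0.50, 0.20, -0.30],
    [0.40, 0.50, 1.00, 0.45, -0.05],
    [0.10, 0.20, 0.45, 1.00, 0.15],
    [-0.40, -0.30, -0.05, 0.15, 1.00],
]
print(is_simplex_like(example))
```

A strict check of this kind is demanding; in practice, small reversals between distant constructs may be tolerated, which is why the structure is described as simplex-like.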

Conclusion

The present study aimed to develop and evaluate an SDT questionnaire for EFL vocabulary learning motivation using Rasch analysis. First, the results of a Rasch fit analysis showed that each item functions properly and adequately contributes to measuring the intended construct. Second, the results of Rasch reliability and separation analyses revealed that both person and item reliability (separation) were adequately high, although some improvement was recommended for external regulation. Third, the results of the Rasch PCA of item residuals showed that the constructs were adequately unidimensional. Fourth, the results of rating scale analysis showed that the original 6-point rating scale was retained only for introjected regulation. The rating scale categories of the remaining four constructs were optimized by combining categories. Fifth, the results of the correlation analysis showed that the measurement of the five constructs adequately represented SDT. Taken together, the developed SDT questionnaire for EFL vocabulary learning motivation was adequately valid and reliable for the participants of the present study.