Introduction

Reading strategies are behaviors or thoughts that individuals intentionally activate to understand a text, attribute meaning to it and achieve their reading goals (Afflerbach et al., 2008; Miyamoto et al., 2019). They include cognitive and metacognitive strategies that can be activated before, during and after reading (Mason, 2013). Cognitive strategies involve constructing meaning by integrating new information into existing knowledge schemas, allowing the reader to understand the material read (Afsharrad et al., 2017). Examples of cognitive reading strategies are paraphrasing, questioning, scanning textual information, predicting content from images, titles or context, activating prior knowledge, summarizing, underlining key words and sentences, underlining unfamiliar words, taking notes and consulting a dictionary (Mokhtari & Reichard, 2002; Parkinson & Dinsmore, 2018). Metacognitive strategies refer to the knowledge and self-regulation of cognitive processes, as well as to the monitoring and assessment of the level of comprehension achieved (Erler & Finkbeiner, 2007). They allow the reader to plan the reading activity, monitor and adapt their cognitive activity while reading, analyze the effectiveness of the strategies used and readjust them (Afsharrad et al., 2017; Griva et al., 2012). Consistent with this role, studies across a wide range of grade levels have found an association between the use of reading strategies and the level of reading comprehension attained (Ammel & Keer, 2021; Frid & Friesen, 2020; Köse & Güneş, 2021; Liao et al., 2022). Research has also indicated that programs centered on promoting the use of these strategies have a positive impact on students’ reading and language skills (Bippert, 2020; Mason, 2013).
Therefore, robust measures of reading strategy use are of utmost importance for both practice and research, whether to identify students who perform poorly, to monitor progress or to assess the effects of intervention programs.

Reading strategy use has been assessed with different methods, which can be grouped into online and offline methods. The first group includes, among others, think-aloud protocols (e.g., Cromley & Wills, 2016; Meyers et al., 1990; Wang, 2016) and eye-movement recordings (e.g., Tremblay et al., 2021). In both cases, readers’ behavior during a reading task is recorded and coded. Although scores obtained with these methods correlate most highly with reading comprehension (Cromley & Azevedo, 2006), they have some disadvantages. Veenman (2011) points out weaknesses such as the inability of some individuals to verbally report what they are doing in think-aloud protocols, the fact that these methods can be intrusive for some people and, above all, that they are very time-consuming and labor-intensive. Eye-movement recording also requires costly equipment that may not be available or easily accessible. Offline methods assess reading strategy use before or after the reading task. One example consists of requiring the use of a specific strategy during a reading task and asking comprehension questions afterward (Spörer et al., 2009). Although this method provides some evidence on whether students can use strategies effectively, concerns have been raised that it is not a pure measure of reading strategy use, as the scores also reflect comprehension itself (Muijselaar et al., 2017). In practice, the most commonly used offline measures of reading strategy use are questionnaires. These can be divided into questionnaires that address students’ knowledge of reading strategies and questionnaires that address how frequently students use them. In the first type, students are asked what they do in specific situations, such as when they stop understanding the text (Muijselaar & de Jong, 2015).
Questionnaires of the second type are self-report measures in which respondents indicate how frequently they use specific strategies during reading (Liao et al., 2022; Mokhtari & Reichard, 2002). Self-report questionnaires are the most commonly used methods (Cromley & Azevedo, 2006), although they have occasionally been criticized: for being prone to social desirability and memory distortions (Veenman, 2011), for showing only how often readers use the strategies rather than how effectively they use them (Miyamoto et al., 2019), and for being less strongly related to reading measures than online measures (Cromley & Azevedo, 2006). Although these limitations should be considered, these instruments remain useful in various settings because they are cost-effective and convenient: they can be administered to large groups and scored quickly (Gascoine et al., 2017; Veenman et al., 2006).

The Reading Strategy Use (RSU) scale is a self-report measure of the frequency of use of cognitive and metacognitive reading strategies that was originally developed in New Zealand to assess children aged approximately 10–13 years (Pereira-Laird & Deane, 1997). A two-factor structure, with a cognitive and a metacognitive factor, was established through confirmatory factor analysis. Both factors had adequate reliability as measured by Cronbach’s alpha (0.73 and 0.85 for the cognitive and metacognitive factors, respectively). Validity evidence was provided by correlations with scores on measures of reading comprehension, vocabulary and study skills, as well as with classroom marks. The RSU was later modified for administration to younger children (second graders) in the United States (Reutzel et al., 2005). Among the modifications was the reduction of the response scale from seven to three categories. However, the psychometric properties of this modified version were not explored. In 2015, the version for older children was adapted for European Portuguese (Ribeiro et al., 2015). Contrary to the results of Pereira-Laird and Deane (1997), a one-factor structure fit better than a two-factor structure. Reliability was high (α = 0.85) and RSU scores were positively correlated with school marks. Additionally, girls reported higher RSU scores than boys (Ribeiro et al., 2015). Although studies of the RSU have shown adequate psychometric properties, the measure has been criticized (e.g., Mokhtari & Reichard, 2002) for including items that do not seem to describe reading strategies. In the Portuguese adaptation study (Ribeiro et al., 2015), one item that did not seem to represent the construct well, “After I have been reading for a short time, the words stop making sense”, was dropped because it had a very low factor loading.
More detailed item analyses may identify additional items with validity issues and thus improve the measure. Additionally, the functioning of the RSU response scale has never been studied. Research has attempted to provide guidelines for the optimal number of categories in Likert-type response scales, but results have varied, with recommendations ranging from 4 to 10 categories (Lozano et al., 2008; Preston & Colman, 2000). Too few categories can lead to low response variability, whereas too many can foster extreme responding; either way, validity decreases (Cox, 1980). Moreover, the RSU has been used for gender comparisons (Ribeiro et al., 2015), but no evidence of measurement invariance has been provided. Measurement invariance is a fundamental property of any assessment instrument, as it ensures that an individual’s performance depends only on their level of the latent variable and not on their group membership. Fair comparisons between groups should therefore be based on unbiased items. Differential item functioning (DIF) analyses are statistical procedures applied at the item level to check whether items measure aspects other than the latent variable (Walker, 2011); the presence of DIF suggests that an item is biased. Thus, gender differences can only be asserted after ruling out DIF in the instrument used to measure reading strategy use.

The goal of this study is to examine the psychometric properties of the items of the European Portuguese version of the RSU using Rasch analyses. These analyses have several advantages over more traditional analyses based on classical test theory, most notably the properties of conjoint measurement and specific objectivity (de Ayala, 2009). In Rasch analysis, two types of parameters are estimated: an item parameter ($b_i$) for each item, traditionally called item difficulty, and an ability parameter ($\theta_n$) for each person. These are estimated conjointly and placed on a common logit (log-odds) scale, yielding interval-level measures. The estimates can be displayed on a continuum along which items and persons are ordered according to their parameter values. Thus, “because of the interval level data and conjoint measurement scale between person ability and item difficulty, Rasch measurement indices are considered item and sample independent” (Sondergeld & Johnson, 2014, p. 584). This property, known as specific objectivity, establishes that the difference between two people in a skill should not depend on the specific items with which it was estimated, and that the difference between two items should not depend on the persons used to estimate their parameters. Thus, if the data fit the Rasch model, comparisons between people are independent of the items administered, and item parameter estimates are not influenced by the distribution of the sample used (Prieto & Delgado, 2003). Moreover, Rasch analysis provides indicators of how well each item fits the underlying construct (Bond & Fox, 2007), which can help flag RSU items that may not adequately measure reading strategy use.
Although the Rasch model was originally developed for dichotomous data, it has been extended to polytomous data, as in the Rasch Rating Scale Model (RSM; Andrich, 1978). The RSM has an additional advantage: it allows the functioning of the categories of Likert-type scales to be tested (Bond & Fox, 2007). Moreover, although there are several statistical procedures for examining DIF, Rasch-based DIF analyses are among the most commonly used methods to detect gender-based DIF in educational assessments (Aryadoust et al., 2012). Hence, the specific goals of this study were (a) to provide additional evidence of unidimensionality for the RSU; (b) to assess reliability and the fit of the RSU items to the Rasch RSM; (c) to examine the functioning of the seven-category response scale of the RSU; (d) to investigate the existence of gender-based DIF in the RSU scores; and (e) to explore the relationship between the RSU scores and reading comprehension.
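To make specific objectivity concrete, consider the dichotomous Rasch model, in which the probability that person $n$ endorses item $i$ depends only on the difference between the person parameter $\theta_n$ and the item parameter $b_i$: $P(X_{ni} = 1) = \exp(\theta_n - b_i) / [1 + \exp(\theta_n - b_i)]$, so the log-odds of endorsement equal $\theta_n - b_i$. The Python sketch below is illustrative only; the function names and all parameter values are hypothetical. It checks that the log-odds difference between two persons is the same whichever item is used to compare them.

```python
import math

def rasch_prob(theta: float, b: float) -> float:
    """Probability of endorsing (scoring 1 on) an item located at b
    for a person at theta, under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))

# Two hypothetical persons and three hypothetical items (all in logits).
theta_1, theta_2 = 1.2, -0.3
items = [-1.0, 0.0, 2.5]

for b in items:
    diff = log_odds(rasch_prob(theta_1, b)) - log_odds(rasch_prob(theta_2, b))
    # The person comparison equals theta_1 - theta_2 on every item.
    print(f"b = {b:+.1f}: log-odds difference = {diff:.3f}")
```

Each line reports a difference of 1.500 (that is, $\theta_1 - \theta_2$) regardless of the item; by the same algebra, the difference between two items is the same for every person.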
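In the RSM, each item keeps a single location $b_i$, while all items share a set of thresholds $\tau_1, \dots, \tau_m$ (the step calibrations examined in the Data analysis section), with categories scored $0, \dots, m$. The probability of responding in category $k$ is proportional to $\exp \sum_{j=1}^{k} (\theta_n - b_i - \tau_j)$, with an empty sum for $k = 0$, so adjacent categories $k - 1$ and $k$ are equally likely where $\theta_n - b_i = \tau_k$. The minimal Python sketch below assumes NumPy, and the threshold values are hypothetical. It shows why threshold order matters: when two thresholds are reversed, one category is never the most probable response anywhere on the continuum.

```python
import numpy as np

def rsm_category_probs(theta, b, taus):
    """Probabilities of categories 0..m for one person and one item under the RSM.

    theta: person measure; b: item location; taus: the m thresholds shared by all items."""
    steps = theta - b - np.asarray(taus, dtype=float)
    # Category k has log-numerator sum_{j<=k} (theta - b - tau_j); category 0 has 0.
    log_num = np.concatenate(([0.0], np.cumsum(steps)))
    log_num -= log_num.max()                  # guard against overflow
    num = np.exp(log_num)
    return num / num.sum()

def modal_categories(b, taus, grid=np.linspace(-6.0, 6.0, 1201)):
    """Categories that are the most probable response somewhere on the grid."""
    return {int(np.argmax(rsm_category_probs(t, b, taus))) for t in grid}

# Hypothetical thresholds for a five-category (0-4) item located at b = 0.
ordered = [-2.0, -1.0, 0.0, 1.0]
disordered = [-2.0, 0.5, -0.5, 1.0]           # second and third thresholds reversed

print(modal_categories(0.0, ordered))         # {0, 1, 2, 3, 4}
print(modal_categories(0.0, disordered))      # {0, 1, 3, 4}: category 2 never modal
```

In the disordered example, the reversal of the second and third thresholds means the probability curve for category 2 never rises above those of its neighbors, which is the pattern that motivates collapsing categories.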

Regarding the first goal, we expected to obtain evidence of unidimensionality, as this structure had the best fit in the previous study of the Portuguese RSU (Ribeiro et al., 2015) and unidimensionality is a prerequisite for estimating the Rasch model. As the item fit and the response scale of the RSU had never been explored, no predictions were made for the second and third goals. Regarding the fourth goal, we expected the RSU items to show no meaningful gender-based DIF, so that fair comparisons between boys’ and girls’ scores could be made. As research has found gender differences, with girls reporting more use of reading strategies than boys (Afsharrad et al., 2017; Griva et al., 2012; Köse & Güneş, 2021), we expected to replicate this result with the RSU after ruling out meaningful gender-related DIF. Regarding the last goal, we expected a positive relationship between RSU scores and scores on a reading comprehension measure, as previous research has consistently found a positive correlation between reading strategy use and comprehension, i.e., readers who use reading strategies more often tend to achieve better reading comprehension (Follmer & Sperling, 2018; Köse & Güneş, 2021; Liao et al., 2022).

Method

Participants and procedures

The sample was drawn from a study on reading comprehension [reference omitted]. The study was approved by the ethics committee of the University of Minho. As the data were collected in schools, authorizations were also obtained from the school boards and the Portuguese Ministry of Education, and written informed consent was obtained from the students’ parents or legal guardians. Participants completed a battery of tests that included the RSU and reading measures. The RSU and the reading comprehension test were group-administered (by class) in the students’ classrooms by psychologists experienced in administering these measures, with each measure administered in a separate session. Students took about 10–15 min to complete the RSU and about 45 min to complete the reading comprehension test.

A total of 179 students from public schools in northern Portugal participated: 94 (52.5%) attended fifth grade (mean age = 11 years; SD = 0.55) and 85 (47.5%) attended sixth grade (mean age = 12 years; SD = 0.50). The numbers of boys (N = 90) and girls (N = 89) were similar. Approximately 45% of the students (N = 80) received school social support, which is provided to students from low socioeconomic backgrounds. Most of the children’s mothers had completed only elementary education (59%), whereas 25.9% had completed secondary education and 15.1% held a higher education degree.

Measures

Reading Strategy Use (RSU) Scale (Pereira-Laird & Deane, 1997). This self-report instrument comprises 22 items that measure students’ frequency of use of cognitive and metacognitive reading strategies when reading narrative and expository texts. Each item is a statement describing a reading strategy, and students indicate how often they use it on a seven-category scale: 1 = never, 2 = almost never, 3 = seldom, 4 = sometimes, 5 = often, 6 = almost always, 7 = always. Although a two-factor structure had the best fit in New Zealand (Pereira-Laird & Deane, 1997), the validation study for European Portuguese (Ribeiro et al., 2015) supported a one-factor structure with high reliability (Cronbach’s alpha = 0.85). Validity evidence was also provided by significant correlations between RSU scores and students’ school marks. That study (Ribeiro et al., 2015) also recommended dropping one item; therefore, a 21-item version was used in the present study. The items can be found in the Appendix.

Test of Reading Comprehension of Narrative Texts (TRC-n; Rodrigues et al., 2020; Santos et al., 2016). This test comprises five vertically scaled test forms that assess reading comprehension of narrative texts in students from grades 2 to 6. In the present study, the forms for grades 5 and 6 (TRC-n-5 and TRC-n-6, respectively) were administered. Students silently read a set of texts and the accompanying three-option multiple-choice questions and mark their responses on an answer sheet. Responses are scored as 0 (incorrect) or 1 (correct). The total raw score on each form is converted to a standardized score on a metric common to all forms. Reliability coefficients for the grade 5 and 6 forms are adequate to high, ranging between 0.72 and 0.97 (Rodrigues et al., 2020). Regarding validity evidence, scores on these forms are significantly correlated with scores on other measures of language and reading (Rodrigues et al., 2020).

Data analysis

Rasch RSM analyses were carried out using Winsteps Version 3.61.1 (Linacre & Wright, 2001). To check for unidimensionality, a principal component analysis of the Rasch standardized residuals (PCAR) was carried out (Chou & Wang, 2010). A residual is the difference between an observed response and the response expected under the model. For a dominant dimension to be present, two requirements must be met: (a) the measures must explain at least 20% of the variance and (b) the eigenvalue of the first secondary dimension (the first contrast) must be lower than 3 (Miguel et al., 2013). Residual correlations were also computed to check the local independence of the items, i.e., the measurement principle that the response to an item depends solely on the person’s latent trait and is not influenced by responses to other items. Coefficients lower than 0.70 provide evidence of local independence (Linacre, 2011). To assess the functioning of the response scale, the following requirements were examined: (a) a monotonic increase in the average measures across categories, which indicates that persons with higher levels of the latent trait endorse higher response categories; (b) no response category with an infit or outfit higher than 2.00; and (c) a monotonic increase in the step calibrations (Bond & Fox, 2007; Linacre, 2002a). The step calibrations are the logit values at which categories k and k − 1 are equally likely to be endorsed. Ordered steps indicate that each category is the most likely to be observed in some interval of the measurement scale. Item fit was assessed with the mean square (MNSQ) infit and outfit statistics. Values between 0.5 and 1.5 indicate good fit (Gómez et al., 2012), whereas values higher than 2.0 suggest severe misfit (Linacre, 2002b). Item-total correlations were also examined, with a minimum of 0.40 stipulated.
Reliability was checked by means of the item separation reliability (ISR) and person separation reliability (PSR) indices. ISR estimates how likely the same ordering of the items would be obtained with a different sample of comparable ability, and PSR estimates how likely the same ordering of the persons would be obtained with another set of items measuring the same construct (Bond & Fox, 2007; Gómez et al., 2012). Both estimates range between 0 and 1, and a minimum of 0.70 is advisable. Finally, gender-based DIF was examined, with boys as the reference group and girls as the focal group. Rasch DIF statistics compare the item difficulty parameters estimated separately in each group. Statistically significant (p < .05) results in the Rasch-Welch t test provide empirical evidence of DIF. However, the impact of uniform DIF on test scores should be considered meaningful only when it is found in more than 25% of the items within a dimension (Rouquette et al., 2019) and the DIF contrast is higher than 0.50 logit (Linacre, 2011). After ruling out meaningful DIF, boys’ and girls’ mean estimates were compared using Welch’s t test. Pearson correlation and linear regression were used to analyze the relationship between RSU scores and reading comprehension scores.
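The residual-based statistics described above can be sketched in Python with NumPy, assuming that person measures, item measures and RSM thresholds have already been estimated (for example, by Winsteps). The sketch does not estimate parameters, and Winsteps’ own computations (e.g., its handling of extreme scores) may differ in detail. Outfit is the unweighted mean of squared standardized residuals; infit weights squared residuals by the model variance; the first-contrast eigenvalue is taken as the largest eigenvalue of the item-by-item correlation matrix of standardized residuals (in item units); and the largest absolute residual correlation serves as the local-independence check. The function names are hypothetical, and all data in the usage example are simulated.

```python
import numpy as np

def rsm_probs(theta, b, taus):
    """Category probabilities (persons x items x categories) under the RSM."""
    x = np.subtract.outer(theta, b)                        # theta_n - b_i
    steps = x[..., None] - np.asarray(taus, dtype=float)   # theta_n - b_i - tau_j
    logits = np.concatenate(
        [np.zeros(x.shape + (1,)), np.cumsum(steps, axis=-1)], axis=-1)
    logits -= logits.max(axis=-1, keepdims=True)           # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

def residual_diagnostics(responses, theta, b, taus):
    """Infit/outfit MNSQ per item, first-contrast eigenvalue, max residual r."""
    p = rsm_probs(theta, b, taus)
    k = np.arange(p.shape[-1])                             # categories 0..m
    expected = (p * k).sum(axis=-1)
    variance = (p * (k - expected[..., None]) ** 2).sum(axis=-1)
    resid = responses - expected
    z = resid / np.sqrt(variance)                          # standardized residuals
    outfit = (z ** 2).mean(axis=0)                         # unweighted mean square
    infit = (resid ** 2).sum(axis=0) / variance.sum(axis=0)  # information-weighted
    corr = np.corrcoef(z, rowvar=False)                    # item x item
    first_contrast = np.linalg.eigvalsh(corr)[-1]          # largest eigenvalue
    max_resid_r = np.abs(corr[np.triu_indices_from(corr, k=1)]).max()
    return infit, outfit, first_contrast, max_resid_r

# Simulated, hypothetical data: 300 persons, 19 items, 5 categories (0-4).
rng = np.random.default_rng(0)
taus = np.array([-1.5, -0.5, 0.5, 1.5])
theta = rng.normal(0.3, 0.8, size=300)
b = np.linspace(-1.0, 1.0, 19)
cum = np.cumsum(rsm_probs(theta, b, taus), axis=-1)[..., :-1]
responses = (rng.random(cum.shape[:2])[..., None] > cum).sum(axis=-1)

infit, outfit, first_contrast, max_resid_r = residual_diagnostics(responses, theta, b, taus)
print(infit.round(2), outfit.round(2))
print(f"first contrast = {first_contrast:.2f}, max residual r = {max_resid_r:.2f}")
```

Because the responses are generated from the model itself, infit and outfit should fall close to 1, the first contrast well below 3 and the residual correlations well below 0.70; misfitting real items would show up as departures from these values.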
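The reliability and DIF criteria above can be expressed as a short Python sketch using only the standard library. Separation reliability is computed here as the share of observed variance in the measures that is not error variance (population variance and mean squared standard error are assumptions about the exact formula). The Rasch-Welch p value uses a normal approximation rather than Welch degrees of freedom, and the scale-level rule encodes one reading of the criteria (an item counts only if it is both significant and has a contrast above 0.50 logit). All names and numbers are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass

def separation_reliability(measures, standard_errors):
    """Share of observed variance in the measures that is not measurement error."""
    observed_var = statistics.pvariance(measures)
    error_var = statistics.fmean([se ** 2 for se in standard_errors])
    return max(0.0, (observed_var - error_var) / observed_var)

@dataclass
class ItemDIF:
    name: str
    b_ref: float   # item measure calibrated on the reference group (boys)
    se_ref: float
    b_foc: float   # item measure calibrated on the focal group (girls)
    se_foc: float

def rasch_welch(item):
    """DIF contrast, t statistic and two-sided p (normal approximation)."""
    contrast = item.b_foc - item.b_ref
    t = contrast / math.sqrt(item.se_ref ** 2 + item.se_foc ** 2)
    p = math.erfc(abs(t) / math.sqrt(2))
    return contrast, t, p

def meaningful_dif(items, alpha=0.05, min_contrast=0.50, max_share=0.25):
    """Flag items with p < alpha and |contrast| > min_contrast; DIF counts as
    meaningful for the scale only if the flagged share exceeds max_share."""
    flagged = []
    for item in items:
        contrast, _, p = rasch_welch(item)
        if p < alpha and abs(contrast) > min_contrast:
            flagged.append(item.name)
    return len(flagged) / len(items) > max_share, flagged

# Hypothetical values for illustration only.
print(round(separation_reliability([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]), 3))  # 0.625
items = [ItemDIF("A", 0.0, 0.1, 0.3, 0.1), ItemDIF("B", 0.0, 0.1, 0.6, 0.1),
         ItemDIF("C", 0.2, 0.1, 0.2, 0.1), ItemDIF("D", -0.4, 0.1, -0.4, 0.1)]
print(meaningful_dif(items))  # (False, ['B'])
```

Item A is statistically significant (p ≈ .034) but its contrast of 0.3 logit is too small to count; only Item B is flagged, and one of four items (25%) does not exceed the 25% threshold.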

Results

Dimensionality and local independence

The PCAR indicated that the measures explained 57.8% of the variance. The first contrast of the residuals had an eigenvalue of 2.2 and accounted for 4.5% of the variance. These values meet both criteria for a dominant dimension (at least 20% of variance explained by the measures; first-contrast eigenvalue below 3) and support the unidimensionality of the measure. The largest correlation between standardized residuals was 0.31, well below 0.70, suggesting that the items are locally independent.

Item parameters and category statistics

Table 1 presents the descriptive statistics for the raw scores on each item. For most items, responses were evenly distributed across the seven categories. However, some items were more prone to extreme responses. This was particularly the case for the two reverse-coded items, Item 14 (“When some of the sentences that I am reading are hard, I give up the reading”) and Item 16 (“When I cannot read a word in the story, I skip it”), which had a large proportion of responses concentrated at the lower end of the scale. In contrast, Items 3, 13 and 20 were highly endorsed, suggesting that the strategies they describe (slowing down, going back in the text and rereading) are used quite frequently by students.

Table 1 Descriptive statistics of the items’ raw scores

Regarding the items’ Rasch RSM parameters, the two reverse-coded items (Items 14 and 16) had item-total correlations lower than 0.40, and Item 16 also had an MNSQ outfit higher than 2.0. These two items were therefore deleted and the data were recalibrated. After deletion, all item-total correlations were higher than 0.40 and no item had an MNSQ infit or outfit higher than 2.0. Table 2 presents the category statistics for this calibration. The seven-category scale showed a regular distribution of category frequencies, a monotonic increase in the average measures, and fit statistics below 2.00 for all categories. However, the step calibrations (thresholds) did not increase monotonically; they were disordered.

Table 2 Category statistics and reliability

Figure 1 presents the category probability curves for the seven-category response scale. Each category should have a distinct peak in the probability curves, indicating that it is the most probable response in some portion of the measured variable (Bond & Fox, 2007). As shown in Fig. 1, categories 2, 3 and 5 were never the most probable at any point on the continuum, suggesting that collapsing categories could improve the efficiency of the response scale. The response scale was therefore recoded into five categories and the data were recalibrated. To preserve the meaning of the scale, we collapsed categories 2 (almost never) and 3 (seldom), as well as categories 5 (often) and 6 (almost always).
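The recoding described above can be written as an explicit lookup table, which makes it easy to reproduce. The minimal Python sketch below uses hypothetical names, and the labels for the merged categories simply join the original descriptors.

```python
# Original RSU codes (1-7) -> collapsed five-category codes (1-5).
# Categories 2 (almost never) and 3 (seldom) merge, as do 5 (often) and 6 (almost always).
COLLAPSE_7_TO_5 = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4, 6: 4, 7: 5}

FIVE_CATEGORY_LABELS = {
    1: "never",
    2: "almost never / seldom",
    3: "sometimes",
    4: "often / almost always",
    5: "always",
}

def collapse_responses(responses):
    """Recode a sequence of 1-7 responses; out-of-range codes raise KeyError."""
    return [COLLAPSE_7_TO_5[r] for r in responses]

recoded = collapse_responses([1, 2, 3, 4, 5, 6, 7])
print(recoded)                                   # [1, 2, 2, 3, 4, 4, 5]
print([FIVE_CATEGORY_LABELS[c] for c in recoded])
```

Because the mapping is monotonic and merges only adjacent categories, the rank order of responses is preserved.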

Fig. 1 Category probability curves for the seven-category response scale

Fig. 2 Category probability curves for the five-category response scale

With the five-category response scale, the step calibrations increased monotonically, each category was the most probable in some interval of the continuum, the fit statistics improved slightly and reliability did not decrease (see Table 2; Fig. 2). Both PSR and ISR were very high (see Table 2). Table 3 presents the item statistics for this final calibration after recoding the response scale. All items had item-total correlations higher than 0.40 and infit and outfit values lower than 2.00, suggesting adequate fit.

Table 3 Reading Strategy Use Scale: item statistics

Differential item functioning (DIF) and relationship with comprehension

Table 4 presents the results of the DIF analyses. Four items (Items 9, 10, 13 and 20) showed statistically significant DIF. Girls had higher average estimates than boys on Items 9 and 10, whereas boys had higher average estimates on Items 13 and 20. However, the DIF contrast was lower than 0.50 logit in all four cases, and these items made up 21% of the 19 items, below the 25% threshold; therefore, the DIF was not considered meaningful. When comparing mean estimates for the 19 final items, girls (mean estimate = 0.40, SD = 0.74) obtained significantly higher scores than boys (mean estimate = 0.15, SD = 0.75), t(176) = -2.245, p = .026.
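Welch’s t can be recomputed from the summary statistics reported above. The Python sketch below uses only the standard library and assumes the group sizes reported in the Participants section (90 boys, 89 girls); the Welch degrees of freedom depend on the number of cases actually analyzed, so they may differ slightly from the value reported above.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2      # squared standard errors of the means
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Boys (reference group) first, girls second, so a negative t favors girls.
t, df = welch_t(0.15, 0.75, 90, 0.40, 0.74, 89)
print(f"t = {t:.3f}, df = {df:.1f}")          # t = -2.245
```

Unlike Student’s t, this version does not assume equal variances in the two groups, which is why its degrees of freedom are estimated from the data.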

Table 4 Results of DIF analyses as a function of gender

A significant positive relationship was found between total scores on this revised version of the RSU and reading comprehension scores (r = .246, p < .001). RSU scores explained about 6% of the variance in reading comprehension (β = 0.246; R² = 0.06).
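With a single predictor, the standardized regression coefficient equals Pearson’s r and R² equals r squared, which is why β and r coincide above. The short Python check below reproduces these values and the t statistic implied by r; it assumes that all 179 students contributed both scores, which the text does not state explicitly.

```python
import math

r, n = 0.246, 179            # reported correlation; n assumed to be the full sample
beta = r                     # simple regression: standardized beta equals r
r_squared = r ** 2           # proportion of variance explained
t = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)   # test of H0: rho = 0, df = n - 2

print(f"beta = {beta:.3f}, R^2 = {r_squared:.3f}, t({n - 2}) = {t:.2f}")
# beta = 0.246, R^2 = 0.061, t(177) = 3.38
```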

Discussion

The main goal of this study was to explore the psychometric properties of the items of the European Portuguese version of the RSU using Rasch RSM analyses. The first specific goal was to provide evidence of unidimensionality for the RSU. In addition to being a requirement of the Rasch model, a one-dimensional structure had previously been found in a sample of Portuguese students using confirmatory factor analysis (Ribeiro et al., 2015). The results of this study show that unidimensionality is replicated with a different method of investigating the internal structure of the test. This structure contrasts with that obtained for the original version in New Zealand, where a two-factor structure distinguishing cognitive and metacognitive strategies had the best fit (Pereira-Laird & Deane, 1997). In fact, the distinction between cognitive and metacognitive strategies has been debated, as they are interdependent and sometimes difficult to separate. As Veenman (2011) states, “higher-order metacognitive processes monitor and regulate lower-order cognitive processes that, in turn, shape behavior. Thus, drawing inferences is a cognitive activity, but the self-induced decision to initiate such activity is a metacognitive one” (p. 205). In addition, research suggests that cognitive strategy use may not be highly effective without the concomitant use of metacognitive strategies (Zhao et al., 2014). For example, outlining what is being read is effective mainly when the reader is able to monitor the process, assess it and revise it when needed. Similarly, underlining the main ideas or key concepts involves distinguishing relevant from irrelevant information, which requires constant monitoring as reading advances and sometimes going back and forth in the text to revise the underlined information.
Other studies suggest that promoting the use of cognitive strategies, such as concept mapping, improves metacognitive skills (e.g., Welter et al., 2022). Thus, the one-factor structure observed in this study may reflect the fact that cognitive and metacognitive strategies are frequently mobilized together.

A second goal was to assess reliability and the fit of the RSU items to the Rasch RSM. The results indicated high reliability, and all retained items fit the model well. However, two items were discarded because of low item-total correlations. Both were reverse-coded. Although including reverse-coded items has long been recommended to avoid the acquiescence effect in self-report measures (e.g., Nunnally, 1978), more recent research has discouraged the practice. A considerable number of studies have shown that reverse-coded items decrease model fit and frequently form separate dimensions that lack substantive meaning, which is particularly problematic in unidimensional tests (Cassady & Finch, 2014; Clauss & Bardeen, 2020; Essau et al., 2012; Woods, 2006). Some evidence suggests that this may be because reverse-coded and straightforward items involve different cognitive processes and thus may not measure the same latent trait (Suárez-Alvarez et al., 2018; Weems & Onwuegbuzie, 2001). Research has also suggested that combining both types of items in the same test decreases response variability and leads to worse discriminative power and reliability (Suárez-Alvarez et al., 2018; Vigil-Colet et al., 2020). A content analysis revealed additional problems, particularly for Item 16: whether the strategy it describes, “When I cannot read a word in the story, I skip it”, is an adequate one is highly ambiguous. On the one hand, skipping difficult words or text parts seems to be a strategy used much more by poor readers than by good readers (Anastasiou & Griva, 2009). Moreover, unlike poor readers, good readers focus on constructing the meaning of the text as a whole rather than on understanding every single word (Lau, 2006), and therefore prefer strategies such as activating prior knowledge or imagery (Anastasiou & Griva, 2009; Lau & Chan, 2003).
On the other hand, skipping difficult words can sometimes be an adequate reading strategy: in some cases, the meaning can be inferred later as reading proceeds, and in other cases, not knowing the meaning of a specific word simply does not hinder comprehension of the text as a whole (Giasson, 2000). For all these reasons, we recommend dropping Items 14 and 16 from the RSU.

The third goal of the study was to examine the functioning of the seven-category response scale of the RSU, an analysis that is one of the strengths of the Rasch RSM (Bond & Fox, 2007). The results indicated that the seven-category response scale was inadequate, as some categories were redundant, i.e., they were never the most likely to be endorsed. A five-category response scale performed better. The functioning of the response scale in reading strategy use questionnaires has seldom been investigated. However, a five-category Likert scale ranging from never to always is a frequent choice; see, for example, one of the most widely used measures, the Metacognitive Awareness of Reading Strategies Inventory (MARSI; Mokhtari & Reichard, 2002). The results of our study provide empirical support for this choice.

The fourth specific goal was to investigate gender-based DIF in RSU scores, as DIF would bias inferences drawn from comparisons of scores between groups. Although four items showed statistically significant DIF, all contrasts were below 0.50 logit, so the effect was negligible. Therefore, fair comparisons between boys and girls in reading strategy use can be made with the RSU. Consistent with previous research (Afsharrad et al., 2017; Griva et al., 2012; Köse & Güneş, 2021), girls in our study reported using reading strategies more frequently than boys.

The final goal was to explore the relationship between RSU scores and scores on a reading comprehension measure. Given that previous research has consistently found a correlation between reading strategy use and reading comprehension (Follmer & Sperling, 2018; Köse & Güneş, 2021; Liao et al., 2022), we expected a positive correlation between the scores on both measures, which was indeed found. Nonetheless, the relationship was weak, with reading strategy use explaining only 6% of the variance in reading comprehension. Studies in European Portuguese with children in grades 4 to 6 show that, at this stage, reading comprehension still depends heavily on basic reading skills, such as oral reading fluency, and on linguistic skills, such as vocabulary (Fernandes et al., 2017; Rodrigues et al., 2022). Reading strategy use may therefore account for only a small share of reading comprehension performance at these grade levels. Even so, the positive association provides evidence of validity for the revised 19-item version of the RSU with a five-category response scale.

Conclusion

Overall, the findings of the present study add evidence of reliability and validity for the Portuguese version of the RSU, supporting its use as a robust measure of the frequency of reading strategy use by students in the fifth and sixth grades. Rasch RSM modeling supported the unidimensionality of the scale, indicated changes to the Likert-type response scale, and provided evidence of the absence of meaningful gender-based DIF. These findings suggest that measures can be improved by a more detailed examination of the items and the response scale, and Rasch modeling offers several possibilities for such analyses that complement more traditional ones, such as factor analysis. The findings of our study can also serve as a reference framework for studying the psychometric properties of the modified version of the RSU for younger children (Reutzel et al., 2005), which have yet to be explored. The RSU is relatively short and can be administered to large groups, making it especially useful in educational settings or in research when large amounts of data must be collected in a short time.