Introduction

Research on engagement emerged from professional and occupational contexts. In these contexts, engagement is defined as a positive psychological state that is characterized by vigor, dedication and absorption associated with work-related well-being (Bakker et al. 2008; Hirschi 2012; Schaufeli and Bakker 2010). In recent years engagement has also been studied in educational contexts, namely in higher education (Bresó et al. 2011; Christenson and Reschly 2010; Kuh 2009; Vasalampi et al. 2009). These studies are often present in international research concerning academic learning and achievement (Krause and Coates 2008; Schaufeli et al. 2002).

Students’ academic engagement can be defined as the time, intention and energy students devote to educationally sound activities. Academic engagement is related to the policies and practices that institutions use to induce students to take part in those activities (Hodson and Thomas 2003; Kuh 2005; Wierstra et al. 2003). Research has established that engaged students invest more in their performance, participate more and tend to develop mechanisms to help them persist and self-regulate their learning and achievement (Klem and Connell 2004; National Research Council and Institute of Medicine 2004).

Academic engagement is associated with a positive way of experiencing academic activities and contexts, since it is related to positive academic and social outcomes (Klem and Connell 2004; Wonglorsaichon et al. 2014), to satisfaction and self-efficacy (Coetzee and Oosthuizen 2012), and to a reduction of achievement problems, burnout and dropout (Chapman et al. 2011; Christenson et al. 2012; Christenson and Reschly 2010; Eccles and Wang 2012; Elmore and Huebner 2010; Finn and Zimmer 2012; Fredricks et al. 2004, 2011; Gilardi and Guglielmetti 2011; Reschly and Christenson 2012).

Because engagement is a broad meta-construct, it can be problematic: various definitions exist both within and across the different types of engagement (Fredricks et al. 2016). Two dominant conceptualizations of academic engagement have emerged in the literature (for a recent debate on academic engagement see Senior and Howard 2015). Schaufeli et al. (2002) adapted the Utrecht Work Engagement Scale (UWES) from the business organizations’ perspective to measure student engagement in university settings. The adapted scale, the UWES – Student version (UWES-S), uses the same three work engagement dimensions (vigor, absorption and dedication) adapted to the university context by rephrasing some of the original UWES items. The other predominant student academic engagement conceptualization, by Fredricks et al. (2004), defines academic engagement as a multidimensional construct integrating behavioral, emotional and cognitive dimensions, usually in line with the notion that the behavioral component corresponds to vigor, the emotional one to dedication and the cognitive one to absorption (Christensen 2017). However, criticisms have been raised regarding both Schaufeli et al.’s (2002) and Fredricks et al.’s (2004) student academic engagement conceptualizations. The former was a simple adaptation of the workplace to the university context; the latter was derived mainly for high school students (Marôco et al. 2016). Theorizing academic engagement as a multidimensional construct allows for better generalization and understanding of academic engagement as a combination of its several factors. Also, analysis of the first-order engagement factors (behavioral, emotional and cognitive) makes it possible to pinpoint each factor’s contribution to overall engagement and to target interventions accordingly.

Clarification is needed since some theoretical frameworks almost overlap with previous literature (Fredricks 2015). In the academic engagement literature, there is a need for clear definitions that differentiate between the dimensions within the adopted framework (Fredricks et al. 2004). This raises the importance of having measures that respect this differentiation, without mixing the content of different dimensions of different factors, and it increases the utility of analyzing the validity evidence of multidimensional psychometric instruments. Marôco et al. (2016) reviewed the main criticisms of both approaches and developed the University Student Engagement Inventory (USEI). This inventory includes the behavioral, cognitive and emotional dimensions of academic engagement, which is the definition and division of dimensions adopted by most research (Fredricks 2015). The behavioral dimension is related to behaviors such as attending classes, arriving on time, doing prescribed tasks/homework on schedule, participating in activities in and out of the classroom, and respecting social and institutional rules. The cognitive dimension refers to all the students’ thoughts, perceptions and strategies related to the acquisition of knowledge or the development of competencies in academic activities, for example their study methods, learning approaches and academic self-regulation. The emotional dimension refers to positive and negative feelings and emotions related to the learning process, class activities, peers and teachers, for example a sense of belonging, enthusiasm, and motivation (Antúnez et al. 2017; Carter et al. 2012; Marôco et al. 2016; Sheppard 2011). Validity evidence based on response processes (i.e. face validity) for the behavioral, cognitive and emotional dimensions of academic engagement was evaluated by a focus group of university students and psychologists in the original proposal of Marôco et al. (2016). In this study, we focus on the validity evidence based on the USEI’s internal structure.

Although there is a consensus about the relevance of this construct to the explanation of academic behavior and learning, there is no precise delimitation of the construct and its dimensionality (Christenson et al. 2012; Fredricks and McColskey 2012; Kahu 2013; Reschly and Christenson 2012; Wolf-Wendel et al. 2009). A debate is still ongoing concerning the definition and internal structure of the academic engagement construct. This conceptual haziness (Appleton et al. 2008) extends to the dimensionality of the construct’s instruments: some authors assume it to be a unidimensional primary factor, or a second-order factor representing a general motivational trait or state, while other authors defend its multidimensionality, but without consensus regarding the number of dimensions (Fredricks et al. 2004; Handelsman et al. 2005; Lin and Huang 2018; Reschly and Christenson 2012).

In this paper, we focus on the USEI for the university context and evaluate one of the sources of evidence proposed in the Standards for Educational and Psychological Testing (American Educational Research Association et al. 2014): validity evidence based on internal structure. Specifically, we aim to find good validity evidence regarding the dimensionality of the first-order three-factor model (H1) and of a possible second-order latent factor model (H2), measurement invariance for gender (H3) and for the scientific area of college graduation (H4), and good evidence of reliability of the scores through internal consistency using several estimates (H5). These validity indicators are intended to demonstrate the relevance of an instrument that can simultaneously be useful for research and practice: evidence of a meta-construct (academic engagement) useful for research, together with the utility of its specific domains for interventions with specific student subgroups.

Method

Validity is a vital issue for the quality of psychometric scales: it refers to the extent to which the evidence supports the interpretation of scale scores (Crutzen and Peters 2017). Validity concerns the interpretation of scale scores in a specific study; it isn’t a characteristic of a scale in itself (American Educational Research Association et al. 2014). Consequently, evidence from other studies must be used to justify the choice of a specific scale, although in a strict sense this doesn’t guarantee the same validity evidence in a new study (Crutzen and Peters 2017). Nevertheless, every study that uses psychometric scales must pay attention to the validity evidence produced for each scale in that study. Historically, different types of validity have been proposed; the current Standards for Educational and Psychological Testing evolved from the first version, published more than 60 years ago (American Psychological Association 1954). The current Standards approach validity as a unitary concept, with five sources of validity evidence recognized (Sireci and Padilla 2014): based on internal structure, based on test content, based on the relation to other variables, based on response processes and based on the consequences of testing. Although these are not considered distinct types of validity, an inclusive evaluation of an instrument combines these different sources of evidence into a coherent account (American Educational Research Association et al. 2014).

Validity evidence based on the internal structure includes three basic aspects: dimensionality, measurement invariance and reliability (Rios and Wells 2014). To assess dimensionality, one can opt for several factor analytic methods; however, confirmatory factor analysis (Brown 2015) is the most comprehensive approach for comparing observed and hypothesized test structures, as it evaluates the relationships between the items and the latent variables (theoretical constructs) they are intended to measure (Bollen 1989).

Measurement invariance assesses whether an instrument is fair, from a psychometric perspective, for different subgroups (van de Schoot et al. 2012), such as occupations (Sinval et al. 2018), countries (Reis et al. 2015), genders (Marsh et al. 2010) and other groups. It can be evaluated using different statistical approaches, with multigroup confirmatory factor analysis being the most popular (Davidov et al. 2014). This approach consists of fitting increasingly constrained structural equation models and comparing the more restricted models with the less restricted ones (van de Schoot et al. 2015).

Since the validity of scores depends on their reliability (American Educational Research Association et al. 2014), without reliability we can’t have appropriate validity evidence (Kaplan and Saccuzzo 2013). Reliability can be evaluated with different techniques, the most usual being internal consistency estimates such as Cronbach’s α, Revelle’s β or McDonald’s ωh (Zinbarg et al. 2005). Reliability provides evidence about the consistency of test scores across repeated administrations (American Educational Research Association et al. 2014).

Participants

A sample of 908 Portuguese first-year university students (ages ranging from 17 to 58 years; M = 19.41; SD = 4.79; Mdn = 18) from a public university in the north of Portugal was used to evaluate the psychometric properties of the USEI. The students were distributed across three main areas: 40.18% were from technology or engineering courses, 29.52% from economics or law courses, and 30.30% from languages or humanities courses. Most students were women (64.58%) and only 8.57% had a part-time or full-time occupation. With respect to parents’ level of education, 50.65% of mothers had a basic education level, 30.27% a secondary level and 19.08% a higher education level; 58.99% of fathers had a basic education level, 24.64% a secondary level and 16.38% a higher education level.

Measures and Procedures

The USEI (Marôco et al. 2016) is a self-report Likert-type (1 = “never” to 5 = “always”) scale with 15 items organized in three academic engagement dimensions: behavioral (BE; e.g. “I usually participate actively in group assignments”), cognitive (CE; e.g. “I try to integrate the acquired knowledge in solving new problems”) and emotional (EE; e.g. “I like being at school”). This instrument presented good evidence of reliability and of factorial, convergent and discriminant validity in previous research (Marôco et al. 2016). Exploratory and confirmatory factor analyses have systematically confirmed the assignment of specific items to each dimension. Reliability coefficients, in terms of the internal consistency of items, are above .63 (ordinal omega values) and above .69 (ordinal alpha values) for the three dimensions.

A non-probabilistic convenience sample was used, with the inclusion criterion being students entering university. Data were collected in the classroom with the permission and collaboration of teachers. The aims of the study were presented, and confidentiality was ensured. Participants provided informed consent stating their voluntary agreement to take part in the study. Ten minutes were sufficient to complete the inventory and provide some personal information for sample characterization.

Data Analysis

All statistical analyses were performed with R (R Core Team 2018) and RStudio (RStudio Team 2018). Descriptive statistics were obtained using the skimr package (McNamara et al. 2018). Confirmatory factor analysis (CFA) was conducted to evaluate the psychometric properties of the data gathered with the USEI, namely its internal structure validity evidence. The CFA was performed with the lavaan package (Rosseel 2012) using the weighted least squares means and variances (WLSMV) estimation method, which is indicated for ordinal response scales. Internal consistency reliability estimates for ordinal variables, the average variance extracted (AVE) and the heterotrait-monotrait ratio (HTMT) were calculated using the semTools package (Jorgensen et al. 2018), while Mardia’s multivariate kurtosis (Mardia 1970) was assessed using the psych package (Revelle 2018).
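For illustration, a minimal sketch in R of this setup is given below. The data frame (usei_df), item names (use1–use15) and the simulated responses are assumptions made for the example only; they are not the study’s actual data or variable names.

# Packages named in the text; simulated placeholder data stand in for the
# real responses so the sketch runs end to end.
library(lavaan)    # CFA with the WLSMV estimator
library(semTools)  # reliability, AVE, HTMT, invariance helpers
library(psych)     # Mardia's multivariate skewness/kurtosis
library(skimr)     # descriptive summaries

usei_items <- paste0("use", 1:15)
set.seed(1)
usei_df <- as.data.frame(matrix(sample(1:5, 15 * 300, replace = TRUE),
                                ncol = 15,
                                dimnames = list(NULL, usei_items)))

skim(usei_df)                  # summary measures per item
mardia(usei_df, plot = FALSE)  # Mardia's multivariate skewness and kurtosis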

The CFA was conducted to verify whether the proposed three-factor structure presented an adequate fit for the study sample data. We used as goodness-of-fit indices the TLI (Tucker-Lewis Index), χ2/df (ratio of chi-square to degrees of freedom), the NFI (Normed Fit Index), the CFI (Comparative Fit Index) and the RMSEA (Root Mean Square Error of Approximation). The fit of the model was considered good for CFI, NFI and TLI values above .95 and RMSEA values below .06 (Hu and Bentler 1999; Marôco 2014).
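A hedged sketch of this CFA step follows, reusing the placeholder data above; the item-to-factor keying shown is illustrative only (the published keying of the 15 USEI items should be used in practice).

# Hypothesized three-factor structure (illustrative keying).
usei_model <- '
  BE =~ use1 + use2 + use3 + use4 + use5
  EE =~ use6 + use7 + use8 + use9 + use10
  CE =~ use11 + use12 + use13 + use14 + use15
'

fit_3f <- cfa(usei_model, data = usei_df,
              estimator = "WLSMV",   # robust estimation for ordinal items
              ordered   = usei_items)

# Fit indices reported in the text, judged against the cut-offs
# CFI/NFI/TLI > .95 and RMSEA < .06.
fitMeasures(fit_3f, c("chisq", "df", "cfi", "tli", "nfi", "rmsea"))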

To analyze convergent validity evidence, the AVE was estimated as described in Fornell and Larcker (1981). Values of AVE ≥ .5 were considered acceptable indicators of convergent validity evidence. To determine whether the items that are manifestations of a factor were not strongly correlated with other factors, discriminant validity evidence was assessed. Acceptable discriminant validity evidence was assumed when, for two factors x and y, AVEx and AVEy ≥ ρ2xy (the squared correlation between factors x and y), or when the HTMT (Henseler et al. 2015) ratio of correlations is lower than .85 (Kline 2016).
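A short sketch of how these two criteria can be computed with the semTools functions mentioned in the Data Analysis section; factor and variable names follow the assumptions of the earlier sketches.

# AVE per factor: the "avevar" row of semTools::reliability().
rel <- reliability(fit_3f)
rel["avevar", ]  # convergent validity: AVE >= .50 desired

# HTMT ratios between factors; values below .85 support discriminant
# validity (Kline 2016).
htmt(usei_model, data = usei_df, ordered = usei_items)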

The reliability of the scores was assessed through internal consistency measures. The ordinal Cronbach’s alpha coefficient (α; Zumbo et al. 2007) and composite reliability (CR) were calculated. Since alpha provides evidence of a measure’s internal consistency only when the assumptions of the essentially tau-equivalent model are met (Revelle and Zinbarg 2009), the ordinal coefficient omega (ω) for each factor (Raykov 2001) and the hierarchical omega (ωh) coefficient (Green and Yang 2009; Kelley and Pornprasertmanit 2016; McDonald 1999) were also calculated. Higher alpha values are desirable, although excessively high values aren’t recommended, as they reveal unnecessary repetition and overlap among items (Streiner 2003). Values of CR ≥ .7 were considered satisfactory indicators of internal consistency (Marôco 2014). Omega indicates how much of the overall variance of the data associated with a factor is due to that specific factor; ω was calculated for each of the three factors. As regards ωh, a higher value indicates a stronger influence of the latent variable common to all of the factors, and that the observed scale scores generalize to scores on the common latent variable (Zinbarg et al. 2007). The second-order factor reliability was also calculated using the omega coefficient (Jorgensen et al. 2018).
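A sketch of the internal consistency estimates under the same assumptions; note that the exact row labels returned by semTools::reliability() vary by package version, and hierarchical omega for the total scale is obtained from the higher-order model (illustrated later in the Results).

# Ordinal alpha, omega variants and AVE per factor, plus total-score
# estimates when return.total = TRUE.
reliability(fit_3f, return.total = TRUE)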

The measurement invariance of the second-order model was assessed with the lavaan package (Rosseel 2012), and we established a set of comparisons within a group of seven different models, based on the recommendations for ordinal variables (Millsap and Yun-Tein 2004) and for second-order models (Chen et al. 2005). An initial configural model was set, which served as a baseline (configural invariance) for further equivalence testing (Edwards et al. 2017). Next, metric invariance of the first-order factor loadings was tested, with the items’ loadings forced to be equal across groups; this assessed whether the subgroups attribute the same meaning to the different instrument items. The next step consisted of forcing the second-order factor loadings to be equal across groups; this checked whether the subgroups give the same meaning to the factors that compose the second-order latent factor. Afterwards, scalar invariance of the first-order factors was tested, where the items’ thresholds were constrained to be equal across groups (Millsap and Yun-Tein 2004). If scalar invariance is obtained, the means or thresholds of the items are also equal across subgroups, enabling comparisons between the different subgroups. Next, scalar invariance of the second-order latent factor was tested, where the intercepts of the first-order latent variables were forced to be equal across groups; this checked whether the first-order latent levels were equal across groups. Usually this is sufficient for measurement invariance, since the next levels are too restrictive (Marôco 2014). Afterwards, the disturbances of the first-order factors were constrained to be equal across groups, to verify whether the explained variances of the first-order latent factors were equal across groups. Finally, if the residual variances could also be constrained to be equal across groups without statistically significant differences, full uniqueness measurement invariance was obtained, meaning that the explained variance for all items didn’t change across subgroups (van de Schoot et al. 2012). Invariance across the different levels can be assessed using two different criteria: ΔCFI < .01 between the constrained and free models (Cheung and Rensvold 2002), and a non-significant Δχ2 test comparing the fit of the constrained vs. free models (Satorra and Bentler 2001).
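The sketch below illustrates the general mechanics of this sequence for the first-order model only, using semTools::measEq.syntax(); the grouping variable (gender) is a placeholder added to the simulated data, and the full seven-step second-order sequence would add the second-order loading, intercept, disturbance and residual constraints described above.

# Placeholder grouping variable for the example.
usei_df$gender <- sample(c("female", "male"), nrow(usei_df), replace = TRUE)

inv_fit <- function(eq = "") {
  measEq.syntax(configural.model = usei_model, data = usei_df,
                ordered = usei_items, group = "gender",
                group.equal = eq, return.fit = TRUE)
}

configural <- inv_fit()                             # baseline model
metric     <- inv_fit("loadings")                   # equal loadings
scalar     <- inv_fit(c("loadings", "thresholds"))  # equal thresholds too

# Delta-chi-square tests between nested models...
lavTestLRT(configural, metric, scalar)
# ...and delta-CFI: invariance retained when the drop is below .01.
sapply(list(configural, metric, scalar), fitMeasures, fit.measures = "cfi")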

Results

Items’ Distributional Properties

Summary measures, skewness (sk), kurtosis (ku) and a histogram for each of the 15 items are presented in Table 1 and were used to judge distributional properties and psychometric sensitivity. Absolute values of ku smaller than 7 and of sk smaller than 3 were considered an indication of no strong deviation from the normal distribution (Finney and DiStefano 2013). Mardia’s multivariate kurtosis for the 15 items of the USEI was 37.5 (p < .001). All possible response categories were observed for every item, and no outliers were deleted. Also noteworthy is the reduced number of missing answers across the 15 items (at most 11 omissions, on item 10, “My classroom is an interesting place to be”).

Table 1 Distributional properties of USEI’s items.

The items’ distributional coefficients are indicative of appropriate psychometric sensitivity, as these items would be expected to follow an approximately normal distribution in the population under study. Despite these acceptable univariate and multivariate normality indicators, the WLSMV estimator was used to account for the ordinal level of measurement of the items; this estimator makes no normality assumptions about the items, so it can be applied here without concern.

Factorial Validity Evidence

In light of previous research on the USEI structure confirming the existence of three dimensions, a confirmatory factor analysis was performed. The hypothesized three-factor model’s fit to the data was good (Fig. 1; correlations between latent variables and factor loadings for each item are shown), since CFI, NFI and TLI values were greater than .95 and the RMSEA value was less than .06. Notably, the factor loadings of all items were greater than .50, except for item 6 (the only reverse-coded item in the instrument).

Fig. 1

Confirmatory factor analysis of the University Students Engagement Inventory (15 items) with first-year Portuguese university students (χ2(87) = 286.665, p < .001, n = 871, CFI = .987, TLI = .985, NFI = .982, RMSEA = .051, P(RMSEA ≤ .05) = .356, 90% CI ].045; .058[). R – Reversed

Convergent Validity Evidence

The average variance extracted (AVE) was acceptable for EE (.54), nearly acceptable for CE (.49) and low for BE (.31). The convergent validity evidence was acceptable for the CE and EE factors and unsatisfactory for the BE factor.

Discriminant Validity Evidence

Comparing the three dimensions: the AVE for EE (AVEEE = .54) was greater than r2BE.EE (.36), but the AVEBE (.31) was lower; the AVECE (.49) and AVEEE (.54) were both greater than r2EE.CE (.24); and the AVEBE (.31) and AVECE (.49) were both less than r2BE.CE (.52). The discriminant validity evidence was therefore good for CE and EE, insufficient for BE and EE, and poor for BE and CE. With regard to the HTMT criterion, HTMTBE.EE = .60, HTMTBE.CE = .73 and HTMTEE.CE = .51, all below the recommended threshold. Together, these findings indicate strong correlations/overlap among the three latent constructs, pointing to a possible higher-order latent factor.

Second-Order Construct

We tested the possible existence of a higher-order latent variable, the meta-construct academic engagement, which was hypothesized by the original authors (Marôco et al. 2016) and also suggested by our lack of discriminant validity evidence. For the USEI with a second-order latent factor, the overall goodness-of-fit indices were good (Fig. 2; the gammas between the second-order latent factor and the first-order latent factors, and the factor loadings for each item, are shown). The structural weights for the academic engagement second-order factor model were medium to high: behavioral engagement (γ = 0.93; p < .001), emotional engagement (γ = 0.64; p < .001) and cognitive engagement (γ = 0.77; p < .001).
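A sketch of the second-order specification follows, reusing the earlier illustrative names (AE stands for the assumed academic engagement factor). Note that with only three first-order factors the second-order structural part is just-identified, which is why the global fit statistics in Figs. 1 and 2 coincide.

# First-order factors load on a single higher-order factor.
usei_model_2nd <- paste0(usei_model, '
  AE =~ BE + EE + CE
')

fit_2nd <- cfa(usei_model_2nd, data = usei_df,
               estimator = "WLSMV", ordered = usei_items)

# Standardized second-order loadings (the gammas reported above).
subset(standardizedSolution(fit_2nd), lhs == "AE" & op == "=~")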

Fig. 2

Confirmatory factor analysis of the University Students Engagement Inventory (second-order model – 15 items) with first-year Portuguese university students (χ2(87) = 286.665, p < .001, n = 871, CFI = .987, TLI = .985, NFI = .982, RMSEA = .051, P(RMSEA ≤ .05) = .356, 90% CI ].045; .058[). R – Reversed

Reliability: Internal Consistency Evidence

In terms of the hypothesized reliability evidence, the results suggest good internal consistency (Table 2). The alpha values were higher than the omega values for all factors and for the total scale. The hierarchical omega for the total scale was good (ωh = .85), suggesting a well-defined latent variable that is likely to be stable across studies, and that the general factor, academic engagement, is the dominant source of systematic variance (Rodriguez et al. 2016).

Table 2 Internal consistency of USEI dimensions for the total sample

The internal consistency reliability of the second-order latent variable was good. The proportion of observed variance explained by the second-order factor after controlling for the uniqueness of the first-order factors (ωpartial L1) was .87; the proportion of variance at the first-order factor level explained by the second-order factor (ωL2) was .87; and the proportion of the total score variance explained by the second-order factor (ωL1) was .72.
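These three coefficients correspond to the output of semTools::reliabilityL2(), sketched below under the same illustrative assumptions (fit_2nd and the AE factor from the earlier sketch).

# omegaL1: second-order factor explaining the total score;
# omegaL2: second-order factor explaining the first-order factor level;
# partialOmegaL1: controlling for first-order uniqueness.
reliabilityL2(fit_2nd, secondFactor = "AE")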

Measurement Invariance by Gender and Scientific Area of Graduation

Finally, to detect whether the same second-order latent model holds across scientific areas of graduation and genders, a set of nested models with indications of equivalence is needed. The hypothesized full-scale invariance was supported for gender (Table 3) using the Cheung and Rensvold (2002) ΔCFI criterion, while the Δχ2 criterion supported only second-order metric invariance. Regarding the hypothesized structural invariance among different areas of study, full-scale invariance was supported by the ΔCFI criterion; the ΔCFI value for the first comparison was marginal (.010), but since the Δχ2 test supported it, we continued with the comparisons. The Δχ2 criterion supported only first-order metric invariance (see Table 4). In both cases, the ΔCFI criterion was preferred, since the Δχ2 is too restrictive (Marôco 2014).

Table 3 USEI’s models comparisons for gender
Table 4 USEI’s models comparisons for scientific area of graduation

Discussion

Hypotheses Findings

This study obtained findings that confirm our H1, since the data gathered with the USEI presented good psychometric properties in terms of validity evidence based on the internal structure, as observed in other studies with this instrument, which obtained acceptable to good overall goodness of fit (Costa et al. 2014) and good overall goodness of fit (Marôco et al. 2016). The confirmatory factor analysis presented good evidence of factorial validity, since the goodness-of-fit index values ranged from very good to good, and only item 6 had a lambda of less than .50. Analyzing its content, item 6 is the only reverse-coded item, which suggests that it should be presented in the same direction as the other items in the future. Marôco et al. (2014) report this kind of improvement in the items’ correlations for student burnout (a construct opposite to academic engagement). The USEI’s convergent validity evidence is acceptable: the AVE values were good for the EE dimension, marginally acceptable for CE and less than acceptable for BE. Overall, these values show that the items of each dimension were good manifestations of the factors they load onto. The discriminant validity evidence of the instrument was acceptable for two of the three factors. The lack of discriminant validity evidence for BE may be due to our sample being composed only of freshmen; in the original USEI study (Marôco et al. 2016), with students from other academic years, this lack of discriminant validity evidence was not observed.

Our H2 was also confirmed; this hypothesis had previously been tested by the original authors, and our results align with theirs in terms of structural weights, with behavioral engagement having the highest gamma, followed by cognitive engagement and finally emotional engagement (Marôco et al. 2016).

With regard to H3 and H4, our results provide evidence that comparisons can be established between male and female students using the USEI, and between first-year students from technology or engineering, economics or law, and languages or humanities courses. This finding is a novelty of our study and is useful because previous studies either assessed only engineering students (Costa et al. 2014; Costa and Marôco 2017) or, even with samples from different courses, didn’t test measurement invariance for the scientific area of graduation (Marôco et al. 2016). Another novelty of this study was the test of second-order measurement invariance: the only previous study that tested measurement invariance with this instrument (Marôco et al. 2016) did so only to compare the structure between two independent samples, without comparing specific scientific areas of graduation, and only for the first-order model. This finding will enable future comparisons among these different groups to verify possible differences and their impact on academic adjustment and achievement.

With regard to the evidence obtained about reliability, it was good for CR, ordinal α, ordinal ω and ωh, suggesting adequate reliability of the data measured with the USEI. Our results confirm our H5 and are aligned with what was found in other studies, where BE obtained lower reliability estimates than EE and CE (Costa et al. 2014; Marôco et al. 2016).

Academic engagement is a relevant construct for describing student adaptation and achievement in higher education. Engaged students tend to invest more in their performance and to develop strategies to persist in and self-regulate their learning (Christenson and Reschly 2010; Dılekmen 2007; Fredricks et al. 2011; Klem and Connell 2004). Consequently, better academic success is expected (Lee 2014). In the literature, some consensus exists in defining academic engagement as a multidimensional construct integrating behavioral, emotional and cognitive dimensions (Fredricks et al. 2004). Our data from the USEI confirm these three dimensions for describing students’ academic engagement, although the second-order construct (academic engagement) presents a higher path loading on the behavioral dimension (γ = .93) than on the emotional and cognitive dimensions (γ = .64 and γ = .77, respectively). These differences are pertinent and in line with expectations, since our sample consisted of freshmen. The literature suggests that first-year students have less maturity and autonomy to cope with the challenges of higher education (Bernardo et al. 2017; Pascarella and Terenzini 2005). First-year college students express their academic engagement more in behavioral terms, which can be seen in academic routines and tasks (e.g. attending classes, group assignments).

Based on the foregoing discussion, we conclude that the USEI presents good validity evidence about its internal structure, presenting promising results for future studies related to other sources of validity and different university students’ samples. This instrument can become an interesting tool for education and psychology researchers for analyzing the relationship between the different types of academic engagement and other personal and academic variables important for students’ adjustment and academic achievement.

Although domain-specific subject areas aren’t included in the instrument, they may contribute to understanding the extent to which engagement is content-specific and the extent to which it represents a general engagement tendency (Fredricks et al. 2004). Since this study was carried out with a sample of university students from different courses, a different version for each course was neither desirable nor practical, given time and resource constraints. For understanding and studying a specific academic engagement dimension in depth, this kind of more inclusive instrument might be insufficient; however, if the goal of the study is to obtain a single measure for each of the three types of academic engagement, this instrument may be a good choice, since it addresses each construct with few items. The final choice rests with the researcher.

Theoretical Implications

This study presents some theoretical findings that can enable a better understanding of academic engagement as a multidimensional construct. The USEI revealed a three-factor structure that appears to be indicative of a higher-order construct, academic engagement. This makes the USEI unique regarding the potential of its conceptualization of academic engagement as a meta-construct (Fredricks and McColskey 2012), one that is important to define well in terms of its subdimensions (Fredricks et al. 2016). The results emphasize that this conception of academic engagement works well across different scientific areas of college graduation. There are other subject-specific instruments (Kong et al. 2003; Wigfield et al. 2008), but the USEI has the particularity of being a general measure of academic engagement for university students.

This is the first report addressing the USEI’s validity for students majoring in different study areas. The behavioral and emotional components of academic engagement in this instrument didn’t present the desired discriminant validity evidence, which appears to be related to overlap in their content. Our validity evidence supports a consistent alignment with the academic engagement construct definition, showing good psychometric properties for the study sample. As a convergence or product of motivation and active learning behaviors, academic engagement is a relevant variable with a strong impact in predicting students’ persistence and success in completing their courses in higher education (Alrashidi et al. 2016; Barkley 2010; Kuh 2001).

Practical Implications

As for practical implications, the USEI can be considered a tool with good psychometric properties for measuring perceptions of academic engagement behaviors, emotions and internal cognitions in first-year university students. It is an instrument that was specifically designed for university students, and it is available for free. Measurement can be carried out across groups from different scientific areas of graduation and across genders without losing the desirable measurement invariance that enables direct comparisons of scores between those groups, something that hadn’t been shown before for gender or scientific area of graduation. Together, these findings bring confidence to the measures obtained using the USEI, given academic engagement’s predictive relation with other variables. For example, Costa and Marôco (2017) found that the emotional subdimension of academic engagement had a statistically significant relation with students’ dropout thoughts. Consequently, this is an important implication, since the USEI can be useful for assessing interventions targeting specific dimensions of students’ engagement. The USEI is particularly useful for measuring cognitive and emotional engagement, which are not directly observable (Fredricks and McColskey 2012). With the USEI, these subdimensions don’t need to be inferred from behavioral indicators or teacher rating scales, avoiding the potential inferential errors of those other methods (Appleton et al. 2006).

Self-report instruments have several advantages over other methods: they are practical and relatively low-cost tools for group or large-scale assessments (Mandernach 2015). They make it possible to obtain data over several waves and to establish different types of comparisons (e.g. between universities or courses). The large-scale assessment of academic engagement enables teachers, policymakers and administrative boards to assess students’ learning status and their academic life experiences (Coates 2005), making it possible to provide relevant instructional feedback to the institution’s decision-makers and to the students themselves regarding the measured constructs (Banta et al. 2009; Kember and Leung 2009). In this sense, due to its psychological and contextual nature and its complexity, academic engagement assessment should take a multidimensional approach considering behavioral, emotional and cognitive aspects (Alrashidi et al. 2016; Mandernach 2015). This multidimensional approach allows for differential analysis, for example of the levels and types of investment in relation to scientific areas, or of differentiated subgroups of students according to their socio-cultural origin or their vocational career projects. With the Bologna Declaration (1999), governments in European countries advocated for higher education to value and be based on the active participation of students in their skills development and learning. This perspective benefits from brief, multidimensional instruments that enable the large-scale assessment of students’ levels of academic engagement in its behavioral, cognitive and emotional aspects.

Conclusions

All the research hypotheses were confirmed, pointing to the validity evidence of the obtained findings, in line with previous studies (Costa et al. 2014; Costa and Marôco 2017; Marôco et al. 2016). There is evidence that the USEI is an appropriate psychometric instrument for the academic engagement framework adopted, which is multidimensional and comprises observable behaviors, emotions and internal cognitions. Thus, it can help to capitalize on the potential of academic engagement as a multidimensional construct (Fredricks 2015) with a higher-order dimension, academic engagement. Our findings bring clarity regarding the psychometric properties of this promising instrument, which can successfully measure the three different kinds of academic engagement from the most widely adopted theoretical framework; it is the first instrument that enables this for Portuguese university students. Due to its reduced number of items, the instrument is adequate for large-scale research on academic engagement; for practical purposes, at the intervention level, it can help identify dimensions where teachers and university staff can design interventions based on the specificities of each scientific area or student subgroup.

Future studies should address longitudinal research designs, such as longitudinal measurement invariance, as well as measurement invariance for public vs. private universities and for students from different graduation years, something that isn’t implemented as often as it should be, despite being a condition for making proper comparisons between different groups (Davidov et al. 2014). Future studies should also look at the transcultural validity of the USEI in languages other than the European/Brazilian Portuguese for which it was initially developed.

Other kinds of validity evidence should also be addressed, such as evidence based on relationships with measures of other variables like student achievement, dropout, burnout and well-being (McCoach et al. 2013). Our sample only included first-year students from a Portuguese public university; it is desirable that future samples include other and more diverse scientific areas of graduation, students from private universities and from different academic years, and students with other statuses (such as student workers).