Introduction

Approximately half of all students who enter a 4-year postsecondary institution will fail to complete a bachelor’s degree within 6 years of enrolling in higher education (National Center for Education Statistics [NCES] 2012). Moreover, students of color attain degrees at rates substantially lower than their White peers. For example, while 62 % of White undergraduates who begin college at a 4-year postsecondary institution will complete a bachelor’s degree within 6 years of matriculation, that figure is only 39, 40, and 50 % for their American Indian and Alaskan Native, Black, and Latino peers, respectively (NCES 2012). In addition, while Asian Americans exhibit high levels of educational success in the aggregate, many ethnic groups within the larger Asian American racial category face significant racial and ethnic disparities in college degree attainment as well. For instance, Vietnamese (26 %), Hmong (14 %), Cambodian (13 %), and Laotian (12 %) Americans hold bachelor’s degrees at rates far lower than the national average of 28 % (Museus 2013).

In light of the aforementioned low rates of bachelor’s degree attainment, it is important for higher education researchers, policymakers, and practitioners to better understand how they can construct institutional environments that maximize success among increasingly diverse student populations in college. To promote such understanding, new conceptual models and instruments for examining, understanding, and explaining the process by which college and university environments can and do influence the experiences and outcomes of racially and ethnically diverse student populations are warranted. Accordingly, the purpose of the current examination was to develop and test the validity of the Culturally Engaging Campus Environments (CECE) Scale among diverse college students.

The overarching question that guided the study is, to what extent is the CECE Scale a valid tool for measuring CECEs? Three more specific questions guided the inquiry: (1) What level of content validity characterizes the CECE Scale? (2) To what extent is the CECE Scale characterized by construct validity? (3) To what extent is the construct validity of the CECE Scale consistent between White students and students of color?

This inquiry contributes to extant literature in multiple ways. First, it is one of the first analyses to use the CECE Model as a conceptual lens in the analysis of campus environments. Thus, the current inquiry serves as an initial test of whether the CECE Model might be a useful framework for research on campus environments. Second, to the extent that the current study provides evidence of valid and reliable CECE constructs, it can result in a scale that can be utilized in future studies to examine the impact of culturally relevant and responsive campus environments on student outcomes. Therefore, the current examination contributes to existing literature by generating an analytic tool. Finally, because of the parallel analysis between White students and students of color, this investigation can provide insight into whether the CECE Model and Scale might be more or less relevant for students from different racial backgrounds.

In the next section, we provide a brief overview of traditional frameworks of student success and their limitations. Next, we discuss more culturally relevant concepts and perspectives that have been proposed in response to more traditional frameworks and their limitations. Then, we describe the CECE Model, which was proposed to incorporate the strengths and address the limitations of the aforementioned perspectives. Next, we discuss the CECE Scale, which is designed to measure the core environmental construct within the CECE Model, and the forms of validity of the CECE Scale that were the focus of this study. The remainder of this paper details our analysis of the content and construct validity of the CECE Scale among White students and students of color.

Traditional Frameworks of College Success

Several theoretical frameworks have been proposed to understand college student success (e.g., Astin 1993; Kuh et al. 2005; Tinto 1987, 1993). Some of the most commonly utilized frameworks of student success focus on explaining the nature of students’ connections to their institutions. For example, Tinto’s (1987, 1993) theory of student integration is arguably the most frequently cited model of college student success. Tinto argued that students’ levels of integration into the academic and social subsystems of campus are positively related to student persistence and degree completion. The integration model provided the basis for a significant body of literature that has advanced knowledge about the relationship between undergraduates and their respective colleges and universities.

Tinto’s (1987, 1993) student integration theory has also been thoroughly critiqued for its limited ability to explain success among diverse student populations (Attinasi 1989; Bensimon 2007; Braxton and Lien 2000; Braxton et al. 1997; Hurtado and Carter 1997; Museus 2011; Rendón et al. 2000; Tierney 1992, 1999). In his review of the literature, Museus (2014) summarized four main critiques of Tinto’s theory. First and foremost, the culturally biased foundations critique refers to the fact that Tinto’s theory is based on a set of cultural propositions, borrowed from anthropology, suggesting that students must sever ties with their pre-college cultures and integrate (i.e., assimilate) into the dominant cultures of their campuses to succeed (Attinasi 1989; Tierney 1992, 1999). Scholars have argued that this assumption is culturally biased because students of color are more likely to come from cultures that are significantly incongruent with the cultures of their campuses, and the integrationist perspective therefore assumes that these students must sever ties with their cultural heritage and communities to assimilate into the often White, middle-class cultures of their campuses.

Second, the self-determination critique is closely related to the culturally biased foundations critique and has to do with the notion that the integration perspective, and the research that it has catalyzed, focuses on student behavior while not giving adequate attention to the responsibility of institutions to facilitate student success (e.g., Bensimon 2007; Rendón et al. 2000). These scholars note that this lack of focus on institutional responsibility is particularly problematic for students of color because they, on average, are burdened with making a more significant transition and adjustment to predominantly White postsecondary environments.

Third, the integration viability critique refers to the fact that higher education researchers have questioned the viability of both academic (in general) and social (on commuter campuses) integration as valid and reliable predictors of college student success (e.g., Braxton and Lien 2000; Braxton et al. 1997; Hurtado and Carter 1997; Museus 2011). First, these researchers note that evidence in support of integration as a valid predictor of success is mixed at best (Braxton and Lien 2000; Braxton et al. 1997). Second, integration is often operationalized in ways that are relevant to the activities in which White students engage while excluding common modes of participation among students of color (Hurtado and Carter 1997). Third, it has been argued that the academic-social divide constitutes an artificial dichotomy because campus spaces, programs, and activities can include both academic and social elements (Museus 2011).

Finally, scholars have discussed the psychological dimension critique, or the notion that studies employing the integration model have disproportionately focused on participation in academic and social activities, while not giving sufficient attention to the reality that White students and students of color can experience the same activities in qualitatively different ways (e.g., Hurtado and Carter 1997). These researchers underscore the importance of understanding how students perceive their campus environments and their connections to those environments, rather than simply whether they engage in institutional activities.

Other traditional frameworks, such as those that focus on measuring and understanding student involvement and engagement, have also made substantial contributions to current levels of understanding regarding how student behaviors can contribute to success (Astin 1993; Kuh et al. 2005). While these frameworks have been invaluable in advancing understandings of college experiences and outcomes (Pascarella and Terenzini 2005), they share some of the limitations of Tinto’s theory. For example, it has been argued that researchers studying involvement and engagement often fail to adequately utilize a culturally relevant lens, do not give sufficient attention to institutions’ responsibility to cultivate culturally inclusive environments, and place greater emphasis on student behavior than on the qualitative psychological dimensions of students’ connections to college campuses (Dowd et al. 2011; Museus 2014).

Culturally Relevant Frameworks of College Success

In recognition of the limitations of the Tinto model, researchers have proposed more culturally relevant concepts and perspectives that challenge the aforementioned lenses that have dominated the discourse around student success (Cabrera et al. 1990, 1992; Guiffrida 2006; Hurtado et al. 2012; Hurtado and Carter 1997; Museus 2011; Museus et al. 2008; Museus and Quaye 2009; Nora et al. 1996; Rendón et al. 2000; Tierney 1992, 1999). These more culturally relevant concepts and perspectives can be categorized into three different groups (Museus 2014). First, scholars have offered model revisions, in which they have added constructs to the Tinto model to show that other factors (e.g., campus climate, finances, etc.) that are not included in the Tinto integration model are critical predictors of success (Cabrera et al. 1990, 1992; Museus et al. 2008; Nora et al. 1996). In the 1990s, these models were instrumental in challenging the limited explanatory power of Tinto’s model in studies of success among diverse populations, but they make incremental revisions and do not account for all of the aforementioned limitations of the integration model (Museus 2014).

Second, researchers have offered conceptual divergences from Tinto’s theory (Hurtado and Carter 1997; Museus 2011; Rendón 1994; Rendón et al. 2000; Tierney 1992, 1999). That is, scholars have created new concepts to challenge researchers and postsecondary educators to think in more culturally conscious ways about student success. For example, Tierney (1992) invoked the concept of cultural integrity, which suggests that institutions should create programs and practices that reflect and engage the cultural identities of college students of color to foster their success. In doing so, Tierney shifted the focus of student success discourse to institutions’ responsibility to offer culturally relevant education. While these conceptual divergences shift the discourse to be more culturally conscious, they do not provide more holistic explanations of how institutions shape students’ experiences and outcomes (Museus 2014).

Finally, scholars have offered new holistic perspectives (Hurtado et al. 2012; Kuh and Love 2000; Museus and Quaye 2009). Museus and Quaye (2009), for instance, utilized Kuh and Love’s (2000) cultural perspective of student success and the voices of 30 students of color at a predominantly White institution to generate a new empirically grounded intercultural perspective of minority student success. Their perspective suggests that cultural dissonance—the tension that students experience when they encounter new cultural knowledge—is inversely related to engagement and persistence. They also assert that the extent to which students of color connect with individual and collective cultural agents that validate their cultures and value academic achievement is positively related to student success. Museus (2014) argued that researchers have not adopted these holistic perspectives to the same extent that more traditional frameworks have been applied across the field because they do not provide easily quantifiable and testable models of college success. In response, he proposed the CECE Model, to which we now turn.

The Culturally Engaging Campus Environments (CECE) Model of College Success

The CECE Model emerged from the aforementioned traditional and more culturally relevant frameworks of success, the limitations of these perspectives, and over 100 qualitative interviews with racially and ethnically diverse students (Museus 2014). In addition, unlike other comprehensive culturally relevant perspectives of student success (e.g., Hurtado et al. 2012; Museus and Quaye 2009), the CECE Model utilizes 30 years of literature on diverse students’ success to outline a set of quantifiable elements of campus environments that research suggests are associated with success among diverse populations and a complex set of hypotheses that can be tested via quantitative methods (Fig. 1). The CECE Model suggests that external influences (e.g., family, employment, and finances), precollege characteristics (e.g., sex, race, age, and socioeconomic status), and campus environment factors influence individual experiences and outcomes in college. The focal point of the CECE Model posits that CECEs are associated with more positive individual factors (e.g., sense of belonging, academic dispositions, and academic performance) and eventual success (e.g., learning, persistence, and degree completion) (Fig. 2). Moreover, the Model indicates that there are nine characteristics of CECEs, which can be grouped into two distinct but interrelated categories: the first five indicators reflect the extent to which campus environments are culturally relevant, while the remaining four reflect the extent to which institutional environments are culturally responsive.

Fig. 1 The Culturally Engaging Campus Environments (CECE) model of college success

Fig. 2 The focal point of the Culturally Engaging Campus Environments (CECE) model

Cultural Relevance

There are five indicators that focus on the extent to which campus environments are culturally relevant to diverse student populations. First, cultural familiarity refers to college students’ opportunities to physically connect with various institutional agents (e.g., faculty, administrators, staff, and peers) who have backgrounds and experiences similar to their own. Second, culturally relevant knowledge emphasizes the importance of opportunities for students to learn and exchange knowledge that is relevant to their own cultural communities. Third, cultural community service refers to the opportunities that students have to engage in curricular and co-curricular efforts to positively transform and give back to their own cultural communities. Fourth, cross-cultural engagement refers to students’ opportunities to engage in meaningful interactions with peers of different cultural backgrounds that focus on solving real social and political problems. Fifth, cultural validation has to do with the extent to which students feel like their cultural backgrounds and identities are valued at their respective institutions.

A substantial body of work suggests that cultural familiarity (Guiffrida 2003, 2005; Kiang 2002, 2009; Museus 2008; Museus and Quaye 2009), culturally relevant knowledge (Museus et al., in press; Gonzalez 2003; Museus et al. 2012), cultural community service (Museus et al., in press; Eyler and Giles 1999; Museus 2008; Museus et al. 2012), cross-cultural engagement (Antonio 2004; Antonio et al. 2004; Astin 1993; Astin and Sax 1998; Chang 2001; Chang et al. 2004), and cultural validation (Gloria et al. 2005; Kiang 2002, 2009; Museus 2008, 2011; Museus et al. 2012; Museus and Quaye 2009; Museus and Ravello 2010; Rendón 1994; Rendón et al. 2000; Sedlacek 1987; Tierney 1992, 1999) are positively associated with students’ connections to their institutions and eventual success. However, while scholars have extensively examined the quantitative relationships between meaningful cross-cultural engagement and a variety of educational outcomes, such as civic responsibility, openness to diversity, and cognitive flexibility (e.g., Antonio 2004; Antonio et al. 2004; Astin 1993; Astin and Sax 1998; Chang 2001; Chang et al. 2004), most research that illuminates the impact of the other four indicators of cultural relevance on success is qualitative in nature. In fact, with few exceptions (e.g., Gloria et al. 2005), empirical studies that illuminate the impact of cultural familiarity, culturally relevant knowledge, cultural community service, and cultural validation on college success utilize qualitative approaches and analyze small samples. For example, multiple qualitative inquiries reveal that ethnic student organizations and ethnic studies programs have a positive impact on involvement and success because they constitute spaces of cultural familiarity, opportunities for exchange of culturally relevant knowledge, vehicles for cultural community service, and sources of cultural validation (e.g., Guiffrida 2003; Kiang 2002, 2009; Museus 2008). However, the findings of these studies are context-bound and focus on analyzing the experience of small samples of students of color at single institutions. Thus, the development of quantitative measures of these indicators and the statistical testing of their relationship with success outcomes are needed to draw conclusions about whether these indicators influence success among larger, more generalizable populations.

Cultural Responsiveness

The remaining four indicators focus on the extent to which campus environments are responsive to the cultural norms and needs of diverse populations. First, collectivist cultural orientations refers to the extent to which campus cultures are collectivist and collaborative, as opposed to individualistic and competitive. This indicator is based on the assumption that many students, especially students of color, originate from communities with more collectivist cultural orientations and might encounter increased challenges adjusting to and navigating colleges and universities with more individualistic orientations (Dennis et al. 2005; Thompson and Fretz 1991). Second, humanized educational environments refer to the extent to which students are able to connect with faculty and staff with whom they can develop meaningful relationships and who care about their success. Third, proactive philosophies refer to faculty and staff going beyond making information and support available to students and making extra efforts to bring that information and support to the undergraduates whom they serve. Finally, holistic support refers to the extent to which students have access to sources of support that they are confident will provide the information and support that they need or will serve as conduits to that necessary information and support.

Several qualitative studies have highlighted the impact of collectivist cultural orientations (Fullilove and Treisman 1990; Guiffrida 2006), humanized educational environments (Guiffrida 2003; Museus 2011; Museus and Neville 2012; Museus and Ravello 2010; Nora 2001; Nora and Crisp 2009; Rendón and Muñoz 2011), proactive philosophies (Jenkins 2006; Museus 2011; Museus and Neville 2012; Museus and Ravello 2010; Rendón 1994; Rendón and Muñoz 2011), and holistic support (Jenkins 2006; Guiffrida 2005; Museus 2011; Museus and Neville 2012; Museus and Ravello 2010; Rendón and Muñoz 2011) on student success. Museus and Neville (2012), for example, conducted a qualitative analysis of 60 undergraduates of color across six institutions and found that students described institutional agents who had a positive impact on their success as sharing common ground with students, humanizing the educational experience, espousing proactive philosophies, and providing holistic support. However, as is the case for the aforementioned indicators of culturally relevant environments, quantitative analyses of the impact of these indicators of cultural responsiveness on success outcomes are virtually non-existent. Thus, the creation of quantitative measures and the statistical testing of their relationship to success outcomes is important in efforts to advance knowledge about the extent to which these factors foster success among larger populations of diverse students.

Developing and Testing the Validity of the Culturally Engaging Campus Environments (CECE) Scale

The CECE Scale is based on the CECE Model and is designed to measure the nine variables that comprise the CECEs construct within the model. Higher education scholars have developed several scales that can be used to examine college students’ experiences and outcomes. For example, several scales that measure the campus climate have been utilized to analyze students’ perceptions of racial hostility, prejudice and discrimination, the impact of campus climates on student experiences and outcomes, and the climate for cross-racial interactions (e.g., Chang et al. 2006; Hurtado 1992; Hurtado and Carter 1997). In addition, scales have been created to measure student behaviors, such as student engagement (e.g., Kuh et al. 2005). However, none of these scales are aimed at measuring the extent to which campus environments are culturally relevant and responsive for diverse student populations. The current analysis aims to fill this gap in knowledge.

In order for questionnaires aimed at measuring educational phenomena to be deemed valid, several forms of validity must be established. The current study is focused on examining two critical forms of validity of the CECE Scale: content and construct validity. The content validity of a scale that is designed to measure a theoretical construct can be understood as the degree to which the scale encapsulates the domain and captures the various components of the construct (Rungtusanatham 1998). In contrast, construct validity refers to the extent to which an instrument accurately measures the constructs that it is intended to measure (Cronbach and Meehl 1955). As we discuss in the following section, our analysis employs several techniques to examine the content and construct validity of the CECE Scale.

Methods

This study was executed using survey research methods. First, based on the CECE Model, we developed an initial CECE Scale to measure the nine CECE indicators. Second, we examined the content validity of the CECE Scale and refined the scale items. Third, we utilized exploratory and confirmatory factor analysis techniques, for the purpose of cross-validation, to examine the scale’s construct validity, or the extent to which each set of survey items reliably measures the variable that it is intended to measure, among White students and students of color across three campuses.

Scale Construction and Content Validity Analysis

We developed an initial set of survey items that were intended to measure the nine CECE indicators mentioned above. We began with the article that initially introduced the CECE Model (Museus 2014), and reviewed the literature upon which the CECE Model is based (e.g., Guiffrida 2003, 2005; Kiang 2002, 2009; Museus 2008, 2011; Museus et al. 2012; Museus and Quaye 2009; Museus and Ravello 2010; Rendón 1994; Rendón et al. 2000; Sedlacek 1987; Tierney 1992, 1999). We utilized these literature reviews and findings across these studies to generate a set of 4–5 items that we believed spanned the domain of each respective construct. This item construction resulted in an initial 41-item scale. All of the survey items were measured on a Likert scale with five values: strongly disagree, somewhat disagree, neither agree nor disagree, somewhat agree, and strongly agree.

The content validity of the CECE Scale items was analyzed in two phases. In Phase I, we solicited the help of a panel of subject matter experts (SMEs), one common method for examining the content validity of a scale (Lawshe 1975; Wilson et al. 2012), and college students. In Phase II, we solicited the support of a second panel of SMEs and calculated content validity ratios (CVRs) for each item, another method that is widely adopted in content validation and that we explain below (Lawshe 1975).

Item Revision

The nine CECE variable names, their definitions, and the survey items intended to measure them were sent to a panel of five SMEs. We asked the SMEs to examine the items and provide feedback regarding whether (1) individual items measured their corresponding indicator, (2) the set of items for each indicator measured the full domain implied by it, and (3) the items were worded in a way that would allow students to provide accurate answers to them. In cases in which the SMEs felt that items were inadequate for any of the preceding reasons, they were asked to offer suggestions to help modify the items. We also piloted the survey items with 10 undergraduate students from racially diverse backgrounds and asked them to indicate whether (1) the questions were clear and easy to understand and (2) they had suggestions regarding how to improve clarity. Feedback from the expert panel and suggestions from the college students were utilized to revise the scale that was utilized in the remainder of the study. Specifically, each item was compared to each SME comment and modified, and this process continued until the item was congruent with all SME feedback. In addition, where SMEs identified gaps in the content covered within a particular construct, we added items to capture the area of the domain intended to be covered by that construct. This process resulted in a revised scale that included 54 items designed to measure the nine indicators.

Item Rating

In Phase II, we asked another panel of five SMEs to quantitatively score each of the 54 survey items on an adaptation of Lawshe’s 3-point content validity scale (1 = very important, 2 = useful but not important, and 3 = not important at all). Then, we utilized those ratings to compute CVRs to assess content validity (Lawshe 1975; Wilson et al. 2012). CVRs are values that are measured on a scale from −1 to +1, with +1 resulting when all SMEs rate the item as very important, 0 indicating that half of the SMEs rate the item as very important, and −1 resulting when all SMEs rate the item as useful but not important or not important at all. They are computed using the equation,

$$CVR = \frac{n_e - \frac{N}{2}}{\frac{N}{2}},$$

where $n_e$ is the number of SMEs who evaluated the respective survey item as very important and $N$ is the total number of SMEs on the panel.

Lawshe (1975) originally argued that, when a CVR is higher than the corresponding critical value in the CVR critical values table (Wilson et al. 2012), the SME ratings of an item as very important to the measurement of a construct are beyond a chance occurrence. However, it is important to note that, while CVR tables provide a benchmark for determining whether SME ratings of an item as essential to the measurement of a construct are beyond the statistical level of chance, researchers have noted that utilizing this table requires a very large number of SMEs (Schmitt and Ostroff 1986; Schultz et al. 2013). In practice, researchers have argued that a positive CVR constitutes evidence of the content validity of an item. In addition, the CECE constructs have only recently been proposed and have not been extensively studied using the language of the nine CECE indicators, suggesting that SMEs might not have in-depth knowledge of the specific CECE indicators. To minimize the likelihood of rejecting items that might be important and to ensure further analysis of all potentially important items in the subsequent construct validity analysis, we used positive CVRs as the criterion for retaining items.
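To illustrate the computation and the retention rule described above, the following minimal Python sketch (our own illustration; the function and item names are hypothetical and not part of the study’s materials) computes CVRs for a five-member panel and flags items with positive values for retention.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to +1.

    n_essential -- number of SMEs rating the item 'very important' (n_e)
    n_panelists -- total number of SMEs on the panel (N)
    """
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical counts of 'very important' ratings from a 5-member panel.
very_important_counts = {"item_01": 5, "item_02": 4, "item_03": 3}

# Retain items with a positive CVR, the criterion used in this study.
retained = [item for item, n_e in very_important_counts.items()
            if content_validity_ratio(n_e, 5) > 0]
print(retained)  # all three items retained (CVRs of 1.0, 0.6, and 0.2)
```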

In this phase, a new panel of five SMEs rated each of the 54 survey items on the aforementioned 3-point scale (very important, useful but not important, and not important at all). We computed the CVRs using the formula provided above. All 54 items resulted in a positive CVR (between 0 and +1) and were therefore retained for the analysis of construct validity.

Scale Dissemination and Construct Validity Analysis

The CECE Scale was administered at one campus on the East Coast and two campuses in the West. The East Coast campus is a 4-year urban research university with just over 12,000 undergraduates; one of the Western campuses is a rural community college enrolling just under 8000 undergraduates; and the other is a rural community college with an undergraduate student body of just under 2800. With regard to racial composition, at the time of the study, the undergraduate student body of the 4-year campus on the East Coast was 41 % White, 15 % Black, 12 % Asian American, 12 % Hispanic or Latino, and 2 % Multiracial, with 18 % reported as unknown and non-resident alien. The larger 2-year institution had an undergraduate student body that was 37 % Asian American, 30 % Multiracial, 12 % Hispanic or Latino, 10 % White, 7 % Native Hawaiian or Pacific Islander, and 2 % Black, with 2 % reported as unknown and non-resident alien. And the smaller of the 2-year campuses was 40 % Multiracial, 18 % White, 16 % Asian American, 14 % Hispanic or Latino, 10 % Native Hawaiian or Pacific Islander, and 1 % Black, with 1 % reported as unknown and non-resident alien.

We worked with the three participating institutions to distribute the survey on their respective campuses. All three participating institutions distributed the survey electronically via email to all enrolled undergraduates on their respective campuses. One week after the initial survey was disseminated, a follow-up email was sent to students, soliciting their participation in the survey. Finally, one week after the first follow-up email, a third email was sent to all undergraduates as a final solicitation of their participation in the study. A total of 499 undergraduates responded across the three campuses. It is important to note that the participating institutions informed us that many students on their campuses do not regularly check their university email accounts, and we did not have the capacity to track how many students were able to actually access the email, so accurate response rates could not be computed. However, researchers have questioned the assumption that small samples cannot generate statistics that accurately estimate population parameters (Groves 2006; Massey and Tourangeau 2013; Peytchev 2013). In fact, some researchers have analyzed national data and concluded that response rates are not necessarily determinative of generalizability and that, depending on institutional size, sample sizes between 25 and 75 can often generate fairly accurate estimates of institutional populations (Curtin et al. 2000; Fosnacht et al. 2013; Groves 2006; Locker 1993; Perneger et al. 2005; Pike 2012). Given the number of responses generated at each participating institution, we concluded that our sample could be used for the current analysis.

A comparison of demographic data from the target undergraduate population across the three participating campuses and our study sample is included in Appendix Table 3. The data presented in the appendix convey that key sample demographics were fairly reflective of the characteristics of the institutional populations. Asian American or Pacific Islander students (29 %), combined, constituted the largest racial group in the sample, followed by White (27 %), Multiracial (9 %), Latino (8 %), Black (5 %), and Native American (2 %) students. Compared to the populations across the three participating campuses, Asian American or Pacific Islander, White, and Native American students were slightly overrepresented, whereas Black, Latina/o, and Multiracial students were slightly underrepresented in our sample. However, each racial group’s share of the sample was within 5 percentage points of its share of the target population, with the exception of Multiracial students, whose share of the sample was within 7 percentage points of their proportion of the target population.

Women were slightly overrepresented in the participant sample (75 %) relative to their share of the target population (67 %), and men were slightly underrepresented (25 % of the sample identified as male, compared to 33 % of the population). With regard to age, similar to the target population, approximately 67 % of our sample was under age 25 and almost 33 % was age 25 or over. The sample was diverse socioeconomically, with 29 % originating from families earning between $0 and $20,000, 27 % coming from families earning between $20,001 and $40,000, 16 % from families earning between $40,001 and $60,000, and 28 % from families earning over $60,000 annually. Participants reported an average high school GPA of 3.22, and respondents were slightly more likely to have first-year (25.5 %) or sophomore (29.5 %) status than junior (22 %) or senior (23 %) status.

To test the construct validity of the CECE Scale, we employed factor analytic techniques. Specifically, we used both Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) for purposes of cross-validation. CFA is often adopted as the appropriate method of analysis when there is an underlying theoretical foundation for the hypothesized factor structure (Williams 1995). However, researchers have noted the value of EFA in evaluating scales in early stages of development because this type of analysis can assist in identifying items that might load on non-hypothesized factors (Kelloway 1995). Thus, we chose to use EFA to gain the most comprehensive understanding of the constructs on which each item loaded (both hypothesized and non-hypothesized) with half of our sample, and utilize CFA to test and (dis)confirm the hypothesized factor structure with the other half of our sample.

Exploratory Factor Analysis

The total sample (n = 499) was randomly divided into two subsamples of approximately equal size (n1 = 249 and n2 = 250). Both subsamples were balanced on the major demographic variables (i.e., race, age, gender, and income). We utilized the 54 survey items that resulted from the content validity analysis to conduct EFA with the first subsample (n1), and negatively worded scale items were reverse-coded. EFA was used to examine the extent to which all survey items loaded on all of the factors and the factors correlated with each other.
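As an illustration of how such a demographically balanced random split can be produced, the following Python sketch uses a composite stratification key with scikit-learn’s train_test_split; the file and column names are hypothetical, and this is only a sketch of the general approach rather than the procedure used in the study.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names: one row per respondent, with the 54
# CECE items plus coarse demographic bands (race, age group, gender, income).
df = pd.read_csv("cece_responses.csv")

# Composite key so both halves are balanced on the major demographic variables.
# Bands must be coarse enough that every stratum contains at least two cases.
strata = (df["race"].astype(str) + "_" + df["age_group"].astype(str) + "_"
          + df["gender"].astype(str) + "_" + df["income_band"].astype(str))

efa_half, cfa_half = train_test_split(df, test_size=0.5,
                                      stratify=strata, random_state=42)
```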

Twenty-two percent of the data were missing in the initial dataset, which is well within the range of missing data proportions commonly reported in educational and psychological studies (Enders 2003). Compared to other commonly used procedures (e.g., listwise deletion, pairwise deletion, mean substitution), multiple imputation techniques are especially effective for producing unbiased estimates of missing data, even if the data are not missing at random (Graham 2009; Graham et al. 2003; Schafer and Graham 2002). Thus, we replaced missing values using multiple imputation with maximum likelihood estimation in Mplus Version 7 (Muthén and Muthén 1998–2012), with 10 imputed datasets for each analysis.
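The imputation itself was carried out in Mplus. Purely as a generic illustration of multiple imputation, the sketch below generates 10 completed copies of a Likert item matrix with scikit-learn’s IterativeImputer (a chained-equations style imputer, not the maximum likelihood procedure used in the study); the function name is ours.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (activates IterativeImputer)
from sklearn.impute import IterativeImputer

def impute_m_datasets(item_matrix: np.ndarray, m: int = 10) -> list:
    """Return m imputed copies of a respondents-by-items matrix with np.nan gaps."""
    datasets = []
    for seed in range(m):
        imputer = IterativeImputer(sample_posterior=True, max_iter=25,
                                   random_state=seed)
        completed = imputer.fit_transform(item_matrix)
        # Round back to the 1-5 Likert metric and clip to the valid range.
        datasets.append(np.clip(np.rint(completed), 1, 5))
    return datasets
```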

Since the Likert-scale survey items are ordered categorical variables, weighted least squares means- and variance-adjusted (WLSMV) estimation was used. WLSMV is recommended for handling categorical data with many factors and relatively few indicators per factor, conditions consistent with our theoretical model; it performs better than the default maximum likelihood (ML) estimation under these conditions, and its results are similar to what would be expected from principal axis factoring (PAF) (Muthén and Muthén 1998–2012).
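Because the EFA was run in Mplus with WLSMV, which has no direct Python equivalent, the sketch below is only a rough stand-in: it fits a nine-factor EFA with an oblique rotation using the factor_analyzer package (assumed installed) so that factors are allowed to correlate, as in the study; simulated data replace the actual imputed subsample.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

# Simulated stand-in data with a weak common-factor structure; in practice
# this would be one imputed copy of the first subsample (249 x 54 items).
rng = np.random.default_rng(0)
common = rng.normal(size=(249, 9)) @ rng.normal(scale=0.6, size=(9, 54))
items = common + rng.normal(size=(249, 54))

efa = FactorAnalyzer(n_factors=9, rotation="oblimin")  # oblique: factors may correlate
efa.fit(items)

loadings = efa.loadings_               # 54 x 9 pattern matrix of factor loadings
variance = efa.get_factor_variance()   # sums of squared loadings, proportion, cumulative
```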

Reliability Analysis

After identifying the nine emergent factors mentioned above, we examined the factor loadings of each individual survey item and the reliability of each of the nine latent constructs in SPSS Version 22 (IBM Corp 2013). First, to determine which survey items to retain for each indicator, we omitted items with factor loadings below .40. Then, we conducted reliability analyses and computed alpha reliability coefficients for each of the nine factors. In cases in which the reliability statistics indicated that one or more survey items could be dropped to reduce the number of items associated with a factor and increase reliability, those items were omitted to generate a more parsimonious refined set of items for that factor. After these excess items were removed, reliability analyses were conducted again for each of the nine factors to generate the final reliability coefficients for each indicator.
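As a concrete reference for this reliability step, the following minimal Python sketch (ours, not the SPSS procedure itself) computes Cronbach’s alpha for a set of retained items and the “alpha if item deleted” values used to judge whether dropping an item would increase reliability.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a complete respondents-by-items matrix."""
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def alpha_if_item_deleted(item_scores: np.ndarray) -> list:
    """Alpha recomputed with each item removed, mirroring the SPSS column
    used to decide whether an item should be dropped."""
    k = np.asarray(item_scores).shape[1]
    return [cronbach_alpha(np.delete(item_scores, j, axis=1)) for j in range(k)]
```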

Confirmatory Factor Analysis

Next, we conducted CFA with the entire second subsample (n2) in Mplus Version 7 (Muthén and Muthén 1998–2012). Then, we split this second subsample into two groups, White students and students of color, and conducted a CFA with multiple-group comparison to examine whether the nine emergent factors held equally well for both groups. We omitted from the second subsample respondents who did not select a race or who selected “Other” but did not select any other race. Then, we split the remaining survey respondents into the two aforementioned racial groups. Because mixed-race college students can identify as both White and as students of color (Renn 2000), we utilized an approach that permitted these students to be included in both racial groups. First, we extracted the subsample of participants who identified as White (n = 62) from the omnibus sample for the White group. Then, we extracted the subsample of participants who identified as Asian American, Black, Latino, Multiracial, Native American, or Pacific Islander from the omnibus sample for inclusion in the students of color group (n = 108). Overall model fit was evaluated using χ2 statistics, other fit indices in the Mplus output, and the evaluation criteria provided by Hu and Bentler (1999): a Root Mean Square Error of Approximation (RMSEA) less than .06, a Comparative Fit Index (CFI) above .95, and a Tucker-Lewis Index (TLI, also known as the Non-Normed Fit Index [NNFI]) above .95 indicate good to excellent model fit.
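The CFA itself was estimated in Mplus with WLSMV. As a rough sketch of how the hypothesized nine-factor structure can be specified and evaluated against the Hu and Bentler (1999) thresholds in Python, the example below uses the semopy package (assumed installed; maximum likelihood estimation) with hypothetical item and file names that mirror the item counts reported in the results.

```python
import pandas as pd
from semopy import Model, calc_stats

# Hypothetical item names; the retained scale has 39 items across nine factors.
CFA_SPEC = """
CF  =~ cf1 + cf2 + cf3
CRK =~ crk1 + crk2 + crk3 + crk4
CCS =~ ccs1 + ccs2 + ccs3 + ccs4 + ccs5
CCE =~ cce1 + cce2 + cce3 + cce4 + cce5
CV  =~ cv1 + cv2 + cv3 + cv4 + cv5
CCO =~ cco1 + cco2 + cco3
HEE =~ hee1 + hee2 + hee3 + hee4
PP  =~ pp1 + pp2 + pp3 + pp4 + pp5
HS  =~ hs1 + hs2 + hs3 + hs4 + hs5
"""

second_half = pd.read_csv("cece_second_subsample.csv")  # hypothetical file

model = Model(CFA_SPEC)
model.fit(second_half)        # ML estimation, unlike the WLSMV estimator in Mplus
fit = calc_stats(model)       # includes chi2, RMSEA, CFI, TLI, among other indices

# Hu and Bentler (1999) criteria used in the study.
good_fit = (fit["RMSEA"].iloc[0] < .06
            and fit["CFI"].iloc[0] > .95
            and fit["TLI"].iloc[0] > .95)
```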

Exploratory Factor Analysis and Reliability Analyses Results

In this section, we present the results of our EFA and reliability analyses. The statistics shown in Table 1 include each emergent factor, the factor’s respective alpha reliability score, the survey items associated with each factor, and the factor loading for each survey item. As mentioned above, across the 10 imputed datasets generated through the multiple imputation procedure in Mplus, the EFA revealed nine factors, each corresponding with one of the nine CECE indicators. Cultural familiarity (CF) exhibited an alpha reliability score of .81 and was associated with three items. Culturally relevant knowledge (CRK) was associated with an alpha score of .91 and was comprised of four items. The reliability coefficient for cultural community service (CCS) was .92, and it was associated with five items. The cultural validation (CV) factor was also comprised of five items, and it was associated with an alpha reliability score of .91. Collectivist cultural orientations (CCO) exhibited an alpha score of .89 and were measured by three items. The analysis revealed that five items reliably measured meaningful cross-cultural engagement (CCE), which exhibited an alpha score of .90. The humanized educational environments (HEE) construct was associated with a reliability score of .87 and was comprised of four survey items. Proactive philosophies (PP) exhibited an alpha score of .87 and was comprised of five interrelated items. And, finally, holistic support (HS) was measured by five items and was associated with an alpha reliability score of .94. In sum, the EFA and reliability analyses generated nine factors associated with the nine CECE indicators and measured by a 39-item scale, and each of these nine factors was associated with high reliability in the first subsample.

Table 1 Survey items, alpha scores, and factor loadings from the exploratory factor analysis

Confirmatory Factor Analysis and Validation of the Factor Structure

The nine-factor structure emerging from the EFA and reliability analyses was further analyzed with the second subsample (n2) for purposes of cross-validation. With multiple imputation generating 10 imputed datasets, this CFA yielded excellent model fit (χ2 = 1204.80 with df = 745, p < .001; RMSEA of .05 with a 90 % confidence interval; CFI of .977; TLI of .975). Table 2 displays the results of the CFA model, including standardized factor loadings that resulted from this omnibus CFA and the invariance test discussed later in this section. As the table demonstrates, the empirical factor structure derived from the EFA of the first subsample was cross-validated by our CFA of the second subsample. All factor loadings resulting from the omnibus CFA ranged from .65 to .99, with all but one of these factor loadings being .76 or higher. As Table 2 shows, all of the standardized factor loadings emerging from the CFA were statistically significant at the .001 level. In sum, the results of this omnibus CFA confirm the nine-factor structure that emerged from the EFA.

Table 2 Standardized factor loadings from omnibus and invariance confirmatory factor analyses

Based on the omnibus CFA results, a multi-group CFA model was further tested to explore the measurement invariance of the factor structure among both White students and students of color. In the multi-group CFA model, all factor variances, loadings, and item residual variances were free to be estimated while the first item loading on each factor and the loadings for residuals were set to 1, which is the default in Mplus. All free parameters were constrained to be equal across the White student and students of color subsamples to examine factor invariance.

The multi-group invariance model still yielded excellent model fit results (χ2 = 2001.99 with df = 1551, p < .001; RMSEA of .048 with a 90 % confidence interval; CFI of .977; TLI of .976). Factor loadings resulting from the invariance test ranged from .55 to .98, and all but one loading was .70 or higher. These multi-group CFA results indicate that the nine-factor structure holds equally well across the White student and students of color subsamples. Given the excellent fit of this fully constrained invariance model, no partial invariance models needed to be tested.

Discussion

The current study examined the construct validity of the CECE Scale across racial groups. From this analysis, we draw three major conclusions. The first conclusion emerges from our analysis of the literature and underscores the need for more culturally relevant frameworks to examine and understand student success. Our review of the literature revealed that, for several years, higher education scholars highlighted the value of modifying Tinto’s model of student integration (e.g., Cabrera et al. 1999; Nora et al. 1996). More recently, however, scholars have recognized the value of developing, adopting, utilizing, and testing more culturally relevant and responsive frameworks that challenge traditional ways of thinking about student success and shed new light on the college experience (Dowd et al. 2011; Museus 2014; Museus and Quaye 2009). The generation of such culturally conscious models and the utilization of research to examine them are critical in efforts to advance current levels of understanding regarding how institutions shape the experiences and outcomes of racially diverse student populations.

Second, the current examination contributes to extant research by helping generate a scale that fills an existing gap in the higher education literature. Postsecondary education scholars have created and analyzed scales that measure aspects of the campus racial climate, but those instruments primarily focus on measuring perceptions of hostility in the climate, experienced prejudice and discrimination, or (dis)satisfaction with the campus climate (e.g., Cabrera et al. 1999; Hurtado 1992; Hurtado and Carter 1997; Museus et al. 2008; Nora et al. 1996); they do not thoroughly measure the extent to which campus environments are relevant and responsive to diverse student backgrounds. Higher education researchers have also constructed and utilized scales that measure student behaviors, such as integration, involvement, and engagement (e.g., Astin 1993; Kuh et al. 2005). Yet, these scales have been critiqued for not sufficiently taking into account the ways in which culture differentially shapes the experiences and outcomes of diverse populations (Dowd et al. 2011). Thus, there is a need for scales that capture the extent to which campus environments engage the backgrounds, communities, and identities of diverse populations, and our results suggest that the CECE Scale might fill this existing gap in higher education research and discourse.

Third, and closely related to the previous conclusion, is the conclusion that our results offer some initial evidence that the CECE Scale is a statistically valid tool for measuring the CECE indicators among both White students and students of color. This is the first analysis of the construct validity of the CECE Scale across racial groups. Thus, the findings contribute to existing knowledge by generating initial evidence validating the Scale and its utility in the examination of campus environments in postsecondary education.

Limitations of the Study

The current analysis has several limitations, and we highlight four of them herein. First, our response rate was low and generated a relatively small sample of 499 students across the three participating campuses. Although researchers have offered evidence that low response rates do not necessarily equate to biased survey results, low response rates that are coupled with self-selection bias can potentially influence estimates of population parameters (Curtin et al. 2000; Groves 2006; Locker 1993; Perneger et al. 2005). Second, given that our sample was drawn from three institutions, generalizations cannot be made beyond these campuses. Third, our data collection procedures only allowed us to capture a single snapshot of students’ perceptions of their campus environments and did not allow us to test the reliability or stability of the CECE Scale over time. Therefore, the results of our analysis should be interpreted with caution until they are replicated with different racial groups, with larger populations across a broad range of institutional types, and using longitudinal statistical techniques that allow researchers to assess the reliability of the constructs over time. Finally, although we analyzed two critical forms of validity, it is important to acknowledge that predictive validity is another important form of validity that must be examined in order to assess the validity of a new measurement scale. However, the predictive power of the CECE Scale in relation to several outcomes in the model (e.g., sense of belonging, self-efficacy, motivation, performance, persistence) was beyond the scope of the current study.

Implications for Research and Practice

The current investigation has several implications for higher education research and practice. With regard to research, the findings provide a foundation for future studies to examine the construct validity of the CECE scale. Our analysis was limited to 499 students across three institutions, and future examinations should analyze the construct validity of the CECE Scale with larger samples and across a variety of institutional contexts. Such analyses are critical in helping generate a more comprehensive understanding regarding whether the results of the current analysis are generalizable across larger populations and institutional contexts.

Future analyses should also utilize test–retest reliability analyses to extend knowledge of the construct validity of the CECE Scale. Indeed, it is possible that participants’ perceptions of their campus environments are malleable. Test–retest reliability analyses would permit researchers to examine whether the CECE Scale is reliable across time and would help offer additional insights into the validity of the scale and individual indicators within it.

As mentioned, analysis of predictive validity of the CECE scale was beyond the scope of this inquiry. However, it is important for future research to examine the extent to which the CECE Scale predicts a variety of intervening and long-term outcomes included in the CECE Model, including sense of belonging, self-efficacy, motivation, performance, satisfaction, learning, persistence, and degree completion. Such analyses are essential in determining whether the CECE Scale is, in fact, a useful tool for examining campus environments.

With regard to practice, our findings provide initial evidence that the CECE Model might constitute a useful framework for institutional agents (e.g., faculty, administrators, and staff) to understand key components of campus environments. Thus, educators should consider how the CECE Model and Scale might help guide their efforts to measure and understand their campus environments and how those environments shape the experiences of diverse populations. For example, faculty can utilize the framework to assess whether their curriculum and pedagogy are culturally relevant, which might mean ensuring that curricular content reflects the voices of diverse students in the class and validating the knowledge and perspectives that students bring with them to the classroom. It can also include assignments that allow students to connect with members of their own cultural communities, analyze issues related to their communities, engage in projects that give back to those communities, and participate in such activities with students who come from different cultural backgrounds. Student support programs can engage the CECE framework in the development of programming to maximize the likelihood that they are creating programs that are culturally inclusive. For example, educators might utilize the CECE indicators as a conceptual lens to understand how they might be able to create more collectivist, humanized, proactive, and holistic cultures and structures within their academic support units. This might mean placing emphases on teamwork and collaboration, developing meaningful relationships with students, pressuring students to take advantage of opportunities that are available to them (e.g., scholarships, study abroad, internships), and underscoring the importance of practitioners proactively serving as useful conduits for students to access larger support networks on campus.

Finally, postsecondary educators should consider the utility of the CECE Scale as an assessment tool. If administrators and faculty are committed to fostering the conditions to generate more positive experiences and outcomes among diverse college student populations, they should consider utilizing tools such as the CECE Scale to critically analyze the current conditions of their campuses. Institutional leaders can also utilize the results of such assessments to pinpoint hotbeds of culturally relevant and responsive practice, so that they can understand how to leverage and scale the work being done in these spheres of their campuses. And, such assessment data can be employed to pinpoint areas of growth, which can help administrators determine where to target resources to cultivate more culturally relevant and responsive policies, programs, and practices. Such assessment efforts are critical for institutions to focus and invest their energies and resources on fostering more culturally relevant and responsive environments.