Introduction

Learning communities create a confluence of scholastic and social interaction as they bring together students sharing the same interest under one umbrella. Learning communities often bring together groups of college students in cohorts who take linked courses together (Price, 2005). Colleges and universities are expanding learning communities’ implementation to forge closer bonds between students, students and faculty, and students and the institution (Price, 2005). The ultimate goal is that the students’ learning experience is enhanced when “the learning that evolved from these communities is collaborative, in which the collaborative knowledge of the community is greater than any individual knowledge” (Johnson, 2001, p. 34). Learning communities can have different configurations such as curricular learning communities, classroom learning communities, residential learning communities, and student-type learning communities (Lenning & Ebbers, 1999). However, what all of them have in common is that all are forms of communities of practice. Communities of practice (Wenger, 2011; Wenger & Snyder, 2000) are groups of people who engage in collective learning in a shared domain of interest.

Research has revealed that learning communities are an effective learning initiative that can improve student retention, raise academic achievement, diminish faculty isolation, and increase curricular integration (Lenning & Ebbers, 1999). However, while universities are promoting and investing in the concept of community to enhance the student learning experience, there is still a need for more evidence on their effectiveness, focusing on the student experience (Dawson et al., 2006). The majority of research has focused on institutional outcomes, but little has focused on how the learning community experience impacts the motivation, beliefs, and perceptions associated with student success (Barefoot, 2000; Browne & Minnick, 2005). To contribute to the body of research focused on students’ experiences in learning communities, this study identifies the effectiveness of a learning community that uniquely integrates four academic and socialization elements: (1) a learning community comprising shared coursework in the domain of data science; (2) a faculty-mentored, team-oriented data science research experience; (3) a living community in the same residence hall with social and academic activities; and (4) participation in a professional development seminar for career exploration in data science. This study identifies the psychosocial effects of participating in this learning community in terms of motivation and interest, research self-efficacy, sense of belongingness and socialization in this context, and career awareness in the field. The study has two specific aims. The first is to identify the psychosocial effects of participating in a residential learning community on students’ interest and motivation in pursuing research-oriented careers, research self-efficacy beliefs, sense of belongingness and socialization within the learning community, and career awareness in research-oriented fields. The second is to describe how students’ levels of research self-efficacy beliefs correspond to differential gains in career awareness, motivation and interest, and sense of belongingness and socialization in the learning community.

Learning Communities

Constructivist learning theories, including situated learning theory and communities of practice (Bielaczyc & Collins, 1999; Brown & Campione, 1994; Daly et al., 2003), define learning communities as curated academic and social learning experiences designed to support peer-to-peer and student-faculty interaction (Jessup-Anger, 2015). Learning communities, which have become a mainstay of the higher education landscape over the last three decades, exist in various forms (Fink & Inkelas, 2015). The earliest learning communities focused on providing bridge experiences to acclimate first-year students to college (Garcia, 1991; Tinto & Goodsell, 1994). Many learning communities in this tradition focus on supporting the retention and success of students from groups at risk of leaving college (Contreras, 2011; Tinto & Engstrom, 2008). Some studies have demonstrated greater retention, academic performance, and self-reported engagement among first-year students in learning communities (Baker & Pomerantz, 2000; Taylor et al., 2003; Tokuno & Campbell, 1992; Zhao & Kuh, 2004).

These positive effects have largely been attributed to enhanced social relationships among students (Carrino & Gerace, 2016; Franklin, 2000; MacGregor, 1991; Tinto & Russo, 1994; Zhao & Kuh, 2004). Studies have documented that learning communities foster increased student-to-student and student-faculty interactions (Baker & Pomerantz, 2000; Kuh, 2008; Pike, 1999). Indeed, these increased social interactions appear to generate a sense of belonging among students; learning community participants report greater feelings of affiliation and inclusivity (Dodd, 2002). Also, living or residential learning communities intentionally focus on combining students’ residence hall experience with curricular and co-curricular experiences. The goal is to build strong connections between the academic and social aspects of college life and at the same time enable formal and informal forms of peer learning (Inkelas et al., 2008).

Despite the abundance of research on the effects of learning communities, more work is needed that investigates the effectiveness of specific learning community initiatives. For example, most learning community research has focused on large first-year experience-based learning communities, whereas increasingly popular discipline-specific learning communities have received limited study (Dagley et al., 2016; Solanki et al., 2019). Dagley et al. (2016) reported that a residential learning community for STEM students led to greater first-year retention, long-term retention, and graduation than a comparison group. Solanki et al. (2019) showed increased academic performance (first-year GPA) among participants in a bioscience-focused learning community. Additionally, studies of learning communities have remained focused on quantifiable aggregate statistics such as GPA, retention, and graduation rates (Holt & Nielson, 2019; Lardner & Malnarich, 2009). More granular study characterizing differential outcomes of students within learning communities is mostly lacking.

The implications of the previous work on learning communities for this study’s design relate to measuring the psychosocial effects of participating in a living-learning community. The primary goal is to engage students in collaborative activities aimed at developing research and data science skills and increasing their awareness of career occupations in this field.

Psychosocial Effects of Participation in Learning Communities

Participating in learning communities provides students with learning opportunities to develop their knowledge and skills to solve real-world problems (Solanki et al., 2019). However, it is equally important for learners to build confidence in mastering a concept or skill, be self-motivated to learn, and apply the acquired knowledge in other contexts (Peters-Burton et al., 2015). Such skills can be generally characterized as forms of psychosocial factors or effects, herein defined as effects on one’s views or beliefs about the self, psychologically or socially (Carrino & Gerace, 2016). Specifically, we focus on psychosocial effects such as self-belief, motivation and interest, sense of belonging, and career awareness. For this, we grounded our study in Bandura’s theory of self-efficacy (1994). Students’ self-efficacy beliefs, interest, and motivation are highly related (Schunk et al., 2012). Self-efficacy is defined as “people’s beliefs about their capabilities to produce designated levels of performance that exercise influence over events that affect their lives” (Bandura, 1994, p. 71). Motivation refers to the activation to action; “the level of motivation is reflected in the choice of courses of action, and the intensity and persistence of effort” (Bandura, 1994, p. 71). Self-efficacy thus determines how individuals motivate themselves and consequently behave (Bandura, 1994). Self-efficacy beliefs include one’s ability to plan for and execute the steps necessary for future success (Bandura, 1977).

When students believe in their efficacy to achieve a task, they become motivated to perform in ways that make their achievement more likely (Bandura, 1977, 2006). Similarly, self-efficacy and sense of belonging are also known predictors of motivation and performance (Blaney & Stout, 2017; Walton & Cohen, 2007). Sense of belonging refers to a subjective personal sense of fitting in and being included as a valued and legitimate member of an academic discipline (Goodenow, 1993). Furthermore, students’ beliefs about their capabilities in disciplinary areas have also been shown to influence their career choice (Zeldin et al., 2008). These relationships can be explained under Bandura’s views that people’s motivation and actions are influenced more by what they believe than what is objectively true (Bandura, 1994).

Research investigating how self-efficacy and other psychosocial variables might jointly be affected by participating in learning communities includes the work of Freeman et al. (2008), who identified that participating in linked classes as part of a learning community positively influenced students’ attitudes, learning experiences, and intrinsic motivation in STEM. Similarly, in the context of a virtual learning community, Sun et al. (2012) hypothesized that task complexity and self-efficacy, two social learning factors, moderate the relationship between motivation and sustained participation. As part of their findings, the researchers identified that motivation significantly influenced sustained participation intention (Sun et al., 2012).

Previous work on living-learning communities (Parker & Ward, 2019) and on the effects of different forms of socialization on learning in higher education suggests the need for further research. This study contributes to research on disciplinary learning communities by investigating two research questions: (1) What are the psychosocial effects of participating in a residential learning community on students’ interest and motivation in pursuing research-oriented careers, research and data self-efficacy beliefs, sense of belongingness and socialization within the learning community, and career awareness in research-oriented fields? (2) How do students’ entering profiles regarding research self-efficacy beliefs correspond to differential gains in career awareness, motivation and interest, and sense of belongingness and socialization in the learning community?

Methods

This study used a pretest-posttest survey design to examine the effects of participating in a living-learning community consisting of a year-long, cohort-based program to introduce current statistics majors and interested students from other majors to data science practices, educational and career pathways in statistics and data science, and applied research utilizing statistics and data science.

Context and Participants

The study’s context focuses on a statistics and data science learning community at a large midwestern university in the United States of America, consisting of a year-long, cohort-based program. This program was conceived as a bridge program between the strong existing first-year experience at the university and the more variable second-year experiences as students move into more specialized courses in their disciplines. The program’s holistic approach includes (1) a learning community (block-scheduling) comprising three courses on computing with data, probability, and statistics; (2) a faculty-mentored, team-oriented, data science research experience involving applied investigations appropriate for sophomores; (3) a living community in the same residence hall floor, with social and academic activities to enhance the sophomore experience; and (4) participation in a professional development seminar each semester, to prepare for graduate school and career exploration in data science.

Participants in the study consisted of five cohorts of the learning community, each composed of twenty students. For each year of the program, students about to start their second year in the statistics major were invited to apply to be part of the learning community. About 55 students applied each year, and from those, 20 were selected to participate. There were no specific selection criteria, but for each candidate, administrators followed a holistic review process aimed at identifying those students for whom this would be an impactful and meaningful experience. Students in each cohort participated in a voluntary assessment at the start and the end of the semester. For this study, we considered all 100 students who participated in the learning community from the Fall of 2014 to the Fall of 2018. Of these 100 students, 84 responded to all the questions in the survey; therefore, these 84 students constitute the sample of our study.

Data Collection Method and Procedures

The data collection method consisted of a 41-question survey, herein called the Statistics and Data Science Learning Community Student Survey (SDSLCSS), that was administered to all participating students at the beginning and at the end of the academic year. The Likert-scale survey covered four main constructs: interest and motivation, research self-efficacy, sense of belongingness and socialization, and career awareness (Appendix Table 4). Two investigators developed this survey: an expert in statistical research and an expert in education research and evaluation. The survey instrument was created by adopting and adapting survey items from the literature on motivation, self-efficacy, undergraduate research, and career awareness. As the sample size for the study was too small to allow for construct validation using factor analysis (Anthoine et al., 2014; Williams et al., 2010), we used a content validity approach, described in the following section. Content validation is widely used in education and is particularly appropriate for the validation of new instruments (MacKenzie et al., 2011; Podsakoff et al., 2016). Further, instrument reliability was assessed by measuring the internal consistency of each construct, as described in the following section.

Content Validity Analysis & Internal Reliability for the SDSLCSS Scale

Content validation is an effective technique that requires experts’ judgment to validate the constructs or themes that emerge from an assessment or test (Crocker, 2001). Experts’ knowledge and proficiency can determine the construct relevance, representation, and quality of the instrument to measure a particular domain or dimension (García-Valderrama & Mulero-Mendigorri, 2005; Messick, 1995). The content validation was conducted in a two-step process. Three experts (two education researchers and one statistics researcher) were provided with the four domain definitions, the eleven subdomain definitions, and the items represented under each domain. The experts were asked to critically review each domain and subdomain definition and all items under each domain, and to independently score the items on a four-point relevance scale (1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, 4 = highly relevant) (Polit & Beck, 2006). Then two types of content validity scores were calculated: the content validity index for items (I-CVI) and the content validity index for the scale (S-CVI). The I-CVI is the proportion of experts who rate any given item as a 3 or 4 on a 4-point scale (DeVon et al., 2007, p. 158). The I-CVI scoring revealed that 41 items were rated as either 3 or 4 on the relevance scale by all three raters. The S-CVI, the proportion of valid items in the scale, is 0.86 (DeVon et al., 2007, p. 158). The CVI scores obtained for items (I-CVI = 1) and the scale (S-CVI = 0.86) meet the acceptable criteria for content validity in the case of three raters (Polit & Beck, 2006). Table 1 presents the domain definitions, subdomains, and the number of items. Also, refer to Appendix Table 4 to see the complete survey used for the study.
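
The I-CVI and S-CVI calculations described above can be illustrated with a minimal sketch. The ratings below are hypothetical and are not the study's data, and the function names are ours. The text does not say which S-CVI variant was used, so the sketch shows two common ones as an assumption: universal agreement (the share of items every expert rates 3 or 4) and the average of the item-level indices.

```python
def item_cvi(ratings):
    """I-CVI: proportion of experts who rate the item 3 or 4 on the 4-point scale."""
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)


def scale_cvi_ua(items):
    """S-CVI, universal-agreement variant: share of items with I-CVI equal to 1."""
    return sum(1 for ratings in items if item_cvi(ratings) == 1.0) / len(items)


def scale_cvi_ave(items):
    """S-CVI, averaging variant: mean of the item-level I-CVI values."""
    return sum(item_cvi(ratings) for ratings in items) / len(items)


# Hypothetical relevance ratings: one row per item, one column per expert.
example_ratings = [
    [4, 4, 3],
    [3, 4, 4],
    [4, 2, 3],  # one expert rates this item below 3
    [4, 4, 4],
]

print([round(item_cvi(r), 2) for r in example_ratings])
print(scale_cvi_ua(example_ratings), round(scale_cvi_ave(example_ratings), 2))
```

With three raters, a single rating below 3 lowers that item's I-CVI to 2/3, which is why agreement by all raters is commonly required for panels of this size.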

Table 1 Psychosocial effects of learning community participation: Domains and Subdomains

We calculated the internal consistency (Cronbach’s alpha) for the items within each of the four constructs measured with the survey instrument to account for reliability. Table 2 presents the results for internal consistency. According to the scores, reliability was considered acceptable. By the generally accepted interpretation of Cronbach’s alpha (α), α = 0.6 to 0.7 implies an acceptable level of reliability, and α = 0.8 or greater a very good level (Ursachi et al., 2015). Based on these thresholds, we concluded that the instrument as a whole (α = 0.88) and the three themes of motivation/interest, self-efficacy, and career awareness (see Table 2) demonstrated a very good level of internal consistency. For the sense of belonging theme, the alpha value demonstrated an acceptable level of internal consistency.
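
As an illustration of the internal consistency measure, the following sketch computes Cronbach's alpha from its standard definition, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The responses are hypothetical, not the study's data, and the function name is ours.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Transpose so that each entry holds all respondents' scores on one item.
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical Likert responses: five respondents, three items in one construct.
example_responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]

print(round(cronbach_alpha(example_responses), 2))
```

Alpha rises when items vary together across respondents, because the variance of the total score then exceeds the sum of the item variances.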

Table 2 Psychosocial participation effects domain reliability

Data Analysis

Descriptive and inferential analyses were used to describe the students’ psychosocial effects of a year-long learning community experience. Scores were rescaled between 0 and 1, and those were interpreted as follows: scores between 0 and 0.33 were interpreted as low or negative levels of psychosocial effects, scores between 0.34 and 0.66 were considered as moderate psychosocial effects, and between 0.67 and 1 were considered as high or positive levels of psychosocial effects.
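
The rescaling and interpretation bands can be sketched as follows. The text does not state the range of the Likert scale, so the 1 to 5 range below is an assumption, as is min-max rescaling as the method. The cut points place the unspecified gaps (for example, between 0.33 and 0.34) in the lower band.

```python
def rescale(score, low=1, high=5):
    """Min-max rescale a raw score to the 0-1 range (a 1-5 Likert range is assumed)."""
    return (score - low) / (high - low)


def interpret(score01):
    """Map a rescaled score to the low / moderate / high bands used in the text."""
    if score01 < 0.34:
        return "low"
    if score01 < 0.67:
        return "moderate"
    return "high"


# Hypothetical mean domain scores on the raw scale.
for raw in (2.0, 3.0, 4.0):
    print(raw, rescale(raw), interpret(rescale(raw)))
```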

Additionally, we used a clustering algorithm to identify groups of students that demonstrate high intra-class homogeneity and high inter-class heterogeneity (Battaglia et al., 2015). Clustering is an unsupervised machine learning technique, which is particularly useful for grouping unlabeled/unclassified data (Kogan, 2007). Prior studies have demonstrated the use of hierarchical clustering algorithms to group students based on their responses to a multiple-choice test (Ding & Beichner, 2009) and to survey instruments (Medová & Bakusová, 2019), identifying significant and useful clusters. For our study, we specifically used Ward’s minimum variance clustering (Ward Jr, 1963), a type of hierarchical clustering algorithm, to cluster students’ self-efficacy domain responses on the SDSLCSS and identify significant student groups from a limited sample size (N = 84). A prior study by Antonenko et al. (2012) demonstrated the use of Ward’s minimum variance algorithm to cluster a small sample of 59 students enrolled in an introductory instructional technology course. Specifically, Antonenko et al. (2012) used hierarchical clustering to divide the students into four groups of unequal sizes: Cluster 1 (n = 13), Cluster 2 (n = 7), Cluster 3 (n = 19), and Cluster 4 (n = 20). Further, they conducted a one-way ANOVA with cluster as the between-subject factor and problem-solving performance as the dependent variable. The results from Antonenko et al. (2012) revealed a significant difference among the clusters.

Hierarchical clustering is a commonly used method for small sample sizes. For example, Medová and Bakusová (2019) used Ward’s minimum variance clustering to group 30 in-service mathematics teachers based on their responses to a questionnaire related to teachers’ beliefs and current pedagogy. These studies also indicated that hierarchical clustering using Ward’s minimum variance is an appropriate method when there is no preconceived notion about the number of clusters that could be formed from a particular dataset. Since the analysis was exploratory and the data set was small, we found Ward’s minimum variance clustering algorithm appropriate for our study. The student responses to the questions related to the self-efficacy domain for T1 and T2 were then clustered using Ward’s minimum variance method. Ward’s minimum variance method follows an agglomerative strategy of hierarchical clustering: at every step, a metric is computed, namely the sum of squared Euclidean distances of each student response from its cluster’s mean. The algorithm then merges, at each step, the pair of clusters whose merger minimizes the increase in this sum. Ward’s minimum variance approach resulted in three clusters:

  • Cluster 1: the group of students that demonstrated high self-efficacy and high motivation and interest

  • Cluster 2: the group of students that demonstrated moderate self-efficacy and moderate motivation and interest

  • Cluster 3: the group of students that demonstrated moderate self-efficacy and high motivation and interest
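
The agglomerative procedure behind Ward's minimum variance clustering can be sketched in plain Python. This is a naive illustration on made-up two-dimensional scores, not the study's implementation or data. It uses the standard fact that merging clusters A and B increases the within-cluster sum of squared distances by |A||B|/(|A|+|B|) times the squared distance between their centroids. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be used.

```python
def centroid(points):
    """Mean vector of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(points, k):
    """Start with singleton clusters and merge the cheapest pair until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected by the deletion
    return clusters


# Hypothetical (self-efficacy, motivation) scores on the 0-1 scale.
example_points = [
    (0.75, 0.74), (0.78, 0.70),
    (0.64, 0.66), (0.62, 0.63),
    (0.63, 0.80), (0.65, 0.78),
]

for cluster in ward_clusters(example_points, 3):
    print(cluster)
```

The pairwise search makes this sketch slow for large samples, but it is adequate for a sample of the size considered here.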

To verify the number of clusters, we used the elbow method (Yuan & Yang, 2019). The elbow method aims to identify the optimal number of clusters sufficient to explain the variance in the observations; adding extra clusters does not significantly improve the model’s ability to explain variability in data. In our case, using the elbow curve, we found that three clusters were optimal (see Appendix 2). Further, based on a Welch’s Test (F(2, 47) = 42.46, p < .001), it was identified that the clusters were statistically significantly different within each domain and subdomain over time. Finally, survey question responses were rescaled to allow valid comparison across questions.
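
The idea behind the elbow method can be illustrated with a short sketch. The study read the elbow from a plotted curve; the rule used here, picking the number of clusters where the curve bends most sharply (largest second difference), is a simple heuristic we assume for illustration. The within-cluster sum of squares values are made up.

```python
def elbow_k(wss):
    """Return the k at the sharpest bend of the curve; wss[i] belongs to k = i + 1."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wss) - 1):
        gain_before = wss[i - 1] - wss[i]  # improvement from adding the k-th cluster
        gain_after = wss[i] - wss[i + 1]   # improvement from adding one more
        bend = gain_before - gain_after
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Hypothetical within-cluster sums of squares for k = 1..6.
example_wss = [10.0, 5.5, 2.0, 1.7, 1.5, 1.4]

print(elbow_k(example_wss))
```

In this made-up curve, going from two to three clusters still removes a large share of the variability, while each further cluster adds little.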

Results

The results are organized into two main sections. The first section addresses the first research question by providing an overview of trends regarding students’ psychosocial effects in a statistical learning community. The second section describes three distinct clusters of students based on their initial research self-efficacy beliefs who participated in the learning community and the different growth patterns of these student clusters during their participation in the learning community. Comparison of these clusters in the second section addresses the second research question regarding the role of research self-efficacy beliefs and their relationship to career awareness, motivation and interest, and sense of belonging and socialization in the statistics learning community.

Psychosocial Effects of Student Participation in a Statistical Learning Community

As shown in Table 3, at the beginning of the year (T1), the learning community students ranged between moderate and high in all psychosocial domain scores. Scores in the high range included research self-efficacy, motivation and interest, and sense of belonging. Only the career awareness domain had an average student score in the moderate range. By the end of the year (T2), the average score for participating students indicated high or positive knowledge and self-beliefs in career awareness, research self-efficacy, and sense of belonging and socialization. A paired t-test was used to compare domain scores from the two time-points. The analysis revealed that the mean scores of the data science learning community students increased significantly in all domains other than motivation and interest in research fields (Table 3). This domain stayed relatively constant over the year.
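
The paired t-test used for the T1 to T2 comparisons can be sketched as follows. The scores are hypothetical and the function name is ours. The statistic is the mean of the within-student differences divided by its standard error, with n − 1 degrees of freedom. A p-value would then be read from the t distribution, for example with `scipy.stats.ttest_rel`.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for matched pre/post scores."""
    differences = [after - before for before, after in zip(pre, post)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical rescaled scores for five students at T1 and T2.
example_t1 = [0.55, 0.60, 0.50, 0.65, 0.58]
example_t2 = [0.80, 0.85, 0.70, 0.90, 0.79]

t_statistic, degrees_of_freedom = paired_t(example_t1, example_t2)
print(round(t_statistic, 2), degrees_of_freedom)
```

Because each student serves as their own comparison, the degrees of freedom equal the number of students minus one.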

Table 3 Aggregate measures of psychosocial domains of student participation in a statistical learning community

Cluster Analysis of Students

Cluster analysis was used to uncover students’ subgroups with differential responses to participation in the learning community. Specifically, a cluster analysis on research self-efficacy scores uncovered three distinctive groups (see Fig. 1).

Fig. 1 Psychosocial effects on domain scores at T1 for each cluster

One group (Cluster 1, n = 41) started with high research self-efficacy, high motivation and interest, and a high sense of belonging. The second group (Cluster 2, n = 26) started with moderate research self-efficacy, moderate motivation and interest, and a high sense of belonging. The third group (Cluster 3, n = 17) started with moderate self-efficacy, high motivation, and a high sense of belonging. Each of the three groups, along with its post-participation psychosocial effects, is described in the following sections.

  • Cluster 1: High self-efficacy and high motivation and interest

Cluster 1, as observed in Fig. 2, is characterized by those students who, at the time they entered the learning community, primarily exhibited high levels of research self-efficacy (M = .75, SD = .06), high levels of motivation and interest (M = .74, SD = .13), and high levels of sense of belonging and socialization (M = .83, SD = .11). These students also exhibited a moderate level of career awareness in research (M = .56, SD = .19). After one year of participation in the learning community, these students reported the following levels on the psychosocial domains: research self-efficacy (M = .85, SD = .06), t(40) = 7.32, p < .001; motivation and interest (M = .75, SD = .15), t(40) = 0.43, p = .67; career awareness in research-related fields (M = .87, SD = .12), t(40) = 10.16, p < .001; and sense of belonging and socialization (M = .89, SD = .10), t(40) = 3.69, p < .001. Of these four domains, the increases were statistically significant for all except motivation and interest, which remained constant.

Fig. 2 Psychosocial effects experienced by students in Cluster 1 at T1 and T2

Appendix Table 5 shows the subdomain scores for the students in cluster 1 at both time-points and the change and statistical significance. The largest changes in research self-efficacy were related primarily to data analysis and interpretation skills and data communication skills, and overall research knowledge. Research planning skills, group work skills, and the ability to pursue a research-oriented career changed only minimally. Students in Cluster 1 also experienced large changes in career awareness. Regarding their sense of belonging and socialization, students experienced the largest gain in terms of a sense of belonging with their peers. Finally, it can be observed that this group of students started with high levels of motivation and interest, which remained unchanged throughout their participation in the year-long experience.

  • Cluster 2: Moderate self-efficacy and moderate motivation and interest

Cluster 2, as observed in Fig. 3, is characterized by those students who, at the time they entered the learning community, primarily exhibited moderate levels of research self-efficacy (M = .64, SD = .06), moderate levels of motivation and interest (M = .66, SD = .14), and high levels of sense of belonging and socialization (M = .76, SD = .09). These students also exhibited a moderate level of career awareness in research (M = .55, SD = .17). After one year of having participated in the learning community these students reported higher levels in the psychosocial domains of research self-efficacy (M = .71, SD = .04), t(25) = 5.87, p < .001, and career awareness in research-related fields (M = .71, SD = .15), t(25) = 4.67, p < .001. This group also demonstrated higher levels of sense of belonging and socialization (M = .80, SD = .09), t(25) = 1.91, p = .07, and lower levels of motivation and interest (M = .64, SD = .16), t(25) = −1.01, p = .32, but these changes did not reach statistical significance.

Fig. 3 Psychosocial effects experienced by students in Cluster 2 at T1 and T2

From Appendix Table 6, it can be observed that the largest changes in research self-efficacy were related primarily to the subdomains of research knowledge, data analysis/interpretation skills, and research communication. However, this group also experienced a statistically significant decrease in their perceived ability to pursue a research-oriented career. In general, students also experienced large changes in career awareness regarding research-oriented career opportunities, research in graduate school, and other career options they could specialize in. Regarding their sense of belonging and socialization, students experienced equal gains in terms of perceived institutional support, socialization, and sense of belonging with their peers, but no change in their perceived sense of belonging to the institution. Finally, it can be observed that this group of students started with moderate levels of motivation and interest, which decreased slightly, but not significantly, after the year-long experience.

  • Cluster 3: Moderate self-efficacy and high motivation and interest

Cluster 3, as observed in Fig. 4, can be characterized by those students who, at the time they entered the learning community, primarily exhibited moderate levels of research self-efficacy (M = .63, SD = .04), high levels of motivation and interest (M = .72, SD = .15), and high levels of sense of belonging and socialization (M = .76, SD = .09). These students also exhibited a moderate level of career awareness in research (M = .77, SD = .13). These students, after one year of having participated in the learning community, reported higher levels on each of the psychosocial domains including research self-efficacy (M = .85, SD = .06), t(16) = 11.42, p < .001, motivation and interest (M = .79, SD = .18), t(16) = 2.60, p = .02, career awareness in research-related fields (M = .87, SD = .10), t(16) = 9.05, p < .001, and sense of belonging and socialization (M = .90, SD = .11), t(16) = 4.52, p < .001. This is the only group of students who, on average, experienced statistically significant increases in the four psychosocial domains.

Fig. 4 Psychosocial effects experienced by students in Cluster 3 at T1 and T2

From Appendix Table 7, it can be observed that the largest changes in research self-efficacy were related primarily to the subdomain of research knowledge, followed by research communication skills. Gains were about the same for research planning skills, research application skills to solve real problems, and data analysis and interpretation skills. In general, students also experienced large changes in career awareness regarding research-oriented career opportunities, research in graduate school, and other career options they could specialize in. Regarding their sense of belonging and socialization, students experienced the largest gains in their perceptions of socialization and a sense of belonging with their peers, but no statistically significant gains regarding perceived institutional support. Finally, it can be observed that although this group of students started with high levels of motivation and interest, they still showed a statistically significant increase in this domain.

Discussion and Implications

Our study aimed to understand factors that contribute to students’ positive experiences in the context of a year-long, research-oriented living-learning community. In this context, positive experiences can be characterized by students’ overall high levels of interest and motivation, research self-efficacy beliefs, sense of belongingness and socialization, and moderate levels of career awareness of related fields. Overall, we identified that a learning community that integrates (1) block-scheduling comprising three courses on computing with data, probability, and statistics; (2) a faculty-mentored, team-oriented, data science research experience involving applied investigations appropriate for sophomores; (3) a living community in the same residence hall floor, with social and academic activities to enhance the sophomore experience; and (4) participation in a professional development seminar each semester, to prepare for graduate school and career exploration in data science, can result in positive experiences for the students. This positive trend can be observed in the overall increases in the whole sample’s average scores on the four psychosocial effects (see Appendix 6).

However, we also identified subgroups of students who, on average, entered the learning community with different levels of research self-efficacy beliefs, interest and motivation, and sense of belonging and socialization. For instance, regarding research self-efficacy beliefs, one group of students (Cluster 1) entered the learning community with high levels of self-efficacy and showed statistically significant increases at the end of the year-long experience. The other two groups of students (Cluster 2 and Cluster 3) entered the learning community with moderate self-efficacy levels and also experienced statistically significant increases in their research self-efficacy at the end of the year-long experience. However, Cluster 3, which started at the same self-efficacy level as Cluster 2, reached parity with the high self-efficacy Cluster 1 by the end of the year. In contrast, Cluster 2, despite its statistically significant increase, failed to reach even the starting level of Cluster 1. Additionally, the difference in the self-efficacy endpoints of Clusters 1 and 3 versus Cluster 2 may be attributed to a ‘ceiling effect’ in the self-efficacy questions.

A similar pattern played out in the sense of belongingness and socialization domain. Although all three clusters demonstrated statistically significant growth over the year, they did not have congruent responses. As in the self-efficacy domain, students in Clusters 2 and 3 entered the year at lower levels of sense of belonging than students in Cluster 1. By the end of the year, Clusters 1 and 3 had reached ‘near saturation’ levels of sense of belonging. Similar to the self-efficacy domain, students in Cluster 2, despite their improvement, did not reach the starting point of students in Cluster 1. Interestingly, part of these effects seems to be mediated by a change in the subdomains of sense of belonging. For this subdomain, Cluster 3 started in the moderate range with Cluster 2, but ended with a higher mean than Cluster 1, whereas Cluster 2 had a relatively modest improvement. This pattern continued within the sense of belonging to the institution subdomain, but in this category, all three clusters were relatively high, so the differences were muted. All three clusters reported similar feelings of institutional support.

Regarding career awareness, all three groups of students entered the learning community with very similar, moderate levels. By the end of the year, all the clusters had made substantial gains and reached high levels. As with the other domains, the ending levels of Clusters 1 and 3 were higher than those of Cluster 2, but unlike self-efficacy and sense of belonging, the career awareness of Cluster 2 finished higher than the starting point of Cluster 1.

Across the domains of research self-efficacy, sense of belonging, and career awareness, Clusters 2 and 3 both began at very similar levels and experienced statistically significant growth. However, Cluster 3 experienced much greater increases than Cluster 2 and reached parity with the higher-scoring Cluster 1. Understanding the differential response of these two clusters may offer valuable insight into which students benefit the most from learning communities and why. The motivation and interest domain may be informative; unlike in the other three domains, Cluster 3 started at a high level in this domain, closer to Cluster 1 than to the moderately scoring Cluster 2. Additionally, while Clusters 1 and 2 did not change, Cluster 3 increased and ended up as the highest-scoring cluster.

Perhaps the higher level of motivation and interest in Cluster 3 can explain why it grew more than Cluster 2 in the other domains. Expectancy-value theory describes the nature of achievement motivation (Wigfield, 1994). This theory posits that “individuals’ expectancies for success and the value they have for succeeding are important determinants of their motivation to perform different achievement tasks” (Wigfield, 1994, p. 50). Expectancies were defined as individuals’ anticipations that their performance will either succeed or fail (Atkinson, 1957). Expectancies thus encompass individuals’ beliefs about how well they will do on a task (upcoming, current, or in the future), as well as individuals’ beliefs about their current competence or ability (Eccles, 2005). Expectancies are associated with self-efficacy beliefs regarding individuals’ capabilities to accomplish a certain task, and they are measured in terms of individuals’ confidence in how well they can perform the task (Schunk & Pajares, 2009). Value, on the other hand, can take any of three forms: (a) attainment value refers to the importance an individual assigns to a particular task because that task represents the individual’s identity (Cooper et al., 2017) and defines their competence; (b) intrinsic value refers to the delight that an individual attains after accomplishing the task (Cooper et al., 2017); and (c) utility value, or usefulness, refers to how a task fits into an individual’s plans (Wigfield & Cambria, 2010). Utility value has been related to motivation because, by performing a useful activity, the activity itself becomes a means to an end, where such an end can be the attainment of a certain occupation (Wigfield & Cambria, 2010).

Through the lens of expectancy-value theory, perhaps the higher motivation and interest of Cluster 3 can be interpreted as greater utility value placed on the learning community. The similar initial sense of belonging scores argue against a difference in attainment value between the two clusters. The public perception of statistics and data science as a popular career may direct students to perceive participation in this learning community in terms of future plans. The substantial growth in career awareness of all three clusters is indicative of the occupational focus of the learning community. Additionally, differential growth in the research self-efficacy subdomains of self-efficacy to pursue a research career, research communication, and research applications is potentially instructive. Cluster 3 had very large gains in these three categories and ended up surpassing the higher-achieving Cluster 1. These three subdomains are all applied, career-focused skills, further reinforcing the interpretation of greater utility value for Cluster 3.

Other studies that have used expectancy-value theory as a mechanism to study disciplinary communities have identified relevant results. For instance, in the context of introductory biology courses, achievement-related behavior has been reported as a joint function of individuals’ expectancy of success and the subjective value placed on such success (Sullins et al., 1995). Specifically, as might be expected, biology majors were found to place higher subjective value on success in the course than non-majors. Additionally, subjective value significantly predicted students’ intent to enroll in future biology courses (Sullins et al., 1995). Similarly, a study of a virtual community found that motivation significantly influences sustained participation intention (Sun et al., 2012).

Participation in linked classes as part of a STEM learning community has been reported to positively influence students’ attitudes, learning experiences, and motivation in STEM (Freeman et al., 2008). Indeed, the motivational benefits of a curricular focus on student interests are the raison d’être of discipline-based learning communities. Notably, the motivation and interest subdomain scores for Cluster 2 stand out as the only subdomain scores that declined over the year. Meanwhile, in addition to having a higher starting motivation and interest level, the students of Cluster 3 were unique in experiencing growth in this domain. Perhaps these trends point to feedback loops between motivation and interest, engagement in the discipline, and the other psychosocial domains. As the students of Cluster 2 learned more about the discipline of research in the context of statistics and data science, they may have grown less motivated by and interested in the field, and therefore less engaged with the learning community, resulting in more limited growth in the other psychosocial domains. In contrast, the students of Cluster 3 may have experienced a virtuous cycle wherein higher motivation and interest in the discipline led to greater engagement and further growth in the other psychosocial domains. For example, the impressive growth by Cluster 3 in the aforementioned three applied research self-efficacy subdomains could reflect participation in extracurricular research or other activities by this group.

One major limitation of our study was the inability to match our survey results with student background data. Could the more moderate prior research self-efficacy of Clusters 2 and 3 reflect differences in student preparedness entering the learning community? The fact that all three clusters reported similar initial levels of career awareness argues against this possibility, but it remains an important avenue for further study. Similarly, data regarding demographic background or academic outcomes were not considered, which is our study’s second limitation. It would have been very valuable to understand whether there were differences in academic preparation associated with the three student clusters. Additionally, it would have been useful to know the extent to which self-reported self-efficacy scores aligned with GPA, undergraduate research participation, and other independent metrics of student academic success.

Implications for Learning

Our findings suggest that the alignment between the learning community’s curricular focus and the students’ motivation and interest may be instrumental in producing positive psychosocial effects on students, particularly for those who start with more moderate levels of self-efficacy and sense of belonging. This study has implications for implementing learning communities that intentionally integrate the four components of cognitive apprenticeships. Cognitive Apprenticeship (Collins et al., 1991; Collins & Kapur, 2014) describes the components of learning environments that promote the development of cognitive and metacognitive skills by merging “the content being taught, the pedagogical methods employed, the sequencing of learning activities, and the sociology of learning” (Collins et al., 1989, p. 3). Learning environments often integrate (a) the content, in the form of the types of knowledge required for expertise; (b) the method, regarding the learning strategies, pedagogical approach, or teaching methods used; and (c) the sequencing, regarding the structure and order of the tasks to optimize meaningful student engagement. A focus is less often placed on the sociology of learning. Sociology refers to the context within which learning experiences are situated via applying skills to realistic problems. Living-learning communities place a stronger emphasis on the sociology of learning. The additional value of living together in the same residence hall allows for further synergies between teams and other faculty mentors, resulting in stronger self-reported learning outcomes (Inkelas et al., 2008) and moderate to strong psychosocial effects, as shown in our study.

Conclusion and Future Work

This study investigated the effects of participating in a year-long, residential, research-oriented learning community on interest and motivation, self-efficacy beliefs, sense of belongingness and levels of socialization, and career awareness. Our analysis identified three groups of students before they entered the learning community and their changes after the year-long experience. Similarities among the three groups at the end of the year-long experience were that all of the students ended with high levels of research self-efficacy beliefs, high levels of sense of belonging and socialization, and high levels of career awareness. Differences among the three groups at the end of the year-long experience related to their levels of motivation and interest: one group reported high levels and stayed the same, a second group reported moderate levels and stayed the same, and the third group reported high levels and still experienced statistically significant increases in their levels of motivation and interest. These findings suggest that the intentional orchestration of the content, method, sequencing, and sociology aspects of the learning community, irrespective of students’ levels of motivation and interest, had an impact on their research and data self-efficacy beliefs, sense of belongingness and socialization within the learning community, and career awareness in research-oriented fields.

One limitation of our study relates to the self-selection process for participation in the program and the moderate to high levels of self-efficacy beliefs and motivation and interest that students reported before starting the program. A further limitation was the small sample size, which precluded the assessment of construct validity. However, we did validate our scale based on the principles of content validity. The sample size limitation also resulted in two clusters (i.e., Cluster 2 and Cluster 3) with fewer students within each of them. Nevertheless, the clusters were statistically significantly different from each other, as confirmed by Welch’s test.

Similarly, since the sample was a mixture of students from five implementations of the learning community, each of which may have differed from the others, this might have reduced the reference value of the results. Despite the limitations of our study, and although studies reporting on institutional outcomes of learning communities in higher education are growing, studies focused on students’ psychological factors are not common. Our study thus contributes to this literature by providing a specific example of a comprehensive learning community that integrated aspects of (a) shared coursework in the domain of statistics, probability, and data science; (b) a faculty-mentored, team-oriented data science research experience; (c) a living community in the same residence hall with social and academic activities; and (d) participation in a professional development seminar for career exploration in research-oriented fields, all providing a cognitive apprenticeship. Our future work will continue to investigate the characteristics of effective learning spaces that embed situated learning theory and communities of practice, along with their effectiveness on student learning, achievement, and attitudes.

Data Availability Statement

Data available on request from the authors.