
1 Introduction

“I believe that education is the fundamental method of social progress and reform.”

 ~ John Dewey (1897).

“There’s no such thing as neutral education. Education either functions as an instrument to bring about conformity or freedom.”

 ~ Paulo Freire (2000).

I begin this essay with two quotes, one from each of these pillars of modern educational philosophy, John Dewey and Paulo Freire. In the first, Dewey, the father of modern pragmatism, states in his early essay “My Pedagogic Creed” something we likely all take as benign and universally understood. And yet it would be difficult to articulate a more powerful sentiment. Education is simultaneously how we are and how we become, and the links between knowledge, experience, and social growth are inseparable. We and our destinies are shaped by education. Complementing and adding further context to Dewey’s assertion, Freire’s quote from Pedagogy of the Oppressed highlights the significance of making choices that liberate rather than shackle the learner. Liberating education, for Freire, is about individuals living up to their fullest potential, and it stands in contrast to “banking” constructs of education, which reproduce power dynamics and social injustices across generations. Taken together, these quotes indicate a path toward a better tomorrow: our past must inform but not dictate our future.

1.1 My Context

I grew up in Alaska in the 80s, and distinctly remember one winter afternoon in first grade, while my classmates were out building a fort and throwing snowballs, when I got into a fight with a classmate. We were both escorted to the office, and when we were brought before the principal, she produced a paddle and asked if we knew what it was. Of course, I’d heard stories about paddles. It’s doubtful any kids in my school hadn’t, but this was my first time actually seeing such a device in person. There were holes drilled through it, which I’d interpreted as a design enhancement to reduce air resistance and improve velocity. Someone had clearly put some considerable thought into engineering this apparatus to maximize its impact – both metaphorically and literally – on getting children to act differently. Ultimately, however, it was just a well-crafted stick.

I’m happy to report that I was not, in fact, paddled that day, and was released with a stern warning. (Many of my elders can attest to not getting off so easily with their school authorities in similar situations.) The incident, however, did have a long-lasting influence on how I think about the role of behaviorism in education, including punishments as well as positive and negative incentives, particularly when juxtaposed with my later academic and research experiences. The overarching framing to consider for the purposes of this discourse: How do we assure our modern intervention strategies, whether for behavioral or academic purposes, aren’t similarly judged as unethical or even abusive in retrospect? How do we employ powerful new tools of educational analytics and intervention with intention and precision toward holistic student growth? These are complicated questions demanding nuanced solutions along with thoughtful consideration not only of whether we can do something, but also of if and when we should do it. Both intentions and impacts must be considered carefully.

1.2 A Brief History of Our University’s Intervention Strategies

UMBC, a “mid-sized mid-Atlantic R1 institution” as I often refer to it at conferences, developed a data warehouse nearly two decades ago to address institutional reporting and accreditation needs, particularly related to student persistence, graduation rates, and time to graduation. This dynamic is of central importance to the eventual development of internal predictive models because it is typically much easier to work with the historic data that has already been ingested than to create new workflows to corral additional sources. In turn, our existing risk modeling is based on these scaled metrics rather than on direct measures of learning, such as one might assess with a test or rubric (Harrison 2020), for which there is no state-level reporting mandate.

Over the past decade, various pilot projects have emerged to gain real-time insights into our student population and inform behavioral nudging campaigns. Initially, these strategies were based on vendor-provided products. Using the outputs from these algorithms, we were able to identify some important gateway courses and hone strategies for student outreach, including use of messaging based on students’ predicted chances of success. We found, however, that the black-box architecture of these solutions limited our ability not only to know exactly what measures were in the model, but also to gain insights by modifying feature inputs. In short, we were at the mercy of the party that designed the product, and if we don’t know what’s going into a model, how do we know we aren’t unintentionally reproducing culturally biased outcomes (Cheuk 2021)? In the past several years, internal efforts have led to the development of our own set of predictive models and data visualizations. Building, testing, and deploying our own predictive models has created greater opportunities for customization, resulting in improved institutional fit along with a potential benefit for data-use transparency. However, the alignment between what is being predicted and the corresponding intervention strategies, as well as strategies for monitoring and evaluation, leaves important questions unanswered.

1.3 The Status Quo: “The Right Message to the Right Person at the Right Time”

“Our records indicate that you are failing Chemistry 101. You currently have a 59% in the class. There are only 3 weeks remaining in the semester, and you will need 100% for the remainder of the term to earn a C”.

This slightly fictionalized message has a very punitive feel, doesn’t it? My colleague Robert Carpenter has argued it’s a bit like receiving a speeding ticket (Carpenter and Penniston 2019). Indeed, such a nudge, an intentionally designed intervention that can produce a behavioral change in an individual or group, as popularized by the Nobel Prize-winning economist Thaler and his co-author Sunstein (2009), could have the unfortunate effect of pushing students away from class rather than redirecting them back to it. And who is this nameless, heartless harbinger from the void, and why couldn’t this notification have been sent earlier in the term? Because the absolute last thing educators want to tell students at the end of the term is that all they need to do to pass their course is to suddenly become something they haven’t been all semester: an A student.

Now let’s consider this second message, a version of which was actually sent to students:

“I know this time of year can be busy and stressful, and while the end of the semester seems like it may take forever to get here, it will arrive before you know it. I am checking in with you to make sure that you are okay and to offer you some resources and support if you need them”.

This second message attempts to articulate that the instructor cares about and values the student as a human. Accompanied by links to educational support services, sent early in the term, and signed by the course instructor, such a message may not be the kiss of death. There might – perhaps – be hope for the students who receive such a nudge?

2 Literature Review

Without digging into the diverse offerings from the humanities, there are three basic buckets of traditional scientific research: qualitative, quantitative, and mixed methods. Qualitative and quantitative methods are described in further detail below, while mixed methods combines aspects of both, with the intention of complementing meaningful statistical insights with rich contextual description (Johnson and Onwuegbuzie 2004).

2.1 Qualitative and Quantitative and Machine-Learning (Oh My!)

Qualitative research can take many forms, from focus groups to interviews and open-ended surveys, to content analysis, and so forth (Check and Schutt 2012). The resulting data are descriptive and interpretive in nature and explain the context surrounding various constructs. These analyses typically tell us what's going on with richer and more nuanced perspectives, but the data are too unwieldy to leverage for on-demand adjustments or intervention strategies.

Quantitative research, on the other hand, can be broken out into two primary branches: classical econometrics/statistics (Wooldridge 2009) and newer machine learning (ML) based methodologies (Hastie, Tibshirani and Friedman 2009). Within the former branch there is a continuum of possibilities, ranging from descriptive statistics to nonparametric and parametric inferential statistics, to causal and predictive modeling. These methods are helpful for considering the relationship between two or more variables and are particularly relevant for project monitoring and evaluation.

The key point here is that in using classical statistics, one typically starts with representative data and then attempts to make population-level inferences through hypothesis testing. ML, however, does not work this way and in many respects operates in the opposite direction.

ML approaches range from supervised to unsupervised (Hastie, Tibshirani and Friedman 2009). An easy way to conceptualize the two is deductive vs inductive. Supervised models start with an end point: one uses data to map, explain, or fit to a known outcome, such as predicting students’ actual negative outcomes like DFWs (i.e., a grade of D, F, or W), retention, or persistence.

Unsupervised models operate in the opposite way. They take data and put them into digestible chunks, or buckets. So, we can start with a trove of attributes, and where human eyes may not see them, discern patterns and relationships. In educational terms, it might help to think about this as distilling learning management system (LMS) data (e.g., Canvas and Blackboard interactions, or clicks on content) and student information system (SIS) data, such as bio-demographic indicators and past academic performance variables, in order to determine different typologies of students’ attributes and/or behaviors.
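For readers who prefer code to prose, here is a minimal sketch, in Python with scikit-learn, of the supervised/unsupervised distinction just described. The feature names and data are entirely hypothetical stand-ins for LMS and SIS attributes, not our production model.

```python
# Minimal sketch (not a production model): supervised vs. unsupervised learning
# on hypothetical LMS/SIS-style features, using scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Hypothetical features: weekly LMS clicks, assignment submissions, high school GPA
X = np.column_stack([
    rng.poisson(30, 500),        # lms_clicks
    rng.integers(0, 10, 500),    # submissions
    rng.normal(3.2, 0.5, 500),   # hs_gpa
])
y = rng.integers(0, 2, 500)      # 1 = DFW, 0 = A/B/C (toy labels for illustration)

# Supervised: map features to a known outcome (DFW vs. not)
clf = LogisticRegression(max_iter=1000).fit(X, y)
dfw_probabilities = clf.predict_proba(X)[:, 1]   # predicted chance of a DFW

# Unsupervised: no outcome label; group students into behavioral archetypes
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
```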

“Machine Learning” is a term that gets bandied about across traditional and social media quite frequently these days, but it may help some readers if we take a moment to demystify what the term actually means operationally, as a fundamental understanding of the methodologies is needed to recognize the potential pitfalls discussed in this essay. Sharkey (2015) has suggested that ML is akin to dropping a bunch of Plinko chips down a peg board. In a supervised model, we would take a sample of the chips, dump them down, use the resulting outcomes to predict the pathway of the other chips, and then map those results over to a wider population. And we can do this iteratively to improve a model’s recall and precision moving forward. In an unsupervised model, we could establish a predetermined number of perhaps three or four groups and then look at how the chips cluster on the board to establish archetypes based on the chips’ proximity to one another.

In general, with ML approaches, the more data the better the analyses. However, the quality of the data is also key; “garbage in, garbage out,” as they say. (There’s perhaps no better example of this dynamic than Microsoft’s spectacularly failed release of its Twitter bot, “Tay.”) If one uses ML to predict students’ chances of earning a preferred outcome of an A, B, or C in a given course during their first term at a university, then one needs some kind of data with which to develop and train a (supervised) model. In this case, we can look back to our Plinko chips, which include various indicators that can be mapped to the successful outcome: students’ standardized test scores and high school GPAs, for example. One can also look at when students registered for class (registering early is associated with positive outcomes, while registering late is associated with negative outcomes). Basically, anything that’s been included in the student application that does not directly contribute to unethical modeling. So, although one should steer clear of including markers like race/ethnicity, gender, and nationality – three of the most studied algorithm bias inputs (Baker and Hawn 2021) – in predictive models of student success due to justifiable legal concerns and widely held (and socially constructed) attitudes, one could include Census data with median household income by zip code to control for socioeconomic status (SES). Note that one typically exercises no meaningful locus of control over one’s childhood household SES, yet there is not the same degree of social concern about including this measure in a predictive model.

This dynamic raises the question, a key argument of this paper, of whether any attribute over which students have no direct control should be included in predictive modeling in education. After all of the model inputs are determined, one would engineer features. Features are just operationally defined variables, which may, for example, require one to transform nominal variables into binaries, standardize numeric variables as z-scores, or apply other similar manipulations to create a functional data frame. One can include numerous features, but, as with traditional econometrics, there is a risk of overfitting the model, which would reduce its generalizability when applied to a wider population (Hastie, Tibshirani and Friedman 2009). Given these features and subsequent model training, a probability of success is generated, ranging from 0 to 1, where 1 represents a 100% chance of achieving the given outcome.
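A minimal sketch of the feature engineering just described, assuming hypothetical column names (including a Census-based zip-code income proxy for SES); an institution's actual data frame would of course be far richer.

```python
# Minimal feature-engineering sketch with hypothetical columns, as described above:
# nominal variables become binary indicators, numeric variables become z-scores.
import pandas as pd

df = pd.DataFrame({
    "hs_gpa":            [3.1, 3.8, 2.9, 3.5, 3.0, 3.7],
    "sat_math":          [560, 690, 510, 640, 530, 700],
    "registration":      ["early", "late", "early", "early", "late", "early"],  # nominal
    "zip_median_income": [48000, 91000, 52000, 75000, 61000, 83000],            # Census SES proxy
})

# Nominal -> binary indicator(s); drop_first avoids redundant (collinear) columns
features = pd.get_dummies(df, columns=["registration"], drop_first=True)

# Numeric -> z-scores so features share a common scale
numeric = ["hs_gpa", "sat_math", "zip_median_income"]
features[numeric] = (features[numeric] - features[numeric].mean()) / features[numeric].std()

# These engineered features would feed a supervised model like the one sketched
# earlier; holding out a validation set guards against overfitting before trusting
# the resulting 0-1 probabilities of success.
```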

3 The Analysis

As a particular example to illustrate the current challenges of traditional learning analytics, consider the following recent UMBC findings. During the Spring 2022 15-week term, an ongoing ML-based nudging initiative included 2,421 students in 17 high-enrollment gateway STEM courses. Initially trained on historic data, the model generated a weekly predicted probability of earning a DFW for each student in the courses participating in the pilot nudge project described earlier in this paper. The ML modeling demonstrated .81 precision, meaning 81% of students identified as earning a DFW actually went on to earn one. Below is an illustration of the formula for calculating precision:

$$ \text{Precision} = \frac{\text{Correctly IDed DFWs}}{\text{Correctly IDed DFWs} + \text{False Positives}} $$

The modeling also demonstrated .4 recall, or sensitivity. This 40% represents the cases correctly identified as earning a DFW divided by the sum of true positives and false negatives. Below is an illustration of the formula for calculating recall:

$$ \text{Recall} = \frac{\text{Correctly IDed DFWs}}{\text{Correctly IDed DFWs} + \text{False Negatives}} $$
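Applying these two formulas is straightforward; the sketch below uses purely illustrative confusion-matrix counts chosen to land near the reported values, not the actual Spring 2022 tallies.

```python
# Applying the precision and recall formulas above to hypothetical counts.
def precision(true_positives: int, false_positives: int) -> float:
    """Share of predicted DFWs that actually earned a DFW."""
    return true_positives / (true_positives + false_positives)

def recall(true_positives: int, false_negatives: int) -> float:
    """Share of actual DFWs the model correctly identified."""
    return true_positives / (true_positives + false_negatives)

# Illustrative counts only (not the actual Spring 2022 confusion matrix):
tp, fp, fn = 162, 38, 243
print(precision(tp, fp))  # 0.81
print(recall(tp, fn))     # 0.40
```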

Predictions informed empathetic nudges that were sent to students with a ≥ 50% predicted chance of earning a DFW for the term. Given the precedent of past UMBC projects, we know that by approximately Week 4 of the term, behaviors within the semester supplant pre-measures as the most predictive features in our home-grown models. We see that 14.49% of students were predicted to earn this negative outcome according to the Week 4 values. That number drops to 5.59% by Week 7, presumably because of student behavior in the given course and within their other courses. The actual DFW rate was 18.13%. Nudges were sent from an internal campus system during term weeks 4 and 7. A third alert, based on manual instructor referral rather than an ML-informed message, was also sent out. Only 6 of the participating courses made use of this legacy system, representing 286 nudges, or 11.8% of the total cases.
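The decision rule itself can be expressed very compactly. The sketch below is a simplified rendering of the thresholding described above, with hypothetical student IDs and data structures rather than the actual campus system.

```python
# Simplified sketch of the decision rule described above (hypothetical data):
# send an empathetic nudge when the weekly predicted DFW probability is >= 0.50.
NUDGE_THRESHOLD = 0.50
NUDGE_WEEKS = (4, 7)

def students_to_nudge(weekly_predictions: dict[str, float], week: int) -> list[str]:
    """Return student IDs whose predicted DFW probability meets the threshold."""
    if week not in NUDGE_WEEKS:
        return []
    return [sid for sid, p_dfw in weekly_predictions.items() if p_dfw >= NUDGE_THRESHOLD]

# Example: Week 4 predictions for three (fictional) students
week4 = {"AB12345": 0.62, "CD23456": 0.18, "EF34567": 0.51}
print(students_to_nudge(week4, week=4))   # ['AB12345', 'EF34567']
```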

Notably, 34% of the students who received the first nudge went on to receive a DFW, while only 12% of the untreated students did. Of those students receiving both the Week 4 and Week 7 nudges, 66% earned a DFW. And 81.8% of students who received all three nudges earned a DFW. In other words, as students accumulate more nudges, their chances of success in their course greatly diminish. However, given the precision of the model described above, we see that almost exactly the same percentage of students would likely have earned a DFW independent of the intervention. In turn, given that students appeared predisposed to earning a negative outcome, the relationship between nudging and observed DFWs appears correlational rather than causal. So, how do we determine whether this nudge was in any discernible way successful rather than a reification of the prediction – a self-fulfilling prophecy – and in turn help inform improved processes moving forward? Given that the chance of a negative outcome doubles between receiving the first and second nudge, this analysis focuses on the identifiable behavioral differences between the first and second nudge. From a classical econometrics/statistics perspective, the first question we must investigate is the extent to which the first nudge results in any measurable lift in the dependent and independent predictive model inputs: DFWs and interactions, respectively.

There are a lot of individual constructs that get wrapped into a DFW. Instructors employ direct, observable behaviors as well as indirect measures. These data take the form of formative assessments, such as polling or quizzing, all the way up to the dreaded high-stakes test, which, in many people's minds, epitomizes summative assessments. Mixed into these measures might also be journals, blogs, portfolios, presentations, interpretive dances, and other means to infer learning along a qualitative to quantitative continuum (a blog, for example, could represent either qualitative or quantitative data depending on whether one developed a reliable and valid rubric for grading).

So, we have all these pieces, which we typically roll up into an end-of-term percent based on the course grading schema and weighting. Assuming one is not auditing the given course and has not selected a P/F option, that percent is associated with an A-F letter grade, and only at this point can one create our institutionally meaningful binary, the DFW (Fig. 1).

Fig. 1. The more nudges a student receives, the more likely they are to earn a DFW.
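To make that roll-up concrete, the sketch below walks from component scores through a weighted percent to a letter grade and, finally, to the DFW binary. The weights and the conventional 90/80/70/60 cutoffs are hypothetical; actual grading schemas vary by course.

```python
# Sketch of the grade roll-up described above, with hypothetical weights and a
# conventional 90/80/70/60 cutoff scale (actual schemas vary by course).
WEIGHTS = {"quizzes": 0.20, "homework": 0.30, "midterm": 0.20, "final": 0.30}

def course_percent(scores: dict[str, float]) -> float:
    """Weighted end-of-term percent from component scores (each 0-100)."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)

def letter_grade(percent: float) -> str:
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if percent >= cutoff:
            return letter
    return "F"

def is_dfw(letter: str, withdrew: bool = False) -> int:
    """Institutionally meaningful binary: 1 if the outcome is a D, F, or W."""
    return int(withdrew or letter in {"D", "F"})

scores = {"quizzes": 72, "homework": 85, "midterm": 61, "final": 58}
pct = course_percent(scores)                          # 69.5
print(letter_grade(pct), is_dfw(letter_grade(pct)))   # D 1
```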

The problem, as hopefully illustrated by the above description of the grading workflow, is that there are an incredible number of attributes and measures that must be distilled to arrive at this either/or outcome before we even begin to consider exogenous influences, or those external to the model (i.e., life). Only after boiling this sea of data might we begin to look at the relationship between our interventions and the given outcomes.

Since there are so many variables comprising the DFW, to change a student’s state from a 1 to 0, or from a Yes to a No – i.e., redirect a student from failing to succeeding – the impact of any intervention must be great enough to overcome the initial trajectory in much the same way one must overcome a physical object’s inertia to redirect it. In turn, the more at-risk students are, or the later in a semester they are identified, the more impactful the intervention must be to benefit them.

What does this all mean for our present conversation? Well, it is exceptionally difficult to measure the efficacy of an intervention strategy using DFWs alone. There’s typically too much inertia to overcome to help students as we’d like without something more dramatic than a nudge. At the same time, from a pragmatic perspective, measuring student awareness of resources is only consequential to the extent such recognition leads to action with measurable lift in an observable behavior associated with learning (i.e., a direct measure of learning). Also, given sample size and environmental factors, it’s impossible to state X caused Y ceteris paribus, mostly because we don’t know what all things being equal actually means under the best of circumstances, let alone amidst a pandemic and its associated trailing effects. In this particular analysis, determining a causal relationship between a treatment and a state change (1 → 0) is exceptionally difficult. Using propensity score matching based on the initial ML-generated predictions, for example, results in collinearity since all students achieving a certain threshold received a nudge.
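The overlap problem can be seen in a few lines of code. The sketch below uses simulated predicted probabilities only to illustrate why matching on the risk score fails when treatment assignment is a deterministic function of that same score.

```python
# Illustrative only: when every student at or above the 0.50 risk threshold is
# nudged, treatment is a deterministic function of the very score we would match
# on, so treated and untreated students share no common support.
import numpy as np

rng = np.random.default_rng(7)
p_dfw = rng.uniform(0, 1, 1000)          # hypothetical predicted DFW probabilities
nudged = (p_dfw >= 0.50).astype(int)     # deterministic treatment assignment

print(p_dfw[nudged == 1].min())   # roughly 0.50: lowest risk among the treated
print(p_dfw[nudged == 0].max())   # just under 0.50: highest risk among the untreated
# The two groups never overlap on risk, so matching on this score cannot
# construct a credible untreated comparison group.
```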

Early identification of at-risk students helps but does not fully mitigate this dynamic. Particularly regarding the behavioral nudge campaign, it perhaps makes sense to look at other behavioral measures, such as interactions and time spent in the LMS because those are the variables that we have available to us, and because they have over the course of multiple projects proven reasonable proxies for engagement.

As Fig. 2 below shows, students identified for messaging not only start off below their peers in LMS engagement, but they fall even further behind following the initial notification. Part of the problem is that even as our insights are enhanced and we merge our data silos, there remain gaps in what we know about our learning ecosystem. It’s reasonable, for example, to assume that if a student receives a nudge from the system, then they may seek out resources that are not LMS specific. That student could chat with their professor or visit the Student Success Center, for example. Doing so wouldn’t be captured in our existing data, and this self-efficacy might benefit students in their other classes (also unmeasured in the current analysis). These outcomes could perhaps indicate the nudging was successful, and we simply aren’t yet fully aware of the complex picture needed to evaluate it. However, any academically successful connection with a human advocate would result in students being redirected to do their class work, which would in turn (if the advice were followed) result in greater interactions within, and signal from, the LMS. Key here is how we operationally define, measure, and track these successes. We would hope these interactions would amount to more than just clicking on stuff to improve one’s grade, but who knows? In any event, we are unfortunately not seeing this outcome with our current data.

Fig. 2. Nudging is not associated with increased LMS engagement.

3.1 Why’s the Status Quo Problematic?

Reich (2022) succinctly summarized the tendency of big data in learning analytics to reveal the Truth that “students who do stuff do other stuff, and students who do stuff, do better than students who don’t do stuff” (p. 192). Yes, one can use interactions within an LMS as proxies for engagement, but acting to increase these measures can be perceived by some students as Orwellian and may contribute to increased stress for students who are already identified as at-risk (Brown, Basson, Axelsen, Redmond, and Lawrence 2023). And although we believe more engagement is better than less, the link between interactions and learning isn’t yet substantiated. The question that arises, therefore, is how do practitioners move beyond using clicks to predict and increase clicks into a new paradigm of leveraging meaningful insights to support direct measures of students’ success? And just as consequential: How would we know if any of our data-informed interventions have a positive impact?

These questions help establish an ethical operational starting point. If one uses ML to predict students’ chances of earning a preferred outcome of an A, B, or C in a given course during their first term at a university, we begin with some kind of data to develop and train a (supervised) model. In this case, I might create features from various indicators that can be mapped to the successful outcome: students’ standardized test scores and high school GPAs, for example. I can look at when they registered for class (registering early is associated with positive outcomes, while registering late is associated with negative outcomes). Basically, anything that’s been included in the student application that does not directly contribute to unethical modeling. But what do we do if a student has a sufficiently high probability of an undesirable outcome? In other words, given that learning analytics is analysis in action, the $64,000 question is: What ethical action should we take?

3.2 First, Do No Harm!

The first concern in moving forward is doing so ethically. Would it be acceptable, for example, to use predicted values to advise students? Our university Provost has justifiably said “no.” After all, if we include pre-measures that may be closely correlated with SES in that model and then advise based on these values, aren’t we simply promoting social reproduction, and perhaps using this “unbiased” tool to sort students based on historically unjust dynamics? In turn, if a student grows up in a poorer neighborhood associated with poorer outcomes, do we accept that it would be OK to suggest they take a lower-level math course based on the probability of their success, which would ultimately lead them to a different major, a different, lower-paying job, reduced lifetime earning potential, and a perpetuation of the cycle? Does doing so, as Freire argued, lead us toward elevating freedom over conformity? On the other hand, if we have data and we know of risk, aren’t we compelled, ethically, to act if we have a valid treatment (Fritz and Whitmer 2019)?

In general, students’ chances of success in a course can be visualized on a simple x, y plot, whereby the weeks of the semester run along the x axis while their cumulative engagement relative to their peers is plotted along the y axis. Students’ use of the university’s LMS, for example, or their interactions with various digital tools can be included as model inputs. Perhaps attendance, or network pings, can be included. All of these proxies for engagement are included in the models and, over the course of a few weeks, replace pre-measures as the most consequential inputs in determining student success. It’s important, however, to emphasize that these behaviors are not available at Week Zero, so models informed by them cannot be acted upon until the term is underway.

That’s not to say that proxies aren’t valuable or that we shouldn’t use descriptive and predictive analysis to close the gap between access, success, and upward mobility. Nor is it problematic to use adaptive learning based on ML in a closed model, such as through use of courseware that reinforces probability and statistics skills (Van Campenhout, Jerome, and Johnson 2020). Our institution is piloting use of these types of platforms to scaffold student content acquisition in high-enrollment gateway STEM courses (Carpenter, Fritz, and Penniston, in press). Indeed, leveraging ML to triage students into buckets to address allocation of finite resources is a reasonable, scalable, and widely accepted business proposition (Prinsloo and Slade 2017). It makes sense from an economic standpoint to address persistence and graduation rates. The ethical difficulties, rather, arise when we think in terms of data driven instead of data informed practice, particularly when the outcomes we predict are tied to institutional reporting outcomes that don’t necessarily align with students’ best interests, including learning.

Higher education often tries to thread a semantic needle by differentiating between learner vs learning analytics (Bishop 2017), but this is largely a distinction without a difference if we’re focusing first on our students’ needs. They are, after all, our business. It’s incumbent, therefore, upon higher education pedagogues and administrators alike to look for collaborative ways to flip the existing paradigm by measuring success as student learning (which may subsequently contribute to improved persistence and graduation rates) rather than trying to improve institutional measures in the hope of student learning as a byproduct. Reflecting back on our opening quotes from Dewey and Freire, which highlight the interconnection of knowledge, experience, and social growth, we must vigilantly re-evaluate our business practices to assure students’ learning needs are prioritized. So, what do we do?

The overarching flaw in having a general predictive model is that it directs us to general rather than targeted intervention strategies. We send all-too-easily ignored behavioral nudge notifications, which, despite all our best intentions to encourage self-efficacy, can be interpreted by the recipient as thinly veiled threats. “Come to office hours”. “Go to class”. We refer them to case managers who may direct them to one-on-one tutoring. They, in turn, must spend additional time on content remediation without additional benefit, such as micro-credentialing, which is easily interpretable as a negative outcome. A stick, for some. Thus, the status quo would be greatly improved if the intervention were designed to effect marginal gains in direct measures of students’ learning rather than increases in proxies for engagement, or indirect measures such as reduced DFW rates. We use ML as a tool to triage students into buckets, informed by models that, depending on when the runes are read, can be reliant on pre-measures closely tied to features over which students have no locus of control, such as where they were born and raised. The institution might use these approaches to move the needle on 6-year graduation rates, but is that inherently a win for our students, or is it a win for the university, and only secondarily, and hopefully by extension, the students?

4 Conclusion

And so, I return to my opening question: How do we assure our modern intervention strategies aren’t similarly judged as misguided, callous, or even abusive in the future? It would be ironic, given the quotations I started this paper with, to now provide a prescriptive cookie-cutter framework that could possibly fit all schools’ needs, let alone the needs of their students. Practices evolve over time through democratic dialogue at individual institutions, and throughout the preceding essay I’ve attempted to remain descriptive in discussing existing considerations, albeit not entirely neutral. I don’t shy away from my positionality. Education, to channel my inner Freire, is inherently political. So, in what ways might we shift our existing practices to better scaffold students’ growth to help them live up to their fullest potential? As I’ve alluded to throughout this paper, there are certain areas that all schools might consider—my own included—in moving toward an improved version of their own unique learning analytics practices.

4.1 Authentic Experiences: Measure What’s Needed, Not Merely What’s Available to Us and What We Can Get Away With

To begin with, we should be intervening in alignment with our learning outcomes.

Well-designed courses, such as those that meet the best-practice framework of Quality Matters standards, should explicitly articulate a direct link between what is being taught during any given class and unit-level objectives, continuing up the hierarchy to course-level goals, and ultimately on to institutional functional competencies. Such alignment is a hallmark of best practices and an institutionally scalable intervention (Fritz, Hawken and Shin 2021), while more holistically designed courses can help improve the insights instructors and administrators gain within a term (Fritz, Penniston, Sharkey, and Whitmer 2021). Surely, in the new reality in higher education ushered in by the pandemic, we can, and indeed are duty bound to, better instrument all courses to achieve these ends? Evidence of learning must be directly measurable rather than only available in aggregate form through indirect constructs like letter grades, and easily discernible and available grading may ultimately prove more meaningful than institution-level nudging. Although there was not an increase in student interactions subsequent to receiving a nudge in the current analysis, intervention strategies may be associated with such gains (Brown, Basson, Axelsen, Redmond, and Lawrence 2023). However, we should be careful not to conflate the signal, in the form of clicks, for example, with what we’re really in the business of supporting: learning.

To that end, whenever possible, intervention strategies will be most effective and ethical when aligned with well-defined learning goals and achievable functional competencies, rather than based on institutional reporting metrics (such as persistence rates), to assure we don’t inadvertently introduce perverse incentives that benefit the university at the expense of the learner. It wouldn’t be, perhaps, the first time higher education was steered astray by prestige, market forces, and self-preservation in the ongoing quest for improved institutional rankings (O’Neil 2017). To this end, if our students need to demonstrate written and oral communication, then we need to work toward directly measuring them doing so at scale and providing interventions accordingly. And these measures must be taken in advance of, or at the beginning of, a semester to provide a baseline, to assure we don’t find ourselves late in the term telling students they suddenly need to become A students to pass their classes and graduate on time.

It is possible and entirely reasonable to make a substantive paradigm shift in terms of policy, beginning institutionally from the top down. If our digital ecosystem is to live up to its yet-to-be-realized potential, then we need improved model inputs, which requires both strategy and a significant outlay of resources in terms of money, time, wherewithal, and, to achieve these ends, training. Authentic experiences and direct measures will contribute to content salience. What does that mean in layman’s terms? Simply that if universities hope to address students’ math, science, or reading deficiencies, we must work with faculty to design courses that align with institutional learning goals and functional competencies and then measure these constructs as observable behaviors early and throughout each term. Tagging relevant assessment items as evidence of learning in not only online and hybrid but also face-to-face courses supports real-time measurement of progress toward outcomes.

In this way, we might identify students as being at risk by indicators including standardized placement tests, ALEKS scores, or lexical complexity on written work relative to peers, and thereby not only track growth, but also connect tailored interventions to address students’ specific learning needs. If our ML identified students with a low Flesch-Kincaid score on an institutionally mandated first-year reflection, for example, that marker might be useful for automatically connecting those individuals to some form of asynchronous, self-directed learning modules, or, depending on the severity of the need, perhaps mandating tutoring. To address this need, our institution has been developing both a learning record store (LRS), which will include more disaggregated measures than our current data warehouse provides to identify student learning, and a comprehensive learner record (CLR) system to capture the full picture of learners’ experiences (Braxton, Sullivan, Wyatt, and Monroe 2022). We must also consider incentivizing the work students put into their own content remediation along their customized pathways. Yes, students should take responsibility for their own learning (Tinto 1993); doing so is both virtuous and a reward unto itself. At the same time, we also have a broader system that incentivizes one to possess a degree, but not necessarily to learn. It stands to reason, therefore, that if we nudge to support self-directed learning (Fritz 2017), and there is no subsequent demonstrated positive lift in learning, then surely we need to re-evaluate not only our methods, but also our incentive model? Offering carrots rather than just sticks is one option to encourage self-regulated growth. Again, our institution is already moving in this direction with a nascent program of micro-credentialing (Braxton, Sullivan, Wyatt, and Monroe 2022). Perhaps there might even be a valid argument for piloting a program to pay students who participate in certain forms of remediation.
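As a toy illustration of the kind of readability marker mentioned above, the sketch below applies the standard Flesch-Kincaid grade-level formula with a crude vowel-group syllable heuristic; an actual pipeline would rely on a proper NLP toolkit and validated scoring.

```python
# Crude sketch of a readability marker: the standard Flesch-Kincaid grade-level
# formula with a rough vowel-group syllable count (a production pipeline would
# use a proper NLP toolkit).
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels; floor of one per word
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

reflection = "I studied hard this term. I learned to ask for help earlier."
print(round(flesch_kincaid_grade(reflection), 1))
```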

Taken together, these are neither impossible nor easy things to accomplish. Shifting our orientation will require strong will and collaboration, creative problem-solving skills, leadership from all directions along with capital of every conceivable type, and more than just a little bit of patience, humility, and dare I say moxie. In the long run, however, we will best assure our collective social growth and the fulfillment of our core mission when we prioritize course design and interventions that align with student learning.