Introduction

Football is the most prominent sport in American higher education. While basketball, hockey, and other sports may be focal points at particular campuses, the intense interest of fans, students, and other stakeholders makes football a significant part of campus life at many institutions (Thelin, 2019). The benefits of football, it is believed, are multifarious. College leaders use football as a venue to engage stakeholders (Clotfelter, 2019). The sport is thought to be a significant attractor in admissions, not just for players but for students who desire the excitement of game days (Mixon & Treviño, 2005; Toma, 2003). Regional universities have added football teams at the same time they have enhanced other student amenities to shift market perceptions from those of a “commuter school” to those of an established residential university (Toma, 2010a). And athletic programs claim adding football teams can contribute to the racial/ethnic diversity of the student body, given that 39% of football players at NCAA member colleges in 2020 were Black (NCAA, 2021).

As a result, football has grown steadily over the first two decades of the twenty-first century. A total of 53 members of the National Collegiate Athletic Association added the sport between 2002 and 2015. The average squad size grew from 94.1 in 2001–2002 to a peak of 109.7 in 2018–2019 before declining slightly during the pandemic years, and in Division I, the average number of players hit an all-time high of 116.4 in 2020–2021 (NCAA, 2021). Among institutions in the NCAA’s Football Bowl Subdivision, the “big time,” teams had an average attendance of 41,613 in 2022 and average operating expenses of $26.8 million (NCAA, n.d. (b); Office of Postsecondary Education, n.d.). Even among those in the NCAA’s Division III, the lowest tier of competition in the association, teams saw an average crowd of 1,592, larger than the enrollment at many of these institutions, and average expenses of $534,000. However, the benefits of a football team listed above are not always realized. While Mullins and Teodorescu (2020) found an increase in male and minority enrollment among a sample of colleges that had added football compared to those that had not, Bowen and Shulman (2001) noted that among selective colleges with broad-based athletic programs, sports were not a significant source of minority enrollment but instead tended to draw more white students from somewhat more diverse economic backgrounds than the rest of the student body. The negative attention generated by rule-breaking or altercations involving players can damage institutions’ reputations (Downes, 2017; Zavyalova et al., 2016). And concerns are growing about the safety of the sport for its participants, prompting a decline in youth and high school participation (Feudtner & Miles, 2018; Guskiewicz et al., 2003).

College leaders wondering whether to add football teams need reliable information on whether the sport will help their bottom line. This question takes on particular significance in the mid-2020s, when colleges are expected to face three enrollment challenges. The first is the continued growth in public skepticism about the value of a college education, given rapid cost increases, particularly at private colleges, and concern about the debt load facing former students (Schliefer et al., 2022). The second is the “enrollment cliff”: The U.S. college-going population is expected to contract significantly during this decade thanks to declining birth rates, the Great Recession of the mid-2000s, and disturbances in enrollment patterns brought on by the COVID-19 pandemic (Grawe, 2022; National Student Clearinghouse, 2022). The third is the need to find strategies to increase diversity in the wake of the Supreme Court’s decision to ban the use of racial preferences in higher education admissions (Students for Fair Admissions v. President and Fellows of Harvard College, 2023). College leaders, particularly those at institutions whose financial strategies are based on enrollment growth, will need to know which initiatives to attract students will work, and why. While numerous studies have sought to assess the impact of standout football programs on their institutions (e.g., Anderson, 2017; Goff, 2000; Pope & Pope, 2009), they have not addressed the effects of creating such a program from scratch. The purpose of this study is to address this gap by using a new dataset to examine the institutional effects of adding and dropping football teams among four-year institutions.

Background and Conceptual Framework

The literature on football’s institutional effects uses one of two frameworks. First is the contribution of football to institutional identity: having a team helps brand an institution by situating it with more prominent institutions. This is a form of institutional isomorphism, as first discussed by DiMaggio & Powell (1983) and often applied to higher education and athletics (e.g., Cheslock & Knight, 2015; Washington & Ventresca, 2004). The highest-profile institutions in the country, be they in the Ivy League or the Big Ten, all have football, so some colleges have adopted the sport as part of a broader institutional strategy to attract the same students (Toma, 2003, 2010b). Elevating a football team from the Football Championship Subdivision in the NCAA’s Division I to the Football Bowl Subdivision was found to contribute to a positive image of the university among students, alumni, and the general public (Roy et al., 2008). As Lifschitz et al. assert, “Football provides a widely-used cognitive map of higher education. It influences how schools see themselves and each other, and how the general public perceives the field of higher education and the place of particular schools within it” (Lifschitz et al., 2014, p. 206).

The other theoretical framework is resource dependency theory, which posits that units within an organization that can attract external resources will attract internal resources to do so (Baxter & Lambert, 1990; Walker, 2015; Pfeffer & Salancik, 2003). This framework attempts to assess the utility of football by assessing the impact of successful teams on enrollment, applications, and other measures of increased student interest in an institution (e.g., Bremmer & Kesselring, 1993; Chressanthis & Grimes, 1993; Goff, 2000; McCormick & Tinsley, 1990; Murphy & Trandel, 1994; Pope & Pope, 2009). Findings have been mixed: While Toma and Cross linked championships in football and men’s basketball to increases in applications to institutions, Smith did not find evidence that athletic success could increase yield rates, i.e., the share of admitted students who choose to accept an offer of admission (Toma & Cross, 1998; Smith, 2019). Feezell notes that outside the “big time” of the NCAA’s Division I, institutions in Divisions II and III seem more likely to cite reasons within the resource-dependency framework, noting that institutions added sports to reach enrollment targets (Feezell, 2009).

Ultimately, these theoretical foundations can help us understand the motivations of those seeking to effect institutional change. However, whether colleges are using football to pursue legitimacy or to lure more students, their end goals are the same: to survive and persist as institutions of higher education. The goal of this study is to assess whether adding a football team is a worthwhile strategy for colleges to implement. Scholars such as Toma have discussed adding football in terms of positioning a college for prestige (e.g., Toma, 2003, 2010b), but in the long run, the question becomes, does football as a play for prestige result in better outcomes?

Structure of American College Sports

Just over 2000 institutions of higher learning in the United States maintained some form of intercollegiate athletic program in 2021–2022 (Office of Postsecondary Education, n.d.). The National Collegiate Athletic Association is the largest and most prominent governance organization and the only one that maintains longitudinal data on sports offerings over the entire time period this study covers. Colleges are divided into six main categories based in large part on their conference affiliation and the intensity of their football programs. The top level is known colloquially as the “Power Four” conferences and consists of the Atlantic Coast, Big Ten, Big Twelve, and Southeastern conferences, following the 2023–2024 dissolution of the Pacific Twelve Conference (cite TK). In NCAA parlance, these are the “autonomy conferences” and are allowed more control over rules about paying players and other matters. All members of these leagues have maintained football teams for many years and are excluded from this analysis.

The second tier is known as the “Group of Five” conferences, comprising institutions in the American Athletic, Mid-American, Mountain West, and Sun Belt conferences along with Conference USA. The first and second tiers together make up the Football Bowl Subdivision, whose members are allowed to award 85 full grants-in-aid to cover players’ costs of attendance. Several of these institutions have added football during the period covered in this study, including the University of Texas at San Antonio, the University of North Carolina at Charlotte, Georgia State University, and Old Dominion University.

Both of these tiers lie within the NCAA’s Division I, and so does the third tier, the Football Championship Subdivision, whose members compete for berths in a 16-team playoff at the end of each season. Each FCS team is allowed to grant the equivalent of 63 full grants-in-aid and may distribute them among as many players as it chooses, although some institutions choose not to grant any athletic financial aid. The FCS has fourteen leagues, including the Ivy League and the two most prominent leagues for historically Black colleges and universities, the Mid-Eastern and Southwestern Athletic Conferences. Several FCS institutions added football during the period of this study, including Kennesaw State University, Stetson University, and the University of the Incarnate Word.

The fourth and fifth tiers are the NCAA’s Division II and III. Division II schools are permitted to award the equivalent of 36 grants-in-aid, while Division III colleges are forbidden from awarding any athletic financial aid. The NCAA also requires colleges to have minimum numbers of men’s and women’s teams, ranging from fourteen in Division I (seven men’s and seven women’s) to eight (four each) in Division III. Also, the federal government mandates that colleges maintain equitable sports offerings for male and female students under Title IX of the Education Amendments of 1972 (U.S. Department of Education, 1979). Colleges adding football often must add women’s teams to remain in compliance with the law, but determining compliance can be done only via an investigation by the department’s Office for Civil Rights or a lawsuit, so it is unclear whether colleges adding football have incurred additional expenses by adding women’s teams at the same time.

As Lifschitz et al. assert, “Football provides a widely-used cognitive map of higher education. It influences how schools see themselves and each other and how the general public perceives the field of higher education and the place of particular schools within it” (Lifschitz et al., 2014, p. 206). Virtually all of the leading universities in the United States have football teams. The most selective colleges have teams competing largely in the Ivy League, while the best-known public research universities field teams in the Atlantic Coast, Big Ten, Big Twelve, Southeastern, or Pac-12 Conferences. Stanford University is as selective as the Ivies but competes in the Pac-12. Even most small, renowned private institutions compete in NCAA Division III football, such as the Massachusetts Institute of Technology, Williams College, and the University of Chicago, the last of which abandoned the Big Ten in 1939 but came back as a low-profile program in 1969 (Rasley, 2007).

Most of the colleges adding the sport in recent years are in other leagues. Squads at these institutions will not attract massive crowds in person or on television, do not attract significant revenue from sponsorship or ticket sales, and will rarely get on the field with the Michigans or Harvards of the world. As such, it is difficult to see football as a play for prestige. Instead, officials at football-adding colleges mostly describe the sport as a potential driver for enrollment. Given that student populations among enrollment-driven institutions skew female, an activity that could bring in males who want to play and/or those who want to attend football games would seem desirable, given that admissions officials appear to believe that a student population with roughly the same proportion of males and females will be more attractive to prospective students (Niemi, 2017). In an analysis of Berry College’s decision to add the sport, officials at the Rome, Ga., college noted football games provided a campus atmosphere that convinced students to stick around on weekends instead of heading for home or larger cities, and made the college attractive to students who wouldn’t have considered it otherwise (Suggs et al., 2020).

Colleges also have argued that adding football teams can diversify enrollment, given that roughly 40% of all NCAA football players in 2022 were Black (NCAA, n.d.). “I think we’re gonna have more people on campus, and I think we’re gonna see a wider diversity of people—[diversity] of thought, of size and shape. [T]hat’s the picture that we want at Calvin. It brings a wider diversity of people on campus,” said Calvin College’s director of athletics James Timmer in a 2022 article announcing the addition of football and two other sports (Nyong, 2022, n.p.). Previous research has shown an uptick in male and minority enrollment at small private colleges (Mullins and Teodorescu, 2020), but at the same time, football tends to be a less-diverse sport at smaller colleges outside Division I, with the NCAA reporting that only 23% of Division III football players in 2022 were Black (NCAA, n.d.).

Colleges may be using football to create a new environment that will be reviewed more favorably by existing and prospective supporters (Jones, 2015; Pfeffer & Salancik, 2003). This would reflect the resource-dependency paradigm: organizations will attempt to broaden their network of potential resource providers to minimize uncertainty in supply lines, and fans of or participants in football might be an attractive group of potential resource providers (Pfeffer & Salancik, 2003). Resources include direct dollar support as well as access to resource providers through brand enhancement, increases to prestige, and access to new markets. By adding the highest-profile sport available to colleges, institutions can signal to state agencies, donors, and other stakeholders that they are competing both on the field and also in the market to be the best in their class. When it comes to admissions, adding a football team can attract students who would not otherwise consider a college without football. Such students would support an institution not only via tuition payments, but also by legitimizing this institution to friends, siblings, and other younger prospective students. Selective institutions make admissions decisions based not only on grades and test scores, but on a wide variety of other factors, such as students’ family background, experiences, and even their propensity to attract future students from desirable schools and regions (Ehrenberg, 2002; Killgore, 2009).

The question, pragmatically, is whether using this strategy actually results in access to new resource providers in the way that colleges claim. If football teams are in fact benefiting universities, evidence should be found in measures of institutional growth, controlling for other institutional decisions and exogenous factors affecting enrollment and institutional finances. We pursue such evidence by addressing the following research questions:

RQ1: Does adding a football team diversify enrollment compared to peer institutions that never added the sport?

Given that football teams often have rosters of well over 100 players, a college might hope to attract a net of 100 additional students. However, potential students have been shown to be attracted to “consumption amenities” offered by campuses, including intercollegiate athletics (Jaquette et al., 2016; Mixon & Hseng, 2014). Male and female students alike may be attracted to a campus with six or seven large sporting events and the attendant social scene on fall weekends. However, the college may have to divert recruiting efforts and financial aid toward those players, potentially diminishing opportunities to attract other students and thus mitigating the enrollment benefits of football.

RQ1a: Does adding a football team increase male enrollment?

RQ1b: Does adding a football team increase the enrollment of Black students?

With the exception of a small number of place kickers over the years, football players are male, so football could present an opportunity to increase the number of men on campuses where enrollments skew female. (Florida Institute of Technology is the only college in our sample that added football while having more male than female undergraduates.) As noted earlier, a significant number of college football players are Black, so adding a team could enable a college to recruit a more diverse student body (Dougherty & Dougherty, 2018; Mullins & Teodorescu, 2020). However, the sport also could displace other efforts to attract diverse students, and the mobility of athletes may enable those who are not getting enough playing time, or are otherwise not having an ideal experience, to transfer to another institution.

RQ2: Does adding a football team increase net tuition and fee revenue compared to institutions that never added the sport?

Football is tremendously expensive, given the high roster count; the extensive equipment used; the large numbers of coaches and support staff needed; and the significant capital budget needed for stadiums, locker and weight rooms, and associated facilities. As noted earlier, most of the colleges adding football are not generating significant revenue from ticket sales or television contracts. So if the sport is to be worth the investment, it has to attract students (be they athletes or fans) who pay enough in tuition to offset the cost of the sport (Docking).

Methods

Data

Data on institutions’ athletics offerings were provided by the NCAA, and data on institutional characteristics were from the Integrated Postsecondary Education Data System (IPEDS). We constructed an institution-level panel dataset for the years 2001–2002, the first year for which some of our dependent variables were available in IPEDS, through 2017–2018, the last year in the NCAA’s dataset, cleaning missing or outlier data through cross-referencing. We used information from the Office of Postsecondary Education to reconcile missing or outlier athletics data (U.S. Department of Education Office of Postsecondary Education, n.d.).

The dependent variables of interest to answer our research questions are first-year enrollment, enrollment of first-year male students, enrollment of students identifying as Black/African American, and tuition and fee revenue (net of discounts and allowances) per full-time equivalent student. We defined the independent variable, football adoption, as the year in which an institution began competition in football as recognized by the NCAA (see Table 1 for information on adopters).

Table 1 Institutions adopting football, 2004–2016 (N = 36)

We opted to use a difference-in-differences approach to assess how football adoption affected key variables of interest at colleges in comparison to control groups of institutions that never added the sport. Literature on difference-in-differences models encourages the publication of models without covariates (Baker et al., 2021). Following this approach, the tables and figures that follow are shown without covariates. We selected our observable time-varying control variables based on prior studies of the impact of successful football and basketball seasons (usually among NCAA Division I institutions) on institutional admissions processes (Bremmer & Kesselring, 1993; Chressanthis & Grimes, 1993; McCormick & Tinsley, 1990; Murphy & Trandel, 1994; Pope & Pope, 2009; Tucker, 2004), and on models of college choice (e.g., Perna, 2006; Toutkoushian & Paulsen, 2016). These variables are total enrollment, admission rate, the 25th percentile of combined SAT/ACT score, published in-state tuition and fees, and the percent of total enrollment that is at the graduate level (see Table 2 for information on variables).

Table 2 Variable descriptions

Sample Selection

To ensure consistency in data reporting, we included only colleges that belonged to the NCAA, and remained in the same division, for the entire duration of this time period (2002 to 2018). For example, Hendrix College is in our sample, having joined the NCAA in 1991 and added football in 2015. However, the Warriors’ conference rival Berry College is not, because Berry was a member of the National Association of Intercollegiate Athletics (NAIA) until 2010. In all, we excluded eight institutions that, like Berry, added football but were not NCAA members for the entire 17-year observation window, as well as one institution (Lincoln University in Pennsylvania) that changed division. Following common practice for conducting difference-in-differences analysis (e.g., Ortagus & Hu, 2019, 2020; St. Clair & Cook, 2015), we also included only colleges that added football between 2004 and 2016 to ensure at least 2 years of pre- and post-adoption data. This led us to exclude three institutions (e.g., Florida International University) that otherwise met criteria for inclusion. Additionally, we excluded four institutions that re-introduced football after having discontinued offering it during the observation window (e.g., University of Alabama at Birmingham). We also dropped institutions missing data on the dependent or independent variables of interest (accounting for approximately two potential treatment group members and 123 potential control group members). Institutions that are part of athletics consortia (such as Columbia University and Barnard College), two-year or technical institutions, women’s colleges, for-profit colleges, and special-focus colleges were also outside the scope of the study and excluded. The final sample of football adopters consists of 36 NCAA members that added football during the specified time period (see Table 1 for list of institutions). Overall, our data form a panel of 308 institutions in each of 17 years, for a total of 5,236 observations.

Empirical Strategy

Within the context of a panel dataset and quasi-experimental design, the difference-in-differences (DiD) design can identify the average treatment effect by comparing the outcomes of two groups (treated and untreated) in two time periods (pre- and post-treatment). Assuming that, absent the treatment, the two groups would have continued along similar trends (the parallel trends assumption, assessed in the pre-period), the average treatment effect on the treated group (ATT) is the difference between the two groups’ changes in outcomes from the pre-period to the post-period (Cunningham, 2021; Murnane & Willett, 2011). This approach is complicated when treatment does not occur in a single period. The two-way fixed effects (TWFE) approach with year dummy variables accounts for differences in time, but recent work by Goodman-Bacon (2021) identified several problems with applying TWFE in cases of multiple periods of treatment. Specifically, Goodman-Bacon (2021) found that applying TWFE in such circumstances leads to biased results due to differential timing, requires additional weights for each group’s parallel trends assumptions, and overweights treated observations that experience treatment in the middle of the panel. If a standard DiD is a 2 × 2 matrix of two groups in two periods, having multiple treatment periods requires each instance of treatment to have its own 2 × 2 matrix.
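The canonical 2 × 2 comparison described above can be illustrated with a minimal sketch. The function name `did_2x2` and the enrollment figures are hypothetical and are not drawn from the study's data; the sketch only shows the arithmetic of subtracting the comparison group's change from the treated group's change.

```python
from statistics import mean


def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Return the 2 x 2 difference-in-differences estimate.

    Each argument is a list of outcomes for one group in one period.
    The estimate is (change in treated mean) - (change in control mean).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical first-year male enrollment counts (illustrative only).
adopters_pre = [200, 220, 210]    # adopters, before football
adopters_post = [260, 275, 275]   # adopters, after football
never_pre = [300, 310, 290]       # never-adopters, same pre-period
never_post = [310, 325, 295]      # never-adopters, same post-period

estimate = did_2x2(adopters_pre, adopters_post, never_pre, never_post)
print(estimate)
```

Under parallel trends, the comparison group's change stands in for what the adopters would have experienced without football, so only the excess change is attributed to adoption.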

This led us to use Callaway and Sant’Anna’s (2021) approach to minimize bias by creating a group-time ATT. Using our sample as an example, LaGrange College and the University of North Carolina at Pembroke both adopted a football program in 2007, so they form the 2007 group, or cohort. The institutions adopting in 2009 comprise the 2009 cohort. The group-time ATT offers an ATT parameter for each year for every group, meaning there is an ATT for the 2007 group in 2008, 2009, and so on. The equation for the group-time ATT for each group in each time is expressed as:

$$ATT\left(g,t\right)=E\left[{Y}_{i,t}\left(g\right)- {Y}_{i,t}\left(0\right)|{G}_{g}=1\right]$$

where \(g\) denotes a treatment group (the year in which a cohort adopted football), \(t\) is any year after \(g\) (such that \(t > g\) always), \({G}_{g}\) is a dummy variable equal to one if the unit is in treatment time group \(g\), \({Y}_{i,t}(g)\) is the outcome at time \(t\) for units treated in group \(g\), and \({Y}_{i,t}(0)\) is the potential outcome for those units had they not been treated.
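As a rough illustration of the group-time parameter, the sketch below computes an unconditional \(ATT(g,t)\) as the cohort's change in the outcome from year \(g-1\) to year \(t\), minus the same change among never-treated units. This is a simplification: it uses no covariates and a never-treated comparison group, so it is not the doubly robust estimator used in the analysis. The function name `att_gt`, the unit labels, and all values are hypothetical.

```python
from statistics import mean


def att_gt(outcomes, first_treated, g, t):
    """Unconditional ATT(g, t) with a never-treated comparison group.

    outcomes: {unit: {year: outcome}}
    first_treated: {unit: adoption year, or None if never treated}
    The baseline is the last pre-treatment year, g - 1.
    """
    base = g - 1

    def change(unit):
        return outcomes[unit][t] - outcomes[unit][base]

    treated = [u for u, year in first_treated.items() if year == g]
    never = [u for u, year in first_treated.items() if year is None]
    return mean(change(u) for u in treated) - mean(change(u) for u in never)


# Hypothetical panel: units A and B adopt in 2007; C and D never adopt.
outcomes = {
    "A": {2006: 100, 2008: 130, 2009: 140},
    "B": {2006: 140, 2008: 170, 2009: 190},
    "C": {2006: 200, 2008: 210, 2009: 215},
    "D": {2006: 300, 2008: 310, 2009: 315},
}
first_treated = {"A": 2007, "B": 2007, "C": None, "D": None}

print(att_gt(outcomes, first_treated, g=2007, t=2008))
print(att_gt(outcomes, first_treated, g=2007, t=2009))
```

Each call yields one cell of the group-time grid, mirroring the statement that the 2007 group has its own ATT in 2008, 2009, and so on.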

Callaway and Sant’Anna (2021) offer three primary approaches for estimating and aggregating the group-time ATT: outcome regression (OR), inverse probability weighting (IPW), and doubly robust (DR) estimation. The OR approach requires correct modeling for establishing conditional parallel trends, while IPW requires correct modeling of the propensity for treatment. The DR approach, following the method proposed by Sant’Anna and Zhao (2020), combines both the OR and IPW approaches and requires that only one of the two approaches’ requirements be met, not necessarily both. The DR approach estimates a propensity score in the first stage followed by a weighted least-squares regression to estimate the outcome regression. This provides additional robustness and is efficient in case of misspecification. The estimation of group-time ATT using the doubly robust approach (hence the DR subscript) is expressed as:

$${\widehat{ATT}}_{dr}^{ny}\left(g,t;\delta \right)= {E}_{n}[({\widehat{w}}_{g}^{treat}-{\widehat{w}}_{g}^{comp,ny})({Y}_{t}-{Y}_{g-\delta -1}-{\widehat{m}}_{g,t,\delta }^{ny}(X;{\widehat{\beta }}_{g,t,\delta }^{ny}))]$$

where \(\widehat{m}\) is a term acting as a control for parallel trends and the \(ny\) superscript refers to our decision to define our control group as units not yet treated. A complication of staggered treatment or multiple time periods of adoption is that there is not a singular control group as the group of untreated units varies by time. Callaway & Sant’Anna (2021) offer two approaches to this issue. One option is to use those who are never treated. However, depending on sample size, this could lead to a small number of control observations, especially among later years in the panel. The second option, represented by the \(ny\) in the equation above, is to use those who are not treated yet. In the case of our study, we have opted to use the control group of units not yet treated to maximize the sample size of our study. In the discussion of robustness checks that follows, we do run our models using both approaches finding no substantial differences.
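The difference between the two comparison-group definitions can be made concrete with a short sketch. The helper `comparison_units`, the labels `"nevertreated"` and `"notyettreated"`, and the adoption years are illustrative assumptions; the sketch shows only which units are eligible as comparisons in a given year, not how the estimator weights them.

```python
def comparison_units(first_treated, t, kind="notyettreated"):
    """Return the units eligible to serve as comparisons in year t.

    first_treated: {unit: adoption year, or None if never treated}
    kind: "nevertreated" keeps only units that never adopt;
          "notyettreated" also keeps units that adopt after year t.
    """
    if kind == "nevertreated":
        return sorted(u for u, g in first_treated.items() if g is None)
    if kind == "notyettreated":
        return sorted(
            u for u, g in first_treated.items() if g is None or g > t
        )
    raise ValueError(f"unknown comparison group: {kind}")


# Hypothetical adoption years (None marks a never-adopter).
first_treated = {"A": 2007, "B": 2009, "C": 2013, "D": None, "E": None}

early_notyet = comparison_units(first_treated, 2008, "notyettreated")
early_never = comparison_units(first_treated, 2008, "nevertreated")
late_notyet = comparison_units(first_treated, 2014, "notyettreated")

print(early_notyet)  # later adopters still count as comparisons in 2008
print(early_never)
print(late_notyet)   # once every adopter is treated, only never-adopters remain
```

In this toy example the two definitions differ only in early years, when later adopters have not yet been treated, and coincide once all adopters have fielded teams.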

A primary assumption in any DiD design is that the control and treated units were following a similar trend prior to treatment (Furquim et al., 2020). To assess the parallel trends assumption, we view pre-treatment outcomes in the dynamic event study design (see Figs. 2, 3, 4, 5). Assuming parallel trends in the pre-treatment period, the coefficients should center around zero. The Callaway and Sant’Anna procedure can also test parallel trends by including pre-treatment covariates when creating the matches between treatment and control units. This is necessary only if the parallel trends assumption holds only conditional on covariates. However, we observe reasonably flat unconditional pre-trends for each of our outcomes. As a robustness check, we ran the dynamic event study models with covariates, and our results are similar (see Figs. 8, 9, 10, 11 in Appendix).

In the results section to follow, we focus on different conditions of aggregated and disaggregated ATT estimates. Specifically, we look at (1) the aggregation of group-time, dynamic time, and group ATT; (2) the disaggregated dynamic time ATT; and (3) the disaggregated group ATT. The group-time estimates have been previously discussed as providing the effects based on time of group assignment. However, our sample has relatively small group sizes (fewer than five institutions in most groups), so the \(ATT(g,t)\) should be cautiously interpreted. The alternative approach is to focus on the results from the group and dynamic analyses, where the effective sample size is the total number of ever-treated units, which in our analysis is sufficient. The group ATT is the average effect for each group g across all post-treatment times that group g experienced. This can highlight differences between groups. The dynamic time ATT is computed by length of exposure to treatment (1 year after treatment, 2 years after treatment, and so on) and then averaged across groups for each period before and after treatment. This illustrates how the effects of treatment may vary the longer that treatment is in place. All analyses were performed using R Version 4.2.1 and the “did” R package created by Callaway and Sant’Anna (2021).
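The two aggregations described above can be sketched as follows, assuming a table of group-time estimates is already in hand. The function names, the group sizes, and the estimates are hypothetical, and the weighting by group size is a simplified stand-in for the aggregation weights in Callaway and Sant’Anna (2021), not a reproduction of the package's implementation.

```python
from collections import defaultdict


def aggregate_dynamic(att, group_size):
    """Average ATT(g, t) by length of exposure e = t - g.

    att: {(g, t): estimate}; group_size: {g: number of treated units}.
    Groups observed at a given exposure are weighted by their size.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for (g, t), estimate in att.items():
        exposure = t - g
        weighted_sum[exposure] += group_size[g] * estimate
        total_weight[exposure] += group_size[g]
    return {e: weighted_sum[e] / total_weight[e] for e in sorted(weighted_sum)}


def aggregate_group(att):
    """Average ATT(g, t) over post-treatment years within each group g."""
    by_group = defaultdict(list)
    for (g, t), estimate in att.items():
        if t >= g:  # keep post-treatment cells only
            by_group[g].append(estimate)
    return {g: sum(v) / len(v) for g, v in sorted(by_group.items())}


# Hypothetical group-time estimates for two cohorts.
att = {
    (2007, 2008): 20.0, (2007, 2009): 30.0,
    (2009, 2010): 10.0, (2009, 2011): 14.0,
}
group_size = {2007: 2, 2009: 3}

dynamic = aggregate_dynamic(att, group_size)
group = aggregate_group(att)
print(dynamic)
print(group)
```

Pooling across cohorts in this way is what lets the dynamic and group summaries draw on all ever-treated units, rather than the handful of institutions in any single cohort.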

Robustness Checks

Following Callaway and Sant’Anna, among others pioneering these methods, we conducted a number of robustness checks. The first robustness analysis considered the alternative control group specification provided by Callaway & Sant’Anna (2021), in which only institutions that are never treated, rather than not yet treated, are included. Given that this approach changes, at most, 36 observations in the earliest years of the panel dataset and presents no change in the later years of the panel dataset, we do not see any meaningful change in the trends relative to the “not yet treated” approach. Given the lack of impact, we opt to offer the largest sample size available and maintain the not-yet-treated approach as previously discussed.

The next set of robustness analyses applied sampling criteria to select control groups that may be more contextually similar to the treated institutions. First, we limited the control group to institutions located within the southeast region, as 22 of the 36 treated institutions (61%) are found in this part of the U.S. (Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Virginia, and West Virginia). We then re-ran the analysis with another control group restricted to institutions located in a state with at least one “treated” institution over the course of the sample. Willis (2015) used a similar strategy to compare institutions elevating athletic programs from the NCAA Division I Football Championship Subdivision to the Football Bowl Subdivision, checking institutions against regional peers as a means of establishing robustness for a national sample of comparators. Analyses were then conducted with and without control variables for each of the control group definitions. Models also were run using different forms of the dependent variables in case the logged approach unduly led to increased or reduced variation. The male student and Black student enrollment variables were run as percentages (percent of first-year students who are male and percent of first-time degree- or certificate-seeking undergraduate students who are identified as Black/African American). For the tuition variable, which is per FTE in the main analysis, we ran this as simply the log of revenue from tuition and fees.

Outside of the Callaway and Sant’Anna approach, we conducted our analysis using a traditional DiD with TWFE framework. We completed this analysis for each of our dependent variables and repeated the process, lagging the dependent variable by 1, 2, and 3 years to assess anticipatory effects. None of these other approaches led to changes in the direction of coefficients or in significance levels, leading us to present our results from the Callaway and Sant’Anna approach using the full pool of potential not-yet-treated control units.
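For readers less familiar with the TWFE framework, the following is a minimal sketch of the estimator on a balanced panel, written with NumPy. It is not the code used for our analysis: the data are synthetic, the function names are illustrative, and no control variables or lags are included. With a balanced panel, removing unit and year means from both the outcome and the treatment indicator and then regressing one on the other yields the TWFE coefficient.

```python
import numpy as np

def two_way_demean(x):
    """Remove unit (row) and year (column) means from a balanced panel array."""
    return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()

def twfe_did(y, d):
    """Slope on d after absorbing unit and year fixed effects (balanced panel)."""
    y_t, d_t = two_way_demean(y), two_way_demean(d)
    return float((d_t * y_t).sum() / (d_t ** 2).sum())

# Synthetic illustration only: 6 units, 8 years, staggered adoption.
rng = np.random.default_rng(0)
years = np.arange(2002, 2010)
first_treated = np.array([2005, 2005, 2007, np.inf, np.inf, np.inf])
d = (years[None, :] >= first_treated[:, None]).astype(float)

unit_fe = rng.normal(size=(6, 1))   # time-invariant institution differences
year_fe = rng.normal(size=(1, 8))   # shocks common to all institutions
true_effect = 0.05                  # assumed constant effect on a logged outcome
y = unit_fe + year_fe + true_effect * d

estimate = twfe_did(y, d)
print(round(estimate, 6))
```

The sketch recovers the assumed effect because that effect is constant across cohorts and years. When effects vary across cohorts or over time, TWFE estimates under staggered adoption can be misleading, which is one reason to treat this framework as a robustness check rather than the main specification.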

In staggered difference-in-difference estimates, some authors suggest using not-yet-treated units of analysis as an alternate control group to never-treated units (e.g., Baker et al., 2021; Callaway et al., 2021). The assumption underlying this recommendation is that such units are more likely to be similar to units that actually do receive the treatment and thus are stronger candidates under the parallel-trends assumption. In our analysis, the number of colleges that start football teams in later time periods than our early candidates is so small that significant results cannot be obtained, so we opted not to use this approach.

Limitations

Readers should consider several limitations when interpreting this report’s findings. First, although DiD estimation can identify causal effects under certain conditions, our estimates of the effects of football are unlikely to be causal. Interpretation as causal hinges on the assumption that, had they not adopted football, adopters would have experienced the same change in the dependent variable as non-adopters (net of fixed effects, institution-specific trends, and control variables). It is unlikely that our analysis met this stringent assumption even with our comprehensive set of controls. Nevertheless, our approach accounts for a large number of differences between adopters and non-adopters, resulting in some of the best possible estimates of the relationship between football adoption and the dependent variables. Further, we report here the most statistically conservative estimates from our main models.
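The assumption described above can be written compactly. The notation here is ours and is a simplified sketch: $Y_{it}(0)$ denotes the outcome institution $i$ would have had in year $t$ without football, $G_i = g$ marks institutions that adopt in year $g$, $C_i = 1$ marks comparison institutions, and conditioning on controls is omitted.

```latex
\[
  \mathbb{E}\bigl[\,Y_{it}(0) - Y_{i,t-1}(0) \mid G_i = g\,\bigr]
  \;=\;
  \mathbb{E}\bigl[\,Y_{it}(0) - Y_{i,t-1}(0) \mid C_i = 1\,\bigr]
  \qquad \text{for all } t \ge g .
\]
```

The left-hand side is never observed after adoption, so the assumption cannot be tested directly; pre-adoption trends can only suggest whether it is plausible.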

It is also important to note that the impacts of adopting football likely vary by campus context. Difference-in-difference analysis only captures average effects, and the effects at individual institutions may be above or below average. Likewise, we were unable to capture variation in the ways in which institutions implemented their football programs, such as how long they planned before launching teams, the size of the staff they hired, marketing efforts to attract new students, and other factors. Because we define football adoption as the year in which the college reported a total number of football players to the NCAA, we cannot gauge the impact of a college announcing a team 1 or 2 years earlier. Along the same lines, college campuses are not policy vacuums in which single changes are made and reviewed before additional changes follow. While our methodological approach can control for the year-over-year changes at a given institution, we have no way of isolating or controlling for the vast number of small and large shifts that happen each year. This means that multiple changes taking place at a given institution in the same year will have been lumped together. For instance, we do not account for whether an institution adopted football in 2010 and also implemented another strategy such as a no-loan policy, a guaranteed admissions approach, or a brand-new residence hall. The likely cumulative shift associated with multiple enrollment-oriented changes would not be disaggregated. This again makes the case for digging deeper into these specific contexts, not only to understand the implementation of the football program but also to explore the greater institutional context surrounding the decision.

Findings

Descriptive Analysis

Table 3 summarizes each variable by its treatment status. The table also assesses significant differences between our two groups using a series of t-tests along each of the dimensions included in our models. The t-tests suggest the treatment group enrolls significantly more male and Black students, generates less tuition revenue, has a lower standardized exam threshold for admission, and is relatively more undergraduate oriented. Because roughly two-thirds of the institutions in our treatment group are in the southeast U.S., this descriptive analysis was repeated with a control group limited to institutions in the southeast (see Table 5 in the Appendix).

Table 3 Characteristics of sample by football adopter status, full

Adoption in our study is staggered, meaning that varying numbers of institutions adopt football in different years throughout our sample. In each year that an institution moves to the treatment group, the institution or institutions doing so can be considered a cohort with unique characteristics to be considered. Figure 1 visualizes the pattern of adoption captured in our sample. Cohorts consist of one to six institutions with three being the mean (median and mode = 2).

Fig. 1 Pattern of football adoption among adopters

Difference-in-Differences Analysis

The three primary analyses focusing on the average treatment effect for the treated group show that football adoption does not seem to have long-term effects on many of our dependent variables. Focusing on the dynamic event study, which compares each football adopter against peers that did not add football, we discuss each variable in more depth, but a common theme quickly emerges: adding football can be associated with a short-term spike in the variable of interest, but over time, those effects fade into insignificance (Table 4).
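As a rough illustration of how a dynamic event study is assembled, the sketch below averages group-time effects by event time (years since adoption), weighting each cohort by its size. The function name, the effect values, and the cohort sizes are hypothetical and are not our estimates; standard errors are omitted.

```python
from collections import defaultdict

def event_study(att_gt, group_sizes):
    """Average group-time effects by event time e = year - cohort.

    att_gt maps (cohort, year) -> estimated effect for that cohort in that year.
    group_sizes maps cohort -> number of institutions adopting in that year.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for (cohort, year), att in att_gt.items():
        e = year - cohort            # negative values are pre-adoption years
        w = group_sizes[cohort]      # larger cohorts receive more weight
        num[e] += w * att
        den[e] += w
    return {e: num[e] / den[e] for e in sorted(num)}

# Hypothetical group-time effects on a logged outcome, for illustration only.
att_gt = {
    (2005, 2004): 0.02, (2005, 2005): 0.06, (2005, 2006): 0.01,
    (2008, 2007): 0.04, (2008, 2008): 0.02, (2008, 2009): 0.00,
}
sizes = {2005: 2, 2008: 6}

effects = event_study(att_gt, sizes)
for e, value in effects.items():
    print(e, round(value, 4))
```

Plotting these averages against event time, with confidence bands, produces the kind of figure shown for each dependent variable in the sections that follow.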

Table 4 ATT from simple aggregation (group-time, time, group)

Enrollment

Addressing RQ1, none of the ATT methods chosen demonstrate any long-term effect of adding football on total enrollment. The dynamic event-study depicted in Fig. 2 shows a statistically significant spike in the year before adding football and in the year football was adopted. However, statistical significance disappears after the first full year following the adoption of football.

Fig. 2 Dynamic event study, ATT log enrollment

Male Enrollment

That early enrollment growth appears to consist primarily of male students, which is unsurprising given the size of football rosters. As Fig. 3 shows, the growth begins the year before football launches and continues for a single year, but over the long term, adding football does not enable colleges to enroll more men than peers that did not add the sport at a level of statistical significance.

Fig. 3 Dynamic event study, ATT log male enrollment

Black Enrollment

The question of whether adding football would attract more Black students is complicated by the fact that many teams at colleges outside Division I tend to have more white players than at Football Bowl Subdivision institutions. Across all of Division I, 49% of football players in 2019 were Black, compared with 46% in Division II and 23% in Division III, according to the NCAA (National Collegiate Athletic Association, 2021). Figure 4 appears to show no significant growth for Black enrollment following football adoption compared to peers. Unlike with total enrollment, we do not see even a single-year spike of statistical significance.

Fig. 4 Dynamic event study, ATT log black enrollment

Tuition and Fee Revenue

For colleges, the number of students is important, but so is the revenue they bring in from tuition (and other fees). This is particularly important for private colleges, which negotiate tuition payments with students by taking discounts off a “sticker price” that few students pay (e.g., Goldrick-Rab & Koble, 2016). To address RQ2, we did not look at sticker price but instead at tuition and fee revenue net of financial aid awarded per full-time equivalent (FTE) student. Here, too, football adoption does not appear to have a significant effect on tuition and fee revenue compared to peers, as shown in Fig. 5.

Fig. 5 Dynamic event study, ATT tuition per FTE

General Findings

When controlling for exogenous variables and institution-specific trends, the impact of football appears to be concentrated in the year that colleges added the team. Subsequently, it simply fades out. This would appear to make the promised gains of football evanescent at best. Fig. 6 shows the average change experienced by institutions in our sample across all years collected for our panel dataset, 2002–2018. The change experienced by institutions that adopted a football program during this period seems relatively similar to the average change among control institutions. The vertical lines represent the minimum and maximum change values within the treatment group. There is considerable variation among these 36 institutions, with the greatest variation being in the log of Black enrollment. This likely reflects the relatively small number of Black students enrolled in our treatment institutions, which allows small-magnitude changes to appear as substantial percent changes.
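The sensitivity of percent and log changes to a small baseline can be seen in a short worked example. The head counts below are hypothetical and chosen only to show the arithmetic: the same absolute gain of 10 students produces very different relative changes depending on the starting enrollment.

```python
import math

def pct_change(start, end):
    """Percent change from start to end."""
    return 100.0 * (end - start) / start

def log_change(start, end):
    """Difference in natural logs, the scale used for logged outcomes."""
    return math.log(end) - math.log(start)

# Hypothetical head counts: both institutions gain 10 students.
for start in (20, 400):
    end = start + 10
    print(start, end, round(pct_change(start, end), 1), round(log_change(start, end), 3))
```

An institution starting from 20 students registers a 50% increase, while one starting from 400 registers 2.5%, even though the absolute change is identical.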

Fig. 6 Average change in dependent variable measures by group, 2002–2018

Additionally, Fig. 7 expands our descriptive understanding of the variation experienced by institutions in the treatment group of our study. This figure reports the average change among treated institutions in the period from the year they adopted a football program through the end of our study in 2018. The average change values do not descriptively seem all that different from those displayed in the prior figure which calculated change from the beginning of our sample period (2002). Again, variation is largest among the log of Black enrolled students though the range has decreased when the starting point is the year of adoption rather than 2002.

Fig. 7 Average percent change in dependent variable measures among treated institutions, year of adoption through 2018

Discussion

In summary, a difference-in-difference analysis suggests that colleges adding football programs see an immediate increase in applications and enrollment, particularly male enrollment, when they launch football teams. This methodology allows us to compare football-starting colleges with others that never added the sport to better assess the effects of football on institutional metrics. The analysis suggests more muted effects on diversifying enrollment and finds no impact on tuition and fee revenue.

Our results represent a novel contribution to the literature on the effects of football on institutional enrollments, diversity, and revenues. Prior research has most often focused on the institutional effects of success on the field (e.g., Murphy & Trandel, 1994; Pope & Pope, 2009; Smith, 2019; Toma & Cross, 1998; Tucker, 2004, 2005) or moves between divisions (e.g., Jones, 2014; Roy et al., 2008). We focus instead on the effects of sponsoring football, compared to not sponsoring football, which represents an important strategic institutional choice. Additionally, although we caution against interpreting our findings as strictly causal, our robust models and multiple robustness checks enable us to come as close as possible to causal estimates given the limitations of observational data. The few previous studies focused on the effects of adopting football have been descriptive or correlational only (Feezell, 2009; Mullins & Teodorescu, 2020).

Our findings do not necessarily imply that football-adopting colleges return to baseline enrollments or shrink from prior levels. Instead, it may well be that a football-adopting school begins drawing from larger pools of potential students than before, thus diversifying its key sources of students. As noted earlier, moving in such a direction reflects a central tenet of Pfeffer and Salancik’s (2003) resource-dependence theory: By diversifying resource providers, institutions can reduce their reliance on particular providers and cushion themselves against exogenous disruptions.

Smaller colleges may need such diversification in the near future. A growing number of forecasts suggest that the national college-going population is likely to fall; one estimate, by Nathan Grawe at Carleton College (2022), foresees a nationwide decline of 15 percent between 2025 and 2029, with smaller colleges especially vulnerable to such declines.

Directions for Future Research

This study could be extended in several meaningful directions. The obvious corollary to assessing the effects of football is to assess the effects of dropping football. However, with only 18 NCAA members in our sample dropping the sport during this period, it is difficult to design a quantitative study to examine this issue. Case studies might be a more appropriate strategy.

Also, football is not the only sport colleges add to reach more students. Many colleges across the country in recent years have added lacrosse and field hockey to attract affluent students from the Northeast and Mid-Atlantic regions. Those sports have much smaller rosters than football teams usually do, and it can be difficult to assess their overall impact on enrollment. But many colleges are adding such sports to attract students from outside their traditional recruiting pools (Docking & Curton, 2015; Hearn et al., 2018; Sander, 2008). In addition, colleges adding football may also add women’s sports, attracting more students and incurring somewhat more expenses. Future studies should assess the impact of adding football on Title IX compliance.

Finally, to address the question of average treatment effects and their inability to capture effects at individual institutions, outliers with stronger or weaker results should be studied to understand why those colleges were particularly successful or unsuccessful.

Challenges Facing Football

Two key issues will face all colleges sponsoring football teams in upcoming years. The first, which is not unique to football, is whether institutions are doing enough to make the sport safe, especially in terms of head trauma. Research findings based on studies of deceased football players with significant brain injuries have filtered into the public consciousness, and people are more aware of the long-term dangers posed by the concussions and subconcussive hits suffered by football players (and athletes in other sports) (Guskiewicz et al., 2003; Omalu et al., 2005; McCrea et al., 2015; Zuckerman et al., 2015). Whittier College officials noted the potential for head injuries as a justification for dropping football in 2022 (Whittier College, 2022).

The second, and related, issue is whether football will continue to attract enough players with the desire and ability to play at the college level. Nationally, the number of athletes participating in high school football has declined in recent years, from 1.1 million in 2008 to 974,000 in 2021 (NFHS News, 2022). Concerns about head injuries as well as overall population declines among American students approaching college age are reducing demand for the sport at the high school level. At the college level, the overall number of NCAA football players increased by 7% from 2011–2012 to 2019–2020, but declining numbers of white and Asian players offset gains by Black, Hispanic, international, and multiracial students (NCAA, n.d.). A few teams have ended seasons early due to declining rosters and mounting injuries (Wharton, 2017).

Conclusions

Football is not a pigskin panacea for colleges and universities. The health risks of competing in the sport are real and may well increase. These will create significant moral questions for college leaders to consider if they anticipate starting teams or resuming competition in the face of declining enrollments. The sport has become so interwoven with the fabric of American campus life over the past century and a half, however, that it is hard to imagine the sport disappearing from the landscape anytime soon.