1 Introduction

Two-sided matching models help us understand how economic agents find each other. Whether it is medical schools and medical students in a scramble for residency programs or top rated high school football players and college football programs that sign letters of intent, both parties are searching for optimal matches. In the National Collegiate Athletic Association (NCAA), these matching outcomes can have far-reaching effects on regular season wins, championships, and revenues.

Several sports studies examine the matching process from the athlete’s perspective. This study focuses on the football programs’ choices. Using panel data, it introduces a negative binomial count model of the top 100 football players in Division I (DI) and the factors that may contribute to programs’ signing a larger (or smaller) share of these top-quality high school athletes.

The college football recruiting process is examined from the athlete’s perspective by DuMond et al. (2008). They review the broader literature that addresses college choices by non-athletes. However, there is a noticeable gap in the literature as to the process from the college football program’s perspective.[Footnote 1] An interesting study on the market for football coaches by Brown et al. (2007) reveals that good matches improve winning percentages. Berri et al. (2011) consider factors that influence the NBA amateur draft where teams select players. Similarly, Harris and Berri (2015) examine factors that influence the WNBA draft. While the NBA recruits players directly from high school, the WNBA rarely does so. Two-sided matching models of medical interns and hospitals and college students and schools have been considered by Roth (1984) and Roth and Sotomayor (1992). Still, no paper that I have found investigates the college-athlete matching process from the college’s perspective.

The theoretical framework is straightforward: Football programs are organizations that produce wins.[Footnote 2] Student-athlete labor is an input in the wins production technology. Because the number of scholarships is fixed by the NCAA and programs are not allowed to bid for athletes using wages, schools compete for the best-quality athletes through non-price competition.

The empirical approach is similar to DuMond et al. (2008) and Harris and Berri (2015). Findings from the count model align with some of the results from the DuMond et al. (2008) study but also contrast in intriguing ways. For example, DuMond et al. report that athletes are more likely to sign with championship programs that consistently win their conference titles, have not been involved in NCAA enforcement actions, and are located close to the athlete’s home state.

I find that DI football programs successfully recruit a larger portion of the “Rivals Top 100” high school players when the programs have a higher number of conference championships, have earned a bowl championship, and belong to the SEC. However, in stark contrast to DuMond et al., I also find that NCAA infractions during the sample period are associated with a larger share of top-quality recruits. This result makes economic sense in light of the extended literature on cheating in the NCAA.

Before I summarize the data, empirical approach, and the results, I present a brief review of the related literatures.

2 Literature Review

There are at least three strands of literature to consider: one that describes the nature and behavior of the NCAA; one that is devoted to two-sided matching models; and the niche literature that deals with the college football market. The research in the present paper is closest to the college football literature, with a nod to the broad behavior of the NCAA. Key features of the most similar work are summarized here.

Amateur status is a key component of the NCAA’s labor market power. Kahn (2007) explored this link and concluded that college programs extract rents from revenue-producing student-athletes by limiting their pay and requiring amateur status. Monopsony rents that are earned by the cartel from this arrangement are sizeable: Estimates made by Brown (1993, 1994) and Brown and Jewell (2004, 2006) range from $500,000 to $1,000,000.[Footnote 3]

Due to the large number of transactions that are involved on the input side of the NCAA’s business, the cartel tends to monitor outputs to decide whether an institution cheats on the agreement. On-field performance is the output measure of choice. Output monitoring behavior is predicted and documented by Fleisher et al. (1988, 1992), and Humphreys and Ruseski (2009).

If cheating is discovered, enforcement actions are taken. These actions affect the competitive balance of the organization. Depken and Wilson (2006) report that the greater the level of enforcement in a conference, the better the competitive balance. However, they also find that as punishments increase in severity, competitive balance erodes. Using only observable variables that are available to all cartel members, Humphreys and Ruseski (2009) predict instances of cheating detection and punishment with reasonable success. The results reinforce earlier findings about enforcement behavior and suggest that the stability of the cartel is important to its members.

These papers all support the notion that crime pays in the NCAA. Schools that break NCAA recruiting rules have much to gain by doing so. I incorporate a school’s violation status to control for this effect and indirectly test whether NCAA crime pays through the cheating schools’ achieving a larger share of the top 100 athletes each year.

It is not just athletic programs at large that benefit from on-field success. Coaches benefit as well. Brown et al. (2007) test the value of good matches between football coaches and teams. They estimate team winning percentages over 35 years of coach-team matches with the use of a generalized least squares approach. Their results suggest that good matches are associated with approximately a 5% increase in winning percentages. I include controls for the number of conference championships that a program has earned as well as national championships. If better coach-team matches increase win percentages, then they probably increase championship ranks as well.

Albrecht and Vroman (2002) explore the variation in demand for high-skilled versus low-skilled labor. Their game theoretic model has two pure-strategy equilibria. Both predict a rise for high-skill workers and a decrease for the less-skilled. They also examine the impact of increasing the supply of low skill workers. My model includes controls for player positions, height, weight, and the “Rivals.com” five-star ranking. Depending on the maturity of their starting line-ups, schools may have different needs for player positions each recruiting year. Therefore, I expect the colleges to compete more fiercely for players in high-demand positions. As a result, the share of high-demand players that go to any single program in a recruiting year should be smaller than the share of low-demand players.

The football recruiting process is a vestige of the NCAA’s market power. DuMond et al. (2008) conclude that students tend to sort themselves in accord with the two-sided matching literature: Better students seek out better schools, and vice versa. With respect to athletes, this means that better-quality athletes tend to seek higher-quality schools (and vice versa). The strongest signals of quality from the school are past on-field performance and membership in the six largest conferences (the ACC, Big 10, Big 12, Big East, Pac 10, and the SEC, which constituted the Bowl Championship Series (BCS) conferences for the years 1998–2013). Not surprisingly (given the broader school choice research), athletes choose schools that are closer to their home state. In that study, athletes are about 10% less likely to choose a school that is on NCAA probation or rumored to be placed on probation soon. Those results are most closely related to my analysis.

3 The Model

Football programs are organizations that are engaged in maximizing wins subject to physical, ethical, and budget constraints. One of the key inputs for wins is student-athlete labor. Standard theory predicts that wins will be maximized when each of the inputs—including athlete labor—is hired up to the point where the marginal revenue product is just equal to the price paid for the input.

The market for athlete labor is unusual in that athletes are prohibited by NCAA rules from being paid wages (one of the ethical constraints). Athletes can only receive full or partial scholarships to attend the host program’s school. Thus, these firms use non-price methods of “payment” for the labor inputs. Since the number of athletes that are recruited in the labor market each year is also limited by NCAA rules, the programs have an incentive to recruit the highest quality athlete labor that they can, given their resources. Put differently, schools want to recruit athletes with the highest possible marginal products of labor.

Of course, as is discussed in DuMond et al. (2008), the athlete must agree to be “hired” by the school. This two-sided matching process is dynamic and can occur over a period of weeks or months; schools may extend an early recruiting offer or may wait until later in the matching process. Athletes may sign early or wait until the last moment to commit to a program.

The number of high-quality athletes that the program acquires will be based, in part, on school specific effects, the relative demands for different types of players by position, and other player characteristics, as well as player preferences (such as being in the same state as their family home). I specify a reduced-form model of the school’s recruiting choice in the following way:

$$ \text{SHARE}_{jk_{1} \sim n,\,t} \, = \, f\left( \Psi_{k_{1} \sim n,\,t},\ \Omega_{jt},\ \Gamma_{jt} \right) $$
(1)

Equation (1) says that college (j) successfully recruits a share of athletes (k1 to n) in year (t) dependent on: athlete-specific characteristics, Ψk1~n,t; college-specific qualities, Ωjt; and conference affiliation, championships, and other college-athlete matching characteristics, Γjt.

4 Data and Method

The data set includes the top-ranked 100 athletes from www.rivals.com, the colleges that signed them, and the characteristics of both the players and colleges for the period 2012–2016.[Footnote 4] Descriptive statistics of key variables are highlighted in Table 1. Although the original data contain observations for 500 athletes (100 per recruiting class over five years), the effective sample is 168 observations. The athletes are grouped by the football program that signed them in a given year, and a program may recruit one or twelve (or more) of these athletes in a year. For this reason, there are not 500 independent observations in the study.

Table 1 Descriptive statistics of key variables n = 500

SHARE is the number of athletes (out of the top 100 athletes in the recruiting class) that a school signed in the given year. These college-athlete matches come from the rivals.com website. The data from rivals.com also include the athletes’ ranking (RANK, from 1 to 100), height, weight, high school, hometown, and state. CHAMP is the number of conference championships the football program has won. These counts were obtained from school websites and cross-referenced with sports-reference.com reports. DSEC is a dummy variable equal to one if the college belongs to the Southeastern Conference and zero otherwise; about one-third of the programs in the dataset belong to the SEC. Conference affiliations were obtained from school websites and sports-reference.com. DV is a dummy variable equal to one if the college was on probation, experienced a violation, or had come off a disciplinary action within the five-year period and zero otherwise. Dchamp is a control for bowl championships; it is equal to one if the program was a bowl winner during the sample period and zero otherwise. DSTATE is a dummy variable equal to one if the college signed a recruit from within the college’s home state. For example, if Florida State University signs a running back from Orlando, FL, then DSTATE is equal to one. Figure 1 shows the five states with the largest number of recruited athletes during the sample period.

Fig. 1 The total number of top 100 players from each state

Almost half of the college-athlete matches are between athletes and schools in the same state.
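The step from athlete-level records to program-year counts can be sketched as follows. This is only an illustration under stated assumptions: the rows, the tuple layout, and the function names `share_counts` and `in_state_fraction` are hypothetical and are not taken from the paper's data or code.

```python
from collections import Counter

# Hypothetical athlete-level records, invented for illustration only:
# (year, college, college_state, athlete_home_state)
recruits = [
    (2012, "Alabama", "AL", "AL"),
    (2012, "Alabama", "AL", "FL"),
    (2012, "Alabama", "AL", "GA"),
    (2012, "USC", "CA", "CA"),
    (2012, "USC", "CA", "CA"),
    (2013, "Alabama", "AL", "AL"),
    (2013, "Florida State", "FL", "FL"),
]


def share_counts(rows):
    """Count top-100 signees per (college, year), i.e. SHARE as defined in the text."""
    return Counter((college, year) for year, college, _, _ in rows)


def in_state_fraction(rows):
    """Fraction of matches in which the athlete's home state is the college's state."""
    same = sum(1 for _, _, college_state, home_state in rows
               if college_state == home_state)
    return same / len(rows)


share = share_counts(recruits)
for (college, year), n_signed in sorted(share.items()):
    print(college, year, n_signed)

# Each (college, year) pair is one observation, so seven athletes
# collapse to fewer program-year rows.
print("program-year observations:", len(share))
print("in-state fraction:", round(in_state_fraction(recruits), 3))
```

Grouping in this way is why 500 athlete records can yield far fewer observations: each program-year contributes one count regardless of how many athletes it signed.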

STATEqual is the difference between the percentage of recruited athletes in a cohort year from the school’s home state and percentage of athletes already on the roster from the school’s home state. On average, the recruited cohort has 6% fewer players from the school’s home state relative to the school’s existing roster. HTqual and WTqual are formed the same way. Recruited cohorts are slightly taller than the existing team but about eight pounds lighter than the roster athletes. Since the observations are grouped by university, these three variables allow the player characteristics to enter the model as a relative quality index.
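A minimal sketch of the relative quality indices, using made-up numbers. The STATEqual calculation follows the definition above (cohort percentage minus roster percentage). Treating HTqual and WTqual as a difference between cohort and roster means is an assumption, since the text says only that they are "formed the same way". All function names are hypothetical.

```python
def pct_from_state(home_states, school_state):
    """Percentage of players whose home state matches the school's state."""
    if not home_states:
        raise ValueError("need at least one player")
    matches = sum(1 for state in home_states if state == school_state)
    return 100.0 * matches / len(home_states)


def state_qual(cohort_states, roster_states, school_state):
    """STATEqual: cohort in-state percentage minus roster in-state percentage."""
    return (pct_from_state(cohort_states, school_state)
            - pct_from_state(roster_states, school_state))


def mean_gap(cohort_values, roster_values):
    """Assumed form of HTqual / WTqual: cohort mean minus roster mean."""
    cohort_mean = sum(cohort_values) / len(cohort_values)
    roster_mean = sum(roster_values) / len(roster_values)
    return cohort_mean - roster_mean


# Hypothetical school located in Florida.
cohort = ["FL", "FL", "GA", "AL"]                  # 50% in-state
roster = ["FL"] * 6 + ["GA"] * 2 + ["AL", "TX"]    # 60% in-state
print(state_qual(cohort, roster, "FL"))            # negative: cohort is less in-state

# Hypothetical weights in pounds.
print(mean_gap([230, 250], [240, 250, 260]))       # negative: cohort is lighter
```

A negative STATEqual, as in the sample average reported above, means the incoming cohort draws less heavily on the home state than the existing roster does.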

Controls are used for player position.[Footnote 5] The categories are: athlete, defensive back, defensive tackle, defensive end, linebacker, offensive lineman, quarterback, running back, tight end, and wide receiver. Of the 500 athletes in the study, about 16% are defensive backs; a little over 14% are wide receivers; 14% are offensive linemen; 13% are defensive ends; and 10% are running backs. Quarterbacks represent only 7.6% of the sample. The remaining positions range between 3 and 8% of the sample. DuMond et al. (2008) report similar distributions in their study.

In general, recruited football players are tall: six foot or above. Weight varies more and is correlated with position played. Offensive linemen and defensive tackles tend to be the heaviest recruited players. Defensive backs and wide receivers are lighter, as expected. The literature suggests that physical attributes (similar to height for basketball players) are significant when decision-makers build their rosters. I expect, other things the same, that college programs might prefer “big” players over smaller players.

5 Empirical Strategy and Results

I use a maximum likelihood estimator to regress the number of high-quality athletes each college recruits in a given year on college program fixed effects, athlete characteristics, and interacted college-athlete effects. Because the observations on SHARE are counts of the number of top 100 athletes that the school successfully signs each year, a negative binomial distribution is used to model the data with over-dispersion.
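To illustrate why over-dispersion motivates the negative binomial, the sketch below compares Poisson and negative binomial log-likelihoods on invented counts. It assumes the common NB2 parameterization (variance equal to mu + alpha * mu^2) and a simple method-of-moments value for alpha. The paper does not state which parameterization or software was used, and the counts are hypothetical.

```python
import math


def poisson_log_pmf(y, mu):
    """Log-probability of count y under a Poisson with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def nb2_log_pmf(y, mu, alpha):
    """Log-probability of count y under NB2 with mean mu and dispersion alpha > 0.

    Under this parameterization the variance is mu + alpha * mu**2.
    """
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


# Hypothetical program-year counts of top-100 signees (not the paper's data).
counts = [1, 1, 1, 2, 1, 3, 1, 8, 2, 12, 1, 1, 5, 1, 2]

n = len(counts)
mean = sum(counts) / n
var = sum((c - mean) ** 2 for c in counts) / (n - 1)

# Method-of-moments dispersion: solve var = mean + alpha * mean**2 for alpha.
alpha_hat = (var - mean) / mean ** 2

pois_ll = sum(poisson_log_pmf(c, mean) for c in counts)
nb_ll = sum(nb2_log_pmf(c, mean, alpha_hat) for c in counts)

print("mean:", round(mean, 3), "variance:", round(var, 3))
print("Poisson log-likelihood:", round(pois_ll, 3))
print("NB2 log-likelihood:", round(nb_ll, 3))
```

When the sample variance exceeds the mean, as it does here, the Poisson restriction that the two are equal is violated, and the extra dispersion parameter lets the negative binomial fit the long right tail.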

The main specification is

$$ \text{SHARE}_{jk_{1}\text{--}n,\,t} \, = \, \beta_{1} \, + \, \beta_{2}\,\text{RANK}_{k_{1}\text{--}n,\,t} \, + \, \beta_{3}\,\text{Dposition}_{k_{1}\text{--}n,\,t} \, + \, \beta_{4}\,\text{Ht}_{k_{1}\text{--}n,\,t} \, + \, \beta_{5}\,\text{Wt}_{k_{1}\text{--}n,\,t} \, + \, \mu Z_{jt} \, + \, \eta\,\theta_{jk_{1}\text{--}n,\,t} \, + \, e_{jk_{1}\text{--}n,\,t}. $$
(2)

Equation (2) tests whether the athlete-specific characteristics (the variables with the beta coefficients), the college-specific characteristics (the vector Zjt), and some combination of interacted effects (the vector θjk1–n,t) have any influence on the number of high-quality athletes that a college program signs in a year. The main specification is estimated using both the negative binomial and Poisson distributions, with OLS as a robustness check. In addition, the economic significance of key marginal effects is examined.
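For readers less familiar with count models, the following is a sketch of how marginal effects are usually obtained. It assumes the standard log link for the Poisson and negative binomial models, which the text does not state explicitly. Here x_{jt} is shorthand (not the paper's notation) for all regressors on the right-hand side of Equation (2), and β for the stacked coefficients.

$$ E\left[ \text{SHARE}_{jt} \mid x_{jt} \right] = \exp\left( x_{jt}'\beta \right), \qquad \frac{\partial E\left[ \text{SHARE}_{jt} \mid x_{jt} \right]}{\partial x_{k,jt}} = \beta_{k}\,\exp\left( x_{jt}'\beta \right) $$

$$ \frac{E\left[ \text{SHARE}_{jt} \mid d = 1 \right]}{E\left[ \text{SHARE}_{jt} \mid d = 0 \right]} = \exp\left( \beta_{d} \right) \quad \text{for a dummy regressor } d $$

Under this assumption, the marginal effect of a continuous regressor depends on where it is evaluated (for example, at the sample means), and a dummy such as DSEC or DV scales the expected count by exp(β_d), a percentage change of 100(exp(β_d) − 1). Whether the reported tables use exactly these formulas is not specified in the text.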

The estimated marginal effects from the main specification are stable across the three models. No signs switch, and most are significant at the 5% level or above. To clarify, a positive sign on a coefficient means that a positive change in the regressor results in a larger number (or share) of the top 100 athletes being recruited by the school, and a negative sign means that a smaller share of the top 100 athletes would be recruited by the program. By this reasoning, a negative sign on a coefficient could indicate stronger competition between the schools for athletes with that attribute (since greater recruiting resources would be needed for such athletes, leaving fewer resources to recruit other top athletes); but it could also indicate athletes’ reduced disposition to sign with such schools.

One surprising feature of the results is that neither height nor weight seems to affect the colleges’ recruiting outcomes. This might be true for several reasons: First, there is not a great deal of dispersion around the mean height for players in the sample. Second, teenage males who graduate from high school may not have reached their maximum physical development, and college conditioning coaches are good at bulking up players once they arrive. Finally, the signal from the RANK variable may be dominant. College programs may learn all they need to know about the athlete’s physical ability from that ranking alone.

Another striking result is that the in-state residence of an athlete does not increase the share of top athletes a school recruits after other influences have been taken into account. If the percentage of top 100 athletes in the cohort from the school’s home state is greater than that on the existing roster, the share of top 100 athletes signed decreases. Even though almost half the recruits in the sample end up playing for programs in their home state, this attribute alone does not seem to positively influence the number of top athletes that are signed by the schools. The sign on RANK makes intuitive sense: As the rank number increases (say from 25th in the class to 50th in the class), it is more likely that colleges can sign multiple athletes out of the pool. Conversely, if a college signs the number one athlete, the college is likely to have to expend more resources to do so and, therefore, will have a smaller overall share of the total top 100 class. As DuMond et al. (2008) suggest, a reputation as a Bowl champion and the overall number of conference championships won by the program both increase the share of athletes that are recruited out of the top 100. If a school does any of the above and belongs to the SEC, the likelihood they will have more top 100 recruits more than doubles (Table 2).

Table 2 Estimation results main model dep. variable = SHARE n = 168

The sign and relative magnitude of the marginal effect from cheating (DV) are notable. In fact, as Table 3 shows, this variable has the largest “economic” impact when I factor in standard errors. As the literature reveals, breaking NCAA rules to attract better quality players is a rational economic strategy for DI football programs, especially those in BCS eligible conferences. This result supports the idea that crime pays in the NCAA. It also stands in contrast to the results DuMond et al. (2008) report. High school athletes may not prefer a school that is on probation or rumored to be placed on probation soon, but the data here suggest those schools nevertheless end up with more of the top-ranked athletes. My current approach does not offer any insight into the precise mechanism or timing of the violation behavior and its influence on successful recruiting. But it does indicate that the relationship exists.[Footnote 6]

Table 3 Economic impact of key variables on SHARE

Of all the positions, running backs from Alabama have the largest negative coefficient, followed by defensive backs from Alabama and offensive linemen from Florida: During the period 2012–2016, college programs that signed players in these positions from these states were less likely to have a large share of the top 100. A possible explanation is that the bidding competition for such players was fierce, which meant that large amounts of recruiting resources were required for such players. However, linebackers from California may have been a relative bargain during the same time.

Possible explanations for these particularly strong position and state-of-origin effects involve unique college-athlete matches. For example, USC is located in southern California, and California is the second highest producer of top 100 athletes in the sample. USC also has a strong tradition of recruiting from feeder high school programs that emphasize family and alumni relationships. This type of strong peer effect could have the unintended consequence of making California linebackers more of a bargain than they otherwise would be. In similar fashion, Florida State might be expected to have some type of advantage in recruiting offensive linemen and other positions, since Florida is the number one producer of top 100 athletes in the sample.

However, the data do not support this. In spite of these anecdotally strong peer effect cases, the results do not suggest any one single program (i.e., Alabama, Ohio State, Florida State, or USC) has a location or positional advantage when it comes to ending up with a larger share of the top 100.

Finally, I include Table 3 to give some context for the relative magnitude of the marginal effects. Although being a member of the SEC had the largest estimated marginal effect from Table 2, its impact is dwarfed by a program’s violation status. Essentially, the data suggest that cheating potentially increases a college’s share of the top 100 athletes by 725%. Winning a bowl game increases the recruiting share by 94%. This table may shed some light on past, present, and future NCAA infractions. There is only one Nick Saban; if a college wants to increase the number of high-quality players that it recruits out of the top 100, it may rationally decide to bend the recruiting rules to do so.[Footnote 7]

6 Discussion and Conclusion

Division I football programs successfully recruit a larger portion of the Rivals top 100 high school players when they have a larger number (historically) of conference championships, have earned a bowl championship, belong to the SEC, or have committed NCAA infractions during the sample period. Generally, programs appear to sign more athletes from Florida, California, and Alabama. Competition may be particularly fierce for defensive and running backs from Alabama, linebackers from California, and offensive linemen from Florida. In contrast to other research on athlete choice, there is no significant home-state matching of athletes with football programs.

Taken together, these results reveal something about what constitutes a “best” match between student athletes and football programs. DuMond et al. (2008) find that the best players seek out the best schools. My results suggest that the best players—at least those in highest demand—may come from Florida and California. These players gravitated to Alabama, USC, and Florida State.

One limitation of my approach is the restricted sample of just the top 100 athlete-school matches. A clear next step for this research is to expand the number of observations to include the top 250 college-athlete matches (and beyond). Theory suggests that the competition between football programs for the top 100 best high school athletes might differ significantly from the competition for the lower-end of the top talent distribution. I could test for the importance of weight, position, and ranking (and other athlete specific characteristics) in the larger sample for comparison.

As I complete this article, National Signing Day has just passed. One additional extension of this research is to attempt some out-of-sample predictions. If the model predicts or explains a reasonable likelihood of matches from the 2017 recruit class (or past classes), it may shed light on a number of ongoing issues in NCAA research. For example, competitive balance could be negatively affected if SEC schools consistently recruit the largest percentage of the top 100 athletes in the nation. This point is made in DuMond et al. (2008).

Both athletes and programs benefit from improved information about the matching market. For example, a linebacker might avoid holding out for his top school (and missing his second-best option) if he knows running backs are the high demand position that year. If these results lead to improved information and better decision-making about recruiting resources, the matching market could see efficiency gains.

Finally, this model indicates that breaking NCAA recruiting rules increases the likelihood of recruiting a larger number of the top-quality athletes. The notion that “crime” pays in the NCAA is not new;[Footnote 8] when programs are prohibited from using price competition for athletes, they resort to other means.

Even though the NCAA labor market is atypical, these results can potentially provide insight into other employer-employee matching puzzles with uncertain information. Academic job markets routinely try to match the top Ph.D. candidates with the top employers; medical residents scramble for their top choices to complete their educations; and, to some extent, all businesses face uncertainty about the quality of their job applicants. When price competition is fierce (or when it is prohibited), studying the behavior of the NCAA helps us understand how other two-sided matching markets may operate.