Introduction

Student drop-out has been one of the primary focuses of the literature on higher education. Indeed, university financing issues as well as the employment implications of university drop-out have made the understanding of withdrawal decisions a central concern for higher education policies and institutions’ organization. In Italy, in particular, the reduction of drop-out rates is also at the core of recent reforms of the national university system, as increased retention has become the goal of many quality assessments and reorganizational efforts in Italian higher education institutions. Nevertheless, drop-out decisions remain a complex matter.

In this paper, we analyze the determinants of university student drop-out, exploring an Italian case study. Our aim is to improve the general understanding of student withdrawal, focusing on the educational background, academic performance and personal characteristics of students rather than on the institutional aspects of universities. In doing so, we attempt to contribute to the identification of those categories of students who are most exposed to the risk of drop-out.

Student retention is not driven by a single factor that can be taken in isolation. Studies devoted to the analysis of higher education outcomes and post-secondary drop-out have attracted considerable interest in different countries (see e.g. Smith and Naylor 2001; Montmarquette et al. 2001; Arulampalam et al. 2004; Bennett 2009), where various policy initiatives have also been implemented in recent years. In Italy, by contrast, research in this area is less well established, because reliable national databases with full individual student records are not available, which makes empirical work on the issue difficult. As a result, only in recent years, when several surveys have been performed and case study data have been collected, have researchers started to systematically explore the determinants of Italian university students’ performance (see e.g. Arcari 2004; Di Pietro 2004; Biggeri and Bini 2004; Boero et al. 2005; Grilli and Rampichini 2007; Cingano and Cipollone 2007; Di Pietro and Cutillo 2008). Although the existing empirical literature points to the fact that there is no one simple explanation or solution to help students towards degree completion or fulfillment of their goals, it also shows that one factor affects university drop-out more than others, namely the educational background, while academic performance is substantially irrelevant. In this paper, we partially revise this conclusion. In particular, we perform an empirical analysis using administrative data collected in the faculty of Economics and Business of Sapienza University of Rome, which is the largest university in Europe. Our dataset covers 9,725 undergraduates enrolled in three-year bachelor programs during the academic years from 2001–2002 to 2006–2007. We develop an estimating model based on a logit function.
Since the basic logit model may fail to fit the data when any of the elements defining the generalized linear model is misspecified, we try to capture fundamental covariates omitted from the model specification by assuming that their joint effect is summarized by a set of unobserved Gaussian random variables added to the model. The likelihood function can then be evaluated by a numerical integration method (e.g. Gaussian quadrature or adaptive Gaussian quadrature). The main contributions of our paper consist both in using records from administrative data, which include all the relevant information on individual exams and student progression, and in employing a Generalized Linear Mixed Model, which allows us to take latent variables into account. Although our analysis is limited to a case study and our findings may not generalize, we show that, contrary to the predominant claim in the literature, academic performance may in fact be relevant to withdrawal decisions and that students with a better educational background are even more likely to drop out. Moreover, our research confirms some general trends observed in the literature and provides suggestions for further research.

The plan of the paper is as follows. In Section “Existing studies on university drop-out” we provide a brief overview of the existing literature on university withdrawal; in Section “Methods” we present the model adopted for studying the dynamics underlying drop-out, along with a description of the data and variables used. In Section “Results” we report the estimation results. Finally, some conclusions and remarks are offered in the last section.

Existing studies on university drop-out

The academic research on university drop-out generally argues against the common belief that students withdraw because of academic failure. For example, Tinto (1987), in one of the seminal studies on retention, stresses the importance of academic integration and social integration (participation in college life) in predicting retention in a US university setting. Further developing these intuitions, Kalsner (1991) indicates that withdrawal decisions in the US may be shaped by uncertainty about what to expect from college and its rewards, by transition and adjustment problems, by financial difficulties, and by academic underpreparation. Thus, the author concludes that student motivation is one of the main determinants of retention along with the student’s preparedness, so that selection processes are an effective way in which universities can increase persistence. The empirical evidence in the literature seems to lead to similar conclusions. For instance, Johnes (1990) analyzes a sample of students in the UK and finds that the student’s academic ability (as a function of the school background) is one of the main determinants of the likelihood of non-completion. Corroborating evidence is also found by Noel and Levitz (1985) and Fielding et al. (1998). In an influential study, Smith and Naylor (2001) examine a sample of about 70,000 UK students and find further relations: drop-out probability increases with age, married students have a lower probability of dropping out, and foreign students are more likely to withdraw, as are those who live at the parental address and off campus. Implementing a bivariate probit model on a sample of 3,400 Canadian students, Montmarquette et al. (2001) offer some evidence consistent with that discussed above.
In particular, they show that a better relative academic performance in college does not reduce the probability of dropping out, while, if a student is enrolled in a program with an entrance quota, the probability of persistence is significantly higher. See also Jakobsen and Rosholm (2003) for some evidence from Denmark.

The lack of a reliable Italian national dataset with complete individual student records has for many years limited empirical analysis of the Italian experience. In recent years, however, institutional datasets and survey data have become available, allowing the development of a growing number of studies on the university drop-out of Italian students. Employing data from the Italian National Institute of Statistics (ISTAT) for a representative sample of the university population, Cingano and Cipollone (2007) unveil the influence of parental educational background on university withdrawal; in particular, they show that the drop-out probability is decreasing in the father’s years of formal education. Similarly, using data from the five waves of the Italian Longitudinal Household Survey, Triventi and Trivellato (2009) examine the dynamics of Italian higher education in the twentieth century with the aim of studying changes in higher education participation (enrollment, transition, and graduation rates) and performance (drop-out and delayed graduation rates, average delay duration) and exploring how these differ among social classes. Employing cross-tabulation and multinomial logistic regression models, the two authors find that the percentage of drop-outs is higher among high-intensity workers than among full-time students and low-intensity workers. By contrast, Di Pietro and Cutillo (2008) investigate how the duration, structure and content of the supply of university education have been changed by the 1999 reform (which came into effect in 2001). In particular, they employ a decomposition methodology on data collected by ISTAT and find that the changes in the probability of dropping out that occurred in the last ten years are due to changes in students’ behavior rather than to changes in students’ observable characteristics.
In a similar way, D’Hombres (2007) provides an estimate of the impact of the Italian university reforms on university drop-out rates and student status (active versus inactive) using national surveys of students who graduated before and after the 1999 reform, pointing out the internal inefficiency of Italian tertiary education institutions.

Institutional data on the Italian university population, such as those provided by ISTAT or by similar institutions, which were extensively employed in the papers mentioned above, may not include individual explanatory factors potentially relevant to the drop-out probability. For this reason, in some recent works, administrative data (that is, data collected by the university administrative offices) have been used to evaluate universities’ performance and student success. For example, Boero et al. (2005) offer an econometric analysis of the impact of the 2001 higher education reform on Italian students’ retention and progression by using administrative data on student characteristics. Specifically, Boero et al. (2005) implement a probit model for student drop-out and a logit transformation for student progression and, consistently with the existing empirical literature, find that the high school type and final mark have a statistically significant effect on drop-out probability.

Starting from the existing literature, we provide a more comprehensive analysis of different student cohorts, by pooling survey data, for a better understanding of students’ drop-out choices. In order to provide empirical evidence on this issue, we gather data containing information about continuation through the university course in addition to the number of exams and credits obtained by each student in every year. The model specification adopted includes random effects to account for omitted covariates.

Methods

Data

As mentioned above, data provided by ISTAT or other institutional surveys often disregard important features of drop-out decisions. Consequently, in this paper we have decided to use administrative data. Although they lack information on family educational background, they provide information on the actual performance of university students, which is overlooked by the empirical analyses previously available (a notable exception is Boero et al. 2005). Specifically, our administrative data contain records of individual exams, so that student progression and academic results can be followed over time. Furthermore, we focus on both observed and latent individual characteristics. In particular, our empirical analysis is performed upon a dataset that collects information on the individual students enrolled in the faculty of Economics and Business of Sapienza University of Rome, and that is constructed from the administrative archives of the university. The analysis is conducted on time series, based on cross-sectional data from administrative sources, which are useful for describing changes in university attendance year after year. By pooling survey data, we can monitor the drop-out decisions of students belonging to different cohorts. The choice of this dataset is also motivated by the high number of observations and the rich array of student-specific information (such as grades of individual exams) that it collects.

Before explaining the composition of the dataset, we need to clarify the definition of university drop-out that we use in our analysis. We consider the effective drop-out, rather than the formal drop-out defined by the administrative offices. Drop-out is recorded formally in the university archives if the individual student has either officially withdrawn from the faculty or transferred to another university. In our definition, we also include the cases of those students who have not renewed their registration within the second year of the degree program. Thus, our definition of drop-out refers to leaving the degree program in which students have been enrolled.

The dataset comprises all the students enrolled during the academic years from 2001–2002 to 2006–2007 to study for a full-time undergraduate degree. The overall sample is composed of 9,725 students, classified according to two different disciplinary sectors (i.e. Economics and Business), and the information provided by administrative sources refers to the six academic years (see Fig. 1); to avoid censoring problems, we also use information on the academic year 2007–2008. The study is strictly hierarchical in structure, in the sense that each student is associated with a single study measurement. In addition to the information collected at the time of first enrollment (sex, type, date and mark of secondary school degree, age, citizenship, place of residence), the data include follow-up information about the progress of each student during the university career. In particular, we have information on students’ progress throughout the university course, namely the number of exams, grades and credits obtained by each student in every year. Moreover, our dataset allows us to include in the analysis a measurement of the student’s household economic situation (ISEE), which takes into account the household income, personal estate and number of members.

Fig. 1 Relative frequency of students enrolled by year of enrollment

Two problems may arise from the limited information in our dataset. The first concerns the fact that the same courses could be taught by different professors; however, data availability compels us to assume that courses are homogeneous. The second problem is related to the influence on drop-out decisions of other unobservable variables, such as exam failures (notice that students who drop out of university often have a high ratio of failed to passed exams, especially in the first year of their program). Those unobservable factors are related to students’ attitudes, which we try to capture by introducing a student-specific latent variable.

Variables

Our dataset allows us to construct several variables. A complete statistical description of the variables is provided in Table 1. Several studies have already unveiled the significant effects of social and family variables on university student behavior (see e.g. Bennett 2009 and references therein). In this paper, we focus on students’ individual characteristics, using a specific dataset that also allows us to explore students’ exams and progression. We consider three dimensions shaping withdrawal decisions: the individual student’s educational background, his or her actual performance in a degree course, and personal characteristics such as sex and place of residence.

Table 1 Variables definition

The educational background can be analyzed by considering the type of high school attended, which acts as a proxy for academic preparedness, along with the high school final mark. Studying the effect of these two variables on university drop-out is particularly interesting. Indeed, in Italy, higher education institutions often link candidates’ admission to their secondary school mark when they need to restrict the number of students enrolled. This policy relies on a positive relation between a student’s credentials and university success, so that the higher the individual student’s secondary school mark, the higher the probability that he or she will succeed in a higher education program. Among other things, we want to investigate whether this is generally true or not. In doing so, we consider two types of high school diploma: general high schools (licei) and technical schools (istituti tecnici and istituti professionali). While general schools provide an educational background well suited to a degree career, technical high schools explicitly aim at providing vocational education; moreover, the general high school curriculum is designed to provide a more comprehensive preparation, which, in general, does not bind students to particular degree courses. Furthermore, the latency period, defined as the number of years between the secondary education diploma and enrollment in the university, is explicitly taken into account. To account for the specificities of the disciplinary sectors, we include a dummy variable capturing whether students of Economics perform differently from those of Business.

In accordance with our research purpose, we include the university student’s actual performance as one of the covariates of the operative model. In particular, university student performance is measured by an index based on the exam grades and the number of credits that every exam assigns; since we consider withdrawal in the first two years only, this index is expressed as an average of the first two years’ performance and divided into four classes (null, low, middle, high).
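The construction of such an index can be illustrated with a minimal sketch. The paper does not report the exact formula or the class thresholds, so the credit-weighted mean grade and the cut-offs `LOW_MAX` and `MIDDLE_MAX` below are assumptions chosen for illustration only, as are the names `Exam`, `performance_index` and `performance_class`.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    grade: int    # grade of a passed exam (Italian scale, 18 to 30)
    credits: int  # credits the exam assigns


# Hypothetical cut-offs between classes; the paper does not report its thresholds.
LOW_MAX = 22.0
MIDDLE_MAX = 26.0


def performance_index(exams):
    """Credit-weighted mean grade over passed exams (an assumed formula).

    Returns None when the student passed no exam in the first two years.
    """
    total_credits = sum(e.credits for e in exams)
    if total_credits == 0:
        return None
    return sum(e.grade * e.credits for e in exams) / total_credits


def performance_class(exams):
    """Map the index to one of the four classes: null, low, middle, high."""
    index = performance_index(exams)
    if index is None:
        return "null"
    if index <= LOW_MAX:
        return "low"
    if index <= MIDDLE_MAX:
        return "middle"
    return "high"
```

Under these assumptions, a student with no passed exams falls in the null class, which is the benchmark category used in the estimation.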

As control variables we include several personal characteristics. Firstly, we consider sex, citizenship, and place of residence. Indeed, several authors have suggested that these variables may be relevant to student behavior with respect to the academic career (see e.g. Mastekaasa and Smeby 2008, on the role played by student gender). Secondly, we consider the economic situation of students, using a measure of their financial capability to undertake a higher education program. In this case, we want to investigate whether financial pressure may affect students’ behavior or not. Nevertheless, dealing with income in an econometric specification requires some empirical solutions. On the one hand, since different individuals belong to households of different sizes, we consider, rather than the raw household income, a synthetic indicator of the household economic situation (ISEE), which is based on household income, wealth and number of members. On the other hand, we adjust the ISEE by the Harmonized Index of Consumer Prices (HICP) to make valid inter-year comparisons; this variable enters the model through four class dummies. Figures 2, 3 and 4 report the drop-out rates by, respectively, academic year, student’s performance and economic situation.
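The price adjustment and the class dummies can be sketched as follows. This is an illustration under stated assumptions: the HICP values, the base year and the two upper cut-offs are hypothetical placeholders (only the 10,000 € bound of the lowest class is mentioned in the paper), and the function names are ours.

```python
# Hypothetical HICP index values (base year = 100); not taken from the paper.
HICP = {2001: 100.0, 2002: 102.6, 2003: 105.5}
BASE_YEAR = 2001

# Lower bounds of classes 1..3 in base-year euros. Only the 10,000 euro bound
# appears in the paper; the other two cut-offs are hypothetical.
CUTOFFS = [10_000.0, 20_000.0, 30_000.0]


def real_isee(nominal_isee, year):
    """Express a nominal ISEE value in base-year prices."""
    return nominal_isee * HICP[BASE_YEAR] / HICP[year]


def isee_class_dummies(nominal_isee, year):
    """Return four 0/1 indicators, exactly one of which equals 1."""
    value = real_isee(nominal_isee, year)
    klass = sum(value >= cutoff for cutoff in CUTOFFS)  # class index 0..3
    return [int(klass == k) for k in range(4)]
```

In a regression with an intercept, one of the four dummies is dropped and acts as the benchmark; the estimation reported later takes the lowest class as that reference.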

Fig. 2 Drop-out rate by academic year (%)

Fig. 3 Drop-out rate by university performance (%)

Fig. 4 Drop-out rate by ISEE (%)

Statistical methods

The reasons why students drop out of university are various and complex. Deriving a theoretical analysis in such a multidimensional context is difficult unless we make some assumptions.

In this spirit, we start by assuming that the observed binary variables \(y_{i}, i=1,\ldots ,n\) are realizations of independent Bernoulli random variables with parameter \(\lambda _{i}\). The interest is focused upon the parameter vector \({\bf \lambda}=(\lambda _{1},\lambda _{2},\ldots ,\lambda _{n})\), which is usually modeled, in a regression context, by defining a Generalized Linear Model (GLM, see McCullagh and Nelder 1989) for the analyzed response.

When some fundamental explanatory covariates are not included in the model, the specified GLM may fail to fit the data. Further, it is well known that a failure to control for unobserved individual-specific effects that may affect the response variable will result in misleading inference due to inconsistent estimators. A natural way to overcome this problem is to add a set of unobserved variables to the linear predictor:

$$ \hbox{logit}(\lambda _{i})=\nu _{i}={\bf x}_{i}^{T}\beta +b_{i},\quad i=1,\ldots ,n $$
(1)

where \(b_{i}\) is the random effect for individual i. These random effects represent individual heterogeneity, which is not captured by the observed covariates.

In (1), the random effect enters into the model on the linear predictor scale; this is convenient but also natural for many applications. More specifically, the random effects may represent heterogeneity caused by omitted explanatory variables and may be related to methods for dealing with unmeasured predictors or other missing data.
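As a worked step, inverting the logit link in (1) gives the drop-out probability of student i conditional on the random effect; this is simply a restatement of (1) on the probability scale and introduces no new symbols:

$$ \lambda _{i}=\Pr (y_{i}=1\mid {\bf x}_{i},b_{i})=\frac{\exp ({\bf x}_{i}^{T}\beta +b_{i})}{1+\exp ({\bf x}_{i}^{T}\beta +b_{i})},\quad i=1,\ldots ,n $$

A positive \(b_{i}\) therefore raises the drop-out probability of student i above the level implied by the observed covariates alone, and a negative \(b_{i}\) lowers it.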

Model (1) can be viewed as a two-stage model. At the first stage, conditional on the random effects, observations are assumed to follow a GLM; at the second stage, the random effects \(b_{i}\) are drawn from a distribution \(f(b_{i})\). The likelihood function is therefore given by:

$$ L(\cdot) = \prod\limits_{i=1}^n\int f(y_{i}\mid b_i,{\bf x}_{i})f(b_i)db_i $$
(2)

where \(f(b_{i})\) can be the standard normal density or any other parametric distribution.
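To make the likelihood (2) concrete, the sketch below evaluates its logarithm for a random-intercept logit by ordinary (non-adaptive) Gauss-Hermite quadrature. It is an illustration, not the authors' implementation: the function name `marginal_loglik`, the Gaussian scale parameter `sigma` and the default number of nodes are our assumptions; `numpy.polynomial.hermite.hermgauss` supplies nodes and weights for the weight function exp(-z^2).

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def marginal_loglik(beta, sigma, X, y, n_nodes=20):
    """Log of likelihood (2) for a logit model with b_i ~ N(0, sigma^2).

    X is an (n, k) covariate matrix, y an (n,) vector of 0/1 outcomes.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variable b = sqrt(2) * sigma * z maps the N(0, sigma^2)
    # density onto the Gauss-Hermite weight exp(-z^2) / sqrt(pi).
    b = np.sqrt(2.0) * sigma * nodes
    eta = (X @ beta)[:, None] + b[None, :]            # shape (n, n_nodes)
    p = expit(eta)
    cond_lik = np.where(y[:, None] == 1, p, 1.0 - p)  # f(y_i | b, x_i)
    marg_lik = cond_lik @ weights / np.sqrt(np.pi)    # integral over b
    return float(np.sum(np.log(marg_lik)))
```

Setting `sigma` to zero collapses every node to b = 0, so the function returns the log-likelihood of the ordinary logit model, which is a convenient sanity check.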

The dimension of the integral depends on the random effects structure, and the integral may not have a closed form; therefore, parameter estimation in Generalized Linear Mixed Models typically involves a numerical approximation to the likelihood function. As a general point, the solutions are usually iterative and numerically quite intensive. As pointed out by Aitkin (1999), if the distribution of the random effects is conjugate to the model distribution, then maximum likelihood (ML) is straightforward in principle from the marginal distribution of the observed data.

In our application, random effects are assumed to have a Gaussian distribution. The most commonly used approximate methods of estimation under this assumption are Gaussian quadrature and adaptive Gaussian quadrature. Here we adopt the latter, which is known to work well when a dichotomous response is considered (see Rabe-Hesketh and Skrondal 2002) and for which the exact positioning of the quadrature locations is not crucial.
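The idea behind adaptive Gaussian quadrature can be sketched for a single student as follows: the quadrature nodes are recentred at the mode of the integrand in (2) and rescaled by its curvature there. This is a simplified illustration under our own assumptions (one observation per student, a N(0, sigma^2) random effect, a fixed number of Newton steps to locate the mode); the name `adaptive_marginal_lik` is hypothetical and the code does not reproduce any particular software routine.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def adaptive_marginal_lik(eta, y, sigma, n_nodes=7, n_newton=25):
    """Approximate int f(y | b) N(b; 0, sigma^2) db for one student.

    eta is the fixed part x_i' beta of the linear predictor; y is 0 or 1.
    """
    # 1) Newton steps for the mode of g(b) = log f(y | b) + log N(b; 0, sigma^2).
    b_hat = 0.0
    for _ in range(n_newton):
        p = expit(eta + b_hat)
        grad = (y - p) - b_hat / sigma**2
        hess = -p * (1.0 - p) - 1.0 / sigma**2
        b_hat -= grad / hess
    # 2) Scale from the curvature of g at the mode.
    p = expit(eta + b_hat)
    tau = 1.0 / np.sqrt(p * (1.0 - p) + 1.0 / sigma**2)
    # 3) Gauss-Hermite nodes recentred at b_hat and rescaled by tau.
    z, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = b_hat + np.sqrt(2.0) * tau * z
    pb = expit(eta + b)
    cond = pb if y == 1 else 1.0 - pb
    prior = np.exp(-0.5 * (b / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # exp(z^2) undoes the Gauss-Hermite weight, which is not part of the integrand.
    return float(np.sqrt(2.0) * tau * np.sum(w * np.exp(z**2) * cond * prior))
```

With a single node the rule reduces to a Laplace approximation; adding nodes refines it, and because the nodes follow the integrand, fewer of them are typically needed than with the non-adaptive rule.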

One further issue that needs to be addressed is the possible clustering of drop-out rates given the hierarchical structure of the data. We can treat the unobservable year effects either as fixed or as random. In the following, we adopt fixed coefficients to capture year effects, since we think this is more appropriate for our analysis, as the data cover the full population of Economics and Business students at Sapienza University of Rome.

Results

Estimation results are reported in Table 2, where estimated coefficients and standard errors are collected. The coefficients give the sign of the partial effect of each explanatory variable on the probability of dropping out (response probability) and its magnitude on the log-odds scale, while the statistical significance of each explanatory variable is determined by whether the null hypothesis \(\beta _{j}=0\) can be rejected at a sufficiently small significance level. Maximum likelihood estimation is carried out by applying a Newton-Raphson based algorithm, once the adaptive Gaussian quadrature has converged.
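For a logit link such as (1), the relation between a coefficient and the corresponding partial effect on the response probability can be written out explicitly. Treating the j-th covariate \(x_{ij}\) as continuous and conditioning on \(b_{i}\) (both simplifying assumptions made here for illustration):

$$ \frac{\partial \lambda _{i}}{\partial x_{ij}}=\beta _{j}\,\lambda _{i}(1-\lambda _{i}) $$

Since \(\lambda _{i}(1-\lambda _{i})\) is strictly positive, the partial effect always has the sign of \(\beta _{j}\), while its size varies across students with \(\lambda _{i}\) and is largest at \(\lambda _{i}=0.5\). Equivalently, a one-unit increase in \(x_{ij}\) multiplies the conditional odds of dropping out by \(\exp (\beta _{j})\).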

Table 2 Estimation results (dependent variable: drop-out of the faculty, 1; otherwise, 0)

Firstly, our results unveil the positive and statistically significant effect of the student’s ex-ante credentials on his or her probability of dropping out of university. In particular, contrary to what might be intuitively expected, the higher the secondary school final mark, the higher the probability of university withdrawal; in addition, students who attended general high schools are more likely to drop out of university. We interpret this twofold result as a signal of “consumer oriented” behavior on the part of well-trained students, who readily withdraw from the university once they have realized that they do not enjoy the subject. Probably, individuals with a strong educational background are more sensitive to low performance at university. Consequently, when they do not perform well in a program, they change faculty, or even withdraw from the university. Secondly, as expected, our estimates show that ex-post individual performance is the crucial determinant of drop-out decisions. Indeed, with null performance (that is, the performance of students who did not pass any exam) as the benchmark, higher performance implies a much lower withdrawal probability. This latter result is consistent with the previous ones; it may suggest that when students like what they do, they find nothing more rewarding.

Other interesting results are obtained. Contrary to what the predominant literature has suggested (see e.g. Mastekaasa and Smeby 2008), we find that, in the faculty we are considering, male students are less likely to drop out than female students. Other individual characteristics may also play a significant role. First of all, citizenship emerges as a relevant factor. Indeed, while studying in Rome as one’s home city does not affect the probability of withdrawal, non-Italian students are less likely to drop out of university than Italians. We think that foreign systems of secondary education are probably better than the Italian one in supplying students with the educational skills required to successfully undertake a degree program. Moreover, foreign students who study in Italy have a strong incentive to complete the university program in which they enrolled, because of the high fixed costs (both financial and psychological) that they bear in moving to a country other than their own. The student’s economic situation also seems to affect drop-out decisions. With the lowest income class as the benchmark, having a medium economic status does not have any significant effect, while students in the highest income class are more likely to drop out. This result suggests a non-linear relation between economic status and university withdrawal probability. The fact that lower-class students (ISEE < 10,000 €) are less likely to drop out than rich ones is probably due to financial pressures, which still influence university student success. Furthermore, we also find that the higher the number of years between the secondary education diploma and enrollment in the university, the lower the drop-out probability.
This may indicate that adult students (often workers) have strong motivations to complete the degree course once they have enrolled; accordingly, other studies have shown that adult students, in general, outperform younger students in university programs (Hoskins et al. 1997).

With respect to the specificities of the disciplinary sectors analyzed, our results show that students of Economics behave differently from those of Business, i.e. they are more likely to drop out.

Finally, an interesting result is that the drop-out probability of university students increases over the period considered, though this case study result may not reflect the national trend.

In conclusion, while our estimation has identified several variables as statistically significant, these variables, collectively, do not fully explain student performance and motivations, suggesting that other factors may also play a crucial role in explaining student success. For this reason we have considered a model with random effects, which take into account unobserved variables and also allow for individual heterogeneity.

Concluding remarks

In this paper, we have tried to investigate some of the determinants of drop-out decisions. In order to pursue this goal, we have used data on the faculty of Economics and Business of Sapienza University of Rome and implemented a Generalized Linear Mixed Model. Our empirical analysis is mainly devoted to studying how students’ characteristics affect withdrawal from the university. Thus, we have included measures of the student’s educational background, student performance, household income and other personal characteristics in our operative model. In particular, we use data provided by the university’s administrative offices, so that our dataset collects student-specific information such as grades of individual exams (by means of which we explore students’ progression). Although our analysis is limited to a single case study and allows us to qualify drop-out only in terms of individual choice and not in terms of success or failure, it may nonetheless offer some useful insights for both institutions’ organization and further research. Indeed, contrary to the predominant claim in the literature, we have found that the higher the student’s secondary school final mark, the higher the student’s probability of withdrawal. Moreover, those students who perform well in their degree course are less likely to withdraw from it.

We may argue that well-trained students may show “consumer oriented” behavior once they have enrolled in the university: they may be more demanding and more readily drop out once they have discovered that they have been sold something they do not want. Further, if the university performance of students is affected by their ability to undertake the specific program they chose, those students who chose a program not appropriate for them may show a high drop-out probability. It follows that the use of the secondary school mark as a screening instrument to reduce the number of enrollments in individual universities may prove ineffective in countering drop-out, to the extent that it implies the selection of those students who are more likely to drop out. Moreover, universities should invest more in pre-enrollment orientation programs; indeed, career advisers may play an important role in helping students to choose the most appropriate program for them, in which they would perform best. Finally, we have found that financial pressure may also affect students’ withdrawal decisions and that foreign individuals who study in Italy are less likely to drop out of university. The latter is a good reason to enhance the internationalization of university programs in Italy, so as to improve both the quality and the efficiency of the Italian system of higher education.