Introduction

The assignment theory developed by Sattinger (1993) suggests that education-job mismatch is a source of heterogeneity in returns to schooling. By definition, education-job mismatch refers to a situation in which an employee’s level of education differs from the level of education appropriate for the job (Quintini, 2011; Vichet, 2018). This concept focuses on the interaction between the supply of graduates from the education system and the demand for educated workers on the labor market: it is the deviation between the worker’s level of education and the level required by the job. Scholars have focused on two types of mismatch: overeducation and undereducation (Freeman, 1976; Rumberger, 1981; Tsang & Levin, 1985). Overeducation refers to a state of disequilibrium in which workers possess educational qualifications in excess of those required by their jobs, while undereducation refers to a situation in which workers hold jobs for which they are not sufficiently qualified. On the labor market, undereducated workers earn a wage premium compared with adequately matched workers who have the same level of education. There is a general consensus that the return to years of overeducation (undereducation) is substantially smaller (larger) than the return to required years of education (Duncan & Hoffman, 1981). These heterogeneous returns associated with education-job mismatch have important implications for the estimation of returns to schooling.

Education-job mismatch is one of the most salient stylized facts of Cameroon’s labor market and one of its most costly problems. To understand its main causes, it is important to question not only the quality of the education acquired by graduates but also the absorption capacity of the local labor market. One reason for education-job mismatch is an excess or shortage of highly educated labor supply on the market (Lin & Hsu, 2013). During the post-structural adjustment period, labor market functioning in Cameroon underwent structural change. On the supply side, the country has experienced a phenomenal expansion of education since the 1960s. In higher education in particular, the number of students grew from around 162 in 1962 to 520,000 in 2018, a trend fueled by the strategy of lengthening studies adopted by young people who want to avoid unemployment. The proliferation of diplomas associated with this expansion has increased the supply of relatively more educated labor. Thus, more educated workers, after a longer period of job search, may be forced by circumstances to accept a job with lower educational requirements, with a resulting negative impact on their wages (McGuinness, 2006).

On the demand side, the labor market creates very few jobs that require high levels of qualification. Furthermore, the formal sector is narrow: according to the National Institute of Statistics, the informal sector represents about 91% of total employment. Competition for jobs among graduates in the formal sector is therefore fierce. According to the congestion hypothesis, Cameroon’s economy has reached a point where the supply of qualified or specialized labor exceeds labor demand in the formal sector. In this kind of context, employers are more likely to hire an employee with a higher than required level of education when given the opportunity (Bishop, 1995; Thurow, 1975; Verdugo & Verdugo, 1989). When mismatch is present, the estimated return to schooling becomes questionable, or at least different from what it would otherwise have been. Several questions therefore need to be answered: (1) What is the extent of education-job mismatch? (2) What are the determinants of education-job mismatch? (3) What are the effects of education-job mismatch on earnings? (4) What is the effect of education-job mismatch on the return to education?

The objective of this study is to highlight the role of education-job mismatch in understanding the heterogeneity in returns to schooling. More specifically, our study assesses the differentiated returns to education due to overeducation and undereducation in wage employment in Cameroon. To this end, we use OLS, instrumental variable (IV) techniques, and the Heckman selection model. We also use double selection models, with one selection equation for participation in the workforce and another for the choice of wage employment. Several empirical studies have investigated the returns to education and overeducation in African countries (Herrera & Merceron, 2013; Morsy & Mukasa, 2019), and some scholars have analyzed labor market behavior in Cameroon (Atangana Ondoa, 2019; Baye et al., 2016). A few of these studies have attempted to tackle methodological problems, while others have been heavily criticized for not taking these issues into account. Our contribution to this literature is to provide new insight into the heterogeneity in returns to education due to education-job mismatch, and into the sensitivity of these returns to the choice of selection model.

The remainder of this article is structured as follows: the “Education-Job Mismatch and Earnings: A Brief Review of the Literature” section summarizes evidence from the literature on education-job mismatch and returns to education; the “Some Stylized Facts About Education-Job Mismatch and Earnings” section describes the data and presents stylized facts based on descriptive statistics; the “Methodology and Econometric Challenges” section presents the econometric model and estimation strategy and discusses model extensions; the “Results and Discussion” section analyzes the results; and the “Conclusion” section concludes.

Education-Job Mismatch and Earnings: A Brief Review of the Literature

We begin with the theoretical explanations of mismatch. The term “mismatch” is often used in economics to refer to rather different concepts, which creates some confusion in an area that is attracting growing attention. The present analysis adopts a micro approach, which considers each individual worker and his or her job. The review of empirical studies then highlights our contribution to the economic literature.

Theoretical Foundations of the Micro Concept of Mismatch

The micro concept of mismatch is attracting increasing scholarly attention, and several theoretical explanations have been proposed. According to human capital theory (Becker, 1964), education mismatch such as overeducation results from substitution between two components of human capital: general and specific. The general component can be assimilated to education and the specific component to professional experience. Young applicants for a first job are generally handicapped by the lack of work experience required by employers, and employers sometimes accept surplus education as a substitute for missing experience. Employees with different backgrounds may therefore hold the same position while being overeducated because they lack professional experience, or undereducated because of their low level of education despite strong experience (McGuinness & Wooden, 2007). In job matching theory (Jovanovic, 1979; Pissarides, 1994), educational mismatch results from information asymmetry between employers and employees. The labor market is characterized by a double heterogeneity: workers differ in their level of qualification, and jobs differ in the level of qualification they require. A lack of information about how to match these characteristics efficiently is what produces mismatch.

In the theory of career mobility developed by Sicherman and Galor (1990) and Sicherman (1991), wages tend to grow over time with the work experience accumulated by individuals. According to this theory, some workers choose an initially mismatched position that enables them to acquire the necessary skills through on‐the‐job training and work experience, which later allows more rapid career progression. The job competition model emphasizes the importance of job availability and argues that workers face a fixed distribution of jobs, with individuals investing in education in order to preserve their place in the job queue (Thurow, 1975). Once an individual reaches the top of the queue, he or she is allocated to a job, so that wages are determined solely by the productivity characteristics of the job; overeducation occurs when the skill requirements of the allocated position are below those acquired by the worker. The job search model (Lippman & McCall, 1976) assumes instead that unemployment is largely a voluntary choice: people accept a job offer when the wage offered is higher than their expected wage. The most skilled graduates prefer to remain in the non-employment pool until they receive the best job offer. Compared with the least skilled graduates, highly skilled individuals have higher expected wages and wait longer in unemployment, during which they tend to lower their wage claims. Overeducation therefore arises when they accept a wage offer below their initially expected wage.

Some Empirical Evidence

This study contributes to the empirical literature in three ways. First, it relates both to the literature on overeducation pioneered by Freeman (1976) and to the literature on returns to investment in education, well documented since the 1950s (Becker, 1964; Mincer, 1974; Psacharopoulos, 1994; Psacharopoulos & Patrinos, 2018; Schultz, 1961). A large body of research has examined returns to overeducation at the individual level. In sub-Saharan Africa, Herrera and Merceron (2013) studied mismatch using data from the 1–2-3 surveys conducted between 2001 and 2005 in urban areas of seven West African countries (Benin, Burkina Faso, Ivory Coast, Mali, Niger, Senegal, and Togo), Cameroon, Madagascar, and the Democratic Republic of Congo. Morsy and Mukasa (2019) used school-to-work transition survey datasets from 10 African countries (Benin, Congo, Egypt, Liberia, Malawi, Togo, Madagascar, Uganda, Tanzania, and Zambia) to explore the relationship between mismatch and wages. Their estimates suggest that overeducated youths earn on average 17.9% less, and undereducated youths 44.8% more, than employed youths with the same level of education who work in matched jobs. However, the study by Herrera and Merceron covers only urban areas, while that of Morsy and Mukasa does not include Cameroon. Regarding returns to investment in education, Psacharopoulos (1994) shows that primary education is more profitable, both economically and socially, than secondary and tertiary education in developing countries. Using a sample of six countries (Ghana, Ivory Coast, Kenya, South Africa, Nigeria, and Burkina Faso), Schultz (2004) found that private returns to education increase with the level of education. In Nigeria, Aromolaran (2002) shows that the average wage of workers increases with their level of education.
For South Africa, Mwabu and Schultz (1996) reported that the returns to tertiary education for white people rise significantly up the wage scale. Girma and Kedir (2005) show that in Ethiopia the return to a year of education in the first decile is twice as high as in the ninth decile. In a case study of Uganda, Kavuma et al. (2015) conclude that returns to education decrease across the quantiles of the earnings distribution for both salaried and self-employed workers. Psacharopoulos and Patrinos (2018) review the latest trends and patterns using a database of 1120 estimates for 139 countries. Their review shows that the average global private rate of return to one extra year of schooling is about 9% a year and has been very stable over decades. In the case of Cameroon, Zamo-Akono and Tsafack Nanfosso (2013) analyzed returns to education by employment sector, while Baye (2015) assessed the effect of education on wage quantiles. Along the same lines, Njifen and Pemboura (2021) highlighted the heterogeneity in returns to education across the earnings distribution. While there is a wide literature on returns to education, papers showing the contribution of mismatch to the heterogeneity in returns to schooling are strikingly scarce. As an original contribution, this paper highlights the differentiated returns to schooling due to overeducation and undereducation.

Second, this study makes a methodological contribution. Two main empirical specifications are used in the literature to assess the wage effects of mismatch (Rumberger, 1987; Sicherman, 1991; Zhu & Chou, 2020). The first, from Duncan and Hoffman (1981), decomposes years of completed schooling into years of required, surplus, and deficit schooling. Under this specification, overeducated and undereducated workers are compared with colleagues holding the same job. The second, from Verdugo and Verdugo (1989), introduces two dummy variables equal to 1 if the individual is classified as overeducated or undereducated, respectively. Most studies that adopt the Verdugo and Verdugo model find that overeducated workers receive significant pay penalties and undereducated workers receive substantial wage premiums (Cohn & Khan, 1995; Mateos-Romero & Salinas-Jimenez, 2018; Wincenciak, 2018). Hartog (2000) reports consistent earnings function results for five countries (the Netherlands, Spain, Portugal, the UK, and the USA), across different data sources and periods, in a series of 45 regressions. Despite this valuable literature, no clear-cut evidence emerges, given the multiplicity of methodologies used. Moreover, studies that address double selection in labor market participation are scarce. Yet, according to Bairagya (2020), estimates are biased if the second selection is ignored, as suggested by Ham (1982) and Tunali (1986). To extend this literature, this study uses a double selection model, with one selection equation for participation in the workforce and another for the choice of wage employment. Furthermore, most studies that estimate the effects of mismatch on wages do not treat overeducation or undereducation as endogenous.
If ability, which is unobserved, is correlated with education-job mismatch, the OLS coefficient estimates will be biased (Leuven & Oosterbeek, 2011). Education-job mismatch is therefore potentially endogenous, and ignoring the endogeneity of the overeducation variable may bias the analysis. Finding instruments that are both strong predictors of education and satisfy the exclusion restriction is challenging. We address this problem using the instrumental variable method. Unlike most other authors, we use non-self-cluster means as instruments. According to Alderman and Garcia (1994), Handa (1996), and Bredenkamp (2008), non-self-cluster means, although rarely used in the literature, are proper instruments in the sense that, by construction, they are uncorrelated with the error term while being highly correlated with the instrumented variable. Following Bairagya’s (2020) comparison of returns to schooling in India across different selection models, we also analyze the sensitivity of estimated returns to the choice of selection model.
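The Duncan and Hoffman (1981) decomposition and the Verdugo and Verdugo (1989) dummies described in this section can be illustrated with a short Python sketch. It assumes that the years of schooling required by each worker's job are already known (for instance from a normative or realized-match measure); the class and function names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SchoolingComponents:
    required: float  # S_r: years of schooling required by the job
    surplus: float   # S_o: years of overeducation (0 if none)
    deficit: float   # S_u: years of undereducation (0 if none)
    over: int        # Verdugo-Verdugo dummy: 1 if overeducated
    under: int       # Verdugo-Verdugo dummy: 1 if undereducated


def decompose_schooling(attained: float, required: float) -> SchoolingComponents:
    """Split attained schooling S into S = S_r + S_o - S_u.

    The Duncan-Hoffman specification enters S_r, S_o and S_u as separate
    regressors; the Verdugo-Verdugo specification keeps S and adds the two dummies.
    """
    surplus = max(attained - required, 0.0)
    deficit = max(required - attained, 0.0)
    return SchoolingComponents(
        required=required,
        surplus=surplus,
        deficit=deficit,
        over=int(surplus > 0),
        under=int(deficit > 0),
    )


if __name__ == "__main__":
    # A worker with 16 years of schooling in a job assumed to require 12 years.
    print(decompose_schooling(16, 12))
    # A worker with 9 years of schooling in the same job.
    print(decompose_schooling(9, 12))
```

By construction, `required + surplus - deficit` always equals the attained years, and at most one of the two dummies equals 1.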

Finally, our paper extends applied studies that have examined labor market behavior in Cameroon. Several papers identify the determinants of wage differentials. For example, Yogo (2011) investigates the effect of social capital on wages. He reports that users of social networks enjoy a wage premium of 1.53% of the average wage, and that social networks help explain wage differentials by gender and by institutional sector (formal versus informal). Baye et al. (2016) analyze gender wage differentials in Cameroon. They find that the effect of education on wages increases across percentiles and that a larger part of the wage differential stems from differences in characteristics between men and women. This analysis can be compared with previous studies, especially Ndamsa et al. (2015), who found that the majority of the observed gap arises from potential labor market discrimination rather than from gender differences in the mean values of productive and job-related characteristics. Atangana Ondoa (2019) analyzes the effects of education on wage inequality in the informal sector. The findings show that both wages and wage inequality in the informal sector increase with education, with tertiary education offering the highest return but also producing the greatest within-group wage inequality. Our study extends these valuable studies of the labor market in Cameroon.

Some Stylized Facts About Education-Job Mismatch and Earnings

The data used in this study are drawn from the second survey on employment and the informal sector in Cameroon (EESI II). The survey was carried out by the National Institute of Statistics to analyze labor market outcomes and compare the formal and informal sectors. In the survey methodology, the informal sector is defined as the set of activities and enterprises that are not recorded and not included in the national accounts. The nationwide survey covered 8160 households, that is, 38,580 individuals, of whom 50.23% were women and about 21,490 were young people (under 35 years old). The households considered are ordinary households residing in the national territory. The sampling frame is based on enumeration areas, each a portion of territory bounded by visible landmarks and containing between 700 and 1100 residents, that is, 140 to 230 households; overall, 756 enumeration areas were defined. Except for the cities of Douala and Yaounde, every region is subdivided into an urban and a rural stratum. The database includes about 14,550 salaried workers. Average monthly earnings from main and secondary jobs amount to 81,817 XAF, with a sample minimum of 40,000 XAF and a maximum of 540,000 XAF. Table 1 presents the quantile distribution of monthly wages on the labor market.

Table 1 Quantile distribution of earnings from wage employment

Table 1 shows that the lowest-paid 25% of workers earn less than 50,000 XAF, while the top 10% each earn more than 150,000 XAF. The median wage, which splits the sample into two equal halves, is 60,000 XAF: half of employees earn less than this amount and half earn more. The interpercentile ratio P90/P10 indicates that the wage at the 90th percentile is about 3.33 times the wage at the 10th percentile.
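The quantiles in Table 1 and the P90/P10 ratio can be computed as in the following numpy sketch. The wage vector is a placeholder, since the EESI II microdata are not reproduced here, and the percentiles use numpy's default linear interpolation, which is an assumption because the paper does not state its quantile rule.

```python
import numpy as np


def wage_quantile_summary(wages):
    """Selected percentiles of monthly wages and the P90/P10 interpercentile ratio."""
    w = np.asarray(wages, dtype=float)
    p10, p25, p50, p75, p90 = np.percentile(w, [10, 25, 50, 75, 90])
    return {
        "P10": p10,
        "P25": p25,
        "P50": p50,  # median: half of workers earn less, half earn more
        "P75": p75,
        "P90": p90,
        "P90/P10": p90 / p10,
    }


if __name__ == "__main__":
    # Placeholder monthly wages in XAF (illustrative values, not survey data).
    toy = [40_000, 45_000, 50_000, 55_000, 60_000, 70_000, 90_000, 150_000, 200_000]
    for name, value in wage_quantile_summary(toy).items():
        print(f"{name}: {value:,.2f}")
```

With the figures reported above, a P90 of 150,000 XAF and a ratio of 3.33 would correspond to a P10 of roughly 45,000 XAF.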

In addition, the survey allows us to address education mismatch issues. For instance, the dataset includes information on each degree acquired from schools or universities, which supports an objective determination of mismatch. Because the survey questionnaire does not record workers’ self-assessment of the skills required for their job, a subjective measure of mismatch cannot be constructed from these data; we therefore focus on the normative (job analysis) and statistical (realized matches) approaches, which are known as objective measures. Under the job analysis measure, each occupation in the International Standard Classification of Occupations (ISCO) is assigned the required level of education defined in the International Standard Classification of Education (ISCED).Footnote 1 For example, occupational groups such as clerical support workers and elementary occupations do not require higher education, so graduates in these occupations are considered overeducated. The statistical approach instead relies on market realizations, such as the mean educational attainment in a given occupation, or on the hiring standards used by firms’ personnel departments (Groeneveld & Hartog, 2004; Verdugo & Verdugo, 1989). These matches are the result of demand and supply forces (Leuven & Oosterbeek, 2011). Table 2 shows the extent of educational mismatch.
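The two objective measures described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's classification: the mapping from ISCO-08 major groups to required years of schooling is hypothetical, and the realized-match rule uses a band of one standard deviation around the occupation mean, the threshold commonly associated with Verdugo and Verdugo (1989), since the exact band used in the paper is not stated here.

```python
from collections import defaultdict
from statistics import mean, pstdev

# HYPOTHETICAL normative mapping: ISCO-08 major group -> required years of schooling.
# Illustrative values only; not the ISCO-ISCED correspondence used in the paper.
REQUIRED_YEARS_BY_ISCO = {2: 16, 3: 14, 4: 12, 9: 6}


def classify(years, required, band=0.0):
    """Return 'over', 'under' or 'matched' relative to a required level."""
    if years > required + band:
        return "over"
    if years < required - band:
        return "under"
    return "matched"


def normative_status(isco_major, years):
    """Job-analysis measure: compare schooling with the level assigned to the occupation."""
    return classify(years, REQUIRED_YEARS_BY_ISCO[isco_major])


def statistical_status(records, k=1.0):
    """Realized-match measure: compare each worker's schooling with the mean
    schooling in his or her occupation, using a +/- k standard-deviation band."""
    by_occ = defaultdict(list)
    for occ, years in records:
        by_occ[occ].append(years)
    moments = {occ: (mean(v), pstdev(v)) for occ, v in by_occ.items()}
    return [classify(years, moments[occ][0], k * moments[occ][1])
            for occ, years in records]


if __name__ == "__main__":
    print(normative_status(4, 16))  # a graduate in a clerical job -> 'over'
    sample = [(4, 10), (4, 12), (4, 14), (4, 20)]  # (occupation, years of schooling)
    print(statistical_status(sample))  # ['under', 'matched', 'matched', 'over']
```

Because the realized-match thresholds are recomputed from the sample itself, this measure moves with the observed distribution of schooling within each occupation.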

Table 2 Evidence on the education-job mismatch in percentage

The main observation from Table 2 is the high rate of overeducated workers and the lower rate of undereducated workers: 46.6% and 28.8%, respectively, under the normative approach, and 52.6% and 21.7% under the statistical approach. On the supply side, many unemployed people are forced to accept jobs that do not match their educational attainment, while on the demand side, mismatch may be related to a relatively underdeveloped business environment (a lack of jobs). Overeducation is not specific to the Cameroonian context, but its incidence is relatively high compared with other developing economies. Scholars report overeducation rates of about 42% in Pakistan (Abbas, 2008), 32.2% in Mexico (Mehta et al., 2011), about 15% in Colombia (Herrera et al., 2013), and 21.3% in sub-Saharan countries (Herrera & Merceron, 2013), but 57.2% in Congo and 62% in Mali. Overall, countries with high levels of educational achievement tend to have relatively high levels of overeducation. Like many developing countries, Cameroon is failing to tap into its vast stock of expertise and apply it to its needs (Wirba, 2021). Table 3 highlights some characteristics of mismatch in the Cameroonian context.

Table 3 Education mismatch characteristics (in percentage)

By gender, men experience more education-job mismatch than women. When men are the prime income earners in a household and the choice of location is determined by the man’s labor market prospects, men may be constrained to accept any job regardless of its quality, which may translate into a higher probability of being overeducated. Education-job mismatch is not necessarily persistent at the individual level, and understanding it requires knowing more about its persistence over the life cycle. The age profile of overeducated and undereducated workers has an inverted U shape over the life cycle, independently of the mismatch measure. Under the normative approach, around 38.0% of workers are overeducated at an average age of 23 (mean of the 17–30 age range). This percentage increases until age 37 (mean of the 30–45 age range), where it reaches 55.85%, and then starts decreasing. By the age of 52 (mean of the 45–59 age range), around 6.13% of workers are overeducated. Older workers are thus less likely to be overeducated than their younger colleagues. On average, high school graduates enter the labor market around age 28 in Cameroon. The downward trend in the later stages of the career is consistent with existing models of labor mobility in which the mismatch between worker skills and job skill requirements decreases over time as workers overcome search and learning frictions. It is also consistent with the theory of career mobility, in which workers who are overeducated in their first job have a higher probability of being promoted, and with the view that the labor market rewards workers’ entire bundle of human capital, in which extra schooling can compensate for a lack of experience.
For a more robust analysis, the persistence of mismatch over the life cycle would require a dedicated study of the shape of the relationship between mismatch and age. Furthermore, a majority of overeducated workers have a secondary level of education, while most undereducated workers have a primary level under the statistical measure. The share of overeducation in the private employment sector is about 45%, and the share of undereducation in the informal employment sector is about 59%, depending on the measurement approach.

In Cameroon, as in other sub-Saharan countries, one of the common factors behind education-job mismatch is imperfect information. Informality itself is a cause of skills mismatch, as it affects individuals’ decisions to accept mismatched employment, and the lack of attention paid by training providers to the requirements of the informal economy is often mentioned (Adams et al., 2013; Palmer, 2018). Finally, the descriptive statistics show that more than 60% of overeducated and undereducated workers are technicians and clerical workers; this occupational class contributes more to mismatch than executive staff or manual workers. To measure the incidence of mismatch, we use the realized-match approach because it is naturally updated as the labor market evolves.

Methodology and Econometric Challenges

The methodological approach unfolds in three steps. First, we specify an econometric model based on an earnings function. Second, we discuss selection problems. Finally, we discuss the choice of instruments used to correct for endogeneity bias.

Model Specification

The standard Mincerian model (Mincer, 1974)Footnote 2 has become a cornerstone of applied labor economics and has been extended in various forms (Duncan & Hoffman, 1981; Verdugo & Verdugo, 1989). In particular, Verdugo and Verdugo (1989) propose using two dummy variables to capture overeducation and undereducation while controlling for the actual years of education attained. Drawing on Verdugo and Verdugo’s model, we specify the following earnings function:

$${Y}_{i}={\alpha }_{0}+{\alpha }_{1}\;{E}_{i}+{\sum\limits_{k=2}^{3}{\alpha }_{k}}\;{EM}_{ik}+{\sum\limits_{k=4}^{m}{\alpha }_{k}}\;{X}_{ik}+{\varepsilon }_{i}$$
(1)

where \({Y}_{i}\) is the logarithm of the monthly wage of individual \(i\); \(E\) represents years of schooling; \(EM\) is a set of two dummy variables equal to 1 if the individual is overeducated or undereducated, respectively; \({\alpha }_{k}\) is a vector of coefficients, with \({\alpha }_{0}\) the constant and \({\alpha }_{1}\) the rate of return to education; \(\varepsilon\) is a disturbance term; and \(X\) is a set of control variables (age, squared age, experience, employment sector, and sex) assumed to affect earnings. In addition, we introduce an interaction variable (mismatch index × educational level) to assess the effect of education-job mismatch on returns to schooling. Descriptive statistics for these variables are provided in Appendix Table 8.
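To make Eq. 1 concrete, the sketch below estimates a Verdugo-style earnings function by OLS on simulated data and converts the dummy coefficients into percentage wage gaps with the usual semi-log transformation 100·(exp(α) − 1). The sample, the "true" parameter values, and the reduced control set (age and squared age only) are invented for illustration; the paper's controls and interaction terms would simply add columns to `X`.

```python
import numpy as np


def simulate_workers(n, rng):
    """Simulated sample (NOT survey data); the 'true' parameters are invented."""
    educ = rng.integers(0, 19, n).astype(float)                 # E: years of schooling
    over = (rng.random(n) < 0.3).astype(float)                   # EM_2: overeducated
    under = ((rng.random(n) < 0.2) & (over == 0)).astype(float)  # EM_3: undereducated
    age = rng.uniform(18, 60, n)                                 # control variable
    X = np.column_stack([np.ones(n), educ, over, under, age, age ** 2])
    true_alpha = np.array([10.0, 0.08, -0.15, 0.10, 0.03, -0.0003])
    log_wage = X @ true_alpha + rng.normal(0.0, 0.4, n)
    return X, log_wage


def fit_earnings(X, log_wage):
    """OLS estimate of Eq. 1: log wage on schooling, mismatch dummies and controls."""
    alpha_hat, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    return alpha_hat


def dummy_to_percent(coef):
    """Percentage wage gap implied by a dummy coefficient in a semi-log equation."""
    return 100.0 * (np.exp(coef) - 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = simulate_workers(5_000, rng)
    alpha = fit_earnings(X, y)
    print("return to a year of schooling:", round(alpha[1], 4))
    print("overeducation gap (%):", round(dummy_to_percent(alpha[2]), 1))
    print("undereducation gap (%):", round(dummy_to_percent(alpha[3]), 1))
```

Because the dummies enter alongside actual schooling, each dummy coefficient compares mismatched workers with matched workers who have the same years of education, which is the comparison described for the Verdugo and Verdugo specification.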

Model 1 is traditionally estimated by ordinary least squares (OLS). It is well established, however, that OLS estimation of the earnings equation suffers both from selection bias, because the sample of wage earners is not random, and from endogeneity bias. The statistical association between individual labor market outcomes and education-job mismatch is likely to be biased by unobserved individual heterogeneity, which calls for more sophisticated econometric techniques than simple OLS.

Discussion About the Selection Problem

Empirically, the study aims to identify the determinants of earnings, with a focus on mismatch. Selection into labor force participation is likely to bias the estimates. Following the Heckman (1979) procedure, we correct for this selection bias in two stages. In the first stage, a probit model of employment selection is specified (Eq. 2); after estimating it, we incorporate the inverse Mills ratio into the earnings function to correct for selection bias.

$${P}^{*}={\beta }_{0}+\sum\limits_{k=1}^{m}{\beta }_{k}\;{Z}_{k}+\mu ,\qquad P=1\;\text{if}\;{P}^{*}>0\;\text{and}\;P=0\;\text{otherwise}$$
(2)

where \(P\) represents participation in employment, and \(Z\) and \(\beta\) represent vectors of explanatory variables and parameters, respectively. These covariates are age, area of residence, sex, religion, education level, and the number of dependent children in the household. Following Maddala (1983), identification of the model is ensured by including an instrumental variable (an exclusion restriction) in the selection equation. This instrument, named “non-self-cluster of activity,” is the share of participants per household in the individual’s cluster, excluding the individual’s own household. Using Eq. 2, one can then estimate each individual’s predicted probability of engaging in employment. We correct for sample selection bias by including the inverse Mills ratio derived from this first-stage probit as an additional explanatory variable in the earnings model. Eq. 1 can therefore be rewritten as:

$${Y}^{*}={\alpha }_{0}+{\alpha }_{1}\;E+\sum\limits_{k=2}^{3}{\alpha }_{k}\;{EM}_{k}+\sum\limits_{k=4}^{m}{\alpha }_{k}\;{X}_{k}+\varepsilon$$
(3)

where \({Y}^{*}\) is not observed for individuals who are not wage earners. Assuming that the error terms of Eqs. 2 and 3 follow a joint normal distribution, the conditional expected earnings of employed individuals are:

$$\mathrm{E}\left[Y|X,P=1\right]={\alpha }_{0}+{\alpha }_{1}\;E+\sum\limits_{k=2}^{3}{\alpha }_{k}\;{EM}_{k}+\sum\limits_{k=4}^{m}{\alpha }_{k}\;{X}_{k}+\rho {\sigma }_{\varepsilon }\lambda \left(Z\beta \right)$$
(4)

where \(\rho\) represents the correlation between the unobserved factors that determine labor market participation (\(\mu\)) and the unobserved factors related to earnings (\(\varepsilon\)); \({\sigma }_{\varepsilon }\) represents the standard deviation of the earnings error term \(\varepsilon\), the variance of \(\mu\) being normalized to one in the probit; and \(\lambda \left(Z\beta \right)=\phi \left(Z\beta \right)/\Phi \left(Z\beta \right)\) is the inverse Mills ratio (IMR), where \(\phi\) and \(\Phi\) are the standard normal density and distribution functions, computed from the vector of explanatory variables that determine employment participation (\(Z\)) and the vector of parameters (\(\beta\)) estimated in Eq. (2).
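The two-step procedure of Eqs. 2–4 can be sketched with numpy alone. This is a minimal illustration, assuming a probit first stage fitted by Newton-Raphson and a simulated sample with an excluded variable in the selection equation; the function names are hypothetical, and the second-step standard errors, which require the Heckman correction, are not computed here. In applied work a dedicated econometrics package would normally be used.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_pdf(x):
    """Standard normal density."""
    return np.exp(-0.5 * np.square(x)) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal CDF via math.erf (avoids a SciPy dependency)."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def probit(d, Z, max_iter=50, tol=1e-10):
    """Probit MLE by Newton-Raphson; d is a 0/1 vector, Z includes a constant."""
    beta = np.zeros(Z.shape[1])
    q = 2.0 * d - 1.0
    for _ in range(max_iter):
        xb = Z @ beta
        # Generalized residual: q * phi(xb) / Phi(q * xb).
        lam = q * norm_pdf(xb) / np.clip(norm_cdf(q * xb), 1e-300, None)
        grad = Z.T @ lam
        hess = -(Z * (lam * (lam + xb))[:, None]).T @ Z
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def heckman_two_step(y, X, d, Z):
    """Step 1: probit of participation d on Z (Eq. 2).
    Step 2: OLS of y on [X, IMR] for participants only (Eq. 4).
    The last coefficient is the estimate of the rho * sigma term."""
    gamma = probit(d, Z)
    index = Z @ gamma
    imr = norm_pdf(index) / norm_cdf(index)
    sel = d == 1
    coef, *_ = np.linalg.lstsq(np.column_stack([X[sel], imr[sel]]), y[sel], rcond=None)
    return gamma, coef


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 20_000
    z1, excl = rng.normal(size=n), rng.normal(size=n)  # excl: exclusion restriction
    Z = np.column_stack([np.ones(n), z1, excl])
    X = np.column_stack([np.ones(n), z1])
    u, e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
    d = (Z @ np.array([0.2, 0.5, 1.0]) + u > 0).astype(float)
    y = X @ np.array([1.0, 0.5]) + e  # only used where d == 1
    gamma, coef = heckman_two_step(y, X, d, Z)
    print("probit:", gamma.round(3), "outcome + IMR:", coef.round(3))
```

In this simulation the errors have correlation 0.5 and unit variances, so the IMR coefficient should be close to 0.5; an OLS fit on participants without the IMR would instead give a biased intercept and slope.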

However, while selection bias correction with a single selection equation is common in the literature (Gronau, 1979), correction for more than one selection stage has received much less attention. In our setting, double selection is likely: one selection for participation in the labor force and another for the choice of wage employment, given that those participating in the workforce may instead be self-employed. Ignoring the second selection, or the interdependence between the two selection decisions, may bias the estimates. Following Ham (1982), Tunali (1986), and Bairagya (2020), we first estimate two IMRs from two separate probit models and include these correction terms as exogenous variables in the earnings equation. We then estimate the selection correction from a bivariate probit and include it in the earnings model. The first selection equation captures participation in the labor force (Eq. 5), while the second captures the choice of wage employment (Eq. 6).

$${P}_{1}^{*}={\beta }_{01}+\sum\limits_{k=1}^{m}{\beta }_{1k}\;{Z}_{k}+{\mu }_{1}$$
(5)
$${S}_{2}^{*}={\beta }_{02}+\sum\limits_{k=1}^{m}{\beta }_{2k}\;{Z}_{k}+{\mu }_{2}$$
(6)

where \({P}_{1}^{*}\) and \({S}_{2}^{*}\) are latent variables, and \({P}_{1}\) and \({S}_{2}\) represent the observed selection for employment participation and the choice of wage employment, respectively. \(Z\) represents the covariates that determine employment participation and the choice of wage employment. \({\beta }_{1}\) is a priori different from \({\beta }_{2}\). Finally, \({\mu }_{1}\) and \({\mu }_{2}\) are the error terms of the employment participation and wage employment equations, respectively.
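For the bivariate case, the selection correction for individuals with \({P}_{1}=1\) and \({S}_{2}=1\) can be built from the two probit indices and the correlation of their errors. The sketch below assumes that the indices a = Zβ1 and b = Zβ2 and the correlation ρ of μ1 and μ2 have already been estimated by a bivariate probit; it evaluates the bivariate normal CDF by one-dimensional numerical integration and applies the standard truncated bivariate normal formulas. All names are hypothetical.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_pdf(x):
    return np.exp(-0.5 * np.square(x)) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def bvn_cdf(a, b, rho, n_grid=4001, lower=-10.0):
    """P(v1 < a, v2 < b) for a standard bivariate normal with correlation rho (|rho| < 1),
    computed as the integral from -inf to a of phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)),
    using the trapezoid rule on [lower, a]."""
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho ** 2)
    x = np.linspace(lower, a, n_grid)
    f = norm_pdf(x) * norm_cdf((b - rho * x) / s)
    h = x[1] - x[0]
    return float(h * (f.sum() - 0.5 * (f[0] + f[-1])))


def double_selection_terms(a, b, rho):
    """Correction terms for an observation selected by both equations (P1 = 1, S2 = 1),
    where a and b are its two probit indices and rho = corr(mu_1, mu_2)."""
    s = math.sqrt(1.0 - rho ** 2)
    p_both = bvn_cdf(a, b, rho)
    lam1 = float(norm_pdf(a) * norm_cdf((b - rho * a) / s)) / p_both
    lam2 = float(norm_pdf(b) * norm_cdf((a - rho * b) / s)) / p_both
    return lam1, lam2


if __name__ == "__main__":
    print(double_selection_terms(0.3, -0.2, 0.4))
```

Under joint normality, E[μ1 | selected] = λ1 + ρλ2 and E[μ2 | selected] = ρλ1 + λ2, so the conditional mean of the earnings error is linear in λ1 and λ2, and both terms enter the earnings equation as regressors. When ρ = 0 they reduce to the two univariate inverse Mills ratios used in the separate-probit correction.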

Discussion About the Choice of Instruments

Model 1 includes three potentially endogenous variables: education and the two mismatch dummies. The education variable (E) is potentially endogenous. Endogeneity bias renders the coefficients estimated from standard regressions causally uninterpretable, because the estimates are inconsistent: they do not converge to the true coefficient values (Clougherty & Duso, 2015). Wooldridge (2012) outlines three sources of endogeneity bias: measurement error, simultaneity, and omitted variables. Measurement error in constructed variables can attenuate or otherwise bias regression estimates, while simultaneity occurs when one of the predictors is determined jointly with the dependent variable (Li, 2012). Omitted-variable bias arises when an omitted factor both affects the dependent variable and is correlated with one or more explanatory variables, which violates the most important OLS assumption (exogeneity). This study focuses mainly on omitted variables: education is treated as endogenous because unobserved individual ability may determine both education and earnings, as indicated by several existing studies (Bairagya, 2020; Kolstad & Wiig, 2015; Mavromaras et al., 2013). We perform the test suggested by Hausman (1978) to determine whether education is exogenous. The test requires predicting education, obtaining the residual, and including this residual in the main regression; if the residual is significant, the exogeneity hypothesis is rejected. In this study, the coefficient on the education residual (−0.0276) is significant at the 5% level (p = 0.032). Education-job mismatch is also potentially endogenous: endogeneity arises if it is correlated with the error term of the wage function, which is likely to be the case (Leuven & Oosterbeek, 2011; Morsy & Mukasa, 2019).
Furthermore, as a job characteristic, this variable constitutes a channel through which education affects earnings. Instrumental variable (IV) estimation requires at least one instrument per endogenous variable to identify the model, and these instruments must fulfill two conditions (Wooldridge, 2012): relevance (a high correlation between the instrument and the endogenous regressor) and exogeneity (no correlation between the instrument and the error term of the main regression). We use two instruments for education (see Footnote 3): the “non-self-cluster mean of education” and the father’s education.
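The regression-based exogeneity test described above can be sketched in plain Python. This is a minimal illustration on simulated data, not the authors’ code: the data-generating process, the single stand-in instrument `z`, and all function names are hypothetical, and an actual application would include the paper’s covariates (and mismatch variables) in both stages. Only the t-statistic on the residual is used; under the null of exogeneity it is valid even though the second-stage standard errors of the other coefficients are not corrected for the generated regressor.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # pivot row
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[k:] for row in a]


def ols(y, X):
    """OLS of y on the rows of X: coefficients, homoskedastic standard errors, residuals."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return b, se, resid


def control_function_t(log_wage, educ, first_stage_X, second_stage_X):
    """t-statistic on the first-stage residual added to the earnings equation.

    first_stage_X: rows of exogenous covariates plus instruments (with a constant).
    second_stage_X: rows of exogenous covariates only (with a constant).
    A large |t| rejects the hypothesis that education is exogenous.
    """
    _, _, v_hat = ols(educ, first_stage_X)  # step 1: predict education, keep residual
    X2 = [list(row) + [e, v] for row, e, v in zip(second_stage_X, educ, v_hat)]
    b, se, _ = ols(log_wage, X2)  # step 2: add the residual to the wage equation
    return b[-1] / se[-1]


def simulate(n=2000, seed=1):
    """Hypothetical data in which unobserved ability raises both schooling and wages."""
    rng = random.Random(seed)
    log_wage, educ, fs_X, ss_X = [], [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        z = rng.gauss(0, 1)  # hypothetical standardized instrument
        e = 10 + 2 * z + 1.5 * ability + rng.gauss(0, 1)
        log_wage.append(1 + 0.05 * e + 0.3 * ability + rng.gauss(0, 0.5))
        educ.append(e)
        fs_X.append([1.0, z])
        ss_X.append([1.0])
    return log_wage, educ, fs_X, ss_X


if __name__ == "__main__":
    print(f"t on education residual: {control_function_t(*simulate()):.2f}")
```

Because ability is deliberately built into both equations of the simulation, the residual’s t-statistic is large and exogeneity is rejected, mirroring the logic (not the numbers) of the test reported above.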

As a community-level variable, the non-self-cluster mean of education is the average level of education of the other members of the cluster in which the individual resides. Previous studies show that non-self-cluster means are suitable instruments, satisfying both the orthogonality condition and the exclusion restriction (Alderman & Garcia, 1994; Handa, 1996). Through an imitation effect, this variable is correlated with the individual’s education but not with his or her earnings. Regarding father’s education, we treat the household head as the father so that he or she can be matched with the son or daughter. As potential instruments for education-job mismatch, we decompose the household head’s education into three components, following the same procedure used to define our education-job mismatch variable, two of which serve as instruments: father’s overeducation (1 if the head is overeducated and 0 otherwise) and father’s undereducation (1 if the head is undereducated and 0 otherwise) (see Footnote 4). We apply several weak-instrument tests to assess the validity of the selected instruments, since weak instruments can bias point estimates and cause substantial test size distortions (Nelson & Startz, 1990; Stock & Yogo, 2005). We test the null hypothesis of weak instruments against the alternative of strong instruments (Montiel Olea & Pflueger, 2013; see Footnote 5). Because our model is overidentified, we can also test the validity of the overidentifying restrictions with the Sargan test. The null hypothesis that these restrictions are valid is not rejected (Sargan statistic for all instruments = 115.07, p = 0.325). These results support the IV specification.
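The non-self-cluster mean is a leave-one-out average. A minimal sketch in plain Python, assuming each individual is described only by a cluster label and a years-of-education value (hypothetical inputs); the same computation applied to an activity indicator would give the non-self-cluster mean of activity used in the selection model. Individuals who are alone in their cluster have no “other members,” so the sketch returns `None` for them rather than inventing a value.

```python
from collections import defaultdict


def non_self_cluster_mean(clusters, values):
    """Leave-one-out cluster mean: for each person, the average of `values`
    over the other members of the same cluster (None if the person is alone)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, v in zip(clusters, values):
        sums[c] += v
        counts[c] += 1
    result = []
    for c, v in zip(clusters, values):
        others = counts[c] - 1
        # Remove the person's own value before averaging over the rest of the cluster.
        result.append((sums[c] - v) / others if others > 0 else None)
    return result


# Hypothetical example: six people in three clusters with their years of education.
print(non_self_cluster_mean(["A", "A", "A", "B", "B", "C"], [6, 9, 12, 10, 14, 8]))
# → [10.5, 9.0, 7.5, 14.0, 10.0, None]
```

Excluding the individual’s own value is what separates this instrument from a plain cluster mean, which would mechanically contain the endogenous variable itself.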

Results and Discussion

Endogeneity Problem

Following the strategy discussed above, the Montiel Olea-Pflueger effective F-statistic is 46.91 (see Table 9 in the Appendix), while the LIML critical value is 15.49 for τ = 10%. The test therefore rejects the null hypothesis of weak instruments at the τ = 10% threshold (Montiel Olea & Pflueger, 2013). Table 4 reports the OLS and IV results. Years of education are positively and significantly related to earnings in both the OLS and IV estimates: earnings increase with years of education. The average rates of return to an additional year of education are about 3.9% (OLS) and 2.2% (IV). These returns are lower than the average return to education reported for Cameroon by Baye (2015), about 5.6%, and for Iran by Oryoie and Vahidmanesh (2021), about 6.4%. Note that the average rate of return is higher in the OLS estimation than in the IV estimation. This result is not in line with the extensive review by Psacharopoulos and Patrinos (2004), who observed that IV estimates of returns to education are often higher than OLS estimates. Although the magnitudes of the control-variable coefficients differ between the two models, their signs and significance levels do not. Interestingly, the coefficient on overeducation is negative and significant in both estimations, implying that overeducated workers earn less than well-matched workers in wage employment. In addition, the coefficient on undereducation is positive and significant, implying that undereducated workers earn relatively more than well-matched workers. This result is not unusual in the literature on returns to overeducation.

Table 4 Determinants of earnings from wage employment based on OLS and IV models

More recently, Zheng et al. (2020) find, in the Chinese context, that overeducation reduces a worker’s salary by 5.1% relative to what the same job seeker would earn in a position whose education requirement matches his or her own level. Our finding differs from the meta-analysis of Groot and Maassen van den Brink (2000) for developed countries, which estimates the average return to overeducation at 3%. It is, however, consistent with Herrera and Merceron (2013), who estimate the average penalty associated with overeducation in sub-Saharan Africa at 9%. Father’s education positively affects years of education: a 1-year increase in the father’s education significantly increases the duration of his child’s education by 85.2%. As the proverb “like father, like son” suggests, children’s future lives are strongly determined by their family and social background. Furthermore, the non-self-cluster mean of education positively and significantly influences the demand for education: when the average level of education in the cluster increases by one unit, the duration of education increases significantly by 15.5%. This result can be attributed to peer effects, insofar as neighbors’ behavior influences a person’s own behavior. Education makes individuals adjust to society’s dominant values and, in return, individuals bring about social transformation by transmitting new values that tend to spread across the entire society. The education level of a person’s neighbors is thus a determining factor in his or her own socialization at school. Moreover, father’s overeducation and father’s undereducation are positively and significantly related to overeducation and undereducation, respectively.

In theory, age-earnings profiles are concave, with an inverted U shape. Our findings show that the coefficient on age is positive and significant in the OLS model and in the other models, implying that earnings increase with the worker’s age, as age may also reflect experience. However, the coefficient on age squared is negative and significant, implying that earnings increase with age only up to a certain point, after which they start to decrease. The coefficient for males is positive and statistically significant: men earn more than women in wage employment. The gender gap has been abundantly addressed in the literature on economic discrimination, and its persistence as a topical issue despite considerable progress in this area is worrying from a pay equity viewpoint. The coefficients on the public and private sectors are positive and significant in both models: public and private sector workers earn more than those employed in the informal sector. However, public sector workers earn less than private sector workers in wage employment.
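The inverted U shape implies a peak age. If log earnings contain \(\beta_{a}\,\text{age} + \beta_{a^2}\,\text{age}^2\), setting the derivative \(\beta_{a} + 2\beta_{a^2}\,\text{age}\) to zero gives the peak at \(-\beta_{a}/(2\beta_{a^2})\). A small sketch of that calculation; the coefficients in the example are hypothetical, not the Table 4 estimates.

```python
def earnings_peak_age(b_age, b_age_sq):
    """Age that maximizes b_age * age + b_age_sq * age**2 (inverted-U profile).

    Setting the derivative b_age + 2 * b_age_sq * age to zero gives
    age* = -b_age / (2 * b_age_sq); a maximum requires b_age_sq < 0.
    """
    if b_age_sq >= 0:
        raise ValueError("age-squared coefficient must be negative for a concave profile")
    return -b_age / (2.0 * b_age_sq)


# Hypothetical coefficients, not the paper's estimates.
print(round(earnings_peak_age(0.06, -0.0006), 1))
# → 50.0
```

A negative age-squared coefficient alone establishes concavity; the peak is only meaningful if it falls within the observed age range.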

Results of Double Selection Model Estimates

To correct the sample selection bias in the earnings equation, we adopt a double selection model, with selection into employment participation at the first stage and the choice of wage employment at the second stage. We first estimate the two selection equations separately and then use a bivariate model that accounts for their interdependence. Table 5 presents the results of both the univariate and bivariate selection models. The correlation coefficient between the residuals of the two selection equations is positive (0.441) and significant, confirming that the likelihood of being active and employed is potentially determined simultaneously with the probability of being in paid employment. This underscores the relevance of the bivariate probit model for controlling selection bias.
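The sequential structure can be written as a bivariate probit in which the second decision is observed only for those selected at the first stage. The following is a minimal sketch under stated assumptions, not the paper’s estimator: it assumes \(\mu_1, \mu_2\) are standard bivariate normal with correlation ρ and that wage employment is observed only when \(P_1 = 1\); the bivariate normal CDF is computed by simple numerical integration rather than a library routine, and the index values in the example are hypothetical (only ρ = 0.441 comes from the text).

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def biv_norm_cdf(a, b, rho, n=2000, lower=-8.0):
    """P(U1 <= a, U2 <= b) for a standard bivariate normal with correlation rho (|rho| < 1).

    Integrates phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)) over x in (lower, a]
    with Simpson's rule (n even); the mass below lower = -8 is negligible.
    """
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += w * norm_pdf(x) * norm_cdf((b - rho * x) / s)
    return total * h / 3.0


def outcome_probs(a, b, rho):
    """Probabilities of the three observable outcomes of the sequential selection.

    a = Z beta_1 (participation index), b = Z beta_2 (wage-employment index),
    rho = corr(mu_1, mu_2).
    """
    return {
        "P1=0": norm_cdf(-a),                         # not selected at stage 1
        "P1=1,S2=0": biv_norm_cdf(a, -b, -rho),       # participates, not wage-employed
        "P1=1,S2=1": biv_norm_cdf(a, b, rho),         # participates and wage-employed
    }


def log_likelihood(indices, outcomes, rho):
    """Sample log-likelihood for fitted (a, b) pairs and observed outcome labels."""
    return sum(math.log(outcome_probs(a, b, rho)[o]) for (a, b), o in zip(indices, outcomes))


# Hypothetical indices for one person; rho set to the correlation reported in the text.
print(outcome_probs(0.3, -0.4, 0.441))
```

Maximizing `log_likelihood` over the coefficients behind `a`, `b` and over ρ would give the bivariate estimates; with ρ = 0 the two decisions separate into two univariate probits, which is why a significant ρ favors the joint model.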

Table 5 Determinants of selection of employment participation and wage employment

The non-self-cluster mean of activity, used to identify the selection model, is positively and significantly related to the likelihood of being wage-employed. The presence in an individual’s cluster of a large number of working people from other households appears to motivate his or her participation. According to the bivariate probit estimates, age is negatively related to the probability of being in wage employment. Women are more likely than men to participate in the workforce but less likely to be employed. Similarly, people in rural areas are more likely to participate in the workforce but less likely to choose wage employment. The number of children in the care of the household head has a positive marginal effect on the probability of joining either the workforce or wage employment: people from households with heavy family responsibilities are more likely to search for a job or to participate in economic activity. Education levels are significantly related to participation in employment. Relative to illiterate people, reaching primary school decreases the probability of joining the labor force; this probability then increases at the secondary and tertiary levels. Moreover, the level of education increases the likelihood of choosing wage employment. Being Muslim significantly reduces both the likelihood of participating in the workforce and that of choosing wage employment. Owing to the influence of ancestral and traditional beliefs in Cameroon, Muslim affiliation reduces the chances of individuals, especially women, participating in the labor force. Controlling for these selection models, we then estimate the earnings equation for wage-employed workers.

Table 6 shows that the correction terms (inverse Mills ratios, IMR) for both labor force participation and wage employment are positive and significant, which underlines the importance of correcting for selection bias when estimating returns to education. Except for the undereducation coefficient, the coefficients on the control variables are significant across the different types of selection models, and their signs do not change. The coefficients on education, employment sector, age, sex, and experience are positive and statistically significant across the models, although their values vary, whereas the coefficients on overeducation and age squared are significantly negative.
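For wage-employed workers (\(P_1 = 1\), \(S_2 = 1\)), the correction terms can be computed from the fitted indices of the two selection equations. This is a sketch of the standard bivariate-normal double-selection correction, assuming the earnings error is jointly normal with \(\mu_1\) and \(\mu_2\); this section does not spell out how the paper’s IMR terms were constructed, and with ρ = 0 the formulas reduce to the two univariate ratios φ(a)/Φ(a) and φ(b)/Φ(b). The function name and example values are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def biv_norm_cdf(a, b, rho, n=2000, lower=-8.0):
    """P(U1 <= a, U2 <= b), standard bivariate normal, by Simpson's rule (|rho| < 1)."""
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += w * norm_pdf(x) * norm_cdf((b - rho * x) / s)
    return total * h / 3.0


def double_selection_lambdas(a, b, rho):
    """Correction terms for observations with P1 = 1 and S2 = 1.

    lambda_P = phi(a) * Phi((b - rho*a) / sqrt(1 - rho^2)) / Phi2(a, b, rho)
    lambda_S = phi(b) * Phi((a - rho*b) / sqrt(1 - rho^2)) / Phi2(a, b, rho)
    a, b: fitted participation and wage-employment indices.
    """
    s = math.sqrt(1.0 - rho * rho)
    joint = biv_norm_cdf(a, b, rho)  # probability of being selected at both stages
    lam_p = norm_pdf(a) * norm_cdf((b - rho * a) / s) / joint
    lam_s = norm_pdf(b) * norm_cdf((a - rho * b) / s) / joint
    return lam_p, lam_s


# Hypothetical fitted indices for one wage-employed worker.
print(double_selection_lambdas(0.8, 0.2, 0.441))
```

Both terms enter the earnings equation as extra regressors; under the normality assumption their coefficients estimate the covariances between the earnings error and \(\mu_1\), \(\mu_2\), so significant coefficients signal selection bias.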

Table 6 Determinants of earnings from wage employment based on different types of selection models

Estimated Rate of Returns to (Over) Education

Few studies have considered endogeneity and selection together when estimating the rates of return to different levels of education. Table 7 presents estimates of the returns to education and to education-job mismatch in wage employment in Cameroon. The average rate of return to education is positive but differs across selection models, and the IV estimate is much lower than the other estimates (OLS and the selection models). The average returns to overeducation and undereducation are negative and positive, respectively, in all models, although their magnitudes differ. The wage penalty associated with overeducation ranges from −7.9% to −18.04%, while the average premium associated with undereducation ranges from 2.88% to 9.5%. The overeducation penalty obtained from the selection models, and from the double selection model in particular, is much larger than that from OLS and the other models. The average undereducation premium from the IV model is much higher than the OLS and single-selection-with-endogenous-covariates estimates, and this premium is not significant in the other selection models. Overall, these results are inconsistent with the human capital prediction that only attained education determines earnings, as well as with the job competition prediction that only the education required by the job does.
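When overeducation or undereducation enters a log-earnings equation as a 0/1 dummy, its coefficient δ is a log-point difference; the exact percentage effect is 100 × (exp(δ) − 1). This section does not state whether the Table 7 figures are raw coefficients or already converted, so the sketch below uses purely hypothetical inputs to show how far the two readings can diverge for larger coefficients.

```python
import math


def dummy_effect_percent(coef):
    """Exact % difference in earnings implied by a 0/1 dummy in a log-earnings equation."""
    return 100.0 * (math.exp(coef) - 1.0)


# Hypothetical log-point coefficients, not the paper's estimates.
for coef in (-0.20, -0.10, 0.05):
    print(f"{coef:+.2f} log points -> {dummy_effect_percent(coef):+.2f}%")
```

For small coefficients the two readings are nearly identical; for a penalty of around 0.2 log points the exact percentage is noticeably smaller in absolute value.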

Table 7 Average rates of returns to education and mismatch in respect of wage employment

Education-job mismatch may mask unmeasured differences in ability, which is a real predictor of earnings (Allen & Van der Velden, 2001). Individuals with lower educational attainment might earn more even when mismatched, since the market sorts graduates with the same level of education according to their abilities. Earnings differences between matched and mismatched workers may therefore widen or narrow with each additional year of schooling. The interaction between education-job mismatch and education has a significant effect on monthly earnings: with each additional year of education, the difference in returns to schooling decreases between overeducated and well-matched workers and increases between undereducated and well-matched workers. This result indicates that returns to schooling also depend on job quality on the labor market.
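One way to formalize this interaction, assuming (for illustration only; the paper’s exact wage equation may differ) that mismatch enters as overeducation and undereducation dummies \(O_i\) and \(U_i\) alongside years of education \(E_i\) and other controls \(X_i\):

\[
\ln w_i = \alpha + \beta E_i + \delta_O O_i + \delta_U U_i + \gamma_O \,(O_i \times E_i) + \gamma_U \,(U_i \times E_i) + X_i'\theta + \varepsilon_i ,
\]

so that the return to an additional year of schooling is

\[
\frac{\partial \ln w_i}{\partial E_i} = \beta + \gamma_O O_i + \gamma_U U_i ,
\]

that is, \(\beta\) for well-matched workers, \(\beta + \gamma_O\) for overeducated workers and \(\beta + \gamma_U\) for undereducated workers. The earnings gap between an overeducated and a well-matched worker with the same \(E_i\) is \(\delta_O + \gamma_O E_i\), which changes by \(\gamma_O\) with each additional year of education; the same reasoning applies to undereducated workers with \(\delta_U\) and \(\gamma_U\).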

Conclusion

In this paper, we have sought to assess the contribution of education-job mismatch to heterogeneity in returns to schooling, using employment data. The basic question we pose is whether education-job mismatch affects the return to schooling. Our objective is twofold: (i) to estimate the returns to schooling associated with education-job mismatch; and (ii) to determine whether the rate of return to schooling depends on the selection model used in estimation. Most applied studies of the effect of education-job mismatch on wages have been criticized for not accounting for selection and endogeneity bias, and results in the literature differ according to the estimator used. Using a rich employment database provided by the National Institute of Statistics for Cameroon, this paper addresses both econometric problems with a double selection model and suitable instruments. It also employs several consistent estimators to assess the sensitivity of the return to schooling to the choice of selection model.

The main conclusions of our paper are as follows: (i) in both the univariate and bivariate selection models, the coefficient on education is positive and statistically significant in all specifications; (ii) the average returns to overeducation and undereducation are negative and positive, respectively, but differ across the estimated models. Wages depend mainly on the education level required by the job: overeducated workers suffer a wage penalty, while undereducated workers earn a wage premium. We therefore conclude that overeducation implies a waste of resources; (iii) substantial differences were observed in the magnitude of the control-variable coefficients across the types of selection model, although there is no significant difference in their signs and significance levels. These differences imply that estimated returns to education are highly sensitive to model specification, given selection bias and potential endogeneity. The overeducation penalty obtained from the selection models, and from the double selection model in particular, is larger than that from OLS and the other models. The average premium associated with undereducation is higher in the IV model than in the OLS and single-selection-with-endogenous-covariates estimates; (iv) the estimated rate of return to education varies with the type of education-job mismatch, decreasing with overeducation and increasing with undereducation.

Several policy recommendations follow from these results. Because overeducation is associated with wage penalties and persists through the school-to-work transition, policy-makers should seek to reduce overeducation on the labor market. With respect to education, many authors point to the need to reform an education system in which some graduates never obtain a job matching their qualifications. For instance, Green and McIntosh (2007) suggest that some workers “have acquired a ‘wrong’ type of human capital, in the sense that these qualifications are less demanded on the labor market.” Public authorities should therefore reform the education system to provide graduates with the skills the market demands, notably by defining the level of competence to be acquired at each level of education. Skills should be evaluated in educational institutions in the same way as educational attainment. With respect to labor market functioning, policy-makers should promote the creation of companies that require high-skilled workers, so that the labor force is fully used. In short, this study highlights the need for tailored policy responses, such as career guidance and policies to raise job quality, to tackle mismatch and especially overeducation. Better career guidance may also help reduce education-job mismatch, which varies considerably by field of study. Finally, care should be taken when choosing a model for estimating returns to education or to education-job mismatch.