Introduction

Education plays a significant role in economic growth and development. One of the central topics in the economics of education is the estimation of the rate of private return to education. The rate of return has decreased in North Africa and the Middle East in recent decades, and this reduction has caused youth dissatisfaction and political instability (Arampatzi et al., 2018; Pellicer, 2018). We face a high investment in education in the MENA while we see high youth unemployment. It seems that this educational investment has a very small effect on the labor market in this region. Like other countries in the region, Iran is also faced with this paradox in its labor market. There are extensive investments in education after the 1979 revolution while the country is struggling with a high unemployment rate for decades. The estimations of returns to schooling shed light on what is going on in the labor market and education system in Iran. For this purpose, we used Household Expenditure and Income Surveys of Iran to estimate the rates of returns to schooling and investigate the spillover of education in the labor market.

Mincer (1974) provides for the first time a piece of empirical evidence for the labor market effects of education. The Mincerian equation explores the causal relationship between schooling and earning and tries to measure the effect of education on people’s well-being. In the original Mincer equation, the logarithm of income is the dependent variable, and the independent variables are the years of schooling and experience. Mincer estimates this equation using the ordinary least squares (OLS) method. In this framework, the coefficient of the years of schooling is the private rate of return to schooling. However, due to the endogeneity problem in the schooling variable, there are concerns about the true value of the return to education (Card, 2001). An individual with a higher level of ability and motivation may invest more in her education and find a better job with higher wages. This is the root of the endogeneity problem in the Mincer equation that leads to an upward bias in the OLS estimates.

To solve this problem, different solutions are proposed in the literature such as the instrumental variables (IV) and fixed-effects models. Card (1995) uses geographical proximity as an instrument to correct for the potential ability bias; Angrist and Keueger (1991) suggest the quarter of birth as an instrument. Many other studies use the instrumental variable method to study the causal relationship between schooling and earning (e.g., Butcher & Case, 1994; Duflo, 2001). In most of these studies, the IV coefficients are larger in magnitude than the OLS coefficients, while the IV coefficients are expected to be smaller than the OLS coefficients due to the ability bias. Some of these studies argue that the ability bias might be quite small and the downward bias caused by the measurement errors in schooling may crowd out the effect of ability bias. However, Card (2001) believes that it is hard to accept this argument. He discusses that the IV estimates might be biased upward in comparison to the OLS estimates. This is because some unobserved differences between the control and treatment groups, which are framed in the IV model implicitly, can cause the instruments to be invalid or weak.

If there is a genuine panel data set, then there is no need to use the instrumental variables method, and hence, there is no worry about the weakness or invalidity of the instruments. A fixed-effect model can remove all of the unobserved individual fixed effects; hence, the estimates should be unbiased. However, such a good panel data are not available in most countries. To solve this problem, Deaton (1985) proposes a new method for constructing a pseudo-panel dataset using repeated cross-sectional data sets. In this approach, some cohorts are constructed using some common time-invariant characteristics (e.g., birth-year), and these cohorts are considered as observations of the pseudo-panel (PP). This method is being progressively used in different fields of study (see, for example, Guarini et al., 2018).

This study estimates the rate of return to schooling in Iran, a country whose economy and educational system are under pressure, using the pseudo-panel approach (Oryoie & Abbasinejad, 2017). Iran has been investing extensively in its education system during the past decades. As a result of these investments, the average years of schooling has increased from just less than 4 years to more than 10 years, and the literacy rate has improved from 42% to more than 85%.Footnote 1 However, this huge improvement in educational attainment has not still translated to labor productivity. In the labor market, Iran faces a chronic high unemployment rate, especially among the youth population. The average unemployment rate is around 11.66% throughout 2000–2019.Footnote 2 The total labor force participation for ages 15–24 is also quite low. It is around 30.4% throughout 2000–2019, and it is falling during recent years.Footnote 3 This fall of labor force participation suggests that people are becoming unemployed and/or hopeless to find a job. Therefore, the level of educational attainment has increased a lot, while the labor force participation rate and the employment rate is very low in Iran. These two phenomena encourage us to find the rate of return to schooling in Iran. The policy implications of this exercise are first to learn better the relationship between education and labor market and second to help the policymakers to design better policies in the education system and labor market. Based on our results, it seems that Iran faces over-investment in education while the labor market and economy itself were not prepared to create high-skills jobs.

There are few studies on Iran’s return to schooling. Salehi-Isfahani et al. (2009) estimate the rate of return to schooling using the OLS regression in Iran, Egypt, and Turkey. But as we discussed earlier, there is a notorious endogeneity problem in the OLS method. This paper is the first study that uses a powerful unbiased method for measuring the rate in Iran. We use six Household Expenditures and Income Surveys (HEIS) conducted by the Statistical Center of Iran (SCI) between 2008 and 2013 to construct our pseudo panel data set. Our results suggest that the rate of return is about 9% and 7% based on the PP model and the OLS, respectively. In urban and rural areas, the rate of return is, respectively, 7.8% (resp. 6.4%) and 10% (resp. 7.1%) based on the PP (resp. OLS) approach.

Estimation of the Mincer equation using the PP approach has two benefits: first, the time-invariant unobserved characteristics (such as ability) are removed, and hence we can correct for the potential endogeneity. Second, as Glenn (2005) explains, one of the weaknesses of the cross-sectional data is that it does not control for age-group effects. The people of the same age groups face different conditions at different times. For example, the conditions of the labor market may change over time. Therefore, differences by age may not be completely related to age effects if we do not control for age-group effects. Pseudo panel analysis provides an opportunity to overcome this limitation of the cross-sectional data by construing cohorts based on birth year. By the same logic, we can construct cohorts based on different areas if there is considerable heterogeneity in different areas of a country. Iran is a pretty diverse country in terms of culture, religion, ethnicity, weather, geography, and socioeconomic status. Therefore, people in different areas face pretty different conditions. By constructing the cohorts based on area, we can control the area effects as well.

Methodology is described in “Methodology,” the data and the empirical results are discussed in “Data” and “Results,” respectively, a comparative analysis is provided in “Comparative Analysis,” suggestions for future studies are presented in “Suggestions for Future Studies,” and the conclusions are stated in “Conclusion.”

Methodology

This paper uses the Mincer equation to estimate the return to education as follows:

$${Ln W}_{it}=c+{\beta }_{1}{E}_{it}+{\beta }_{2}{X}_{it}+{\beta }_{3}{X}_{it}^{2}+{\theta }_{i}+{\varepsilon }_{it} ; i=1,\dots , {N}_{t} ;t=1,\dots ,T$$
(1)

where \({Ln W}_{it}\)\({E}_{it}\) and \({X}_{it}\) are, respectively, the natural logarithm of wage, the number of years of education, and the number of years of experience for individual \(i\) at time \(t\). \(T\) is the number of periods, \({\theta }_{i}\) is the unobserved individual effects, and \({N}_{t}\) is the total number of individuals at time \(t\)(Consider that the number of individuals varies from one survey to another in repeated independent cross-sections.). The coefficient \({\beta }_{1}\) measures the private return to education, so the purpose of the paper is to estimate \(\widehat{{\beta }_{1}}\) consistently. For the sake of simplicity, assume \(E\left\{{E}_{it}{\varepsilon }_{it}\right\}=0, \forall t=1,\dots ,T\).

Consider that \({\theta }_{i}\) is unobservable, and hence in practice, the error term is \({\theta }_{i}+{\varepsilon }_{it}\). If \({\theta }_{i}\) is not correlated with the independent variables, then the composite error term \({\theta }_{i}+{\varepsilon }_{it}\) is not correlated with the independent variables, and Eq. (1) can be estimated consistently using the OLS after pooling all cross sections. However, the problem is that \({\theta }_{i}\) might be correlated with the years of schooling.Footnote 4 Therefore, the composite error term is correlated with \({E}_{it}\) and, as a result, the estimated \(\widehat{{\beta }_{1}}\) is biased.

If there is a genuine panel data, \({\theta }_{i}\) can be eliminated using a fixed-effect model, but such a data set is not often available. If some repeated cross sections are available, this problem can be solved by transforming the cross sections to pseudo-panel data using some cohorts and applying a fixed-effect model (Deaton, 1985; Verbeek & Nijman, 1992; Verbeek & Nijman, 1993; Verbeek, 2008). Each cohort is a set of individuals sharing some common characteristics that do not change over time (from one survey to another) and is observable for all individuals in the sample (e.g., the birth year). As an example, a cohort might be made of all individuals born in the period 1984–1987.

Suppose there are \(T\) periods (or independent cross sections). Construct \(C\) cohorts based on birth year and the place of residence in each period. After constructing the cohorts, the mean of all variables in each cohort over the cohort members should be calculated, and the averages are used as observations. In this way, we make a pseudo-panel with repeated cohorts over \(T\) periods. This dataset can be used for estimating the following equation:

$$\overline{{Ln W}_{ct}}=k+{\beta }_{1}\overline{{E}_{ct}}+{\beta }_{2}\overline{{X}_{ct}}+{\beta }_{3}{\overline{{X}_{ct}}}^{2}+\overline{{\theta }_{ct}}+\overline{{\varepsilon }_{ct}} ; c=1, \dots , C$$
(2)

where \(\overline{{Ln W}_{ct}}=\sum_{i\in {S}_{c}}{Ln W}_{it}/{n}_{c}\), \(\overline{{X}_{ct}}=\sum_{i\in {S}_{c}}{ X}_{it}/{n}_{c}\), \(\overline{{\theta }_{ct}}=\sum_{i\in {S}_{c}}{ \theta }_{it}/{n}_{c}\), \(\overline{{\varepsilon }_{ct}}=\sum_{i\in {S}_{c}}{ \varepsilon }_{it}/{n}_{c}\), \({n}_{c}\) and \({S}_{c}\) are, respectively, the number and the set of individuals in cohort \(c\).

There are three problems with the estimation of Eq. (2): first, \(\overline{{\theta }_{ct}}\) is unobservable, so it cannot be controlled. Second, it might be correlated with \(\overline{{E}_{ct}}\), so the estimated \(\widehat{{\beta }_{1}}\) might be biased. Third, it varies over time, so it cannot be eliminated using a fixed-effect model. However, Verbeek and Nijman (1992, 1993) show that if the size of each cohort is large enough (beyond 100 observations per cohort), then \(\overline{{\theta }_{ct}}\) is approximately equal to \(\overline{{\theta }_{c}}\), and the coefficients of Eq. (2) can be estimated consistently using a within estimator (fixed-effect model). The within estimator is given by

$$\widehat{\beta }={\left(\sum_{c=1}^{C}\sum_{t=1}^{T}(\overline{{Z}_{ct}}-\overline{{Z}_{c}}){(\overline{{Z}_{ct}}-\overline{{Z}_{c}})}^{^{\prime}}\right)}^{-1}\sum_{c=1}^{C}\sum_{t=1}^{T}(\overline{{Z}_{ct}}-\overline{{Z}_{c}}) (\overline{{Ln W}_{ct}} -\overline{{Ln W}_{c}} )$$
(3)

where \(\widehat{\beta }\) denotes the vector of coefficients and \(\overline{{Z}_{ct}}\) shows the vector of independent variables for cohort \(c\) at time \(t\), \(\overline{{Z}_{c}}=\sum_{t=1}^{T}\overline{{Z}_{ct}}/T\), and \(\overline{{Ln W}_{c}}=\sum_{t=1}^{T}\overline{{Ln W}_{ct}}/T\). Consider that if cohort sizes are not equal, then the error terms \(\overline{{\varepsilon }_{ct}}\) in Eq. (2) will be heteroskedastic. For solving this problem, Deaton (1985) suggests correcting the heteroskedasticity by weighting each cohort with the square root of the cohort size. Since cohort sizes vary considerably in this study, each cohort is weighted by the square root of the cohort size.

Data

We construct the pseudo panel data using six rounds of the HEIS conducted by the SCI between 2008 and 2013. The HEIS has been collecting annually since 1963. The surveys are nationally representative in both urban and rural areas. These surveys contain information on consumption expenditures, income, household demographics, schooling, employment, and asset ownership. The number of households in the sample increases gradually every year (e.g., there are 5689, 21,950, and 38,252 households, respectively, in 1985, 1997, and 2015). Those surveys that are conducted over the period 2014–2017 are not included in this analysis, because the definition of education has been changed in those surveys by the Statistical Center of Iran. From 2014 onward, we can only observe the level of education of the respondents (e.g., elementary, secondary), and hence, the number of years of education is not computable.

The cross-sectional data sets are employed to build pseudo-panel data based on cohort means for the individuals who are born in 1951–1983. Therefore, this analysis considers only the individuals who are 25–62 years old. The reason that this age range is chosen is that it leads to some extent to a balanced panel dataset. Women are excluded from the sample to avoid the selection bias problem, because women’s labor force participation rate is quite low in Iran. Also, the sample is restricted to the wage income earners, that is, those who work either in public or private sectors.

The cohorts are constructed by the place of residence (see Appendix 1) and the year of birth (4-year and 3-year birth cohorts). Our sample is limited to the cohorts with more than 100 observationsFootnote 5, because we want to employ the asymptotic theory explained in “Methodology” that is based on the large number of observations per cohort. In this way, 4-year cohorts generate 222 cohorts with an average cohort size of 364 individuals, and 3-year cohorts generate 246 cohorts with an average cohort size of 309 individuals. We also build two distinct pseudo panel data sets for rural and urban areas based on the same criteria as above for further investigations.

The main variables of interest are annual gross wage earnings, age, and the number of years of schooling. Non-monetary incomesFootnote 6 are also added to the gross wage. The wages are deflated by the Consumer Price Index.Footnote 7 The number of years of schooling is made from a question that asks about the level of education. This variable varies from zero (for illiterate) to 24 (for Ph.D. degree). The descriptive statistics of the variables are reported in Appendix 5.

The individuals that come from high-income family groups might be more successful in finding high-salary jobs than those from low-income family groups. Therefore, the economic class of the household may be very important, and it is added to the regressions. We define this variable based on the relative income lines. The total consumption expenditures are usually used as a proxy for total income, because consumption expenditures are usually considered as the best measure of living standards (Deaton, 1997). Each cohort is classified into three classes which are poor, middle, and rich, respectively, if the mean of the total expenditures in the cohort is below 75% of the median, between 75 and 125% of the median, and above 125% of the median.

Results

The results are presented in this section. In “Pooled Cross Section,” the Mincer equation is estimated using a simple OLS method when all cross-sectional data are pooled. The estimated return from this Pooled OLS (POLS) is considered as a benchmark. In “Pseudo Panel Data,” the return is estimated using the PP model, and finally, the return is estimated for urban and rural areas in “Disaggregation by Location.”

Pooled Cross Section

In this section, the rate of return to education is estimated using the POLS during the period 2008–2013. Table 1 reports the estimates for all men born between 1951 and 1983. Unobservable local and time fixed effects are controlled in some of the regressions. Under the odd columns, the coefficients of the simple standard (traditional) Mincer equation are reported. Under the even columns, we include squared years of education into the simple Mincer equation to capture nonlinear behaviors of education. The relationship between the logarithm of wage and the years of schooling is supposed to be linear in the simple Mincer equation; however, many studies show that the wage-schooling relationship might be nonlinear (e.g., Hungerford & Solon, 1987). It seems that the logarithm of wage is often an increasing convex function of schooling. This fact is generally explained by an increase in the relative demand for high-skilled labor. Mincer (1996) considers that higher-educated individuals generally work in high-skilled jobs, and lower-educated individuals are expected to work in low-skill jobs. Therefore, if relative demand for schooling increases more than the relative supply, the marginal return to education increases for higher-educated workers relative to lower-educated workers. Since the marginal return is the slope of the wage-education relationship, an increase in the marginal return leads to a convex wage-schooling relationship. In fact, the odd columns estimate (variables were introduced in “Methodology”):

Table 1 Regression of logarithm of annual wage on schooling, OLS model, pooled all cross sections
$${Ln W}_{it}=c+{\beta }_{1}{E}_{it}+{\beta }_{2}{X}_{it}+{\beta }_{3}{X}_{it}^{2}+{\varepsilon }_{it} ; \ i=1,\dots , {N}_{t} ; \ t=1,\dots ,T$$
(4)

and the even columns estimate

$${Ln W}_{it}=c+{\beta }_{1}{E}_{it}+{\lambda }_{1}{E}_{it}^{2}+{\beta }_{2}{X}_{it}+{\beta }_{3}{X}_{it}^{2}+{\varepsilon }_{it} ; \ i=1,\dots , {N}_{t} ; \ t=1,\dots ,T$$
(5)

In Eq. (4), the marginal effect of education is measured by \({\beta }_{1}\),Footnote 8 and in Eq. (5), the marginal effect (ME) is measured by \({\beta }_{1}+2{\lambda }_{1}{E}_{it}\), because

$$\frac{d\left(Ln {W}_{it}\right)}{d({E}_{it})}=\frac{\frac{d{W}_{it}}{{W}_{it}}}{d({E}_{it})}={\beta }_{1}+2{\lambda }_{1}{E}_{it}$$
(6)

We replace \({E}_{it}\) with its mean over all individuals and years to calculate the mean of the ME. The ME multiplied by 100 shows how much, on average, an individual’s wage increases (in percentage) when the number of years of education increases by one unitFootnote 9; in other words, it shows the average return to education. The MEs are reported on the seventh row. As can be seen, the average return to schooling is around 7% during the period 2008–2013. This estimate is considered as a benchmark for future comparisons in the next section.

Pseudo Panel Data

In this section, the return to education is estimated using a fixed-effect model based on the pseudo-panel data during the period 2006–2013. As we discussed in “Methodology,” the model eliminates all of the unobserved heterogeneities and therefore corrects the endogeneity problem of the Mincer equation. The results are shown in Table 2. Unobservable time effects are controlled in all regressions. A joint F test is done to check whether the dummies for all years are statistically equal to zero or not. If they are, then no time effects are needed. The null hypothesis is rejected at the 1% level in all regressions, so it is necessary to control the time effects. The first four columns estimate the Mincer equation based on 3-year cohorts. To check the sensitivity of our results to cohort size, we estimate the equation based on 4-year cohorts as well, where the relevant results can be seen under the last four columns. We also estimated the rate of return based on 2-year cohorts and got pretty similar results.

Table 2 Regression of logarithm of annual wage on schooling, fixed-effects model, and pseudo panel data

As is shown in Table 2, the return to schooling is about 9% in Iran, and the results are robust in all specifications. Compare them with the OLS estimates in “Pooled Cross Section.” As can be seen, the OLS estimates are smaller than the pseudo-panel fixed-effect (PPFE) estimates by around two percentage points. Interestingly, the estimated return increases after removing the unobserved heterogeneities. The magnitude of the change is significant. Therefore, we face a downward bias if we do not control for the unobserved individual effects. This result is contrary to the ability bias argument that predicts an upward bias in an OLS model. Many other papers have also found a similar pattern of change from an OLS model to an IV model. That is, their estimated return using an IV model is larger in magnitude than their estimated return using an OLS model. We already explained that the issue emerges due to the measurement error problem and/or the weak/invalid instrument problem (Angrist & Keueger, 1991; Card, 1995; Butcher & Case, 1994). An advantage of the PPFE model is that it does not face the weak/invalid instrument problem. In addition, Deaton (1985) explains that when the cohort size is large enough, the measurement errors can be ignored. Since the cohort sizes are sufficiently large in this research, we are not worried about the measurement error too. As far as there is a good variation inside the cohorts (cohort sizes are large), the potential coefficient bias is minimized (Verbeek & Nijman, 1992). Therefore, we are confident that our estimates are unbiased.

In addition to estimating the return to education, we can also estimate the effect of the socioeconomic status of the individuals on their wage based on columns 3, 4, 7, and 8. For this purpose, we use columns 3 and 4 since they are estimated based on 3-year cohorts in which cohort sizes are larger than the other columns. Suppose \({\beta }_{M}\) and \({\beta }_{R}\) denote respectively the estimated coefficients of the middle and rich cohorts, and \({\overline{W}}_{P}\), \({\overline{W}}_{M}\), and \({\overline{W}}_{R}\) stand, respectively, for the mean of wage in the poor, middle, and rich cohorts, then it can be easily shown that

$$100\times \frac{{\overline{W}}_{M}-{\overline{W}}_{P} }{{\overline{W}}_{P}}=100\times ({e}^{{\beta }_{M}}-1)$$
(7)

and

$$100\times \frac{{\overline{W}}_{R}-{\overline{W}}_{M} }{{\overline{W}}_{M}}=100\times ({e}^{{{\beta }_{R}-\beta }_{M}}-1)$$
(8)

Based on Eqs. (7) and (8) and 3-year cohorts, if an individual is transferred from a poor family to a middle-class family, her wage rises by about 6%, and if she is transferred to a rich family, her wage increases by about 8%.

In summary, this section concludes that the rate of return to schooling for males aged 25–62 years old is around 9% and this result is more reliable than the simple OLS; because the pseudo panel fixed-effects model eliminates all of the unobserved time-invariant characteristics of the individuals, hence, we are confident at least that this approach corrects for the bias caused by the omitted variable problem.

Disaggregation by Location

This section estimates the rate of return to schooling in urban and rural areas to check the robustness of the results that we got in the previous section and to see the differences between the urban and rural areas. The PPFE (based on 4-year cohorts) and POLS estimates are presented in Table 3. Unobservable time effects are controlled in all regressions. A joint F test is done to check whether the dummies for all years are equal to zero or not. The null hypothesis is rejected at the 1% level in all regressions, so we control the time effects in all regressions.

Table 3 Regression of logarithm of annual wage on schooling, OLS, and fixed-effect model, urban and rural areas

As can be seen, the estimated return in urban areas (resp. rural areas) is 7.8% (resp. 10%) based on the PPFE estimators, while the return is about 6.4% (resp. 7.1%) based on the OLS estimators. We again see a gap between the OLS and the PPFE estimates. The estimated return in rural areas is slightly larger, while one may expect a higher rate of return in urban areas because of more and better educational and job opportunities in urban areas in comparison with rural areas. This small gap might be due to this fact that the rate of return to education is decreasing at higher levels of education (the number of years of education is around 6.5 years in rural areas, and this number is around 9.8 years in urban areas), or it might be because measurement error problem.

In addition to urban/rural criteria, Warunsiri and McNown (2010) disaggregate their sample based on gender and marital status as well. However, we could not disaggregate our sample due to some data restrictions. As we explained in “Data,” women’s labor force participation rate is quite low in Iran which may cause the selection bias problem. The female labor force participation rate was around 17.3% in 2013 and 21.2% in 2008 for women between 20 and 55 years of age (Salehi-Isfahani et al., 2009). In addition, if cohorts are constructed based on marital status, there will be many cohorts with very small sizes due to heterogeneity in the marital status of the individuals. In our sample (males born in 1951–1983), about 75% of the individuals are married, about 0.3 and 0.5% are widowed and divorced, and 24% are never married. Thus, the distribution of the marital status is very heterogeneous, and we face many small cohort sizes if we construct cohorts based on marital status.

Comparative Analysis

This section compares the rate of return to education in Iran with other countries (for more comparisons, see, Psacharopoulos & Patrinos, 2004). Table 4 summarizes the comparisons. Most studies use either OLS or IV methods to estimate the private return to schooling. As can be seen, there are only two other studies that employ the pseudo-panel approach in addition to this study. The IV estimates are generally larger in magnitude than the OLS estimates. Our results indicate that the return is around 7% (9%) based on the POLS (the PPFE) model in Iran between 2008 and 2013.

Table 4 Return to education, cross-country comparison

Montenegro and Patrinos (2013) use about 545 harmonized household surveys from 98 countries to estimate the return to schooling in a comparable way around the world between 1970 and 2011. They estimate the traditional Mincer equation using the OLS method, which may suffer from the endogeneity problem. They show that the rate of return is around 10.4% on average, and the rate is on average around 10% in high-income countries. From a regional perspective, they show that the rate is around 5.6% in the Middle East and North Africa (MENA) between 2000 and 2011.

Salehi et al. (2009) estimate the rate of return to schooling in Iran in 2006 around 7.6% using the OLS method. Our estimates are 8.8% and 9% using, respectively, 3-year and 4-year cohorts suggesting a downward bias in Salehi’s study. In comparison with other countries, as can be seen in Table 4, most of the studies use either OLS or IV methods to estimate the rate of return to schooling. There are only two papers that use the PP approach. Therefore, we compare our results only with these two studies. The results suggest that Iran’s rate of return to schooling is larger than that of Sri Lanka and smaller than that of Thailand. The rate of return to education is determined by many economic, social, and political factors. To explore the causal relationship between the factors and the return to education, we need to employ the data sets of the countries and control the economic, social, and political factors in our regressions. However, we do not have access to such data at this point of time. Therefore, we cannot say why the rate of return in Iran is different from other countries. It is left for future studies.

Suggestions for Future Studies

We have two suggestions for further research on this topic. First, let us denote the total number of individuals by \(N\), the number of periods by \(T\), the number of cohorts by \(C\), and the number of observations in cohort \(c\) by \({n}_{c}\). We explained that if the cohorts are constructed such that the number of observations in every cohort is above 100 (\({n}_{c}\to \infty\)), then we get unbiased estimates. In addition to this method, the cohorts can also be constructed in three other different ways (Verbeek, 2008):

  1. 1.

    Both \(N\) and \(C\) go to infinity, and \({n}_{c}\) is remained constant.

  2. 2.

    \(T\) goes to infinity, with \(N\) and \(C\) kept constant.

  3. 3.

    Both \(T\) and \({n}_{c}\) goes to infinity (McKenzie, 2004).

It might be interesting to see how the results change in each approach mentioned above. But the problem is that there are no specific criteria for constructing cohorts based on these approaches. For example, we still do not know how large \(N\) must be when we say \(N\) goes to infinity.

Second, we discussed that the best method for estimating the rate of return to education, in the absence of a genuine panel data set, is to use the pseudo-panel method. If there is a genuine panel data set, we can get better results. There are two panel data sets in Iran that are not publicly accessible. First, there is a comprehensive panel data set about income, job, family background, and education in the Ministry of Cooperatives, Labor, and Social Welfare. Second, there are two comprehensive panel data sets about educational attainment of Iranians in the ministries of education and higher education. If researchers are allowed to get access to these data sets, we can have much more precise estimates using panel data methods.

Conclusion

This study was one of the few studies in the literature of the economics of education in Iran. It estimated the private return to schooling using a pseudo-panel fixed-effect model. The point of the model is that it removes all of the unobserved heterogeneities, and the estimated coefficients are unbiased if the cohort sizes are large enough. Since our cohort sizes were sufficiently large, we expected to derive unbiased estimates for the coefficients of the years of schooling. Our results showed that the rate of return was around 9% in Iran. That is, it is expected that each individual’s wage increases on average by about 9% for each additional year of education. The rate of return in urban and rural areas was around 7.8 and 10%, respectively.

This study provided the correct estimates of the return to schooling for policymakers in Iran. We now know that each additional year of schooling increases an individual’s earning by 9% on average. This is not a large number given the vast investment in Iran’s educational system within the past four decades. It seems that education does not transform into human capital and job skills well. Most of the investment in education in Iran is in terms of providing schools for kids in remote areas. Iran needs more investment in the quality of education to improve human capital accumulation which later on shows itself in the labor market. Besides, we think that the Education Ministry needs to boost the job skills of students during primary/secondary education. This training can improve labor productivity and hence increase the rates of return to schooling.