1 Introduction

One of the most striking, and still largely unexplained, facts about female labor supply in the developed countries is its heterogeneity across households, and indeed across countries. In many OECD countries, on average around one third of partnered women work full time in the labor force, one third do various amounts of part-time work, and one third work solely in household production. Very little of the aggregate heterogeneity across all households in any one country is explained by wage rate differences and by the number of children present in the household. Moreover, the correlation between female labor supply and fertility across these countries is strongly positive, even though historically, in any one country, there has been an inverse relationship between them.

Some insight is gained by organizing the data in terms of life cycle phases based on the number and age of children in the household. In the pre-children phase, there is very little difference between male and female labor supply distributions. This changes dramatically when children arrive, and this is when the heterogeneity in female labor supply essentially sets in. Though there is a trend of return to the labor force over subsequent phases of the life cycle as the children reach school age and beyond, the basic pattern of heterogeneity persists. Such findings suggest that for the theoretical and empirical analysis of female labor supply it is fruitful to focus on the life cycle phase in which households have young children.

The dramatic change in female labor supply with the birth of the first child reflects the additional work choice created by that event. At least one parent, typically the mother, can choose between working at home providing her own child care or working in the market and buying in care from formal care providers, such as kindergartens and child care centers, or by engaging other care givers, including relatives and friends. The importance of the availability and cost of child care for the labor supply of mothers with young children has been confirmed by theoretical (Apps and Rees 2009) and empirical studies, including Ribar (1995), Blau (2003), Connelly and Kimmel (2003), Doiron and Kalb (2005), Kalenkoski et al. (2005), Kornstad and Thoresen (2007), Baker et al. (2008), and Blundell and Shephard (2012).

The current paper presents a structural discrete choice model of the time allocation choices of partnered mothers with pre-school aged children. The main advantage of the discrete choice approach is that it can account for the non-convex nature of the household budget sets. Within this model, we analyze the decisions of mothers on hours of market work, time spent on child care and domestic work, and hours of formal child care. The main goal is to assess the sensitivity of choices at the intensive and extensive margin of female labor supply and to capture underlying substitution patterns between the alternative uses of maternal time.

Similar models are employed by Doiron and Kalb (2005), Kornstad and Thoresen (2007), and Blundell and Shephard (2012). We allow for a more flexible household utility function than previous studies (following Van Soest 1995; Kabátek et al. 2014) and include both formal child care and maternal care in the utility function. Bought-in child care can be incorporated in two ways—either indirectly, subtracting child care costs from disposable household income (Doiron and Kalb 2005; Kornstad and Thoresen 2007), or directly, with the hours of child care taken as an additional argument of the utility function (Ribar 1995; Bernal 2008). We follow the direct approach, implying that formal child care and maternal care can be imperfect substitutes, with their own effects on household utility.

An important aspect of our empirical model is that we incorporate unobserved heterogeneity in the flexible form of latent classes, following Train (2008) and Pacifico (2012). We thus extend the treatment of unobserved heterogeneity beyond the traditional framework of random coefficient models,Footnote 1 avoiding restrictive assumptions on the distribution of the population parameters of the utility function, which we show has a pronounced effect on estimated labor supply elasticities.Footnote 2

The model is estimated on data drawn from the Household, Income and Labour Dynamics in Australia (HILDA) survey which provides detailed information on time use and child care use and corresponding prices. Simulations based on the estimated parameters show that the time allocations of partnered mothers with pre-school children are highly sensitive to changes in wages and the cost of child care. A policy simulation also suggests that lowering effective tax rates faced by partnered mothers as second earners, by switching from joint to individual taxation, would lead to a substantial increase in their labor force participation and hours of work.

The paper is organized as follows. In the next section we set out the underlying theoretical model. In Sect. 3 we present the econometric specification that we take to the data. Section 4 discusses our dataset and Sect. 5 presents parameter estimates. Section 6 reports the results of policy simulations. Section 7 concludes.

2 Economic model

We construct a one period model of mother’s time use and child care decisions during the preschool phase of the life cycle of a two-parent family. The time-use decisions of the father, taken to be the “primary” earner, are treated as exogenous.

By treating father’s choices as exogenous, we depart from the assumption of full Pareto efficiency that underlies for example the collective model. There is, however, a growing literature that seeks to relax this assumption, for example that based on non-cooperative rather than cooperative household equilibria. The assumption of exogenous male choices would seem to us to be an acceptable approximation in the light of the results of time-use studies showing relatively little variation in male time choices in the early child rearing years, with the vast majority working full time.

In a one period model, potentially important intertemporal effects, such as the anticipated loss of future human capital and employment possibilities from reducing current labor supply, cannot be incorporated explicitly. A mother may continue to work throughout the preschool phase despite facing a very low net wage or negative net earnings after child care costs, as an investment in her long-term career prospects. We can, however, partially capture these effects in a reduced form sense, through their impact on the marginal utility of market work vis-à-vis leisure or home child care and domestic work. We also take the number of children in the household as exogenous and therefore do not model fertility decisions.

Household \(h=1,2,\ldots ,H,\) chooses:

  • its consumption of a market good \(x_{ih},\) with \( i=1,2,\ldots ,n\) denoting the individuals within the household;

  • the mother’s leisure consumption \(l_{2h}\);

  • consumption of a composite household good, \(y_{h},\) representing child care and domestic work;

  • the mother’s time input to the production of the household good, \(t_{2h}^{y}\);

  • purchases of the market child care good \(m_{h}^{c}.\)

Consumption is a composite market good with price 1, the mother’s gross wage rate is \(w_{2h},\) and the price of the market child care good is \(p_{h}^{c}\), which varies across households.Footnote 3 The father’s leisure and time allocation to household production are taken to be exogenous and therefore denoted by \( \hat{l}_{1h}\) and \(\hat{t}_{1h}^{y}\). Given the time endowment constraint, his market labor supply, \(L_{1h}=\hat{L}_{1h}\), is also exogenous. The sum of the parents’ gross incomes from market supply, \( \sum _{i}w_{ih}L_{ih}\), is denoted by \(I_{h}(w_{1h},w_{2h}).\) Their utility functions are \(u_{ih}(x_{ih},y_{h},l_{ih}),i=1,2\). The remaining utilities \(u_{ih}(x_{ih},y_{h}),i=3,\ldots ,n\) correspond to children.

The household is assumed to maximize a unitary household welfare function,Footnote 4 concave in utilities,

$$\begin{aligned} W_{h}=\Psi _{h}(u_{1h}(.),\ldots ,u_{nh}(.);\mathbf {e}_{h})\quad h=1,2,\ldots ,H \end{aligned}$$
(1)

where \(\mathbf {e}_{h}\) is a vector of exogenously given “environmental” or “distributional” factors which can be interpreted as determining the household’s preferences over the utility profiles of its members.Footnote 5 This function is based upon some household choice process which need not be further specified.Footnote 6

The household’s budget constraint can be written as

$$\begin{aligned} \sum _{i}x_{ih}+p_{h}^{c}m_{h}^{c}\le I_{h}(w_{1h},w_{2h})-T\left( I_{h}(w_{1h},w_{2h}),p_{h}^{c}m_{h}^{c};n,\ldots \right) \quad h=1,2,\ldots ,H \end{aligned}$$
(2)

where T(.) is a tax-benefit function which may contain as arguments demographic variables as well as gross incomes and expenditure on bought-in child care.Footnote 7

The technology of the household production of \(y_{h}\) is expressed by the function

$$\begin{aligned} y_{h}=g_{h}\left( \hat{t}_{1h}^{y},t_{2h}^{y}\right) \quad h=1,2,\ldots ,H \end{aligned}$$
(3)

and there is a time constraint

$$\begin{aligned} l_{2h}+t_{2h}^{y}+L_{2h}=T \end{aligned}$$
(4)

where T is a given time endowment. Because we will be adopting a discrete optimization approach, directly comparing values of the household welfare function at all choice opportunities (see Van Soest 1995), we do not need to impose conditions of convexity or even differentiability on the function in (3). Thus the household can be thought of as choosing the variables \(l_{2h}\), \(t_{2h}^{y}\) and \(m_{h}^{c}\) that determine consumptions, market labor supplies and income via the constraints (2)–(4) in such a way as to yield a global maximum of the function \(\Psi _{h}(.)\). We can obtain a reduced form of this function by substituting from (2)–(4) into (1) to obtain a utility function that depends on these three choice variables as well as net household income Y. This then forms the basis for the empirical model specification.

3 Econometric specification

In order to specify a discrete choice model, we restrict the values of the three choice variables, the mother’s labor supply, \(L_{2h}\), her time allocated to household production, \(t_{2h}^{y}\), and the hours of bought-in child care, \(m_{h}^{c}\), to take one of five possible values which can be characterized as “low”, “low-medium”, “medium”, “high-medium” and “high” according to their observed distributions.Footnote 8 The five values of each variable yield a grid of \( 5^{3}=125\) possible discrete choice points from which the household can choose its optimal allocation. The only restriction we impose on the household-specific choice set is that we exclude alternatives which would imply that bought-in child care costs exceed family income. This restriction applies mainly to households with the lowest disposable incomes and long hours of formal care.Footnote 9
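The construction of the household-specific choice set described above can be sketched as follows. This is a minimal illustration, not the authors' code: the hour grids, the wage, the price and the use of gross income in the feasibility check are all assumptions made for the example, and the function name `build_choice_set` is hypothetical.

```python
from itertools import product

def build_choice_set(work_hours, home_hours, care_hours,
                     wage, other_income, care_price):
    """Enumerate (L2, t2y, mc) grid points, dropping those whose
    child care cost exceeds family income (assumed here to be gross)."""
    feasible = []
    for L2, t2y, mc in product(work_hours, home_hours, care_hours):
        family_income = other_income + wage * L2
        if care_price * mc <= family_income:
            feasible.append((L2, t2y, mc))
    return feasible

# Hypothetical weekly-hour grids; the paper derives its five values
# from the observed distributions of each variable.
WORK = [0, 10, 20, 30, 40]
HOME = [20, 35, 50, 65, 80]
CARE = [0, 10, 20, 30, 40]

# A household with ample other income keeps all 5**3 points.
unrestricted = build_choice_set(WORK, HOME, CARE, wage=20.0,
                                other_income=1000.0, care_price=5.0)
# A household with little other income loses the long-care options
# at zero hours of market work.
restricted = build_choice_set(WORK, HOME, CARE, wage=20.0,
                              other_income=50.0, care_price=5.0)
```

In this toy setting the restriction only binds for the low-income household, which mirrors the remark that it applies mainly to households with the lowest incomes and long hours of formal care.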

Dropping the household subscript, we specify for the purposes of our model the vector \({\mu }=[l_{2},t_{2}^{y},m^{c},Y]\). The leisure variable, \(l_{2}\), is the residual of the daily time constraint in (4) with \(T=24\). The mother’s household production time, \( t_{2}^{y}\), is computed as the sum of hours allocated to child care and to other home production activities, many of which may simultaneously include child care.Footnote 10 Net household income, Y, is calculated as gross income net of taxes, family tax benefits and expenditure on child care. Gross income is the sum of each partner’s earnings and the family’s non-labor income. Since household income does not include the implicit value of household production it does not depend on \(t_{2}^{y}\). There are therefore 25 possible values of net household income for each household, corresponding to combinations of the five choices of \(L_{2}\) and the five choices of \(m^{c}\).
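To illustrate why there are 25 income values per household, the following sketch computes net income over the grid of market hours and formal care hours. The flat 20 % schedule `flat_tax` is a placeholder assumption standing in for the tax-benefit function T(.), and all numerical inputs are invented for the example.

```python
def net_income_grid(work_hours, care_hours, wage, other_income,
                    care_price, tax_benefit):
    """Net income for every (L2, mc) pair. Home production time does
    not enter, because its implicit value is not part of income."""
    grid = {}
    for L2 in work_hours:
        gross = other_income + wage * L2
        for mc in care_hours:
            care_cost = care_price * mc
            grid[(L2, mc)] = gross - tax_benefit(gross, care_cost) - care_cost
    return grid

def flat_tax(gross, care_cost):
    """Placeholder schedule (assumption): 20% of gross income,
    with no child care rebate."""
    return 0.2 * gross

incomes = net_income_grid([0, 10, 20, 30, 40], [0, 10, 20, 30, 40],
                          wage=25.0, other_income=1000.0,
                          care_price=5.0, tax_benefit=flat_tax)
```

Passing the tax-benefit schedule as a function keeps the grid construction separate from the details of any particular tax system.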

The mother’s gross earnings are calculated as the product of her gross wage and hours of market work. Unobserved wages are predicted by a Heckman selection model (Heckman 1979), with the exclusion restrictions being number of children in the household and the sum of husband’s income and family non-wage income.Footnote 11 Expenditure on child care is calculated as the product of a household-specific child care price and the household’s choice of formal child care hours. To account for families who do not use formal child care (and therefore do not report a corresponding price), we follow Connelly (1992) and use a predicted price derived from a Heckman selection model with the exclusion restrictions being number of adults in the household (excluding spouses) and distance from grandparents.Footnote 12 Sample selection criteria and regression results for both selection models are presented in “Appendix 1”.

3.1 Baseline model without unobserved heterogeneity

We first present the model without unobserved heterogeneity. We take a reduced form of the household welfare function introduced in the previous section, specified as a flexible quadratic function

$$\begin{aligned} \Psi ({\mu })={\mu }^{\prime }\mathbf {A}{\mu }+\mathbf {b} ^{\prime }{\mu }\,\, \end{aligned}$$
(5)

where \(\mathbf {A}\) is a symmetric \(4\times 4\) coefficient matrix, and \( \mathbf {b}\) is a 4-component vector. The first three components of \(\mathbf {b }\), corresponding to the time-use variables \(l_{2},t_{2}^{y},m^{c},\) are defined as

$$\begin{aligned} b_{j}=\sum _{k=1}^{K}\beta _{kj}X_{k},\quad j=1,\ldots ,3 \end{aligned}$$
(6)

where the \(X_{k}\) denote, respectively, a constant term and variables representing observed household characteristics: wife’s age; wife’s age squared; number of pre-school age children; number of school-age children; and hours of informal child care provided by relatives, friends or the husband. These represent sources of observed heterogeneity. The elements of the matrix \(\mathbf {A}\) as well as the component \(b_{4}\) are assumed the same for all households.Footnote 13
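Equations (5) and (6) can be evaluated directly, as in the sketch below. The function names and the small numerical example are illustrative assumptions; only the functional form is taken from the text.

```python
def linear_coefficients(beta, X, b4):
    """Eq. (6): b_j = sum_k beta[k][j] * X[k] for the three time-use
    variables, followed by the common income coefficient b4."""
    b = [sum(beta[k][j] * X[k] for k in range(len(X))) for j in range(3)]
    return b + [b4]

def quadratic_utility(mu, A, b):
    """Eq. (5): Psi(mu) = mu' A mu + b' mu, with mu = [l2, t2y, mc, Y]."""
    n = len(mu)
    quad = sum(mu[i] * A[i][j] * mu[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * mu[i] for i in range(n))
    return quad + lin

# Toy inputs (assumptions): a constant plus one household characteristic.
X = [1.0, 2.0]
beta = [[1.0, 0.0, 0.0],   # coefficients on the constant
        [0.5, 1.0, 0.0]]   # coefficients on the characteristic
b = linear_coefficients(beta, X, b4=1.0)
A = [[-1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
psi = quadratic_utility([1.0, 2.0, 3.0, 4.0], A, b)
```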

The household welfare function in reduced form does not explicitly separate the parameters of the household production function, the utility functions of the household members, or the household process which combines the utilities of the members. This should be kept in mind when interpreting the parameters. For example, the partial derivative of \(\Psi ( \mathbf {.})\) with respect to \(t_{2}^{y}\) is the marginal change in household welfare with the other components of \({\mu }\) (that is, \(l_{2}\), \(m^{c}\) and Y) held constant, that is, when an hour of market work is replaced by an hour of work at home without changing income. This captures the (positive) effect of additional home production as well as the potential (positive or negative) effect of a higher or lower preference for home rather than market work, not accounting for the value of home production or the wage for market work. Differences in \(b_{2}\), the linear coefficient on \(t_{2}^{y}\), across households may therefore reflect either differences in productivity in household production or differences in preferences, or both. Conceptually, these are of course two quite distinct sources of heterogeneity, but they cannot be separately identified in the available data.

We introduce randomness in the value of the household welfare function at each possible choice point \((l_{2},t_{2}^{y},m^{c},Y)\) by specifying:

$$\begin{aligned} \Psi _{r}=\Psi (\mathbf {.})+\varepsilon _{r},\quad r=1,2,\ldots ,125 \end{aligned}$$
(7)

We can rationalize these errors as being errors of optimization or as being due to unobserved alternative specific characteristics that make each alternative more or less attractive than predicted by the systematic part. They can be due to factors that make a specific alternative more (less) attractive because of high (low) productivity or other, possibly preference-related, factors. The \(\varepsilon _{r}\) are assumed to be independent of each other and identically distributed and to follow the Type 1 Extreme Value Distribution. This implies that the conditional probability that point \(r^{*}\) is chosen as the optimal point is

$$\begin{aligned} P\left[ \Psi _{r^{*}}>\Psi _{r},\,\,\forall r\ne r^{*}\mid {\mu }, \mathbf {A,b}\right] =\frac{\exp \Psi ({\mu }_{r^{*}},\mathbf { A,b)}}{\sum _{r=1}^{125}\exp \Psi (\mathbf {\mu }_{r},\mathbf {A,b)}} \end{aligned}$$
(8)
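The choice probabilities in (8) have the familiar conditional logit form. A minimal sketch is given below; subtracting the maximum utility before exponentiating is a standard numerical safeguard added here and leaves the probabilities unchanged.

```python
import math

def choice_probabilities(utilities):
    """Conditional logit probabilities over the alternatives of one
    household, given the systematic utilities Psi(mu_r)."""
    m = max(utilities)  # shift for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Toy example with three alternatives instead of 125.
probs = choice_probabilities([0.0, 0.0, math.log(2.0)])
# Large utilities do not overflow thanks to the shift.
stable = choice_probabilities([1000.0, 1000.0])
```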

Finally, to guarantee that household welfare always increases with household income (an assumption which is needed for economic interpretation of the estimates), we penalize the likelihood when necessary by adding points inside the budget frontier as additional choices that are never chosen by the household.Footnote 14

3.2 Unobserved heterogeneity

It is likely that different households within the selected sample of families with young children have different unobserved attributes, for example in human and physical capital, which may impact on home productivity, measured, for example, by child outcomes. There may also be unobserved variation in the quality of market child care. Unobserved heterogeneity, whether in home productivity, in market child care or in preferences, is captured by the specification of error terms \(\varepsilon _{r}\) in the model as interdependent across alternatives. This contrasts with the basic model in which the errors are alternative-specific, which implies independence of irrelevant alternatives.

Several alternative approaches have been developed to allow for unobserved heterogeneity in the context of discrete choice labor supply models. The most prominent one is the parametric random coefficients model (see Van Soest 1995; or Keane and Moffitt 1998). This method has been criticized for the restrictive assumptions imposed on the distribution of stochastic terms (see Burda et al. 2008; Train 2008; Pacifico 2012). The distributions are predominantly assumed to be multivariate normal or log-normal, which implies that the corresponding density of parameter values is unimodal, that is, it has one peak characterizing the most frequent household welfare function. The restrictiveness of the unimodality assumption is well documented in Burda et al. (2008) who show that the standard random coefficients models perform poorly when the distribution of unobserved heterogeneity has multiple modes. This is not well captured by standard models, rendering the resulting preference ordering too uniform. This issue is of particular importance for our analysis, because previous theoretical work (Apps and Rees 2009) suggests that multimodal parameter distributions might well be present in the context of female labor supply.

A small body of literature on female labor supply allows for more flexible treatment of unobserved heterogeneity (Bernal 2008; Blau and Hagy 1998; Tekin 2007). These studies draw on Heckman and Singer (1984) and Mroz (1999), using a step function to model the unknown distribution of the key random coefficient. This random coefficient therefore follows a discrete distribution and if enough mass points are allowed for, this distribution is very flexible and can approximate any underlying distribution.

The latent-class model can be seen as a tractable generalization of this approach, allowing for flexible discrete distributions of all parameters of the utility function. The underlying assumption is that the population consists of a number of different homogeneous populations (or classes) \(K_{c},c=1,\ldots ,C\), characterized by utility functions with parameters \(\mathbf {A}_{c},\mathbf {b}_{c}\) (see Train 2008). The parameterization is class-specific, implying that the probability mass is assigned to the whole set of parameters. This allows individual random coefficients to be correlated, although the correlation structure is not explicitly modeled.

Given the probability \(P(h\in K_{c})\) that a household \( h=1,\ldots ,H\) is in the class \(K_{c},c=1,\ldots ,C,\) and writing the probability that point \(r^{*}\) is chosen by this household as

$$\begin{aligned} P[\Psi _{r^{*}}>\Psi _{r},\, \, \forall r\ne r^{*}\mid {\mu } ,\mathbf {A}_{c},\mathbf {b}_{c},\mathbf {X}] = \frac{\exp \Psi ({\mu }_{r^{*}},\mathbf {A}_{c},\mathbf {b}_{c},\mathbf {X)}}{ \sum _{r=1}^{125}\exp \Psi ({\mu }_{r},\mathbf {A}_{c},\mathbf {b}_{c}, \mathbf {X)}} \end{aligned}$$
(9)

the unconditional probability that alternative \(r^{*}\) is chosen by household h is

$$\begin{aligned} \sum _{c=1}^{C}P(h\in K_{c})\times P[\Psi _{r^{*}}>\Psi _{r},\,\,\forall r\ne r^{*}\mid {\mu },\mathbf {A}_{c},\mathbf {b}_{c},\mathbf {X} ]\, \, ,\, \, c=1,\ldots ,C \end{aligned}$$
(10)

Allowing for multiple latent classes makes the model more difficult to estimate, with the traditional maximum likelihood optimization methods often failing to converge. Train (2008) and Pacifico (2012) show that in such cases we can take advantage of the well-known EM algorithm. This estimation procedure is considerably faster and more stable than the traditional methods, which makes it feasible to estimate flexible models even with a large number of latent classes.
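The mixture probability in (10) and the class-share part of an EM iteration can be sketched as follows. This is a generic finite-mixture illustration under stated assumptions, not the authors' implementation: the within-class choice probabilities are taken as given, and the update of the class-specific parameters \(\mathbf {A}_{c},\mathbf {b}_{c}\) (a weighted conditional logit for each class) is omitted.

```python
def mixture_probability(shares, class_probs):
    """Eq. (10): sum over classes of P(h in K_c) times the probability
    of the observed choice conditional on class c."""
    return sum(s * p for s, p in zip(shares, class_probs))

def e_step(shares, class_probs_by_household):
    """Posterior class-membership weights for each household,
    obtained from Bayes' rule."""
    posteriors = []
    for class_probs in class_probs_by_household:
        joint = [s * p for s, p in zip(shares, class_probs)]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors

def update_shares(posteriors):
    """Share update: the average posterior weight of each class."""
    n_households = len(posteriors)
    n_classes = len(posteriors[0])
    return [sum(post[c] for post in posteriors) / n_households
            for c in range(n_classes)]

# Toy example: two classes, two households. Each inner list holds the
# probability of the household's observed choice under each class.
shares = [0.5, 0.5]
observed_choice_probs = [[0.2, 0.6], [0.4, 0.4]]
posteriors = e_step(shares, observed_choice_probs)
new_shares = update_shares(posteriors)
```

In a full estimation these two steps would alternate with the class-specific parameter updates until the likelihood stops improving.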

4 Data

The HILDA survey provides data on a wide range of variables for a representative sample (17,000 respondents) of the Australian population interviewed annually since the year 2001. Particularly relevant to this study are the detailed data on time use and cost and utilization of formal and informal child care.

Mothers with pre-school aged children represent only a small fraction of each wave of the HILDA survey. To increase sample size, we construct a pooled cross-section using the four consecutive waves of HILDA from 2005 to 2008. From each wave we select partnered mothers with pre-school children. We exclude couples in which a partner is disabled, retired, or a full-time student, couples in which the husband is unemployed, and families living in a multi-family household. We also exclude records with incomplete or implausible survey responses (usually on the relevant time-use variables).

The final sample contains 1465 records. Descriptive statistics for the dependent variables and the socioeconomic and demographic characteristics entering as independent variables in X are reported in Table 1. To enable comparisons by gender, the table also includes descriptive statistics for male wage rates and time use.

Table 1 Summary statistics, sample of couples with preschool children

On average, parents of pre-school children are in their early thirties, with the father around two years older than the mother. Only 56 % of mothers in the sample are employed and, as we would expect, market hours distributions differ dramatically by gender, as shown in Fig. 1. The result is a gap of over 30 h per week between average female and male labor supplies. The vast majority of men work full-time (more than 35 h per week; Footnote 15), while women have a distribution of market hours that is relatively uniform apart from a large spike at zero hours. There are 83 mothers who report working more than 18 h a day for seven days a week.Footnote 16 In these cases we scale down the reported hours to satisfy a time constraint of 18 h of market work and housework per day while retaining the same relative time allocations as in the original data.Footnote 17
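The rescaling of implausibly long reported hours can be illustrated as follows. The sketch assumes weekly hours and a weekly cap of 18 × 7 = 126 h; the function name `scale_to_cap` is hypothetical.

```python
def scale_to_cap(market_hours, home_hours, cap=18 * 7):
    """Proportionally scale weekly market and household hours so that
    their sum does not exceed the cap, keeping their ratio unchanged."""
    total = market_hours + home_hours
    if total <= cap:
        return market_hours, home_hours
    factor = cap / total
    return market_hours * factor, home_hours * factor

# A report of 42 + 126 = 168 h per week is scaled by 126/168 = 0.75.
scaled = scale_to_cap(42.0, 126.0)
# A report within the cap is left untouched.
unchanged = scale_to_cap(40.0, 60.0)
```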

Fig. 1 Distribution of weekly hours of market work in families with preschool children

Figure 2 compares hours spent on household production activities by gender. As noted above, household production is defined to include the allocation of time to activities involving direct interaction with children, such as “playing with your children”, and domestic work, much of which may also involve supervision of children aged 0–4. As we would expect, household production hours are higher for females than for males. As shown in Fig. 3, mothers’ leisure hours (Footnote 18) are also more dispersed, with substantially higher frequencies at the lower levels of weekly leisure time. It is clear that for this group of households with young children, the total work burden is on average greater for mothers than for fathers.

Fig. 2 Distribution of weekly hours of household production in families with preschool children

Fig. 3 Distribution of weekly hours spent on leisure in families with preschool children

We differentiate between formal child care provided by recognized institutions, such as kindergartens and care centers, and informal care provided by the father, grandparents or other relatives, and friends, for two reasons. First, formal child care differs from informal child care in that it is recognized as incurring costs by the Australian fiscal authorities, and the family is eligible for reimbursement of a considerable part of these costs. Second, the price data on informal care are rather unreliable. The price of formal child care is reported for all children in registered care. In contrast, informal child care is often provided with no charge, or at a price that implies an unobserved subsidy from the carer. The lack of more detailed information about the costs of informal child care makes any effort to impute corresponding prices infeasible. Therefore, we consider the choice of formal care only, treating informal care as exogenous.Footnote 19 Informal care enters the utility function as one of the interaction terms X in (6), measured in hours, without a specified price.

Formal care is used by 43 % of the families, while the use of informal child care is almost universal (only 9 families report that they used no form of informal child care). The distributions of the weekly hours of child care are presented in Fig. 4. The profiles for both types of care are relatively similar, although the formal care distribution does not go far above 60 h per week. This reflects the fact that formal care centers are closed on weekends.

Fig. 4 Distribution of weekly hours of informal and formal child care, families with preschool children using child care

Annual labor incomes are derived from reported weekly gross salaries from all jobs. The annual non-labor income of the couple is computed as the sum of each partner’s business income, investment income, private domestic pensions and overseas pensions. Figure 5 presents distributions of male and female labor incomes and household non-labor income. According to these data, around 45 % of mothers have zero labor income, while 54 % of families in the sample have zero non-labor income. The distribution of non-labor income for the subsample of families with non-zero incomes is skewed toward zero. At the same time, several outliers report very large incomes from business and investments.

Fig. 5 Annual labor and non-labor gross incomes of families with preschool children, 2008, AUD

These income data are used to derive the set of 25 family incomes, net of the taxes and benefits and cost of child care, associated with the discrete time-use choices. All incomes are deflated to 2005, the selected base year, using the Australian consumer price index.

4.1 Income taxes and family benefits

The rate scale of the formal Australian income tax, the Personal Income Tax, is strictly progressive and applies to individual taxable income. However, strict progressivity is lost with the phasing out of an offset, the Low Income Tax Offset, also based on individual taxable income. While tax rates and the offset vary across the four waves of HILDA, the basic structure of the system is essentially the same in each year. For the purpose of illustration, Fig. 6 plots the profiles of marginal and average tax rates with respect to individual taxable income for the 2007–2008 financial year. Details of the rate scale and offset for that year are provided in “Appendix 2”.

Fig. 6 Marginal and average tax rates of the 2007–2008 individual income tax

While the Australian income tax, the Personal Income Tax combined with the Low Income Tax Offset, is based on individual incomes, families are taxed effectively under a system of “quasi-joint” taxation. This is due to the withdrawal of child payments at various thresholds defined on the combined income of partners under a complex family payment system labeled “Family Tax Benefit Part A”. The effective marginal tax rate, obtained by adding the withdrawal rate of payments to the marginal income tax rate, varies widely across the distribution of earnings and can be well above the top rate of the Personal Income Tax scale at relatively low income levels. This is illustrated in Fig. 7 for the 2007–2008 financial year, for a family with two children aged under 13, with one under 5 years.Footnote 20 The figure plots the profiles of effective marginal and average tax rates with respect to the income of the primary earner for two limiting cases: a single income family and a two income family in which both partners earn the same income. For the latter case, the figure plots the effective marginal and average tax rates applying to the income of the second earner.Footnote 21 The higher rates indicate the tax penalty (Footnote 22) that married mothers as second earners can face on entering the workforce under a system of joint or “quasi-joint” taxation.
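The way an income-tested family payment raises the effective marginal tax rate of a second earner can be illustrated with a stylized schedule. All thresholds and rates below are invented for the example and are not the 2007–2008 Australian parameters; the function names are hypothetical.

```python
def marginal_income_tax_rate(income, brackets):
    """Statutory marginal rate on individual income.
    brackets: list of (lower_threshold, rate) sorted by threshold."""
    rate = 0.0
    for threshold, bracket_rate in brackets:
        if income >= threshold:
            rate = bracket_rate
    return rate

def withdrawal_rate(joint_income, start, end, taper):
    """Rate at which the family payment is withdrawn, assessed on the
    partners' combined income within the taper range."""
    return taper if start <= joint_income < end else 0.0

def effective_marginal_tax_rate(own_income, partner_income,
                                brackets, start, end, taper):
    """Marginal income tax rate plus the payment withdrawal rate."""
    return (marginal_income_tax_rate(own_income, brackets)
            + withdrawal_rate(own_income + partner_income, start, end, taper))

# Stylized schedule (assumptions, not actual parameters).
BRACKETS = [(0.0, 0.0), (10000.0, 0.2), (50000.0, 0.4)]

# Second earner on 20,000 with a partner on 30,000: joint income of
# 50,000 falls in the taper range, so 0.2 + 0.3 applies.
emtr_in_taper = effective_marginal_tax_rate(
    20000.0, 30000.0, BRACKETS, start=40000.0, end=70000.0, taper=0.3)
# Same own income, partner on 10,000: joint income is below the taper.
emtr_below = effective_marginal_tax_rate(
    20000.0, 10000.0, BRACKETS, start=40000.0, end=70000.0, taper=0.3)
```

In this example the second earner faces a higher effective rate than her own tax bracket implies, purely because the payment is assessed on combined income.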

Fig. 7 Marginal and average tax rates of the 2007–2008 family tax system (including family payments)

Table 2 Regression results for the baseline homogeneous model

The pattern of marginal tax rates in Fig. 7 implies a budget set with many non-convexities and would make the traditional approach of finding the optimum in the complete budget set infeasible. This makes the discrete approach, approximating the complicated budget frontier with a small finite set of points, particularly useful.

5 Results

We first report the results for the baseline homogeneous specification presented in Sect. 3.1 and then discuss those for the model with unobserved heterogeneity introduced in Sect. 3.2.

5.1 Baseline model without unobserved heterogeneity

The estimated parameters of the baseline model are reported in Table 2. If the homogeneity assumption were found to be valid, the baseline estimates would be consistent and more efficient than those of the latent-class model. The coefficients indicate that several of the interaction terms yield intuitively plausible results. An increase in the number of pre-school aged children in the household raises the marginal utility of formal child care and therefore strengthens the demand for it. On the other hand, an increase in the (assumed exogenous) availability of informal child care weakens it. The same is true for the allocation of time to household production.

The estimated marginal utilities of the choice variables, the components of the vector \({\mu }\), are central to our analysis, but their evaluation is more complex than consideration of the simple regression coefficients in isolation, since the marginal utilities depend upon the entire matrix \(\mathbf {A}\) and the vector \(\mathbf {b}\). They also vary with the household-specific socio-demographic characteristics, \(\mathbf {X}\), and with the values of the choice variables \({\mu }\) themselves. In Table 3 we summarize the distribution of the estimated marginal utilities at the observed choices, presenting first their sample averages and second, the proportion of households that have negative marginal utilities. We do this for the full sample as well as for the subsample of households that actually buy formal child care.
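For reference, the dependence of the marginal utilities on \(\mathbf {A}\), \(\mathbf {b}\), \(\mathbf {X}\) and \({\mu }\) follows directly from differentiating the quadratic form in (5). Using the symmetry of \(\mathbf {A}\), and writing \(a_{ji}\) for its elements and \(\mu _{i}\) for the components of \({\mu }\) (notation introduced here for the illustration),

$$\begin{aligned} \frac{\partial \Psi ({\mu })}{\partial {\mu }}=2\mathbf {A}{\mu }+\mathbf {b},\qquad \frac{\partial \Psi ({\mu })}{\partial \mu _{j}}=2\sum _{i=1}^{4}a_{ji}\mu _{i}+b_{j},\quad j=1,\ldots ,4 \end{aligned}$$

where, for \(j=1,\ldots ,3\), the term \(b_{j}\) varies across households through the characteristics \(X_{k}\) in (6).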

Table 3 Average marginal utilities of the main regressors and fraction of the population sample with negative marginal utilities, homogeneous model

As expected, marginal utilities of income, household production and leisure are on average positive, with only a very small fraction of households having a negative value in each case. On the other hand, around 90 % of households have a negative marginal utility of formal child care. This is of course not a problem for those households that do not use formal child care, but the last column of the table shows that households that do buy formal care have, on average, negative marginal utilities. This implies that this model is not successful in explaining the use of formal child care from economic arguments. For most households, the use of formal child care can only be predicted with the inclusion of error terms \(\varepsilon _{r}\), reflecting optimization errors or unobserved factors that make specific choices more or less attractive.

This counter-intuitive result could be due to unobserved heterogeneity. Our sample contains a large proportion (57 %) of households that do not use formal child care. This can be problematic for the homogeneous model if the decision to use formal child care is influenced by, for example, unobserved differences in home productivity. The model tries to explain this relation in terms of the variables included in the utility function, assigning strong disutility to formal child care. Since the majority of families do not use formal care, the failure to take account of unobserved heterogeneity forces the common coefficient to be negative. Introducing unobserved heterogeneity may help to solve this problem.

5.2 Latent-class models

A key step in the EM estimation procedure is the initial selection of the number of latent classes. This decision involves a trade-off. On the one hand, the higher the number of heterogeneous groups, the better is the fit of the model because we account for unobserved heterogeneity in a more flexible form. On the other hand, more stratified models are bound to be estimated less precisely because the number of unknown parameters rises proportionally to the number of allowed latent classes. The determination of the optimal number of classes is therefore crucial.

Following Train (2008), we compare the models with varying classification choices on the basis of their Schwarz–Bayesian information criteria (\({ BIC }\))

$$\begin{aligned} { BIC }=-2\log (L)+k\log (n) \end{aligned}$$
(11)

where L is the likelihood, k is the number of free parameters in the model and n is the number of observations in our sample. The multiple-class models yield the statistics in Table 4. The table shows that the 8-class model attains the lowest \({ BIC }\), and should therefore be considered as the most reliable specification for further analysis.
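The selection rule in Eq. (11) can be sketched in a few lines. The log-likelihoods and parameter counts below are hypothetical placeholders (they are not the values behind Table 4), and the function names are our own.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Schwarz-Bayesian information criterion: -2 log(L) + k log(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_by_bic(fits, n_obs):
    """Return the class count with the lowest BIC, plus all BIC values.

    fits maps number of classes -> (maximized log-likelihood, free parameters).
    """
    scores = {c: bic(ll, k, n_obs) for c, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits: the likelihood improves with more classes,
# but the penalty term k*log(n) grows with the parameter count.
fits = {1: (-5200.0, 30), 2: (-5000.0, 61), 3: (-4950.0, 92)}
best, scores = best_by_bic(fits, n_obs=1000)
```

In this toy example the two-class model wins: moving to three classes improves the likelihood by less than the added penalty, which is the trade-off described above.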

Table 4 Bayesian information criteria for multi-class models

In order to examine whether our models actually fit the data, we simulate individual time-use allocations using the estimated models and compare the simulated aggregated distributions to their observed counterpart. Figure 8 presents this comparison both for the baseline model and the model with eight latent classes.

Fig. 8 Distribution of time-use variables into intensity levels, observed and predicted shares

As expected, the 8-class model replicates the empirical distributions very well, attaining almost identical shares of intensity levels among all three time-use choices. The homogeneous model performs much worse and essentially fails to capture the distribution of market work hours. In particular, the model underestimates the proportion of mothers with zero market hours and overestimates the proportion with low hours in part-time work. The distributions of the other two choice variables are replicated well even by the homogeneous model, though the latent-class model still provides more precise approximations.

We do not present the regression coefficients for the 8-class model because the class-level stratification makes their interpretation practically impossible. However, one statistic which can be readily interpreted is the fraction of the sample with negative marginal utilities (see Table 5).

Table 5 Fraction of the population sample with negative marginal utilities of the main regressors, model with eight latent classes

The only result which exhibits a substantial change compared to the baseline specification (see Table 3) is that for formal child care. The proportion of mothers with disutility from additional formal child care drops by 30 percentage points, to 53 % in total and to 31 % when we restrict the sample to mothers who are using formal child care.Footnote 23 This is a considerable improvement over the homogeneous specification, though the 31 % is still substantial. The reason for this is that the majority of all mothers do not use any formal child care. As a consequence, the estimates imply that most mothers do not attach positive utility to formal child care. Due to the important role played by unobservables (error terms and unobserved heterogeneity), the model is not able to perfectly predict who will and who will not attach utility to formal child care.

The relative performance of the models with varying numbers of latent classes is further tested through a series of simulations in the next section. The aim of these simulations is to predict how people respond to selected changes within their economic environment. By predicting (and comparing) the behavioral responses for different model specifications, we can analyze the importance of unobserved heterogeneity and assess the limitations of the homogeneity assumption.

6 Microsimulations

First, to analyze the sensitivity of choices to wages and prices, we simulate a 10 % increase in the wages of all mothers, and a 10 % increase in the prices of formal child care. Second, we carry out a policy simulation in the spirit of Apps and Rees (2009), building on their critique of joint taxation (as discussed in the previous section). We propose an alternative system of taxes and benefits designed to have a less distortionary effect on female labor supply than the actual system and we estimate its impact on the choices of the type of households we consider.

6.1 Changing wages and child care prices

The impact of wage and price changes is measured in terms of aggregate elasticities. We compute the percentage changes in total hours of market work, total hours of household production, and total hours of formal child care with respect to changes in the wage rates of all mothers, or all child care prices, holding all other variables constant. We present both gross and net elasticities, which correspond to changes being applied before and after subjecting families to all taxes and benefits. Accordingly, changing gross wages will have an impact on applicable tax rates and benefit eligibility criteria. Due to the progressive features of the income tax system, the 10 % increase in gross wages generally leads to a lower increase in net wages. Changing net wages has the advantage of circumventing secondary effects caused by changes in the effective tax rates: increasing net wages by 10 % results in 10 % higher disposable incomes from the mother’s market work across all households.

The resulting income changes are proportional to the earnings of mothers so that those working earn more while non-participants retain their original disposable incomes. Since the 10 % increase in the wage makes participation more attractive, we can expect an increase both in market hours of employed mothers and in the labor market participation rate. Similarly, an increase in child care prices results in an income reduction that is proportional to the cost of bought-in child care, and we can expect that it leads to a reduction in bought-in hours and in the fraction of the sample using formal child care.

We compute aggregate elasticities as the ratios of percentage changes in the relevant time or care use to the percentage changes in the wages and prices (where the latter are 10 %, by construction). This is done as follows. We first derive the benchmark time-use allocations (using the same wages and prices that are used for estimation) by averaging individual choice probabilities predicted by the model and using these to compute the average hours of each activity.

A similar procedure is applied to calculate the average hours of activities after the wage or price increase. The only difference is that the choice probabilities are derived using adjusted disposable income for each alternative in the choice set. This changes the utility values for some of the choice alternatives but not for others and, as a consequence, changes the probabilities of all choices. Using the new probabilities we recompute average hours. Finally, we compute the percentage deviations in the new averages compared to the benchmark. The elasticities for the homogeneous model and our preferred specification are provided in Table 6.Footnote 24
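The procedure can be sketched as follows for a single time-use dimension (market hours). The sketch assumes logit-type choice probabilities, a toy utility function and a three-point hours grid; these, together with all names and numbers, are illustrative assumptions rather than the estimated specification.

```python
import math

HOURS = (0, 20, 40)  # hypothetical discrete weekly market-hours alternatives


def choice_probabilities(utilities):
    """Logit probabilities over one household's choice set (assumed error structure)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def average_hours(households, utility, wage_factor=1.0):
    """Expected hours per household, averaged over the sample.

    households is a list of (net wage, other income); wage_factor rescales wages.
    """
    total = 0.0
    for wage, other_income in households:
        utils = [utility(other_income + wage_factor * wage * h, h) for h in HOURS]
        probs = choice_probabilities(utils)
        total += sum(p * h for p, h in zip(probs, HOURS))
    return total / len(households)


def aggregate_elasticity(households, utility, pct_change=0.10):
    """Percentage change in average hours divided by the percentage wage change."""
    base = average_hours(households, utility)
    new = average_hours(households, utility, wage_factor=1.0 + pct_change)
    return ((new - base) / base) / pct_change


def toy_utility(income, hours):
    """Hypothetical utility: linear in income, with a constant disutility per hour."""
    return 0.005 * income - 0.05 * hours


households = [(20.0, 800.0), (30.0, 1000.0)]  # hypothetical sample
elasticity = aggregate_elasticity(households, toy_utility)
```

Only the alternatives with positive hours see their income change, yet the probabilities of all alternatives shift, including that of non-participation.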

Table 6 Elasticities of time-use allocations with respect to changes in wages and child care prices, partnered mothers with pre-school children

The first panel gives the responses to the increase in all mothers’ net wage rates. The first thing to note is the large difference between the homogeneous (one class) model and the model with unobserved heterogeneity, demonstrating the importance of controlling for unobserved heterogeneity. When we allow for unobserved heterogeneity, the predicted responses fall substantially. Their sizes remain relatively stable among models with different numbers of classes.

Standard errors tend to increase as we allow for latent classes. This reflects the fact that this model is more flexible and therefore requires more data for accurate estimation.

Focusing on the latent-class model, the net wage increase leads to time-use shifts that correspond to intuition. A 10 % increase in all net wage rates results in a (significant) 4.3 % rise in average working hours, implying a positive uncompensated own labor supply elasticity of 0.43 for this group of mothers with young children. This is well in line with the large literature on female labor supply. The positive substitution effect (the price of leisure increases) dominates the negative income effect. Moreover, the 10 % wage increase leads to a (significant) 4.2 % increase in hours of formal child care (a “cross” elasticity of 0.42). First, the higher demand on time due to increasing hours of market work leads to substitution of bought-in child care for own child care. Second, higher earnings lead to higher family income, increasing the demand for formal child care if this is a normal good.

The elasticity of time allocated to household production is significantly negative, at -0.08. The negative sign implies that higher wages lead mothers to work less in the household. However, the actual change in home production hours is not large enough to compensate for the increase in market hours, implying that mothers also reduce their leisure in order to do more market work.Footnote 25 The second panel reports mothers’ gross wage elasticities. The responses induced by a 10 % gross wage change are weaker than those reported for the net wage change. This follows from the fact that gross stimuli are still subject to tax, so that the resulting net wage changes will be smaller than 10 %.

Turning to the impact of the rise in child care prices in the third panel, it is not surprising that the highest elasticity is that of formal child care itself. With a 10 % rise in child care prices, the demand for formal child care falls significantly, by 4.2 %. This in turn causes mothers to work less in the market: market hours drop significantly, by 0.8 %, as they have to substitute their own time for bought-in services.Footnote 26 Accordingly, the hours of household production increase by 0.2 %, replacing almost all of the forgone time formerly spent on market work.Footnote 27 The gross child care price elasticities presented in the fourth panel are higher than their net counterparts. This is a direct consequence of the Child Care Rebate (CCR). CCR imposes an upper cap on rebatable child care expenditures (see “Appendix 2”), and this cap is binding for many families in the sample. When gross child care prices increase, these families face additional child care expenditures which cannot be rebated. As a result, they will be subject to net child care costs higher than the ones induced by the net child care price increase. For families with expenditures below the upper cap, the effects of gross and net price changes will be equivalent.
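The role of the cap can be illustrated with a stylized rebate. The rebate rate, the cap on rebatable expenditure and the expenditure levels below are hypothetical, hours of care are held fixed, and other child care subsidies are ignored.

```python
def net_child_care_cost(gross_cost, rebate_rate, rebatable_cap):
    """Out-of-pocket cost when only expenditure up to rebatable_cap is rebated."""
    return gross_cost - rebate_rate * min(gross_cost, rebatable_cap)


def net_cost_increase(gross_cost, rebate_rate, rebatable_cap, gross_rise=0.10):
    """Relative rise in the net cost caused by a relative rise in the gross price."""
    before = net_child_care_cost(gross_cost, rebate_rate, rebatable_cap)
    after = net_child_care_cost(gross_cost * (1.0 + gross_rise), rebate_rate, rebatable_cap)
    return (after - before) / before


# Hypothetical parameters: 50 % rebate on expenditure up to 15,000 per year.
below_cap = net_cost_increase(10_000.0, 0.5, 15_000.0)  # about 0.10
above_cap = net_cost_increase(20_000.0, 0.5, 15_000.0)  # about 0.16
```

For the family below the cap, a 10 % gross price rise translates into a 10 % net cost rise. For the family above the cap, the whole gross increase is paid out of pocket, so the net cost rises by more than 10 %.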

6.2 Simulation of a tax and benefit reform

As discussed in Sect. 4, the phasing out of family benefits on household income creates high effective marginal tax rates for many mothers as secondary earners. To investigate the impact of these high rates on their labor supply and participation, and also on the demand for formal child care, we simulate the effects of switching to an individual-based income tax with universal payments. The reform replaces the marginal rate scale depicted in Fig. 7 with one that applies to individual taxable incomes, as illustrated in Fig. 6. To fund the increase in benefit payments, we increase proportionally (by 26.76 %) all marginal tax rates of the system in Fig. 6 to achieve a reform which is ex ante (that is, before behavioral responses) revenue neutral.Footnote 28
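The ex ante revenue-neutral scaling can be sketched as follows. Under a piecewise-linear schedule, multiplying every marginal rate by a common factor multiplies every liability by the same factor, so with behavior held fixed the required factor follows directly from the extra benefit cost. The brackets, incomes and cost figure are hypothetical, and the sketch abstracts from the change of the schedule itself.

```python
def income_tax(income, brackets, scale=1.0):
    """Tax under a piecewise-linear schedule.

    brackets is a list of (lower threshold, marginal rate) sorted by threshold;
    scale multiplies every marginal rate.
    """
    tax = 0.0
    for i, (lower, rate) in enumerate(brackets):
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if income > lower:
            tax += scale * rate * (min(income, upper) - lower)
    return tax


def revenue_neutral_scale(incomes, brackets, extra_benefit_cost):
    """Common scaling of all marginal rates that funds extra_benefit_cost,
    holding individual incomes (that is, behavior) fixed."""
    base_revenue = sum(income_tax(y, brackets) for y in incomes)
    return 1.0 + extra_benefit_cost / base_revenue


# Hypothetical schedule and sample.
brackets = [(0.0, 0.0), (10_000.0, 0.2), (50_000.0, 0.4)]
incomes = [20_000.0, 60_000.0]
scale = revenue_neutral_scale(incomes, brackets, extra_benefit_cost=3_500.0)
new_revenue = sum(income_tax(y, brackets, scale) for y in incomes)
```

In this toy case the base revenue is 14,000, so funding 3,500 of additional benefits requires a 25 % proportional increase in all marginal rates.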

Figure 9 shows graphically the differences in the net tax positions of households resulting from the reform (assuming no behavioral responses). The differentials are ordered by the corresponding pre-reform net household incomes, so that we can see how the shift in the tax burden varies with household income. Since the reform is ex ante revenue neutral, the changes for all families in the sample sum to zero. The figure shows that when benefits are universal, families with average joint incomes gain. It is important to keep in mind that this is due to gains for relatively low to average two-earner families who previously lost the joint-income-tested benefits. The proportionally higher marginal tax rates shift the tax burden toward the higher income groups, in effect shifting the burden from lower wage two-earner families to those with higher wage rates.

Fig. 9 Post-reform differences in the net tax positions of families, ordered by pre-reform net household incomes

Table 7 summarizes the simulated changes in time allocations and hours of formal child care in response to the reform. As in the previous simulations, we observe a large discrepancy between the changes predicted by the homogeneous model and those predicted by the models with more than one latent class, with the results of the latter proving relatively stable across different specifications. Again this implies that accounting for unobserved heterogeneity is important not only to improve the fit of the model but also from a substantive point of view.

Table 7 Percentage changes in time allocations after FTB reform, partnered mothers with pre-school children

We again focus on the outcomes for the 8-class specification. We observe that the reform would lead to a 3.11 % increase in average hours of work (about 0.43 h per week, using the average hours in Table 1), a 1.75 % increase in average hours of formal child care (0.15 h per week), and a 0.63 % decrease in the average hours of home production (about 0.45 h per week). All these effects are statistically significant. On average, the positive effect on market work of not phasing out family benefits is more important than the negative effect due to the increase in the marginal tax rates. Market hours of work therefore increase and, as a consequence, hours of home production fall and demand for formal child care increases.

6.2.1 Heterogeneity of the behavioral responses

The behavioral effects induced by the reform appear to be highly heterogeneous across population groups and latent classes. Closer analysis of our results reveals positive effects at the extensive margin of female market labor supply, with the predicted labor market participation rate rising by 4.4 % (to about 58 %). On the other hand, these effects are mitigated by responses at the intensive margin, with some employed mothers choosing to work fewer hours under the reform. Average hours of market work (conditional on being employed) fall by 1.3 %, with individual responses showing considerable variation. In fact, expected hours of market work increase for 69 % of all women in the sample. The 1.3 % decline in the aggregate work at the intensive margin is driven by the response of mothers on higher wages and in full-time employment. For these women, the generic increase in the marginal tax rate needed to finance the reform leads to a negative substitution effect that reduces their market work and increases non-market time uses. This effect dominates the positive income effect stemming from the loss of disposable income. This intuition is supported by the results presented in Table 8 where we divide households into four groups corresponding to quartiles of the observed household income distribution.
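The two margins can be combined in a rough consistency check. Average hours equal the participation rate times average hours conditional on employment, so the relative changes compound multiplicatively. The sketch assumes that the reported 4.4 % is a relative (not percentage-point) change in the participation rate; the function name is our own.

```python
def total_hours_change(participation_change, conditional_hours_change):
    """Relative change in average hours implied by relative changes at the
    extensive margin (participation) and the intensive margin (hours if employed)."""
    return (1.0 + participation_change) * (1.0 + conditional_hours_change) - 1.0


implied = total_hours_change(0.044, -0.013)
print(f"{100 * implied:.2f} %")  # prints "3.04 %"
```

The implied change is close to the 3.11 % aggregate increase in market hours reported above; the remaining gap is plausibly due to rounding of the reported figures.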

Table 8 Percentage changes in time allocations after FTB reform by household income quartiles, model with eight classes, partnered mothers with pre-school children

Households in the first three income quartiles exhibit responses with signs in line with the aggregate effect, whereas the high-income households in the fourth quartile are predicted to adjust their time-use allocations in the opposite direction. For the latter group, the female labor force participation drops by 0.92 %. This decline is smaller than the one derived for the aggregate intensive margin of female labor supply,Footnote 29 since high household income does not necessarily imply that a woman in such a household would be a high earner.

The behavioral heterogeneity is crucial for successful targeting of policy reforms, as it helps to identify the potential impact on different subsamples of the population. It is also interesting from the perspective of economic modeling, as we can compare the relative performance of homogeneous and latent-class models. In order to do so, we split the sample into two groups according to actual employment status and compute the elasticities separately for the two groups, using both the homogeneous model and the latent-class model with eight classes. Using the homogeneous specification, the effects prove to be almost identical for both groups, as this model captures only a small part of the differences in productivity and preferences between the groups (the “observed heterogeneity” part captured by the covariates in the model). According to the eight-class model results, the simulated increase in aggregate working hours is much stronger for non-employed mothers, with the absolute increase in market work hours being 28 % larger. As for the change in formal child care hours, the non-employed mothers exhibit a rather modest increase in absolute terms (70 % lower than employed mothers), but in relative terms their bought-in child care rises more than for employed women (the initial level of formal child care utilization is substantially lower for non-employed mothers).

The failure of the homogeneous model to capture heterogeneity in responses is further illustrated by the fact that this model cannot replicate observed differences in reported time-use allocations between the two groups, overestimating work and formal child care allocations of non-employed mothers and underestimating them for employed mothers. On the other hand, the 8-class model produces time-use patterns almost identical to those observed in the data. For these reasons, it is hard to maintain that the homogeneous model would be able to provide reliable predictions of the responses to proposed policy changes.

6.2.2 Net fiscal effect of the reform

We also analyze the net revenue effect of the reform, taking account of behavioral changes predicted by our 8-class model. Changes in time allocations can affect government revenue through two distinct channels: by increasing (reducing) their hours of work mothers are also increasing (reducing) income tax revenues, and by buying in longer (shorter) hours of formal child care, child care benefits rise (decline).

The key result in this context is that the government marginally improves its net fiscal position. Income tax revenue from mothers rises by only 0.5 %, which seems low compared to the 3.1 % increase in aggregate working hours. The reason is the heterogeneity in responses discussed above: mothers with higher wages tend to reduce their hours of market work, and the progressive nature of the income tax system makes the fall in tax revenues from this group relatively large, substantially offsetting the additional revenue from low and middle income households. More specifically, mothers who increase their market hours (69 % of the sample) are predicted to pay an average of $151 more in annual income taxes (a 5.6 % increase), whereas those who reduce their hours reduce their income tax liabilities by a predicted average of $261 per annum (a 2.3 % reduction). The net result is an aggregate increase in the income taxes by $25 per household (which translates into the aforementioned 0.5 % revenue gain).

The situation is very similar for the child care benefits, which increase only slightly in aggregate: on average, a household gets an additional $2 (0.1 % of the initial payments), which is small compared with the 1.75 % change of formal child care hours. Analogously to the income tax effects, this outcome reflects the heterogeneity in behavioral responses.Footnote 30 Combining the two effects, we estimate that on average households will contribute an additional $23 to government tax revenue, an increase that represents 0.2 % of their original contribution.
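The orders of magnitude can be checked with back-of-the-envelope arithmetic on the rounded figures reported above. The sketch assumes that the 31 % of mothers not counted among those increasing their hours all reduce them; variable and function names are our own.

```python
def net_change_per_household(share_up, average_gain, average_loss):
    """Sample-average change when a share of households pays average_gain more
    and the remaining households pay average_loss less."""
    return share_up * average_gain - (1.0 - share_up) * average_loss


# Rounded figures from the text: 69 % pay $151 more, the rest pay $261 less.
income_tax_change = net_change_per_household(0.69, 151.0, 261.0)  # roughly $23

# Reported aggregates: +$25 income tax and +$2 child care benefits per household.
net_fiscal_gain = 25.0 - 2.0  # the reported $23 net contribution
```

The weighted average of about $23 is slightly below the reported $25 per household, a difference that may stem from rounding of the inputs; it shows how a large loss among a minority of high-wage mothers offsets most of the gain from the majority.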

6.3 Robustness checks

In order to assess the stability of our results, we run a series of sensitivity checks, altering the econometric specification of our model in the following ways. First, to achieve a more flexible specification, we divide the time-use variables into a finer grid \((6^{3})\) of discrete points, allowing a greater degree of choice in household decision making. Second, we experiment with the composition of time-use variables, reducing the mother’s household production decision to a single maternal child care choice.Footnote 31 A third extension augments the model by a fixed disutility of work which is modeled as an additional term in the utility function (a dummy for positive hours of paid work times an additional parameter to be estimated).Footnote 32 In a fourth extension, in order to account for potential dependence of child care prices on the quality of the service, and for misreporting in the individual household accounts, we estimate a model with imputed wages and child care prices for everyone (instead of just for the households where wages or prices are not observed). A fifth alternative model investigates the plausibility of the assumption that post-reform informal child care use remains fixed at the pre-reform levels. We do so by estimating a model which does not allow for interactions of choice variables with informal child care. This model does not assume that the provision of informal child care remains unchanged, and the potential effects of substitution between informal and formal child care are contained within the formal child care coefficients included in the utility function. The sixth robustness check generalizes the model using the random opportunity specification of Aaberge et al. (1999). This model allows part-time jobs to differ in their availability from full-time jobs, reflecting the fact that women may be facing more (or fewer) job opportunities depending on the desired work hours. Unlike Aaberge et al. (1999), we do not allow the availability of jobs to depend on the corresponding wages, as this adjustment proves cumbersome in our model specification. The final extension evaluates the maintained assumption that women’s leisure is consumed privately, yielding the same utility irrespective of the leisure choices of their partners. To check its plausibility, we augment the utility function with an interaction term of male and female leisure time, where male leisure is treated as given. The interaction term allows for a preference for shared spousal leisure time.

Table 9 Robustness check—elasticities and reform responses derived by alternative model specifications with eight latent classes, partnered mothers with pre-school children

Table 9 shows that changes in the econometric specification induce changes in the values of the elasticities, but their relative sizes and signs remain similar to those in the original model. Most of the values remain within the 95 % confidence interval of the corresponding baseline elasticities. This finding is particularly important in the context of child care quality concerns, as it suggests that the differences in service quality reflected by variation in observed prices are unlikely to distort our estimates.

The stability of the elasticities is also interesting in the context of the model containing maternal child care decisions, as it suggests that changes in the hours of home production are proportional, irrespective of the distinction between child care-related and other household activities. Women who engage in the labor market will therefore work less in the household, delegating part of their chores either to the husband or buying in the services from the market.

We also check the validity of standard errors corresponding to the measured elasticities (without changing the specification of the model itself), by calculating standard errors in a robust way, controlling for general heteroskedasticity and household-specific clustering (considering the estimates as pseudo maximum likelihood estimates). In both cases the newly derived standard errors preserve the significance levels attained by the benchmark approach, suggesting that heteroskedasticity or clustering does not distort our results.

7 Conclusions

In this paper we have analyzed the time allocation decisions of partnered mothers with pre-school children, with emphasis on the influence of a non-convex tax and benefit system on labor supply, household production and the use of formal child care. We have focused on incorporating unobserved heterogeneity, originating possibly from differences in productivities and preferences. Our findings show this plays a dominant role in analyzing the mothers’ decisions in our data. Our results cast strong doubts on the usefulness of the homogeneous model with no unobserved heterogeneity. The parameters fail to capture the true effects of factors driving household decision making, and hence simulations based on the baseline homogeneous model specification give misleading results.

To control for unobserved heterogeneity, we estimated a series of latent-class models, among which the 8-class model was found to perform best, balancing goodness of fit against parsimony. To assess the responsiveness to changes in the family tax and benefit system and in child care prices, we conducted several simulations based upon our estimated models, increasing net wages of mothers or net child care prices, and, in the third simulation, altering the joint-income structure of the existing tax and benefit system.

The simulations show that mothers with pre-school children are responsive to changes in wages as well as changes in child care prices. The results suggest that market work and formal child care tend to be complements, and respond significantly to wage and price changes. The results also indicate that significant changes in labor supply and child care demand can remain unidentified when the unobserved heterogeneity is not accounted for, since the homogeneous model leads to significantly distorted female labor supply elasticities.

In the third simulation, we show that the phasing out of family benefits on the basis of joint income increases marginal tax rates on the incomes of mothers as second earners, with a negative impact on their labor supply. The tax system can be made more favorable for mothers with pre-school children by switching to a fully individual-based system. In such a reformed setting, mothers are predicted to increase their labor supply and use of formal child care. The net budgetary effect of these behavioral responses is to raise additional tax revenue which could be used to lower tax rates and therefore achieve efficiency gains. The gains from the reforms we have simulated arise from changing the structure of effective marginal tax rates under the Australian “quasi-joint” family tax and benefit system. Running a similar policy simulation on data for countries with full joint taxation may yield considerably stronger behavioral responses.

It should be noted that these results are derived from the responses of partnered mothers with pre-school children, and so they are specific to this population. However, in the context of the Australian family tax and benefit system for two-parent families with dependent children, the reform which we pursue could potentially be viewed as one that needs to be considered for the full population. The extremely large child payments increase further for older children, extending the problem of high effective marginal tax rates on secondary earners across the full population of families with dependent children. For that reason, it is plausible that the proposed reform would prove beneficial for a much broader set of Australian families than the one analyzed in this paper.

A number of improvements and extensions are of course possible. First, our analysis would benefit from exploiting the panel structure of the HILDA dataset, controlling for time-stable individual effects. Secondly, although we consider the current method of treating unobserved heterogeneity to perform well, it could be worthwhile to assess the stability of our results by using alternative ways of controlling for unobserved heterogeneity, such as the random coefficient mixed logit model, or the approaches utilizing Bayesian nonparametric methods. Thirdly, with sufficient information on disposable income of single mothers, the model could be extended to account also for this demographic. It is likely that we would then observe even stronger dependence of market work on child care, since single mothers are more restricted in their informal child care choices. An interesting extension would be to model fertility as a choice, since the family tax benefits are directly influencing the costs of childbearing. This would be, however, difficult in the current static setup of our model.