1 Introduction

An online microloan is a financial service that offers small, unsecured loans between lenders and borrowers through online platforms without the intermediation of financial institutions [2, 9, 58]. Driven by the Internet and mobile technology, the sharing economy has arisen as a new market model that provides peer-to-peer (P2P) sharing of access to goods and services [59]. Microlending, as a typical form of online microloan in the sharing economy, has received extensive attention from governments, industry, investors, and researchers [6], and the corresponding worldwide growth in the number of loans and investors has been remarkable [43]. For example, China had approximately 6,000 microlending platforms with nearly 2.4 million borrowers and 4.2 million investors at the end of 2016.Footnote 1

The risk of default is often higher in online microloan platforms than in conventional credit markets where borrowers could provide collateral for loans [19]. The information asymmetry is more severe in microlending platforms because borrower and lender identities are anonymous to each other [18]. Consequently, risk and credit assessment algorithms that work with limited personal information about borrowers are crucial to both platform managers and lenders. Because microloan borrowers frequently lack a credit report from formal institutions, most research has focused on the antecedents to their default risk. Empirical studies of some platforms (e.g., Prosper, Lending Club, and PPDAI.com) have found that a borrower’s loan characteristics, individual characteristics, and social capital are useful indicators of the borrower’s default behaviors [34, 51]. A platform can collect most of these antecedents during the loan application process. In addition, “soft information,” such as a borrower’s social network [34, 44, 56] and narratives provided by borrowers (e.g., [13, 28]), has been shown to be valuable in risk assessment.

A borrower’s risk manifests first as delinquency (i.e., not repaying an installment on time) and then as the ultimate default behavior (Fig. 1). Based on their repayment behavior, borrowers can be categorized into three groups: normal, delinquent, and default borrowers. Normal borrowers make timely repayments and are charged only the interest rate. In contrast, delinquent borrowers are charged a fine interest rate, in addition to the interest rate. If the delinquent borrowers fail to make required loan payments for an extended period of time, they then become default borrowers.

Fig. 1 Two-stage repayment process

Although the repayment process comprises two stages, involving both delinquency and default, extant studies have mainly focused on a borrower’s default risk at the loan level (e.g., [31, 39, 51]). Although some studies have examined delinquency, they have used it as an indicator of default (e.g., [24]). Other studies have treated delinquency and default as independent borrower outcome behaviors without considering the interrelationship between them (e.g., [12, 17, 22, 52]). Therefore, the literature lacks an integrated model that considers both stages and the transition between them.

Distinguishing borrowers’ behavior in the two stages is crucial for at least three reasons. First, the psychological motives underlying delinquency and default differ [7]. Specifically, delinquency is very likely to arise when borrowers inadvertently fail to make timely payments, for example by missing the repayment due date [21]. Default, however, stems from a variety of reasons, including intentional causes (such as financial fraud) and unintentional causes (such as low financial capability and accidental events) [37, 55]. Consequently, the risk antecedents may have different effects in predicting delinquency and default behavior [7]. Understanding the different antecedents to delinquency and default can offer rich insights for scholars and practitioners to make respective predictions at the loan-approval stage. To assess a borrower’s risk at each stage, platforms must identify the factors unique to delinquency or default. Second, once delinquency occurs, a platform may execute corresponding interventions such as a debt collection notice to prevent delinquency from turning into default [60]. An integrated two-stage model could help analyze the effect of interventions conditional on the outcome of the first stage. Methodologically, an integrated model also avoids sample selection bias when analyzing default risk.Footnote 2 Finally, because a platform may impose a fine on borrowers for delinquent payments, it may garner a higher return after the borrower repays the installment and fine. Therefore, distinguishing delinquency and default enables a platform to more effectively assess the value and risk of loans.

This study proposes a two-stage conditional probit model for exploring the effect of antecedents on a borrower’s delinquency and default behavior. Specifically, we address the following questions: (1) What are the distinct effects of risk antecedents on a borrower’s delinquency and default behavior in microloans? (2) How do the effects of risk antecedents and debt collection vary in different borrower segments? (3) What is the economic value of debt collection by a platform?

Risk occurs at the installment level. Studies have often analyzed the risk of an entire loan instead of each installment. According to our review of the literature, no research has been conducted on borrower risk at the installment level. That is, the literature fails to take advantage of the dynamic information of borrowers’ monthly repayment. Delinquency and default may occur at the installment level. For instance, a borrower can be delinquent on an installment and finally default on it, or a borrower could be delinquent on some installments but finally pay them back with no loan-level default. Installment-level analyses entail using a borrower’s previous repayment records to assess the risk of an upcoming installment, resulting in a more fine-grained risk assessment. Moreover, managerial interventions such as a debt collection notice often occur at the installment level. Their effectiveness can be analyzed only at the installment level.

This study modeled borrowers’ delinquency and default behaviors by applying a novel two-stage conditional probit model at the installment level. A latent class model was employed to capture the heterogeneity of borrowers. Our study proposes a set of new risk antecedents such as previous repayment behaviors and the installment due date in a month. These antecedents have not been explored by previous studies.

Our results reveal that risk antecedents have a significant effect on delinquency, although the effects vary by segment, and that the antecedents to delinquency and default risk are different. Moreover, debt collection helps recover debt, but the effect varies by segment. Our results provide guidance for classifying customers into various segments and taking corresponding managerial interventions. Our results also provide a model for firms to assess the marginal effect of debt collection in the microloan context. Our findings reveal that compared with a no-collection scenario, the default rate would decrease to 16.30% if the platform conducted debt collection for all the delinquent installments. If a platform collects a delinquent installment, its subsequent default rate would decrease by 0.1571. The marginal revenue of collecting an installment is ¥282.41 (approximately US$43).

The remainder of this paper is organized as follows. Section 2 reviews the literature on default risk in the microloan context and on debt collection. Section 3 presents the research data and econometric model. Section 4 reports the empirical results. Finally, Sect. 5 offers a summary of the study with its limitations and contributions.

2 Literature review

2.1 Antecedents to delinquency and default risk in microloans

The first group of antecedents to default risk comprises loan characteristics such as loan purpose, loan size, and interest rate. Serrano-Cinca et al. [51] defined 14 types of loan purposes, such as debt consolidation, wedding, and starting a small business. Among these loan purposes, starting a small business has the highest risk and a wedding has the lowest risk. The relationship between risk and loan size has been extensively discussed [42]. A study argued that risk grows when the loan size is larger [33], whereas another found no such relationship [42]. Nevertheless, another study reported a negative relationship [29]. Moreover, a study found a positive correlation between a loan’s interest rate and default risk [57].

The second group of antecedents to default risk comprises borrower characteristics including demographics and financial information. Chen et al. [8] reported a lower default risk among female borrowers, compared with male borrowers. Yao and Sui [57] found a significant positive correlation between the age of borrowers and their default behavior. Other research has demonstrated that financial information such as credit score, income, employment status, and house ownership affects the probability of default [47, 51]. Furthermore, Ravina [49] revealed that borrowers perceived to be more esthetically pleasing were 1.41% more likely to have their loan application approved and paid 81 basis points less; notably, lenders may also be more willing to tolerate dishonest behavior from these borrowers [30].

The third group of antecedents to default risk comprises the social capital of a borrower. Social capital refers to the borrower’s interpersonal connections such as friends, colleagues, and group affiliations [48]. In the context of a microloan, social capital includes group resources and personal relationships. Some platforms (e.g., Prosper.com) encourage borrowers and lenders to join an online group with a group leader. The group leader may have private information about the borrowers and can thus select appropriate borrowers for lenders and influence them to repay their loans. Berger and Gleisner [3, 4] found that such group membership reduces the default rate. Lin et al. [34] revealed that when a borrower’s friend is the lender, the borrower’s odds of default decrease by 9%, on average, and the odds of default decrease further if the friend joins the bidding. However, the number of general friends has no effect. Ge et al. [20] found that a borrower’s self-disclosure of their social media account and social media activities predicts a lower default probability in microlending.

Although many studies have focused on default risk, only a few quantitative studies have been conducted on delinquent behavior in microloan contexts. Ravina [49] reported that borrowers perceived to be more esthetically pleasing are three times more likely to be delinquent than those perceived to have an average appearance. She also identified a borrower’s employment status as an antecedent to delinquency risk. Agarwal and Liu [1] found that macroeconomic factors such as county unemployment rate significantly influence delinquency in credit card loans. In a field experiment in India, Field and Pande [17] discovered that the type of repayment schedule (i.e., weekly or monthly repayment) has no effect on delinquency or default, but a more flexible schedule can significantly lower transaction costs without increasing default. Getter [22] showed that the size of a household’s payment burden (i.e., monthly payments relative to monthly income) has a nonsignificant effect on delinquency and a very small effect on default. Although these studies have considered both delinquency and default, the two behaviors have been treated as independent outcomes. Recently, Zhou et al. [60] partitioned consumer late repayment period into grace stage, delinquency stage, and default stage, and built prediction models by incorporating these three stages simultaneously as output variables. However, their data-driven study did not consider distinct effects of factors for different stages.

2.2 Debt collection

The literature on debt collection focuses on assessing the effectiveness of collection interventions on borrowers’ repayments. Debt collection is a common managerial intervention used by microloan platforms [25, 45] and can be costly. Lenders are willing to bear the cost only when the cost of collection is lower than its return [14].

The most common debt collection practice is offering economic incentives to borrowers. Cadena and Schoar [5] demonstrated that using waivers of interest and delayed fees has a positive effect on promoting a borrower’s repayment behavior. Notices of outstanding and overdue installments are another common practice. Salisbury [50] indicated that a minimum payment warning printed on consumers’ monthly account statement has no effect on subsequent repayments, whereas Stewart [53] and Navarro-Martinez et al. [45] found a negative effect. A third collection practice is to impose social pressure on delinquents [11]. Specifically, with a borrower’s consent, platforms can disclose a borrower’s delinquency or default behavior to their family members or friends. Perez-Truglia and Troiano [46] found that shame penalties that entail publishing the names of those with tax delinquencies on a social website are effective. However, this effect is only significant for individuals with smaller debts. More relevantly, Luo et al. [40] conducted a field experiment and discovered that the combination of economic incentives and social pressure penalties decreases the likelihood of loan repayment, whereas the use of each tactic separately has the opposite effect. Researchers have called for more precise and effective collection strategies [14]. Recently, Lu et al. [37] examined the impact of social notification on default and showed that notifying social contacts could reduce the default rates. Du et al. [15] revealed that embedding the lender’s positive expectations in the reminder messages would increase loan repayment rates in both the short and long run, whereas emphasizing adverse consequences exerted pronounced effects on encouraging loan repayment only in the short run.

Although the effectiveness of debt collection intervention methods has been explored, the marginal economic value of such managerial interventions has received less attention, as have the contingencies for effective collection. Because debt collection is costly, quantifying its marginal benefit can help a loan provider determine the conditions (e.g., which type of delinquents) for applying such methods.

In summary, the literature has two gaps: It lacks a two-stage model for assessing delinquency and default risk simultaneously, and it also lacks an installment-level analysis factoring both individual repayment behavior and managerial intervention. Our work differs from the literature in several crucial aspects, as summarized in Table 1.

Table 1 Comparison of the literature and this study

3 Research data and model

3.1 Research data

We derived our data from a startup microloan platform in China that targets college students as its customers. This study focused only on loans with a fixed size of ¥3,000 (approximately US$440) with an installment period of 6 months. The platform adopts a decreasing interest rate policy with a fixed mortgage; that is, a borrower is scheduled to repay ¥500 equally as principal capital every month,Footnote 3 and the interest decreases by month in the loan period. Usually, the interest rate ranges between 9 and 18%. When borrowers apply for a loan, they provide their personal information and specify the purpose of the loan. The platform then reviews and evaluates the provided information and decides whether to approve or deny the loan application. Borrowers mainly use the microloans to meet temporary financial needs, such as additional capital for small businesses, occasional shopping needs, education spending, and medical expenses. The debt collection methods of the platform include overdue notices and, if a borrower fails to respond to the notice, shame penalties.Footnote 4 Specifically, the platform sends reminder texts to borrowers and applies shame penalties by informing delinquents’ social contacts, such as family members and friends. If the loan turns into default, the platform submits the default records to a centralized shared blacklist system maintained by a consortium of microloan companies and initiates legal action. The platform generates revenue through a fixed commission fee from each borrower, interest paid by non-default borrowers, and the penalties imposed on those who fail to make timely payments (i.e., delinquency). The cost of the platform includes the loss of principal capital (i.e., exposure at default).
Note that the focal microloan platform followed common industry practice (loan size, interest rates, and repayment performance) and market regulation during our observation period. Thus, it is representative of the whole microloan platform population in China. Moreover, we believe that our findings can also be generalized to most other contexts because the focal microloan platform shares similarities in loan type, loan purposes, and revenue structures with the microloan business in other countries.Footnote 5
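The repayment schedule described above (equal ¥500 principal repayments over 6 months, with interest that declines month by month) can be illustrated with a short sketch. It assumes that interest is charged on the outstanding principal at a constant monthly rate; the function name `repayment_schedule` and the rate of 1% per month are illustrative assumptions, not values taken from the platform.

```python
def repayment_schedule(principal=3000.0, months=6, monthly_rate=0.01):
    """Equal-principal schedule with interest on the outstanding balance.

    monthly_rate is a hypothetical placeholder; the source only states that
    the interest rate usually ranges between 9 and 18%.
    """
    principal_part = principal / months  # 500 per month for a 3,000 loan
    balance = principal
    schedule = []
    for month in range(1, months + 1):
        interest = round(balance * monthly_rate, 2)
        schedule.append((month, principal_part, interest))
        balance -= principal_part  # interest base shrinks every month
    return schedule


if __name__ == "__main__":
    for month, principal_part, interest in repayment_schedule():
        print(month, principal_part, interest)
```

Under these assumptions the interest component falls each month because the interest base shrinks by ¥500 after every installment.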

Our dataset covered the period from August 1, 2014, to December 31, 2015, including loan attributes, borrower characteristics, monthly repayment records, and debt collection records. The dataset had two parts: The first part comprised the loans issued from August 2014 to March 2015, and the second part comprised the monthly repayment records from September 2014 to December 2015. Because all the loans had a 6-month span, the last repayment of the sample was due on September 30, 2015. If an installment was not collected 3 months after the due date, a default was confirmed.Footnote 6 The dataset covered the comprehensive repayment process of every installment, and it had a sufficient time span to observe the ultimate default of the installments and loans.
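The rule for confirming a default can be made concrete with a minimal sketch. It assumes that "3 months" means three calendar months after the due date and that the observation window extends past that cutoff; the helper names `add_months` and `stage_outcomes` are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def stage_outcomes(due, paid):
    """Return (delinquent, default) indicators for one installment.

    paid is the repayment date, or None if the installment was never repaid.
    Assumption: default is confirmed if nothing is collected within three
    calendar months of the due date.
    """
    cutoff = add_months(due, 3)
    delinquent = paid is None or paid > due
    default = paid is None or paid > cutoff
    return int(delinquent), int(default)


if __name__ == "__main__":
    due = date(2015, 9, 30)
    print(stage_outcomes(due, date(2015, 10, 15)))  # late but repaid before the cutoff
```

Every defaulted installment is also delinquent under this coding, which matches the two-stage view in which default can occur only after delinquency.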

The dataset contained 6097 unique borrowers with 6207 loans. The total number of installments was 37,242 (= 6207 × 6). Figure 2 presents the delinquency and default rate of installments in the 6-month loan period. Specifically, the first installment had the lowest default and delinquency rate. The delinquency rates were higher for the middle installments (i.e., months 2 through 5), and the default rate increased by month. Figure 2 also suggests that delinquency and default behavior have distinct trends.

Fig. 2 Monthly delinquency and default rates in the sample

3.2 Delinquency model for the first stage

Our econometric model is based on a conditional probit model.Footnote 7 Let \(Y_{1i,t}\) denote whether borrower i was delinquent at t, where subscript 1 refers to the first stage. Let the corresponding utility be \(Y_{1i,t}^{*}\). A borrower’s delinquency behavior is specified in Eq. (1).

$$Y_{1i,t} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {Y_{1i,t}^{*} > 0,} \hfill \\ 0 \hfill & {Y_{1i,t}^{*} \le 0.} \hfill \\ \end{array} } \right.$$
(1)

A dynamic binary panel choice model was used to specify the delinquency model. Let \(X_{1i,t}\) be the covariate to denote the antecedents to the borrowers’ risk, including previous repayment behavior and other installment-specific variables. \(X_{1i,t}\) represents the time-variant variables. As suggested by Gross and Souleles [24], \(Y_{1i,t}\) is affected by both \(Y_{1i,(t - 1)}\) and \(X_{1i,t}\).

For the previous repayment behavior, we defined the accumulated number of delinquencies in a loan before month t (ANDLY), the accumulated number of defaults in a loan before month t (ANDFT), the average days of delinquency in a loan before month t (ADD), and the average days of advance repayment before month t (ADRA). At t = 1, there is no previous repayment behavior for borrowers. To manage the potential problem of a dynamic panel bias,Footnote 8 following Heckman’s [26] approach, we specified a separate equation for the initial condition when t = 1. Specifically, we used \(M_{1i}\) to denote an observable instrumental variable to represent a borrower’s previous delinquency behavior before t = 1. \(M_{1i}\) is the average delinquency rate of other borrowers who applied for a loan on the same day as borrower i.
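As an illustration of how these history variables could be built from installment records, the sketch below computes ANDLY, ANDFT, ADD, and ADRA for month t and a leave-one-out peer rate in the spirit of the same-day instrument. The record layout (`days_late`, `defaulted`), the choice to average over all previous installments, and both function names are assumptions made for this example rather than the definitions used in the estimation.

```python
def history_features(records, t):
    """Repayment-history variables for month t (1-based) of one loan.

    records: list of dicts in month order with keys
      'days_late' (positive = late, negative = repaid early, 0 = on time)
      'defaulted' (bool)
    Averages are taken over all previous installments (an assumption).
    """
    past = records[: t - 1]
    if not past:  # t = 1: no previous repayment behavior
        return {"ANDLY": 0, "ANDFT": 0, "ADD": 0.0, "ADRA": 0.0}
    return {
        "ANDLY": sum(1 for r in past if r["days_late"] > 0),
        "ANDFT": sum(1 for r in past if r["defaulted"]),
        "ADD": sum(max(r["days_late"], 0) for r in past) / len(past),
        "ADRA": sum(max(-r["days_late"], 0) for r in past) / len(past),
    }


def leave_one_out_rate(outcome, application_day, borrower_id):
    """Mean outcome of other borrowers who applied on the same day.

    Returns None when the borrower has no same-day peers.
    """
    day = application_day[borrower_id]
    peers = [
        y for b, y in outcome.items()
        if b != borrower_id and application_day[b] == day
    ]
    return sum(peers) / len(peers) if peers else None
```

Excluding borrower i from the peer average keeps the instrument from mechanically containing the borrower's own outcome.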

For the installment-related attributes, we defined a set of date-specific attributes based on whether the installment due date is a weekend (Weekend), at the beginning or end (i.e., the first or last 3 days) of a month (B/E of month), during the summer or winter vacation (S/W vacation, based on China’s school calendar), or on national holidays (Holiday, as announced by China’s government). The monthly interest rate (MIR) and fixed month effects were controlled. Equation (2) specifies the dynamic binary panel choice model.
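The date-specific attributes can be derived directly from the due date, as in the following sketch. The Weekend and B/E of month flags follow the definitions in the text; the vacation ranges and holiday dates must be supplied by the caller because the school calendar and official holiday list are not reproduced here, and the function name `due_date_flags` is hypothetical.

```python
import calendar
from datetime import date


def due_date_flags(due, vacations=(), holidays=frozenset()):
    """Binary date attributes for an installment due date.

    vacations: iterable of (start, end) date pairs, inclusive (caller-supplied)
    holidays:  set of dates treated as national holidays (caller-supplied)
    """
    last_day = calendar.monthrange(due.year, due.month)[1]
    return {
        "Weekend": int(due.weekday() >= 5),  # Saturday = 5, Sunday = 6
        # first or last 3 days of the month
        "B/E of month": int(due.day <= 3 or due.day > last_day - 3),
        "S/W vacation": int(any(start <= due <= end for start, end in vacations)),
        "Holiday": int(due in holidays),
    }


if __name__ == "__main__":
    print(due_date_flags(date(2015, 8, 1)))
```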

$$Y_{1i,t}^{*} = \left\{ {\begin{array}{*{20}l} {\lambda X_{1i,t} + \tau M_{1i} + \eta_{i,t} } \hfill & {t = 1;} \hfill \\ {\beta X_{1i,t} + \gamma Y_{1i,t - 1} + \eta_{i,t} } \hfill & {t = 2,...,T_{1i} .} \hfill \\ \end{array} } \right.$$
(2)

where λ, τ, β, and γ are coefficients and η is the error term. We assumed a probit specification for \(Y_{1i,t}\); that is, η follows a standard normal distribution:

$$\begin{aligned} \Pr (Y_{1i,t} = 1|X_{1i,t} ,M_{1i} ) = & \Pr [\eta_{i,1} > - (\lambda X_{1i,t} + \tau M_{1i} )|X_{1i,t} ,M_{1i} ] = \Phi (\lambda X_{1i,t} + \tau M_{1i} ),\quad t = 1; \\ \Pr (Y_{1i,t} = 1|X_{1i,t} ,Y_{1i,t - 1} ) = & \Pr [\eta_{i,t} > - (\beta X_{1i,t} + \gamma Y_{1i,t - 1} )|X_{1i,t} ,Y_{1i,t - 1} ] = \Phi (\beta X_{1i,t} + \gamma Y_{1i,t - 1} ),\quad t = 2,...,T_{1i} . \\ \end{aligned}$$
(3)
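A minimal sketch of Eq. (3) follows: the delinquency probability is the standard normal CDF evaluated at the linear index. The function names and the argument layout are illustrative assumptions; for t = 1 the extra regressor is the instrument and its coefficient is τ, and for t ≥ 2 it is the lagged delinquency indicator with coefficient γ.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def delinquency_prob(x, coefs, extra, extra_coef):
    """Pr(Y1 = 1) = Phi(coefs . x + extra_coef * extra), as in Eq. (3).

    t = 1 : extra is the instrument M_1i, extra_coef is tau
    t >= 2: extra is Y_1i,t-1,           extra_coef is gamma
    """
    index = sum(c * v for c, v in zip(coefs, x)) + extra_coef * extra
    return std_normal_cdf(index)
```

A zero index gives a probability of 0.5, and larger positive indices push the probability toward 1.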

3.3 Unconditional default model for the second stage

\(Y_{2i,t}\) denotes a borrower’s default behavior based on his or her utility in the second stage [Eq. (4)].

$$Y_{2i,t} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {Y_{2i,t}^{*} > 0,} \hfill \\ 0 \hfill & {Y_{2i,t}^{*} \le 0.} \hfill \\ \end{array} } \right.$$
(4)

The covariate \(X_{2i,t}\) includes the previous repayment variables, the installment-specific attributes, and a borrower’s time-invariant attributes. Additionally, we defined a dummy variable, Collection, which indicates whether the platform took any collection actions for an installment.Footnote 9 Collection is a variable unique to the second stage and represents managerial intervention. Similar to the first stage, we modeled the initial condition at t = 1 with a separate equation. We used \(N_{2i}\) as an observed instrumental variable, which was the average default rate of other borrowers who applied for loans on the same day as borrower i.

$$Y_{2i,t}^{*} = \left\{ {\begin{array}{*{20}l} {\theta X_{2i,t} + \varphi N_{2i} + \varepsilon_{i,t} } \hfill & {t = 1,} \hfill \\ {\delta X_{2i,t} + \phi Y_{2i,t - 1} + \varepsilon_{i,t} } \hfill & {t = 2,...,T_{2i} .} \hfill \\ \end{array} } \right.$$
(5)

where θ, \(\varphi\), δ, and ϕ are coefficients and ε is the error term. We assumed a probit specification for \(Y_{2i,t}\):

$$\begin{aligned} \Pr (Y_{2i,t} = 1) = & \left\{ {\begin{array}{*{20}l} {\Pr [\varepsilon_{i,t} > - (\theta X_{2i,t} + \varphi N_{2i} )] = \Phi (\theta X_{2i,t} + \varphi N_{2i} )} \hfill & {t = 1;} \hfill \\ {\Pr [\varepsilon_{i,t} > - (\delta X_{2i,t} + \phi Y_{2i,t - 1} )] = \Phi (\delta X_{2i,t} + \phi Y_{2i,t - 1} )} \hfill & {t = 2,...,T_{2i} .} \hfill \\ \end{array} } \right. \\ \Pr (Y_{2i,t} = 0) = & \left\{ {\begin{array}{*{20}l} {\Pr [\varepsilon_{i,t} \le - (\theta X_{2i,t} + \varphi N_{2i} )] = 1 - \Phi (\theta X_{2i,t} + \varphi N_{2i} )} \hfill & {t = 1;} \hfill \\ {\Pr [\varepsilon_{i,t} \le - (\delta X_{2i,t} + \phi Y_{2i,t - 1} )] = 1 - \Phi (\delta X_{2i,t} + \phi Y_{2i,t - 1} )} \hfill & {t = 2,...,T_{2i} .} \hfill \\ \end{array} } \right. \\ \end{aligned}$$
(6)

3.4 Conditional default model for the second stage

The behavior of default is conditional on delinquency. That is, the probit function of \(Y_{2i,t}\) is conditional on \(X_{1i,t}\), \(M_{1i}\), \(Y_{1i,t-1}\), and \(Y_{1i,t} = 1\). Assuming \(\varepsilon_{i,t}\) and \(\eta_{i,t}\) are independent of X with a zero-mean normal distribution, we could derive the conditional probability as follows, where \(\rho\) is the correlation between \(\varepsilon\) and \(\eta\):

$$\begin{aligned} \Pr (Y_{2i,t} = 1|Y_{1i,t} = 1,X_{1i,t} ,M_{1i} ) = & E[\Pr (Y_{2i,t} = 1|\eta ,X_{1i,t} ,M_{1i} )|Y_{1i,t} = 1,X_{1i,t} ,M_{1i} ] \\ = & E\left\{ {\Phi [(\theta X_{2i,t} + \varphi N_{2i} - \rho \eta )/(1 - \rho^{2} )^{1/2} ]|Y_{1i,t} = 1,X_{1i,t} ,M_{1i} } \right\} \\ = & \frac{1}{{\Phi (\lambda X_{1i,t} + \tau M_{1i} )}}\int_{{ - (\lambda X_{1i,t} + \tau M_{1i} )}}^{\infty } {\Phi [(\theta X_{2i,t} + \varphi N_{2i} - \rho \eta )/(1 - \rho^{2} )^{1/2} ]\phi (\eta )d\eta } ; \\ \Pr (Y_{2i,t} = 1|Y_{1i,t} = 1,X_{1i,t} ,Y_{1i,t - 1} ) = & E[\Pr (Y_{2i,t} = 1|\eta ,X_{1i,t} ,Y_{1i,t - 1} )|Y_{1i,t} = 1,X_{1i,t} ,Y_{1i,t - 1} ] \\ = & E\left\{ {\Phi [(\delta X_{2i,t} + \phi Y_{2i,t - 1} - \rho \eta )/(1 - \rho^{2} )^{1/2} ]|Y_{1i,t} = 1,X_{1i,t} ,Y_{1i,t - 1} } \right\} \\ = & \frac{1}{{\Phi (\beta X_{1i,t} + \gamma Y_{1i,t - 1} )}}\int_{{ - (\beta X_{1i,t} + \gamma Y_{1i,t - 1} )}}^{\infty } {\Phi [(\delta X_{2i,t} + \phi Y_{2i,t - 1} - \rho \eta )/(1 - \rho^{2} )^{1/2} ]\phi (\eta )d\eta } . \\ \end{aligned}$$
(7)

Similarly, we could infer the conditional probability of \(Y_{2i,t} = 0\) as follows:

$$\begin{aligned} \Pr (Y_{2i,t} = 0|Y_{1i,t} = 1,X_{1i,t} ,M_{1i} ) = & \frac{1}{{\Phi (\lambda X_{1i,t} + \tau M_{1i} )}}\int_{{ - (\lambda X_{1i,t} + \tau M_{1i} )}}^{\infty } {\{ 1 - \Phi [(\theta X_{2i,t} + \varphi N_{2i} - \rho \eta )/(1 - \rho^{2} )^{1/2} ]\} \phi (\eta )d\eta } ; \\ \Pr (Y_{2i,t} = 0|Y_{1i,t} = 1,X_{1i,t} ,Y_{1i,t - 1} ) = & \frac{1}{{\Phi (\beta X_{1i,t} + \gamma Y_{1i,t - 1} )}}\int_{{ - (\beta X_{1i,t} + \gamma Y_{1i,t - 1} )}}^{\infty } {\{ 1 - \Phi [(\delta X_{2i,t} + \phi Y_{2i,t - 1} - \rho \eta )/(1 - \rho^{2} )^{1/2} ]\} \phi (\eta )d\eta } . \\ \end{aligned}$$
(8)
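The integrals in Eqs. (7) and (8) have no closed form, so in practice they are evaluated numerically. The sketch below does this with composite Simpson's rule, taking the expression exactly as printed (including the sign of the ρη term); the argument names `a` (first-stage index), `b` (second-stage index), the truncation of the upper limit at a finite value, and the number of grid intervals are assumptions of this example, not details of the original estimation code.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def conditional_default_prob(a, b, rho, n=4000):
    """Pr(Y2 = 1 | Y1 = 1) as printed in Eq. (7), by Simpson's rule.

    a   : first-stage index  (e.g. lambda*X1 + tau*M)
    b   : second-stage index (e.g. theta*X2 + varphi*N)
    rho : correlation between the two errors, |rho| < 1
    n   : even number of Simpson intervals
    """
    lower = -a
    upper = max(lower, 0.0) + 8.0  # normal density is negligible beyond this
    h = (upper - lower) / n
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(eta):
        return std_normal_cdf((b - rho * eta) / scale) * std_normal_pdf(eta)

    total = integrand(lower) + integrand(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return (h / 3.0) * total / std_normal_cdf(a)
```

The probability in Eq. (8) is one minus this value. When ρ = 0 the two stages are independent and the expression collapses to the unconditional probit probability Φ(b), which is a convenient sanity check.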

Hence, the probability of the observed behavior of borrower i in a loan is expressed as follows:

$$\begin{aligned} f_{it} = & \left[ {\Pr (Y_{2i,1} = 1|Y_{1i,1} = 1)^{{Y_{2i,1} = 1}} \Pr (Y_{2i,1} = 0|Y_{1i,1} = 1)^{{Y_{2i,1} = 0}} } \right]^{{Y_{1i,1} = 1}} \left[ {1 - \Phi (\lambda X_{1i,1} + \tau M_{1i} )} \right]^{{Y_{1i,1} = 0}} \\ & \times \prod\limits_{t = 2}^{{T_{i} }} {\left[ {\Pr (Y_{2i,t} = 1|Y_{1i,t} = 1)^{{Y_{2i,t} = 1}} \Pr (Y_{2i,t} = 0|Y_{1i,t} = 1)^{{Y_{2i,t} = 0}} } \right]^{{Y_{1i,t} = 1}} \left[ {1 - \Phi (\beta X_{1i,t} + \gamma Y_{1i,t - 1} )} \right]^{{Y_{1i,t} = 0}} } . \\ \end{aligned}$$
(9)

3.5 Model borrower heterogeneity

Unobserved borrower characteristics are highly likely to affect the behavior of delinquency and default. To account for such unobserved borrower heterogeneity, we assumed that the observed behavior of borrowers follows a mixture of distributions. Let \(f_{its}\) indicate the probability of observing the behavior of i in month t if i belongs to segment s, and let \(m_{is}\) indicate the logit form of the probability of i belonging to segment s.

As suggested by Heckman and Singer [27], a latent class model can be specified with a borrower’s segment membership, dependent on time-invariant borrower-specific financial information and demographic variables (Demo), which include age, gender, loan experience, education level, monthly income, average hometown GDP, house ownership, and distance from home. Equations (10) and (11) define \(f_{it}\) and \(m_{is}\) with regard to borrower heterogeneity.

$$f_{it} = \sum\limits_{s = 1}^{S} {m_{is} } f_{its} (\beta_{s} ,\lambda_{s} ,\tau_{s} ,\gamma_{s} ,\theta_{s} ,\delta_{s} ,\phi_{s} ,\varphi_{s} ,\mu_{s} ,\rho ),$$
(10)
$$m_{is} = \frac{{\exp (\mu_{is} + \mu_{is}^{\prime } Demo_{i} )}}{{\sum\nolimits_{k = 1}^{S} {\exp (\mu_{ik} + \mu_{ik}^{\prime } Demo_{i} )} }}.$$
(11)
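Equations (10) and (11) can be sketched in a few lines: segment membership is a multinomial logit (softmax) of the demographic variables, and the mixture likelihood is the membership-weighted sum of the segment-specific likelihoods. The function names and the argument layout below are illustrative assumptions.

```python
import math


def segment_memberships(intercepts, slopes, demo):
    """Multinomial-logit membership probabilities, one per segment (Eq. 11).

    intercepts: list of S intercepts
    slopes:     list of S coefficient vectors, each the length of demo
    demo:       the borrower's demographic covariates
    """
    scores = [
        c + sum(w * x for w, x in zip(ws, demo))
        for c, ws in zip(intercepts, slopes)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_likelihood(memberships, segment_likelihoods):
    """Membership-weighted sum of segment-specific likelihoods (Eq. 10)."""
    return sum(m * f for m, f in zip(memberships, segment_likelihoods))
```

In estimation, one segment's membership coefficients are usually fixed (for example at zero) for identification, because adding a constant to every segment's score leaves the probabilities unchanged.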

With these equations, we could define the log-likelihood function as follows:

$$L(\beta ,\lambda ,\tau ,\gamma ,\theta ,\delta ,\phi ,\varphi ,\mu ,\rho ) = \sum\limits_{i = 1}^{n} {\ln f_{it} } .$$
(12)

4 Estimation results

Section 3 proposed the two-stage econometric model. In this section, we employ this model to estimate the antecedents of delinquency and default behavior. Specifically, Sect. 4.1 provides an overview of the variables used in our analysis, along with their descriptive statistics. Section 4.2 presents the estimation results for the two-stage model, followed by an analysis of borrower heterogeneity in Sect. 4.3. Finally, Sect. 4.4 evaluates the marginal effect of debt collection on both default rate and revenue.

4.1 Variables and descriptive statistics

Table 2 presents the variables, their definitions, and the descriptive statistics. In the dataset, the overall delinquency rate was 0.68 and the default rate was 0.28. The platform took partial collection action on 17.7% of delinquent installments,Footnote 10 constituting approximately 12% (= 17.7% × 0.68) of all installments. Although the platform should have taken collection actions for all delinquent installments, the incomplete intervention offered a unique opportunity for us to assess the effect of debt collection.

Table 2 Variable definition and summary statistics (N = 37,242)

Most borrowers had an income level less than ¥3,000 per month and were from less-developed regions with a hometown GDP per capita of approximately ¥53,000. The borrowers’ average education level was low (Mean = 1.52; i.e., many borrowers had only completed junior college or college education), and only 18% of the borrowers owned their places of residence. Regarding the time distribution of loans, application dates were distributed uniformly in the sample period. For instance, the loans applied for on the weekends were proportional to the number of weekend days in the entire period.

4.2 Estimation results

We estimated the two-stage model with maximum likelihood estimation (MLE), using the quasi-Newton method to maximize the log-likelihood function. Table 3 shows the results of the fitted models. The two-stage model had lower Bayesian information criterion (BIC) and Akaike information criterion (AIC) scores than the one-stage model, suggesting that the two-stage model fit the data better. To investigate borrower heterogeneity, we also fitted a latent class model with different numbers of segments, and the BIC and AIC scores were the lowest when the number of segments was 3. Comparing the BIC and AIC of the homogeneous fixed-effect model and the heterogeneous model revealed that the model with heterogeneity was better.
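The model comparison above relies on the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below applies them to choose the number of segments; the helper `best_by_bic` and the numbers in the usage example are made up for illustration and are not the values reported in Table 3.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def best_by_bic(fits, n_obs):
    """fits maps a label (e.g. number of segments) to (log_likelihood, n_params)."""
    return min(fits, key=lambda label: bic(fits[label][0], fits[label][1], n_obs))


if __name__ == "__main__":
    # Hypothetical log-likelihoods and parameter counts, for illustration only.
    toy_fits = {1: (-100.0, 2), 2: (-90.0, 4), 3: (-89.0, 6)}
    print(best_by_bic(toy_fits, n_obs=100))
```

BIC penalizes additional parameters more heavily than AIC whenever n exceeds about 7 (ln n > 2), so agreement between the two criteria is a somewhat stronger signal than either alone.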

Table 3 Comparison of model specifications

Table 4 presents the results of the fitted two-stage models, assuming either a homogeneous population or a heterogeneous population of three segments. Panel a (parameters of the delinquency model) and Panel b (parameters of the conditional default model) correspond to model specifications for the two-stage repayment behaviors (i.e., delinquency and default). In addition, as introduced in Sect. 3.5 (Eq. 11), we applied borrower heterogeneity analysis and reported the results in Panel c. The table also differentiates models for t = 1 and t > 1, in addition to presenting the effect of risk antecedents on delinquency (Panel a) and default (Panel b). The ρ value for the entire model was positive and significant, indicating that the first- and second-stage choices were highly dependent, which justifies the two-stage specification.

Table 4 Estimation results

A few salient patterns appeared in the results when borrowers were assumed to be heterogeneous and when t > 1. First, in the first stage (i.e., Panel a), all our proposed installment repayment behavior variables were significant predictors of delinquent behavior. Among them, ADRA was an indicator of low risk, and it was shown to reduce delinquency risk. The rest of the variables (i.e., ANDLY, ANDFT, ADD) were indicators of high risk, and they were revealed to increase delinquency risk. These effects were consistent across all segments, suggesting that these variables are critical new behavior variables for predicting delinquent behavior.

Furthermore, borrowers’ delinquency behavior in the previous month, \(Y_{1i,t-1}\), was a significant predictor of the delinquency risk in the following month. Segments 1, 2, and 3 represented low-, medium-, and high-risk borrowers, respectively. For segments 1 and 3, those who had been delinquent in the previous month tended to be delinquent the following month, suggesting a carry-over effect over months. The negative coefficient of segment 2 suggests that borrowers tended to swing between delinquent and nondelinquent behaviors.

Second, the date-related variables (Weekend, B/E of month, S/W vacation, Holiday) were also significant antecedents to delinquency risk. However, their effects varied across segments. In particular, segments 1 and 2 were less likely to be delinquent at the beginning and end of a month and during vacations and holidays. Personal income usually arrives at the beginning or end of a month. A vacation is a time when a borrower might receive extra income or incur extra expenses, and a holiday often suggests extra expenses. A plausible explanation for segments 1 and 2 is that the borrowers had more income or fewer expenses on those days. By contrast, segment 3 had a higher risk on such days: plausibly, they had less income or more expenses on those days.

Third, in the second stage (i.e., Panel b), most of our proposed installment repayment behavior variables were still significant predictors of default behavior. Among them, ADRA reduced the default risk, whereas ANDLY and ADD increased the default risk. ANDFT was significant in segment 1 but not in segment 2 or 3. Furthermore, a borrower’s default behavior in the previous month, \(Y_{2i,t-1}\), was a significant predictor of the default risk of the following month, suggesting a carry-over effect over months.

Fourth, debt collection significantly reduced default risk, and the effect varied across segments. Segment 2 was the most sensitive to debt collection.

Fifth, the due date-related variables had similar but less pronounced effects on default behavior than on delinquency. S/W vacation reduced the default risk for segments 1 and 2, plausibly indicating extra income during vacations or avoidance of extra spending. S/W vacation increased the default risk for segment 3, plausibly because of extra spending. B/E of month reduced the default risk for segments 1 and 2, plausibly because of the income schedule, but had no effect on segment 3. Segment 1 had a lower default risk for the weekend, whereas segment 3 was more likely to default on the weekends, plausibly because of greater expenses.

Sixth, the interest rate increased delinquency and default risk in all segments. Segment 3 was more sensitive to interest rate than segment 1.

Finally, regarding the estimation results in Panel c, for the homogeneous model (i.e., Columns 1 and 2), we included a borrower fixed effect. Therefore, the parameters of borrower characteristics were absorbed by the borrower fixed effect. As for the heterogeneous model (i.e., Columns 3 to 8), borrowers were assigned to three segments by our latent class model. Since we treated Segment 1 as the benchmark, the estimated parameters of borrower characteristics for Segment 1 (i.e., Columns 3 and 4 in Panel c) were omitted. The estimated parameters of borrower characteristics for Segments 2 and 3 are reported in Columns 5 to 8. The last row of Panel c presents the estimate of ρ, which is the correlation between ε and η. As for borrower characteristics (i.e., Panel c), financial indicators (i.e., hometown GDP, house ownership) reduced delinquency and default risk, whereas individual income had no effect.

4.3 Borrower heterogeneity analysis

Table 5 presents summary statistics for each borrower segment. Segment 1 contained approximately half (50.35%) of our sample, and its delinquency (61.92%) and default (20.56%) rates were the lowest among the three segments, suggesting relatively low-risk borrowers. The average number of days of delinquency in segment 1 was much lower than in segments 2 and 3; the borrowers in segment 1 repaid installments nearly 1 week in advance. The borrowers in segment 1 came from relatively more developed cities and had a higher monthly income and level of education, compared with those in segments 2 and 3. One-fifth of the borrowers in segment 1 lived in their own houses, instead of living in dormitories or rentals.

Table 5 Segment description

Segment 3 had the highest delinquency (75.73%) and default (46.41%) rates, with a default rate roughly double that of segments 1 and 2. This segment contained the smallest proportion (23.65%) of our sample. Borrowers in segment 3 were from less-developed cities and had a relatively lower monthly income, compared with those in segments 1 and 2. The majority of the borrowers in segment 3 were junior college students living in dormitories. Their average delinquency lasted more than 70 days. Overall, those in segment 3 were high-risk borrowers; therefore, the platform should screen them carefully during the loan application process.

Segment 2 comprised 26% of the borrowers. They had a high delinquency rate (71.49%) similar to that of segment 3, but their default rate (27.40%) was much lower than that of segment 3. These results suggest that segment 2 represented medium-risk borrowers among the three segments. Therefore, the platform could garner greater profits from segment 2 by using delinquency fines as a penalty. The socioeconomic features of segment 2 were lower than those of segment 1 and higher than those of segment 3. Notably, the three segments corresponded well to our ex ante classification of borrower behavior in Fig. 1.

4.4 Marginal effect of debt collection

This section aims to evaluate the marginal effect of debt collection. To achieve this goal, we made the following assumptions: (1) The borrowers in this study are representative of the microloan student population in terms of demographics. (2) Collection of delinquent installments at time t does not affect the borrowers’ delinquency and default behavior for future installments of the loan. (3) Collection does not moderate the effects of the antecedents on the borrowers’ repayment behavior.

As mentioned in Sect. 4.1, the platform in fact conducted partial collection, at a rate of 17.7%. To better assess the marginal effect of debt collection, we simulated and compared the default rates in two hypothetical scenarios: no debt collection and full debt collection. In the no- and the full-collection scenarios, the values of debt collection in our dataset were set to 0 and 1, respectively, for all overdue installments. Then, we applied the parameters from Table 4 to Eq. (8) to obtain the predicted probability of default for each installment. If the predicted probability was at least 0.5 (i.e., \(\hat{P}_{default,it} \ge 0.5\)), the installment was considered a default. Because the artificial changes to collection might affect the model fitting result for segmentation, we recalculated the posterior probabilities of segmentation based on Eq. (10). Figure 3 shows the new segments based on the simulation results. N is the total number of installments in each segment under different collection scenarios. We observed that the installments were redistributed across segments when the debt collection mode changed. Specifically, if the full-collection practice was enforced, compared with the partial-collection practice, segment 3 would lose the largest number of installments (from 8670 to 2844) while segment 2 would gain the most (from 9720 to 14232). These results suggest that many borrowers with high default risk would become medium-risk borrowers. This change would not only reduce risk but also increase profit, because segment 2 pays a greater amount of delinquency fines.
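The counterfactual procedure can be sketched in a few lines. This is an illustration only: Eq. (8) is not reproduced here, so the sketch assumes a probit-style link, and the collection coefficient `BETA_COLLECTION` and the linear indices are hypothetical values, not estimates from Table 4. The classification rule follows the \(\ge 0.5\) cutoff as written in the formula.

```python
from statistics import NormalDist

# Hypothetical coefficient on the collection dummy. The negative sign mirrors
# the finding that collection lowers default risk; it is not from Table 4.
BETA_COLLECTION = -0.8


def default_probability(base_index: float, collected: int) -> float:
    """Predicted default probability under an assumed probit-style link."""
    return NormalDist().cdf(base_index + BETA_COLLECTION * collected)


def simulated_default_rate(base_indices, collected: int, cutoff: float = 0.5) -> float:
    """Share of overdue installments classified as default at the cutoff."""
    flags = [default_probability(x, collected) >= cutoff for x in base_indices]
    return sum(flags) / len(flags)


# Hypothetical linear indices (all other covariates times their coefficients)
# for five overdue installments.
base_indices = [-0.5, 0.2, 0.6, 1.0, 1.5]

# Scenario 1: collection set to 0 for every overdue installment.
rate_no_collection = simulated_default_rate(base_indices, collected=0)
# Scenario 2: collection set to 1 for every overdue installment.
rate_full_collection = simulated_default_rate(base_indices, collected=1)

print(rate_no_collection, rate_full_collection)
```

The sketch leaves out the recalculation of segment membership; in the study, the posterior segment probabilities are recomputed under each scenario before the default rates are compared.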

Fig. 3 Default rates of no, partial, and full collection

We also observed that the default rate varied across segments and collection scenarios, which allows us to compute the marginal effect of debt collection on the default rate. The simulation results show that the overall default rate of the platform would be 31.14% if it took no measures to collect debts; the default rate would decrease by 0.0278Footnote 11 (= 31.14% − 28.36%) if partial collection is adopted. By contrast, if the platform conducts debt collection for all installments, the default rate would decrease to 16.30%. Hence, full collection is clearly the superior option, whereas no collection is the worst. The marginal effect of partial collection on default risk can thus be calculated as the change in default rate between partial and no collection, relative to the percentage of loans in partial collection; in this case, it is 0.1571 (= 0.0278/17.7%).Footnote 12 Moreover, when using the no-collection scenario as the benchmark, partial collection could reduce the number of defaulted installments by 50.43% (= 0.1571/31.14%); that is, the number of defaulted installments in the no-collection scenario can be reduced by half.

Alternatively, the effect of collection on default can also be assessed by comparing the full- and no-collection scenarios. The difference in the aggregate level default rate between no- and full-collection is 0.1484 (= 31.14% − 16.30%). The marginal effect on default risk can be calculated by the change in default rate between full and no collection relative to the percentage of loans in full collection, which is 0.1484 (= 0.1484/100%). Full collection reduces the number of defaulted installments by 47.66% (= 0.1484/31.14%).
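The arithmetic behind both comparisons can be retraced with a short script. It uses only the rates reported in the text; the function and variable names are ours and serve only to make the steps explicit.

```python
# Figures reported in the text, expressed as fractions.
rate_none = 0.3114     # simulated default rate with no collection
rate_partial = 0.2836  # default rate under the observed partial collection
rate_full = 0.1630     # simulated default rate with full collection
share_partial = 0.177  # share of overdue installments actually collected


def marginal_effect(rate_without: float, rate_with: float, share_collected: float) -> float:
    """Drop in the default rate per unit share of installments collected."""
    return (rate_without - rate_with) / share_collected


me_partial = marginal_effect(rate_none, rate_partial, share_partial)
me_full = marginal_effect(rate_none, rate_full, 1.0)

# Relative reduction in defaulted installments, no collection as the benchmark.
rel_partial = me_partial / rate_none
rel_full = me_full / rate_none

print(f"partial vs. none: {me_partial:.4f}, relative reduction {rel_partial:.2%}")
print(f"full vs. none:    {me_full:.4f}, relative reduction {rel_full:.2%}")
```

The computed values agree with the reported 0.1571, 0.1484, and 47.66%, and match the reported 50.43% to within rounding of the last digit.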

Collectively, the results from the comparison between full and no collection and partial and no collection arrive at similar conclusions in terms of the marginal effects of collection on the default rate and number of defaults. These results also indicate that the platform conducted collection randomly in the early stage; otherwise, the effect of collection calculated based on partial versus no collection would be significantly different from that of full versus no collection. The random collection practice justifies the treatment of collection as an exogenous variable in our model.

We further analyzed the marginal effect of debt collection on revenue contribution. A delinquent but not defaulted installment was overdue for approximately 23 (22.86) days on average. The average interest rate for these installments was 1.65%, and the delinquency fine rate was 0.45% of the loan size per day. The size of an installment was ¥500. With these parameters, the revenue from each collection was ¥282.41 (= 500 × (1 + 1.65% + 0.45% × 23) × 50.43%). That is, as long as the total cost of collecting a debt was less than 56% (= 282.41/500) of the installment amount, debt collection would be profitable for the platform on average.
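The revenue calculation can be written out step by step. All inputs are the figures quoted in the paragraph above; the variable names are ours.

```python
installment = 500.0          # installment size in yuan
interest_rate = 0.0165       # average interest rate on these installments
fine_rate_per_day = 0.0045   # delinquency fine rate per day
days_delinquent = 23         # average delinquency, rounded from 22.86 days
relative_reduction = 0.5043  # reported reduction in defaults from collection

# Amount owed on a delinquent installment: principal, interest, and fines.
amount_due = installment * (1 + interest_rate + fine_rate_per_day * days_delinquent)

# Expected revenue attributable to one collection action.
revenue_per_collection = amount_due * relative_reduction

# Collection pays off on average while its cost stays below this share
# of the installment amount.
break_even_share = revenue_per_collection / installment

print(f"revenue per collection: {revenue_per_collection:.2f} yuan")
print(f"break-even cost share:  {break_even_share:.0%}")
```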

5 Summary and implications

5.1 Summary

With an installment-level dataset, this study proposed and empirically tested a two-stage model of borrowers’ delinquency and default behavior during the repayment process. The results confirm that the two-stage model fits the data better, and they suggest that the variables of the borrowers’ previous repayment behavior have a significant influence on borrowers’ delinquency risk and default behavior. The results also suggest that the due date of an installment affects delinquency and default.

The heterogeneity analysis results suggest that borrowers in the three segments behaved differently. Moreover, the results suggest a positive effect of debt collection on debt recovery. We found that compared with the no-collection scenario, the default rate would decrease to 16.30% if the platform conducts collection for all the delinquent installments. When a platform collects a delinquent installment, its subsequent default rate would decrease by 0.1571. The marginal revenue of collecting an installment is ¥282.41.

5.2 Theoretical implications

Our study offers noteworthy theoretical implications. First, to help microloan managers and lenders make better use of borrowers’ monthly repayment behavior, this study departs from the traditional one-stage model and proposes a two-stage probability model that factors in the dynamics of borrowers’ repayment behavior when predicting delinquency and default. A one-stage loan-level model is useful for screening loan applications but lacks the capability to assess the dynamics of individual risks. The two-stage model offers not only screening capability, but also the capability to capture the borrower’s dynamic behavior during the repayment process and accordingly adjust the delinquency and default risk of the individual for each month. This comprehensive model covers the entire loan life cycle.

Second, according to our review of the literature, this study is among the earliest to incorporate debt collection in the model and investigate its effect on delinquency and default risk. In addition to carefully screening loan applicants in the preloan stage, debt collection is often the only managerial intervention to control the risk of an outstanding loan. Hence, assessing the economic benefit of debt collection relative to its cost is imperative. The proposed two-stage model offers a new instrument to make such an assessment for debt collection and other managerial interventions in a loan’s life cycle.

Third, this study proposes a set of novel risk antecedents, particularly borrowers’ repayment behaviors and characteristics of an installment due date. Only a staged model can leverage these valuable variables. A traditional one-stage loan-level analysis would have to forego such nuanced behavior measurements and date-specific factors. The behavior variables offer a set of indicators that enable the constant monitoring of borrower risk.

5.3 Practical implications

The key practical contribution of the two-stage model is that it offers the capability to assess loan risks dynamically over a loan’s life cycle. First, this study considered borrowers’ heterogeneity and offers an instrument to classify borrowers into segments. Although many studies have identified borrowers’ risk antecedents in microloans [6, 57], our findings demonstrate how borrowers behave differently in various segments.

In this study, segment 2 represented medium-risk borrowers with higher revenues, and segment 3 represented high-risk borrowers. During the screening stage, a platform can classify applicants into a segment, depending on a variety of variables, with an algorithm such as k-nearest-neighbors. This system of classification is the first step of risk and value assessment. Thus, a platform could set higher screening criteria for segment 3 to increase the probability of loan repayment and profits.
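As a rough sketch of the screening step, a k-nearest-neighbors classifier can assign a new applicant to the segment held by most of its closest previously segmented borrowers. The feature set, the standardized values, and the function name `knn_predict` are hypothetical and chosen only for illustration; the study does not specify which variables or which value of k a platform should use.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority segment label among the k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already standardized applicant features:
# (monthly income, hometown GDP, owns house). Labels are segment numbers.
train = [
    ((1.2, 1.0, 1.0), 1),
    ((0.9, 0.8, 1.0), 1),
    ((1.1, 0.7, 0.0), 1),
    ((0.1, 0.0, 0.0), 2),
    ((-0.1, 0.2, 0.0), 2),
    ((0.0, -0.2, 0.0), 2),
    ((-1.0, -0.9, 0.0), 3),
    ((-1.2, -1.1, 0.0), 3),
    ((-0.8, -1.0, 0.0), 3),
]

# A new applicant with low income from a less-developed city.
new_applicant = (-0.9, -1.0, 0.0)
segment = knn_predict(train, new_applicant, k=3)
print(segment)
```

With three segments, a vote among k = 3 neighbors can tie; a production rule would need an explicit tie-break, for example falling back to the single nearest neighbor.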

After a platform grants a loan, it should closely monitor segment 3 by, for example, strengthening managerial interventions during the loan’s life cycle and checking whether the due date is on a weekend, at the beginning or end of a month, or during school vacations or holidays. This is because the risks of segment 3 would be higher during such periods.

Second, a platform can employ our model to forecast the delinquency and default risk of each installment. As an illustration, we fitted a model with loans applied for from August 2014 to January 2015 (i.e., three-quarters of the sample) and used the result to forecast the repayment behaviors of borrowers in the rest of the sample. Table 6 indicates that the two-stage model provides a more accurate forecast than do one-stage models for installments. Both the root mean-squared error (RMSE) and mean absolute percentage error (MAPE) of the two-stage model are shown to be lower.Footnote 13
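The two accuracy measures can be computed as below. The sketch uses the textbook definitions of RMSE and MAPE on a few hypothetical default rates; the numbers are not from Table 6, and how the study aggregates installment-level forecasts before computing MAPE is not assumed here.

```python
import math


def rmse(actual, predicted):
    """Root mean-squared error between observed and forecast values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error; every actual value must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and forecast default rates for four holdout periods.
actual = [0.20, 0.25, 0.30, 0.40]
two_stage = [0.22, 0.24, 0.33, 0.38]
one_stage = [0.30, 0.15, 0.40, 0.30]

for name, forecast in (("two-stage", two_stage), ("one-stage", one_stage)):
    print(f"{name}: RMSE={rmse(actual, forecast):.4f}, MAPE={mape(actual, forecast):.2%}")
```

Lower values on both measures indicate a more accurate forecast, which is the sense in which the two models are compared.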

Table 6 Comparison of model prediction precision (N = 9440)

Third, the two-stage model can depict the risk of a loan over time. Figure 4 illustrates the predicted risk of each installment over time. It also shows how the risk of default would change if a collection action had been employed, given a borrower’s previous repayment behavior (as presented in the table in this figure). Based on the default risk of the upcoming installment, the platform can set a threshold to activate managerial interventions such as a collection action. After a collection action, the risk dynamics of the borrower would change accordingly.
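A threshold rule of this kind could look like the following sketch. The predicted risks and the 0.6 threshold are hypothetical; in practice the threshold would be chosen by weighing collection cost against expected recovery.

```python
def installments_to_collect(predicted_risk, threshold=0.5):
    """Return installment numbers whose predicted default risk reaches the threshold."""
    return [t for t, risk in predicted_risk.items() if risk >= threshold]


# Hypothetical predicted default risk for each upcoming installment number.
predicted_risk = {1: 0.10, 2: 0.35, 3: 0.62, 4: 0.48, 5: 0.71}

# Installments for which a collection action would be triggered.
flagged = installments_to_collect(predicted_risk, threshold=0.6)
print(flagged)
```

Because a collection action changes the borrower's subsequent risk, the predictions for later installments would need to be recomputed after each intervention rather than flagged once in advance.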

Fig. 4 Predicted risk of a loan in segment 2

For platform managers, the model can also provide a summary report of borrower segments. For example, Fig. 5 presents the risk trend of segment 2.

Fig. 5 Segment level risk trends (segment 2)

5.4 Limitations

This study has limitations that should be considered when interpreting the results. First, the dataset did not include antecedents such as the loan’s purpose and social capital; including such factors would have enhanced the predictive ability of the risk measurement model. Second, this study neither differentiated the types of intervention (e.g., notice, warning, and shame penalty) nor considered the lasting effect of debt collection across multiple installments [7]. Third, this study simplified debt collection to a binary variable and its time of occurrence, neglecting the choice of collection action types and the frequency of those actions. Fourth, we considered only a single platform, and the delinquency and default rates of our focal platform were high during the research period, which may bias the estimates and limit the generalizability of the findings. Future research, with more installment-level data available, could alleviate this issue by applying the proposed two-stage model across other platforms, regions, and periods.