2.3 Data and Institutional Background
2.4.2 ATE with Continuous Treatment
In the previous section, I assumed that all borrowers were treated equally by the policy, which essentially means that all agents received the same level of treatment. The purpose of that section was to relate this paper to the existing literature, in which the treatment is considered binary, and to establish a baseline case for the findings when the assumption of binary treatment is relaxed. In this section, I take into account the fact that different agents were impacted differently by the policy when evaluating its effect on borrower outcomes. To do this, I first measure the heterogeneity in the treatment levels for all agents and then use that variation in the treatment to estimate its average effect. Essentially, I estimate the effect of the implementation of finer credit scoring in a more realistic way, in which treatment is continuous.
Because treatment levels vary in terms of price changes, one should expect the responses to treatment to vary as well. The nature of the policy change was such that some borrowers faced price increases, making them worse off, while others faced price decreases, making them better off. It is important to note that this situation is representative of a setting in which the implementation or improvement of credit scoring helps to distinguish agents based on their creditworthiness. Despite the heterogeneity in treatment levels, the earlier literature mostly considered this a binary treatment. A key contribution of this paper is that I treat it as a continuous treatment, with the intention of estimating more realistic and precise average treatment effects.
To estimate the average effect of the difference in price on borrower outcomes of interest, I first aggregated the data at the level of risk category defined by the level of Estimated Loss Rate (ELR) under finer (new) credit scoring policy. Define ∆Yj as
the mean difference in the outcome variable within category j, before and after the implementation of policy. Within each category, the main outcomes of interest were the number of borrowers who applied for loans, the mean loan amount, the fraction of borrowers who defaulted, and the mean fraction of loan repaid. Similarly, define
∆Pj as the mean difference in the prices charged to all identical borrowers within
category j. Once they are constructed (as explained below), the average treatment effect of the change in prices can be estimated with OLS where each observation is weighted by the number of borrowers of category j in the market. To be precise, I estimate the following regression using weighted OLS for different outcome variables of interest:
$$\Delta Y_j = \beta_0 + \beta_1 \times \Delta P_j + \varepsilon_j$$

where $\Delta P_j = P_j^{new} - P_j^{old}$ and $P_j^{t}$ denotes the price charged to all borrowers in category $j$ in time period $t$, for $t \in \{old, new\}$ denoting the pre-policy and post-policy time periods respectively. Even though the price is the same for every borrower in the same category at a given point in time, this price may vary within a time period. In cases where it did, $P_j^{t}$ was constructed by taking the weighted average of the prices over time within the given category $j$ for the given time period $t$. However, to arrive at the final measure of $\Delta P_j$ and the corresponding measures of $\Delta Y_j$, I first classified all borrowers under the old credit scoring policy according to the new credit scoring policy; the details can be found in the next section.
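The weighted regression described above has a closed-form solution when there is a single regressor. The following is a minimal sketch of that estimator, not the code used for the paper's estimates; the function name `weighted_ols` and the three-category data are hypothetical and chosen only so that the result is easy to check by hand.

```python
from typing import Sequence, Tuple


def weighted_ols(
    dp: Sequence[float], dy: Sequence[float], w: Sequence[float]
) -> Tuple[float, float]:
    """Return (beta0, beta1) minimising sum_j w_j * (dy_j - beta0 - beta1 * dp_j)^2."""
    if not (len(dp) == len(dy) == len(w)) or len(dp) < 2:
        raise ValueError("need at least two observations of equal length")
    total_w = sum(w)
    # Weighted means of the regressor and the outcome.
    mean_p = sum(wj * pj for wj, pj in zip(w, dp)) / total_w
    mean_y = sum(wj * yj for wj, yj in zip(w, dy)) / total_w
    # Weighted variance of dp and weighted covariance of dp with dy.
    s_pp = sum(wj * (pj - mean_p) ** 2 for wj, pj in zip(w, dp))
    s_py = sum(
        wj * (pj - mean_p) * (yj - mean_y) for wj, pj, yj in zip(w, dp, dy)
    )
    if s_pp == 0:
        raise ValueError("no variation in the price change across categories")
    beta1 = s_py / s_pp
    beta0 = mean_y - beta1 * mean_p
    return beta0, beta1


# Hypothetical category-level data lying exactly on dy = 0.01 - 2 * dp.
delta_p = [-0.02, 0.0, 0.03]   # change in price for each category j
delta_y = [0.05, 0.01, -0.05]  # change in the mean outcome for each category j
weights = [0.5, 0.3, 0.2]      # share of borrowers in each category j

beta0, beta1 = weighted_ols(delta_p, delta_y, weights)
print(beta0, beta1)
```

Because the illustrative data lie exactly on a line, the weights do not change the estimates here; with noisy data, categories with more borrowers pull the fitted line toward their own observations.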
As noted above, ∆Yj is defined as the mean difference in the outcome variable
within category j, before and after the implementation of policy. For each outcome variable, ∆Yj is defined as follows for each of the three estimations:
Difference in the mean loan amount requested within category j:

$$\Delta Y_j = \frac{1}{n_j^{new}} \sum_{i=1}^{n_j^{new}} \log \text{Loan Amount}_{ij}^{new} - \frac{1}{n_j^{old}} \sum_{i=1}^{n_j^{old}} \log \text{Loan Amount}_{ij}^{old}$$

where $\text{Loan Amount}_{ij}^{t}$ is the loan amount requested by borrower $i$ in credit category $j$ in time period $t$, and $n_j^{t}$ denotes the number of borrowers in category $j$ in time period $t$. Time period new is defined as the period after the new policy was implemented and time period old is defined as the period before the new policy was implemented.
Difference in the fraction of borrowers who defaulted within category j:

$$\Delta Y_j = \frac{1}{n_j^{new}} \sum_{i=1}^{n_j^{new}} \text{Default}_{ij}^{new} - \frac{1}{n_j^{old}} \sum_{i=1}^{n_j^{old}} \text{Default}_{ij}^{old}$$

where $\text{Default}_{ij}^{t}$ is equal to 1 if borrower $i$ in category $j$ repaid less than the full amount of a loan that was issued in period $t$, and $\text{Default}_{ij}^{t}$ is equal to 0 if that borrower repaid in full.
Difference in mean fraction of loan repaid within category j:

$$\Delta Y_j = \frac{1}{n_j^{new}} \sum_{i=1}^{n_j^{new}} \text{Repayment}_{ij}^{new} - \frac{1}{n_j^{old}} \sum_{i=1}^{n_j^{old}} \text{Repayment}_{ij}^{old}$$

where $\text{Repayment}_{ij}^{t} \in [0,1]$ is the exact fraction of loan principal repaid by borrower $i$ in category $j$ for a loan that was issued in time period $t$. Finally, to account for the variation in the number of borrowers within each risk category, I weighted each observation by the total number of borrowers in category $j$ as a fraction of the total number of borrowers in the market. This weight is defined below:

$$\text{Weight}_j = \frac{n_j^{new} + n_j^{old}}{\sum_{j'=1}^{J} \left( n_{j'}^{new} + n_{j'}^{old} \right)}$$
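The construction of the category-level outcome differences and weights from borrower-level records can be sketched as follows. This is an illustration under stated assumptions rather than the actual data pipeline: the record layout `(category, period, value)`, the function name `category_deltas`, and the ten sample records are all hypothetical. The outcome value can be a default indicator, a repayment fraction, or a log loan amount.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (category j, period "old" or "new", outcome value).
Record = Tuple[str, str, float]


def category_deltas(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Map each category j to (delta_Y_j, weight_j)."""
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for category, period, value in records:
        sums[(category, period)] += value
        counts[(category, period)] += 1

    total_n = sum(counts.values())  # all borrowers in the market, both periods
    result: Dict[str, Tuple[float, float]] = {}
    for category in sorted({cat for cat, _ in counts}):
        n_new = counts.get((category, "new"), 0)
        n_old = counts.get((category, "old"), 0)
        if n_new == 0 or n_old == 0:
            raise ValueError(f"category {category!r} lacks borrowers in one period")
        # Mean outcome after the policy minus mean outcome before the policy.
        delta_y = sums[(category, "new")] / n_new - sums[(category, "old")] / n_old
        weight = (n_new + n_old) / total_n
        result[category] = (delta_y, weight)
    return result


# Hypothetical default indicators for two risk categories.
sample = [
    ("A", "old", 1.0), ("A", "old", 0.0), ("A", "new", 0.0), ("A", "new", 0.0),
    ("B", "old", 0.0), ("B", "old", 0.0),
    ("B", "new", 1.0), ("B", "new", 0.0), ("B", "new", 0.0), ("B", "new", 0.0),
]
deltas = category_deltas(sample)
print(deltas)
```

In this illustrative sample the default rate in category A falls from 0.5 to 0, while in category B it rises from 0 to 0.25, and the weights across categories sum to one by construction.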
Matching Borrowers Under Two Credit Scoring Regimes
As mentioned earlier, to construct the variables defined above and to measure the different treatment levels of all borrowers, I need to classify these borrowers into credit score categories as defined by the finer (newer) credit scoring function. For the borrowers who were issued loans under the new regime, this is simply the credit category they were assigned by the platform, so it is already defined in the data. For borrowers under the old regime, however, I conducted a matching exercise. To explain this, I turn to the details of how the platform's internal risk scores changed and how the platform used these scores to determine prices.
Let there be two types of borrowers: a low risk type denoted by $L$ and a high risk type denoted by $H$. Before the policy, both sets of borrowers were considered identical in terms of their repayment probabilities, so their estimated loss rate was the same, denoted by $\overline{ELR}$. Based on this, these borrowers also faced the same price, denoted by $\bar{P}$. After the policy was implemented, the platform could distinguish between $L$-type and $H$-type borrowers and hence assigned $ELR_h$ to the high risk type and $ELR_l$ to the low risk type. The corresponding prices for these types were $P_h$ and $P_l$. Here $ELR_h > \overline{ELR} > ELR_l$ and $P_h > \bar{P} > P_l$.
Under the policy, the $H$-type borrowers face a higher price, and this difference is calculated as $P_h - \bar{P} > 0$. Similarly, the $L$-type borrowers face a lower price, and this difference is calculated as $P_l - \bar{P} < 0$. Once the change in price for each type of borrower is calculated, it can be used as the treatment level for that type in order to estimate the average treatment effect of the policy more realistically.
In practice, however, $ELR^{new}$ is not observed for borrowers who applied for loans before the policy was implemented. To address this challenge, I use machine learning to estimate a function that predicts $ELR^{new}$ from borrower credit variables and macroeconomic variables at the time of loan application. It is essential to note here that the platform itself uses these same variables to assign $ELR^{new}$ to each borrower, and since I observe all of them, I estimate this function directly from the data. This can be done using OLS too, but since it is a pure prediction problem, I expanded the set of techniques to include some from the machine learning literature and evaluated the performance of each technique using pseudo-out-of-sample Root Mean Squared Error (RMSE). The random forest algorithm gave the lowest RMSE of 0.0054 and an out-of-sample R-squared of 0.98, which is why I picked it as the final estimation technique for predicting $ELR^{new}$.
Alternative techniques such as Lasso and Ridge gave slightly higher RMSEs of 0.0066 and 0.0076, respectively.
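The model comparison above rests on two hold-out metrics, which can be sketched as follows. This is only an illustration of the evaluation step: it does not fit a random forest, Lasso, or Ridge model, and the held-out loss rates and the two candidate prediction vectors (`model_a`, `model_b`) are hypothetical placeholders. The out-of-sample R-squared is computed here as one minus the ratio of squared prediction errors to squared deviations from the hold-out mean, which is an assumption since the text does not state the exact definition used.

```python
import math
from typing import Dict, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared prediction error on a held-out sample."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def r_squared_oos(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - SSE / SST, with SST taken around the hold-out mean (an assumption)."""
    mean_a = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - sse / sst


def pick_best(actual: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Name of the candidate with the lowest hold-out RMSE."""
    return min(candidates, key=lambda name: rmse(actual, candidates[name]))


# Hypothetical held-out loss rates and predictions from two placeholder models.
held_out_elr = [0.02, 0.05, 0.10, 0.15]
candidates = {
    "model_a": [0.021, 0.049, 0.101, 0.149],
    "model_b": [0.03, 0.04, 0.12, 0.13],
}

best = pick_best(held_out_elr, candidates)
for name, preds in candidates.items():
    print(name, rmse(held_out_elr, preds), r_squared_oos(held_out_elr, preds))
print("selected:", best)
```

Selecting on hold-out error rather than in-sample fit matters here because the fitted function is applied to pre-policy borrowers who were never scored under the new regime.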
With this estimated function, I predicted $ELR^{new}$ for borrowers in the pre-policy period, which gave me a single measure of borrower type according to the platform's new policy. Furthermore, I used this $ELR^{new}$ to find, for each borrower from the pre-policy period, the closest match among borrowers in the post-policy period, and assigned $P^{new}$ to each pre-policy borrower from that match. Similarly, I used this matching variable to assign $\bar{P}$ to each borrower in the post-policy period based on the closest match for that borrower from the pre-policy period. With the complete sets of prices in hand, I calculated $\Delta P_i$ for each borrower as defined above.
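The matching step can be illustrated with a one-dimensional nearest-neighbour search on the predicted loss rate. This sketch makes assumptions that the text does not confirm: each borrower is matched to a single closest donor (rather than, say, an average over several close donors), ties go to the donor with the lower loss rate, and the function name `assign_matched_prices` and all numbers are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def assign_matched_prices(
    target_elr: Sequence[float], donors: Sequence[Tuple[float, float]]
) -> List[float]:
    """For each target ELR, return the price of the donor with the closest ELR.

    donors is a sequence of (elr, price) pairs from the other policy period.
    """
    if not donors:
        raise ValueError("need at least one donor")
    ordered = sorted(donors)              # sort donors by ELR
    keys = [elr for elr, _ in ordered]
    matched: List[float] = []
    for elr in target_elr:
        idx = bisect_left(keys, elr)      # first donor with ELR >= target
        # Only the neighbours just below and just above can be the closest.
        neighbours = [k for k in (idx - 1, idx) if 0 <= k < len(ordered)]
        closest = min(neighbours, key=lambda k: abs(keys[k] - elr))
        matched.append(ordered[closest][1])
    return matched


# Hypothetical post-policy donors: (ELR_new, P_new).
post_policy = [(0.02, 0.08), (0.06, 0.12), (0.12, 0.20)]
# Hypothetical pre-policy borrowers: predicted ELR_new and observed old price.
pre_predicted_elr = [0.03, 0.10, 0.20]
pre_old_price = [0.10, 0.15, 0.22]

pre_new_price = assign_matched_prices(pre_predicted_elr, post_policy)
delta_p = [new - old for new, old in zip(pre_new_price, pre_old_price)]
print(pre_new_price, delta_p)
```

The same function can be applied in the other direction, with pre-policy borrowers as donors, to assign an old-regime price to each post-policy borrower.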
Figure 2.3 shows the empirical distribution of $\Delta P_i$. For a large number of borrowers the difference in price was close to zero, while there was non-trivial mass on either side of zero. This figure illustrates my earlier point that not all borrowers were equally treated by the policy. Instead, some borrowers were better off under this policy since they received a price decrease, while others received a price increase. Intuitively, the borrowers who received a price decrease were those of a lower risk type whom the platform had previously grouped with higher risk types and thus charged a higher price. Similarly, the borrowers who faced a price increase were those who had been benefiting from a lower price before the policy because the platform had grouped them with lower risk borrowers. This highlights an important feature of the variation in prices generated by this policy: the direction and magnitude of the price difference are not random but are determined by the difference in the platform's estimates of the two risk scores (loss rates) for each borrower.
Given this variation in prices, it is not clear a priori how borrowers would respond to price changes. A typical borrower should reduce the loan amount requested if the interest rate increases, holding everything else constant. However, a risky borrower, who has a higher probability of default, may also be less sensitive to price in their choice of loan amount than a less risky borrower. In the setting of this paper, price increases for risky borrowers and decreases for less risky borrowers. The magnitude of the average effect of a unit increase in price will therefore depend on the distribution of borrower types and on the size of the price differences they face under the policy. Furthermore, the default and repayment choices of borrowers will also differ with the riskiness of the borrower because, by definition, riskiness reflects the borrower's expected ex-ante repayment outcome. However, when prices change in such a way that a high risk borrower's price increases while a low risk borrower's price decreases, the average effect could be ambiguous. While a decrease in the interest rate decreases the likelihood of default, it also leads to an increase in the requested loan amount, which in turn may increase the likelihood of default.