Supporting Lenders: Loan Default Prediction with Data-Science

Objective:

In this exercise, a bank's consumer credit department seeks to create a transparent, interpretable credit scoring model, aligned with the Equal Credit Opportunity Act (ECOA), to simplify decisions on home equity credit approvals.

Context:

Addressing credit defaults is crucial to minimize risks for lenders, investors, and the financial industry while fostering trust, compliance, and market stability through non-discriminatory lending practices.

Approach:

We develop (1) a logistic regression model, (2) a decision tree model, and (3) a random forest classifier to find the best approach to predicting the likelihood of loan default. We also identify which features most strongly indicate that a borrower will default, and assess how accurately, and with what degree of certainty, those features predict default. The models are built on data from recent loan applicants, with the aim of ensuring fairness and justifying adverse decisions.
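The three-model comparison can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes scikit-learn is available, uses a synthetic imbalanced dataset from `make_classification` in place of the loan applicant data, and the hyperparameters (`max_depth=5`, `n_estimators=200`, `class_weight="balanced"`) are placeholder assumptions, not tuned values from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption): roughly 20% "default" cases.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Placeholder hyperparameters (assumption), chosen only for illustration.
models = {
    "logistic_regression": LogisticRegression(
        max_iter=1000, class_weight="balanced"
    ),
    "decision_tree": DecisionTreeClassifier(
        max_depth=5, class_weight="balanced", random_state=42
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=200, max_depth=5, class_weight="balanced", random_state=42
    ),
}

# Fit each model and record recall on the training and test splits.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train_recall": recall_score(y_train, model.predict(X_train)),
        "test_recall": recall_score(y_test, model.predict(X_test)),
    }

for name, scores in results.items():
    print(f"{name}: train={scores['train_recall']:.2f} "
          f"test={scores['test_recall']:.2f}")
```

Comparing train and test recall side by side is one simple way to spot overfitting: a large gap between the two suggests the model has memorized the training data.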

Summary of Findings:

We recommend the decision tree model for predicting loan default because of its high recall (0.78 on test data), its balance of bias and variance, and its strong generalization. It outperforms the logistic regression and random forest models, which showed overfitting or poor generalization. Debt-to-Income Ratio (DTI) is the primary predictor, and CLAGE (credit age) is inversely correlated with default risk. The decision tree's explainability and low computational cost make it an efficient and transparent solution.
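For readers unfamiliar with the metric, recall is the share of actual defaulters that the model correctly flags: true positives divided by the sum of true positives and false negatives. The short sketch below computes it by hand; the labels are made up for illustration and are not drawn from the project's data.

```python
def recall(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # With no actual positives, recall is undefined; return 0.0 by convention here.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels: 1 = defaulted, 0 = repaid.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

# 3 of the 4 actual defaulters are caught, so recall is 0.75.
print(recall(y_true, y_pred))
```

A recall of 0.78 therefore means that the model catches about 78 of every 100 borrowers who go on to default.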

Key Data Science concepts covered:

Logistic Regression

Decision Trees

Random Forest

Hyperparameter Tuning

Exploratory Data Analysis

Click the link below to download the pdf file.

Supporting Lenders: Loan Default Prediction with Data-Science
