eliasdjaoui/exploratory-data-analysis---customer-loans-in-finance

exploratory-data-analysis---customer-loans-in-finance

- Table of Contents, if the README file is long
- A description of the project: what it does, the aim of the project, and what you learned
- Installation instructions
- Usage instructions
- File structure of the project
- License information

Loan Data Dictionary

Note: the types listed below are the original (pre-conversion) data types. A column with the new data types still needs to be added.

| Field | Description | Type |
| --- | --- | --- |
| id | Unique ID of the loan | string |
| member_id | ID of the member who took out the loan | string |
| loan_amount | Amount of loan the applicant received | float |
| funded_amount | The total amount committed to the loan at that point in time | float |
| funded_amount_inv | The total amount committed by investors for that loan at that point in time | float |
| term | The number of monthly payments for the loan | integer |
| int_rate | Annual interest rate (APR) of the loan | float |
| instalment | The monthly payment owed by the borrower, inclusive of interest | float |
| grade | Loan company (LC) assigned loan grade | string |
| sub_grade | LC assigned loan sub grade | string |
| employment_length | Employment length in years | integer |
| home_ownership | The home ownership status provided by the borrower | string |
| annual_inc | The annual income of the borrower | float |
| verification_status | Indicates whether the borrower's income was verified by the LC or the income source was verified | string |
| issue_date | Issue date of the loan | date |
| loan_status | Current status of the loan | string |
| payment_plan | Indicates if a payment plan is in place for the loan (indicating the borrower is struggling to pay) | string |
| purpose | A category provided by the borrower for the loan request | string |
| dti | Debt-to-income ratio: the borrower's total monthly debt payments on their total debt obligations, excluding mortgage and the LC loan, divided by the borrower's self-reported monthly income | float |
| delinq_2yr | The number of 30+ days past-due payments in the borrower's credit file for the past 2 years | integer |
| earliest_credit_line | The month the borrower's earliest reported credit line was opened | date |
| inq_last_6mths | The number of inquiries in the past 6 months (excluding auto and mortgage inquiries) | integer |
| mths_since_last_record | The number of months since the last public record | integer |
| open_accounts | The number of open credit lines in the borrower's credit file | integer |
| total_accounts | The total number of credit lines currently in the borrower's credit file | integer |
| out_prncp | Remaining outstanding principal for the total amount funded | float |
| out_prncp_inv | Remaining outstanding principal for the portion of the total amount funded by investors | float |
| total_payment | Payments received to date for the total amount funded | float |
| total_rec_int | Interest received to date | float |
| total_rec_late_fee | Late fees received to date | float |
| recoveries | Post charge-off gross recovery | float |
| collection_recovery_fee | Post charge-off collection fee | float |
| last_payment_date | Date on which last payment was received | date |
| last_payment_amount | Last total payment amount received | float |
| next_payment_date | Next scheduled payment date | date |
| last_credit_pull_date | The most recent month LC pulled credit for this loan | date |
| collections_12_mths_ex_med | Number of collections in the past 12 months, excluding medical collections | integer |
| mths_since_last_major_derog | Months since most recent 90-day or worse rating | integer |
| policy_code | Publicly available policy code: 1 for new products, 2 for products not publicly available | integer |
| application_type | Indicates whether the loan is an individual application or a joint application with two co-borrowers | string |

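
The null-handling notes that follow group columns by their percentage of missing values. As a rough sketch of how such percentages might be computed with pandas: the helper name `null_percentages` and the toy data are hypothetical, chosen only for illustration, and are not taken from the project code.

```python
import numpy as np
import pandas as pd


def null_percentages(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, highest first."""
    return (df.isna().mean() * 100).round(2).sort_values(ascending=False)


# Tiny made-up frame for illustration only; not the real loan data.
toy = pd.DataFrame(
    {
        "int_rate": [7.5, np.nan, 11.2, 9.9],
        "term": [36, 60, 36, 36],
        "mths_since_last_record": [np.nan, np.nan, np.nan, 12.0],
    }
)

pcts = null_percentages(toy)
print(pcts)
```

Sorting the result in descending order puts the sparsest columns first, which makes it easy to pick the thresholds used for the groups.
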
1. Columns with High Null Percentage (> 50%)

Columns:

- mths_since_last_record (88.60%)
- mths_since_last_major_derog (86.17%)
- next_payment_date (60.13%)
- mths_since_last_delinq (57.17%)

Approach:

We drop these columns. With more than half of their values missing, any imputation would rest on too little observed data.
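
A minimal sketch of the drop step, assuming the data is held in a pandas DataFrame. The function name `drop_sparse_columns`, the 50% default threshold parameter, and the toy frame are illustrative assumptions rather than the project's actual code.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 50.0) -> pd.DataFrame:
    """Return a copy of df without columns whose null percentage exceeds threshold."""
    null_pct = df.isna().mean() * 100
    to_drop = null_pct[null_pct > threshold].index
    return df.drop(columns=to_drop)


# Made-up sample: two columns are 75% null, one is complete.
toy = pd.DataFrame(
    {
        "loan_amount": [1000.0, 2500.0, 4000.0, 8000.0],
        "mths_since_last_record": [np.nan, np.nan, np.nan, 12.0],
        "next_payment_date": [np.nan, np.nan, "Feb-2022", np.nan],
    }
)

cleaned = drop_sparse_columns(toy)
print(list(cleaned.columns))
```

Deriving the column list from a threshold, instead of hard-coding the four names, keeps the step valid if the data is refreshed and the percentages shift.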

2. Columns with Moderate Null Percentage (5-10%)

Columns:

- int_rate (9.53%)
- term (8.80%)
- funded_amount (5.54%)

Approach:

Imputation: these columns are missing a moderate share of their values, so imputation is likely the best approach. int_rate can be imputed with the mean or median interest rate, and term with the mode (the most common loan term). funded_amount can be imputed with the mean, the median, or a value predicted from other available columns.
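
One possible way to apply these imputations with pandas is sketched here. It assumes all three columns are already numeric, and it picks the median for int_rate and funded_amount, which is only one of the options mentioned above. The function name `impute_moderate_nulls` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def impute_moderate_nulls(df: pd.DataFrame) -> pd.DataFrame:
    """Fill int_rate and funded_amount with their median and term with its mode."""
    out = df.copy()
    out["int_rate"] = out["int_rate"].fillna(out["int_rate"].median())
    # mode() returns a Series (there can be ties); take the first value.
    out["term"] = out["term"].fillna(out["term"].mode().iloc[0])
    out["funded_amount"] = out["funded_amount"].fillna(out["funded_amount"].median())
    return out


# Made-up sample values, not the real loan data.
toy = pd.DataFrame(
    {
        "int_rate": [7.0, np.nan, 11.0, 9.0],
        "term": [36.0, 36.0, np.nan, 60.0],
        "funded_amount": [1000.0, 3000.0, np.nan, 5000.0],
    }
)

filled = impute_moderate_nulls(toy)
print(filled)
```

The median is less sensitive to outliers than the mean, which can matter for skewed columns such as loan amounts. Whether that applies here would need to be checked against the actual distributions.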

3. Columns with Low Null Percentage (< 5%)

Columns:

- employment_length (3.91%)
- last_payment_date (0.13%)
- collections_12_mths_ex_med (0.09%)
- last_credit_pull_date (0.01%)

Approach:

Imputation: these columns have very few missing values, so imputation is straightforward. employment_length can be filled with the median or mode. For last_payment_date and last_credit_pull_date, consider the mode or a forward/backward fill. collections_12_mths_ex_med could be filled with 0, on the assumption that a null in this count column indicates no collections.
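
A hedged sketch of these fills follows. It assumes employment_length is numeric and the two date columns have already been parsed to datetimes, and it uses the mode throughout where a choice was offered. The function name `impute_low_nulls` and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd


def impute_low_nulls(df: pd.DataFrame) -> pd.DataFrame:
    """Fill the low-null columns: mode for most, 0 for the collections count."""
    out = df.copy()
    for col in ("employment_length", "last_payment_date", "last_credit_pull_date"):
        # mode() ignores missing values by default; take the first mode on ties.
        out[col] = out[col].fillna(out[col].mode().iloc[0])
    # Assumption: a missing collections count means no collections were recorded.
    out["collections_12_mths_ex_med"] = out["collections_12_mths_ex_med"].fillna(0)
    return out


# Made-up sample rows, not the real loan data.
toy = pd.DataFrame(
    {
        "employment_length": [10.0, np.nan, 10.0, 2.0],
        "last_payment_date": pd.to_datetime(
            ["2022-01-01", None, "2022-01-01", "2021-12-01"]
        ),
        "last_credit_pull_date": pd.to_datetime(
            ["2022-01-01", "2022-01-01", None, "2021-11-01"]
        ),
        "collections_12_mths_ex_med": [0.0, np.nan, 1.0, 0.0],
    }
)

filled = impute_low_nulls(toy)
print(filled)
```

A forward or backward fill (`Series.ffill()` / `Series.bfill()`) would be an alternative for the date columns, but it only makes sense if the row order carries meaning, for example if rows are sorted by issue date.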
