An optimization problem has to be solved: the threshold is adjusted in search of the optimum that balances the trade-off between the decrease in revenue and the decrease in cost.

Then, using the layout of the confusion matrix plotted in Figure 6, the four regions are labeled True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN), where "Settled" is defined as positive and "Past Due" is defined as negative. Aligned with the confusion matrices plotted in Figure 5, TP is the good loans hit, and FP is the defaults missed. We are interested in these two regions. To normalize the values, two widely used mathematical terms are defined: True Positive Rate (TPR) and False Positive Rate (FPR). Their equations are shown below:
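Written out with the standard definitions, where TP, FN, FP and TN are the counts in the four regions of the confusion matrix, the two rates are:

```latex
\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}},
\qquad
\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}
```

With "Settled" as the positive class, TPR is the share of settled loans that the model predicts as settled, and FPR is the share of past-due loans that the model also predicts as settled.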

In this application, TPR is the hit rate of good loans, and it represents the ability to earn money from loan interest; FPR is the miss rate of defaults, and it represents the chance of taking a loss.

The Receiver Operating Characteristic (ROC) curve is the most widely used plot for visualizing the performance of a classification model across all thresholds. In Figure 7 left, the ROC curve of the Random Forest model is plotted. This plot essentially shows the relationship between TPR and FPR, where one always moves in the same direction as the other, from 0 to 1. A good classification model always has its ROC curve above the red baseline, which corresponds to the "random classifier". The Area Under Curve (AUC) is another metric for evaluating the classification model besides accuracy. The AUC of the Random Forest model is 0.82 out of 1, which is decent.
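To make the construction of the ROC curve concrete, here is a minimal pure-Python sketch that sweeps the threshold over a set of predicted probabilities and integrates the resulting curve with the trapezoidal rule. The function names (`roc_points`, `auc`) and the toy labels and scores are assumptions made up for illustration; they are not the article's data or model.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, one point per candidate threshold.

    y_true: 1 for the positive class ("Settled"), 0 for the negative
            class ("Past Due").
    scores: predicted probability of the positive class.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Start above the highest score (nothing predicted positive) and move
    # down, so the curve runs from (0, 0) to (1, 1).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    fpr, tpr = [], []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

fpr, tpr = roc_points(y_true, scores)
roc_auc = auc(fpr, tpr)
print(f"AUC on the toy data: {roc_auc:.4f}")
```

A random classifier would trace the diagonal and give an area of about 0.5, which is why a curve above that baseline indicates a useful model.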

Although the ROC curve clearly shows the relationship between TPR and FPR, the threshold is only an implicit variable, so the optimization task cannot be accomplished with the ROC curve alone. Therefore, another dimension is introduced to include the threshold variable, as plotted in Figure 7 right. Since the orange TPR curve represents the ability to make money and the FPR curve represents the chance of losing it, the intuition is to find the threshold that widens the gap between the two curves as much as possible. In this case, the sweet spot is around 0.7.
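The gap-maximizing search can be sketched as follows. This is a minimal illustration, assuming a fixed grid of candidate thresholds and the same kind of toy labels and scores as a stand-in for real model output; the helper names `rates_at` and `best_gap_threshold` are hypothetical. The quantity being maximized, TPR minus FPR, is also known as Youden's J statistic.

```python
def rates_at(y_true, scores, threshold):
    """TPR and FPR when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


def best_gap_threshold(y_true, scores, thresholds):
    """Return (threshold, gap) with the largest TPR - FPR gap.

    On ties, the first (lowest) threshold in the grid is kept.
    """
    best_threshold, best_gap = None, float("-inf")
    for t in thresholds:
        tpr, fpr = rates_at(y_true, scores, t)
        gap = tpr - fpr
        if gap > best_gap:
            best_threshold, best_gap = t, gap
    return best_threshold, best_gap


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

threshold, gap = best_gap_threshold(y_true, scores, grid)
print(f"best threshold: {threshold}, TPR - FPR gap: {gap:.2f}")
```

On real data the grid would typically be finer, and several thresholds may give nearly the same gap, so the "sweet spot" is better read as a region than as a single value.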

There are limitations to this approach: the FPR and TPR are ratios. Even though they are good at visualizing the impact of the classification threshold on the predictions, we still cannot infer the exact values of the profit that different thresholds lead to. The FPR, TPR vs Threshold approach also assumes that all loans are equal (loan amount, interest due, etc.), but in reality they are not. People who default on loans may have a higher loan amount and interest to be paid back, which adds uncertainty to the modeling results.

Luckily, the detailed loan amount and interest due are available from the dataset itself.

The only thing remaining is to find a way to connect them with the threshold and the model predictions. It is not difficult to define expressions for revenue and cost. These two terms can be calculated using 5 known variables, as shown below in Table 2, by assuming the revenue is solely from the interest collected from the settled loans and the cost is solely from the total loan amount that customers default