# Python code to compute machine learning classification evaluation metrics (Accuracy, AUC-ROC, MCC) using the sklearn library

Several evaluation metrics (e.g., accuracy, AUC-ROC, Matthews correlation coefficient (MCC), precision, recall, F1 score, and the confusion matrix) are used to assess the performance of supervised machine learning classification algorithms. Which metric to choose depends on the input data. For example, if your data are highly imbalanced, “accuracy” should not be used, because a classifier that always predicts the majority class still scores high; MCC or F1 score is usually a better choice. In this post, I am not going to discuss the details of any of these metrics; I assume that you already understand them. Here, I give Python code that uses functions from the sklearn library to compute them. For the metrics that are derived directly from the confusion matrix, the code shows the formula, which should help you understand them.
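To make the point about imbalanced data concrete, here is a minimal sketch in plain Python (no sklearn needed). The 95/5 class split and the always-negative "classifier" are made up for illustration and are not part of the breast cancer example used in the rest of this post.

```python
# Hypothetical imbalanced data set: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A trivial "classifier" that always predicts the majority class (0).
y_pred = [0] * len(y_true)

# Count the four confusion-matrix cells by hand.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
# F1 written in terms of counts: 2*tp / (2*tp + fp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

print("Accuracy ---> {0}".format(accuracy))  # 0.95
print("F1 Score ---> {0}".format(f1))        # 0.0
```

Accuracy looks excellent even though not a single positive record was found, while the F1 score of zero exposes the problem.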

Here is the complete code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.metrics import matthews_corrcoef


def model_training_testing(data, label):
    # use a classification method
    clf = LogisticRegression(max_iter=5000)
    # generate 5-fold cross-validated estimates for each input data point
    # compute predicted probabilities instead of labels
    # (the result has one column per class: column 0 and column 1)
    return cross_val_predict(clf, data, label, cv=5, method='predict_proba')


def compute_classification_evaluation_metrics(probabilities, label):
    # determine y_true and y_predicted.
    y_true = []      # store true label of records
    y_pred_auc = []  # store class 1 probabilities
    y_pred_acc = []  # store predicted label
    for j in range(len(label)):
        y_true.append(label[j])
        y_pred_auc.append(probabilities[j][1])         # class 1 probability
        y_pred_acc.append(round(probabilities[j][1]))  # predicted label (0 or 1)

    # compute confusion matrix
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred_acc).ravel()
    print("tn, fp, fn, tp ---> ", tn, fp, fn, tp)

    # accuracy
    acc = accuracy_score(y_true, y_pred_acc)
    print("Accuracy ---> {0}".format(acc))

    # AUC-ROC (needs probabilities, not hard labels)
    roc = roc_auc_score(y_true, y_pred_auc)
    print("AUC-ROC ---> {0}".format(roc))

    # Matthews correlation coefficient (MCC)
    mcc = matthews_corrcoef(y_true, y_pred_acc)
    print("MCC ---> {0}".format(mcc))

    # sensitivity, recall, hit rate, or true positive rate (TPR)
    tpr = tp / (tp + fn)
    print("Recall/Sensitivity ---> {0}".format(tpr))

    # specificity, selectivity or true negative rate (TNR)
    tnr = tn / (tn + fp)
    print("Specificity ---> {0}".format(tnr))

    # precision or positive predictive value (PPV)
    ppv = tp / (tp + fp)
    print("Precision ---> {0}".format(ppv))

    # negative predictive value (NPV)
    npv = tn / (tn + fn)
    print("Negative Predictive Value ---> {0}".format(npv))

    # miss rate or false negative rate (FNR)
    fnr = 1 - tpr
    print("False Negative Rate ---> {0}".format(fnr))

    # fall-out or false positive rate (FPR)
    fpr = 1 - tnr
    print("False Positive Rate ---> {0}".format(fpr))

    # false discovery rate (FDR)
    fdr = 1 - ppv
    print("False Discovery Rate ---> {0}".format(fdr))

    # false omission rate (FOR)
    fomr = 1 - npv
    print("False Omission Rate ---> {0}".format(fomr))

    # F1 score - harmonic mean of precision and recall [2*tp/(2*tp + fp + fn)]
    f1 = 2 * ppv * tpr / (ppv + tpr)
    print("F1 Score ---> {0}".format(f1))


if __name__ == '__main__':
    """
    This program will compute several evaluation metrics that are used in classification algorithms.
    """
    # load sklearn breast cancer data
    data = load_breast_cancer()
    X = data.data
    y = data.target     # binary label 0 and 1

    # get classification results
    predicted_probs = model_training_testing(X, y)

    # compute classification evaluation metrics
    compute_classification_evaluation_metrics(predicted_probs, y)
```

This code executed successfully on Python 3.8.5 and gave the following output:

tn, fp, fn, tp ---> 195 17 11 346
Accuracy ---> 0.9507908611599297
AUC-ROC ---> 0.9908302943819036
MCC ---> 0.8943682764227554
Recall/Sensitivity ---> 0.969187675070028
Specificity ---> 0.9198113207547169
Precision ---> 0.953168044077135
Negative Predictive Value ---> 0.9466019417475728
False Negative Rate ---> 0.03081232492997199
False Positive Rate ---> 0.08018867924528306
False Discovery Rate ---> 0.04683195592286504
False Omission Rate ---> 0.05339805825242716
F1 Score ---> 0.9611111111111111
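
As a sanity check on the output above, accuracy, MCC, and F1 can be recomputed by hand from the four confusion-matrix counts alone. This is only a sketch using the standard textbook formulas and the Python standard library; it does not reproduce how sklearn computes these values internally, and the variable names are my own.

```python
import math

# Confusion-matrix counts copied from the first line of the output above.
tn, fp, fn, tp = 195, 17, 11, 346

# accuracy = correct predictions / all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)

# MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# F1 in terms of counts, as in the code comment: 2*tp / (2*tp + fp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

print("Accuracy ---> {0:.4f}".format(accuracy))  # 0.9508
print("MCC ---> {0:.4f}".format(mcc))            # 0.8944
print("F1 Score ---> {0:.4f}".format(f1))        # 0.9611
```

The rounded values agree with the Accuracy, MCC, and F1 Score lines reported above. AUC-ROC cannot be checked this way, because it depends on the predicted probabilities rather than on the thresholded labels.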

If you run into any errors, you can post them in the comment box.
