[ML] Performance Evaluation Metrics for Classification Models

JEONGHEON 2023. 1. 17. 10:08

Machine learning classification models are usually evaluated with metrics built on the confusion matrix, such as accuracy, precision, recall, and the F1 score.

Confusion Matrix

                        Predicted positive (1)   Predicted negative (0)
Actual positive (1)     TP                       FN
Actual negative (0)     FP                       TN

TP (True Positive): actually positive, and predicted positive
FN (False Negative): actually positive, but predicted negative (Type II error)
FP (False Positive): actually negative, but predicted positive (Type I error)
TN (True Negative): actually negative, and predicted negative
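To make the four cells concrete, here is a minimal sketch that counts them by hand on a small set of hypothetical labels and checks the result against scikit-learn's `confusion_matrix`. Note that scikit-learn sorts the labels as [0, 1], so its matrix comes out as [[TN, FP], [FN, TP]], with the negative class first, unlike the table above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels (1 = positive, 0 = negative)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

# Count each cell directly from its definition
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # actual positive, predicted positive
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # actual positive, predicted negative
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # actual negative, predicted positive
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # actual negative, predicted negative

# sklearn orders labels [0, 1], so ravel() yields tn, fp, fn, tp in that order
sk_tn, sk_fp, sk_fn, sk_tp = confusion_matrix(y_true, y_pred).ravel()

print(tp, fn, fp, tn)  # 3 1 1 3
```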

Accuracy

The proportion of all predictions that the model classified correctly: (TP + TN) / (TP + TN + FP + FN).

However, when the data is imbalanced, accuracy alone cannot tell you whether the model is actually classifying well, so precision and recall are used as well.
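A quick sketch of why accuracy can mislead on imbalanced data. The dataset here is hypothetical (95 negatives, 5 positives), and the "model" simply predicts negative for everything: accuracy looks excellent while recall shows that no positive sample was found.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)
# A trivial "model" that always predicts negative
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)  # (TP + TN) / total = 95 / 100
rec = recall_score(y_true, y_pred)    # TP / (TP + FN) = 0 / 5

print(acc, rec)  # 0.95 0.0
```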

Precision

Among the samples the model predicted as positive, the proportion that are actually positive: TP / (TP + FP).

It is an important metric when negative samples must not be wrongly predicted as positive, i.e., when false positives are costly.
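The formula translates directly into a small helper. This is a sketch; returning 0.0 when nothing was predicted positive is a convention assumed here (scikit-learn's `precision_score` also returns 0 in that case by default, with a warning). The counts TP = 7 and FP = 1 are taken from the confusion matrix in the Python practice section.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); returns 0.0 if no sample was predicted positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0

# TP = 7, FP = 1, as in the practice section's confusion matrix
print(precision(7, 1))  # 0.875
```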

Recall

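Among the samples that are actually positive, the proportion the model predicted as positive: TP / (TP + FN). Recall is also called sensitivity or the true positive rate (TPR).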

It is an important metric when positive samples must not be wrongly predicted as negative, i.e., when false negatives are costly.
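The same idea for recall, checked against scikit-learn. The label arrays below are rebuilt from the counts in the practice section's confusion matrix (TP = 7, FN = 2, FP = 1, TN = 20); the ordering of samples is arbitrary and does not affect the metric.

```python
import numpy as np
from sklearn.metrics import recall_score

def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); returns 0.0 if there are no actual positives."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0

# Labels rebuilt from the counts: 7 TP, 2 FN, 1 FP, 20 TN
y_true = np.array([1] * 7 + [1] * 2 + [0] * 1 + [0] * 20)
y_pred = np.array([1] * 7 + [0] * 2 + [1] * 1 + [0] * 20)

print(round(recall(7, 2), 4))                  # 0.7778
print(round(recall_score(y_true, y_pred), 4))  # 0.7778
```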

F1 Score

The harmonic mean of precision and recall: 2 × Precision × Recall / (Precision + Recall).

Precision and recall usually move in opposite directions (raising one tends to lower the other); this is called the precision/recall trade-off.

The F1 score is high only when precision and recall are both high, rather than one being skewed at the expense of the other.
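A minimal sketch of the harmonic mean and how it penalizes imbalance between the two metrics. The first case reuses the practice section's precision (7/8) and recall (7/9); the skewed pair (0.9, 0.1) is a hypothetical example.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Balanced case: values from the practice section
print(round(f1(7 / 8, 7 / 9), 4))  # 0.8235

# Skewed case: the arithmetic mean is 0.5, but F1 is much lower
p, r = 0.9, 0.1
print((p + r) / 2, round(f1(p, r), 4))  # 0.5 0.18
```

Because the harmonic mean is dominated by the smaller value, a model cannot reach a high F1 by maximizing only one of the two metrics.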

ROC Curve

A curve of the true positive rate (TPR) plotted against the false positive rate (FPR) as the classification threshold varies.

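FPR equals 1 minus the true negative rate (TNR), which is the proportion of negative samples correctly classified as negative.

Since TNR is specificity and TPR is sensitivity (recall), the ROC curve plots sensitivity against 1 - specificity.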

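The AUC (area under the curve) is used as an evaluation metric: 1.0 means a perfect classifier, and 0.5 corresponds to random guessing (the diagonal line).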
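To show how the curve is built, here is a sketch that sweeps the threshold from high to low over hypothetical scores, records one (FPR, TPR) point per threshold, and integrates with the trapezoidal rule. The labels, scores, and the `roc_points` helper are illustrative assumptions, not part of scikit-learn; the result is compared against `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and positive-class scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays, sweeping the threshold from high to low."""
    # Start at +inf (nothing predicted positive), then each distinct score, descending
    thresholds = np.r_[np.inf, np.unique(scores)[::-1]]
    pos = np.sum(y_true == 1)
    neg = np.sum(y_true == 0)
    fpr, tpr = [], []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        tpr.append(tp / pos)  # sensitivity
        fpr.append(fp / neg)  # 1 - specificity = 1 - TN / neg
    return np.array(fpr), np.array(tpr)

fpr, tpr = roc_points(y_true, scores)

# Trapezoidal rule over the (FPR, TPR) points
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

print(auc, roc_auc_score(y_true, scores))  # 0.875 0.875
```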

Python Practice

import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

plt.rc('font', family = 'AppleGothic') # Korean font on macOS
# plt.rc('font', family = 'Malgun Gothic') # Korean font on Windows
plt.rc('font', size = 12)
plt.rc('axes', unicode_minus = False) # fixes broken minus-sign rendering with these fonts

from sklearn.datasets import load_iris
iris = load_iris()

# Binary target: 1.0 if the flower is Iris virginica (target == 2), else 0.0
X, y = iris['data'], (iris['target'] == 2).astype(np.float64)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5555)

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=5555)

model.fit(X_train, y_train)
pred = model.predict(X_test)

from sklearn.metrics import confusion_matrix

# sklearn orders labels [0, 1]: rows = actual, columns = predicted, i.e. [[TN, FP], [FN, TP]]
confusion_matrix(y_test, pred)

>>> array([[20,  1],
           [ 2,  7]])

print('--------- RandomForest Score ---------')

from sklearn.metrics import precision_score, recall_score, f1_score
print('Precision Score : ', round(precision_score(y_test, pred) * 100, 2))
print('Recall Score : ', round(recall_score(y_test, pred) * 100, 2))
print('F1 Score : ', round(f1_score(y_test, pred) * 100, 2))

>>> --------- RandomForest Score ---------
    Precision Score :  87.5
    Recall Score :  77.78
    F1 Score :  82.35

from sklearn.metrics import roc_curve

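# Note: pred holds hard 0/1 labels, so the curve has only one point between (0, 0) and (1, 1).
# Passing probability scores, e.g. model.predict_proba(X_test)[:, 1], traces the full ROC curve.
fpr, tpr, thresholds = roc_curve(y_test, pred)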

def plot_roc_curve(fpr, tpr, label=None):
    plt.plot(fpr, tpr, lw=2, label=label)  # ROC curve
    plt.plot([0, 1], [0, 1], 'k--')        # diagonal = random classifier
    plt.axis([0, 1, 0, 1])
    plt.xlabel('False Positive Rate (FPR)')
    plt.ylabel('True Positive Rate (TPR)')
    
plot_roc_curve(fpr, tpr)
plt.show()

from sklearn.metrics import roc_auc_score

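# pred holds hard labels; scoring model.predict_proba(X_test)[:, 1] instead reflects the model's full ranking
print('AUC Score : ', round(roc_auc_score(y_test, pred) * 100, 2))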

>>> AUC Score :  86.51
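Because the practice code passes hard labels to `roc_curve` and `roc_auc_score`, the curve has a single interior point and the AUC reduces to (1 + TPR - FPR) / 2. With the confusion matrix above (TPR = 7/9, FPR = 1/21) that gives about 0.8651, matching the printed 86.51. The sketch below repeats the same setup and compares that hard-label AUC with the one computed from `predict_proba` scores; the exact probability-based value depends on your scikit-learn version, so no expected number is shown for it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score

# Same setup as the practice section
iris = load_iris()
X, y = iris['data'], (iris['target'] == 2).astype(np.float64)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5555)
model = RandomForestClassifier(random_state=5555).fit(X_train, y_train)

pred = model.predict(X_test)               # hard 0/1 labels
proba = model.predict_proba(X_test)[:, 1]  # estimated probability of class 1

tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
tpr, fpr = tp / (tp + fn), fp / (fp + tn)

auc_hard = roc_auc_score(y_test, pred)
auc_proba = roc_auc_score(y_test, proba)

# With hard labels the ROC is (0,0) -> (FPR,TPR) -> (1,1), so AUC = (1 + TPR - FPR) / 2
print(round(auc_hard, 4), round((1 + tpr - fpr) / 2, 4))
print(round(auc_proba, 4))
```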

Creative Commons Attribution-NonCommercial-NoDerivs