sklearn_classification 평가 방법

jinny95 2023. 4. 28. 18:46

2023. 4. 28. 18:46

Accuracy (정확도) 모든 데이터에 대해 클래스 라벨을 얼마나 잘맞혔는지 계산

Accuracy = 1 - (못맞춘 정도)

[데이터 정리]

이번 예제에서는 전복 데이터셋 파일을 가지고 진행한다. 데이터 파일은 두개로 abalone.txt는 전복 데이터를abalone_attributes는 features 데이터를 가지고 있다. 전복의 성별을 구별하는 것을 목표로 모델을 만들고 accuracy를 평가해보자.

파일 임포트

import os
from os.path import join

abalone_path = join('.','abalone.txt')   # 현재 위치, 파일이름
column_path = join('.','abalone_attributes.txt')

abalone_columns = list()
for l in open(column_path):
  abalone_columns.append(l.strip())

데이터 프레임 형태로 바꿔주기

data = pd.read_csv(abalone_path, header=None, names=abalone_columns)     # pd 데이터 프레임 형태로 가져오기
data.head()

출력>

미성숙한 전복 데이터 제거, 성별을 숫자형태로 encoding하고 성별 feature을 삭제해버린다.

data = data[data['Sex'] != 'I']
label = data['Sex'].map(lambda x: 0 if x =='M' else 1)   #lamda식으로 x의 값 기본 0, x가 M이면 넘어가고 아니면 1로 줘라
del data['Sex']

데이터 쪼개기

X_train, X_test, y_train, y_test = train_test_split(data,label,test_size=0.2, random_state = 2023)

[알고리즘 학습=>모델]

rf = RandomForestClassifier(max_depth=5)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

[정확도 평가]

정확도를 체크하는 모듈들 임포트

from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

print('Accuracy: {:.2f}'.format(accuracy_score(y_test, y_pred)))        # 값으로 원본데이터와 예측치 넣어준다
print('Precision: {:.2f}'.format(precision_score(y_test, y_pred)))  
print('Recall: {:.2f}'.format(recall_score(y_test, y_pred)))  
print('AUC: {:.2f}'.format(roc_auc_score(y_test, y_pred)))  

    출력>
    Accuracy: 0.53
    Precision: 0.53
    Recall: 0.23
    AUC: 0.52

[최적의 depth 구하기]

best_model_depth = 0
best_model_accuracy = 0

for i in [2,3,4,5,6,7,8,9,10]:
  rf = RandomForestClassifier(max_depth=i)
  rf.fit(X_train, y_train)
  y_pred = rf.predict(X_test)

  acc = accuracy_score(y_test, y_pred)

  print('Accuracy: i={} {:.2f}'.format(i, acc * 100))   # 값을 보고 가장 결과값이 높게 나오는 depth 알 수 있다, 같은 조건이면 더 단순한 i가 낮은 값을 선택

  if best_model_accuracy < acc:
    best_model_depth = i
    best_model_accuracy = acc

print('-------------------------------')
print('best_model_depth={0}, best_model_accracy={1}'.format(
    best_model_depth, best_model_accuracy))
    
    
    출력>
    Accuracy: i=2 53.09
    Accuracy: i=3 51.68
    Accuracy: i=4 52.20
    Accuracy: i=5 53.44
    Accuracy: i=6 54.14
    Accuracy: i=7 53.62
    Accuracy: i=8 53.09
    Accuracy: i=9 54.85
    Accuracy: i=10 54.32
    -------------------------------
    best_model_depth=9, best_model_accracy=0.5485008818342152

'machine_learning' 카테고리의 다른 글

sklearn_regression 2. Linear regression (0)	2023.04.28
sklearn_regression 1. 예시용 당뇨병 데이터 정리 (0)	2023.04.28
sklearn_classification 4. Decision Tree(의사결정 나무), Random Forest (0)	2023.04.28
sklearn_classification 3. Support Vector Machine (0)	2023.04.28
sklearn_classification 2. Logistic Regression (0)	2023.04.28

JINHEE's lab

sklearn_classification 평가 방법

'machine_learning' 카테고리의 다른 글

+ Recent posts

티스토리툴바