Evaluating classification results - メモ的な何か

If you have a dataset X, y at hand for training,
split it into training data and test data: X_train, y_train, X_test, y_test

from sklearn.model_selection import train_test_split
# Split into training data and test data
# Use 30% of the whole dataset as test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
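
As a quick check of what the split produces, here is a minimal sketch. The memo does not say which dataset X, y is, so the Iris dataset bundled with scikit-learn is used here purely as an assumed stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: Iris (150 samples, 4 features) stands in for the memo's X, y.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples as test data; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Show how many samples ended up on each side of the split.
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

With 150 samples and test_size=0.3, 45 samples go to the test set and the remaining 105 to the training set.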

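Next, to evaluate the classification results, print the number of misclassified samples and the accuracy. Here y_pred holds the classifier's predictions for X_test.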

# Number of samples the classifier misclassified
print("Misclassified samples: %d" % (y_test != y_pred).sum())
# Accuracy of the classifier
from sklearn.metrics import accuracy_score
print("Accuracy: %.2f" % accuracy_score(y_test, y_pred))
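
The snippet above needs a y_pred from some trained classifier. The following is a rough end-to-end sketch, not the setup used in this memo: the Iris dataset and LogisticRegression are both assumptions chosen only to make it runnable.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: Iris stands in for the memo's X, y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Assumption: any classifier with fit/predict would do; LogisticRegression
# is just an example and is not named in the memo.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Number of misclassified test samples and the accuracy.
n_errors = (y_test != y_pred).sum()
acc = accuracy_score(y_test, y_pred)
print("Misclassified samples: %d" % n_errors)
print("Accuracy: %.2f" % acc)
```

The two numbers are tied together: the accuracy equals 1 minus the misclassified count divided by the number of test samples.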

The operations related to misclassification look roughly like this:
[Figure: f:id:umashika5555:20171028023522p:plain]
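
To make the element-wise comparison concrete, here is a small sketch with NumPy only. The label arrays are made-up values for illustration, not data from this memo.

```python
import numpy as np

# Made-up labels for illustration (assumption, not real results).
y_test = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])

# Element-wise comparison gives a boolean array: True where the prediction is wrong.
mismatch = y_test != y_pred

# True counts as 1, so the sum is the number of misclassified samples.
n_errors = mismatch.sum()

# Positions of the misclassified samples.
error_idx = np.flatnonzero(mismatch)

# Fraction of matching labels, which is the same quantity accuracy_score computes.
accuracy = (y_test == y_pred).mean()

print(mismatch)
print("Misclassified samples: %d" % n_errors)
print("Misclassified indices:", error_idx)
print("Accuracy: %.2f" % accuracy)
```

Here samples 1 and 5 are wrong, so 2 of the 6 samples are misclassified and the accuracy is 4/6.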