[Scikit-learn] LabelEncoder() :: Labelling

Python/Scikit-learn 2019. 11. 20.

When building machine learning models, data made up of strings sometimes has to be converted to numbers. In that case, Scikit-learn's LabelEncoder makes it easy to label categorical data as numeric data.

import pandas as pd

# Example data: one row per fruit
fruit = pd.DataFrame({'name':['apple', 'banana', 'cherry', 'durian'],
                      'color':['red', 'yellow', 'red', 'green']})
fruit

This creates an example DataFrame named fruit.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(fruit['name'])                         # learn the unique labels in the column
fruit['name'] = le.transform(fruit['name'])   # replace each label with its integer code
fruit
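
To check which number each name was mapped to, the fitted encoder can be inspected. The following is a minimal sketch that rebuilds the same fruit data; the variable names `encoded`, `mapping`, and `decoded` are only illustrative and are not part of the original example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

fruit = pd.DataFrame({'name': ['apple', 'banana', 'cherry', 'durian'],
                      'color': ['red', 'yellow', 'red', 'green']})

le = LabelEncoder()
encoded = le.fit_transform(fruit['name'])   # fit and transform in one step

# classes_ holds the sorted unique labels; a label's position is its code
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform converts the integer codes back to the original labels
decoded = le.inverse_transform(encoded)
print(list(decoded))
```

Because the codes follow the sorted order of the labels, the same label always gets the same number for a given fitted encoder, and `inverse_transform` can restore the original strings when needed.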

The difference is that One-Hot Encoding creates several columns made up of 0s and 1s, whereas LabelEncoder converts each string to a number and represents the result in a single column.
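
The difference can be seen by encoding the color column both ways. This is a rough sketch: it uses `pd.get_dummies` for the one-hot side, which is one of several ways to one-hot encode and is an assumption here, not something the original post uses. The names `label_encoded` and `one_hot` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

fruit = pd.DataFrame({'name': ['apple', 'banana', 'cherry', 'durian'],
                      'color': ['red', 'yellow', 'red', 'green']})

# Label encoding: a single column of integer codes
label_encoded = LabelEncoder().fit_transform(fruit['color'])
print(label_encoded)

# One-hot encoding: one 0/1 column per distinct color
one_hot = pd.get_dummies(fruit['color'], dtype=int)
print(one_hot)
```

With three distinct colors, the one-hot result has three columns and four rows, while the label-encoded result is a single array of four integers.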
