1 Discretization
Converting Numeric Data into Categorical Data
How to determine the boundaries between classes?
- Natural boundaries
- Equi-width ranges: each range [a,b] is chosen so that the width b - a is the same for every range
- Equi-log ranges: each range [a,b] is chosen so that log(b) - log(a) is the same for every range
- Equi-depth ranges: each range has an equal number of records
  - First sort the data
  - Select the boundaries from the sorted data such that each range contains an equal number of observations
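The three automatic strategies above can be compared with a minimal sketch. The sample values are hypothetical and chosen only for illustration; `pd.cut` with an integer bin count is used for equi-width, `np.geomspace` edges for equi-log, and `pd.qcut` for equi-depth. This is one possible way to do it, not necessarily the method intended by these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values (assumption, not from the course data)
values = pd.Series([1, 2, 4, 8, 16, 32, 64, 128])

# Equi-width: 3 ranges with the same width b - a
equi_width = pd.cut(values, bins=3)

# Equi-log: edges evenly spaced on a log scale, so log(b) - log(a) is constant
log_edges = np.geomspace(float(values.min()), float(values.max()), num=4)
equi_log = pd.cut(values, bins=log_edges, include_lowest=True)

# Equi-depth: boundaries are quantiles of the sorted data,
# so each range holds the same number of observations
equi_depth = pd.qcut(values, q=4)

print(equi_width.value_counts(sort=False))
print(equi_log.value_counts(sort=False))
print(equi_depth.value_counts(sort=False))
```

On skewed data like this, equi-width puts most records in the first range, while equi-depth spreads them evenly.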
1.1 Example
Read the data file
import pandas as pd
df = pd.read_csv('Lung Capacity.csv')
Discretize Height into 6 categories using bins of unequal width
bins = [0, 50, 55, 60, 65, 70, 100]
group_names = ['A', 'B', 'C', 'D', 'E', 'F']
c1 = pd.cut(df['height'], bins, labels=group_names)
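To see how `pd.cut` treats values that fall exactly on a boundary, here is a small self-contained sketch using the same bins and labels. The heights are hypothetical stand-ins for the `height` column, since the CSV file is not reproduced here.

```python
import pandas as pd

# Hypothetical heights (assumption); the real values come from the CSV file
heights = pd.Series([48.0, 50.0, 52.5, 60.0, 66.1, 72.3])

bins = [0, 50, 55, 60, 65, 70, 100]
group_names = ['A', 'B', 'C', 'D', 'E', 'F']

# By default each interval is closed on the right: (0, 50], (50, 55], ...
c1 = pd.cut(heights, bins, labels=group_names)

print(c1.tolist())
print(c1.value_counts(sort=False))  # number of records per category
```

Because the intervals are right-closed by default, a height of exactly 50 lands in category A and a height of exactly 60 lands in C. Passing `right=False` would make the intervals left-closed instead.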
2 One Hot Encoding
Label Encoder
- Converts a categorical variable into numerical values
- Codes start from 0: 0, 1, 2, …
- Codes are assigned in alphabetical order of the category names
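The label-encoding behaviour described above can be sketched with scikit-learn's `LabelEncoder`, on the assumption that this is the encoder the notes refer to. The species list is a hypothetical example.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical values (assumption, for illustration only)
species = ['virginica', 'setosa', 'versicolor', 'setosa']

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ holds the categories in sorted (alphabetical) order;
# the code of a category is its position in this array
print(list(encoder.classes_))
print(codes.tolist())

# The mapping can be reversed to recover the original names
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Note that the integer codes imply an ordering that the categories may not have, which is one reason to prefer one-hot encoding for nominal variables.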
2.1 Example
Load the libraries
import numpy as np
import pandas as pd
from sklearn import datasets
Read the Dataset
iris = datasets.load_iris()
print(type(iris))          # dict-like Bunch object
features = iris["data"]    # NumPy array with one row per flower
print(type(features))
print(features[:5, :])     # first five rows
print('---------------------------')
labels = iris["target"]    # integer class codes 0, 1, 2
print(type(labels))
print(labels)
Encode the response variable (labels, i.e. species) as one-hot vectors
# dtype=int gives 0/1 columns; recent pandas versions default to booleans
labels_onehot_dataframe = pd.get_dummies(labels, prefix='species', dtype=int)
one_hot = np.array(labels_onehot_dataframe)
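As a quick check of what the one-hot array looks like, here is a minimal sketch on a short hypothetical label vector, so it runs without loading the iris data. It also shows that the original integer labels can be recovered with `argmax`.

```python
import numpy as np
import pandas as pd

# Hypothetical integer labels (assumption), standing in for iris["target"]
labels = np.array([0, 1, 2, 1, 0, 2])

# One column per class; each row has a single 1 in its class column
one_hot = pd.get_dummies(labels, prefix='species', dtype=int).to_numpy()
print(one_hot)
print(one_hot.shape)  # (number of samples, number of classes)

# The position of the 1 in each row gives back the original label
recovered = one_hot.argmax(axis=1)
print(recovered)
```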