February 28, 2023

9 Data Prep: Discretization & One-Hot Encoding


1 Discretization

Converting Numeric Data into Categorical Data

How to determine the boundaries between classes?
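Two common answers to this question are equal-width binning (every bin covers the same range of values) and equal-frequency binning (every bin holds roughly the same number of rows). The sketch below contrasts them with `pd.cut` and `pd.qcut`; the sample values and the `low`/`mid`/`high` labels are made up for illustration and do not come from the course data.

```python
import pandas as pd

# Hypothetical sample values with one large outlier (illustration only).
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100])
names = ['low', 'mid', 'high']

# Equal-width: 3 bins that each span the same range of values.
equal_width = pd.cut(values, bins=3, labels=names)

# Equal-frequency: 3 bins that each hold about the same number of rows.
equal_freq = pd.qcut(values, q=3, labels=names)

# Compare how many rows land in each bin under the two strategies.
print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```

With skewed data like this, the single outlier stretches the equal-width bins so that almost every row lands in the first one, while the equal-frequency bins stay balanced.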


1.1 Example

Read the data file

import pandas as pd
df = pd.read_csv('Lung Capacity.csv')

Discretize height into 6 categories with unequal bin widths

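# Bin edges: 7 edges define 6 bins of unequal width
bins = [0, 50, 55, 60, 65, 70, 100]
# One label per bin, in the same order as the bins
group_names = ['A', 'B', 'C', 'D', 'E', 'F']

# Assign each height to its bin; the result is a categorical Series
c1 = pd.cut(df['height'], bins, labels=group_names)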
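To see how `pd.cut` treats values on or outside the bin edges, here is a small sketch that reuses the same `bins` and `group_names`. The heights are hypothetical stand-ins, since the contents of 'Lung Capacity.csv' are not shown here.

```python
import pandas as pd

# Hypothetical heights; the real values come from 'Lung Capacity.csv'.
heights = pd.Series([48.0, 50.0, 50.1, 62.5, 70.0, 71.3, 120.0])

bins = [0, 50, 55, 60, 65, 70, 100]
group_names = ['A', 'B', 'C', 'D', 'E', 'F']

# By default each bin is right-inclusive: (0, 50], (50, 55], ..., (70, 100].
groups = pd.cut(heights, bins, labels=group_names)

# Values outside the outermost edges (here 120.0) become NaN.
print(groups.tolist())

# Count how many rows fall in each category, in bin order.
print(groups.value_counts().sort_index())
```

A height of exactly 50 falls in category A, not B, because the right edge of each bin is included. Passing `right=False` to `pd.cut` flips this so that the left edge is included instead.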

2 One-Hot Encoding

Label Encoder
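Label encoding maps each category to a single integer code, whereas one-hot encoding gives each category its own 0/1 column. A minimal sketch with scikit-learn's `LabelEncoder` follows; the species names in the array are hypothetical sample input, not data loaded from a file.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical species names (illustration only).
species = np.array(['setosa', 'virginica', 'versicolor', 'setosa'])

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

print(encoder.classes_)   # the distinct categories, in sorted order
print(codes)              # one integer code per input value

# The mapping is reversible.
print(encoder.inverse_transform(codes))
```

The integer codes imply an ordering (0 < 1 < 2) that the categories may not have, which is one reason to prefer one-hot encoding for nominal variables.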

2.1 Example

Load the libraries

import numpy as np
import pandas as pd

from sklearn import datasets

Read the Dataset

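# load_iris() returns a dictionary-like Bunch object
iris = datasets.load_iris()
print(type(iris))

# Feature matrix: a NumPy array with one row per flower
features = iris["data"]
print(type(features))
print(features[:5,:])   # first 5 rows, all columns

#############################################
print('---------------------------')

# Target vector: integer class labels (0, 1, 2), one per flower
labels = iris["target"]
print(type(labels))
print(labels)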

One-hot encode the response variable (the labels, i.e. the species)

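# One 0/1 column per class: species_0, species_1, species_2
# dtype=int keeps the values as integers (newer pandas versions default to booleans)
labels_onehot_dataframe = pd.get_dummies(labels, prefix='species', dtype=int)
# Convert the DataFrame to a NumPy array with one row per sample
one_hot = np.array(labels_onehot_dataframe)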
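As a cross-check on what one-hot encoding produces, the same kind of array can be built with NumPy alone by indexing an identity matrix, and decoded again with `argmax`. This is an alternative sketch, not the method used above; the short `labels` array is a hypothetical stand-in for `iris["target"]`.

```python
import numpy as np

# Hypothetical integer class labels in the same 0/1/2 form as iris["target"].
labels = np.array([0, 2, 1, 0])
n_classes = labels.max() + 1

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(n_classes, dtype=int)[labels]
print(one_hot)

# argmax along each row recovers the original integer labels.
decoded = one_hot.argmax(axis=1)
print(decoded)
```

Each row contains exactly one 1, so the row sums are all 1 and the number of columns equals the number of classes.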
