1 Similarity-Based Learning

Compute the distance matrices between objects
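
As a minimal sketch of this step, the snippet below builds a pairwise distance matrix in plain Python. It assumes the objects are equal-length numeric vectors and that Euclidean distance is the chosen measure; the notes do not fix either choice. The names `euclidean`, `distance_matrix`, and `points` are illustrative, and the sample points are made up.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(objects, dist=euclidean):
    """Return an n x n list of lists holding all pairwise distances."""
    n = len(objects)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(objects[i], objects[j])
            matrix[i][j] = d
            matrix[j][i] = d  # distance is symmetric, so mirror it
    return matrix

# Made-up 2-D points, for illustration only
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(points)
for row in D:
    print(row)
```

Any other distance function with the same two-argument signature (for example Manhattan distance) could be passed as `dist` instead.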

2 k Nearest Neighbor (kNN) Model
2.1 Pros and Cons of kNN
| Pros | Cons |
|---|---|
| Simple and Effective | Does not produce a model, limiting the ability to understand how the features are related to the class |
| Makes no assumption about the underlying data distribution (non-parametric) | Requires selection of an appropriate value of 'k' |
2.2 Example
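
One possible toy example, sketched in plain Python: classify a query point by majority vote among its k nearest training points. The training data, labels, and the function name `knn_predict` are hypothetical and chosen only for illustration; Euclidean distance is an assumption.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its distance to the query
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])  # nearest first
    nearest_labels = [label for _, label in distances[:k]]
    # Most frequent label wins; on a tie, the label seen first
    # (that is, belonging to the closer neighbor) is returned
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up training set: two well-separated classes
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
           (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (2.0, 2.0), k=3))
print(knn_predict(train_X, train_y, (6.0, 7.0), k=3))
```

Note that nothing is fitted ahead of time: every prediction scans the stored training data, which is why kNN is said not to produce a model.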






3 kNN Model Assessment


4 Data Normalization: Standardization & Scaling
Suppose we have two variables:
- Height: varies from 4 to 7 feet
- Net Worth: 100B
If we use both variables in a model
- Net Worth will dominate any distance calculation because its values are orders of magnitude larger than Height's
Solution
- Standardize
- Scale
4.1 Data Standardization and Scaling
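
A minimal sketch of the two remedies, assuming "standardize" means the z-score transform (subtract the mean, divide by the standard deviation) and "scale" means min-max scaling to the range [0, 1]. The function names and the sample values are made up for illustration.

```python
import statistics

def standardize(values):
    """Z-score: (value - mean) / population standard deviation.

    Assumes the values are not all identical (otherwise sd is 0).
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1.

    Assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Made-up observations: heights in feet, net worths in dollars
heights = [4.0, 5.0, 5.5, 6.0, 7.0]
net_worths = [2e4, 5e5, 1e6, 3e9, 1e11]

print(min_max_scale(heights))
print(min_max_scale(net_worths))
print(standardize(heights))
print(standardize(net_worths))
```

After either transform both variables live on a comparable numeric range, so neither one dominates a distance calculation purely because of its units.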
