February 22, 2023

4 Introduction to Regression

1 Definition: Linear Regression

Two-variable regression - how a response variable y changes as the predictor (explanatory) variable x changes.

y = \beta_1x + c

Multiple regression - how a response variable y changes as the predictor (explanatory) variables x1 , x2 , … xn change
y = c + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n
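As a minimal sketch of multiple regression, the coefficients can be found by least squares on a design matrix whose first column is all ones (for the intercept). The data below are hypothetical, constructed so the exact coefficients are known:

```python
import numpy as np

# Hypothetical data: y depends linearly on two predictors x1, x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0])
y = 2.0 + 3.0 * x1 - 1.0 * x2  # c = 2, beta1 = 3, beta2 = -1

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution recovers the intercept and both coefficients.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
c, b1, b2 = coef
```

Because the data were generated exactly from the model, the recovered coefficients match the ones used to build y.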

Single Variable Polynomial Regression: First degree to Fifth Degree

The concept can be extended to polynomial regression

\begin{align} y &= c + a_1x \\ y &= c + a_1x + a_2x^2 \\ y &= c + a_1x + a_2x^2 + a_3x^3 \\ y &= c + a_1x + a_2x^2 + a_3x^3 + a_4x^4 \\ y &= c + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5 \\ y &= c + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5 + \dots + a_nx^n\\ \end{align}
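A polynomial of any of the degrees above can be fitted in one call with `numpy.polyfit`, which returns the coefficients from the highest degree down. The quadratic data here are hypothetical, built with known coefficients so the fit can be checked:

```python
import numpy as np

# Hypothetical noise-free data from y = 1 + 2x + 0.5x^2 (c=1, a1=2, a2=0.5).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 0.5 * x**2

# Fit a degree-2 polynomial; coefficients come back highest degree first.
coeffs = np.polyfit(x, y, deg=2)
a2, a1, c = coeffs
```

Raising `deg` fits the higher-degree models in the list above with the same call.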

2 Regression Strategy: Ordinary Least Squares (OLS)

The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

image-20230222092629577

Suppose regression equation is

y = mx+b

\begin{align} \text{Residual} &= y_i-(mx_i+b)\\ \text{Residual}^2 &= (y_i-(mx_i+b))^2\\ \text{Residual Sum of Squares} = \text{RSS} &= \sum_{i=1}^{N}(y_i-(mx_i+b))^2 \end{align}
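The RSS definition above translates directly into code. This is a minimal sketch with made-up data points:

```python
def rss(x, y, m, b):
    """Residual sum of squares for the line y = m*x + b."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data lying exactly on y = 2x: the line m=2, b=0 fits perfectly.
x = [1, 2, 3]
y = [2, 4, 6]
perfect = rss(x, y, 2, 0)   # 0: every residual is zero
worse = rss(x, y, 1, 0)     # residuals 1, 2, 3 -> RSS = 1 + 4 + 9 = 14
```

OLS picks the m and b that minimize this quantity over all possible lines.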

image-20230306005620610

image-20230306005637510

\begin{align} m &= \frac{\sum(y_ix_i)-\frac{\sum(y_i)\sum(x_i)}{N}}{\sum(x_i^2)-\frac{\sum(x_i)^2}{N}}\\ b &= (\frac{\sum(y_i)}{N}-m\frac{\sum(x_i)}{N}) \end{align}
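The closed-form summation formulas for m and b above can be computed directly. A minimal sketch, checked on hypothetical data lying exactly on y = 2x + 1:

```python
def ols_line(x, y):
    """Closed-form OLS slope m and intercept b via the summation formulas."""
    N = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    m = (sxy - sx * sy / N) / (sxx - sx ** 2 / N)
    b = sy / N - m * sx / N
    return m, b

# Points on y = 2x + 1, so OLS should recover m = 2, b = 1 exactly.
m, b = ols_line([1, 2, 3], [3, 5, 7])
```

The intercept formula says the fitted line always passes through the point of means (mean of x, mean of y).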

3 Regression: Supervised Learning Method

Single-split model assessment methodology

image-20230306002607033
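A single-split assessment holds out part of the data for testing and trains on the rest. This is a minimal sketch of such a split (the helper name and the 20% test fraction are illustrative; in practice a library routine such as scikit-learn's `train_test_split` would be used):

```python
import random

def single_split(x, y, test_frac=0.2, seed=0):
    """Randomly hold out test_frac of the data; train on the remainder."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train, test = idx[:cut], idx[cut:]
    return ([x[i] for i in train], [y[i] for i in train],
            [x[i] for i in test], [y[i] for i in test])

# Hypothetical data: 10 points, so an 80/20 split gives 8 train, 2 test.
xs = list(range(10))
ys = [2 * v for v in xs]
x_tr, y_tr, x_te, y_te = single_split(xs, ys, test_frac=0.2, seed=1)
```

The model is then fitted on the training portion and its error measured on the held-out portion, which it never saw during fitting.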

4 Nearest Neighbor Regression

A method for predicting a numerical variable y, given a value of x:

Graph of Averages
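The graph-of-averages idea can be sketched as k-nearest-neighbor regression: to predict y at a new x, average the y values of the k training points whose x is closest. The data and k are hypothetical:

```python
def knn_predict(x_train, y_train, x_new, k=3):
    """Predict y at x_new as the mean y of the k nearest x values."""
    nearest = sorted(zip(x_train, y_train),
                     key=lambda p: abs(p[0] - x_new))[:k]
    return sum(yv for _, yv in nearest) / k

# Hypothetical data: the 3 nearest neighbors of x=2 are x=2, 1, 3,
# so the prediction is the average of y = 4, 2, 6.
pred = knn_predict([1, 2, 3, 10], [2, 4, 6, 20], 2.0, k=3)
```

Unlike a fitted line, this local-averaging approach makes no assumption about the overall shape of the relationship between x and y.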

5 Solution for Regression Line

image-20230306010108743

image-20230306010118579

image-20230306010129262

6 Example

6.1 Galton

galton.ipynb galton.csv

6.2 Using scikit-learn and statsmodels

P1_Dictionary.ipynb

6.3 FULL CODE_Regression_Model

FULL CODE_Regression_Model_Advertising.ipynb

#DS #ML #DataMining