February 25, 2023

6 Regression Quality

Metrics to measure the Quality of Regression

1 Measure of Regression


Total Deviation = Explained Deviation + Unexplained Deviation

SST = SSR + SSE
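As a quick check of the decomposition, here is a minimal sketch that fits a least-squares line and computes the three sums of squares. The data values and the function name `sums_of_squares` are made up for illustration; they are not from the original notes.

```python
def sums_of_squares(x, y):
    """Fit a least-squares line and return (SST, SSR, SSE)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Least-squares slope and intercept for simple linear regression.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    y_hat = [intercept + slope * xi for xi in x]

    sst = sum((yi - y_mean) ** 2 for yi in y)              # total deviation
    ssr = sum((yh - y_mean) ** 2 for yh in y_hat)          # explained deviation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained deviation
    return sst, ssr, sse


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sst, ssr, sse = sums_of_squares(x, y)
print("SST:", sst, "SSR:", ssr, "SSE:", sse)
print("SSR + SSE:", ssr + sse)
```

Note that the identity SST = SSR + SSE holds for a least-squares line with an intercept; it is not guaranteed for an arbitrary line drawn through the data.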


1.1 Coefficient of Determination r^2

Definition

r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{SSR}{SST}

How to compute r^2

r^2 = (\text{correlation coefficient})^2 = 1-\frac{SSE}{SST}

The closer r^2 is to 1, the better the fit. For a perfect fit, SSE = 0 and r^2 = 1.
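The sketch below computes r^2 in two ways, as 1 - SSE/SST and as the square of the Pearson correlation coefficient, to show that they agree for a simple linear regression. The helper names (`fit_line`, `r_squared`, `pearson_r`) and the data are hypothetical, chosen only for this example.

```python
import math


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def r_squared(x, y):
    """Coefficient of determination, computed as 1 - SSE/SST."""
    slope, intercept = fit_line(x, y)
    y_mean = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - sse / sst


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("1 - SSE/SST:  ", r_squared(x, y))
print("correlation^2:", pearson_r(x, y) ** 2)
```

Passing perfectly linear data, such as `r_squared([1, 2, 3], [2, 4, 6])`, gives a value of 1 up to floating-point rounding, since SSE is 0 in that case.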

2 Standard Error

The standard error of the regression is the standard deviation of the residuals, that is, of the differences between the actual values of the response variable and the values predicted by the regression line.

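Degrees of freedom

A simple linear regression estimates two parameters (the intercept and the slope), so the residuals have n-2 degrees of freedom.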

s_e=\sqrt{\frac{\sum(y_i-\hat{y}_i)^2}{n-2}}=\sqrt{\frac{\sum(\text{residuals})^2}{n-2}}=\sqrt{\frac{SSE}{n-2}}
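A minimal sketch of the formula, assuming the predicted values are already available. The function name `standard_error`, the data, and the line 2.2 + 0.6x (the least-squares fit for this made-up data) are illustrative assumptions.

```python
import math


def standard_error(y, y_hat):
    """Standard error of the regression: sqrt(SSE / (n - 2))."""
    n = len(y)
    if n <= 2:
        raise ValueError("need more than 2 observations for n - 2 degrees of freedom")
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (n - 2))


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Predictions from the least-squares line for this data: y_hat = 2.2 + 0.6 * x.
y_hat = [2.2 + 0.6 * xi for xi in x]

print("s_e:", standard_error(y, y_hat))
```

The result is in the same units as the response variable, so it can be read as a typical distance between an observation and the regression line.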

3 Verify that the residuals are normally distributed

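If the residuals are not approximately normally distributed, the regression's standard errors and significance tests are not reliable, and the regression should not be treated as valid.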

3.1 Histogram

Plot a histogram of the residuals and check whether it looks like a normal (bell-shaped) distribution.
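As a rough sketch of this check without any plotting library, the snippet below bins the values and prints a text histogram. The helper `histogram_counts` is hypothetical, and the residuals here are simulated from a normal distribution as a stand-in for real regression residuals.

```python
import random


def histogram_counts(values, bins=7):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Simulated residuals (assumption: stand-in for real regression residuals).
random.seed(0)
residuals = [random.gauss(0, 1) for _ in range(200)]

# One row per bin; a roughly symmetric bulge in the middle suggests normality.
for count in histogram_counts(residuals, bins=9):
    print("#" * count)
```

The shape of the output depends on the number of bins chosen, so it is worth trying a few values before drawing a conclusion.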

Problem with this technique

3.2 QQ Plot - Quantile-Quantile plot

The quantiles of the data are plotted against the quantiles of a theoretical normal distribution. If the points fall approximately on a straight line, the data is approximately normally distributed.
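The points of a QQ plot can be computed with the standard library alone, as in the sketch below. It assumes the plotting positions (i - 0.5)/n, which is one common convention among several; the names `qq_points` and `straightness` are hypothetical, and the data is simulated. The correlation between the two sets of quantiles is used here only as a rough numeric stand-in for "looks like a straight line".

```python
import math
import random
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Return (theoretical quantile, standardized sample quantile) pairs."""
    n = len(data)
    m, s = mean(data), stdev(data)
    sample_q = sorted((v - m) / s for v in data)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical_q, sample_q))


def straightness(points):
    """Pearson correlation of the QQ points; values near 1 mean a near-straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Simulated data (assumption: stand-in for real regression residuals).
random.seed(0)
normal_like = [random.gauss(0, 1) for _ in range(200)]
skewed = [random.expovariate(1.0) for _ in range(200)]

print("normal-like sample:", straightness(qq_points(normal_like)))
print("skewed sample:     ", straightness(qq_points(skewed)))
```

To draw the actual plot, the pairs returned by `qq_points` can be passed to any scatter-plot tool, with the theoretical quantiles on the horizontal axis.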

Testing Procedure

QQ plot is

image-20230307133050546
