Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. It aims to find the best-fitting straight line (the regression line) that describes how the dependent variable changes as the independent variables change.
To ensure the validity of linear regression, the following assumptions should be met:
- Linearity: The relationship between the independent and dependent variables should be linear.
- Independence: Observations should be independent of each other.
- Homoscedasticity: The residuals (the differences between the observed and predicted values) should have constant variance.
- Normality: The residuals should be approximately normally distributed.
- No multicollinearity: Independent variables should not be highly correlated with each other.
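As a minimal sketch of checking the last assumption, pairwise correlations between candidate independent variables can flag multicollinearity (the data here is synthetic and the 0.9 threshold is an illustrative choice, not a rule from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.01, size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)                       # independent of x1

# Pairwise correlation matrix: entries near +/-1 signal multicollinearity.
corr = np.corrcoef(np.vstack([x1, x2, x3]))
print(corr[0, 1] > 0.9)   # x1 and x2 are highly correlated
print(abs(corr[0, 2]) > 0.9)  # x1 and x3 are not
```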
The equation for a simple linear regression model with one independent variable is:

$$y = \beta_0 + \beta_1 x + \epsilon$$

where:

- $$y$$ is the dependent variable.
- $$x$$ is the independent variable.
- $$\beta_0$$ is the y-intercept (the value of $$y$$ when $$x$$ is 0).
- $$\beta_1$$ is the slope of the regression line (the change in $$y$$ for a one-unit change in $$x$$).
- $$\epsilon$$ is the error term (the difference between the observed and predicted values).
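The simple model above can be sketched directly in code by simulating data from it (the parameter values and noise scale below are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
beta_0, beta_1 = 1.5, 2.0  # assumed "true" intercept and slope

x = rng.uniform(0, 10, size=200)
epsilon = rng.normal(scale=1.0, size=200)  # error term
y = beta_0 + beta_1 * x + epsilon          # y = beta_0 + beta_1 * x + epsilon

print(y.shape)  # one observation of y per observation of x
```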
For multiple linear regression with $$n$$ independent variables, the equation is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon$$
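As a sketch (the data and true coefficients are assumed for illustration), the multiple-regression coefficients can be estimated in closed form with ordinary least squares, for example via NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500
X = rng.normal(size=(m, 2))             # two independent variables
true_beta = np.array([3.0, 1.0, -2.0])  # [intercept, beta_1, beta_2]
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=m)

# Prepend a column of ones so the intercept is estimated with the slopes.
X_design = np.hstack([np.ones((m, 1)), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)  # close to [3.0, 1.0, -2.0]
```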
The loss function measures how well the linear regression model fits the data. The most commonly used loss function is the Mean Squared Error (MSE), defined as:

$$\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2$$
where:

- $$m$$ is the number of observations.
- $$y_i$$ is the actual value of the dependent variable for the $$i$$-th observation.
- $$\hat{y}_i$$ is the predicted value of the dependent variable for the $$i$$-th observation.
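The MSE definition above amounts to one line of NumPy; the small arrays here are made up purely to show the arithmetic:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # actual values y_i
y_pred = np.array([2.5, 5.0, 7.5, 9.0])  # predicted values y_hat_i

m = len(y_true)
mse = np.sum((y_true - y_pred) ** 2) / m  # (0.25 + 0 + 0.25 + 0) / 4
print(mse)  # 0.125
```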
Gradient descent is commonly used to find the parameters that minimize the loss function:

1. Initialize the parameters $$\beta_0$$ and $$\beta_1$$ (for example, to zeros or small random values).
2. Compute the gradients of the loss function with respect to the parameters. For simple linear regression, the gradients are:

$$\frac{\partial \text{MSE}}{\partial \beta_0} = -\frac{2}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)$$

$$\frac{\partial \text{MSE}}{\partial \beta_1} = -\frac{2}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i) x_i$$

3. Update the parameters using the gradients and a learning rate ($$\alpha$$):

$$\beta_0 := \beta_0 - \alpha \frac{\partial \text{MSE}}{\partial \beta_0}$$

$$\beta_1 := \beta_1 - \alpha \frac{\partial \text{MSE}}{\partial \beta_1}$$

For multiple linear regression, the parameters are updated similarly for each coefficient $$\beta_j$$.

4. Repeat steps 2 and 3 until the parameters converge to values that minimize the loss function.
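The steps above can be sketched end to end for the simple model; the synthetic data, learning rate, and iteration count below are illustrative assumptions, not prescribed values:

```python
import numpy as np

# Synthetic data from an assumed true model y = 4.0 + 2.5 * x + noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=300)
y = 4.0 + 2.5 * x + rng.normal(scale=0.2, size=300)

beta_0, beta_1 = 0.0, 0.0  # step 1: initialize the parameters
alpha = 0.05               # learning rate
m = len(x)

for _ in range(5000):      # step 4: repeat until (approximate) convergence
    y_hat = beta_0 + beta_1 * x
    # step 2: gradients of the MSE with respect to each parameter
    grad_b0 = -2.0 / m * np.sum(y - y_hat)
    grad_b1 = -2.0 / m * np.sum((y - y_hat) * x)
    # step 3: update each parameter against its gradient
    beta_0 -= alpha * grad_b0
    beta_1 -= alpha * grad_b1

print(round(beta_0, 1), round(beta_1, 1))  # close to 4.0 and 2.5
```

A fixed iteration count stands in for a real convergence test; in practice one would stop when the parameter updates (or the change in MSE) fall below a tolerance.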