ML algorithms 1.01: Linear Regression

Anupam Misra · Published in Geek Culture · May 24, 2021


Introduction

This is usually the first algorithm anybody learns when they step into the world of machine learning, and its main advantage lies in its simplicity. A linear model has high bias and low variance. Normalizing the features helps the algorithm converge faster when using gradient descent. Missing values need to be imputed or dropped before fitting a linear regression. Outliers distort the best-fit line, so they should be detected (for example with a box plot) and filtered out.
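A minimal preprocessing sketch along these lines, assuming a pandas DataFrame df with a numeric feature column "income" (the column name, data values, and thresholds are illustrative, not from the original post):

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# hypothetical data frame with one numeric feature and a continuous target
df = pd.DataFrame({"income": [35, 40, np.nan, 42, 39, 250],
                   "target": [1.2, 1.5, 1.4, 1.6, 1.5, 1.9]})

# impute missing values (dropping them is the other option mentioned above)
df["income"] = df["income"].fillna(df["income"].median())

# filter outliers with the box-plot (IQR) rule
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# normalize the feature so gradient descent converges faster
df[["income"]] = StandardScaler().fit_transform(df[["income"]])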

Assumptions

  • The relation between target and predictor variables is linear
  • The features are independent of each other
  • Homoskedasticity: the error terms have the same finite variance across all observations
  • There is no relationship between the residuals and the target variable
  • The residuals are normally distributed
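A quick way to eyeball the homoskedasticity and normality assumptions is to inspect the residuals of a fitted model; a minimal sketch, assuming training arrays X_train and y_train like those used in the example code at the end:

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X_train, y_train)
residuals = y_train - model.predict(X_train)

# residuals vs. fitted values: a shapeless cloud suggests linearity and constant variance
plt.scatter(model.predict(X_train), residuals)
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()

# histogram of residuals: should look roughly normal
plt.hist(residuals, bins=30)
plt.show()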

Advantages

  • Performs very well when the assumptions hold
  • High model interpretability

Disadvantages

  • Real-world data rarely satisfies all of the assumptions
  • The model is prone to overfitting

Model

Let X be the feature set with m samples and n features. Let y be the continuous response.

Parameters of the model are represented as:
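In one standard notation, with \beta_0 as the intercept term (the exact symbols used here are an assumed convention):

\beta = (\beta_0, \beta_1, \dots, \beta_n)^T,
\qquad
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n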

We may set the initial parameters 𝜷 of the model close to 0. Let us define the loss (cost) function of the model:
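A standard choice, assumed here since conventions for the scaling factor vary, is half the mean squared error over the m training samples:

J(\beta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2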

The negative gradient of the cost function gives the direction in which we should move to optimize 𝜷 for our model. We take a step along the negative gradient of the loss function with step size η (the learning rate). Note that the intercept term is updated like any other parameter; it is only excluded from the regularization penalties introduced later.
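With the cost function above, each parameter \beta_j is updated with step size \eta as (writing x_0^{(i)} = 1 for the intercept term):

\beta_j \leftarrow \beta_j - \eta \, \frac{\partial J}{\partial \beta_j}
= \beta_j - \frac{\eta}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x_j^{(i)}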

The cost function of linear regression is convex, so it has a single global minimum. Therefore we iteratively update 𝜷 until the gradient is 0; if we can’t reach a gradient of exactly 0, we iterate until its magnitude falls below some small epsilon value. After these steps we have obtained the optimal parameters 𝜷 for our model and the best-fit line for our data.
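A minimal NumPy sketch of this loop; the learning rate eta, tolerance eps, and iteration cap are illustrative defaults, not values from the original post:

import numpy as np

def fit_linear_regression(X, y, eta=0.01, eps=1e-6, max_iter=10000):
    # prepend a column of ones so beta[0] plays the role of the intercept
    Xb = np.c_[np.ones(len(X)), X]
    beta = np.zeros(Xb.shape[1])
    m = len(y)
    for _ in range(max_iter):
        grad = Xb.T @ (Xb @ beta - y) / m   # gradient of the half-MSE cost
        if np.linalg.norm(grad) < eps:      # stop once the gradient is ~0
            break
        beta -= eta * grad                  # step along the negative gradient
    return beta

# usage: beta = fit_linear_regression(X_train, y_train)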

Estimation

Now we can estimate the continuous variable through our model as shown below:
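Using the same assumed notation, a new sample x = (x_1, \dots, x_n) is scored as:

\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n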

Hyperparameter tuning

There isn’t much hyperparameter tuning required or possible in plain Linear Regression. We can, however, use Ridge or Lasso regression to add a penalty term to the cost function.

Ridge regression is used to reduce variance in the model by penalizing large parameter values 𝜷.

Ridge regression cost function:
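A common form, assumed here, with λ controlling the penalty strength and the intercept \beta_0 excluded from the penalty:

J_{ridge}(\beta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \beta_j^2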

In Ridge regression the parameters shrink towards 0 but never become exactly 0. Lasso regression can force some parameters to exactly 0, which acts as automatic feature selection.

Lasso regression cost function:
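Analogously, with the absolute values of the coefficients in the penalty:

J_{lasso}(\beta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \lvert \beta_j \rvert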

Example code:

from sklearn.linear_model import LinearRegression as LR

# fit ordinary least squares on the training data
lr = LR()
lr.fit(X_train, y_train)

# predict the continuous response for unseen samples
y_hat = lr.predict(X_test)
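The penalized variants discussed above follow the same fit/predict interface; a sketch, where alpha=1.0 is just an illustrative regularization strength corresponding to λ:

from sklearn.linear_model import Ridge, Lasso

ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=1.0).fit(X_train, y_train)

y_hat_ridge = ridge.predict(X_test)
y_hat_lasso = lasso.predict(X_test)

# Lasso drives some coefficients to exactly 0, acting as feature selection
print(lasso.coef_)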
