ML algorithms 1.01: Linear Regression
Introduction
Linear regression is usually the first algorithm anybody learns when they step into the world of Machine Learning, and its main advantage lies in its simplicity. A linear model has high bias and low variance. Normalizing the features helps the algorithm converge faster when using gradient descent. Missing values need to be imputed or dropped before fitting, and outliers, which distort the best-fit line, should be filtered using a boxplot or another method.
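The preprocessing steps above (imputation, boxplot-style outlier filtering, normalization) can be sketched as follows. The data here is a hypothetical toy array invented for illustration, not from the original text:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix with one missing value and one outlier.
X = np.array([[1.0], [2.0], [np.nan], [3.0], [100.0]])

# Impute missing values with the column mean.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Filter outliers with the 1.5 * IQR rule that boxplots use.
q1, q3 = np.percentile(X_imputed[:, 0], [25, 75])
iqr = q3 - q1
mask = (X_imputed[:, 0] >= q1 - 1.5 * iqr) & (X_imputed[:, 0] <= q3 + 1.5 * iqr)
X_clean = X_imputed[mask]

# Normalize the remaining features so gradient descent converges faster.
X_scaled = StandardScaler().fit_transform(X_clean)
```

In practice the imputer and scaler should be fit on the training split only and reused on the test split.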
Assumptions
- The relation between target and predictor variables is linear
- The features are independent of each other
- Homoskedasticity: the residuals have the same finite variance across all values of the predictors
- There is no relationship between the residuals and the predictor variables, and the residuals are independent of each other
- The residuals are normally distributed
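Several of these assumptions can be checked empirically by fitting a model and inspecting its residuals. A minimal sketch, using hypothetical synthetic data that satisfies the assumptions by construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: linear relation with homoskedastic Gaussian noise.
x = rng.uniform(0, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 200)

# Fit by ordinary least squares and inspect the residuals.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# For a well-specified model, residuals average to ~0 and show
# no systematic trend against the predictor.
print(residuals.mean())
print(np.corrcoef(x, residuals)[0, 1])
```

A residuals-versus-predictor plot makes the same check visually: a funnel shape suggests heteroskedasticity, a curve suggests a nonlinear relation.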
Advantages
- Performs very well when assumptions are true
- High model interpretability
Disadvantages
- Real-world data rarely meets all of the assumptions
- Model is prone to overfitting when there are many or highly correlated features
Model
Let X be the feature set with m samples and n features, with a leading column of ones so that the intercept is absorbed into the matrix product. Let y be the continuous response.
Parameters of the model are represented as:

𝜷 = (𝛽₀, 𝛽₁, …, 𝛽ₙ)ᵀ,  so that the prediction is ŷ = X𝜷

We may set the initial parameters 𝜷 of the model close to 0. Let us define the loss (cost) function of the model:

J(𝜷) = (1/2m) Σᵢ (ŷᵢ − yᵢ)²
The negative gradient of the cost function gives the direction in which to move to optimize 𝜷 for our model. We move down the negative gradient of the loss function with step size η (the learning rate):

∇J(𝜷) = (1/m) Xᵀ(X𝜷 − y),  𝜷 := 𝜷 − η ∇J(𝜷)

Note that although the intercept 𝛽₀ is updated during gradient descent, it is excluded from the regularization penalties discussed under Hyperparameter tuning.
The cost function of linear regression is convex, so we iteratively update 𝜷 until the gradient reaches 0. If we cannot reach a gradient of exactly 0, we iterate until its norm falls below some epsilon. After these steps we have obtained the optimal parameters 𝜷 for our model and the best-fit line for our data.
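The update loop above can be sketched in NumPy. The data, learning rate, and stopping tolerance below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data generated from y = 4 + 3x plus small noise.
x = rng.uniform(0, 2, 100)
y = 4.0 + 3.0 * x + rng.normal(0, 0.1, 100)

X = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
beta = np.zeros(2)                         # initialize parameters near 0
eta = 0.1                                  # learning rate (step size)
m = len(y)

for _ in range(5000):
    grad = X.T @ (X @ beta - y) / m        # gradient of the squared-error cost
    if np.linalg.norm(grad) < 1e-8:        # stop once the gradient is ~0
        break
    beta -= eta * grad                     # step down the negative gradient

print(beta)  # close to the true parameters (4, 3)
```

Because the cost is convex, this converges to the same solution regardless of the starting point, as long as η is small enough.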
Estimation
Now we can estimate the continuous variable through our model as shown below:

ŷ = X𝜷
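For small problems the optimal 𝜷 can also be obtained in closed form via the normal equations, 𝜷 = (XᵀX)⁻¹Xᵀy, instead of gradient descent. A minimal sketch with hypothetical, exactly linear data (y = 1 + 2x):

```python
import numpy as np

# Hypothetical exact linear data: y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

X = np.column_stack([np.ones_like(x), x])

# Closed-form least-squares solution of the normal equations (X^T X) beta = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Estimate the response for new inputs: y_hat = X_new @ beta.
x_new = np.array([4.0, 5.0])
y_hat = np.column_stack([np.ones_like(x_new), x_new]) @ beta
print(y_hat)
```

The closed form costs O(n³) in the number of features, which is why gradient descent is preferred when n is large.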
Hyperparameter tuning
There isn’t much tuning required or possible in Linear Regression. We can use Lasso or Ridge regression techniques to penalize the cost function.
Ridge regression is used to reduce variance in the model by penalizing higher parameter values 𝜷.
Ridge regression cost function:

J(𝜷) = (1/2m) Σᵢ (ŷᵢ − yᵢ)² + λ Σⱼ 𝛽ⱼ²
In Ridge regression the parameters shrink towards 0 but never become 0. Lasso regression can force some parameters to 0. This implements automatic feature selection.
Lasso regression cost function:

J(𝜷) = (1/2m) Σᵢ (ŷᵢ − yᵢ)² + λ Σⱼ |𝛽ⱼ|
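The difference between the two penalties is easy to see on data where only one feature matters. The dataset and penalty strengths below are hypothetical choices for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Hypothetical data: only the first of five features drives the response.
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, 200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Ridge shrinks all coefficients toward 0 but keeps them nonzero;
# Lasso drives the irrelevant coefficients to exactly 0 (feature selection).
print(ridge.coef_)
print(lasso.coef_)
```

In sklearn the penalty strength λ is called `alpha`, and it is typically chosen by cross-validation (e.g. `RidgeCV`, `LassoCV`).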
Example code:

```python
from sklearn.linear_model import LinearRegression as LR

lr = LR()                     # ordinary least squares linear regression
lr.fit(X_train, y_train)      # learn the parameters 𝜷 from the training data
y_hat = lr.predict(X_test)    # estimate the response for unseen samples
```