Source: Shapelined


Gradient Boosting Machines (GBM) is a boosting ensemble technique. Boosting algorithms often perform well because both bias and variance can be controlled through careful hyperparameter tuning. GBMs use shallow decision trees, whereas AdaBoost typically uses stumps. In prediction/estimation performance, they usually sit between AdaBoost and XGBoost: GBM tends to outperform AdaBoost while having simpler mechanics than XGBoost.


Let X be the feature set with m samples and n features. Let y be the continuous response.
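As a rough illustration of the idea, here is a minimal pure-Python sketch of gradient boosting for a continuous response with squared-error loss, where each shallow tree is fit to the residuals of the current ensemble. The function names (build_tree, fit_gbm, predict_gbm), the defaults (50 trees, learning rate 0.1, depth 2), and the toy data are assumptions made for this example, not the article's implementation.

```python
# Sketch of gradient boosting for regression with squared-error loss.
# All names and defaults here are illustrative assumptions.

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)

def build_tree(X, y, depth):
    """Fit a shallow regression tree; returns a dict node or a float leaf."""
    if depth == 0 or len(set(y)) == 1:
        return mean(y)
    best = None  # (total SSE, feature index, threshold)
    for j in range(len(X[0])):
        # Skip the largest value so both children are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # every feature is constant, so no split is possible
        return mean(y)
    _, j, t = best
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [v for _, v in left], depth - 1),
        "right": build_tree([r for r, _ in right], [v for _, v in right], depth - 1),
    }

def predict_tree(tree, row):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def fit_gbm(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Start from the mean of y, then repeatedly fit a tree to the residuals."""
    base = mean(y)
    preds = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        tree = build_tree(X, residuals, max_depth)
        trees.append(tree)
        preds = [pi + learning_rate * predict_tree(tree, row)
                 for pi, row in zip(preds, X)]
    return base, trees, learning_rate

def predict_gbm(model, row):
    base, trees, learning_rate = model
    return base + learning_rate * sum(predict_tree(t, row) for t in trees)

# Toy data (assumed): m = 10 samples, n = 1 feature, y = x^2.
X = [[float(x)] for x in range(10)]
y = [row[0] ** 2 for row in X]

model = fit_gbm(X, y)
baseline_mse = mean([(yi - mean(y)) ** 2 for yi in y])
model_mse = mean([(yi - predict_gbm(model, row)) ** 2 for yi, row in zip(y, X)])
```

Comparing model_mse with baseline_mse shows how much the boosted trees improve on simply predicting the mean. The learning rate and tree depth are the kind of hyperparameters that trade bias against variance.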

Comparison among different states on several factors in the education space



India is a melting pot of cultures, traditions, and values. The country of 1+ billion people is organized into 36 States and Union Territories. The quality of education is affected not only by the presence of different educational boards but also by prevailing socio-economic factors.



K Nearest Neighbors is a lazy learner: it does not build a model as such but stores the training data and defers all computation to prediction time. Because predictions rely on distances, features must be scaled before calculating distances when they have different ranges.


Advantages

  1. No training required.
  2. Very easy to interpret.


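Disadvantages

  1. The optimum K (number of neighbors) has to be found, and it might change as new data is added.
  2. Prediction becomes slow when the data set is very large, and distances become less informative when the number of dimensions is large. Tree-based indexes such as kd-tree and ball tree speed up the neighbor search, although they help less as dimensionality grows.
  3. It is sensitive to noisy data.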
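To make the role of scaling concrete, here is a small pure-Python sketch of a K Nearest Neighbors classifier with Euclidean distance, run once on raw features and once after min-max scaling. The helper names (min_max_scale, knn_predict) and the toy data are assumptions made for this example. The second feature has a much larger range than the first, so it dominates the unscaled distances.

```python
import math
from collections import Counter

def min_max_scale(rows):
    """Scale every column to [0, 1]; also return a function for new rows."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # avoid division by 0

    def scale(row):
        return [(v - lo) / span for v, lo, span in zip(row, lows, spans)]

    return [scale(row) for row in rows], scale

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data (assumed): feature 0 ranges over 1..10, feature 1 over 1000..5000.
train_X = [[1.0, 5000.0], [2.0, 1000.0], [9.0, 4800.0], [10.0, 1200.0]]
train_y = ["a", "a", "b", "b"]
query = [1.5, 4700.0]

# Without scaling, the large-range feature decides which neighbor is closest.
raw_prediction = knn_predict(train_X, train_y, query, k=1)

# With scaling, both features contribute on a comparable footing.
scaled_X, scale = min_max_scale(train_X)
scaled_prediction = knn_predict(scaled_X, train_y, scale(query), k=1)
```

On this toy data the two predictions differ: the raw version picks the point that happens to be close on the large-range feature, while the scaled version picks the point that is close on both.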

Distance metrics



If, while using Linear Regression, you thought to yourself, “gosh, how can I use this for classification?”, you are reading the right article. Logistic Regression borrows the concept of the best-fit line from Linear Regression to demarcate classes, handling multiple classes in an OVR (one-vs-rest) fashion. Since the required output is a class probability, the model applies a sigmoid transformation to keep the output bounded within [0, 1]. The loss function also changes, from the squared-error loss used in Linear Regression to log loss (binary cross-entropy). Missing values need to be imputed or dropped in Linear Regression. Outliers affect the formation of the best-fit line. They…
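The sigmoid transformation and the loss can be sketched in a few lines of pure Python. Logistic regression is usually trained by minimizing log loss (binary cross-entropy), and the sketch below does so with plain gradient descent on a single feature. The function names, the learning rate, the epoch count, and the toy data are assumptions made for this example.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y_true, probs, eps=1e-12):
    """Average binary cross-entropy; eps guards against log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Gradient descent on log loss for a one-feature model p = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        probs = [sigmoid(w * x + b) for x in xs]
        # The gradient of log loss has the same (prediction - target) form
        # as in linear regression with squared error.
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(probs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (assumed): negative x values belong to class 0, positive to class 1.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
initial_loss = log_loss(ys, [0.5] * len(ys))  # loss before any training
final_loss = log_loss(ys, [sigmoid(w * x + b) for x in xs])
predicted_classes = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
```

Thresholding the probability at 0.5 gives the class label. For more than two classes, one such model per class can be trained in one-vs-rest fashion.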

Source: Joel & Jasmin


This is the first algorithm anybody learns when they step into the world of Machine Learning. The advantage of the model lies in its simplicity. A linear model has high bias and low variance. If the features are normalized, the algorithm also converges faster when trained with gradient descent. Missing values need to be imputed or dropped in Linear Regression. Outliers affect the formation of the best-fit line. They should be filtered using a boxplot or any other method.
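The effect of normalization on gradient descent can be seen in a small pure-Python sketch. It fits a one-feature linear model twice with the same number of epochs, once on the raw feature and once on the standardized feature. The function names, the learning rates (picked by hand for this toy data), and the data itself are assumptions made for this example.

```python
def standardize(values):
    """Rescale values to zero mean and unit variance."""
    mu = sum(values) / len(values)
    sigma = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mu) / sigma for v in values]

def fit_line_gd(xs, ys, lr, epochs):
    """Gradient descent on mean squared error for y = w*x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        w -= lr * (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        b -= lr * (2.0 / n) * sum(errors)
    return w, b

def mse(xs, ys, w, b):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (assumed): y = 3 + 0.002 * x, with x in the thousands.
xs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
ys = [3.0 + 0.002 * x for x in xs]

# Raw feature: the learning rate must be tiny to avoid divergence,
# and the intercept then barely moves within 200 epochs.
w_raw, b_raw = fit_line_gd(xs, ys, lr=1e-8, epochs=200)
mse_raw = mse(xs, ys, w_raw, b_raw)

# Standardized feature: a much larger learning rate is stable,
# and the same 200 epochs are enough to fit the line closely.
xs_std = standardize(xs)
w_std, b_std = fit_line_gd(xs_std, ys, lr=0.1, epochs=200)
mse_std = mse(xs_std, ys, w_std, b_std)
```

With the same epoch budget, the fit on the standardized feature ends up far closer to the data than the fit on the raw feature. Note that w_std and b_std are expressed in standardized units, so they differ from the slope and intercept in the original units.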


  • The relation between target and predictor variables is linear
  • The features are independent of each other
  • Homoskedasticity: The random variables…

Image by author

“That’s the thing about books. They let you travel without moving your feet.” — Jhumpa Lahiri, The Namesake [1]


I belong to a family where someone is always poring over books, newspapers, or magazines. At school, we had a library and a library period dedicated to reading. However, much to the annoyance of my friends and family, I was not a child who was keen on reading.

Anupam Misra

Aspiring data scientist. I am open to work. Contact me for data science roles! Happy learning!
