Feature scaling in machine learning
Let’s say you run into a dataset of house prices in your locality with their corresponding area in square feet. You being a data enthusiast want to understand the relationship between area and price.
The first thing you probably would do is create a scatter plot. Here is how it looks.
Motivated by what you see, you now want to fit a regression line to this data. Here is how that looks.
While this looks fine on the face of it, observing a little more carefully you would ask the following questions.
- What is the intercept value of -369.6 really conveying?
While it is impractical to have house of zero square feet area, the intercept is telling that such houses go for negative 369 thousand dollars. That doesn’t make any sense.
2. Why is the confidence interval on the intercept so large i.e. -580 to -158 with a very high standard error?
As you can see from the scatter plot, the data points on the explanatory variable ‘area’ are pretty further away from zero. The minimum value of area is around 1700–1800 square feet.
By definition, intercept is the value of price when area is zero (a hypothetical situation). With virtually no data points from 0–1700 square feet for the model to train on, the intercept becomes unstable. What that means is that the intercept is susceptible to more fluctuations were we to add a few more houses to our dataset. Hence, making the model unstable from a prediction standpoint.
One way to solve this is to scale the explanatory variable i.e. ‘area’ in this case in a way that data points are not clustered far away from zero.
A simplest way is to make values of ‘area’ relative to the mean by subtracting them from the mean area. Here the mean area is around 2000 square feet, so this is how your data would look.
A positive value for ‘area_centered’ means the house is above 2000 square feet and vice versa.
Here is how your scatter plot, regression line and model equation looks like now.
Observations
- While the coefficient on area has remained unchanged, the confidence interval on the intercept is narrower now with a much lower standard error
- The intercept is also interpretable i.e. houses with mean area are priced at 1.053 million dollars
This is a simple example to demonstrate the value of feature scaling in making your model more robust and interpretable.
Credits : Codecademy
Where have you used feature scaling?