Let’s say you have a dataset containing students’ test scores and the number of hours they studied the day before the test.

Here is what the relationship between them looks like.

As an analyst, you want to get the most out of your regression model. While you are often limited (or spoilt) by the number of features in your dataset, there are creative ways to power up your model.

One of them is to look at interaction effects between features. An…

More data is being collected now than ever before, so much so that the lines between data analysis and insight frequently blur.

In a growing field like analytics, it is common for misconceptions to set in and terms to be used loosely.

Here are my thoughts —

*Data is not…*

Let’s say you run into a dataset of house prices in your locality along with each house’s area in square feet. Being a data enthusiast, you want to understand the relationship between area and price.

The first thing you would probably do is create a scatter plot. …

More often than not, the data we get for analysis is not about the entire population we wish to study — it’s a sample.

Sampling makes sense for the following reasons:

- It is very expensive to collect data about an entire population. …

The term ‘statistical significance’ often comes up in analyzing results from experiments or observational studies.

A common interpretation I have heard in an experimentation context is -

*Statistical significance tells us if the difference in results between different populations is ‘large’ enough or, simply put, whether an experiment had a…*

Understanding relationships between data attributes has been at the heart of data analysis. One of the most common measures of such interdependence is the Pearson correlation coefficient, defined below.
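
As a rough illustration, the Pearson correlation coefficient is the covariance of two variables divided by the product of their standard deviations, which gives a value between -1 and 1. The sketch below computes it in plain Python. The function name `pearson_r` and the sample numbers are made up for this example and do not come from any real dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of products of deviations (numerator of the coefficient).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical values, for illustration only.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [52, 58, 65, 70, 78]
print(round(pearson_r(hours_studied, test_scores), 3))
```

A value close to 1 suggests a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little linear relationship.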

I wrote my first code in Class 6, like many students. While PC configurations were archaic and we shared machines in groups of three, there was a palpable excitement about these computer sessions.

I found this excitement similar to the library sessions where we would sift through breathtaking pictures in…

You don’t always have the data you need; sometimes you have to go and get it.

Web scraping can be an effective tool to get publicly available data that sits behind web pages. Such data can have a variety of use cases.

Let’s take an example.

Below is a…

I have found the Monty Hall Problem to be a good practical way to think about probabilities.

While the problem sounds counterintuitive at first, things become clear once you start working through it on paper or even in a spreadsheet.

**Here is the problem –**

There are three doors…
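
The problem statement is cut off in this excerpt, so the sketch below assumes the standard formulation: a car is behind one of three doors, you pick a door, the host (who knows where the car is) opens one of the other two doors to reveal no car, and you may then switch. Under that assumption, a small simulation can stand in for the paper or spreadsheet exercise. The function names `play` and `win_rate` are hypothetical.

```python
import random


def play(switch, rng):
    """Play one round; return True if the final pick wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)

    # Assumption: the host always opens a door that is neither
    # the contestant's pick nor the door hiding the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])

    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)

    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning for a given strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With enough trials, staying should win roughly one time in three and switching roughly two times in three, which is the result that feels counterintuitive at first.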