Cohort Analysis With Python
In this article, I am going to talk about cohort analysis and how to perform it with Python. It is widely used for mobile applications and games.
Let’s say we created a mobile game and published it. How do we know whether the game will become popular or die out? It depends on our relationship with the users.
If the entrepreneurs analyze their relationship with the users, fix bugs and errors, and take the necessary steps to manage that relationship, they will at least increase their chance of survival.
But how do we understand this relationship?
The answer is Cohort Analysis.
What is Cohort Analysis?
Literally, a cohort is a group of people who share a common characteristic or experience within a specified period. For example, the people born in Turkey in 2022 form a cohort that is relevant to the number of births in a country. In business problems, a cohort usually represents a group of customers or users, and cohort analysis means deriving insights from the behaviour of these groups. Cohort analysis makes it easy to analyze user behaviour and trends without having to look at each user individually.
Why Cohort Analysis?
The most valuable feature of cohort analysis is that it helps companies answer targeted questions by examining the relevant data. It shows how user behaviour affects the business in terms of acquisition and retention, and it helps analyze the customer churn rate.
Cohort Analysis Using Python
We will use the Online Retail dataset.
I will simply drop the missing values and duplicates, since handling them is outside the scope of this topic.
There are also some negative values in the Quantity and Price features. These can’t represent valid purchases, so I will keep only the rows where both values are greater than zero.
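The cleaning steps above can be sketched in pandas as follows. This is a minimal sketch, not the author's exact code: it assumes the columns are named Quantity, Price and Customer ID, and the sample rows are purely hypothetical.

```python
import pandas as pd

def clean_retail(df: pd.DataFrame) -> pd.DataFrame:
    """Drop missing values and duplicates, then keep only positive Quantity and Price."""
    df = df.dropna().drop_duplicates()
    # Zero or negative quantities/prices are not valid purchases for this analysis
    return df[(df["Quantity"] > 0) & (df["Price"] > 0)].copy()

# Hypothetical sample rows (column names are assumptions)
raw = pd.DataFrame({
    "Invoice": ["A1", "A1", "C2", "A3", "A4"],
    "Quantity": [12, 12, -12, 6, 3],
    "Price": [2.5, 2.5, 2.5, 0.0, 1.25],
    "Customer ID": [10.0, 10.0, 11.0, 12.0, None],
})
clean = clean_retail(raw)
print(clean)
```

Here the duplicate row, the negative quantity, the zero price and the missing customer are all removed, leaving a single valid purchase.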
Data Preparation
For cohort analysis, we need three labels: the payment period (the month of each purchase), the cohort group (the month of the customer’s first purchase) and the cohort period/index (the number of months between the two). But first, to work with time series, we need to convert the related date feature to a datetime type, using the same date format as in the dataset.
Now, we need to create the cohort and order_month variables. The first indicates the monthly cohort, based on each customer’s first purchase date, and the second is the purchase date truncated to its month.
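A minimal sketch of the type conversion, assuming the date column is called InvoiceDate and that dates are written as YYYY-MM-DD HH:MM:SS. Both are assumptions; adjust the format string to match your copy of the data.

```python
import pandas as pd

# Hypothetical rows with dates stored as strings
df = pd.DataFrame({
    "Customer ID": [1, 1, 2],
    "InvoiceDate": ["2010-12-01 08:26:00", "2011-01-05 12:00:00", "2010-12-03 09:15:00"],
})

# Parse the strings into datetime64 values; the format string must match the data
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"], format="%Y-%m-%d %H:%M:%S")
print(df.dtypes)
```

Passing an explicit format makes parsing stricter: a value that does not match raises an error instead of being silently misread.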
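One way to derive both variables is with monthly periods. This is a sketch rather than the author's exact code; it assumes the Customer ID and InvoiceDate column names, and the sample purchases are hypothetical.

```python
import pandas as pd

# Hypothetical purchases, already parsed as datetimes
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 2, 3],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01", "2011-02-14", "2011-01-03", "2011-01-20", "2011-02-07",
    ]),
})

# order_month: the purchase date truncated to its month
df["order_month"] = df["InvoiceDate"].dt.to_period("M")

# cohort: the month of each customer's first purchase, repeated on every row of that customer
df["cohort"] = (
    df.groupby("Customer ID")["InvoiceDate"].transform("min").dt.to_period("M")
)
print(df)
```

Using `transform` keeps the result aligned with the original rows, so every purchase carries its customer's cohort label.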
Then, we aggregate the data per cohort and order_month and count the number of unique customers in each group.
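The aggregation, together with the cohort period/index mentioned earlier, can be sketched as follows. The column names n_customers and period_number are hypothetical, and the toy orders are made up for illustration.

```python
import pandas as pd

def to_month(dates):
    """Convert a list of date strings to monthly periods."""
    return pd.to_datetime(pd.Series(dates)).dt.to_period("M")

# Hypothetical orders with cohort and order_month already assigned
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 3, 3, 4, 4],
    "order_month": to_month(["2010-12-01", "2011-01-01", "2010-12-01", "2010-12-01",
                             "2011-02-01", "2011-01-01", "2011-02-01"]),
    "cohort": to_month(["2010-12-01"] * 5 + ["2011-01-01"] * 2),
})

# Count unique customers per (cohort, order_month) pair
df_cohort = (
    df.groupby(["cohort", "order_month"])["Customer ID"]
    .nunique()
    .reset_index(name="n_customers")
)

# Cohort index: whole months between the order month and the cohort month
df_cohort["period_number"] = (
    (df_cohort["order_month"].dt.year - df_cohort["cohort"].dt.year) * 12
    + (df_cohort["order_month"].dt.month - df_cohort["cohort"].dt.month)
)

# Rows: cohorts; columns: cohort index; values: number of active customers
cohort_pivot = df_cohort.pivot(index="cohort", columns="period_number", values="n_customers")
print(cohort_pivot)
```

Cells with no purchases (for example, a recent cohort that has not existed long enough) come out as NaN.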
To obtain the retention matrix, we need to divide the values in each row by that row’s first value, which is the cohort size: the number of customers who made their first purchase in the given month.
Lastly, we plot the retention matrix as a heatmap. We also wanted to include the cohort size, so we actually created two heatmaps: the one showing the cohort size uses a white-only colormap, that is, no coloring at all.
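As a sketch, the row-wise division maps directly onto `DataFrame.divide` with `axis=0`. The cohort table below is hypothetical.

```python
import pandas as pd

# Hypothetical cohort table: rows are cohorts, columns are cohort indices
cohort_pivot = pd.DataFrame(
    {0: [200, 100], 1: [50, 20], 2: [40, float("nan")]},
    index=["2010-12", "2011-01"],
)

# Cohort size: customers active in month 0, i.e. the first column
cohort_size = cohort_pivot.iloc[:, 0]

# Divide each row by its own cohort size (axis=0 aligns on the row index)
retention_matrix = cohort_pivot.divide(cohort_size, axis=0)
print(retention_matrix)
```

Every row of the result starts at 1.0 (100%), and the later columns show the share of the cohort that purchased again in that month.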
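The two-panel layout could be reproduced with plain matplotlib roughly as follows. This is a sketch: the function name plot_retention, the colormap and the sample values are assumptions, and the original figure may have been drawn with a different plotting library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import ListedColormap

def plot_retention(retention_matrix: pd.DataFrame, cohort_size: pd.Series):
    """Hypothetical helper: cohort sizes on the left, coloured retention on the right."""
    fig, (ax_size, ax_ret) = plt.subplots(
        1, 2, figsize=(12, 6), sharey=True, gridspec_kw={"width_ratios": [1, 11]}
    )
    n_rows = len(retention_matrix)

    # Left panel: a white-only colormap, so only the annotated sizes are visible
    ax_size.imshow(np.zeros((n_rows, 1)), cmap=ListedColormap(["white"]), aspect="auto")
    for i, size in enumerate(cohort_size):
        ax_size.text(0, i, f"{float(size):.0f}", ha="center", va="center")
    ax_size.set_xticks([0])
    ax_size.set_xticklabels(["cohort size"])
    ax_size.set_yticks(range(n_rows))
    ax_size.set_yticklabels([str(c) for c in retention_matrix.index])

    # Right panel: retention coloured by value; NaN cells are left blank
    values = retention_matrix.to_numpy(dtype=float)
    im = ax_ret.imshow(values, cmap="RdYlGn", vmin=0, vmax=1, aspect="auto")
    for (i, j), v in np.ndenumerate(values):
        if not np.isnan(v):
            ax_ret.text(j, i, f"{v:.0%}", ha="center", va="center")
    ax_ret.set_xticks(range(values.shape[1]))
    ax_ret.set_xticklabels([str(c) for c in retention_matrix.columns])
    ax_ret.set_xlabel("cohort index (months since first purchase)")
    ax_ret.set_title("Monthly cohorts: user retention")
    fig.colorbar(im, ax=ax_ret)
    return fig

# Hypothetical inputs
retention = pd.DataFrame(
    {0: [1.0, 1.0], 1: [0.25, 0.2], 2: [0.2, np.nan]}, index=["2010-12", "2011-01"]
)
sizes = pd.Series([200, 100], index=retention.index)
fig = plot_retention(retention, sizes)
```

Sharing the y-axis keeps each cohort size on the same row as its retention values; call `fig.savefig(...)` to write the image to disk.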
In the image, we can see a sharp drop-off already in the second month (indexed as 1): on average, around 80% of customers do not make any purchase in the second month. The first cohort (2010-12) seems to be an exception and performs surprisingly well compared to the others; a year after the first purchase, its retention is still around 50%. This might be a cohort of dedicated customers who first joined the platform through already-existing connections with the retailer. However, that is very hard to explain accurately from the data alone.
Throughout the matrix, we can see fluctuations in retention over time. This might be caused by the characteristics of the business, where clients make periodic purchases followed by periods of inactivity.
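To check a figure like the average drop-off described above, you can average each column of the retention matrix. A minimal sketch, using a hypothetical retention matrix rather than the article's results:

```python
import pandas as pd

# Hypothetical retention matrix (rows: cohorts, columns: cohort index)
retention_matrix = pd.DataFrame(
    {0: [1.0, 1.0, 1.0], 1: [0.4, 0.2, 0.15], 2: [0.35, 0.25, float("nan")]},
    index=["2010-12", "2011-01", "2011-02"],
)

# Mean retention per cohort index across cohorts; NaN cells are skipped
avg_retention = retention_matrix.mean(axis=0)

# Share of customers who made no purchase in the second month (index 1)
drop_off_month_1 = 1 - avg_retention[1]
print(f"Average drop-off in month 1: {drop_off_month_1:.0%}")
```

Because `mean` skips NaN by default, recent cohorts with fewer observed months still contribute to the columns they do have.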
Veri Bilimi Okulu (Data Science School), Mustafa Vahit Keskin
GitHub: https://github.com/aoyilmaz/DataScience_Projects/blob/main/CRM/Cohort_Analysis/cohort_analysis.py
Kaggle: https://www.kaggle.com/ahmetokanyilmaz/cohort-analysis-with-python