A cohort analysis is used to study the behavior or outcome associated with a group of users over time. In Sprinkle we can analyse cohorts by creating different groups and understand the user retention and also new customer acquisition over a period of time. It could also be termed as Behavioral analytics.
For any business, cohort analysis is often used to study user retention. Cohort is a group of customers who have common characteristics. Cohorts can be defined as any customer who joined any website or registers to avail any service,which can be categorized by demographics, age,etc for a set of individuals. This can define cohorts over a period of time.
Cohort Analysis in Sprinkle Data
The steps involved for analysing cohort analysis are as follows:-
- Data Collection from different sources
- Data Cleaning and Fact table creation
- Data Modeling
- Cohort Report creation
1. Date/Time based data
The data should have date and time based on the list of individuals or events. The events should track attributes like purchase date, sign-in date, product name, etc relevant to the analysis.
2. Calculating Days, Week or Month from the day of first Order date (Data Preparation)
For calculating the 2nd order date we can use window functions. This will help us know if the user is a regular customer or an occasional shopper. The step will help the organization to decide on the discounts or the marketing campaigns.
In the below screenshot we have created a fact table where we have changed the data type of different attributes, calculated when the user ordered for the second time using the lag and lead window function. The expression to achieve the second order date -:
lead(Order_Date) over (partition by Customer_ID order by Order_Date) as second_order_date,
rank() over (partition by Customer_ID order by Order_Date) as order_ranking,
(select *, rank() over (partition by z.Order_ID order by z.Order_Date) ranking from ds_superstore_superstore z) where ranking=1)a where order_ranking=1
3. Data Modeling in Sprinkle
Data modelling in Sprinkle is easy and users do not have to write any complex codes. Creating a model is just some clicks away. For creating cohorts according to the first and second purchase we need to find the buckets like number of customers returned within 5 months, after 5 months etc. We have configured the cohort interval for 5 months. The expression query is as follows-:
when date_diff(second_order_date, first_order_date, month) <= 5 then "< = 5 months"
when date_diff(second_order_date, first_order_date, month) between 5 and 11 then "> 5 and <=10 months"
when date_diff(second_order_date, first_order_date, month) between 10 and 16 then "> 10 and<=15 months"
when date_diff(second_order_date, first_order_date, month) between 15 and 21 then "> 15 and <=20 months"
when date_diff(second_order_date, first_order_date, month) between 20 and 26 then "> 20 and < = 25 months"
when date_diff(second_order_date, first_order_date, month) >25 then "> 25 months"
The difference between the 2nd order date and 1st order date can be calculated using models in Sprinkle tool. The expression can be as follows-:
date_diff(second_order_date, first_order_date, month)
4. Final report on Cohort Analysis in Segment
The report is now a Segment in Sprinkle Tool. The segment can be prepared using the model where we model the data by adding dimensions, measures, creating expressions, etc. The report can be achieved by the user by just some clicks, choosing the drop downs for filters and sorting.
To see the tabular cohort analysis in sprinkle we need to pivot on the cohort column i.e “Months to repeat purchase”. The pivot, transpose and seeing aggregates appears in a pop window.
FInally we are able to achieve the Cohort analysis in tabular format, which will give the analyst a broader analysis of marketing campaigns, acquiring new users and the regular users.
Cohort analysis is a simple and efficient way to understand the marketing campaign, user retention and new user acquisition performance and user acquisition. Cohort analysis can be achieved in Sprinkle tools by creating fact tables in flows by doing all data cleaning, transformations and enrichments. The user can create models using those fact tables, define the measures and dimensions and user defined expressions, using those in segments we are able to achieve Cohort analysis.