Churn prediction, the task of identifying customers who are likely to stop using a service, is an important and commercially valuable problem in almost every industry.
This project predicts a churn score for the users of a website based on features such as:
- User demographic information
- Browsing behavior
- Historical purchase data, among other information

The goal is to identify customers who are likely to leave so that they can be retained with targeted incentives.
- The dataset comes from a hackathon; the raw data can be downloaded here: Link
- A cleaned and processed version of the data is available here: Link
- The classes (customer will exit = 1, will not exit = 0) are reasonably balanced, at a ratio of about 5:4.

The notebooks contain the exploratory data analysis (EDA), data processing, and model-building steps.
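As an example of the kind of check done during EDA, the 5:4 class balance mentioned above can be verified with a few lines of Python. This is a minimal sketch on hypothetical labels (1 = customer will exit); with pandas, `df[label_column].value_counts()` gives the same counts, where `label_column` is a placeholder for the dataset's target column.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the majority:minority ratio."""
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    return dict(counts), majority / minority


# Hypothetical labels, not the real dataset: 1 = customer will exit, 0 = stays.
labels = [1] * 500 + [0] * 400
counts, ratio = class_balance(labels)
print(counts)  # {1: 500, 0: 400}
print(f"majority:minority = {ratio:.2f}")  # 1.25, i.e. 5:4
```

With a ratio this mild, resampling is usually unnecessary, but passing `stratify=y` to scikit-learn's `train_test_split` keeps the same ratio in the training and test sets.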

| Notebook | Colab | Kaggle |
|---|---|---|
| Customer Churn Modeling | | |
| Exploratory Data Analysis | | |


- The final model is a stacking ensemble of several classifiers, including:
  - KNN
  - Random Forest
  - AdaBoost
  - XGBoost
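The README does not give the stacking configuration. Below is a minimal sketch of the general technique, assuming scikit-learn's `StackingClassifier` with a logistic-regression meta-learner (an assumption) and synthetic data from `make_classification` in place of the churn dataset. `GradientBoostingClassifier` stands in for XGBoost so that the sketch needs only scikit-learn; all hyperparameters are illustrative, not the project's.

```python
# Minimal stacking sketch; synthetic data and all hyperparameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stack(seed=0):
    """Stack KNN, Random Forest, AdaBoost and a gradient-boosting model."""
    base_learners = [
        # KNN is distance-based, so features are standardized first.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("ada", AdaBoostClassifier(random_state=seed)),
        # Stand-in for XGBoost; xgboost.XGBClassifier() could be used here instead.
        ("gb", GradientBoostingClassifier(random_state=seed)),
    ]
    # The meta-learner is fit on 5-fold out-of-fold predictions of the base learners.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


def evaluate(seed=0):
    """Fit on synthetic churn-like data; return (accuracy, class-1 recall, n_test)."""
    # Hypothetical data: label 1 = customer will exit, mildly imbalanced.
    X, y = make_classification(
        n_samples=1000, n_features=20, n_informative=8,
        weights=[4 / 9], random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = build_stack(seed).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred), recall_score(y_test, y_pred, pos_label=1), len(y_pred)


if __name__ == "__main__":
    acc, rec, _ = evaluate()
    print(f"accuracy={acc:.3f}  recall(class 1)={rec:.3f}")
```

Training the meta-learner on out-of-fold predictions (`cv=5`) keeps it from simply trusting whichever base model happens to overfit the training data.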

- Python version: 3.7
- Packages: pandas, numpy, scikit-learn, xgboost, fastapi, seaborn
- Cloud: Heroku

To set up the environment and run the app locally:

```bash
conda create -n envname python=3.7
conda activate envname
git clone https://github.com/d0r1h/Churn-Analysis.git
cd Churn-Analysis
pip install -r requirements.txt
python app.py
```
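
To download the dataset and preprocess it automatically, run the following in a notebook cell (e.g. on Colab); drop the leading `!` when running from a terminal:

```
!pip install datasets
!python src/preprocess.py
```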

- Although XGBoost reaches a test accuracy of about 93%, the priority is the customers who are leaving (class 1), since they are the ones who can be retained, for example with a discount on membership. Recall on class 1 is therefore the more relevant metric.
- The stacking ensemble achieves 94% recall on customers who are likely to leave, higher than XGBoost.
- The confusion matrices of the final classifier (stacking ensemble) and the XGBoost classifier follow.
- Score table for the different classifiers:
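The trade-off described above, where a model with slightly lower accuracy can still be preferable because it catches more leavers, can be made concrete from confusion-matrix counts. The sketch below uses hypothetical counts, not the project's results; with scikit-learn, the four counts for a binary problem can be obtained with `tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()`.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Accuracy, plus precision and recall for the positive class (1 = customer exits)."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Precision: of the customers flagged as leaving, how many actually left.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the customers who actually left, how many were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical confusion matrices for two models on the same 900 test customers
# (500 leavers, 400 stayers); these are NOT the project's reported results.
model_a = metrics_from_confusion(tn=380, fp=20, fn=50, tp=450)
model_b = metrics_from_confusion(tn=350, fp=50, fn=25, tp=475)

for name, m in (("A", model_a), ("B", model_b)):
    print(name, {k: round(v, 3) for k, v in m.items()})
# A {'accuracy': 0.922, 'precision': 0.957, 'recall': 0.9}
# B {'accuracy': 0.917, 'precision': 0.905, 'recall': 0.95}
```

Model B is slightly less accurate overall but misses half as many leavers, which is the property that matters when the goal is to reach at-risk customers with a retention offer.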

The application is deployed on Heroku and can be accessed at https://churn01.herokuapp.com/. Sample data for testing the app is available here.