A production-style churn prediction system using the Kaggle Telco Customer Churn dataset, built with Python, SHAP explainability, and an interactive Gradio app deployed on Hugging Face.
Why Churn Prediction Matters
Customer churn prediction is one of the most common real-world applications of machine learning. For subscription businesses, telecom providers, or SaaS platforms, being able to predict which customers are likely to leave allows teams to act early and reduce attrition.
Instead of leaving this as a notebook experiment, I built a deployable churn analysis platform with:
- Modular Python code (data, training, explainability, UI separated)
- Support for multiple models (Logistic Regression, Random Forest, Gradient Boosting, XGBoost, CatBoost)
- Business-friendly explainability using SHAP values
- A fully interactive demo hosted on Hugging Face Spaces
Live Demo: Churn Prediction App
Try the Churn Prediction App
View Code on GitHub
Dataset: Telco Customer Churn
The application uses the Telco Customer Churn dataset from Kaggle:
- 7,043 customers
- 26.5% churn rate
- Features: demographics, services, account information
- Target variable: whether the customer churned (Yes/No)
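Before training, the raw CSV needs light cleaning. In the Kaggle file, `TotalCharges` is stored as text and is blank for brand-new customers (tenure 0), and `Churn` is a Yes/No string. The following is a minimal pandas sketch of that step, assuming the Kaggle column names; `clean_telco` is a hypothetical helper, not the project's `data.py`, and filling the blanks with 0 is one reasonable choice among several.

```python
import pandas as pd


def clean_telco(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning for the Telco Customer Churn schema (Kaggle column names assumed)."""
    df = df.copy()
    # TotalCharges is read as text; blank entries (tenure 0) cannot be parsed and become NaN
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # A customer with no billing history has accrued no charges yet
    df["TotalCharges"] = df["TotalCharges"].fillna(0.0)
    # Binary target: 1 = churned, 0 = stayed
    df["Churn"] = (df["Churn"] == "Yes").astype(int)
    # The ID carries no predictive signal
    return df.drop(columns=["customerID"])


# Tiny illustrative frame (made-up values, not real customers)
demo = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "tenure": [0, 12, 40, 3],
    "TotalCharges": [" ", "845.5", "3200.1", "210.0"],
    "Churn": ["No", "Yes", "No", "Yes"],
})
clean = clean_telco(demo)
print(clean["Churn"].mean())  # → 0.5
```

On the full dataset, the mean of the encoded `Churn` column is the churn rate quoted above.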
System Design & Workflow
The churn prediction app is structured into five clear steps:
- Data Ingestion & Preprocessing
  - Load CSV or auto-download Kaggle dataset
  - Handle missing values, categorical encoding, and feature engineering (tenure buckets, service counts)
- Model Training
  - Choose between Logistic Regression, Random Forest, Gradient Boosting, XGBoost, or CatBoost
  - Configure cross-validation folds and ensemble size
  - Automated best-model selection
- Evaluation Metrics
  - Accuracy, Precision, Recall, F1, ROC-AUC
  - Confusion Matrix and ROC curve visualizations
- Explainability (SHAP)
  - Global feature importance across the dataset
  - Individual prediction breakdown (e.g., contract type, tenure, monthly charges, tech support)
  - SHAP waterfall view for single-customer explanations
- Deployment
  - Built with Gradio 4.x for an interactive UI
  - Deployed on Hugging Face Spaces for instant public access
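The preprocessing step mentions two engineered features: tenure buckets and service counts. The exact bin edges and service columns used in `data.py` are not given in this post, so the pandas sketch below assumes the Kaggle column names and illustrative yearly bins; `add_features` and `SERVICE_COLS` are hypothetical names.

```python
import pandas as pd

# Service columns as named in the Kaggle Telco CSV (assumed schema)
SERVICE_COLS = [
    "PhoneService", "MultipleLines", "OnlineSecurity", "OnlineBackup",
    "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies",
]


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add a tenure bucket and a count of subscribed services."""
    df = df.copy()
    # Tenure buckets in years; these bin edges are an illustrative assumption (max tenure is 72 months)
    df["tenure_bucket"] = pd.cut(
        df["tenure"],
        bins=[-1, 12, 24, 48, 72],
        labels=["0-1y", "1-2y", "2-4y", "4-6y"],
    )
    # Only "Yes" counts as a subscribed service; "No" and "No internet service" do not
    present = [c for c in SERVICE_COLS if c in df.columns]
    df["service_count"] = (df[present] == "Yes").sum(axis=1)
    return df
```

`InternetService` is left out of the count because its values are a provider type (`DSL`, `Fiber optic`) or `No`, rather than Yes/No.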
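The Model Training step compares several algorithms with cross-validation and selects the best one automatically. The project's `train.py` is not reproduced here; the following is a minimal scikit-learn sketch of that idea, assuming mean ROC-AUC over stratified folds as the selection criterion and treating `n_estimators` as the "ensemble size" knob. `select_best_model` is a hypothetical name. XGBoost and CatBoost are left out to keep dependencies small; both provide scikit-learn-compatible classifiers (`XGBClassifier`, `CatBoostClassifier`) that could be added to the `candidates` dictionary.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def select_best_model(X, y, cat_cols, num_cols, n_folds=5, n_estimators=200, seed=42):
    """Cross-validate candidate models; return (best_name, fitted_pipeline, mean_auc_by_name)."""
    # One-hot encode categoricals, standardize numerics
    pre = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ("num", StandardScaler(), num_cols),
    ])
    candidates = {
        "logreg": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=n_estimators, random_state=seed),
        "grad_boost": GradientBoostingClassifier(n_estimators=n_estimators, random_state=seed),
    }
    # Stratified folds keep the churn ratio similar in every fold
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = {}
    for name, model in candidates.items():
        pipe = Pipeline([("pre", pre), ("model", model)])
        # cross_val_score clones the pipeline, so `pre` itself stays unfitted here
        scores[name] = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()
    best = max(scores, key=scores.get)
    # Refit the winner on all data for deployment
    best_pipe = Pipeline([("pre", pre), ("model", candidates[best])]).fit(X, y)
    return best, best_pipe, scores
```

Selecting on ROC-AUC rather than accuracy avoids favoring a model that simply predicts "no churn" for the majority class.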
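The Evaluation Metrics step lists five scores plus a confusion matrix. A minimal scikit-learn sketch computing them from predicted churn probabilities, assuming a 0.5 cutoff by default (`evaluate` is a hypothetical helper):

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute accuracy, precision, recall, F1, ROC-AUC and the confusion matrix."""
    # Hard labels from probabilities at the chosen cutoff
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        # ROC-AUC is threshold-free: it ranks the raw probabilities
        "roc_auc": roc_auc_score(y_true, y_prob),
        # Layout: [[TN, FP], [FN, TP]]
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    }
```

ROC-AUC does not change with the cutoff; the other four scores do. For the plots, scikit-learn's `RocCurveDisplay.from_predictions` and `ConfusionMatrixDisplay.from_predictions` (scikit-learn 1.0 or later) are one option.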
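The SHAP step produces both a global importance ranking and a per-customer breakdown. For tree ensembles, the `shap` package's `TreeExplainer` computes these values and `shap.plots.waterfall` draws the single-customer view. To show what the numbers mean without extra dependencies, the sketch below covers only the simplest case, a logistic regression, where SHAP values in log-odds space have a closed form if features are treated as independent: `phi_i = beta_i * (x_i - mean(x_i))`. The function names are hypothetical and this is not the project's `explain.py`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def linear_shap(model: LogisticRegression, X_background: np.ndarray, x: np.ndarray):
    """Closed-form SHAP values (log-odds space) for one row, assuming independent features."""
    coef = model.coef_[0]
    mean = X_background.mean(axis=0)
    # Contribution of each feature relative to the average customer
    phi = coef * (x - mean)
    # Base value: the model's log-odds at the feature means, i.e. E[f(X)] for a linear model
    base = model.intercept_[0] + coef @ mean
    return base, phi


def global_importance(model: LogisticRegression, X_background: np.ndarray) -> np.ndarray:
    """Mean absolute SHAP value per feature across the background set."""
    coef = model.coef_[0]
    return np.abs(coef * (X_background - X_background.mean(axis=0))).mean(axis=0)
```

The property this preserves (base value plus the per-feature contributions equals the model's log-odds output) is what a waterfall chart draws one bar at a time.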
Code Structure
```bash
telcochurnzeroshot/
├── data.py        # data ingestion & preprocessing
├── train.py       # model training & evaluation
├── explain.py     # SHAP explainability
├── app_clean.py   # Gradio UI
└── assets/        # saved ROC curves & confusion matrices
```
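`app_clean.py` holds the Gradio UI. Its contents are not shown here, so the following is a minimal sketch of a Gradio 4.x interface for single-customer scoring, not the app's actual code. It assumes a fitted scikit-learn pipeline trained on just four columns (`Contract`, `tenure`, `MonthlyCharges`, `TechSupport`); `make_predict_fn` and `build_demo` are hypothetical names. Gradio is imported inside `build_demo` so the scoring function can be tested without it.

```python
import pandas as pd


def make_predict_fn(pipeline):
    """Wrap a fitted pipeline (hypothetical: trained on the four columns below) for the UI."""
    def predict(contract, tenure, monthly_charges, tech_support):
        # One-row frame with the same column names the pipeline was trained on
        row = pd.DataFrame([{
            "Contract": contract,
            "tenure": tenure,
            "MonthlyCharges": monthly_charges,
            "TechSupport": tech_support,
        }])
        p = float(pipeline.predict_proba(row)[0, 1])
        # gr.Label displays a {label: confidence} mapping
        return {"Churn": p, "Stay": 1.0 - p}
    return predict


def build_demo(predict_fn):
    import gradio as gr  # imported lazily so the scoring logic has no Gradio dependency

    return gr.Interface(
        fn=predict_fn,
        inputs=[
            gr.Dropdown(["Month-to-month", "One year", "Two year"], label="Contract"),
            gr.Slider(0, 72, step=1, label="Tenure (months)"),
            gr.Number(label="Monthly charges"),
            gr.Dropdown(["Yes", "No", "No internet service"], label="Tech support"),
        ],
        outputs=gr.Label(num_top_classes=2),
        title="Telco churn predictor",
    )


# Usage, assuming a fitted `pipeline`:
# build_demo(make_predict_fn(pipeline)).launch(server_name="0.0.0.0", server_port=7860)
```

Binding to `0.0.0.0` makes the app reachable from outside a Docker container; 7860 is Gradio's default port.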
Example Churn Insights
From the trained models, consistent factors emerge:
- Contract Type: month-to-month contracts show the highest churn rates
- Tenure: longer-tenure customers are more likely to stay
- Monthly Charges: higher monthly fees correlate with higher churn risk
- Tech Support: customers with support are less likely to churn
These align with actionable business levers like annual plans, loyalty programs, pricing adjustments, and service improvements.
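Model-derived drivers like these can be sanity-checked against raw churn rates per segment. A minimal pandas sketch, assuming a cleaned frame in which `Churn` is already encoded as 0/1 (`churn_rate_by` is a hypothetical helper):

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Observed churn rate per category of `column`, highest first."""
    # observed=True skips empty categories when `column` is categorical (e.g. tenure buckets)
    return df.groupby(column, observed=True)["Churn"].mean().sort_values(ascending=False)
```

Applied to `Contract` or a tenure-bucket column, this should show the same ordering if the pattern holds in the data. Raw rates are descriptive only; unlike SHAP values, they do not separate the effect of correlated features such as contract type and tenure.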
Deployment Options
- Hugging Face Spaces: zero setup, public demo
- Local run:
  ```bash
  git clone https://github.com/ajaycyril/telcochurnzeroshot
  cd telcochurnzeroshot
  pip install -r requirements.txt
  python app_clean.py
  ```
- Enterprise-ready: containerize with Docker, connect to enterprise data sources, integrate with BI dashboards
Roadmap
- Add threshold tuning for precision vs recall tradeoffs
- Implement cost-sensitive metrics to tie predictions to real dollar impact
- Enable batch/API scoring for production deployment
- Add a policy engine mapping churn scores to retention actions
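Threshold tuning and cost-sensitive metrics fit naturally together: instead of the default 0.5 cutoff, pick the cutoff that minimizes the expected cost of errors. A minimal NumPy sketch, assuming the business supplies a per-customer cost for a missed churner (`cost_fn`) and for an unnecessary retention offer (`cost_fp`); `best_threshold` is a hypothetical name and the cost model is deliberately simplified.

```python
import numpy as np


def best_threshold(y_true, y_prob, cost_fn, cost_fp, thresholds=None):
    """Return (threshold, total_cost) minimizing FN * cost_fn + FP * cost_fp.

    Simplified cost model: correct predictions are treated as free.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    costs = []
    for t in thresholds:
        flagged = y_prob >= t
        fn = np.sum(~flagged & (y_true == 1))  # churners we failed to flag
        fp = np.sum(flagged & (y_true == 0))   # loyal customers we flagged anyway
        costs.append(fn * cost_fn + fp * cost_fp)
    i = int(np.argmin(costs))  # first threshold with the lowest cost
    return float(thresholds[i]), float(costs[i])
```

For well-calibrated probabilities, the expected-cost-minimizing cutoff is `cost_fp / (cost_fp + cost_fn)`, because flagging a customer pays off when `p * cost_fn > (1 - p) * cost_fp`; the grid search is useful when calibration is uncertain. For a pure precision/recall trade-off, scikit-learn's `precision_recall_curve` returns precision and recall at every candidate threshold.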
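A policy engine that maps churn scores to retention actions can start as a table of score bands checked from the highest band down. The bands and action strings below are hypothetical placeholders, not recommendations from the project:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    min_score: float  # inclusive lower bound on churn probability
    action: str       # retention action to trigger


# Hypothetical policy table; bands and actions are placeholders
POLICY = [
    Rule(0.70, "retention call + annual-plan discount"),
    Rule(0.40, "targeted email offer"),
    Rule(0.00, "no action"),
]


def choose_action(score: float, policy=POLICY) -> str:
    """Return the action of the highest band whose lower bound the score reaches."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"churn score must be in [0, 1], got {score}")
    for rule in sorted(policy, key=lambda r: r.min_score, reverse=True):
        if score >= rule.min_score:
            return rule.action
    raise ValueError("policy has no rule covering this score")
```

Keeping the table as data rather than nested `if` statements lets the bands be tuned (for example, from the cost-based threshold) without touching code.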
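For batch scoring, a fitted pipeline can process a large export in chunks so memory use stays bounded. A minimal pandas sketch, assuming the pipeline embeds all preprocessing and that the input file holds the training features plus a `customerID` column (drop `Churn` first if it is present); `score_csv` is a hypothetical name.

```python
import pandas as pd


def score_csv(pipeline, in_path, out_path, id_col="customerID", chunksize=10_000):
    """Score a CSV chunk by chunk and write IDs with churn probabilities."""
    first = True
    for chunk in pd.read_csv(in_path, chunksize=chunksize):
        ids = chunk[id_col]
        features = chunk.drop(columns=[id_col])
        # Column 1 of predict_proba is the probability of the positive (churn) class
        proba = pipeline.predict_proba(features)[:, 1]
        # Overwrite with a header on the first chunk, then append without one
        pd.DataFrame({id_col: ids, "churn_probability": proba}).to_csv(
            out_path, mode="w" if first else "a", header=first, index=False
        )
        first = False
```

The same function body could sit behind an API endpoint for on-demand scoring; the chunked file version suits nightly jobs feeding a BI dashboard.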
Closing Thoughts
This project demonstrates how to take a standard churn dataset and transform it into a deployable churn prediction platform:
- Modular Python ML pipeline
- Interactive Gradio interface
- Explainability via SHAP
- Deployment on Hugging Face for fast access
It's a reusable template for anyone looking to apply machine learning to churn analysis in production.