📈 A production-style churn prediction system using the Kaggle Telco Customer Churn dataset, built with Python, SHAP explainability, and an interactive Gradio app deployed on Hugging Face.


Why Churn Prediction Matters

Customer churn prediction is one of the most common real-world applications of machine learning. For subscription businesses, telecom providers, or SaaS platforms, being able to predict which customers are likely to leave allows teams to act early and reduce attrition.

Instead of leaving this as a notebook experiment, I built a deployable churn analysis platform with:

  • Modular Python code (data, training, explainability, UI separated)
  • Support for multiple models (Logistic Regression, Random Forest, Gradient Boosting, XGBoost, CatBoost)
  • Business-friendly explainability using SHAP values
  • A fully interactive demo hosted on Hugging Face Spaces

Live Demo: Churn Prediction App

👉 Try the Churn Prediction App
👉 View Code on GitHub: https://github.com/ajaycyril/telcochurnzeroshot


Dataset: Telco Customer Churn

The application uses the Telco Customer Churn dataset from Kaggle:

  • 7,043 customers
  • 26.5% churn rate
  • Features: demographics, services, account information
  • Target variable: whether the customer churned (Yes/No)
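To make the target concrete, here is a minimal, dependency-free sketch of reading the data and checking the churn rate. It assumes the column names of the commonly distributed Kaggle CSV (Churn holding Yes/No, TotalCharges stored as text that can be blank for brand-new customers). The helpers load_rows and churn_rate are hypothetical, not functions from the repository, and the four rows are made up.

```python
import csv
import io


def load_rows(text: str) -> list[dict]:
    """Hypothetical loader for rows shaped like the Kaggle Telco CSV."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Encode the Yes/No target as 1/0.
        row["Churn"] = 1 if row["Churn"].strip() == "Yes" else 0
        # Assumption: a blank TotalCharges means no bill yet, so fill it with 0.0.
        total = row["TotalCharges"].strip()
        row["TotalCharges"] = float(total) if total else 0.0
        rows.append(row)
    return rows


def churn_rate(rows: list[dict]) -> float:
    """Share of customers whose Churn target is 1."""
    return sum(r["Churn"] for r in rows) / len(rows)


# Made-up toy rows; the last one has a blank TotalCharges.
sample = """customerID,tenure,Contract,MonthlyCharges,TotalCharges,Churn
T001,1,Month-to-month,29.85,29.85,No
T002,34,One year,56.95,1889.5,No
T003,2,Month-to-month,53.85,108.15,Yes
T004,0,Two year,52.55, ,No
"""
rows = load_rows(sample)
print(f"{churn_rate(rows):.1%}")  # 25.0% on this toy sample
```

Pointed at the full CSV instead of the toy string, churn_rate should come out at roughly the 26.5% quoted above. Filling blank TotalCharges with 0.0 is one reasonable choice for zero-tenure customers, not necessarily what data.py does.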

System Design & Workflow

The churn prediction app is structured into five clear steps:

  1. Data Ingestion & Preprocessing
      • Load a CSV file or auto-download the Kaggle dataset
      • Handle missing values, encode categorical features, and engineer new features (tenure buckets, service counts)
  2. Model Training
      • Choose from Logistic Regression, Random Forest, Gradient Boosting, XGBoost, and CatBoost
      • Configure cross-validation folds and ensemble size
      • Automated best-model selection
  3. Evaluation Metrics
      • Accuracy, Precision, Recall, F1, ROC-AUC
      • Confusion matrix and ROC curve visualizations
  4. Explainability (SHAP)
      • Global feature importance across the dataset
      • Per-customer prediction breakdown (e.g., contract type, tenure, monthly charges, tech support)
      • SHAP waterfall view for single-customer explanations
  5. Deployment
      • Built with Gradio 4.x for an interactive UI
      • Deployed on Hugging Face Spaces for instant public access
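The feature engineering in Data Ingestion & Preprocessing can be sketched in plain Python. The bucket edges, the definition of a "service", and the helper names (tenure_bucket, service_count, one_hot) are assumptions for illustration. The column names follow the Kaggle CSV, but data.py may well do this differently, for example with pandas.

```python
# Hypothetical tenure cut points in months; the app's actual buckets may differ.
TENURE_BUCKETS = [(12, "0-12m"), (24, "13-24m"), (48, "25-48m")]

# Yes/No add-on service columns as named in the Kaggle Telco CSV.
SERVICE_COLUMNS = [
    "PhoneService", "MultipleLines", "OnlineSecurity", "OnlineBackup",
    "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies",
]


def tenure_bucket(tenure_months: int) -> str:
    """Map raw tenure to a coarse bucket so models can pick up non-linear effects."""
    for upper, label in TENURE_BUCKETS:
        if tenure_months <= upper:
            return label
    return "49m+"


def service_count(row: dict) -> int:
    """Count subscribed services; values like "No internet service" count as 0."""
    count = sum(1 for col in SERVICE_COLUMNS if row.get(col) == "Yes")
    # InternetService holds "DSL", "Fiber optic" or "No" instead of Yes/No.
    if row.get("InternetService", "No") != "No":
        count += 1
    return count


def one_hot(rows: list[dict], column: str) -> list[dict]:
    """Encode one categorical column as 0/1 indicator columns."""
    categories = sorted({r[column] for r in rows})
    return [{f"{column}={c}": int(r[column] == c) for c in categories} for r in rows]


# Made-up customer for illustration.
customer = {
    "tenure": "5", "Contract": "Month-to-month", "PhoneService": "Yes",
    "MultipleLines": "No", "InternetService": "Fiber optic", "OnlineSecurity": "No",
    "OnlineBackup": "Yes", "DeviceProtection": "No", "TechSupport": "No",
    "StreamingTV": "Yes", "StreamingMovies": "No",
}
print(tenure_bucket(int(customer["tenure"])), service_count(customer))  # 0-12m 4
print(one_hot([customer, {"Contract": "Two year"}], "Contract")[0])
# {'Contract=Month-to-month': 1, 'Contract=Two year': 0}
```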
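Cross-validated model selection in Model Training comes down to one loop. The data is split into stratified folds, and each candidate is refit on every training split and scored on the held-out split. The candidate with the best mean score is kept. The sketch below runs that loop with two hypothetical stand-in models and synthetic data, so it needs no dependencies. In the real app the candidates would be the estimators listed above, and scikit-learn's StratifiedKFold and cross_val_score provide the same loop. Using ROC-AUC as the selection metric is an assumption.

```python
from collections import defaultdict
from statistics import mean


def stratified_folds(labels: list[int], k: int) -> list[list[int]]:
    """Deal row indices into k folds class by class so each fold keeps a similar churn ratio.
    A real pipeline would shuffle first; this version is deterministic for readability."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def auc(y_true, scores):
    """Rank-based ROC-AUC; assumes the fold contains both classes."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best_model(candidates, X, y, k=5):
    """Return the candidate with the highest mean held-out AUC, plus all mean scores.
    Each candidate is fit(X_train, y_train) -> score(x) and is refit on every fold."""
    folds = stratified_folds(y, k)
    results = {}
    for name, fit in candidates.items():
        fold_aucs = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            score = fit([X[i] for i in train], [y[i] for i in train])
            fold_aucs.append(auc([y[i] for i in held_out], [score(X[i]) for i in held_out]))
        results[name] = mean(fold_aucs)
    return max(results, key=results.get), results


# Hypothetical stand-in models, not the app's LR/RF/GB/XGBoost/CatBoost.
def fit_constant(X, y):
    rate = sum(y) / len(y)
    return lambda x: rate  # same score for everyone, so AUC is 0.5


def fit_contract_rate(X, y):
    totals, churned = defaultdict(int), defaultdict(int)
    for x, label in zip(X, y):
        totals[x["Contract"]] += 1
        churned[x["Contract"]] += label
    rates = {c: churned[c] / totals[c] for c in totals}
    overall = sum(y) / len(y)
    return lambda x: rates.get(x["Contract"], overall)


# Synthetic toy data: month-to-month customers churn more often by construction.
X = [{"Contract": "Month-to-month" if i % 2 == 0 else "Two year"} for i in range(40)]
y = [1 if (i % 4 == 0 or i % 10 == 1) else 0 for i in range(40)]
best, results = select_best_model(
    {"constant": fit_constant, "contract_rate": fit_contract_rate}, X, y)
print(best, results)
```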
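Most of the Evaluation Metrics are simple functions of the confusion matrix. ROC-AUC is the exception: it measures how well the scores rank churners above non-churners. scikit-learn provides all of them (for example sklearn.metrics.roc_auc_score). The dependency-free sketch below only spells out the definitions, treating churn (1) as the positive class, with made-up labels and scores.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating 1 = churned as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred) -> dict:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged customers, share who churned
    recall = tp / (tp + fn) if tp + fn else 0.0     # of churners, share we flagged
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores) -> float:
    """Probability that a random churner outscores a random non-churner (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # default 0.5 threshold
print(classification_metrics(y_true, y_pred))
print(round(roc_auc(y_true, scores), 3))  # 0.778
```

With a 26.5% churn rate, a model that never predicts churn already reaches about 73.5% accuracy. That is why precision, recall, F1, and ROC-AUC say more than accuracy on this dataset.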
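The SHAP waterfall view rests on additivity: a base value plus one contribution per feature adds up exactly to the model's prediction for that customer. The shap library estimates these contributions efficiently (for instance with TreeExplainer for tree ensembles). The brute-force sketch below computes exact Shapley values instead, which is only feasible for a handful of features. The four-feature logistic scoring function and the reference customer are both hypothetical. Using a single reference customer as the background simplifies SHAP's background dataset.

```python
import math
from itertools import combinations


def toy_churn_score(x: dict) -> float:
    # Hypothetical hand-picked logistic model, NOT the app's trained model.
    z = (-1.0 + 1.2 * x["month_to_month"] - 0.03 * x["tenure"]
         + 0.02 * x["monthly_charges"] - 0.5 * x["tech_support"])
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(model, x: dict, baseline: dict) -> dict:
    """Exact Shapley values; features outside a coalition keep the baseline value."""
    features = list(x)
    n = len(features)

    def value(coalition: set) -> float:
        point = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(point)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Made-up reference ("typical") customer and the customer being explained.
baseline = {"month_to_month": 0, "tenure": 32, "monthly_charges": 65.0, "tech_support": 1}
customer = {"month_to_month": 1, "tenure": 3, "monthly_charges": 95.0, "tech_support": 0}

phi = shapley_values(toy_churn_score, customer, baseline)
base_value = toy_churn_score(baseline)
# Text version of a waterfall: base value, contributions by size, final prediction.
print(f"base value         {base_value:.3f}")
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:<18} {contribution:+.3f}")
print(f"prediction         {base_value + sum(phi.values()):.3f}")
```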

Code Structure

```text
telcochurnzeroshot/
├── data.py        # data ingestion & preprocessing
├── train.py       # model training & evaluation
├── explain.py     # SHAP explainability
├── app_clean.py   # Gradio UI
└── assets/        # saved ROC curves & confusion matrices
```

Example Churn Insights

From the trained models, consistent factors emerge:

  • Contract Type → month-to-month contracts show the highest churn rates
  • Tenure → longer-tenure customers are more likely to stay
  • Monthly Charges → higher monthly fees correlate with higher churn risk
  • Tech Support → customers with support are less likely to churn

These align with actionable business levers like annual plans, loyalty programs, pricing adjustments, and service improvements.
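Model-based importance is worth cross-checking against plain segment churn rates. The hypothetical helper churn_rate_by, run here on made-up rows, groups customers by one column and reports each group's churn rate. On the real data, grouping by Contract or by a tenure bucket is a quick way to confirm the patterns above. Raw rates show association, not cause.

```python
from collections import defaultdict


def churn_rate_by(rows: list[dict], column: str) -> dict[str, float]:
    """Churn rate per category of `column`, highest first. Rows need a 0/1 "Churn" field."""
    totals, churned = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r[column]] += 1
        churned[r[column]] += r["Churn"]
    rates = {k: churned[k] / totals[k] for k in totals}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


# Made-up rows for illustration only.
rows = [
    {"Contract": "Month-to-month", "Churn": 1},
    {"Contract": "Month-to-month", "Churn": 1},
    {"Contract": "Month-to-month", "Churn": 0},
    {"Contract": "One year", "Churn": 0},
    {"Contract": "One year", "Churn": 1},
    {"Contract": "Two year", "Churn": 0},
]
print(churn_rate_by(rows, "Contract"))
```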


Deployment Options

  • Hugging Face Spaces → zero setup, public demo
  • Local run →
      git clone https://github.com/ajaycyril/telcochurnzeroshot
      cd telcochurnzeroshot
      pip install -r requirements.txt
      python app_clean.py
  • Enterprise-ready → containerize with Docker, connect to enterprise data sources, integrate with BI dashboards

Roadmap

  • Add threshold tuning for precision vs recall tradeoffs
  • Implement cost-sensitive metrics to tie predictions to real dollar impact
  • Enable batch/API scoring for production deployment
  • Add a policy engine mapping churn scores → retention actions
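Threshold tuning and cost-sensitive metrics fit together: instead of the default 0.5 cut-off, choose the threshold that minimizes expected cost. This is a minimal sketch that assumes a fixed cost per missed churner (false negative) and per unnecessary retention offer (false positive). The cost figures and toy scores are made up.

```python
def expected_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Total cost of flagging every customer whose score is >= threshold."""
    cost = 0.0
    for t, s in zip(y_true, scores):
        flagged = s >= threshold
        if t == 1 and not flagged:
            cost += cost_fn  # missed churner: lost customer value
        elif t == 0 and flagged:
            cost += cost_fp  # retention offer spent on a customer who would have stayed
    return cost


def best_threshold(y_true, scores, cost_fn, cost_fp):
    """Try every distinct score as a cut-off (plus one that flags nobody); keep the cheapest."""
    candidates = sorted(set(scores)) + [max(scores) + 1e-9]
    return min(candidates, key=lambda th: expected_cost(y_true, scores, th, cost_fn, cost_fp))


y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
print(best_threshold(y_true, scores, cost_fn=1, cost_fp=1))     # 0.8
print(best_threshold(y_true, scores, cost_fn=500, cost_fp=50))  # 0.35
```

With equal costs the cheapest cut-off is 0.8. When a missed churner costs ten times as much as an offer, it drops to 0.35: the model flags more customers, trading precision for recall.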
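Batch scoring can start as a function that reads a CSV of customers, applies a scoring function, and writes IDs with scores back out. In this sketch, score_batch and stand_in_score are hypothetical. stand_in_score is a placeholder for the trained model's predicted churn probability, and strings stand in for files to keep the example self-contained.

```python
import csv
import io
from typing import Callable


def score_batch(in_csv: str, score: Callable[[dict], float], id_column: str = "customerID") -> str:
    """Score every row of a customer CSV and return a CSV of IDs and churn scores."""
    reader = csv.DictReader(io.StringIO(in_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[id_column, "churn_score"], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({id_column: row[id_column], "churn_score": f"{score(row):.4f}"})
    return out.getvalue()


def stand_in_score(row: dict) -> float:
    # Placeholder for the trained model's predicted churn probability.
    return 0.8 if row["Contract"] == "Month-to-month" else 0.2


batch = "customerID,Contract\nC1,Month-to-month\nC2,Two year\n"
print(score_batch(batch, stand_in_score), end="")
```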
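A policy engine can be as simple as an ordered list of rules. Each rule pairs a condition on the churn score (and optionally on customer attributes) with a retention action, and the first matching rule wins. The thresholds, rule names, and actions below are hypothetical, chosen to echo the levers mentioned earlier: annual plans, tech support, and loyalty programs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[float, dict], bool]  # (churn_score, customer) -> does the rule match?
    action: str


# Hypothetical rules, checked top to bottom; the first match wins.
RULES = [
    Rule("high_risk_month_to_month",
         lambda score, c: score >= 0.7 and c.get("Contract") == "Month-to-month",
         "Offer a discounted annual plan"),
    Rule("no_tech_support",
         lambda score, c: score >= 0.4 and c.get("TechSupport") == "No",
         "Offer a free tech-support trial"),
    Rule("watchlist", lambda score, c: score >= 0.2, "Enroll in loyalty program"),
]


def retention_action(score: float, customer: dict) -> str:
    for rule in RULES:
        if rule.applies(score, customer):
            return rule.action
    return "No action"


print(retention_action(0.85, {"Contract": "Month-to-month", "TechSupport": "No"}))  # Offer a discounted annual plan
print(retention_action(0.45, {"Contract": "Two year", "TechSupport": "No"}))        # Offer a free tech-support trial
print(retention_action(0.10, {"Contract": "Two year", "TechSupport": "Yes"}))       # No action
```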

Closing Thoughts

This project demonstrates how to take a standard churn dataset and transform it into a deployable churn prediction platform:

  • Modular Python ML pipeline
  • Interactive Gradio interface
  • Explainability via SHAP
  • Deployment on Hugging Face for fast access

It’s a reusable template for anyone looking to apply machine learning to churn analysis in production.
