Building a Movie Recommendation System (Ke-Netflix)
Summary
Ke-Netflix is an automated hybrid movie recommendation system built on the MovieLens dataset, which initially comprised 9,000 movies, 610 users, and 100,000 ratings. The system includes a dedicated text-cleaning engine for movie titles, uses the TMDB API for data enrichment and new-movie discovery, and simulates ongoing user activity so the dataset keeps changing. It uses two recommendation algorithms: content-based filtering over feature vectors and collaborative filtering via Singular Value Decomposition (SVD), with a reported RMSE of 0.99 and Precision@10 of 15%. The two are combined into a hybrid score weighted 60% collaborative and 40% content-based. Data is stored in PostgreSQL hosted on Neon, and the pipeline is automated with GitHub Actions, which runs weekly movie synchronization and daily/weekly recommendation refreshes. A Streamlit application provides a personalized, explainable user interface over 9,900+ movies and 18,300+ recommendations.
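For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how RMSE and Precision@10 are commonly computed. The function names and the sample values are hypothetical and are not taken from the original article; the article's own evaluation code may differ (for example, in how it decides which items count as relevant).

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def precision_at_k(recommended, relevant, k=10):
    """Share of the top-k recommended items that appear in `relevant`.

    Divides by k (a common convention), so a list shorter than k is
    penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Hypothetical values, for illustration only.
sample_rmse = rmse([3.0, 4.0], [4.0, 4.0])
sample_precision = precision_at_k(list(range(10)), {0, 3}, k=10)
```

A lower RMSE means predicted ratings sit closer to the true ones, while a higher Precision@10 means more of the ten displayed items are ones the user actually cares about.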
Key takeaway
MLOps engineers designing or scaling recommendation systems should prioritize robust data engineering and full automation over isolated model training. The system must handle messy data, grow its catalog, and refresh recommendations without manual intervention. Implement idempotent data cleaning, integrate external APIs for enrichment, and simulate user behavior to keep the dataset dynamic. Automate the end-to-end pipeline with tools like GitHub Actions so it runs reliably and continuously, and surface explainable recommendations through a user-friendly interface.
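One way to make a scheduled load step idempotent is to key every write on a stable identifier and upsert rather than insert. The sketch below uses Python's built-in sqlite3 as a stand-in for the PostgreSQL database mentioned in the summary; the table layout, the function name upsert_movies, and the sample rows are assumptions for illustration, not the article's actual schema.

```python
import sqlite3


def upsert_movies(conn, movies):
    """Insert or update movies keyed by movie_id; safe to re-run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies ("
        "movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    # ON CONFLICT ... DO UPDATE makes a repeated run overwrite the row
    # instead of failing or creating a duplicate.
    conn.executemany(
        "INSERT INTO movies (movie_id, title) VALUES (?, ?) "
        "ON CONFLICT(movie_id) DO UPDATE SET title = excluded.title",
        movies,
    )
    conn.commit()


# Hypothetical batch; running the same sync twice must not add rows.
conn = sqlite3.connect(":memory:")
batch = [(1, "Toy Story"), (2, "Jumanji")]
upsert_movies(conn, batch)
upsert_movies(conn, batch)
row_count = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
```

PostgreSQL supports a similar INSERT ... ON CONFLICT ... DO UPDATE clause, so the same pattern should carry over, though the connection code would differ.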
Key insights
Building production-ready recommendation systems requires robust data engineering and automation beyond basic model training.
Principles
- Data cleaning and schema design precede ML.
- Idempotent pipelines enable reliable automation.
- Hybrid algorithms improve recommendation quality.
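The 60% collaborative, 40% content-based weighting from the summary can be sketched as a simple weighted sum. This sketch assumes both scores are already normalized to the same range (for example 0 to 1) and treats a missing score as 0.0; both choices, along with the function names, are assumptions rather than details confirmed by the article.

```python
def hybrid_score(collab, content, w_collab=0.6, w_content=0.4):
    """Weighted blend of a collaborative and a content-based score."""
    return w_collab * collab + w_content * content


def rank_hybrid(collab_scores, content_scores, top_n=10):
    """Rank movie ids by hybrid score, highest first.

    A movie missing from one of the two score dicts gets 0.0 for that
    component (an assumption made for this sketch).
    """
    movie_ids = set(collab_scores) | set(content_scores)
    scored = {
        movie_id: hybrid_score(
            collab_scores.get(movie_id, 0.0),
            content_scores.get(movie_id, 0.0),
        )
        for movie_id in movie_ids
    }
    return sorted(scored.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical scores for two movies.
ranking = rank_hybrid({"a": 0.9, "b": 0.2}, {"a": 0.1, "b": 1.0})
```

Keeping the weights as parameters makes it easy to re-tune the blend if evaluation later favors one component over the other.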
Method
Develop a text-cleaning engine, simulate user behavior, implement content-based and collaborative filtering, then automate with scheduled workflows.
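The content-based step can be illustrated with cosine similarity over feature vectors, computed once and stored so that serving a recommendation is a lookup. This is a minimal pure-Python sketch; the feature vectors, the function names, and the top_k cutoff are hypothetical, and the article does not specify which features or similarity measure it uses.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def precompute_neighbors(features, top_k=5):
    """Map each movie id to its top_k most similar other movies."""
    neighbors = {}
    for movie_id, vector in features.items():
        similarities = [
            (other_id, cosine(vector, other_vector))
            for other_id, other_vector in features.items()
            if other_id != movie_id
        ]
        similarities.sort(key=lambda pair: pair[1], reverse=True)
        neighbors[movie_id] = similarities[:top_k]
    return neighbors


# Hypothetical genre-style vectors: A and B share features, C does not.
features = {"A": [1, 1, 0], "B": [1, 1, 0], "C": [0, 0, 1]}
neighbors = precompute_neighbors(features, top_k=1)
```

The all-pairs loop is quadratic in the number of movies, which is usually acceptable for a scheduled batch job at the catalog size described here but would need a different approach for a much larger catalog.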
In practice
- Use Unicode normalization and regex for messy text.
- Integrate external APIs (e.g., TMDB) for data enrichment.
- Precompute similarities for faster content-based recommendations.
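The first point can be sketched with the standard library's unicodedata and re modules. The rules below (NFKC normalization, whitespace collapsing, extracting a trailing year, moving a trailing article to the front) are assumptions based on how MovieLens titles are typically formatted, not the article's actual cleaning engine, and clean_title is a hypothetical name.

```python
import re
import unicodedata

YEAR_SUFFIX = re.compile(r"\s*\((\d{4})\)\s*$")
TRAILING_ARTICLE = re.compile(r"^(.*), (The|A|An)$")


def clean_title(raw):
    """Return (title, year) from a raw title string; year is None if absent."""
    # NFKC folds compatibility characters, e.g. non-breaking spaces.
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", " ", text).strip()

    year = None
    match = YEAR_SUFFIX.search(text)
    if match:
        year = int(match.group(1))
        text = text[: match.start()]

    # "Usual Suspects, The" becomes "The Usual Suspects".
    article = TRAILING_ARTICLE.match(text)
    if article:
        text = f"{article.group(2)} {article.group(1)}"
    return text, year


cleaned = clean_title("Usual Suspects, The (1995)")  # ('The Usual Suspects', 1995)
```

Running clean_title again on an already cleaned title returns the same title, which is the property that makes the step safe to repeat in a scheduled pipeline.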
Topics
- Recommendation Systems
- Data Engineering
- MLOps
- Streamlit
- Collaborative Filtering
- Content-Based Filtering
- GitHub Actions
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.