Hi there,
I work on recommender systems, search relevance, and ML infrastructure (cloud/Kubernetes). You can connect with me on LinkedIn or by email at mustaphaunubi@gmail.com or mmomoh@uwaterloo.ca.
1. Building and Deploying a Multistage Multimodal Recommender System on Amazon Elastic Kubernetes Service
Towards Data Science Post: https://towardsdatascience.com/deploying-a-multistage-multimodal-recommender-system-on-amazon-eks-featuring-bloom-filters-feature-caching-and-contextual-recommendations
Medium article: https://mustaphaunubi.medium.com/building-a-production-multistage-recommender-system-on-kubernetes-featuring-multimodal-embeddings-5bcd6d7bbf56?postPublishedType=repub
Code: https://github.com/MustaphaU/Multistage-Multimodal-Recommender-System-on-Amazon-EKS-with-NVIDIA-Merlin
Short Demo: https://www.youtube.com/watch?v=rwUwrISzDEQ
Figure 1: The model serving pipeline
This project is a multistage multimodal recommender system built and deployed on Amazon Elastic Kubernetes Service. It features offline and online feature stores backed by Athena+S3 and ElastiCache for Valkey (Redis), respectively. User cold start is handled through feature masking, context-aware retrieval and ranking, and near real-time personalization with online feature updates. The system also ingests multimodal item features to strengthen content-based signals and handle item cold start. Recently interacted items are filtered out using a Valkey (Redis) backed Bloom filter.
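To illustrate the Bloom filter step, here is a minimal in-memory sketch of filtering recently interacted items out of a candidate list. It is not the project's code: the deployed system keeps the filter in Valkey (Redis), while the `BloomFilter` class, the `filter_seen` helper, the `user_id:item_id` key format, and the bit and hash counts below are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny in-memory Bloom filter (hypothetical stand-in for a Valkey-backed one)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        # True means "probably seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def filter_seen(user_id: str, candidates: list[str], seen: BloomFilter) -> list[str]:
    """Drop candidates the user has (probably) interacted with recently."""
    # Assumed key format: "<user_id>:<item_id>".
    return [item for item in candidates if f"{user_id}:{item}" not in seen]
```

A Bloom filter never produces false negatives, so an item that was added is always filtered out. It can produce false positives, so an unseen item is occasionally dropped too, which is usually an acceptable trade for the small memory footprint.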
@article{momoh2026multistage,
title={Deploying a Multistage Multimodal Recommender System on Amazon Elastic Kubernetes Service},
author={Momoh, Mustapha Unubi},
platform={Towards Data Science},
year={2026},
month={May},
url={https://towardsdatascience.com/deploying-a-multistage-multimodal-recommender-system-on-amazon-eks-featuring-bloom-filters-feature-caching-and-contextual-recommendations}
}
The system is operationalized with Kubeflow Pipelines. One pipeline orchestrates the initial feature setup, model training, and deployment of the NVIDIA Triton Inference Server. The second pipeline manages the periodic incremental fine-tuning of the query tower and the ranker.
Figure 2: MLOps architecture
2. Building a Single-Stage Recommender System with Continuous Retraining on Amazon EKS
Medium article: https://mustaphaunubi.medium.com/building-a-recommender-system-with-continuous-retraining-on-amazon-eks-with-nvidia-merlin-hugectr-5b734c71bbc5
Code: https://github.com/MustaphaU/Merlin-RecSys-MLOps-on-AWS
Figure 1: Ads-ranking MLOps with monitoring component for drift detection and auto-retraining
In this project, a DCN-based recommendation model is trained on a subset of the Criteo 1TB logs to predict ad click-through rate (CTR). A monitoring component tracks AUC-ROC to detect performance drift and triggers an incremental training run once drift is detected. The project includes two autoscaling options: Kubernetes HPA + Karpenter, and Kubernetes HPA + Cluster Autoscaler.
Figure 2: The two autoscaling options — HPA + Karpenter (left) and HPA + Cluster Autoscaler (right)
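The AUC-ROC drift check described above could look roughly like the sketch below. It is an illustration under assumptions, not the project's monitoring code: the function names `auc_roc` and `should_retrain`, the fixed baseline, and the `max_drop=0.02` threshold are all hypothetical. AUC-ROC is computed in plain Python with the rank-sum formulation so the example has no dependencies.

```python
def auc_roc(labels, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) formulation, averaging tied ranks."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")

    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores [i, j) and give each the average rank.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def should_retrain(baseline_auc, labels, scores, max_drop=0.02):
    """Return (drift_detected, current_auc) for a window of labeled predictions.

    max_drop is a hypothetical tolerance: drift is flagged when the current
    AUC-ROC falls more than max_drop below the baseline.
    """
    current_auc = auc_roc(labels, scores)
    return (baseline_auc - current_auc) > max_drop, current_auc
```

In a setup like this, a `True` result would be the signal to launch the incremental training run; how the baseline is chosen and how often the window is evaluated are design choices this sketch leaves open.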
@article{momoh2026continuous,
title={Building a single-stage Recommender System with Continuous Retraining on Amazon EKS with NVIDIA Merlin, HugeCTR, NVIDIA Triton Inference Server, and Kubeflow Pipelines},
author={Momoh, Mustapha Unubi},
platform={Medium},
year={2026},
month={March},
url={https://mustaphaunubi.medium.com/building-a-recommender-system-with-continuous-retraining-on-amazon-eks-with-nvidia-merlin-hugectr-5b734c71bbc5}
}
Email: mustaphaunubi@gmail.com



