I build ML systems from the ground up — not to avoid abstractions, but to understand what's underneath them.
Third-year AI/ML student at RV College of Engineering, Bengaluru. I care about inference speed, distributed compute, and what actually happens when you run a model on constrained hardware.
Transformer Inference Accelerator: Hardware accelerator for transformer inference on an Artix-7 FPGA, with an INT8 post-training quantization pipeline, a UART host interface, and a custom memory layout for weight tiling. Built as a capstone with a PCB team; I own the ML side end-to-end.
NietzscheGPT: Character-level GPT trained on the complete works of Nietzsche. Started from Karpathy's nanoGPT, then went further by extending it into a Tiny Inference Engine with a KV cache and INT8 PTQ.
ScratchGPT: GPT-2 implemented from scratch. No Hugging Face, no shortcuts.
Distributed Training Orchestrator: Master-worker architecture over gRPC with fault tolerance, checkpoint recovery, and all-reduce semantics. Built to understand what frameworks like PyTorch Distributed are actually doing.
Federated Learning Aggregation Server: FL system designed for hospital deployment, using Hadoop HDFS + MapReduce + Apache Hive for gradient aggregation and FastAPI for the coordination layer. Showcased as a local simulation.
Music Recognition: Shazam clone built from scratch, using spectrogram fingerprinting and hash-based matching.
- ML inference at the edge — making models fast and cheap to run
- Distributed training internals — what happens below the framework
- Hardware-software co-design — FPGA, quantization, memory layout
- Systems that actually ship, not just benchmarks
Python · C · PyTorch · gRPC · VHDL · FastAPI · Hadoop · PostgreSQL · Linux
Looking for a stipended ML engineering / AI infrastructure internship in Bengaluru.
If something I've built is interesting to you — reach out.
📧 shubhadityabechan2004@gmail.com 🔗 https://www.linkedin.com/in/shubhaditya-bechan/
