amacharla15/README.md

M.S. CS student at CSU Chico and ex-Cognizant software engineer. I build backend systems, LLM inference tooling, and performance-focused C++/Python projects, with recent work in tokenization, observability, benchmarking, and hardware-aware ML performance.

Achievements

  • Built Chaos Reviewer — AI agent (Fetch.ai track @ Cal Hacks) — ~7.2K interactions
  • Certifications:
      • AWS Certified Developer – Associate
      • AWS Machine Learning Foundations
      • Meta Backend Developer (Coursera)

Pinned

  1. Parallel-BPE-Tokenizer (Public)

    High-performance GPT-2-style BPE tokenizer in C++ with parallel batch encoding, thread-pool execution, thread-local caching, and benchmark-driven comparison against tiktoken and GPT2TokenizerFast

    C++

  2. CPUinference (Public)

    End-to-end LLM inference on CPU: API serving, streaming, benchmarking, memory analysis, and model compression (quantization).

    Python

  3. FlashSeatReservation (Public)

    Flash-sale seat reservation backend (holds + bookings) built with Spring Boot + PostgreSQL, with concurrency-safe constraints, Flyway migrations, Docker, GitHub Actions CI/CD, and Azure Container A…

    Java

  4. gpu-profiling-cuda-kernels (Public)

    GPU profiling suite & CUDA kernels on A100 80GB — ResNet-50 benchmarks, Nsight Systems profiling, tiled matrix multiplication with shared memory

    Python

  5. Hardware-Aware-Training-Time-Throughput-Prediction (Public)

    Hardware-aware CNN training performance predictor for CIFAR-10 on NVIDIA A100: learns sec/epoch from config features and derives throughput (images/sec) from the predicted time.

    Python

  6. Doc-Rag-Agent (Public)

    Doc RAG Agent is a custom lightweight document Q&A system built from scratch that retrieves relevant chunks from PDFs, answers using only that evidence, and enforces verified citations (or abstains…

    Python