Run local LLMs like Gemma, Qwen, and LLaMA on Android for offline, private, real-time chat and question answering with LiteRT and ONNX Runtime.
Updated May 6, 2026 · Kotlin
Training pipeline for an osu!mania 7k next-event model, designed as the upstream predictor for audio-driven 7k map generation systems.
Run AI without internet, on the web and desktop.
Offline CrowdAware system for Raspberry Pi 4B and Heltec LoRa V3 using Raspberry Pi Camera Module 3 and MLX90640 Thermal Camera.
A cloud-to-edge MLOps pipeline for offline industrial diagnostics. Fine-tunes Phi-3-mini (3.8B) on cloud GPUs via QLoRA, quantizes to INT4, and deploys it as a CPU-optimized ONNX microservice for industry-standard sensor logs.
A comprehensive toolkit for streamlining and simplifying the offline inference process for LLMs across various models and libraries.
GPT-OSS 20B local execution: a lightweight local environment for running the model with Python 3.12 and CUDA acceleration. Run GPT-OSS 20B entirely offline, accelerate text generation on the GPU, and enable fast, secure inference on consumer hardware.
Multimodal offline counterfeit-detection system (text + image + tabular data).
Inclusive hand gesture recognition system for assistive human–computer interaction, based on classical machine learning and MediaPipe Hands.
Real-time semantic audio codec achieving 300 bps bandwidth via generative-AI reconstruction.