Run local LLMs like Gemma, Qwen, and LLaMA on Android for offline, private, real-time chat and question answering with LiteRT and ONNX Runtime.
Updated May 6, 2026 · Kotlin
Training pipeline for an osu!mania 7k next-event model, designed as the upstream predictor for audio-driven 7k map generation systems.
Run AI without internet, on the web and desktop.
Offline CrowdAware system for Raspberry Pi 4B and Heltec LoRa V3 using Raspberry Pi Camera Module 3 and MLX90640 Thermal Camera.
A cloud-to-edge MLOps pipeline for offline industrial diagnostics. Fine-tunes Phi-3-mini (3.8B) on cloud GPUs via QLoRA, quantizes to INT4, and deploys it as a CPU-optimized ONNX microservice for industry-standard sensor logs.
A comprehensive toolkit for streamlining and simplifying the offline inference process for LLMs across various models and libraries.
GPT-OSS 20B local execution: a lightweight local environment for running the model with Python 3.12 and CUDA acceleration. Run GPT-OSS 20B entirely offline, accelerate text generation on the GPU, and enable fast, secure inference on consumer hardware.
Multimodal offline counterfeit-detection system (text + image + tabular data).
Inclusive hand gesture recognition system for assistive human–computer interaction, based on classical machine learning and MediaPipe Hands.
Real-time semantic audio codec achieving 300 bps bandwidth via generative-AI reconstruction.