FastAPI server for running LLMs via Ollama on an RTX 4090.
gelab-zero-4b-preview - Step-AI's Gelab Zero model
qwen3-vl - Qwen3 Vision-Language model (thinking mode - shows reasoning process)
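How the server itself talks to Ollama is not described here. As a rough sketch, assuming Ollama's standard REST API on its default port 11434 and assuming the model names above are the tags registered with Ollama, a non-streaming generation request could be built like this. The helper names `build_generate_request` and `generate` are hypothetical, not part of this project.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build (but do not send) a POST request for Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, base_url=OLLAMA_URL, timeout=120.0):
    """Send the request and return the generated text."""
    req = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With "stream": False, Ollama replies with a single JSON object
        # whose "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

For example, `generate("qwen3-vl", "Describe this setup.")` would return the model's reply as a string, provided Ollama is running and the model has been pulled under that tag.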
On remote machine:
./sh/run_server.sh
On local machine:
ssh -L 9000:localhost:80 ubuntu@<SERVER_IP>
Then use the client to handle the requests. See https://github.com/Sesamexx/Test-time-Scaling-Local.git for the specific design.
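Before pointing the client at the tunnel, it can help to confirm that local port 9000 is actually accepting connections. The following is a minimal sketch using only the standard library; the function name `tunnel_is_up` is hypothetical, and the port simply mirrors the `ssh -L` command above.

```python
import socket


def tunnel_is_up(host="localhost", port=9000, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `tunnel_is_up()` returns `True` once the `ssh -L` session is listening locally. This only checks the local end of the tunnel; it does not prove that the server on the remote machine is answering on port 80.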