Test-time Scaling Remote

FastAPI server for running LLMs via Ollama on an RTX 4090.

Supported Models

gelab-zero-4b-preview - Step-AI's Gelab Zero model
qwen3-vl - Qwen3 Vision-Language model (thinking mode: shows its reasoning process)

On remote machine:

./sh/run_server.sh

On local machine:

ssh -L 9000:localhost:80 ubuntu@<SERVER_IP>

Then use the client to send requests through the tunnel: local port 9000 forwards to port 80 on the server. See https://github.com/Sesamexx/Test-time-Scaling-Local.git for the client design.
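
Before running the full client, it can help to confirm that the tunnel reaches the server. The sketch below is not the client from the repository linked above; it is a minimal check that assumes server.py keeps FastAPI's default /openapi.json endpoint enabled. The helper names (tunnel_url, list_routes, fetch_routes) are hypothetical and exist only for illustration.

```python
import json
import urllib.request

LOCAL_PORT = 9000  # local end of the SSH tunnel opened by the ssh -L command
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def tunnel_url(path: str, port: int = LOCAL_PORT) -> str:
    """Build a URL that reaches the remote server through the tunnel."""
    return f"http://localhost:{port}/{path.lstrip('/')}"


def list_routes(schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI schema dict."""
    routes = []
    for path, item in sorted(schema.get("paths", {}).items()):
        for method in sorted(item):
            # Skip non-method keys such as "parameters" or "summary".
            if method in HTTP_METHODS:
                routes.append(f"{method.upper()} {path}")
    return routes


def fetch_routes(timeout: float = 5.0) -> list[str]:
    """Download the schema via the tunnel and list the server's routes.

    Assumption: the server exposes FastAPI's default /openapi.json.
    """
    url = tunnel_url("/openapi.json")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return list_routes(json.load(resp))
```

With the tunnel open, calling fetch_routes() should return the endpoints the server exposes. A connection error usually means the ssh session has closed or the server is not listening on port 80.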
