Skip to content

feat: add GEAK online service server with local mode GPU round-robin#93

Open
xiaofei-zheng wants to merge 10 commits into
mainfrom
feature/xiaofei/claw
Open

feat: add GEAK online service server with local mode GPU round-robin#93
xiaofei-zheng wants to merge 10 commits into
mainfrom
feature/xiaofei/claw

Conversation

@xiaofei-zheng
Copy link
Copy Markdown

  • REST API + MCP server for task management (create/submit/monitor/download)
  • Local mode: auto-detect GPUs via rocm-smi, round-robin assignment across available GPUs
  • SaFE platform integration for remote workload execution (claw mode)
  • Auth service with local mode bypass (GEAK_LOCAL=true)
  • Task scheduler for periodic status polling
  • Agent skills and MCP tool definitions for Cursor integration
  • Tested: 2 concurrent kernel tasks (RMSNorm + GEMM) on separate GPUs

Made-with: Cursor

@xiaofei-zheng xiaofei-zheng marked this pull request as ready for review April 1, 2026 14:45
@xiaofei-zheng xiaofei-zheng marked this pull request as draft April 1, 2026 14:45
@xiaofei-zheng xiaofei-zheng marked this pull request as ready for review April 8, 2026 03:18
xiaofei-zheng and others added 9 commits April 21, 2026 09:51
- REST API + MCP server for task management (create/submit/monitor/download)
- Local mode: auto-detect GPUs via rocm-smi, round-robin assignment across available GPUs
- SaFE platform integration for remote workload execution (claw mode)
- Auth service with local mode bypass (GEAK_LOCAL=true)
- Task scheduler for periodic status polling
- Agent skills and MCP tool definitions for Cursor integration
- Tested: 2 concurrent kernel tasks (RMSNorm + GEMM) on separate GPUs

Made-with: Cursor
- Add gpt-5.2 and claude-opus-4-6 configs with llm-gateway endpoints
- Remove server-side kernel path/content injection from task_manager;
  kernel absolute path is now required in the caller's prompt via skill rules

Made-with: Cursor
- config.py: change default paths to /tmp/geak for local mode
- default_config.yaml: simplify to minimal config (step_limit=150,
  mode=confirm, amd_llm/claude-opus-4.6), remove verbose templates
- tools.json: remove profiling tool (unused in current workflow)

Made-with: Cursor
…th to prompt

- Move ENTRYPOINT_PRECOMMAND (setup-certs) to run after all pip installs,
  preventing certifi bundle from being overwritten by dependency reinstalls
- Append fallback kernel input path hint to prompt.md so geak CLI can
  locate uploaded kernel files when prompt lacks explicit paths

Made-with: Cursor
Pass input_path to entrypoint so the fallback note includes the exact
uploaded kernel file path and input directory as repo path.

Made-with: Cursor
Prevent full API key exposure by showing only first 4 and last 4
characters in GET/PUT /config/model responses.

Made-with: Cursor
Allow text.verbosity parameter to pass through to litellm.completion
for GPT models that support output verbosity control.
Change *txt to *.txt and add !**/requirements.txt negation rule
so that requirements.txt files are properly tracked.

Made-with: Cursor
Local mode was missing the fallback kernel path hint that remote mode
already had, causing resolve_kernel_url to fail when looking for input
files in the output directory instead of the input directory.

Made-with: Cursor
@chao-xu-spec chao-xu-spec force-pushed the feature/xiaofei/claw branch from 5245403 to 388efa5 Compare April 21, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants