Add OpenClaw recipes and document VRAM-tiered context defaults #1

Open
kenvandine wants to merge 16 commits into lemonade-sdk:main from kenvandine:openclaw_recipes

Conversation

@kenvandine kenvandine commented Apr 9, 2026

Member

This PR adds a collection of model recipes curated for OpenClaw-style agent and assistant use cases.

Changes

  • Add openclaw/ directory with recipes for:
    • GLM-4.7-Flash-GGUF
    • Gemma-4-26B-A4B-GGUF and gemma-3-4b-it-GGUF (with corrected mmproj filenames)
    • Qwen3-8B-GGUF (ctx_size bumped to 32768)
    • Qwen3-Coder-30B-A3B-Instruct-GGUF
    • Qwen3.5-9B-GGUF (vision + tool-calling)
    • Qwen3.5-35B-A3B-Q4_K_M (vision + tool-calling)
    • gpt-oss-20b-GGUF
  • Add openclaw/README.md documenting VRAM-tiered context size defaults
  • Update root README.md to reference the new openclaw/ directory

Comment thread openclaw/gemma-3-4b-it-GGUF.json Outdated
},
"model_name": "user.gemma-3-4b-it-GGUF",
"labels": [
"appear-builtin",

Member

"appear-builtin" means "don't show the user. prefix in the name". Is that intentional here?

Main reason I ask is because the name gemma-3-4b-it-GGUF is very close to a builtin name.

Makes me wonder if we should have a special prefix or suffix for recipes, like openclaw.gemma-3-4b-it-GGUF or gemma-3-4b-it-GGUF-OpenClaw.

Member Author

Good point. Switched to openclaw. prefix across all recipes and dropped appear-builtin — so the names will display as openclaw.gemma-3-4b-it-GGUF etc.
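For illustration, the rename described in this reply could be scripted roughly as follows. This is only a sketch: the `model_name` and `labels` keys and the `appear-builtin` label come from the diff excerpts in this thread, while the helper names `rename_recipe` and `rewrite_directory` are hypothetical and not part of lemonade.

```python
import json
from pathlib import Path


def rename_recipe(recipe: dict, old_prefix: str = "user.", new_prefix: str = "openclaw.") -> dict:
    """Return a copy of a recipe with the new name prefix and without appear-builtin."""
    updated = dict(recipe)
    name = updated.get("model_name", "")
    if name.startswith(old_prefix):
        # Swap only the leading prefix; the rest of the name is left untouched.
        updated["model_name"] = new_prefix + name[len(old_prefix):]
    if "labels" in updated:
        updated["labels"] = [label for label in updated["labels"] if label != "appear-builtin"]
    return updated


def rewrite_directory(directory: str) -> None:
    """Apply rename_recipe to every *.json file in a directory (hypothetical helper)."""
    for path in sorted(Path(directory).glob("*.json")):
        recipe = json.loads(path.read_text(encoding="utf-8"))
        path.write_text(json.dumps(rename_recipe(recipe), indent=2) + "\n", encoding="utf-8")
```

Calling `rewrite_directory("openclaw")` would rewrite the recipe files in place, so it is best run on a clean working tree where the resulting diff can be reviewed.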

Comment thread openclaw/Gemma-4-26B-A4B.json Outdated
"main": "unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M",
"mmproj": "mmproj-F16.gguf"
},
"model_name": "user.Gemma-4-26B-A4B",

Member

See comment above. In addition, let's make sure the naming scheme is consistent: the prior file had -GGUF in its name but this one doesn't.

Member Author

Fixed — model_name is now openclaw.Gemma-4-26B-A4B-GGUF.
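A small check along these lines could catch naming drift in future recipes. It is a sketch under the assumption that every recipe name should start with `openclaw.` and end with `-GGUF`, which is the convention this thread settled on for the two Gemma recipes; whether it should hold for every recipe is not confirmed here, and `naming_problems` is a hypothetical helper.

```python
def naming_problems(model_names, prefix="openclaw.", suffix="-GGUF"):
    """Return a list of human-readable problems for names that break the convention."""
    problems = []
    for name in model_names:
        if not name.startswith(prefix):
            problems.append(f"{name}: missing prefix {prefix!r}")
        if not name.endswith(suffix):
            problems.append(f"{name}: missing suffix {suffix!r}")
    return problems
```

An empty list means every name follows the assumed convention. For example, `naming_problems(["user.Gemma-4-26B-A4B"])` reports both a missing prefix and a missing suffix.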

"recipe_options": {
"ctx_size": 32768
},
"size": 2.56

Member

Crazy thought… if we're hardcoding the context size on these, could we statically factor that into the size? That would make VRAM-capacity-based filtering much easier on the client side.

Member Author

Let me think about that and come back to you.
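One way to picture the suggestion is to add an estimated KV-cache footprint for the hardcoded `ctx_size` to the weight size. The sketch below uses the usual estimate for standard attention (one K and one V tensor per layer, each `n_kv_heads * head_dim * ctx_size` elements). It assumes every layer uses full attention with an unquantized fp16 cache, which does not hold for sliding-window or hybrid architectures, and it treats `size` as gigabytes. The `ModelShape` fields and all numbers in the usage note are hypothetical, not the real shape of any model in this PR.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelShape:
    """Hypothetical container for the architecture values the estimate needs."""
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_cache_gb(shape: ModelShape, ctx_size: int, bytes_per_element: int = 2) -> float:
    """Estimate KV-cache size in GiB; the leading 2 counts the K and V tensors."""
    total_bytes = (
        2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * ctx_size * bytes_per_element
    )
    return total_bytes / GIB


def size_with_context(weights_gb: float, shape: ModelShape, ctx_size: int) -> float:
    """Weights plus estimated KV cache, rounded like the recipe's size field."""
    return round(weights_gb + kv_cache_gb(shape, ctx_size), 2)


def fits(recipe_size_gb: float, vram_gb: float) -> bool:
    """Client-side filter: does the context-inclusive size fit in available VRAM?"""
    return recipe_size_gb <= vram_gb
```

With a made-up shape of 32 layers, 8 KV heads, and a head dimension of 128, a 32768-token context adds 4 GiB, so `size_with_context(2.56, ModelShape(32, 8, 128), 32768)` comes out at about 6.56. A client could then filter recipes with a single comparison such as `fits(6.56, 8.0)`.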

@@ -0,0 +1,13 @@
{
"checkpoint": "unsloth/Qwen2.5-Coder-7B-Instruct-128K-GGUF:Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf",

Member

What’s the reason for using Qwen2.5 and Qwen3-VL? I would have thought the newer Qwen3.5 models of the same size would be strictly better.

Member Author

It's just from the list of recommended models I found from other sources. Now that I've proven the recipes mechanism works well with a decent collection of models, I can look at refreshing that list, testing some of the newer models, and removing some of the older ones.

Member Author

Checked — there's no Qwen3.5-Coder at ~7-9B (the coder line jumps to Qwen3-Coder-30B and Qwen3-Coder-Next at 80B). Replaced Qwen2.5-Coder-7B with Qwen3.5-9B (5.68 GB Q4_K_M), which is the better general model at the same size tier. For vision, Qwen2.5-VL-7B is dropped since Qwen3-VL-8B is already in the set and is the more capable vision-specific model.

@bitgamma bitgamma Apr 10, 2026

Member

All Qwen3.5 models are vision-enabled, and in my tests (I have tested vision-enabled models a lot) they are far better than the older Qwen3-VL.

Member Author

Good to know — updated both Qwen3.5-9B and Qwen3.5-35B-A3B with mmproj and the vision label, and dropped the Qwen3-VL-8B and Qwen3-VL-30B recipes.

kenvandine and others added 11 commits April 10, 2026 10:43
- Replace user. prefix with openclaw. and drop appear-builtin label on all recipes
- Add -GGUF suffix to Gemma-4-26B-A4B model name for consistency
- Replace Qwen2.5-Coder-7B with Qwen3.5-9B (newer generation, better across the board)
- Remove Qwen2.5-VL-7B (superseded by Qwen3-VL-8B already in the set)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Qwen3.5 models are all vision-enabled and outperform Qwen3-VL.
- Add mmproj and vision label to Qwen3.5-9B and Qwen3.5-35B-A3B
- Remove Qwen3-VL-8B and Qwen3-VL-30B recipes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds --chat-template-kwargs to explicitly disable thinking mode. With thinking
enabled, <think> blocks in the output may cause openclaw to not respond.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both models are small enough to fit in iGPU shared memory (2.3 GB and
4.1 GB at Q4_K_M) and support tool-calling via llama.cpp CPU/Vulkan backends.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix Phi-4-mini checkpoint to use quantization variant (Q4_K_M) instead
of a full filename that doesn't exist in the repo. Remove Mistral-7B-v0.3
recipe as it requires HF authentication to download.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop the Q4_K_M variant since lemonade can't find a matching file in the repo.
Let lemonade pick the first .gguf file automatically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phi-4-mini was slow on Intel Vulkan and not responding in openclaw.
Llama-3.2-3B has well-supported tool calling format and a similar size (~2 GB).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix reserveTokensFloor complaint by matching ctx_size used by other recipes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>