feat: add LiquidAI LFM2.x parser support #552

Draft
necaris wants to merge 1 commit into raullenchai:main from necaris:feat/lfm2.5-parser

necaris commented Jun 11, 2026

Implement LfmToolParser for the Liquid.AI models.

Why is this needed?

Fixes #85, adding support for a popular model series.

AI assistance disclosure

Initial implementation generated by Claude, review and additional tests from Codex, and manual touch-ups and refactorings in concert with Gemini.

By submitting this PR I confirm I can explain the intent, risk, and behavior of every non-generated change in this PR. For any generated / boilerplate / scaffolded sections, I've identified them above and can describe how I verified them.

Test plan

  • new tests/test_lfm_tool_parser.py
  • updated tests/test_native_tool_format.py to mark LfmToolParser as non-native
  • updated tests/test_tool_call_streaming_parity.py to add a streaming-parity fixture for the LFM parser

Checklist

  • Tests pass locally (python3 -m pytest tests/ -x)
  • Lint passes (ruff check && ruff format --check)
  • Self-validated with python3 -m scripts.pr_validate.pr_validate <PR#> — see CONTRIBUTING.md (opt out heavy steps with PR_VALIDATE_NO_DEEPSEEK=1 PR_VALIDATE_NO_STRESS=1 if you don't have the hardware/keys)
  • If new tests touch a critical code path (parser / scheduler / security), I've spot-checked that they fail when the corresponding production line is broken (see SOP §Step 3)
  • Updated README/docs if applicable
  • No breaking changes to existing API

NOTE: Leaving in draft state till benchmarks can be run.

raullenchai (Owner) left a comment

Thanks for tackling LFM — the parser core is well thought out (AST-based, positional-arg rejection, balanced-bracket detection with quoted-string awareness, streaming hold + flush_held_content at stream end), and the AI-assistance disclosure is appreciated. Two blockers must be fixed before this can merge, plus a few observations.

Blockers

1. aliases.json is not valid JSON — import vllm_mlx will crash

Reproducing from the PR branch:

$ python3 -c "import json; json.load(open('vllm_mlx/aliases.json'))"
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 779 column 3

The lfm2.5-1b entry is missing its closing } and trailing comma. The current shape is:

  "lfm2.5-1b": {
    "hf_path": "mlx-community/LFM2.5-1.2B-Instruct-4bit",
    ...
    "supports_spec_decode": false
  "diffusion-gemma-26b-4bit": {   ← syntax error here

This means the [x] Tests pass locally (python3 -m pytest tests/ -x) checkbox can't have been ticked truthfully — aliases.json is loaded at module import time, so the very first from vllm_mlx import … in any test crashes. Could you run the test suite locally end-to-end (no -x, full collection) to catch import-time failures like this one before re-pushing?
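A pre-push check along these lines would surface this class of error with its location. This is a minimal sketch using only the standard library; the helper name `json_error_location` and the `BROKEN` fragment are illustrative, not part of the repo.

```python
import json


def json_error_location(text, name="<string>"):
    """Return None if text parses as JSON, else a 'name:line:col: message' string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # JSONDecodeError carries the 1-based line and column of the failure.
        return f"{name}:{exc.lineno}:{exc.colno}: {exc.msg}"
    return None


# Hypothetical fragment with the same defect: the first entry never closes.
BROKEN = """{
  "lfm2.5-1b": {
    "supports_spec_decode": false
  "diffusion-gemma-26b-4bit": {}
}"""

if __name__ == "__main__":
    print(json_error_location(BROKEN, "aliases.json"))
```

Reading the real file with `pathlib.Path("vllm_mlx/aliases.json").read_text()` and passing the result in would cover the actual case.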

2. Alias names violate the project's <family>-<size>-<quant> SOP

Both new aliases ship without an explicit quant suffix:

  "lfm2-24b": { "hf_path": "lmstudio-community/LFM2-24B-A2B-MLX-4bit", ... },
  "lfm2.5-1b": { "hf_path": "mlx-community/LFM2.5-1.2B-Instruct-4bit", ... },

Per the project-local naming SOP (see CLAUDE.md in this repo):

Every alias key in vllm_mlx/aliases.json MUST follow <family>-<size>-<quant>. … An alias whose hf_path quant doesn't match the alias suffix … silently swaps weights on operators.

lfm2-24b is MoE (is_moe: true) with A2B active experts and a 4-bit checkpoint, and lfm2.5-1b is a 4-bit dense checkpoint. Please rename:

  • lfm2-24b → lfm2-24b-a2b-4bit (matches HF naming LFM2-24B-A2B-MLX-4bit)
  • lfm2.5-1b → lfm2.5-1b-4bit

Why this matters: operators pin rapid-mlx serve <alias> in startup scripts. If a future PR offers an 8-bit variant under the same family and someone retargets the bare lfm2-24b alias to it, every pinned deployment silently doubles its VRAM and may OOM in production. The suffix makes the pin explicit and forces a documented migration when the quant tier changes. This rule cost us a revert on PR #558 (diffusion-gemma-26b bare-alias swap), so I enforce it strictly.
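The naming rule could be checked mechanically. The sketch below is an assumption about how such a check might look: the `QUANT_SUFFIX` pattern and the `alias_problems` helper are hypothetical, and the set of quant tokens the SOP actually accepts may differ.

```python
import re

# Hypothetical pattern for the trailing <quant> part; the real SOP may allow
# other tokens than these.
QUANT_SUFFIX = re.compile(r"-(\d+bit|bf16|fp16)$")


def alias_problems(aliases):
    """Return readable problems for alias keys whose quant suffix is missing
    or does not appear in the entry's hf_path."""
    problems = []
    for name, entry in aliases.items():
        match = QUANT_SUFFIX.search(name)
        if match is None:
            problems.append(f"{name}: missing quant suffix")
            continue
        quant = match.group(1)
        hf_path = entry.get("hf_path", "")
        if quant.lower() not in hf_path.lower():
            problems.append(f"{name}: suffix {quant!r} not found in hf_path {hf_path!r}")
    return problems


if __name__ == "__main__":
    sample = {
        "lfm2-24b": {"hf_path": "lmstudio-community/LFM2-24B-A2B-MLX-4bit"},
        "lfm2.5-1b-4bit": {"hf_path": "mlx-community/LFM2.5-1.2B-Instruct-4bit"},
    }
    for problem in alias_problems(sample):
        print(problem)
```

The substring comparison against hf_path is deliberately loose; it only catches the case where the alias suffix and the checkpoint name disagree outright.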

Significant concerns (not strict blockers)

3. Author-acknowledged WIP — convert to Draft until benched

The PR description leads with:

"Work in progress: I don't have the hardware to run the full benchmarks, so the README and benchmark updates are not done."

Two ways forward, either is fine:

  • (A) Mark as Draft until you can borrow time on a 32GB+ Mac. I'm happy to do the bench pass on my M3 Ultra (256GB) and post results once Blockers 1+2 are fixed — drop me a ping. The Model Onboarding SOP wants suffix_decoding_tier set and at least a smoke run of the canonical tool-call prompts before going green.
  • (B) Keep as ready-for-review with suffix_decoding_tier: "unknown" (which you already have — good) and supports_dflash: false, but please call out in the PR body that those fields stay unknown/false until benched, so a future onboarding sweep knows to fill them in.

Either way: the title should not be feat: if the work is still WIP — the convention here is feat(wip): or just Draft state. (Squash-merge keeps the prefix in the commit message.)

4. Postprocessor → parser layering inversion (NIT)

vllm_mlx/service/postprocessor.py now hard-imports LFM_CALL_START from a specific parser to extend its "plausible markup" pre-check:

from ..tool_parsers.lfm_tool_parser import LFM_CALL_START

_has_plausible_markup = bool(_fallback_text) and (
    "<" in _fallback_text
    or "{" in _fallback_text
    or "[Calling" in _fallback_text
    or LFM_CALL_START.search(_fallback_text) is not None
)

This grows linearly with every new parser format we add — eventually the postprocessor sprouts an import-and-marker line per parser. Cleaner shape: each ToolParser subclass exposes a quick_marker_present(text) -> bool classmethod, and the postprocessor iterates over registered parsers. Out of scope for this PR (the current shape is already established with <, {, [Calling), but flagging so we can clean it up across all parsers in one go.
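A rough sketch of that shape, under stated assumptions: the `ToolParser` class here is a stand-in for the real base class, `GenericMarkupParser` and `LfmLikeParser` are hypothetical names, and the `[name(` regex is only a guess at what LFM_CALL_START matches.

```python
import re


class ToolParser:
    """Stand-in base class; the real ToolParser has a much larger interface."""

    @classmethod
    def quick_marker_present(cls, text):
        # Default: a parser claims nothing unless it overrides this hook.
        return False


class GenericMarkupParser(ToolParser):
    """Hypothetical parser covering the markers the postprocessor checks today."""

    @classmethod
    def quick_marker_present(cls, text):
        return "<" in text or "{" in text or "[Calling" in text


class LfmLikeParser(ToolParser):
    """Hypothetical LFM-style parser; the pattern is assumed, not taken from the PR."""

    _CALL_START = re.compile(r"\[\w+\(")

    @classmethod
    def quick_marker_present(cls, text):
        return cls._CALL_START.search(text) is not None


# Assumed registry; the real project may register parsers differently.
REGISTERED_PARSERS = [GenericMarkupParser, LfmLikeParser]


def has_plausible_markup(text):
    """True if any registered parser sees a marker worth a full parse attempt."""
    return bool(text) and any(p.quick_marker_present(text) for p in REGISTERED_PARSERS)
```

With this layout the postprocessor imports only the registry, and adding a parser format means overriding one classmethod rather than editing the pre-check.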

5. extract_tool_calls_streaming loose end-marker (NIT)

Both AutoToolParser (your changes) and LfmToolParser trigger extraction on a bare ] in the delta, then re-trigger on every subsequent ]. The _streaming_tools_emitted flag correctly fixes the "re-emit corrupts arguments" case (good catch), but every chatty response containing ] (markdown lists, JSON output, even code) now runs the full extractor and fails, which adds latency to each ]-containing delta on non-tool responses.

Lighter shape: gate the bare-] trigger on LFM_CALL_START.search(current_text) being non-None first. That keeps the extra extractor call off the prose path entirely.

Not blocking — performance impact is per-delta-with-], not per-token — but worth tightening in a follow-up.
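For illustration, the gate could look roughly like this. The regex below is an assumed stand-in for the PR's LFM_CALL_START, and `should_attempt_extraction` is a hypothetical helper name, not a function in the codebase.

```python
import re

# Assumption: stands in for the PR's LFM_CALL_START, matching "[name(".
LFM_CALL_START = re.compile(r"\[\w+\(")


def should_attempt_extraction(current_text, delta_text):
    """Run the full extractor only when a closing bracket just arrived AND
    the accumulated text contains something shaped like a call start."""
    if "]" not in delta_text:
        return False
    return LFM_CALL_START.search(current_text) is not None


if __name__ == "__main__":
    # Prose containing a bracket: the extractor is skipped.
    print(should_attempt_extraction("- see item [1]", "]"))
    # A completed call: the extractor runs.
    print(should_attempt_extraction("[get_weather(city='Paris')]", ")]"))
```

The cheap regex search replaces a full extractor run on prose deltas, which is where the wasted work described above occurs.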

What's good

  • AST-only argument parsing (no eval 👍), explicit rejection of positional args, and bracket-balanced extraction with string/escape awareness — exactly right for this format.
  • _safe_content_prefix holding partial [name(… until balance arrives is the correct streaming shape, and the flush_held_content hook landing where it does means the abstract-parser contract is honored.
  • 337 LOC test file + the streaming-parity fixture + the native-format flag update is good coverage.
  • Honest AI-assistance disclosure and Codex review attribution — appreciated.
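For readers unfamiliar with the AST-only approach praised in the first bullet, here is a minimal sketch of the general technique. It is not the PR's implementation: `parse_call` is a hypothetical helper, and it handles a single `name(key=value)` call rather than the full bracketed format.

```python
import ast


def parse_call(text):
    """Parse 'name(key=value, ...)' into (name, kwargs) without using eval."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("not a simple function call")
    if node.args:
        raise ValueError("positional arguments are not supported")
    kwargs = {}
    for kw in node.keywords:
        if kw.arg is None:  # a **mapping argument has no keyword name
            raise ValueError("** unpacking is not supported")
        # literal_eval accepts only literals (strings, numbers, lists, dicts,
        # and so on) and raises ValueError for anything executable.
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return node.func.id, kwargs


if __name__ == "__main__":
    print(parse_call('get_weather(city="Paris", days=3)'))
```

Because values go through `ast.literal_eval`, an argument such as `__import__("os")` is rejected instead of executed, which is the property that makes this safer than `eval`.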

Summary

  • Step 0 (does this solve a real product problem): ✅ LFM2 is a popular small-model line and #85 is a real request.
  • Supply-chain audit: ✅ Clean — no new deps, parser uses ast from stdlib only, no network calls.
  • Action:
    • Fix Blocker 1 (JSON syntax) and re-run pytest tests/ -k tool_parser locally to confirm imports work.
    • Fix Blocker 2 (rename to lfm2-24b-a2b-4bit and lfm2.5-1b-4bit).
    • Decide on #3 — convert to Draft and let me bench, or stay ready-for-review with explicit "unknown until benched" call-out.
    • #4 and #5 are NITs — happy to land follow-ups myself after merge.

Thanks again for the well-structured parser implementation — once Blockers 1 + 2 are fixed I can take it from here on bench data.

@necaris necaris marked this pull request as draft June 12, 2026 15:54

necaris (Author) commented Jun 12, 2026

@raullenchai thank you for the thorough review! I've converted to draft and will fix 1 & 2 before pinging you for help with the benchmarks!

@necaris necaris force-pushed the feat/lfm2.5-parser branch 2 times, most recently from 299b304 to 90b1955 Compare June 12, 2026 17:37

necaris (Author) commented Jun 12, 2026

  1. aliases.json is not valid JSON — import vllm_mlx will crash

Thanks for the catch! Looks like when I merged from main this got overwritten -- I didn't rerun the tests after that, so it was missed. Fixed and rerun:

=========================================== 4833 passed, 19 skipped, 155 deselected, 6 xfailed in 60.09s (0:01:00) ===========================================
  2. Alias names violate the project's <family>-<size>-<quant> SOP

I had not known this, thank you for educating me! Fixed and rebased into the main change.

Bench data

I got a colleague to run benchmarks on their M4 Max -- this is the snippet from the scorecard.md. I'm not sure what I need to extract and add to the PR but I'll dig further... really just saying I've got this.

Generated: 2026-06-12T14:08:04

| Model | Decode TPS | Cold TTFT | Cached TTFT | Tool % | Score | Status |
| --- | --- | --- | --- | --- | --- | --- |
| lfm2-24b-a2b-4bit | 148.9 | 51ms | 88ms | 100% | 385.5 | OK |
| lfm2.5-1b-4bit | 413.8 | 119ms | 60ms | 0% | 682.4 | OK |

Implement `LfmToolParser` for the Liquid.AI models. Close raullenchai#85.
@necaris necaris force-pushed the feat/lfm2.5-parser branch from 90b1955 to 109b686 Compare June 14, 2026 14:53
@necaris necaris requested a review from raullenchai June 14, 2026 14:53

necaris (Author) commented Jun 14, 2026

@raullenchai I've addressed #1, #2, #3, and #5 and added a TODO: comment for #4 to be cleaned up in a future pass. Unfortunately I'm not able to complete the benchmarking with all the models -- although I have confirmed with a colleague's machine that the benchmarks do at least run with the LFM models.

I'll leave it as draft until the benchmarking can be done, but I think it's at a point where you could pick up the benchmark runs?

Development

Successfully merging this pull request may close these issues.

feat: add Liquid LFM2 support (454K downloads) — needs parser research