feat: add LiquidAI LFM2.x parser support#552
Conversation
raullenchai
left a comment
There was a problem hiding this comment.
Thanks for tackling LFM — the parser core is well thought out (AST-based, positional-arg rejection, balanced-bracket detection with quoted-string awareness, streaming hold + flush_held_content at stream end), and the AI-assistance disclosure is appreciated. Two blockers must be fixed before this can merge, plus a few observations.
Blockers
1. aliases.json is not valid JSON — import vllm_mlx will crash
Reproducing from the PR branch:
$ python3 -c "import json; json.load(open('vllm_mlx/aliases.json'))"
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 779 column 3
The lfm2.5-1b entry is missing its closing } and trailing comma. The current shape is:
"lfm2.5-1b": {
"hf_path": "mlx-community/LFM2.5-1.2B-Instruct-4bit",
...
"supports_spec_decode": false
"diffusion-gemma-26b-4bit": { ← syntax error hereThis means the [x] Tests pass locally (python3 -m pytest tests/ -x) checkbox can't have been ticked truthfully — aliases.json is loaded at module import time, so the very first from vllm_mlx import … in any test crashes. Could you run the test suite locally end-to-end (no -x, full collection) to catch import-time failures like this one before re-pushing?
2. Alias names violate the project's <family>-<size>-<quant> SOP
Both new aliases ship without an explicit quant suffix:
"lfm2-24b": { "hf_path": "lmstudio-community/LFM2-24B-A2B-MLX-4bit", ... },
"lfm2.5-1b": { "hf_path": "mlx-community/LFM2.5-1.2B-Instruct-4bit", ... },Per the project-local naming SOP (see CLAUDE.md in this repo):
Every alias key in
vllm_mlx/aliases.jsonMUST follow<family>-<size>-<quant>. … An alias whose hf_path quant doesn't match the alias suffix … silently swaps weights on operators.
lfm2-24b is MoE (is_moe: true) with A2B active experts and a 4-bit checkpoint, and lfm2.5-1b is a 4-bit dense checkpoint. Please rename:
lfm2-24b→lfm2-24b-a2b-4bit(matches HF namingLFM2-24B-A2B-MLX-4bit)lfm2.5-1b→lfm2.5-1b-4bit
Why this matters: operators pin rapid-mlx serve <alias> in startup scripts. If a future PR offers an 8-bit variant under the same family and someone retargets the bare lfm2-24b alias to it, every pinned deployment silently doubles its VRAM and may OOM in production. The suffix makes the pin explicit and forces a documented migration when the quant tier changes. This rule cost us a revert on PR #558 (diffusion-gemma-26b bare-alias swap), so I'm tight on enforcing it.
Significant concerns (not strict blockers)
3. Author-acknowledged WIP — convert to Draft until benched
The PR description leads with:
"Work in progress: I don't have the hardware to run the full benchmarks, so the README and benchmark updates are not done."
Two ways forward, either is fine:
- (A) Mark as Draft until you can borrow time on a 32GB+ Mac. I'm happy to do the bench pass on my M3 Ultra (256GB) and post results once Blockers 1+2 are fixed — drop me a ping. The Model Onboarding SOP wants
suffix_decoding_tierset and at least a smoke run of the canonical tool-call prompts before going green. - (B) Keep as ready-for-review with
suffix_decoding_tier: "unknown"(which you already have — good) andsupports_dflash: false, but please call out in the PR body that those fields stayunknown/falseuntil benched, so a future onboarding sweep knows to fill them in.
Either way: the title should not be feat: if the work is still WIP — the convention here is feat(wip): or just Draft state. (Squash-merge keeps the prefix in the commit message.)
4. Postprocessor → parser layering inversion (NIT)
vllm_mlx/service/postprocessor.py now hard-imports LFM_CALL_START from a specific parser to extend its "plausible markup" pre-check:
from ..tool_parsers.lfm_tool_parser import LFM_CALL_START
…
_has_plausible_markup = bool(_fallback_text) and (
"<" in _fallback_text
or "{" in _fallback_text
or "[Calling" in _fallback_text
or LFM_CALL_START.search(_fallback_text) is not None
)This grows linearly with every new parser format we add — eventually the postprocessor sprouts an import-and-marker line per parser. Cleaner shape: each ToolParser subclass exposes a quick_marker_present(text) -> bool classmethod, and the postprocessor iterates over registered parsers. Out of scope for this PR (the current shape is already established with <, {, [Calling), but flagging so we can clean it up across all parsers in one go.
5. extract_tool_calls_streaming loose end-marker (NIT)
Both AutoToolParser (your changes) and LfmToolParser triggered extraction on a bare ] in the delta, then re-trigger on every subsequent ]. The _streaming_tools_emitted flag correctly fixes the "re-emit corrupts arguments" case (good catch), but every chatty response containing ] (markdown lists, JSON output, even code) now runs the full extractor and fails — adding latency per ]-containing delta on non-tool responses.
Lighter shape: gate the bare-] trigger on LFM_CALL_START.search(current_text) being non-None first. That keeps the extra extractor call off the prose path entirely.
Not blocking — performance impact is per-delta-with-], not per-token — but worth tightening in a follow-up.
What's good
- AST-only argument parsing (no
eval👍), explicit rejection of positional args, and bracket-balanced extraction with string/escape awareness — exactly right for this format. _safe_content_prefixholding partial[name(…until balance arrives is the correct streaming shape, and theflush_held_contenthook landing where it does means the abstract-parser contract is honored.- 337 LOC test file + the streaming-parity fixture + the native-format flag update is good coverage.
- Honest AI-assistance disclosure and Codex review attribution — appreciated.
Summary
- Step 0 (does this solve a real product problem): ✅ LFM2 is a popular small-model line and #85 is a real request.
- Supply-chain audit: ✅ Clean — no new deps, parser uses
astfrom stdlib only, no network calls. - Action:
- Fix Blocker 1 (JSON syntax) and re-run
pytest tests/ -k tool_parserlocally to confirm imports work. - Fix Blocker 2 (rename to
lfm2-24b-a2b-4bitandlfm2.5-1b-4bit). - Decide on #3 — convert to Draft and let me bench, or stay ready-for-review with explicit "unknown until benched" call-out.
- #4 and #5 are NITs — happy to land follow-ups myself after merge.
- Fix Blocker 1 (JSON syntax) and re-run
Thanks again for the well-structured parser implementation — once Blockers 1 + 2 are fixed I can take it from here on bench data.
|
@raullenchai thank you for the thorough review! I've converted to draft and will fix 1 & 2 before pinging you for help with the benchmarks! |
299b304 to
90b1955
Compare
Thanks for the catch! Looks like when I merged from main this got overwritten -- I didn't rerun the tests after that, so it was missed. Fixed and rerun:
I had not known this, thank you for educating me! Fixed and rebased into the main change. Bench dataI got a colleague to run benchmarks on their M4 Max -- this is the snippet from the Generated: 2026-06-12T14:08:04
|
Implement `LfmToolParser` for the Liquid.AI models. Close raullenchai#85.
90b1955 to
109b686
Compare
|
@raullenchai I've addressed #1, #2, #3, and #5 and added a TODO: comment for #4 to be cleaned up in a future pass. Unfortunately I'm not able to complete the benchmarking with all the models -- although I have confirmed with a colleague's machine that the benchmarks do at least run with the LFM models. I'll leave it as draft until the benchmarking can be done, but I think it's at a point where you could pick up the benchmark runs? |
Implement
LfmToolParserfor the Liquid.AI models.Why is this needed?
Fixes #85, adding support for a popular model series.
AI assistance disclosure
Initial implementation generated by Claude, review and additional tests from Codex, and manual touch-ups and refactorings in concert with Gemini.
Test plan
tests/test_lfm_tool_parser.pytests/test_native_tool_format.pyto markLfmToolParseras non-nativetests/test_tool_call_streaming_parity.pyto add a streaming-parity fixture for the LFM parserChecklist
python3 -m pytest tests/ -x)ruff check && ruff format --check)python3 -m scripts.pr_validate.pr_validate <PR#>— see CONTRIBUTING.md (opt out heavy steps withPR_VALIDATE_NO_DEEPSEEK=1 PR_VALIDATE_NO_STRESS=1if you don't have the hardware/keys)NOTE: Leaving in draft state till benchmarks can be run.