Trying to export LiquidAI/LFM2-ColBERT-350M (and jinaai/jina-colbert-v2 as a fallback) to ONNX for use with ColGREP via pylate-onnx-export (package colbert_export v0.1.0, pulled from PyPI).
The exporter produces only tokenizer.json and config_sentence_transformers.json, with no model.onnx, and exits with a series of errors. Bugs 1 to 5 below were each resolved by a small patch; the final blocker (bug 6) is architectural and affects both LFM2 and Jina's flash XLM-Roberta.
Environment: Python 3.12, pylate-onnx-export==0.1.0, torch==2.x, Linux.
Bug 1 — KeyError: 'token_type_ids' on any non-ModernBERT architecture
Model architecture: Lfm2Model
Uses token_type_ids: True
Saved tokenizer to: ...
Saved config to: ...
Error: 'token_type_ids'
detect_model_architecture in colbert_export/export.py has a hardcoded allowlist:
uses_token_type_ids = True
if "ModernBert" in model_class_name:
    uses_token_type_ids = False
Any non-ModernBERT backbone (LFM2, XLM-Roberta variants, Qwen, Llama-based ColBERTs…) defaults to True, then the exporter does inputs["token_type_ids"] against a tokenizer that didn't emit that key → KeyError.
Proposed fix: probe the tokenizer directly — authoritative for any architecture:
tokenizer = pylate_model[0].tokenizer
probe = tokenizer("probe", return_tensors="pt")
uses_token_type_ids = "token_type_ids" in probe
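For reference, the probe can be wrapped in a small helper. This is a minimal sketch, not the exporter's actual code: `emits_token_type_ids` is a hypothetical name, and the two stub tokenizers only stand in for real Hugging Face tokenizers so the snippet runs without downloading a model.

```python
def emits_token_type_ids(tokenizer) -> bool:
    """Return True if calling the tokenizer yields a 'token_type_ids' entry."""
    # Any short string works; only the keys of the result matter.
    probe = tokenizer("probe")
    return "token_type_ids" in probe


# Hypothetical stand-ins for real tokenizers, for illustration only.
def bert_like_tokenizer(text):
    return {
        "input_ids": [[101, 2003, 102]],
        "token_type_ids": [[0, 0, 0]],
        "attention_mask": [[1, 1, 1]],
    }


def lfm2_like_tokenizer(text):
    return {"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]}


print(emits_token_type_ids(bert_like_tokenizer))  # True
print(emits_token_type_ids(lfm2_like_tokenizer))  # False
```

The same membership test works on the object a real tokenizer returns, since it behaves like a dict keyed by input name.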
Bug 2 — new torch.onnx dynamo path fails on non-stock architectures
With bug 1 fixed, the export proceeds to torch.onnx.export and fails:
RuntimeError: 8*s72 (…) is not tracked with proxy for
<torch.fx.experimental.proxy_tensor._ModuleStackTracer object at 0x...>
[torch.onnx] Obtain model graph for `ColBERTForONNX([...]` with
`torch.export.export(..., strict=False)`... ❌
[torch.onnx] Obtain model graph for `ColBERTForONNX([...]` with
`torch.export.export(..., strict=True)`... ❌
Recent PyTorch defaults the exporter to the dynamo path (torch.export.export → onnxscript), which doesn't handle some shape-dependent control flow in non-stock modeling code.
Proposed fix: pass dynamo=False to torch.onnx.export to fall back to the legacy TorchScript tracer, which is far more forgiving. Optionally make it configurable.
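If the flag is made configurable, one wrinkle is that older torch releases (as far as I know) do not accept a `dynamo` keyword on `torch.onnx.export` at all. A minimal sketch of a guard, assuming a hypothetical helper `export_kwargs` that the exporter would call first; the two stub functions stand in for the two API shapes so the snippet runs without torch installed.

```python
import inspect


def export_kwargs(export_fn, use_dynamo: bool = False) -> dict:
    """Build the extra keyword arguments for an ONNX export call."""
    params = inspect.signature(export_fn).parameters
    # Only pass `dynamo` when the installed export function declares it.
    if "dynamo" in params:
        return {"dynamo": use_dynamo}
    return {}


# Hypothetical stubs mimicking a newer and an older export signature.
def new_style_export(model, args, f, *, dynamo=True):
    ...


def old_style_export(model, args, f):
    ...


print(export_kwargs(new_style_export))  # {'dynamo': False}
print(export_kwargs(old_style_export))  # {}
```

With real torch, the call site would look roughly like `torch.onnx.export(model, args, path, **export_kwargs(torch.onnx.export))`, defaulting to the legacy tracer unless the user opts in.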
Bug 3 — missing onnxscript dependency not declared
Even if you keep the dynamo path, onnxscript is needed and not declared in the package deps:
Error: No module named 'onnxscript'
Proposed fix: add onnxscript to install_requires (or at minimum document it in the README prerequisites).
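Independently of the packaging fix, the exporter could fail early with a clearer message. A rough sketch of such a preflight check, using only the standard library; `missing_modules` and `require_modules` are hypothetical names, not part of colbert_export.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def require_modules(names):
    """Raise a readable error listing every missing module at once."""
    missing = missing_modules(names)
    if missing:
        raise RuntimeError(
            "Missing required packages: " + ", ".join(missing)
            + ". Try: pip install " + " ".join(missing)
        )


# Example: check before taking the dynamo export path.
try:
    require_modules(["onnxscript"])
except RuntimeError as err:
    print(err)
```

This only checks that the module is discoverable; it does not verify versions.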
Bug 4 — no trust_remote_code=True for models that ship custom modeling code
When trying Jina-v2 as an alternative:
Error: jinaai/xlm-roberta-flash-implementation You can inspect the repository
content at https://hf.co/jinaai/jina-colbert-v2.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
pylate_models.ColBERT(...) accepts trust_remote_code, but the exporter doesn't forward it. Blocks any HF model that ships custom modeling code (Jina, a bunch of community ColBERTs).
Proposed fix:
pylate_model = pylate_models.ColBERT(
    model_name_or_path=model_name,
    device="cpu",
    do_query_expansion=False,
    trust_remote_code=True,
)
(Or expose it as a CLI flag, defaulting true — the user already has to opt in to running the exporter on this specific model.)
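The CLI-flag variant could look something like the following. This is a sketch under assumptions: the flag name `--trust-remote-code`, `build_parser`, and `colbert_kwargs` are hypothetical and not the exporter's existing interface; only the four ColBERT keyword arguments come from the snippet above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Export a ColBERT model to ONNX (sketch)")
    parser.add_argument("model_name")
    # BooleanOptionalAction also generates the --no-trust-remote-code form.
    parser.add_argument(
        "--trust-remote-code",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Allow custom modeling code shipped with the model repo to run.",
    )
    return parser


def colbert_kwargs(args: argparse.Namespace) -> dict:
    """Keyword arguments to forward to pylate_models.ColBERT(...)."""
    return {
        "model_name_or_path": args.model_name,
        "device": "cpu",
        "do_query_expansion": False,
        "trust_remote_code": args.trust_remote_code,
    }


args = build_parser().parse_args(["jinaai/jina-colbert-v2", "--no-trust-remote-code"])
print(colbert_kwargs(args)["trust_remote_code"])  # False
```

Defaulting to true matches the suggestion above; a more conservative exporter could flip the default and require the explicit flag.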
Bug 5 — BF16 weights cause symbolic-registry misses during ONNX graph build
On Jina-v2 with all the above fixed, the full 24-layer graph traces, then fails at linalg_vector_norm/clamp_min/aten::add inside F.normalize:
Error: Argument passed to at() was not in the map.
(This is std::unordered_map::at() throwing out_of_range — the ONNX symbolic registry has no handler for the op/dtype combination.)
Root cause: Jina's xlm-roberta-flash-implementation loads in BFloat16; several ONNX symbolic handlers (at least through opset 20) have no BF16 coverage for these ops.
Proposed fix: force FP32 before export:
model = ColBERTForONNX(pylate_model, uses_token_type_ids=arch_info["uses_token_type_ids"])
model = model.float()
model.eval()
After this patch, Jina-v2 still fails (see bug 6), but this unblocks any model that loads in BF16 by default.
Bug 6 — architectural blocker on LFM2 and Jina's flash-XLM-Roberta (not actionable upstream)
With all above patches applied, both models still hit variants of Argument passed to at() was not in the map.:
- LFM2 (LiquidAI/LFM2-ColBERT-350M): fails inside Liquid-Foundation-Model blocks. Liquid's gated convolution and linear-recurrence ops have no ONNX equivalents at all, so moving handlers to a higher opset doesn't help.
- Jina-v2 (jinaai/jina-colbert-v2): fails inside the XLM-Roberta-flash encoder at seemingly ordinary ops (aten::linear, aten::add, aten::linalg_vector_norm). Each successive workaround (hand-rolled L2 normalize, tensor-wrapped epsilon, skipping normalization entirely) just moves the error to the next op.