
pylate-onnx-export fails on several PyLate-compatible models (LFM2, Jina-v2) — multiple small bugs, proposed fixes #93

@SylvainDeaure

Trying to export LiquidAI/LFM2-ColBERT-350M (and jinaai/jina-colbert-v2 as a fallback) to ONNX for use with ColGREP via pylate-onnx-export (package colbert_export v0.1.0, pulled from PyPI).

The exporter only produces tokenizer.json + config_sentence_transformers.json — no model.onnx — and exits with a series of errors. Each error below was resolved by a small patch; the final blocker is architectural and specific to LFM2.

Environment: Python 3.12, pylate-onnx-export==0.1.0, torch==2.x, Linux.

Bug 1 — KeyError: 'token_type_ids' on any non-ModernBERT architecture

Model architecture: Lfm2Model
Uses token_type_ids: True
Saved tokenizer to: ...
Saved config to: ...
Error: 'token_type_ids'

detect_model_architecture in colbert_export/export.py hardcodes a single special case and treats every other architecture as using token_type_ids:

uses_token_type_ids = True
if "ModernBert" in model_class_name:
    uses_token_type_ids = False

Any non-ModernBERT backbone (LFM2, XLM-Roberta variants, Qwen, Llama-based ColBERTs…) defaults to True, then the exporter does inputs["token_type_ids"] against a tokenizer that didn't emit that key → KeyError.

Proposed fix: probe the tokenizer directly — authoritative for any architecture:

tokenizer = pylate_model[0].tokenizer
probe = tokenizer("probe", return_tensors="pt")
uses_token_type_ids = "token_type_ids" in probe
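A minimal sketch of how the probe result could also drive the export input names, so that token_type_ids is only referenced when the tokenizer actually emits it. The helper name select_onnx_inputs and the FakeTokenizer stand-in are hypothetical and not part of colbert_export; a real Hugging Face tokenizer returns a dict-like object that supports the same `in` check.

```python
def select_onnx_inputs(tokenizer, probe_text="probe"):
    """Return the input names the tokenizer actually produces, in export order."""
    encoded = tokenizer(probe_text)
    candidates = ["input_ids", "attention_mask", "token_type_ids"]
    return [name for name in candidates if name in encoded]


class FakeTokenizer:
    """Hypothetical stand-in for a tokenizer that emits no token_type_ids."""

    def __call__(self, text):
        return {"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]}


input_names = select_onnx_inputs(FakeTokenizer())
uses_token_type_ids = "token_type_ids" in input_names
print(input_names, uses_token_type_ids)
```

Deriving both the flag and the input list from one probe keeps the wrapper's forward signature and the exported graph inputs consistent with each other.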

Bug 2 — new torch.onnx dynamo path fails on non-stock architectures

With bug 1 fixed, the export proceeds to torch.onnx.export and fails:

RuntimeError: 8*s72 (…) is not tracked with proxy for
  <torch.fx.experimental.proxy_tensor._ModuleStackTracer object at 0x...>
[torch.onnx] Obtain model graph for `ColBERTForONNX([...]` with
  `torch.export.export(..., strict=False)`... ❌
[torch.onnx] Obtain model graph for `ColBERTForONNX([...]` with
  `torch.export.export(..., strict=True)`... ❌

Recent PyTorch defaults the exporter to the dynamo path (torch.export.export plus onnxscript), which doesn't handle some shape-dependent control flow in non-stock modeling code.

Proposed fix: pass dynamo=False to torch.onnx.export to fall back to the legacy TorchScript tracer, which is far more forgiving. Optionally make it configurable.
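One way the "configurable" variant could look is a wrapper that tries the dynamo path and retries with the legacy tracer. This is a sketch only: export_with_fallback and fake_export are hypothetical names, and it assumes the export callable accepts a dynamo keyword, as torch.onnx.export does in recent PyTorch releases. The fake exporter stands in for torch so the control flow can be run without a model.

```python
def export_with_fallback(export_fn, *args, **kwargs):
    """Try the dynamo exporter first; retry with the legacy tracer on failure.

    Returns "dynamo" or "torchscript" to report which path succeeded.
    """
    try:
        export_fn(*args, dynamo=True, **kwargs)
        return "dynamo"
    except Exception as first_error:
        # Broad catch on purpose: the dynamo path can fail with RuntimeError
        # (graph capture) or ModuleNotFoundError (onnxscript not installed).
        print(f"dynamo export failed ({first_error!r}); retrying with dynamo=False")
        export_fn(*args, dynamo=False, **kwargs)
        return "torchscript"


calls = []


def fake_export(model, example_inputs, path, dynamo=True):
    """Hypothetical stand-in that fails on the dynamo path, like the trace above."""
    if dynamo:
        raise RuntimeError("is not tracked with proxy")
    calls.append((path, dynamo))


path_used = export_with_fallback(fake_export, "model", ("inputs",), "model.onnx")
print(path_used, calls)
```

In the exporter, the call would presumably become export_with_fallback(torch.onnx.export, model, example_inputs, output_path, ...), with the remaining keyword arguments passed through unchanged.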

Bug 3 — missing onnxscript dependency not declared

Even if you keep the dynamo path, onnxscript is needed and not declared in the package deps:

Error: No module named 'onnxscript'

Proposed fix: add onnxscript to install_requires (or at minimum document it in the README prerequisites).
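Independently of the packaging fix, a preflight check could turn the bare "No module named" message into an actionable one. A hedged sketch using only the standard library; missing_modules and check_export_dependencies are hypothetical helper names, not existing colbert_export functions.

```python
import importlib.util


def missing_modules(required):
    """Return the top-level module names in `required` that cannot be found."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def check_export_dependencies(use_dynamo):
    """Raise early, with a hint, if the chosen export path lacks a module."""
    # Only the dynamo path needs onnxscript, per the error shown above.
    required = ["onnxscript"] if use_dynamo else []
    missing = missing_modules(required)
    if missing:
        raise RuntimeError(
            "Missing modules for the dynamo export path: "
            + ", ".join(missing)
            + ". Install them with pip, or export with dynamo=False."
        )


# The legacy path has no extra requirement in this sketch, so this never raises.
check_export_dependencies(use_dynamo=False)
```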

Bug 4 — no trust_remote_code=True for models that ship custom modeling code

When trying Jina-v2 as an alternative:

Error: jinaai/xlm-roberta-flash-implementation You can inspect the repository
  content at https://hf.co/jinaai/jina-colbert-v2.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.

pylate_models.ColBERT(...) accepts trust_remote_code, but the exporter doesn't forward it. Blocks any HF model that ships custom modeling code (Jina, a bunch of community ColBERTs).

Proposed fix:

pylate_model = pylate_models.ColBERT(
    model_name_or_path=model_name,
    device="cpu",
    do_query_expansion=False,
    trust_remote_code=True,
)

(Or expose it as a CLI flag that defaults to true; the user has already opted in by running the exporter on this specific model.)
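A sketch of what such a flag could look like with argparse. The flag name --trust-remote-code, the positional model_name, and the parser itself are assumptions for illustration; I have not checked how the real pylate-onnx-export CLI parses its arguments.

```python
import argparse


def build_parser():
    """Hypothetical parser; the real CLI's arguments may differ."""
    parser = argparse.ArgumentParser(prog="export-sketch")
    parser.add_argument("model_name")
    parser.add_argument(
        "--trust-remote-code",
        action=argparse.BooleanOptionalAction,  # also generates --no-trust-remote-code
        default=True,
        help="Allow custom modeling code shipped with the model repository.",
    )
    return parser


default_args = build_parser().parse_args(["jinaai/jina-colbert-v2"])
opt_out_args = build_parser().parse_args(
    ["jinaai/jina-colbert-v2", "--no-trust-remote-code"]
)
print(default_args.trust_remote_code, opt_out_args.trust_remote_code)
```

The parsed value would then be forwarded as trust_remote_code=args.trust_remote_code in the pylate_models.ColBERT(...) call above, so users who want the stricter behaviour can still opt out.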

Bug 5 — BF16 weights cause symbolic-registry misses during ONNX graph build

On Jina-v2 with all the above fixed, the full 24-layer graph traces, then fails at linalg_vector_norm/clamp_min/aten::add inside F.normalize:

Error: Argument passed to at() was not in the map.

(This is std::unordered_map::at() throwing out_of_range — the ONNX symbolic registry has no handler for the op/dtype combination.)

Root cause: Jina's xlm-roberta-flash-implementation loads in BFloat16; several ONNX symbolic handlers (at least through opset 20) have no BF16 coverage for these ops.

Proposed fix: force FP32 before export:

model = ColBERTForONNX(pylate_model, uses_token_type_ids=arch_info["uses_token_type_ids"])
model = model.float()
model.eval()

After this patch, Jina-v2 still fails (see bug 6), but this unblocks any model that loads in BF16 by default.

Bug 6 — architectural blocker on LFM2 and Jina's flash-XLM-Roberta (not actionable upstream)

With all above patches applied, both models still hit variants of Argument passed to at() was not in the map.:

  • LFM2 (LiquidAI/LFM2-ColBERT-350M): fails inside Liquid-Foundation-Model blocks. Liquid's gated convolution and linear-recurrence ops have no ONNX equivalents at all — moving handlers to a higher opset doesn't help.
  • Jina-v2 (jinaai/jina-colbert-v2): fails inside the XLM-Roberta-flash encoder at seemingly ordinary ops (aten::linear, aten::add, aten::linalg_vector_norm). Each successive workaround (hand-rolled L2 normalize, tensor-wrapped epsilon, skipping normalization entirely) just moves the error to the next op.
