olive auto-opt for CPU INT4 fails without --use_model_builder: DynamicCache.from_legacy_cache AttributeError with transformers 5.x #2335

@HamidOna

Describe the bug

Running olive auto-opt to optimize a model for CPU with INT4 precision fails when --use_model_builder is not specified. The default ONNX export path in olive/passes/onnx/conversion.py calls DynamicCache.from_legacy_cache(), which was removed in transformers 5.x, causing an AttributeError.

Adding --use_model_builder (and --use_ort_genai) bypasses this by using the onnxruntime-genai model builder instead of torch.onnx.export; with both flags set, optimization and inference complete successfully.

The --use_model_builder flag is documented as optional, but omitting it when targeting CPU with INT4 precision on transformers 5.x results in a crash. The official quickstart example in the README omits this flag, which may lead users to the same failure.

This was discovered while investigating a related issue: on older package versions (transformers 4.x, onnxruntime-genai 0.5.0), the model builds successfully without --use_model_builder but fails at inference time with an OrtException related to GatherBlockQuantized and uint8 tensors. On current package versions, the failure occurs earlier, at the conversion stage itself.

To Reproduce

  1. Install current packages:

    • olive-ai[all] (0.11.0)
    • onnxruntime-genai==0.11.4
    • transformers==5.1.0
    • torch==2.10.0
    • Python 3.13
  2. Run optimization without --use_model_builder:

python -m olive auto-opt \
  --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
  --output_path models/qwen3-cpu-int4 \
  --device cpu \
  --provider CPUExecutionProvider \
  --precision int4 \
  --log_level 1
  3. Observe the crash at the ONNX conversion stage.

Expected behavior

olive auto-opt should either:

  1. Default to --use_model_builder when targeting CPU with INT4 precision, or
  2. Be compatible with transformers 5.x on the standard ONNX export path, or
  3. Surface a clear error message directing users to use --use_model_builder.
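
To make option 3 concrete, here is a minimal sketch of a pre-flight check that fails fast with an actionable message. It is not Olive code: `needs_model_builder` and `check_export_path` are hypothetical names, and the cutoff at transformers major version 5 is an assumption based only on the versions in this report.

```python
def needs_model_builder(transformers_version: str) -> bool:
    """Return True when the default export path is expected to fail.

    Assumption: the break starts at transformers major version 5,
    as observed in this report with transformers==5.1.0.
    """
    major = int(transformers_version.split(".")[0])
    return major >= 5


def check_export_path(transformers_version: str, use_model_builder: bool) -> None:
    """Raise a clear error before conversion starts instead of crashing mid-export."""
    if needs_model_builder(transformers_version) and not use_model_builder:
        raise RuntimeError(
            f"transformers {transformers_version} removed "
            "DynamicCache.from_legacy_cache, which the default ONNX export path "
            "relies on. Re-run with --use_model_builder."
        )
```

Calling `check_export_path("5.1.0", use_model_builder=False)` raises the error, while the same call with `use_model_builder=True`, or with a 4.x version string, returns without raising.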

Olive config

No JSON config — reproduced via CLI.

Working command:

python -m olive auto-opt \
  --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
  --output_path models/qwen3-cpu-int4 \
  --device cpu \
  --provider CPUExecutionProvider \
  --use_model_builder \
  --use_ort_genai \
  --precision int4 \
  --log_level 1

Olive logs

Traceback (most recent call last):
  File "...\olive\engine\engine.py", line 732, in _run_pass
    output_model_config = host.run_pass(p, input_model_config, output_model_path)
  File "...\olive\systems\local.py", line 52, in run_pass
    return the_pass.run(model_config, output_model_path)
  File "...\olive\passes\onnx\conversion.py", line 196, in run
    return self._run_for_config(model_config, config, output_model_path)
  File "...\olive\passes\onnx\conversion.py", line 390, in _run_for_config
    return OnnxConversion._convert_model_on_device(...)
  File "...\olive\passes\onnx\conversion.py", line 596, in _convert_model_on_device
    ir_model = _export_pytorch_model(...)
  File "...\torch\utils\_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "...\olive\passes\onnx\conversion.py", line 267, in _export_pytorch_model
    torch.onnx.export(...)
  File "...\torch\onnx\__init__.py", line 341, in export
    export(...)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 552, in export
    _export(...)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 1513, in _export
    graph, params_dict, torch_out = _model_to_graph(...)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 1112, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 996, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 903, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(...)
  File "...\torch\jit\_trace.py", line 1432, in _get_trace_graph
    outs = ONNXTracedModule(...)(*args, **kwargs)
  File "...\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "...\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "...\torch\jit\_trace.py", line 140, in forward
    graph, _out = torch._C._create_graph_by_tracing(...)
  File "...\torch\jit\_trace.py", line 131, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "...\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "...\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "...\torch\nn\modules\module.py", line 1766, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "...\olive\passes\onnx\conversion.py", line 104, in patched_forward
    args[pkv_index] = DynamicCache.from_legacy_cache(args[pkv_index])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'DynamicCache' has no attribute 'from_legacy_cache'

Other information

  • OS: Windows 11
  • Olive version: 0.11.0
  • ONNXRuntime package and version: onnxruntime-genai==0.11.4
  • Transformers package version: transformers==5.1.0
  • Torch version: 2.10.0
  • Python version: 3.13
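
For anyone comparing environments, the versions above can be collected with a short script. This is a sketch using only the standard library; `collect_versions` is a hypothetical helper name, and the distribution names are taken from the install list in this report.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    names = ["olive-ai", "onnxruntime-genai", "transformers", "torch"]
    for name, ver in collect_versions(names).items():
        print(f"{name}=={ver}")
```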

Additional context

  • The root cause is in olive/passes/onnx/conversion.py line 104, which calls DynamicCache.from_legacy_cache(), a method that was removed in transformers 5.x.
  • This likely affects all models optimized via olive auto-opt without --use_model_builder on transformers 5.x, not just Qwen2.5.
  • The olive-recipes repo currently has no CPU recipe for Qwen2.5-0.5B-Instruct — all existing recipes target GPU/NPU runtimes. Happy to contribute a CPU recipe PR.
  • Related: an earlier report of the same underlying issue (missing --use_model_builder) on older packages (transformers 4.x, onnxruntime-genai 0.5.0) manifested as a GatherBlockQuantized / uint8 OrtException at inference time rather than at conversion time.
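
One possible direction for a fix on the standard export path is feature detection at the call site in patched_forward. The sketch below is not a tested patch: `legacy_to_cache` is a hypothetical helper, and the fallback assumes the cache class can be constructed with no arguments and exposes `update(key, value, layer_idx)`. I have not verified that assumption against transformers 5.x.

```python
from typing import Any, Iterable, Tuple


def legacy_to_cache(cache_cls: type, legacy: Iterable[Tuple[Any, Any]]) -> Any:
    """Convert a legacy tuple of per-layer (key, value) pairs into a cache object."""
    factory = getattr(cache_cls, "from_legacy_cache", None)
    if factory is not None:
        # Older API: the classmethod still exists, so use it unchanged.
        return factory(legacy)

    # Fallback (assumed API): build an empty cache and fill it layer by layer.
    cache = cache_cls()
    for layer_idx, (key, value) in enumerate(legacy):
        cache.update(key, value, layer_idx)
    return cache
```

With something like this in place, the failing line would become `args[pkv_index] = legacy_to_cache(DynamicCache, args[pkv_index])`, which keeps transformers 4.x behavior unchanged and only takes the fallback when the classmethod is missing.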
