
Fix/mmraw hub download fallback #83

Closed
hubert-marek wants to merge 9 commits into main from fix/mmraw-hub-download-fallback

hubert-marek commented Jun 9, 2026

Add an mmraw multimodal payload mode to the Qwen VL renderers and add a Hub download fallback for preprocessor_config.json

  • Introduces mm_store.py, a new multimodal feature store supporting three payload modes: raw (mmraw refs), processed (msgpack artifacts written to disk and referenced by mmfile refs), and inline (base64-encoded).
  • Qwen3-VL, Qwen3.5, and Qwen3.6 renderers now call qwen_image_item_for_render, which emits raw-ref descriptors instead of pixel tensors when in raw mode; pixels can be materialized later via new materialize_pixels / materialize_raw_refs methods.
  • Adds a cached loader for preprocessor_config.json that falls back to a Hugging Face Hub download when the file is not available locally, with an explicit failure path for offline workers.
  • image_cache_max defaults to 0 (disabled) for all Qwen and Kimi K2.5 renderer configs; caching is enabled only when it is explicitly set to a positive value.
  • Adds RendererPool.materialize_pixels as a proxy that checks out a renderer and forwards the call to the underlying implementation.
  • Behavioral change: generate() in client.py may now return mmraw or mmfile refs in kwargs_data['image'] instead of tensors, depending on mm_payload_mode; callers that previously consumed pixel tensors directly will need to handle the new ref types.

Macroscope summarized 91f3040.
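The RendererPool.materialize_pixels proxy in the summary follows a borrow-and-forward pattern. A minimal sketch, assuming a queue-backed pool of pre-built renderers; the `checkout` method, the `factory` argument, and the pool internals are hypothetical, and only the names RendererPool and materialize_pixels come from this PR.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator


class RendererPool:
    """Fixed-size pool of renderers (hypothetical internals); each call borrows one."""

    def __init__(self, factory: Callable[[], Any], size: int) -> None:
        if size < 1:
            raise ValueError("pool size must be >= 1")
        self._free: "queue.Queue[Any]" = queue.Queue()
        for _ in range(size):
            self._free.put(factory())

    @contextmanager
    def checkout(self) -> Iterator[Any]:
        renderer = self._free.get()  # blocks until a renderer is free
        try:
            yield renderer
        finally:
            self._free.put(renderer)  # always returned, even if the call raised

    def materialize_pixels(self, *args: Any, **kwargs: Any) -> Any:
        # Proxy: borrow a renderer and forward the call unchanged.
        with self.checkout() as renderer:
            return renderer.materialize_pixels(*args, **kwargs)
```

Returning the renderer in a `finally` block matters for a proxy like this: a failed materialization must not leak a pool slot, or later calls would block forever.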

eligotts and others added 9 commits May 27, 2026 01:59
generate() now hands back descriptor-only multi_modal_data (image_grid_thw +
mm_hashes + mm_placeholders, no pixel_values). Pixels are re-attached only for
the engine POST via the new materialize_pixels (cache hit, else reprocess from
the message base64; grid_thw asserted), then stripped again. This keeps the env
worker from retaining decoded image tensors for the life of a rollout — resident
pixel memory is now bounded by the per-image cache instead of growing with
turns x concurrency.
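The attach-then-strip cycle above can be sketched as follows. This is an illustration under assumptions: each descriptor is modeled as a dict with hypothetical `mm_hash` and `image_grid_thw` keys, the cache is a plain mapping, and `reprocess` stands in for decoding the message base64 and running the image processor. Returning copies keeps the long-lived descriptors free of pixel tensors.

```python
from typing import Any, Callable, Dict, List, MutableMapping, Sequence, Tuple

Item = Dict[str, Any]
# Hypothetical: reprocess(item) -> (pixels, grid_thw), e.g. decode base64 + HF processor.
Reprocess = Callable[[Item], Tuple[Any, Sequence[int]]]


def materialize_pixels(items: List[Item], cache: MutableMapping[str, Any],
                       reprocess: Reprocess) -> List[Item]:
    """Return copies of descriptor-only items with pixel_values attached."""
    out: List[Item] = []
    for item in items:
        pixels = cache.get(item["mm_hash"])  # cache hit: reuse processed pixels
        if pixels is None:
            pixels, grid_thw = reprocess(item)  # miss: rebuild from the source image
            # Reprocessing must reproduce the layout the descriptor promised.
            if tuple(grid_thw) != tuple(item["image_grid_thw"]):
                raise ValueError(f"grid_thw mismatch for {item['mm_hash']}: "
                                 f"{tuple(grid_thw)} != {tuple(item['image_grid_thw'])}")
        out.append({**item, "pixel_values": pixels})  # copy; the descriptor is untouched
    return out


def strip_pixels(items: List[Item]) -> List[Item]:
    """Drop pixel_values after the engine POST so only descriptors are retained."""
    return [{k: v for k, v in item.items() if k != "pixel_values"} for item in items]
```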

Also fixes a latent bridge bug: the merge shallow-copied the mm dict but shared
the inner lists, so .extend mutated previous_multi_modal_data in place and
corrupted earlier trajectory steps' cumulative sets. Copy the lists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
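The bridge bug in the commit above is the classic shallow-copy aliasing problem. A minimal sketch of the before and after, with hypothetical function names and `multi_modal_data` modeled as a dict of lists:

```python
from typing import Any, Dict, List

MMData = Dict[str, List[Any]]


def merge_mm_buggy(previous: MMData, new: MMData) -> MMData:
    merged = dict(previous)  # shallow copy: the inner lists are still shared
    for key, items in new.items():
        merged.setdefault(key, []).extend(items)  # also mutates previous[key]
    return merged


def merge_mm_fixed(previous: MMData, new: MMData) -> MMData:
    merged = {key: list(items) for key, items in previous.items()}  # copy the lists
    for key, items in new.items():
        merged.setdefault(key, []).extend(items)
    return merged
```

With the buggy version, merging `{"mm_hashes": ["b"]}` into `prev = {"mm_hashes": ["a"]}` leaves `prev["mm_hashes"]` as `["a", "b"]`, which is exactly how an earlier trajectory step's cumulative set gets corrupted.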
- generate(force_full_pixels): first attempt sends new-turn images full and
  prior descriptor-only images hash-only; cache-miss fallback materializes all.
- _build_qwen_vl_features: descriptor-aware — encode only pixel-bearing items,
  emit hash-only (None kwargs slot) for the rest, scattered back to original
  positions so kwargs_data stays aligned with mm_hashes / mm_placeholders.
- image_cache_max default 0 (processed pixels stay request-scoped) + a guard so
  the disabled path never pops an empty cache; RENDERERS_MM_MAX_INFLIGHT
  semaphore bounds concurrent payload builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
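The descriptor-aware scatter in _build_qwen_vl_features can be illustrated with a small sketch. The function name, the dict-per-item model, and the batch `encode` callable are assumptions for illustration; the point is that only pixel-bearing items are encoded while the output stays index-aligned with mm_hashes / mm_placeholders.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

Item = Dict[str, Any]
Encoder = Callable[[List[Item]], List[Any]]  # hypothetical batch encoder


def build_aligned_kwargs(items: Sequence[Item], encode: Encoder) -> List[Optional[Any]]:
    """Encode only pixel-bearing items; hash-only items keep a None slot.

    The result has one entry per input item, in the original order.
    """
    positions = [i for i, item in enumerate(items) if item.get("pixel_values") is not None]
    encoded = encode([items[i] for i in positions]) if positions else []
    if len(encoded) != len(positions):
        raise ValueError(f"encoder returned {len(encoded)} outputs for {len(positions)} items")
    slots: List[Optional[Any]] = [None] * len(items)
    for pos, feature in zip(positions, encoded):
        slots[pos] = feature  # scatter back to the item's original index
    return slots
```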
Single source of truth for the on-disk MM-offload contract, imported by the
verifiers env-worker (images), the renderers feature writer, and prime-rl
(both readers):
- run-scoped paths under /data/outputs/run_<RUN_ID>/assets/{images,mm_features}
  (run_id_from_env, run_dir, image_asset_dir, feature_asset_dir + subdir consts).
- mmfile format: version-pinned feature fingerprint, mm_feature_path (+ traversal
  guard), mmfile_ref emit + split_mmfile_ref parse (co-located so they can't drift).
- msgpack envelope build + match helpers.
- sweep_stale_artifacts: mtime TTL eviction over both asset dirs (content-addressed
  + re-writable, so over-eviction is safe).

Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
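A sketch of the kind of contract described in this commit: run-scoped paths, a traversal guard, a version-pinned fingerprint, and a ref emitter co-located with its parser. Only the directory layout and the function names feature_asset_dir, mm_feature_path, mmfile_ref, and split_mmfile_ref come from the commit; their signatures, the `mmfile://` scheme, the fingerprint recipe, and `FEATURE_FORMAT_VERSION` are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Mapping, Tuple

OUTPUTS_ROOT = Path("/data/outputs")
FEATURES_SUBDIR = "mm_features"
FEATURE_FORMAT_VERSION = 1   # hypothetical: bump to invalidate old artifacts
MMFILE_PREFIX = "mmfile://"  # hypothetical ref scheme


def feature_asset_dir(run_id: str, root: Path = OUTPUTS_ROOT) -> Path:
    return root / f"run_{run_id}" / "assets" / FEATURES_SUBDIR


def mm_feature_path(run_id: str, name: str, root: Path = OUTPUTS_ROOT) -> Path:
    """Resolve an artifact path, refusing names that escape the asset dir."""
    base = feature_asset_dir(run_id, root).resolve()
    path = (base / name).resolve()  # collapses ".." and absolute names
    if path == base or not path.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"feature name escapes asset dir: {name!r}")
    return path


def feature_fingerprint(model_id: str, processor_cfg: Mapping[str, Any]) -> str:
    # The format version is part of the hash, so a format change never
    # matches artifacts written by an older writer.
    blob = json.dumps({"v": FEATURE_FORMAT_VERSION, "model": model_id,
                       "cfg": processor_cfg}, sort_keys=True)
    return f"v{FEATURE_FORMAT_VERSION}-{hashlib.sha256(blob.encode()).hexdigest()[:16]}"


def mmfile_ref(fingerprint: str, content_hash: str) -> str:
    return f"{MMFILE_PREFIX}{fingerprint}/{content_hash}"


def split_mmfile_ref(ref: str) -> Tuple[str, str]:
    """Inverse of mmfile_ref; defined next to it so the two cannot drift."""
    if not ref.startswith(MMFILE_PREFIX):
        raise ValueError(f"not an mmfile ref: {ref!r}")
    fingerprint, sep, content_hash = ref[len(MMFILE_PREFIX):].partition("/")
    if not sep or not fingerprint or not content_hash or "/" in content_hash:
        raise ValueError(f"malformed mmfile ref: {ref!r}")
    return fingerprint, content_hash
```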
- _build_qwen_vl_features writes processed vLLM features to mm_store and ships
  mmfile refs; import the format/paths from mm_store (no local copies).
- Collapse RENDERERS_MM_FEATURE_STORE_MODE to off/on, default on (deleted the
  never-differentiated disk-write-through/disk-read-nonstrict/disk-strict ladder;
  the latter two emitted identical refs).
- _existing_mm_feature_valid now also checks placeholder_length: vLLM validates it
  on load but the envelope match did not, so a stale wrong-length artifact would
  fail in vLLM and never self-repair (we kept skipping the rewrite). Mismatch ->
  treat as invalid -> rewrite.

Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
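The placeholder_length fix amounts to adding one more field to the reuse check, so that any mismatch falls through to a rewrite. A minimal sketch, assuming a dict-like envelope with hypothetical `fingerprint`, `mm_hash`, and `placeholder_length` keys; the real envelope layout is not shown in this PR.

```python
from typing import Any, Mapping, NamedTuple, Optional


class FeatureExpectation(NamedTuple):
    fingerprint: str
    mm_hash: str
    placeholder_length: int


def existing_feature_valid(envelope: Optional[Mapping[str, Any]],
                           expected: FeatureExpectation) -> bool:
    """True only if the on-disk envelope can be reused as-is.

    Any mismatch, including placeholder_length, returns False, so the
    caller rewrites the artifact instead of skipping the write forever.
    """
    if envelope is None:  # missing or unreadable artifact
        return False
    return (
        envelope.get("fingerprint") == expected.fingerprint
        and envelope.get("mm_hash") == expected.mm_hash
        and envelope.get("placeholder_length") == expected.placeholder_length
    )
```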
sweep_stale_artifacts now evicts only assets/mm_features (the expensive
processed MultiModalKwargsItem payloads). assets/images are never swept:
screenshots are terminal browser output with no regeneration path, so they are
kept for the whole run as the recoverable source of truth, whereas features are
a regenerable cache (the trainer rebuilds pixels from the image and never reads
these files; the env-worker rewrites any missing feature on demand). Over-
eviction of a feature is therefore safe; over-eviction of an image is not, which
is why the sweep deliberately excludes the image subdir.

The feature writer (_write_mm_feature_artifact) now refreshes mtime on the
already-on-disk-and-valid path, so a recurring feature is treated as hot by the
last-use sweep instead of aging out on its first-write mtime and forcing an
expensive force_full_pixels reprocess. Test updated to the features-only /
keep-images contract.

Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
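The features-only sweep and the mtime refresh work together as a last-use TTL. A sketch under assumptions: the function names and the flat file layout under `assets/mm_features` are hypothetical, and `assets/images` is simply never visited.

```python
import os
import time
from pathlib import Path
from typing import Optional

FEATURES_SUBDIR = "mm_features"  # images/ is deliberately never swept


def touch_valid_feature(path: Path) -> None:
    """On a valid already-on-disk hit, bump mtime so the sweep sees it as hot."""
    os.utime(path, None)  # None sets atime and mtime to the current time


def sweep_stale_features(assets_dir: Path, ttl_seconds: float,
                         now: Optional[float] = None) -> int:
    """Delete feature artifacts whose mtime is older than ttl_seconds."""
    feature_dir = assets_dir / FEATURES_SUBDIR
    if not feature_dir.is_dir():
        return 0
    cutoff = (time.time() if now is None else now) - ttl_seconds
    removed = 0
    for entry in feature_dir.iterdir():
        try:
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                entry.unlink()
                removed += 1
        except FileNotFoundError:
            pass  # another sweeper or a concurrent rewrite got there first
    return removed
```

Because features are content-addressed and regenerable, the worst case of deleting a hot file is a rewrite; skipping the images directory entirely is what keeps the non-regenerable screenshots safe.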
- client.py: remove the inline_count/mmfile_count debug counters and the
  "built qwen-vl mm features ..." debug log (the mode=="off" inline path
  itself is unchanged).
- mm_store.py: fold the redundant mm_feature_run_root alias into
  feature_asset_dir (internal-only, no external importers).

Pure cleanup; no behavior change.

Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Env workers emit layout-only descriptors + mmraw refs instead of running the
HF image processor; vLLM materializes pixels from the raw image on shared disk
(hash + fingerprint + grid/placeholder validated). Avoids AutoProcessor and
pixel_values on the env worker, cutting RSS.

Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
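Emitting a layout-only descriptor without the HF processor still requires the image geometry, which is why the next commit needs preprocessor_config.json. The sketch below is modeled on the `smart_resize` helper in Hugging Face transformers' Qwen2-VL image processor; the config keys (`patch_size`, `merge_size`, `min_pixels`, `max_pixels`) and the function names are assumptions, and newer Qwen configs may store the pixel bounds differently, so treat it as an illustration rather than the renderer's code.

```python
import math
from typing import Any, Mapping, Tuple


def smart_resize(height: int, width: int, factor: int,
                 min_pixels: int, max_pixels: int) -> Tuple[int, int]:
    """Round to multiples of `factor` while keeping area within bounds."""
    if min(height, width) <= 0:
        raise ValueError("image dimensions must be positive")
    if max(height, width) / min(height, width) > 200:
        raise ValueError("aspect ratio too extreme")
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:  # too large: scale down
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(factor, math.floor(height / beta / factor) * factor)
        w_bar = max(factor, math.floor(width / beta / factor) * factor)
    elif h_bar * w_bar < min_pixels:  # too small: scale up
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


def image_layout(height: int, width: int,
                 cfg: Mapping[str, Any]) -> Tuple[Tuple[int, int, int], int]:
    """Return (grid_thw, placeholder_count) for one still image."""
    patch, merge = int(cfg["patch_size"]), int(cfg["merge_size"])
    h_bar, w_bar = smart_resize(height, width, patch * merge,
                                int(cfg["min_pixels"]), int(cfg["max_pixels"]))
    grid_thw = (1, h_bar // patch, w_bar // patch)  # a still image is one temporal patch
    placeholders = grid_thw[0] * grid_thw[1] * grid_thw[2] // (merge * merge)
    return grid_thw, placeholders
```

For example, with a 16-pixel patch and a merge size of 2, a 480×640 image stays 480×640 and yields a (1, 30, 40) grid with 300 placeholder tokens.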
…nfig.json

The raw-image (mmraw) layout path resolves the model's image geometry from
preprocessor_config.json, but _load_preprocessor_config_json only checked
local paths and the local HF cache (try_to_load_from_cache). Hosted env
workers render models they never loaded locally, so hub-style ids always
missed the cache and every image rollout failed with

  RuntimeError: Qwen raw image layout could not find
  preprocessor_config.json for 'Qwen/Qwen3.6-35B-A3B' ...

even when the file is publicly available on the Hub.

Add an hf_hub_download fallback on cache miss (a few hundred bytes, lands in
the HF cache, then memoized by the lru_cache). Offline/no-network workers
fall through to the existing RuntimeError, whose message now also mentions
Hub reachability alongside the explicit image_* config escape hatch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
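The lookup order described in this commit (local path, then local HF cache, then a Hub download, then a RuntimeError) can be sketched as follows. `try_to_load_from_cache` and `hf_hub_download` are real `huggingface_hub` functions, imported lazily here; the loader name, the injectable `download` parameter (added so the fallback can be exercised offline), and the error text are illustrative assumptions, not the PR's code.

```python
import json
import os
from functools import lru_cache
from typing import Any, Callable, Dict, Optional

CONFIG_NAME = "preprocessor_config.json"
Downloader = Callable[[str, str], Optional[str]]


def hub_download(model_id: str, filename: str) -> Optional[str]:
    """Fetch from the Hub into the HF cache; None when unavailable."""
    try:
        from huggingface_hub import hf_hub_download
    except ImportError:
        return None
    try:
        return hf_hub_download(repo_id=model_id, filename=filename)
    except Exception:  # offline, gated repo, missing file, ...
        return None


def _from_hf_cache(model_id: str, filename: str) -> Optional[str]:
    try:
        from huggingface_hub import try_to_load_from_cache
    except ImportError:
        return None
    try:
        hit = try_to_load_from_cache(repo_id=model_id, filename=filename)
    except Exception:  # e.g. model_id is a local path, not a repo id
        return None
    return hit if isinstance(hit, str) else None  # skip the "known missing" sentinel


@lru_cache(maxsize=None)  # exceptions are not cached, so offline failures retry
def load_preprocessor_config(model_id: str,
                             download: Downloader = hub_download) -> Dict[str, Any]:
    # Note: the memoized dict is shared between callers; treat it as read-only.
    local = os.path.join(model_id, CONFIG_NAME)
    path: Optional[str] = local if os.path.isfile(local) else None
    if path is None:
        path = _from_hf_cache(model_id, CONFIG_NAME)
    if path is None:
        path = download(model_id, CONFIG_NAME)  # a few hundred bytes
    if path is None:
        raise RuntimeError(
            f"Qwen raw image layout could not find {CONFIG_NAME} for {model_id!r}: "
            "not found locally or in the HF cache, and the Hub download failed. "
            "Check Hub reachability or set the image_* config explicitly."
        )
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```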
hubert-marek (Author)

Superseded by #82: same branch, correct base (feat/ephemeral-mm-pixels). This one diffs the whole feat branch against main.


3 participants