Fix/mmraw hub download fallback #83
Closed
hubert-marek wants to merge 9 commits into
generate() now hands back descriptor-only multi_modal_data (image_grid_thw + mm_hashes + mm_placeholders, no pixel_values). Pixels are re-attached only for the engine POST via the new materialize_pixels (cache hit, else reprocess from the message base64; grid_thw asserted), then stripped again. This keeps the env worker from retaining decoded image tensors for the life of a rollout: resident pixel memory is now bounded by the per-image cache instead of growing with turns x concurrency.

Also fixes a latent bridge bug: the merge shallow-copied the mm dict but shared the inner lists, so .extend mutated previous_multi_modal_data in place and corrupted earlier trajectory steps' cumulative sets. Copy the lists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
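The bridge bug described above is a general Python pitfall, and a small sketch may make it concrete. The function names and the `mm_hashes` sample values below are hypothetical and are not taken from the PR; only the shape of the problem (a shallow dict copy sharing its inner lists) comes from the commit message.

```python
from typing import Any, Dict, List

MMData = Dict[str, List[Any]]


def merge_shared(previous: MMData, new: MMData) -> MMData:
    """Buggy variant: dict(previous) copies the dict but shares the inner lists."""
    merged = dict(previous)
    for key, items in new.items():
        # .extend mutates the list object that `previous` still points at.
        merged.setdefault(key, []).extend(items)
    return merged


def merge_copied(previous: MMData, new: MMData) -> MMData:
    """Fixed variant: copy each inner list before extending it."""
    merged = {key: list(items) for key, items in previous.items()}
    for key, items in new.items():
        merged.setdefault(key, []).extend(items)
    return merged


# With the shared-list merge, the earlier step's data is changed in place.
step1 = {"mm_hashes": ["h1"]}
buggy = merge_shared(step1, {"mm_hashes": ["h2"]})

# With copied lists, the earlier step keeps its own cumulative set.
step1_fixed = {"mm_hashes": ["h1"]}
fixed = merge_copied(step1_fixed, {"mm_hashes": ["h2"]})
```

Copying one level of lists is enough here because the merge only appends to the lists; it would not be enough if the list elements themselves were mutated later.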
- generate(force_full_pixels): the first attempt sends new-turn images in full and prior descriptor-only images hash-only; the cache-miss fallback materializes all.
- _build_qwen_vl_features: descriptor-aware; encodes only pixel-bearing items and emits hash-only (None kwargs slot) for the rest, scattered back to original positions so kwargs_data stays aligned with mm_hashes / mm_placeholders.
- image_cache_max defaults to 0 (processed pixels stay request-scoped), plus a guard so the disabled path never pops an empty cache; a RENDERERS_MM_MAX_INFLIGHT semaphore bounds concurrent payload builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
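The "encode only pixel-bearing items, scatter back to original positions" step can be sketched as follows. This is a minimal illustration, not the PR's _build_qwen_vl_features: `encode_sparse` and the string stand-ins for images are assumptions made for the example, and `encode_batch` stands in for whatever processor call produces the per-image kwargs.

```python
from typing import Any, Callable, List, Optional, Sequence


def encode_sparse(
    items: Sequence[Optional[Any]],
    encode_batch: Callable[[List[Any]], List[Any]],
) -> List[Optional[Any]]:
    """Encode only the non-None items and keep None slots for hash-only items."""
    # Remember where the pixel-bearing items sit in the original order.
    pixel_positions = [i for i, item in enumerate(items) if item is not None]
    encoded = encode_batch([items[i] for i in pixel_positions])
    if len(encoded) != len(pixel_positions):
        raise ValueError("encoder returned a different number of items")

    # Scatter results back so the output stays index-aligned with the input.
    out: List[Optional[Any]] = [None] * len(items)
    for position, value in zip(pixel_positions, encoded):
        out[position] = value
    return out


# Strings stand in for images; None marks a descriptor-only (hash-only) item.
items = [None, "img_b", None, "img_c"]
result = encode_sparse(items, lambda batch: [s.upper() for s in batch])
```

Keeping the output the same length as the input is what lets a parallel list such as mm_hashes be indexed with the same positions.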
Single source of truth for the on-disk MM-offload contract, imported by the
verifiers env-worker (images), the renderers feature writer, and prime-rl
(both readers):
- run-scoped paths under /data/outputs/run_<RUN_ID>/assets/{images,mm_features}
(run_id_from_env, run_dir, image_asset_dir, feature_asset_dir + subdir consts).
- mmfile format: version-pinned feature fingerprint, mm_feature_path (+ traversal
guard), mmfile_ref emit + split_mmfile_ref parse (co-located so they can't drift).
- msgpack envelope build + match helpers.
- sweep_stale_artifacts: mtime TTL eviction over both asset dirs (content-addressed
+ re-writable, so over-eviction is safe).
Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
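A traversal guard of the kind mentioned for mm_feature_path is commonly written by resolving the candidate path and checking that it stays under the root. The sketch below shows that generic technique only; `safe_child_path` and the file names are hypothetical, and the real mm_store layout and signature are not shown in this PR page. It relies on `Path.is_relative_to`, which needs Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def safe_child_path(root: Path, *parts: str) -> Path:
    """Join parts under root and refuse any result that escapes root."""
    resolved_root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = resolved_root.joinpath(*parts).resolve()
    if not candidate.is_relative_to(resolved_root):
        raise ValueError(f"path escapes {resolved_root}: {candidate}")
    return candidate


# Demo against a throwaway directory (the names are made up for the example).
root = Path(tempfile.mkdtemp())
inside = safe_child_path(root, "fingerprint", "abc123.msgpack")

try:
    safe_child_path(root, "..", "outside.msgpack")
    escaped = True
except ValueError:
    escaped = False
```

An absolute component such as "/etc/passwd" is rejected too, because joinpath discards the root when it meets an absolute part and the resolved result then fails the containment check.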
- _build_qwen_vl_features writes processed vLLM features to mm_store and ships mmfile refs; import the format/paths from mm_store (no local copies).
- Collapse RENDERERS_MM_FEATURE_STORE_MODE to off/on, default on (deleted the never-differentiated disk-write-through/disk-read-nonstrict/disk-strict ladder; the latter two emitted identical refs).
- _existing_mm_feature_valid now also checks placeholder_length: vLLM validates it on load but the envelope match did not, so a stale wrong-length artifact would fail in vLLM and never self-repair (we kept skipping the rewrite). Mismatch -> treat as invalid -> rewrite.

Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sweep_stale_artifacts now evicts only assets/mm_features (the expensive processed MultiModalKwargsItem payloads). assets/images are never swept: screenshots are terminal browser output with no regeneration path, so they are kept for the whole run as the recoverable source of truth, whereas features are a regenerable cache (the trainer rebuilds pixels from the image and never reads these files; the env-worker rewrites any missing feature on demand). Over-eviction of a feature is therefore safe; over-eviction of an image is not, which is why the sweep deliberately excludes the image subdir.

The feature writer (_write_mm_feature_artifact) now refreshes mtime on the already-on-disk-and-valid path, so a recurring feature is treated as hot by the last-use sweep instead of aging out on its first-write mtime and forcing an expensive force_full_pixels reprocess.

Test updated to the features-only / keep-images contract.

Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
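The features-only sweep and the mtime refresh on a cache hit can be sketched together. This is an illustration under assumptions, not the PR's sweep_stale_artifacts or _write_mm_feature_artifact: the function names, the TTL value, and the file names are hypothetical; only the images / mm_features subdirectory names come from the commit messages.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def refresh_on_hit(path: Path) -> None:
    """Mark an already-valid artifact as recently used by bumping its mtime."""
    os.utime(path, None)  # None sets both atime and mtime to the current time


def sweep_stale_features(
    feature_dir: Path, ttl_seconds: float, now: Optional[float] = None
) -> List[Path]:
    """Delete files under feature_dir whose mtime is older than the TTL."""
    now = time.time() if now is None else now
    removed: List[Path] = []
    for path in list(feature_dir.rglob("*")):
        try:
            if path.is_file() and now - path.stat().st_mtime > ttl_seconds:
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            continue  # another worker removed it first; nothing to do
    return removed


# Demo layout: one screenshot and two feature files, all written an hour ago.
root = Path(tempfile.mkdtemp())
images = root / "assets" / "images"
features = root / "assets" / "mm_features"
images.mkdir(parents=True)
features.mkdir(parents=True)
screenshot, hot, cold = images / "shot.png", features / "hot.bin", features / "cold.bin"
old = time.time() - 3600
for p in (screenshot, hot, cold):
    p.write_bytes(b"x")
    os.utime(p, (old, old))

refresh_on_hit(hot)  # the recurring feature is touched on reuse
removed = sweep_stale_features(features, ttl_seconds=600)
```

Because the sweep is only ever pointed at the feature directory, the hour-old screenshot survives without any special-casing, which mirrors the keep-images contract.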
- client.py: remove the inline_count/mmfile_count debug counters and the "built qwen-vl mm features ..." debug log (the mode=="off" inline path itself is unchanged).
- mm_store.py: fold the redundant mm_feature_run_root alias into feature_asset_dir (internal-only, no external importers).

Pure cleanup; no behavior change.

Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Env workers emit layout-only descriptors + mmraw refs instead of running the HF image processor; vLLM materializes pixels from the raw image on shared disk (hash + fingerprint + grid/placeholder validated). Avoids AutoProcessor and pixel_values on the env worker, cutting RSS. Co-Authored-By: Codex <codex@openai.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nfig.json

The raw-image (mmraw) layout path resolves the model's image geometry from preprocessor_config.json, but _load_preprocessor_config_json only checked local paths and the local HF cache (try_to_load_from_cache). Hosted env workers render models they never loaded locally, so hub-style ids always missed the cache and every image rollout failed with

RuntimeError: Qwen raw image layout could not find preprocessor_config.json for 'Qwen/Qwen3.6-35B-A3B' ...

even when the file is publicly available on the Hub.

Add an hf_hub_download fallback on cache miss (a few hundred bytes, lands in the HF cache, then memoized by the lru_cache). Offline/no-network workers fall through to the existing RuntimeError, whose message now also mentions Hub reachability alongside the explicit image_* config escape hatch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
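The lookup order described above (local path, then local cache, then Hub download, then a RuntimeError) can be sketched with the network parts injected as callables, which keeps the example runnable without huggingface_hub installed. `make_config_loader`, its parameters, the error wording, and the `patch_size` demo value are assumptions for illustration; in the PR the two callables would presumably correspond to try_to_load_from_cache and hf_hub_download, and the actual _load_preprocessor_config_json may differ.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path
from typing import Any, Callable, Dict, Optional

FILENAME = "preprocessor_config.json"


def make_config_loader(
    cache_lookup: Callable[[str], Optional[str]],
    hub_download: Callable[[str], str],
) -> Callable[[str], Dict[str, Any]]:
    """Build a memoized loader: local dir -> local cache -> Hub download."""

    @lru_cache(maxsize=None)
    def load(model_id: str) -> Dict[str, Any]:
        local = Path(model_id) / FILENAME
        if local.is_file():
            return json.loads(local.read_text())
        cached = cache_lookup(model_id)
        if cached is None:
            try:
                cached = hub_download(model_id)  # fallback on cache miss
            except Exception as exc:
                # Illustrative wording only, not the PR's exact message.
                raise RuntimeError(
                    f"could not find {FILENAME} for {model_id!r}; check that "
                    "the Hub is reachable or set the image_* config explicitly"
                ) from exc
        return json.loads(Path(cached).read_text())

    return load


# Demo with fakes: the cache always misses and the "download" writes a file.
tmp = Path(tempfile.mkdtemp())
calls = []


def fake_download(model_id: str) -> str:
    calls.append(model_id)
    target = tmp / FILENAME
    target.write_text(json.dumps({"patch_size": 16}))  # made-up demo value
    return str(target)


def offline_download(model_id: str) -> str:
    raise OSError("no network")


load = make_config_loader(lambda model_id: None, fake_download)
first, second = load("org/model"), load("org/model")  # second call is memoized

try:
    make_config_loader(lambda model_id: None, offline_download)("org/model")
    offline_error = ""
except RuntimeError as exc:
    offline_error = str(exc)
```

Note that lru_cache does not memoize exceptions, so an offline worker retries the download on each call, and that the cached dict is shared between callers and should be treated as read-only.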
Author
Superseded by #82: same branch, correct base (feat/ephemeral-mm-pixels); this one diffs the whole feat branch against main.
Add mmraw multimodal payload mode to Qwen VL renderers and fix Hub download fallback for preprocessor_config.json
- Payload modes: raw (mmraw refs), processed (msgpack artifacts written to disk as mmfile refs), and inline (base64-encoded).
- Adds qwen_image_item_for_render, which emits raw-ref descriptors instead of pixel tensors when in raw mode; pixels can be materialized later via the new materialize_pixels / materialize_raw_refs methods.
- Adds a loader for preprocessor_config.json that falls back to a Hugging Face Hub download when the file is not available locally, with an explicit offline-failure path.
- image_cache_max defaults to 0 (disabled) for all Qwen and Kimi K2.5 renderer configs; caching now only activates when explicitly set positive.
- Adds RendererPool.materialize_pixels as a proxy that checks out a renderer and forwards the call to the underlying implementation.
- generate in client.py may now return mmraw or mmfile refs in kwargs_data['image'] instead of tensors depending on mm_payload_mode; callers that previously consumed pixel tensors directly will need to handle the new ref types.

Macroscope summarized 91f3040.