# Qwen3-VL Compatibility Issue and Patch Notes

## 1. Symptom

When running the JSQ v1 compression flow on Qwen3-VL-4B-Instruct, the model crashed on the next block forward pass right after the first block completed, with:

- `RuntimeError: The size of tensor a (32) must match the size of tensor b (128) at non-singleton dimension 3`
- The traceback bottoms out in Qwen3-VL's RoPE computation: `apply_rotary_pos_emb`.

## 2. Root Cause

The problem lies in how the compression stage handles block outputs, in:

- `JSQ4LMM/mllm-jsq/jsq/compression/collector.py`

The original logic was hard-coded in several places as:

- `out = block(...)[0]`

This assumes that **every decoder layer returns a tuple/list**. But Qwen3-VL's decoder layers return a `Tensor` (the hidden_states itself), not a tuple.

As a result:

1. Applying `[0]` to a Tensor incorrectly slices off the batch dimension;
2. When passed to the next layer, the hidden_states shape no longer aligns with the position embeddings (cos/sin);
3. The mismatch surfaces in the attention RoPE step as a dimension error (the 32 vs 128 above).

## 3. Patch

### 3.1 New compatibility helper

Added `_extract_hidden_states(layer_out)` to `collector.py`:

- If `layer_out` is a tuple/list, return its first element;
- If `layer_out` is a Tensor, return it unchanged.

### 3.2 Replace all hard-coded `[0]` accesses

Block-output parsing along the following paths now goes through `_extract_hidden_states(...)`:

- `collect_block_input_feat_and_output` (text / multimodal branches)
- `run_block` (text / multimodal branches)

That is, every `block(...)[0]` is replaced with the compatibility logic, so Tensor return values are never mis-indexed.

## 4. Impact Assessment

The fix is **backward compatible**:

- Behavior is unchanged for models that return a tuple/list;
- The dimension error is fixed for models that return a Tensor (e.g. Qwen3-VL);
- Only block-output unpacking is affected; the pruning, smoothing, clipping, and quantization algorithms themselves are untouched.

## 5. Relation to Other Log Items

The log line:

- ``torch_dtype is deprecated! Use dtype instead!``

is a transformers deprecation notice, not the root cause of this crash. It can be cleaned up separately later (change `torch_dtype=` to `dtype=`).

## 6. Suggested Regression Checks

Verify in this order:

1. Run `jsq_v1` with a small calibration set (e.g. `nsamples=8`) and confirm the RoPE dimension error no longer occurs;
2. Run the full `jsq_v1` task list and confirm compression proceeds across all blocks;
3. Compare baseline and jsq_v1 evaluation outputs, and confirm results are reproducible with no anomalous regressions.

## 7. Conclusion

The incompatibility was not caused by malformed Qwen3-VL calibration data, but by the compression collector's overly strong assumption about decoder-layer return types (the forced `[0]`). The patch makes that logic tuple/tensor compatible, which fixes the current failure path.
Pull request overview
This PR adds orchestration and helper scripts for running Qwen3-VL-4B compression/evaluation runs and improves logging so runs can be captured to files; it also updates ignore rules for generated artifacts.
Changes:
- Add bash scripts to run baseline/JSQ v1/JSQ v2 flows and (intended) log summarization.
- Add log file support to the CLI and ensure lmms-eval results are written via loguru.
- Make the collector more tolerant of varying decoder-layer return formats; ignore submissions/outputs.
Reviewed changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| submissions/textvqa_submission_2026-03-22-20-27-07.json | Adds a TextVQA submission artifact (should likely not be committed). |
| scripts/summarize_qwen3_logs.py | New log summarization utility (Markdown/CSV). |
| scripts/run_qwen3vl4b_all.sh | New end-to-end runner for baseline → JSQ v1 → JSQ v2. |
| scripts/eval_qwen3vl4b.sh | New eval-only wrapper script for Qwen3-VL-4B. |
| scripts/compress_qwen3vl4b_jsq_v1.sh | New compression+eval script for JSQ v1. |
| scripts/compress_qwen3vl4b_jsq_v2.sh | New compression+eval script for JSQ v2. |
| main.py | Adds --log_dir and file logging setup for runs. |
| jsq/eval/lmms_eval.py | Attaches loguru file sink after lmms-eval logger reset; logs tables via logger. |
| jsq/config.py | Adds log_dir to CompressConfig. |
| jsq/compression/collector.py | Attempts to normalize block outputs by extracting hidden states. |
| .gitignore | Ignores submissions/ directory. |
```python
logger.info(f"Log file path: {file_path}")
# ...
def parse_args() -> CompressConfig:
```
parse_args is annotated as returning CompressConfig, but it actually returns a tuple (eval_only, CompressConfig). Update the return type annotation (and ideally the docstring) to reflect the real return type to avoid type-checking and IDE confusion.
```diff
-def parse_args() -> CompressConfig:
+def parse_args() -> tuple[bool, CompressConfig]:
+    """Parse CLI arguments and return (eval_only flag, compression config)."""
```
```python
def _extract_hidden_states(layer_out):
    """Normalize different decoder-layer return formats and extract hidden_states."""
    if isinstance(layer_out, (tuple, list)):
        return layer_out[0]
    return layer_out
```
_extract_hidden_states only unwraps tuple/list outputs. Many HuggingFace decoder layers return a ModelOutput (e.g., BaseModelOutputWithPast) which is not a tuple/list; this helper would return the whole object instead of a tensor, breaking downstream code that expects a Tensor (e.g., torch.cat, subsequent block inputs). Consider also handling ModelOutput/mapping-like outputs (or falling back to layer_out[0] when indexing is supported) to reliably extract the hidden states tensor.
```shell
# export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
export CUDA_VISIBLE_DEVICES=2
```
This script hard-codes CUDA_VISIBLE_DEVICES=2, overriding any value provided by the caller/environment. This makes the script unexpectedly grab a specific GPU; prefer respecting an existing setting (e.g., CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-...}) or making the GPU selection an argument.
```diff
-# export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
-export CUDA_VISIBLE_DEVICES=2
+export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
```
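For context, `${VAR:-default}` substitutes the default only when the variable is unset or empty, so a GPU index supplied by the caller survives; a quick sketch:

```shell
#!/usr/bin/env bash
# ${VAR:-default} falls back only when VAR is unset or empty.
unset CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
echo "fallback: ${CUDA_VISIBLE_DEVICES}"

# A value supplied by the caller/environment is preserved.
CUDA_VISIBLE_DEVICES=0
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
echo "preserved: ${CUDA_VISIBLE_DEVICES}"
```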
```shell
# Compress Qwen3-VL-4B-Instruct with JSQ v2 (W8A8, sparsity=0.4375) and run 5 eval tasks.
# Usage: bash scripts/compress_qwen3vl4b_jsq_v2.sh [model] [save_dir] [tasks] [log_dir]

# Fall back to a default GPU only when none is set externally, so we never
# override a value passed in by the orchestrator or the command line.
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0}
echo "[GPU] current CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

set -e

MODEL=${1:-"Qwen/Qwen3-VL-4B-Instruct"}
SAVE_DIR=${2:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/outputs/qwen3vl-4b-jsq-v2-w8a8-sp04375"}
TASKS=${3:-"gqa,mme,textvqa_val,mmstar,mmmu_val"}
LOG_DIR=${4:-"${SAVE_DIR}/logs"}
DATA_DIR=${5:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/storage/datasets"}
```
The usage line documents 4 arguments, but the script also reads a 5th positional argument (DATA_DIR=${5:-...}). Update the usage comment (and/or argument handling) so users know they can pass data_dir as the 5th parameter.
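One possible correction, keeping the 5th parameter and documenting it (illustrative sketch; defaults mirror the script's):

```shell
#!/usr/bin/env bash
# Usage: bash scripts/compress_qwen3vl4b_jsq_v2.sh [model] [save_dir] [tasks] [log_dir] [data_dir]
MODEL=${1:-"Qwen/Qwen3-VL-4B-Instruct"}
SAVE_DIR=${2:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/outputs/qwen3vl-4b-jsq-v2-w8a8-sp04375"}
TASKS=${3:-"gqa,mme,textvqa_val,mmstar,mmmu_val"}
LOG_DIR=${4:-"${SAVE_DIR}/logs"}
DATA_DIR=${5:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/storage/datasets"}
echo "DATA_DIR=${DATA_DIR}"
```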
```diff
@@ -0,0 +1,125 @@
+d#!/usr/bin/env python3
```
The script’s shebang line has an extra leading character (d#!/usr/bin/env python3), which will break direct execution on Unix-like systems. Remove the stray d so the first line starts with #!/usr/bin/env python3.
```diff
-d#!/usr/bin/env python3
+#!/usr/bin/env python3
```
```shell
# # Step 4: automatic summarization
# echo "[orchestrator] Starting automatic summarization"
# python scripts/summarize_qwen3_logs.py \
#     --run "baseline:${BASELINE_LOG_DIR}" \
#     --run "jsq_v1:${V1_LOG_DIR}" \
```
The header comment says this script will "automatically summarize results", but the summarization step is currently commented out below. Either re-enable the summarize step or update the header/commentary so the script's documented behavior matches what it actually does.
```shell
# Compress Qwen3-VL-4B-Instruct with JSQ v1 (W8A8, sparsity=0.4375) and run 5 eval tasks.
# Usage: bash scripts/compress_qwen3vl4b_jsq_v1.sh [model] [save_dir] [tasks] [log_dir]

# Fall back to a default GPU only when none is set externally, so we never
# override a value passed in by the orchestrator or the command line.
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0}
echo "[GPU] current CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

set -e

MODEL=${1:-"Qwen/Qwen3-VL-4B-Instruct"}
SAVE_DIR=${2:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/outputs/qwen3vl-4b-jsq-v1-w8a8-sp04375"}
TASKS=${3:-"gqa,mme,textvqa_val,mmstar,mmmu_val"}
LOG_DIR=${4:-"${SAVE_DIR}/logs"}
DATA_DIR=${5:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/storage/datasets"}
```
The usage line documents 4 arguments, but the script also reads a 5th positional argument (DATA_DIR=${5:-...}). Update the usage comment (and/or argument handling) so users know they can pass data_dir as the 5th parameter.