
fix scripts, update .gitignore#1

Open
WhXcjm wants to merge 3 commits into MercuryB1:main from WhXcjm:main

Conversation


@WhXcjm WhXcjm commented Mar 30, 2026

No description provided.

WhXcjm added 3 commits March 22, 2026 21:54
# Qwen3-VL Compatibility Issue and Patch Notes

## 1. Symptom
When running the JSQ v1 compression pipeline on Qwen3-VL-4B-Instruct, the model completed the first block and then failed on the next block's forward pass with:

- `RuntimeError: The size of tensor a (32) must match the size of tensor b (128) at non-singleton dimension 3`
- The traceback bottoms out in Qwen3-VL's RoPE computation: `apply_rotary_pos_emb`.

## 2. Root Cause
The core problem is in the block-output handling of the compression stage, located in:

- `JSQ4LMM/mllm-jsq/jsq/compression/collector.py`

The original logic is hard-coded in several places as:

- `out = block(...)[0]`

This assumes that **every decoder layer returns a tuple/list**. But Qwen3-VL's decoder layers return a `Tensor` (the hidden_states itself), not a tuple.

The consequences:

1. Applying `[0]` to a Tensor wrongly slices off the batch dimension;
2. When passed to the next layer, the hidden_states shape no longer lines up with the position embeddings (cos/sin);
3. The mismatch surfaces in the attention RoPE step as a dimension error (32 vs 128).
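A minimal, self-contained illustration of the failure mode. Nested Python lists stand in for tensors here; indexing a real `torch.Tensor` with `[0]` drops the leading (batch) dimension in exactly the same way:

```python
def shape(x):
    """Return the nested-list 'shape', analogous to tensor.shape."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

# hidden_states with shape (batch=1, seq=4, head_dim=128)
hidden_states = [[[0.0] * 128 for _ in range(4)]]

# Tuple-returning layer: [0] correctly unwraps the tuple.
tuple_out = (hidden_states,)
print(shape(tuple_out[0]))    # (1, 4, 128) -- still 3-D, correct

# Tensor-returning layer: [0] silently slices off the batch dim,
# which later misaligns with the RoPE cos/sin tables.
tensor_out = hidden_states
print(shape(tensor_out[0]))   # (4, 128) -- batch dimension lost
```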

## 3. The Patch
### 3.1 New compatibility helper
Added `_extract_hidden_states(layer_out)` to `collector.py`:

- if `layer_out` is a tuple/list, return its first element;
- if `layer_out` is a Tensor, return it unchanged.

### 3.2 Replace every hard-coded `[0]`
Block-output unpacking in the following paths now goes through `_extract_hidden_states(...)`:

- `collect_block_input_feat_and_output` (text / multimodal branches)
- `run_block` (text / multimodal branches)

That is, every `block(...)[0]` is replaced with the compatibility logic, so Tensor return values are never mis-indexed.

## 4. Impact Assessment
The fix is **backward compatible**:

- models returning tuple/list behave exactly as before;
- models returning a Tensor (such as Qwen3-VL) no longer hit the dimension error;
- only block-output unpacking changes; the pruning, smoothing, clipping, and quantization algorithms themselves are untouched.

## 5. Relation to Other Log Entries
The log line:

- `torch_dtype is deprecated! Use dtype instead!`

is a transformers deprecation warning, not the root cause of this crash. It can be cleaned up separately (replace `torch_dtype=` with `dtype=`).

## 6. Suggested Regression Checks
Verify in this order:

1. Run `jsq_v1` with a small calibration set (e.g. `nsamples=8`) and confirm the RoPE dimension error no longer occurs;
2. Run the full `jsq_v1` task list and confirm compression proceeds across all blocks;
3. Compare the baseline and jsq_v1 evaluation outputs to confirm results are reproducible with no unexpected regressions.

## 7. Conclusion
The incompatibility was not caused by the Qwen3-VL calibration-data construction; it came from the compression collector's overly strong assumption about decoder-layer return types (the forced `[0]`). The patch makes that logic tolerant of both tuple and tensor returns, which fixes the failing path.
Copilot AI review requested due to automatic review settings March 30, 2026 11:42

Copilot AI left a comment


Pull request overview

This PR adds orchestration and helper scripts for running Qwen3-VL-4B compression/evaluation runs and improves logging so runs can be captured to files; it also updates ignore rules for generated artifacts.

Changes:

  • Add bash scripts to run baseline/JSQ v1/JSQ v2 flows and (intended) log summarization.
  • Add log file support to the CLI and ensure lmms-eval results are written via loguru.
  • Make the collector more tolerant of varying decoder-layer return formats; ignore submissions/ outputs.

Reviewed changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 7 comments.

Summary per file:

| File | Description |
| --- | --- |
| `submissions/textvqa_submission_2026-03-22-20-27-07.json` | Adds a TextVQA submission artifact (should likely not be committed). |
| `scripts/summarize_qwen3_logs.py` | New log summarization utility (Markdown/CSV). |
| `scripts/run_qwen3vl4b_all.sh` | New end-to-end runner for baseline → JSQ v1 → JSQ v2. |
| `scripts/eval_qwen3vl4b.sh` | New eval-only wrapper script for Qwen3-VL-4B. |
| `scripts/compress_qwen3vl4b_jsq_v1.sh` | New compression+eval script for JSQ v1. |
| `scripts/compress_qwen3vl4b_jsq_v2.sh` | New compression+eval script for JSQ v2. |
| `main.py` | Adds `--log_dir` and file logging setup for runs. |
| `jsq/eval/lmms_eval.py` | Attaches loguru file sink after lmms-eval logger reset; logs tables via logger. |
| `jsq/config.py` | Adds `log_dir` to `CompressConfig`. |
| `jsq/compression/collector.py` | Attempts to normalize block outputs by extracting hidden states. |
| `.gitignore` | Ignores `submissions/` directory. |


logger.info(f"Log file path: {file_path}")


def parse_args() -> CompressConfig:

Copilot AI Mar 30, 2026


parse_args is annotated as returning CompressConfig, but it actually returns a tuple (eval_only, CompressConfig). Update the return type annotation (and ideally the docstring) to reflect the real return type to avoid type-checking and IDE confusion.

Suggested change
def parse_args() -> CompressConfig:
def parse_args() -> tuple[bool, CompressConfig]:
"""Parse CLI arguments and return (eval_only flag, compression config)."""

Comment on lines +48 to +52
def _extract_hidden_states(layer_out):
    """Handle different decoder-layer return formats; uniformly extract hidden_states."""
    if isinstance(layer_out, (tuple, list)):
        return layer_out[0]
    return layer_out

Copilot AI Mar 30, 2026


_extract_hidden_states only unwraps tuple/list outputs. Many HuggingFace decoder layers return a ModelOutput (e.g., BaseModelOutputWithPast) which is not a tuple/list; this helper would return the whole object instead of a tensor, breaking downstream code that expects a Tensor (e.g., torch.cat, subsequent block inputs). Consider also handling ModelOutput/mapping-like outputs (or falling back to layer_out[0] when indexing is supported) to reliably extract the hidden states tensor.

Comment on lines +8 to +9
# export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
export CUDA_VISIBLE_DEVICES=2

Copilot AI Mar 30, 2026


This script hard-codes CUDA_VISIBLE_DEVICES=2, overriding any value provided by the caller/environment. This makes the script unexpectedly grab a specific GPU; prefer respecting an existing setting (e.g., CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-...}) or making the GPU selection an argument.

Suggested change
# export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
export CUDA_VISIBLE_DEVICES=2
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
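The `${VAR:-default}` expansion this suggestion relies on falls back only when the variable is unset or empty, so a caller-supplied GPU index survives:

```shell
# Unset: the :- default kicks in.
unset CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
echo "fallback: ${CUDA_VISIBLE_DEVICES}"     # prints "fallback: 2"

# Caller-provided: the existing value is kept.
CUDA_VISIBLE_DEVICES=0
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
echo "caller wins: ${CUDA_VISIBLE_DEVICES}"  # prints "caller wins: 0"
```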

Comment on lines +2 to +15
# Compress Qwen3-VL-4B-Instruct with JSQ v2 (W8A8, sparsity=0.4375) and run 5 eval tasks.
# Usage: bash scripts/compress_qwen3vl4b_jsq_v2.sh [model] [save_dir] [tasks] [log_dir]

# Only fall back to a default GPU when none is set externally, to avoid overriding the orchestrator's or the command line's choice.
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0}
echo "[GPU] current CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

set -e

MODEL=${1:-"Qwen/Qwen3-VL-4B-Instruct"}
SAVE_DIR=${2:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/outputs/qwen3vl-4b-jsq-v2-w8a8-sp04375"}
TASKS=${3:-"gqa,mme,textvqa_val,mmstar,mmmu_val"}
LOG_DIR=${4:-"${SAVE_DIR}/logs"}
DATA_DIR=${5:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/storage/datasets"}

Copilot AI Mar 30, 2026


The usage line documents 4 arguments, but the script also reads a 5th positional argument (DATA_DIR=${5:-...}). Update the usage comment (and/or argument handling) so users know they can pass data_dir as the 5th parameter.

@@ -0,0 +1,125 @@
d#!/usr/bin/env python3

Copilot AI Mar 30, 2026


The script’s shebang line has an extra leading character (d#!/usr/bin/env python3), which will break direct execution on Unix-like systems. Remove the stray d so the first line starts with #!/usr/bin/env python3.

Suggested change
d#!/usr/bin/env python3
#!/usr/bin/env python3

Comment on lines +50 to +54
# # Step 4: automatic summarization
# echo "[orchestrator] starting automatic summarization"
# python scripts/summarize_qwen3_logs.py \
# --run "baseline:${BASELINE_LOG_DIR}" \
# --run "jsq_v1:${V1_LOG_DIR}" \

Copilot AI Mar 30, 2026


The header comment says this script will "自动汇总结果" (automatically summarize results), but the summarization step is currently commented out below. Either re-enable the summarize step or update the header/commentary so the script's documented behavior matches what it actually does.

Comment on lines +2 to +15
# Compress Qwen3-VL-4B-Instruct with JSQ v1 (W8A8, sparsity=0.4375) and run 5 eval tasks.
# Usage: bash scripts/compress_qwen3vl4b_jsq_v1.sh [model] [save_dir] [tasks] [log_dir]

# Only fall back to a default GPU when none is set externally, to avoid overriding the orchestrator's or the command line's choice.
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0}
echo "[GPU] current CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

set -e

MODEL=${1:-"Qwen/Qwen3-VL-4B-Instruct"}
SAVE_DIR=${2:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/outputs/qwen3vl-4b-jsq-v1-w8a8-sp04375"}
TASKS=${3:-"gqa,mme,textvqa_val,mmstar,mmmu_val"}
LOG_DIR=${4:-"${SAVE_DIR}/logs"}
DATA_DIR=${5:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/storage/datasets"}

Copilot AI Mar 30, 2026


The usage line documents 4 arguments, but the script also reads a 5th positional argument (DATA_DIR=${5:-...}). Update the usage comment (and/or argument handling) so users know they can pass data_dir as the 5th parameter.
