# Qwen3-VL Compatibility Issue and Patch Notes

## 1. Symptom

When running the JSQ v1 compression flow on Qwen3-VL-4B-Instruct, the model crashed on the next block forward pass right after the first block completed, with:

- `RuntimeError: The size of tensor a (32) must match the size of tensor b (128) at non-singleton dimension 3`
- The traceback bottoms out in Qwen3-VL's RoPE computation: `apply_rotary_pos_emb`.

## 2. Root Cause

The problem lies in how the compression stage handles block outputs, in:

- `JSQ4LMM/mllm-jsq/jsq/compression/collector.py`

The original logic was hard-coded in several places as:

- `out = block(...)[0]`

This assumes that **every decoder layer returns a tuple/list**. But Qwen3-VL's decoder layers return a `Tensor` (the hidden_states itself), not a tuple.

As a result:

1. Applying `[0]` to a Tensor incorrectly slices off the batch dimension;
2. When passed to the next layer, the hidden_states shape no longer aligns with the position embeddings (cos/sin);
3. The mismatch surfaces in the attention RoPE step as a dimension error (the 32 vs 128 above).

## 3. Patch

### 3.1 New compatibility helper

Added `_extract_hidden_states(layer_out)` to `collector.py`:

- If `layer_out` is a tuple/list, return its first element;
- If `layer_out` is a Tensor, return it unchanged.

### 3.2 Replace all hard-coded `[0]` accesses

Block-output parsing along the following paths now goes through `_extract_hidden_states(...)`:

- `collect_block_input_feat_and_output` (text / multimodal branches)
- `run_block` (text / multimodal branches)

That is, every `block(...)[0]` is replaced with the compatibility logic, so Tensor return values are never mis-indexed.

## 4. Impact Assessment

The fix is **backward compatible**:

- Behavior is unchanged for models that return a tuple/list;
- The dimension error is fixed for models that return a Tensor (e.g. Qwen3-VL);
- Only block-output unpacking is affected; the pruning, smoothing, clipping, and quantization algorithms themselves are untouched.

## 5. Relation to Other Log Items

The log line:

- ``torch_dtype is deprecated! Use dtype instead!``

is a transformers deprecation notice, not the root cause of this crash. It can be cleaned up separately later (change `torch_dtype=` to `dtype=`).

## 6. Suggested Regression Checks

Verify in this order:

1. Run `jsq_v1` with a small calibration set (e.g. `nsamples=8`) and confirm the RoPE dimension error no longer occurs;
2. Run the full `jsq_v1` task list and confirm compression proceeds across all blocks;
3. Compare baseline and jsq_v1 evaluation outputs, and confirm results are reproducible with no anomalous regressions.

## 7. Conclusion

The incompatibility was not caused by malformed Qwen3-VL calibration data, but by the compression collector's overly strong assumption about decoder-layer return types (the forced `[0]`). The patch makes that logic tuple/tensor compatible, which fixes the current failure path.
Pull request overview
This PR adds orchestration and helper scripts for running Qwen3-VL-4B compression/evaluation runs and improves logging so runs can be captured to files; it also updates ignore rules for generated artifacts.
Changes:
- Add bash scripts to run baseline/JSQ v1/JSQ v2 flows and (intended) log summarization.
- Add log file support to the CLI and ensure lmms-eval results are written via loguru.
- Make the collector more tolerant of varying decoder-layer return formats; ignore submissions/outputs.
Reviewed changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| submissions/textvqa_submission_2026-03-22-20-27-07.json | Adds a TextVQA submission artifact (should likely not be committed). |
| scripts/summarize_qwen3_logs.py | New log summarization utility (Markdown/CSV). |
| scripts/run_qwen3vl4b_all.sh | New end-to-end runner for baseline → JSQ v1 → JSQ v2. |
| scripts/eval_qwen3vl4b.sh | New eval-only wrapper script for Qwen3-VL-4B. |
| scripts/compress_qwen3vl4b_jsq_v1.sh | New compression+eval script for JSQ v1. |
| scripts/compress_qwen3vl4b_jsq_v2.sh | New compression+eval script for JSQ v2. |
| main.py | Adds --log_dir and file logging setup for runs. |
| jsq/eval/lmms_eval.py | Attaches loguru file sink after lmms-eval logger reset; logs tables via logger. |
| jsq/config.py | Adds log_dir to CompressConfig. |
| jsq/compression/collector.py | Attempts to normalize block outputs by extracting hidden states. |
| .gitignore | Ignores submissions/ directory. |
```python
logger.info(f"Log file path: {file_path}")
# ...
def parse_args() -> CompressConfig:
```
parse_args is annotated as returning CompressConfig, but it actually returns a tuple (eval_only, CompressConfig). Update the return type annotation (and ideally the docstring) to reflect the real return type to avoid type-checking and IDE confusion.
```diff
-def parse_args() -> CompressConfig:
+def parse_args() -> tuple[bool, CompressConfig]:
+    """Parse CLI arguments and return (eval_only flag, compression config)."""
```
```python
def _extract_hidden_states(layer_out):
    """Normalize different decoder-layer return formats and extract hidden_states."""
    if isinstance(layer_out, (tuple, list)):
        return layer_out[0]
    return layer_out
```
_extract_hidden_states only unwraps tuple/list outputs. Many HuggingFace decoder layers return a ModelOutput (e.g., BaseModelOutputWithPast) which is not a tuple/list; this helper would return the whole object instead of a tensor, breaking downstream code that expects a Tensor (e.g., torch.cat, subsequent block inputs). Consider also handling ModelOutput/mapping-like outputs (or falling back to layer_out[0] when indexing is supported) to reliably extract the hidden states tensor.
```shell
# export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
export CUDA_VISIBLE_DEVICES=2
```
This script hard-codes CUDA_VISIBLE_DEVICES=2, overriding any value provided by the caller/environment. This makes the script unexpectedly grab a specific GPU; prefer respecting an existing setting (e.g., CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-...}) or making the GPU selection an argument.
```diff
-# export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
-export CUDA_VISIBLE_DEVICES=2
+export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
```
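For context, `${VAR:-default}` substitutes the default only when the variable is unset or empty, so a GPU index supplied by the caller survives; a quick sketch:

```shell
#!/usr/bin/env bash
# ${VAR:-default} falls back only when VAR is unset or empty.
unset CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
echo "fallback: ${CUDA_VISIBLE_DEVICES}"

# A value supplied by the caller/environment is preserved.
CUDA_VISIBLE_DEVICES=0
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-2}
echo "preserved: ${CUDA_VISIBLE_DEVICES}"
```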
```shell
# Compress Qwen3-VL-4B-Instruct with JSQ v2 (W8A8, sparsity=0.4375) and run 5 eval tasks.
# Usage: bash scripts/compress_qwen3vl4b_jsq_v2.sh [model] [save_dir] [tasks] [log_dir]

# Fall back to a default GPU only when none is set externally, so we never
# override a value passed in by the orchestrator or the command line.
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0}
echo "[GPU] current CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

set -e

MODEL=${1:-"Qwen/Qwen3-VL-4B-Instruct"}
SAVE_DIR=${2:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/outputs/qwen3vl-4b-jsq-v2-w8a8-sp04375"}
TASKS=${3:-"gqa,mme,textvqa_val,mmstar,mmmu_val"}
LOG_DIR=${4:-"${SAVE_DIR}/logs"}
DATA_DIR=${5:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/storage/datasets"}
```
The usage line documents 4 arguments, but the script also reads a 5th positional argument (DATA_DIR=${5:-...}). Update the usage comment (and/or argument handling) so users know they can pass data_dir as the 5th parameter.
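One possible correction, keeping the 5th parameter and documenting it (illustrative sketch; defaults mirror the script's):

```shell
#!/usr/bin/env bash
# Usage: bash scripts/compress_qwen3vl4b_jsq_v2.sh [model] [save_dir] [tasks] [log_dir] [data_dir]
MODEL=${1:-"Qwen/Qwen3-VL-4B-Instruct"}
SAVE_DIR=${2:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/outputs/qwen3vl-4b-jsq-v2-w8a8-sp04375"}
TASKS=${3:-"gqa,mme,textvqa_val,mmstar,mmmu_val"}
LOG_DIR=${4:-"${SAVE_DIR}/logs"}
DATA_DIR=${5:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/storage/datasets"}
echo "DATA_DIR=${DATA_DIR}"
```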
```diff
@@ -0,0 +1,125 @@
+d#!/usr/bin/env python3
```
The script’s shebang line has an extra leading character (d#!/usr/bin/env python3), which will break direct execution on Unix-like systems. Remove the stray d so the first line starts with #!/usr/bin/env python3.
```diff
-d#!/usr/bin/env python3
+#!/usr/bin/env python3
```
```shell
# # Step 4: automatic summarization
# echo "[orchestrator] Starting automatic summarization"
# python scripts/summarize_qwen3_logs.py \
#     --run "baseline:${BASELINE_LOG_DIR}" \
#     --run "jsq_v1:${V1_LOG_DIR}" \
```
The header comment says this script will "automatically summarize results", but the summarization step is currently commented out below. Either re-enable the summarize step or update the header/commentary so the script's documented behavior matches what it actually does.
```shell
# Compress Qwen3-VL-4B-Instruct with JSQ v1 (W8A8, sparsity=0.4375) and run 5 eval tasks.
# Usage: bash scripts/compress_qwen3vl4b_jsq_v1.sh [model] [save_dir] [tasks] [log_dir]

# Fall back to a default GPU only when none is set externally, so we never
# override a value passed in by the orchestrator or the command line.
export CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0}
echo "[GPU] current CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"

set -e

MODEL=${1:-"Qwen/Qwen3-VL-4B-Instruct"}
SAVE_DIR=${2:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/outputs/qwen3vl-4b-jsq-v1-w8a8-sp04375"}
TASKS=${3:-"gqa,mme,textvqa_val,mmstar,mmmu_val"}
LOG_DIR=${4:-"${SAVE_DIR}/logs"}
DATA_DIR=${5:-"/mnt/disk3/wxj/JSQ4LMM/mllm-jsq/storage/datasets"}
```
The usage line documents 4 arguments, but the script also reads a 5th positional argument (DATA_DIR=${5:-...}). Update the usage comment (and/or argument handling) so users know they can pass data_dir as the 5th parameter.