redai-infra/HiSVD

HiSVD: Hierarchical SVD Compression for Large Language Models

HiSVD is a training-free, two-stage SVD-based compression framework for LLMs. It combines rank-density-aware baseline allocation (Stage 1) with cluster-level sublayer redundancy refinement (Stage 2), followed by whitening SVD decomposition to produce the compressed model.

Directory Structure

HiSVD/
├── compress.py          # Main compression pipeline (Stage 1 + 2 + SVD)
├── calibrate.py         # Generate calibration data (whitening matrices)
├── evaluate.py          # Evaluate PPL for HuggingFace or compressed models
├── hisvd/               # Core algorithm modules
│   ├── calibration.py       # Calibration data collector
│   ├── rank_allocation.py   # Stage 1: rank-density-aware baseline
│   ├── clustering.py        # LESA layer clustering (reserved)
│   ├── cluster_sublayer_lwe.py  # LWE score computation
│   ├── refinement.py        # Stage 2: cluster-level LWE refinement
│   ├── whitening_svd.py     # Whitening SVD decomposition
│   └── shape_utils.py       # Weight shape inference (GQA support)
├── component/           # Model-specific SVD wrappers
│   ├── svd_llama.py
│   └── svd_mistral.py
├── utils/               # Shared utilities
│   ├── data_utils.py        # Dataset loading (C4, WikiText-2, PTB)
│   ├── model_utils.py       # Model loading helpers
│   └── eval_utils.py        # Perplexity evaluation
├── scripts/             # Reproduction shell scripts
│   ├── calibrate_all.sh
│   ├── compress_llama7b.sh
│   ├── compress_llama2_7b.sh
│   ├── compress_llama3_8b.sh
│   ├── compress_llama_30b.sh
│   ├── compress_mistral_7b.sh
│   ├── evaluate_all.sh
│   └── (optional) extra scripts: compress_llama13b.sh / compress_opt6.7b.sh
├── data/                # Local dataset files
│   └── c4-validation.json
└── requirements.txt

Installation

pip install -r requirements.txt

Quick Start

1. Generate calibration data

Calibration only needs to be done once per model. It collects whitening matrices and weight statistics from a small set of WikiText-2 samples.

python calibrate.py \
  --model_path /path/to/llama-7b \
  --model_name llama-7b \
  --nsamples 256 \
  --seqlen 2048

This produces calibration_cache/{name}_n256_s2048_with_hessian.pkl, where {name} is the value passed to --model_name. The file contains whitening matrices, Hessian matrices, and weight statistics (including "Vh").

2. Compress

python compress.py \
  --model_path /path/to/llama-7b \
  --calib_cache calibration_cache/llama-7b_n256_s2048_with_hessian.pkl \
  --target_compression 0.8 \
  --rank_method stable \
  --alpha 1.0 \
  --lwe_strength 1.0 \
  --tail_redundancy_weight 1.0 \
  --modulation logarithmic \
  --save_model compressed_models/llama7b_80p.pt

Key arguments:

Argument               Description                                                       Default
--target_compression   Compression ratio, given as the fraction of parameters kept
                       (e.g. 0.8 = keep 80% of parameters)
--rank_method          Stage 1 rank metric: stable, effective, mathematical              stable
--alpha                Rank density sensitivity                                          1.0
--modulation           Stage 2 modulation: logarithmic, power, exponential, linear       logarithmic
--lwe_strength         Modulation strength (beta)                                        1.0
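
To give a feel for what a keep ratio means for a single weight matrix: a rank-r factorization of an m x n matrix stores r(m + n) parameters instead of mn, so keeping a fraction p of the parameters corresponds to r of roughly p * mn / (m + n). The helper below is a hypothetical sketch of that arithmetic only. It is not part of the HiSVD code, and HiSVD itself assigns a different rank to each sublayer rather than applying one uniform ratio.

```python
def uniform_rank(out_features: int, in_features: int, keep_ratio: float) -> int:
    """Largest rank r with r * (out + in) <= keep_ratio * out * in (at least 1)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    dense_params = out_features * in_features      # parameters of the dense matrix
    params_per_rank = out_features + in_features   # cost of one rank-1 component
    return max(1, int(keep_ratio * dense_params) // params_per_rank)


# Example: a 4096 x 4096 projection with 80% of its parameters kept.
print(uniform_rank(4096, 4096, 0.8))  # 1638
```

Note that a square matrix needs r below n/2 before the factorized form saves any parameters at all.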

3. Evaluate

# Evaluate a compressed .pt model
python evaluate.py --model_path compressed_models/llama7b_80p.pt --datasets wikitext2,c4

# Evaluate an original HuggingFace model
python evaluate.py --model_path /path/to/llama-7b --datasets wikitext2
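
For reference, perplexity is conventionally the exponential of the mean per-token negative log-likelihood. The sketch below shows that definition in isolation. The function name is hypothetical, and the actual logic in utils/eval_utils.py (windowing, batching, tokenization) is not shown in this README and may differ.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), NLL in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# A model that assigns probability 1/4 to every token has perplexity 4.
print(perplexity([math.log(4.0)] * 10))
```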

Reproduce Paper Results

Shell scripts under scripts/ compress each model at 40%–80% ratios. Set MODEL_DIR to your local model directory before running:

export MODEL_DIR=/path/to/models

# Generate calibration for all models (run once)
bash scripts/calibrate_all.sh

# Compress (one model per GPU)
CUDA_VISIBLE_DEVICES=0 bash scripts/compress_llama7b.sh 0 &
CUDA_VISIBLE_DEVICES=1 bash scripts/compress_llama2_7b.sh 1 &
CUDA_VISIBLE_DEVICES=2 bash scripts/compress_llama3_8b.sh 2 &
CUDA_VISIBLE_DEVICES=3 bash scripts/compress_llama_30b.sh 3 &
CUDA_VISIBLE_DEVICES=4 bash scripts/compress_mistral_7b.sh 4 &

# Evaluate all compressed .pt files (wait blocks until the background jobs above finish)
wait
bash scripts/evaluate_all.sh 0

Supported Models

  • LLaMA-7B (see scripts/compress_llama7b.sh)
  • LLaMA-2-7B (see scripts/compress_llama2_7b.sh)
  • LLaMA-3-8B (see scripts/compress_llama3_8b.sh)
  • LLaMA-30B (see scripts/compress_llama_30b.sh)
  • Mistral-7B (see scripts/compress_mistral_7b.sh)

Method Overview

Stage 1 — Rank-Density-Aware Baseline: Computes per-sublayer baseline ranks using stable rank density, allocating more capacity to sublayers with higher information density.
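
As a point of reference, the stable rank of a matrix W is the standard quantity ||W||_F^2 / ||W||_2^2, which lies between 1 and rank(W). The NumPy sketch below computes it and a normalized "density". The density normalization and both function names are assumptions made for illustration, and the README does not specify how hisvd/rank_allocation.py turns these values into ranks.

```python
import numpy as np


def stable_rank(weight: np.ndarray) -> float:
    """Stable rank: squared Frobenius norm divided by squared spectral norm."""
    singular_values = np.linalg.svd(weight, compute_uv=False)  # sorted descending
    return float(np.sum(singular_values**2) / singular_values[0] ** 2)


def rank_density(weight: np.ndarray) -> float:
    """Assumed density: stable rank divided by the maximum possible rank."""
    return stable_rank(weight) / min(weight.shape)


# The identity spreads energy evenly; an outer product concentrates it in one direction.
print(stable_rank(np.eye(4)))                               # 4.0
print(stable_rank(np.outer(np.arange(1.0, 5.0), np.ones(3))))
```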

Stage 2 — Cluster-Level LWE Refinement: Refines ranks using sublayer-type-level tail redundancy scores. All layers sharing the same sublayer type receive a unified modulation factor, ensuring structural consistency.

Compression (after Stage 2): Applies calibration-based whitening to remove input distribution bias, then performs truncated SVD at the allocated rank.
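
A minimal NumPy sketch of the general whitening-then-truncate idea follows, assuming the whitening matrix S is the Cholesky factor of the calibration Gram matrix X^T X. Truncating the SVD of W S then minimizes the error on the layer output for the calibration inputs rather than on the weights themselves. The function name, the eps regularizer, and the choice of Cholesky are assumptions, and hisvd/whitening_svd.py may be implemented differently.

```python
import numpy as np


def whitened_truncated_svd(weight, activations, rank, eps=1e-6):
    """Factor weight (out, in) as a @ b using calibration activations (tokens, in).

    Returns a with shape (out, rank) and b with shape (rank, in).
    """
    in_features = weight.shape[1]
    # Gram matrix of the calibration inputs; eps keeps it positive definite.
    gram = activations.T @ activations + eps * np.eye(in_features)
    s = np.linalg.cholesky(gram)  # gram = s @ s.T, s lower-triangular
    u, sigma, vt = np.linalg.svd(weight @ s, full_matrices=False)
    a = u[:, :rank] * sigma[:rank]  # fold singular values into the left factor
    # b = vt[:rank] @ inv(s), computed by solving s.T @ b.T = vt[:rank].T
    b = np.linalg.solve(s.T, vt[:rank].T).T
    return a, b


rng = np.random.default_rng(0)
w = rng.standard_normal((6, 4))
x = rng.standard_normal((32, 4))
a, b = whitened_truncated_svd(w, x, rank=2)
print(a.shape, b.shape)  # (6, 2) (2, 4)
```

At inference time the two factors replace the dense layer as two smaller matrix multiplications, which is where the parameter saving comes from.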

License

MIT

About

[ACL 2026] HiSVD: Principled Low-Rank Approximation of LLMs via Hierarchical Modeling of Information Capacity and Spectral Structure
