OOM on 32GB while running auto-opt Qwen/Qwen2.5-Coder-7B-Instruct #2316

@stiller-leser

Hi,

I've been playing around with Olive after getting frustrated with the limited number of models available for my "AI" PC. I have been trying to run the following on a Lenovo Yoga Slim 7x with 32GB of RAM, as well as on a dedicated cloud server with 32GB:

olive auto-opt \
    --model_name_or_path "$CACHE_MODEL_PATH" \
    $TRUST_FLAG \
    --output_path "$OUTPUT_PATH" \
    --device "$DEVICE" \
    --provider "$PROVIDER" \
    --use_ort_genai \
    --precision "$PRECISION" \
    --log_level "$LOG_LEVEL" \
    $EXTRA_ARGS

with the following parameters:

==========================================
QNN Model Conversion Container
==========================================
Model: Qwen/Qwen2.5-Coder-7B-Instruct
Output Directory: .
Output Name: qwen-coder-7b
Precision: int4
Device: npu
Provider: QNNExecutionProvider
Cache Directory: model-cache
==========================================

But no matter what I try, I run into an OOM error:

[ 1299.228531] Out of memory: Killed process 11552 (olive) total-vm:43667312kB, anon-rss:31484340kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:64956kB oom_score_adj:0
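
To put the figures in that log line into perspective, here is a small sketch that converts the kB values to GiB and compares them with a back-of-the-envelope estimate of the weight footprint at different precisions. The helper names `kb_to_gib` and `weights_gib` are made up for this illustration, and the parameter count of roughly 7.6 billion is an assumption about the model, not something reported by Olive. Whether the conversion actually holds the weights at full precision before quantizing is also not confirmed here; the numbers only show what such a footprint would look like.

```shell
#!/bin/sh
# Convert a kB value from the kernel OOM line to GiB (1 GiB = 1048576 kB).
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f", kb / 1048576 }'
}

# Rough weight footprint in GiB: parameter count times bytes per parameter.
# This ignores activations, optimizer state, and any temporary copies.
weights_gib() {
    awk -v n="$1" -v b="$2" 'BEGIN { printf "%.2f", n * b / (1024 * 1024 * 1024) }'
}

# Values copied from the OOM log line above.
echo "anon-rss at kill: $(kb_to_gib 31484340) GiB"
echo "total-vm at kill: $(kb_to_gib 43667312) GiB"

# ASSUMPTION: about 7.6e9 parameters for a "7B" model (not taken from the log).
PARAMS=7600000000
echo "fp32 weights: $(weights_gib "$PARAMS" 4) GiB"
echo "fp16 weights: $(weights_gib "$PARAMS" 2) GiB"
echo "int4 weights: $(weights_gib "$PARAMS" 0.5) GiB"
```

The process was using about 30 GiB of resident memory when it was killed, which is essentially all of a 32GB machine. Under the parameter-count assumption above, a full-precision copy of the weights alone would already be in that range, while the int4 result would be far smaller.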

Is this expected? If so, is there any way to reduce the memory load while keeping the model "intelligent"? Originally I had planned on converting the devstral-23b from Mistral to finally run on my Qualcomm NPU, but it seems that will remain a dream.
