
Test CPU-only mode on Jetson Nano (no CUDA) for reduced memory usage #4

@coverblew


Idea

Running Bonsai-8B on the Jetson Nano in CPU-only mode (-ngl 0, i.e. no layers offloaded to the GPU) instead of GPU mode could significantly reduce memory usage, similar to what we see on the Raspberry Pi 4.

Expected benefits

| Metric | GPU mode (current) | CPU-only (proposed) |
|---|---|---|
| RAM used (est.) | 2500 MB | ~1400-1500 MB |
| RAM free (est.) | 980 MB | ~2400 MB |
| Generation speed | 1.1 tok/s | ~0.4-0.5 tok/s (A57 < A72) |
| KV cache Q8_0 | SEGFAULT (#2) | Should work |
| Max context | 4096 (tight) | 8K+ possible |
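As a quick sanity check on the trade-off, the estimates above can be turned into the actual deltas. This is only arithmetic on the table's own estimated numbers, not new measurements; the constant names are illustrative.

```python
# Estimates copied from the table; CPU-only figures are (low, high) ranges.
GPU_USED_MB = 2500
CPU_USED_MB = (1400, 1500)
GPU_FREE_MB = 980
CPU_FREE_MB = 2400
GPU_TOK_S = 1.1
CPU_TOK_S = (0.4, 0.5)


def ram_saved_mb():
    """(min, max) reduction in used RAM when switching to CPU-only mode."""
    return (GPU_USED_MB - CPU_USED_MB[1], GPU_USED_MB - CPU_USED_MB[0])


def free_ram_gain_mb():
    """Increase in free RAM implied by the 'RAM free' row."""
    return CPU_FREE_MB - GPU_FREE_MB


def slowdown_range():
    """(min, max) slowdown factor of CPU-only vs GPU generation speed."""
    return (GPU_TOK_S / CPU_TOK_S[1], GPU_TOK_S / CPU_TOK_S[0])


saved_lo, saved_hi = ram_saved_mb()
slow_lo, slow_hi = slowdown_range()
print(f"Used RAM drops by {saved_lo}-{saved_hi} MB")
print(f"Free RAM rises by {free_ram_gain_mb()} MB")
print(f"Generation is {slow_lo:.2f}x-{slow_hi:.2f}x slower")
```

The "RAM used" row implies 1000-1100 MB saved while the "RAM free" row implies 1420 MB gained, and the slowdown works out to 2.2x-2.75x, so both rows are worth confirming with real measurements.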

Why it matters

  • Roughly 1-1.4 GB more free RAM for system stability
  • KV cache Q8_0 should work (the segfault in #2 only occurs with the CUDA kernels)
  • Context could be doubled or more
  • Trade-off: roughly 2-3x slower (1.1 tok/s vs an estimated 0.4-0.5 tok/s)
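To see why Q8_0 matters for context length, here is a rough KV cache size estimate. It is a sketch that assumes a hypothetical model shape (36 layers, 8 KV heads, head_dim 128) that has not been checked against Bonsai-8B's actual GGUF metadata. The only fixed facts are the ggml cache formats: F16 uses 2 bytes per value, and Q8_0 stores each block of 32 values as one fp16 scale plus 32 int8 values (34 bytes).

```python
# ggml Q8_0 layout: blocks of 32 values, each block = fp16 scale (2 B) + 32 x int8.
QK8_0 = 32
Q8_0_BLOCK_BYTES = 2 + QK8_0

# HYPOTHETICAL model shape, for illustration only (not confirmed for Bonsai-8B).
N_LAYERS = 36
N_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(n_ctx, cache_type):
    """Total K+V cache size in bytes for n_ctx tokens of context."""
    # K and V each hold N_KV_HEADS * HEAD_DIM values per layer per token.
    elems_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM
    if cache_type == "f16":
        bytes_per_token = elems_per_token * 2
    elif cache_type == "q8_0":
        bytes_per_token = (elems_per_token // QK8_0) * Q8_0_BLOCK_BYTES
    else:
        raise ValueError(f"unsupported cache type: {cache_type}")
    return n_ctx * bytes_per_token


for n_ctx, cache_type in [(4096, "f16"), (4096, "q8_0"), (8192, "q8_0")]:
    mib = kv_cache_bytes(n_ctx, cache_type) / (1024 * 1024)
    print(f"ctx={n_ctx:5d} {cache_type:5s} -> {mib:.0f} MiB")
```

With these assumed dimensions, an 8192-token Q8_0 cache (612 MiB) is only slightly larger than a 4096-token F16 cache (576 MiB), which is consistent with the "context could be doubled" estimate. The real figures depend on the model's actual layer and head counts.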

Questions to investigate

  1. Does the PrismML fork compile CPU-only on Jetson Nano (Ubuntu 18.04, GCC 8)?

    • GCC 8 supports most of C++17, but std::filesystem requires linking with -lstdc++fs
    • The NEON patch from llamita.cpp may still be needed for GCC 8
    • Need to verify which patches are CUDA-specific vs GCC 8-specific
  2. Actual memory usage vs GPU mode

  3. Actual speed on the Cortex-A57 (slower than the Cortex-A72 in the RPi 4)

  4. Does KV Q8_0 work in CPU mode on Jetson?
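For questions 2 and 3, a small harness could run the same benchmark in both modes and record peak memory. This is a sketch under assumptions: it samples the system-wide MemAvailable counter from /proc/meminfo rather than process RSS, because the Jetson's CPU and GPU share the same physical RAM and CUDA allocations may not show up in RSS. The binary path and model filename in the commented usage are placeholders, and the flags should be checked against the PrismML fork's `--help`. The code avoids newer typing syntax so it also runs on the Python 3.6 that ships with Ubuntu 18.04.

```python
import subprocess
import time
from typing import List


def parse_mem_available_kib(meminfo_text):
    """Return the MemAvailable value (kB) from /proc/meminfo-formatted text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def read_mem_available_kib():
    with open("/proc/meminfo") as f:
        return parse_mem_available_kib(f.read())


def peak_system_memory_kib(cmd: List[str], interval_s: float = 0.2) -> int:
    """Run cmd and return the largest drop in MemAvailable (kB) seen while it runs.

    System-wide, so other processes add noise: run on an otherwise idle board.
    """
    baseline = read_mem_available_kib()
    lowest = baseline
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    while proc.poll() is None:
        lowest = min(lowest, read_mem_available_kib())
        time.sleep(interval_s)
    return baseline - lowest


# Hypothetical usage (paths are placeholders; llama-bench also prints tok/s):
# MODEL = "bonsai-8b.gguf"
# for ngl in ("99", "0"):  # GPU offload vs CPU-only
#     cmd = ["./build/bin/llama-bench", "-m", MODEL, "-ngl", ngl, "-p", "512", "-n", "64"]
#     print("ngl", ngl, peak_system_memory_kib(cmd) // 1024, "MiB")
```

Sampling every 0.2 s can miss very short spikes, but model weights and the KV cache stay allocated for the whole run, so the plateau should be captured.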

Related
