Skip to content

create-atl-delete/intel-arc-llama

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Llama.cpp for Intel Arc GPUs (SYCL): Docker or Bare-Metal

Support for running LLMs on Intel Arc GPUs (including the latest B50/B60 Battlemage series) using Llama.cpp and Intel's SYCL backend. This repository provides both a fully containerized Docker setup and an automated Bare-Metal installation script.

🛑 The Problem This Solves

Getting Llama.cpp to run on Intel Arc GPUs is a nightmare. If you've spent days trying to figure this out, this repository is for you. Here is why this is needed:

  • Deprecated Official Tools: Intel's official ipex-llm project has been abandoned, and many community guides rely on it. Intel is pivoting towards Vulkan, but Vulkan's performance on LLMs is currently subpar compared to SYCL.
  • Ollama's Intel Support: While standard Ollama now officially supports Intel Arc GPUs out of the box, it currently relies entirely on the currently subpar Vulkan backend. This project gives you the SYCL performance without the setup headache.
  • Driver Support Lacking: Getting Intel Arc drivers working — especially for the newer B50/B60 GPUs—on older LTS Linux distributions (like Ubuntu 24.04) requires jumping through some hoops.
  • OneAPI Bloat: Official guides recommend Intel's OneAPI Base Toolkit, which is roughly 25GB, and unnecessary if you just want to compile and run Llama.cpp.

💡 How the Docker Container Works

This project simplifies the stack into a clean Docker container:

  1. Ubuntu 25.10 Base Image: We use Ubuntu 25.10 because it ships with the latest Intel Arc drivers out-of-the-box. This eliminates the need to install kernel and user-space drivers for newer GPUs.
  2. Minimal OneAPI Toolkit: Instead of installing the bloated 25GB toolkit, this Dockerfile installs intel-cpp-essentials. This provides exactly what is needed (the icx/icpx compilers) to build Llama.cpp with SYCL support, without the extra baggage.
  3. Intel Compute Runtimes (Level Zero / OpenCL): SYCL acts as a high-level programming model, but it relies on lower-level APIs to talk to the hardware. We install libze1 (Level Zero) and intel-opencl-icd (OpenCL) to facilitate this connection.
  4. Device Passthrough (/dev/dri): In Docker, mapping /dev/dri (Direct Rendering Infrastructure) into the container allows it to utilize the host machine's GPU hardware.

🚀 Getting Started

Prerequisites & Host Setup

To use this container, your host machine must be able to recognize the GPU and expose it via /dev/dri.

If you are using Ubuntu 25.10 or newer: You are largely good to go. The native kernel supports the Intel Arc B50/B60 xe driver out of the box. Just ensure your user has permission to access the GPU:

sudo usermod -aG render $USER
sudo usermod -aG video $USER

If you are using Ubuntu 24.04 LTS: Ubuntu 24.04 ships with an older kernel by default that does not recognize Battlemage GPUs. You must install the Hardware Enablement (HWE) kernel to get a modern kernel capable of exposing the GPU:

sudo apt update
sudo apt install --install-recommends linux-generic-hwe-24.04
sudo usermod -aG render $USER
sudo usermod -aG video $USER

Reboot your machine after installing. Verify the host sees the GPU by checking that /dev/dri/renderD128 (or similar) exists.

Quick Start

  1. Clone the repository:

    git clone https://github.com/create-atl-delete/llama-sycl.git
    cd llama-sycl
  2. Add a Model: Create a models directory and download a GGUF format model into it.

    mkdir models
    # Download your preferred model into the models/ folder
  3. Configure your Model (.env): Copy the example environment file and edit it to match your downloaded model's filename.

    cp .env.example .env
    nano .env

    You can also adjust CTX_SIZE and GPU_LAYERS in this file to control VRAM usage.

  4. Build and Run: Start the container in the background. Docker will handle downloading the base image, installing the minimal OneAPI components, and compiling Llama.cpp from source.

    docker compose up -d --build
  5. Access the Llama.cpp Server: Once the container is running, the Llama.cpp API and Web UI will be exposed on your host machine at: http://localhost:8080

🐳 Understanding the Docker Setup

If you are new to Docker, here is a quick breakdown of what the docker-compose.yml is doing:

  • build: . : Tells Docker to build the image using the Dockerfile in the current directory.
  • ports: - "8080:8080" : Maps port 8080 from inside the container to port 8080 on your host machine.
  • devices: - /dev/dri:/dev/dri : CRITICAL. This passes your host's GPU hardware interfaces into the container so Llama.cpp can use them.
  • volumes: - ./models:/models : Mounts your local ./models folder into the container at /models. This is standard practice so you don't have to rebuild the entire container just to switch or add AI models.

Troubleshooting

  • Container crashes immediately: Check the logs using docker compose logs. Ensure the model filename in your .env exactly matches the file in your models/ directory.
  • GPU not detected: Ensure your host machine's drivers are correctly installed and that the user running Docker has permissions to access /dev/dri (often requires being in the render or video group).
    • To verify your host kernel sees the GPU, run: lspci -k | grep -A 2 -E "VGA|3D". You should see your Intel GPU listed with Kernel driver in use: xe (or i915).
    • To verify the device files exist, run: ls -l /dev/dri. You should see card0 and renderD128.

🖥️ Bare-Metal Installation (No Docker)

If you prefer not to use Docker and want to install everything directly on your host machine, a bare-metal installation script is provided.

  1. Ensure you have a model downloaded to ~/models/.
  2. Run the bare-metal setup script:
    sudo ./llama_baremetal.sh

This script will install the necessary Intel compute runtimes, compile llama.cpp from source with SYCL support, and create a systemd service (llama-sycl.service) to run the server in the background.

Releases

No releases published

Packages

 
 
 

Contributors