
Hardcoded energy1_input in gpu_device_stub.cpp makes the GPU device undiscoverable #123

@Stonepia

Description

After installing an agama build of the driver, I found that `clinfo -l` lists the GPU device, but `xpu-smi discovery` cannot find it.

The root cause is that the hwmon energy file exposed by the driver has a different name:

```
# Only this one exists:
/sys/class/hwmon/hwmon3/energy2_input

# Does NOT exist (xpu-smi looks for this):
/sys/class/hwmon/hwmon3/energy1_input
```

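Because xpu-smi reads `energy1_input` unconditionally, the value it reads back is empty, and the following `std::stoull` call fails (it throws `std::invalid_argument` on an empty string). I made the change below and it fixes the issue. I'm not sure this is the right fix, but I'm posting it here for reference: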

```shell
cd ~/xpumanager && git diff V1.3.6 -- core/src/device/gpu/gpu_device_stub.cpp
```

```diff
diff --git a/core/src/device/gpu/gpu_device_stub.cpp b/core/src/device/gpu/gpu_device_stub.cpp
index 258ebab..f5e5334 100644
--- a/core/src/device/gpu/gpu_device_stub.cpp
+++ b/core/src/device/gpu/gpu_device_stub.cpp
@@ -224,8 +224,14 @@ std::shared_ptr<MeasurementData> GPUDeviceStub::loadPVCIdlePowers(std::string bd
                     std::string name = getFileValue("/sys/class/hwmon/" + std::string(pdirent->d_name) +"/name");
                     name.erase(0, name.find_first_not_of(" \n\r\t"));
                     name.erase(name.find_last_not_of(" \n\r\t") + 1);
-                    auto energy_path = "/sys/class/hwmon/" + std::string(pdirent->d_name) +"/energy1_input";
-                    uint64_t value = std::stoull(getFileValue(energy_path));
+                    // xe driver (kernel >= 6.8) uses energy2_input; i915 uses energy1_input
+                    std::string energy_path = "/sys/class/hwmon/" + std::string(pdirent->d_name) + "/energy1_input";
+                    if (access(energy_path.c_str(), F_OK) != 0)
+                        energy_path = "/sys/class/hwmon/" + std::string(pdirent->d_name) + "/energy2_input";
+                    std::string energy_str = getFileValue(energy_path);
+                    if (energy_str.empty())
+                        continue;
+                    uint64_t value = std::stoull(energy_str);
                     auto timestamp = Utility::getCurrentMillisecond();
                     XPUM_LOG_TRACE("[{}] path:{}, value: {}, timestamp: {}", gpu_bdf, energy_path, value, timestamp);
                     if (pvc_idle_powers.count(gpu_bdf) == 0)
```
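The diff above keeps the existing `getFileValue` flow and special-cases two file names. As an alternative sketch (not the upstream fix, and not tested inside xpumanager), the lookup could walk an ordered list of candidate names and treat a missing, empty, or non-numeric value as "no reading" instead of letting `std::stoull` throw. `findEnergyInput` and `readEnergyCounter` are hypothetical names; the code assumes C++17 `<filesystem>` and `<optional>`.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <stdexcept>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: return the first candidate file that exists in hwmonDir.
// energy1_input is tried first (the name xpu-smi currently hardcodes),
// then energy2_input (the only one present in this report).
std::optional<fs::path> findEnergyInput(
        const fs::path& hwmonDir,
        const std::vector<std::string>& candidates = {"energy1_input", "energy2_input"}) {
    for (const auto& name : candidates) {
        const fs::path p = hwmonDir / name;
        std::error_code ec;  // non-throwing overload; errors count as "not found"
        if (fs::exists(p, ec))
            return p;
    }
    return std::nullopt;
}

// Hypothetical helper: read the counter, returning std::nullopt instead of throwing
// when no candidate exists, the file is empty, or it does not start with a number.
std::optional<std::uint64_t> readEnergyCounter(const fs::path& hwmonDir) {
    const auto path = findEnergyInput(hwmonDir);
    if (!path)
        return std::nullopt;
    std::ifstream in(*path);
    std::string text;
    if (!(in >> text))  // empty or unreadable file
        return std::nullopt;
    try {
        return std::stoull(text);
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
}
```

Returning `std::optional` lets the calling loop skip hwmon entries without a usable counter, which matches the `continue` added in the diff.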

Environment

| Component | Details |
|---|---|
| Hardware | Intel Data Center GPU Max 1550 (8086:0BD5) |
| OS | Ubuntu 24.04 |
| Kernel | Linux 984fee015d7d.jf.intel.com 6.18.0-rc2+prerelease3000+ #1 SMP PREEMPT_DYNAMIC Sun Oct 26 04:57:21 PDT 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Level Zero | 1.28.0.0 |
| xpu-smi version | 1.3.6 |
