raise thermal trip points for GPU, DDR, LMH and CPU core emerg #110

Open
elkoled wants to merge 3 commits into master from raise-throttle-temp

elkoled commented Apr 16, 2026

The thermal trip points are raised to ensure that, under normal operating conditions, these thresholds are never reached.

GPU (gpu_trip0): 95°C to 110°C
DDR/PoP (pop_trip): 95°C to 110°C
LMH DCVS (silver/gold): 95°C to 110°C
CPU (per-core emerg): 110°C to 115°C
LMH LIMITS_TEMP_DEFAULT: 75°C to 90°C

| master | new | HW | Triggers | Margin to Tjmax (125°C) |
|---|---|---|---|---|
| 95°C | 110°C | GPU, DDR/PoP, LMH | GPU throttling, CPU throttling | 15°C |
| 110°C | 115°C | CPU | Per-core CPU offline | 10°C |
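
The margin column is just Tjmax minus the new trip point. A minimal sketch of that arithmetic, using only the numbers from this PR (the names `TJMAX_C`, `TRIPS` and `margin_to_tjmax` are made up for illustration):

```python
# Tjmax as stated in the table header of this PR.
TJMAX_C = 125

# (hardware group, trip on master in °C, new trip in °C), copied from the table.
TRIPS = [
    ("GPU, DDR/PoP, LMH", 95, 110),
    ("CPU per-core emerg", 110, 115),
]


def margin_to_tjmax(trip_c, tjmax_c=TJMAX_C):
    """Headroom in °C between a trip point and Tjmax."""
    return tjmax_c - trip_c


for name, old_c, new_c in TRIPS:
    print(
        f"{name}: {old_c}°C -> {new_c}°C, "
        f"margin {margin_to_tjmax(new_c)}°C (was {margin_to_tjmax(old_c)}°C)"
    )
```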

Validation with modified AGNOS:

paste <(cat /sys/class/thermal/thermal_zone*/type) <(cat /sys/class/thermal/thermal_zone*/trip_point_0_temp) | grep -E "step|lmh"
gpu-virt-max-step	110000
silv-virt-max-step	120000 # unchanged, has no effect on throttling
gold-virt-max-step	120000 # unchanged, has no effect on throttling
pop-mem-step	110000
cpu0-silver-step	115000
cpu1-silver-step	115000
cpu2-silver-step	115000
cpu3-silver-step	115000
cpu0-gold-step	115000
cpu1-gold-step	115000
cpu2-gold-step	115000
cpu3-gold-step	115000
lmh-dcvs-01	110000
lmh-dcvs-00	110000
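
For repeated checks, the same comparison could be scripted. This is a rough sketch, not part of the PR: it reads the same `type` and `trip_point_0_temp` sysfs files as the one-liner above and compares them against the values pasted in this comment. The helper names `read_trip_points` and `mismatches` are hypothetical.

```python
import glob
import os

# Expected trip_point_0_temp values in millidegrees C, taken from the output above.
EXPECTED = {
    "gpu-virt-max-step": 110000,
    "pop-mem-step": 110000,
    "lmh-dcvs-00": 110000,
    "lmh-dcvs-01": 110000,
}
for core in range(4):
    EXPECTED[f"cpu{core}-silver-step"] = 115000
    EXPECTED[f"cpu{core}-gold-step"] = 115000


def read_trip_points(base="/sys/class/thermal"):
    """Map each thermal zone type to its trip_point_0_temp (millidegrees C)."""
    trips = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "trip_point_0_temp")) as f:
                trips[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Zones without a readable trip point 0 are skipped.
            continue
    return trips


def mismatches(trips, expected):
    """Return {zone: (actual, wanted)} for every zone that differs or is missing."""
    return {
        name: (trips.get(name), want)
        for name, want in expected.items()
        if trips.get(name) != want
    }


if __name__ == "__main__":
    bad = mismatches(read_trip_points(), EXPECTED)
    print("all trip points match" if not bad else f"mismatches: {bad}")
```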

@elkoled elkoled marked this pull request as ready for review April 16, 2026 02:26
@elkoled elkoled requested a review from adeebshihadeh April 16, 2026 02:38
adeebshihadeh (Contributor) commented

Did you confirm there's no throttling under this new threshold? We should also confirm there's no hysteresis, i.e. <110C never throttles.

elkoled (Author) commented Apr 16, 2026

I am running the oven test tonight, will post the freq/temp plots here after.

Absolute hysteresis values are unchanged:
GPU/PoP: 0°C
LMH: 30°C (throttling activates at 110°C and resets at 80°C)
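
To make the hysteresis question concrete, here is a simplified model of a trip point with hysteresis, fed with the values listed above. It is only a sketch: the class name is made up, and the exact comparison operators at the boundaries (`>=` to trip, `<` to reset) are assumptions, not taken from the kernel or the LMH hardware.

```python
class TripWithHysteresis:
    """Toy model: throttle at trip_c, release once below trip_c - hysteresis_c."""

    def __init__(self, trip_c, hysteresis_c):
        self.trip_c = trip_c
        self.reset_c = trip_c - hysteresis_c
        self.active = False

    def update(self, temp_c):
        # Boundary handling is an assumption of this sketch.
        if not self.active and temp_c >= self.trip_c:
            self.active = True
        elif self.active and temp_c < self.reset_c:
            self.active = False
        return self.active


# LMH values from this comment: 110°C active, 30°C hysteresis, 80°C reset.
lmh = TripWithHysteresis(trip_c=110, hysteresis_c=30)
for temp in (100, 111, 100, 85, 79):
    print(f"LMH at {temp}°C: throttling={lmh.update(temp)}")

# GPU/PoP values from this comment: 110°C trip, 0°C hysteresis.
gpu = TripWithHysteresis(trip_c=110, hysteresis_c=0)
for temp in (100, 111, 109):
    print(f"GPU at {temp}°C: throttling={gpu.update(temp)}")
```

In this model the LMH zone stays throttled at 100°C and 85°C on the way down after having tripped once, while the GPU zone releases as soon as the temperature drops below 110°C. Whether the real hardware behaves this way is exactly what the cooldown ramp needs to show.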

adeebshihadeh (Contributor) commented Apr 16, 2026

Kk, we need to make sure whatever temp that we pick, there is never any throttling at or below it.

elkoled (Author) commented Apr 16, 2026

[Image: gpu_temp_vs_freq]

elkoled (Author) commented Apr 16, 2026

[Image: cpu_temp_vs_power]

elkoled (Author) commented Apr 16, 2026

Ran mici in the oven from 75°C to 120°C with openpilot onroad, comparing the stock kernel against the kernel with raised limits.
10 Hz sysfs sampling of all CPU/GPU frequencies, temperatures, and SOM power.
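
The logging script itself is not attached here. A fixed-rate sysfs sampler along these lines would do the job; this is a sketch, and the function names and the example paths in `SOURCES` are assumptions. The GPU frequency and SOM power nodes are device-specific and not given in this thread, so they are left out.

```python
import csv
import sys
import time


def read_int(path):
    """Read one integer from a sysfs-style file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def sample(sources, out_file, hz=10.0, n_samples=None):
    """Write one CSV row per tick with the current value of every source.

    sources maps a column name to a file path. Runs forever if n_samples is None.
    """
    names = list(sources)
    writer = csv.writer(out_file)
    writer.writerow(["t_s"] + names)
    period = 1.0 / hz
    start = time.monotonic()
    count = 0
    while n_samples is None or count < n_samples:
        elapsed = time.monotonic() - start
        writer.writerow([f"{elapsed:.3f}"] + [read_int(sources[n]) for n in names])
        count += 1
        # Schedule against the start time so sleep errors do not accumulate.
        delay = start + count * period - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return count


# Example sources using standard Linux sysfs nodes (assumed, not from this PR).
SOURCES = {
    "zone0_temp_mC": "/sys/class/thermal/thermal_zone0/temp",
    "cpu0_freq_kHz": "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq",
}

if __name__ == "__main__":
    sample(SOURCES, sys.stdout, hz=10.0, n_samples=5)
```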

GPU limit is spot on, stock throttles at 95°C, raised limits hold until 110°C.
CPU freq reports a static 1,689 MHz over the full temperature range in both configs, so I am using SOM power as the throttling indicator instead.
The stock kernel seems to start throttling above 107°C, which is 12°C over the 95°C LMH trip point.
The raised limits move that point up by about 4°C, to 111°C.
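
One way to pull a throttle onset temperature out of a heating ramp is to take a baseline from the first few samples and report the first temperature at which the indicator (GPU frequency or SOM power) falls clearly below it. This is a sketch of that idea, not the method behind the plots above; the function name, the 90% threshold, and the sample data are all made up for illustration.

```python
def throttle_onset(samples, drop_fraction=0.9, baseline_n=3):
    """Return the first temperature where value < drop_fraction * baseline.

    samples is a time-ordered list of (temp_c, value) pairs from a heating ramp.
    The baseline is the mean value of the first baseline_n samples.
    Returns None if there are too few samples or no drop is seen.
    """
    if len(samples) <= baseline_n:
        return None
    baseline = sum(value for _, value in samples[:baseline_n]) / baseline_n
    for temp_c, value in samples[baseline_n:]:
        if value < drop_fraction * baseline:
            return temp_c
    return None


# Synthetic ramp (invented numbers, arbitrary units), not measured data.
ramp = [(90, 100), (95, 100), (100, 100), (105, 99), (110, 98), (111, 70), (115, 50)]
print(throttle_onset(ramp))
```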

The device shuts down at just below 120°C CPU/GPU temperature.

TODO: validate LMH behavior and hysteresis using a cooldown ramp.
