@markshannon
Member

markshannon commented Jan 13, 2026

This reduces the overhead of performing boolean guards in jitted code.

On AArch64, it reduces the size of the stencil from 5 instructions to 2.

GUARD_IS_FALSE_POP_r10:

  adrp    x8, 0x0   // _Py_FalseStruct
  ldr     x8, [x8]   // _Py_FalseStruct
  orr     x8, x8, #0x1
  cmp     x24, x8
  b.ne    jump_target

_GUARD_BIT_IS_SET_POP_4_r10:

  tbnz    w24, #0x4, next
  b       jump_target
next:
  // 0000000000000004:  R_AARCH64_JUMP26     _JIT_JUMP_TARGET
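The idea behind the shorter stencil can be sketched in C. This is only an illustration under stated assumptions, not the code in this PR: `find_distinguishing_bit`, `bit_is_set`, and `is_true_by_bit` are hypothetical names, the addresses passed in are made up, and bit 0 is skipped on the assumption that it carries a tag (as the `orr x8, x8, #0x1` in the old stencil suggests). If the two bool singletons sit at addresses that differ in some low bit, a value already known to be a bool can be classified by testing that one bit, with no load of the singleton's address.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index (1..31) of the lowest bit that differs between the two
 * addresses, or 0 if no such bit exists. Bit 0 is skipped because this
 * sketch assumes it is used as a tag; 0 therefore doubles as "not found". */
static int find_distinguishing_bit(uintptr_t true_addr, uintptr_t false_addr)
{
    uintptr_t diff = true_addr ^ false_addr;
    for (int bit = 1; bit < 32; bit++) {
        if ((diff >> bit) & 1) {
            return bit;
        }
    }
    return 0;
}

/* Return 1 if the given bit is set in addr, else 0. */
static int bit_is_set(uintptr_t addr, int bit)
{
    return (int)((addr >> bit) & 1);
}

/* Given a (possibly tagged) value known to be one of the two singletons,
 * decide whether it is the "true" one by looking at a single bit. */
static int is_true_by_bit(uintptr_t value, uintptr_t true_addr, int bit)
{
    assert(bit > 0);
    return bit_is_set(value, bit) == bit_is_set(true_addr, bit);
}
```

With hypothetical addresses `0x1010` for True and `0x1000` for False, `find_distinguishing_bit` returns 4, which corresponds to the `#0x4` operand of the `tbnz` above. Whether the guard should branch on the bit being set or clear depends on which singleton has the bit set.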

int bit = get_test_bit_for_bools();
if (bit) {
    REPLACE_OP(this_instr,
               test_bit_set_in_true(bit) ?
Member

Please check this once at the context initialization and set it in the context, then fetch the corresponding op to use from the context.

Member Author

I thought about that, but decided that having simpler, stateless code was better than saving a few cycles out of the many thousands spent optimizing a trace.
These functions are only called 2 or 3 times per trace (on average) and are really small and fast.
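The two options in this exchange can be sketched side by side. Everything here is hypothetical and for illustration only: `sketch_context`, `compute_test_bit`, `sketch_context_init`, `cached_test_bit`, and the two addresses are invented names and values, not the optimizer's real context or helpers. The stateless form recomputes the bit each time it is needed; the cached form computes it once at context initialization and reads it back afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical addresses standing in for the two bool singletons. */
static const uintptr_t FAKE_TRUE_ADDR = 0x1010;
static const uintptr_t FAKE_FALSE_ADDR = 0x1000;

/* Stateless option: recompute the distinguishing bit on every call.
 * Returns 0 if no usable bit exists (bit 0 is assumed to be a tag). */
static int compute_test_bit(void)
{
    uintptr_t diff = FAKE_TRUE_ADDR ^ FAKE_FALSE_ADDR;
    for (int bit = 1; bit < 32; bit++) {
        if ((diff >> bit) & 1) {
            return bit;
        }
    }
    return 0;
}

/* Cached option: a hypothetical context that stores the bit once. */
typedef struct {
    int bool_test_bit;  /* 0 means "no usable bit" */
} sketch_context;

/* Compute the bit a single time when the context is set up. */
static void sketch_context_init(sketch_context *ctx)
{
    assert(ctx != NULL);
    ctx->bool_test_bit = compute_test_bit();
}

/* Later users read the stored value instead of recomputing it. */
static int cached_test_bit(const sketch_context *ctx)
{
    return ctx->bool_test_bit;
}
```

Both forms return the same bit. The cached form avoids a short loop per call at the cost of an extra field and an initialization step, which is the trade-off being weighed in the comments above.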

int bit = get_test_bit_for_bools();
if (bit) {
    REPLACE_OP(this_instr,
               test_bit_set_in_true(bit) ?
Member

Same here.

@Fidget-Spinner
Member

Fidget-Spinner commented Jan 13, 2026

Also do you have performance numbers? Even microbenchmarks are fine.

I see a pretty big difference (around 10%) in Richards.

@markshannon
Member Author

> Also do you have performance numbers? Even microbenchmarks are fine.

No. It didn't seem worth measuring as I didn't expect the impact to be above the noise.
The generated code is just better: fewer instructions, no memory access, and shorter instructions (for x86).

> I see a pretty big difference (around 10%) in Richards.

I'm surprised it makes much difference.
Better or worse?
