[BUG] C ABI passes struct values incorrectly when compiling with compile() #175

@gmarkall

Describe the bug

As observed in NVIDIA/cccl#4248:

Consider an extern "C" device function accepting two structs as inputs:

struct MyStruct {
    char a;
    char b;
    char c;
};

extern "C" __device__ void foo(MyStruct a, MyStruct b) {

}

Here's the corresponding PTX:

.visible .func foo(
        .param .align 1 .b8 foo_param_0[3],
        .param .align 1 .b8 foo_param_1[3]
)
{

        ret;

}

Now, consider defining this device function using numba.cuda, using @gpu_struct to define the type of the arguments, and keeping in mind that gpu_struct uses StructModel to define the underlying numba type:

import numba
from numba import cuda
import numpy as np

from cuda.parallel.experimental.struct import gpu_struct

@gpu_struct
class MyStruct:
    a: np.int8
    b: np.int8
    c: np.int8

def op(a, b):
    pass

tp = numba.typeof(MyStruct(1, 2, 3))
ptx, _ = cuda.compile(op, (tp, tp))

print(ptx)

Here's the output of the above script:

.visible .func  (.param .b64 func_retval0) op(
        .param .b32 op_param_0,
        .param .b32 op_param_1,
        .param .b32 op_param_2,
        .param .b32 op_param_3,
        .param .b32 op_param_4,
        .param .b32 op_param_5
)
{
        .reg .b64       %rd<2>;


        mov.u64         %rd1, 0;
        st.param.b64    [func_retval0+0], %rd1;
        ret;

}

Comparing the two PTX listings, we see an ABI difference: on the C++ side, each struct argument is a byte array of type .b8[N], where N is the size of the struct. On the Numba side, however, the structs have been decomposed into their members, and the function accepts a separate .b32 parameter for each member (each 8-bit member is promoted to 32 bits).
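As a quick sanity check of the C++ side, the size and alignment that appear in the NVCC-generated prototype can be reproduced on the host with ctypes. This is only an illustrative sketch: the ctypes class below is a host-side stand-in for the C++ MyStruct, and the formatted string simply mirrors the parameter declaration shown earlier; it is not produced by any compiler.

```python
import ctypes


class MyStruct(ctypes.Structure):
    # Host-side mirror of the C++ struct from this issue: three 1-byte members.
    _fields_ = [
        ("a", ctypes.c_char),
        ("b", ctypes.c_char),
        ("c", ctypes.c_char),
    ]


size = ctypes.sizeof(MyStruct)      # total size in bytes under C layout rules
align = ctypes.alignment(MyStruct)  # required alignment in bytes

# Format the values the way the C ABI prototype declares a by-value struct.
c_abi_param = f".param .align {align} .b8 foo_param_0[{size}]"
print(c_abi_param)
```

The size of 3 and alignment of 1 match the `.align 1 .b8 foo_param_0[3]` declaration in the NVCC output, whereas the Numba output carries no trace of the struct's overall layout.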

This happens because Numba's implementation of the C ABI calling convention packs struct value arguments according to the Numba calling convention, not the C one. It just uses the default argument packer:

arginfo = self._get_arg_packer(argtys)

Steps/Code to reproduce bug

Reproducer in code above.

Expected behavior

The C ABI calling convention should pack struct values according to the C ABI calling convention, yielding the same function prototypes as CUDA C/C++ code compiled with NVCC.

This will probably require the implementation of a new arg packer for the C calling convention.
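To make the difference between the two packing strategies concrete, here is a small pure-Python model of the parameter lists each one would produce. It is a sketch under stated assumptions, not Numba code: Member, c_layout, flattened_params and c_abi_params are hypothetical names invented for illustration, the layout rule is plain natural alignment with tail padding, and the widening of sub-32-bit members to .b32 is assumed from the output observed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    size: int   # size in bytes
    align: int  # required alignment in bytes


def c_layout(members):
    """Return (size, alignment) of a struct under natural C layout rules."""
    offset, struct_align = 0, 1
    for m in members:
        offset = -(-offset // m.align) * m.align  # round up to member alignment
        offset += m.size
        struct_align = max(struct_align, m.align)
    size = -(-offset // struct_align) * struct_align  # add tail padding
    return size, struct_align


def flattened_params(func_name, structs):
    """One scalar parameter per member, like the Numba output in this issue.

    Assumption: members narrower than 32 bits are widened to .b32.
    """
    params = []
    for members in structs:
        for m in members:
            bits = max(32, m.size * 8)
            params.append(f".param .b{bits} {func_name}_param_{len(params)}")
    return params


def c_abi_params(func_name, structs):
    """One aligned byte-array parameter per struct, like the NVCC output."""
    params = []
    for i, members in enumerate(structs):
        size, align = c_layout(members)
        params.append(f".param .align {align} .b8 {func_name}_param_{i}[{size}]")
    return params


my_struct = [Member("a", 1, 1), Member("b", 1, 1), Member("c", 1, 1)]
print("\n".join(flattened_params("op", [my_struct, my_struct])))
print("\n".join(c_abi_params("op", [my_struct, my_struct])))
```

The first call reproduces the six .b32 parameters seen in the current output, and the second produces the two 3-byte arrays that a C-convention packer would need to emit. A real packer would additionally have to generate the loads that unpack each member from the byte array inside the function body, which this model does not attempt.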

Environment details (please complete the following information):

All environments.

Additional context

No other context.

Labels

bug (Something isn't working)