[BUG] C ABI passes struct values incorrectly when compiling with `compile()`

**Describe the bug**

As observed in https://github.com/NVIDIA/cccl/issues/4248: 

Consider an extern "C" device function accepting two structs as inputs:

```c
struct MyStruct {
    char a;
    char b;
    char c;
};

extern "C" __device__ void foo(MyStruct a, MyStruct b) {

}
```

Here's the corresponding PTX:

```assembly
.visible .func foo(
        .param .align 1 .b8 foo_param_0[3],
        .param .align 1 .b8 foo_param_1[3]
)
{

        ret;

}
```

Now, consider defining this device function using numba.cuda, using `@gpu_struct` to define the type of the arguments, and keeping in mind that gpu_struct uses [StructModel](https://github.com/numba/numba/blob/c21aa9273ef4298392695b1f4613d29456b53e5c/numba/core/datamodel/old_models.py#L514) to define the underlying numba type:

```python
import numba
from numba import cuda
import numpy as np

from cuda.parallel.experimental.struct import gpu_struct

@gpu_struct
class MyStruct:
    a: np.int8
    b: np.int8
    c: np.int8

def op(a, b):
    pass

tp = numba.typeof(MyStruct(1, 2, 3))
ptx, _ = cuda.compile(op, (tp, tp))

print(ptx)
```

Here's the output of the above script:

```assembly
.visible .func  (.param .b64 func_retval0) op(
        .param .b32 op_param_0,
        .param .b32 op_param_1,
        .param .b32 op_param_2,
        .param .b32 op_param_3,
        .param .b32 op_param_4,
        .param .b32 op_param_5
)
{
        .reg .b64       %rd<2>;


        mov.u64         %rd1, 0;
        st.param.b64    [func_retval0+0], %rd1;
        ret;

}
```

Comparing the two PTXs, we see an ABI difference: on the C++ side, struct arguments are of type `.b8[N]` where `N` is the size of the struct. On the numba side however, we note that structs have been decomposed into their members, and the function accepts a `.b32` for each member. ([reason for promotion to 32-bit](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html?highlight=abi#function-declarations-and-definitions)).

This is because the C ABI calling convention is packing struct value arguments according to the Numba calling convention, not the C one. It just uses the default argument packer:

https://github.com/NVIDIA/numba-cuda/blob/fbbc0402ca14b931fd00f2484c0af34730058bfe/numba_cuda/numba/cuda/target.py#L306

**Steps/Code to reproduce bug**

Reproducer in code above.

**Expected behavior**

The C ABI calling convention should pack struct values according to the C ABI calling convention, yielding the same function prototypes as CUDA C/C++ code compiled with NVCC.

This will probably require the implementation of a new arg packer for the C calling convention.

**Environment details (please complete the following information):**

All environments.

**Additional context**

No other context.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] C ABI passes struct values incorrectly when compiling with `compile()` #175

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[BUG] C ABI passes struct values incorrectly when compiling with compile() #175

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

[BUG] C ABI passes struct values incorrectly when compiling with `compile()` #175