Device could not be initialized or missing initialization #41

@Zamion101

I'm trying to set up Aurora-HLS on Noctua2, on node n2fpga17, with two xcu280_u55c_0 cards connected together in the following configuration: https://pc2.github.io/fpgalink-gui/index.html?import=%20--fpgalink%3Dn00%3Aacl0%3Ach0-n00%3Aacl1%3Ach0%20--fpgalink%3Dn00%3Aacl0%3Ach1-n00%3Aacl1%3Ach1

I followed the README and compiled the project with make aurora, then built the example with

make host
make xclbin

After the 2 hours of waiting for the xclbin to compile, I ran ./host_aurora_hls_test. It printed the warning [n2fpga17:3185664] MCW rank 0 is not bound (or bound to all available processors) and then crashed with [n2fpga17:3185664:0:3185664] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil)).

I added the following debug prints to host/host_aurora_hls_test.cpp and tried again:

uint32_t device_id = emulation ? 0 : (((node_rank / 2) + config.device_id_offset) % 3);
printf("device_id: %u %u \n", device_id, config.device_id_offset);

uint32_t instance = node_rank % 2;
printf("instance: %u \n", instance);

xrt::device device = xrt::device(device_id);
std::cout << "device name:     " << device.get_info<xrt::info::device::name>() << "\n";
std::cout << "device bdf:      " << device.get_info<xrt::info::device::bdf>() << "\n";

This gave me the following output, which shows where the problem occurs:

emulation: 0 
device_id: 0 0 
instance: 0 
[n2fpga17:3185664:0:3185664] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:3185664) ====
 0 0x0000000000012cf0 __funlockfile()  :0
 1 0x00000000000ccc6f xrt_core::system_linux::get_userpf_device()  ???:0
 2 0x00000000000d1187 xrt_core::get_userpf_device()  ???:0
 3 0x00000000001186cb xrt::device::device()  ???:0
 4 0x00000000004046b8 main()  /scratch/hpc-prf-gripv/Aurora-HLS/host/host_aurora_hls_test.cpp:102
 5 0x000000000003ad85 __libc_start_main()  ???:0
 6 0x000000000040415e _start()  ???:0
=================================
[n2fpga17:3185664] *** Process received signal ***
[n2fpga17:3185664] Signal: Segmentation fault (11)
[n2fpga17:3185664] Signal code:  (-6)
[n2fpga17:3185664] Failing at address: 0x12d6f00309c00
[n2fpga17:3185664] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x1553fbcd6cf0]
[n2fpga17:3185664] [ 1] /opt/software/FPGA/Xilinx/xrt/xrt_2.15/lib/libxrt_core.so.2(_ZNK8xrt_core12system_linux17get_userpf_deviceEj+0x1f)[0x1553e1719c6f]
[n2fpga17:3185664] [ 2] /opt/software/FPGA/Xilinx/xrt/xrt_2.15/lib/libxrt_coreutil.so.2(_ZN8xrt_core17get_userpf_deviceEj+0x67)[0x1553fc564187]
[n2fpga17:3185664] [ 3] /opt/software/FPGA/Xilinx/xrt/xrt_2.15/lib/libxrt_coreutil.so.2(_ZN3xrt6deviceC2Ej+0x5b)[0x1553fc5ab6cb]
[n2fpga17:3185664] [ 4] ./host_aurora_hls_test[0x4046b8]
[n2fpga17:3185664] [ 5] /lib64/libc.so.6(__libc_start_main+0xe5)[0x1553fb939d85]
[n2fpga17:3185664] [ 6] ./host_aurora_hls_test[0x40415e]
[n2fpga17:3185664] *** End of error message ***
Segmentation fault (core dumped)

I know the cards are present and working as shown in the screenshot below:
image
