Segfault and core dump on Summit #31

@williamfgc

I need some guidance on how to run MACSio on Summit.
To reproduce: I built the MACSio binary successfully. It links against the following dependencies:

ldd ~/opt/macsio/macsio 
	linux-vdso64.so.1 =>  (0x00007fffb6120000)
	libjson-cwx.so.2 => /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2 (0x00007fffb60e0000)
	libmpiprofilesupport.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libmpiprofilesupport.so.3 (0x00007fffb60b0000)
	libmpi_ibm.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libmpi_ibm.so.3 (0x00007fffb5f30000)
	libstdc++.so.6 => /sw/summit/gcc/6.4.0/lib64/libstdc++.so.6 (0x00007fffb5d20000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fffb5c10000)
	libgcc_s.so.1 => /sw/summit/gcc/6.4.0/lib64/libgcc_s.so.1 (0x00007fffb5bd0000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fffb59e0000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fffb59b0000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00007fffb5980000)
	libhwloc_ompi.so.15 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libhwloc_ompi.so.15 (0x00007fffb5910000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fffb58e0000)
	libevent-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libevent-2.1.so.6 (0x00007fffb5860000)
	libevent_pthreads-2.1.so.6 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libevent_pthreads-2.1.so.6 (0x00007fffb5830000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fffb57f0000)
	libopen-rte.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libopen-rte.so.3 (0x00007fffb56e0000)
	libopen-pal.so.3 => /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-6.4.0/spectrum-mpi-10.3.1.2-20200121-awz2q5brde7wgdqqw4ugalrkukeub4eb/lib/libopen-pal.so.3 (0x00007fffb55f0000)
	/lib64/ld64.so.2 (0x00007fffb6140000)

Unfortunately, every combination of input parameters I have tried results in a segmentation fault and a core dump. For example:

jsrun -n 2 ${MACSIO_EXEC} --interface hdf5 --parallel_file_mode MIF 2 --part_size 1M

Results:

cat output.490227 
[b28n03:147697] *** Process received signal ***
[b28n03:147697] Signal: Segmentation fault (11)
[b28n03:147697] Signal code: Address not mapped (1)
[b28n03:147697] Failing at address: 0x40
[b28n03:147697] [ 0] [0x2000000504d8]
[b28n03:147697] [ 1] [0x20000004d6b0]
[b28n03:147697] [ 2] /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2(json_object_set_string+0x58)[0x2000000f9838]
[b28n03:147697] [ 3] /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2(json_object_path_set_string+0x34)[0x2000000fa764]
[b28n03:147697] [ 4] /ccs/home/wgodoy/opt/macsio/macsio[0x10018cc0]
[b28n03:147697] [ 5] /ccs/home/wgodoy/opt/macsio/macsio(main+0x900)[0x10005980]
[b28n03:147697] [ 6] /lib64/libc.so.6(+0x25200)[0x200000655200]
[b28n03:147697] [ 7] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000006553f4]
[b28n03:147697] *** End of error message ***
[b28n03:147696] *** Process received signal ***
[b28n03:147696] Signal: Segmentation fault (11)
[b28n03:147696] Signal code: Address not mapped (1)
[b28n03:147696] Failing at address: 0x40
[b28n03:147696] [ 0] [0x2000000504d8]
[b28n03:147696] [ 1] [0x20000004d6b0]
[b28n03:147696] [ 2] /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2(json_object_set_string+0x58)[0x2000000f9838]
[b28n03:147696] [ 3] /ccs/home/wgodoy/opt/json-cwx/lib/libjson-cwx.so.2(json_object_path_set_string+0x34)[0x2000000fa764]
[b28n03:147696] [ 4] /ccs/home/wgodoy/opt/macsio/macsio[0x10018cc0]
[b28n03:147696] [ 5] /ccs/home/wgodoy/opt/macsio/macsio(main+0x900)[0x10005980]
[b28n03:147696] [ 6] /lib64/libc.so.6(+0x25200)[0x200000655200]
[b28n03:147696] [ 7] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000006553f4]
[b28n03:147696] *** End of error message ***
ERROR:  One or more process (first noticed rank 0) terminated with signal 11 (core dumped)
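
Both ranks appear to fail with the same stack: `json_object_set_string`, called from `json_object_path_set_string`, called from an unresolved frame in the `macsio` binary below `main`. To confirm this in larger runs, the sketch below collapses the per-process traces into their distinct frames. It assumes the `[host:pid] [ N] frame` layout shown above; the function name `unique_frames` is made up for illustration.

```shell
# Collapse per-process backtraces into the set of distinct frames.
# Assumes each frame line looks like: "[host:pid] [ N] frame-text".
unique_frames() {
    # Keep only frame lines, drop the "[host:pid] " prefix, then deduplicate.
    grep -E '^\[[^]]+\] \[ *[0-9]+\] ' "$1" \
        | sed -E 's/^\[[^]]+\] //' \
        | LC_ALL=C sort -u
}

# Possible follow-up (only meaningful if macsio was built with debug
# info, e.g. -g): resolve the unnamed frame 4 to a function name.
#   addr2line -f -e /ccs/home/wgodoy/opt/macsio/macsio 0x10018cc0
```

Running `unique_frames output.490227` on the output above should print eight lines, one per frame, if the two ranks really share an identical stack.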

macsio-log.log:

--------------------------------------------------------Processor 000000-------------------------------------------------------

Any help would be appreciated!
