
Open-MPI win_allocate issue #35

@jeffhammond

Description


We should root-cause this. My money is on the Travis CI environment or an Open-MPI bug rather than Casper.

testing mpiexec=mpiexec --oversubscribe -np 4 CSP_NG=0 win_allocate ...
CASPER Configuration:
    RMA_ERR_CHECK    (enabled) 
    CSP_VERBOSE      = err|conf_g|warn|conf_win|conf_comm|info
    CSP_NG           = 0
    CSP_ASYNC_CONFIG = on
    CSP_TOPO         = machine
    CSP_ASYNC_MODE   = rma|pt2pt
PT2PT Offloading Options:
    CSP_OFFLOAD_MIN_MSGSZ   = 8192 bytes
    CSP_OFFLOAD_SHMQ_NCELLS = 64 (total 13 Kbytes)
                              cell size = 208 bytes, cell size(aligned) = 256 bytes
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.
  Local host:  travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74
  System call: open(2) 
  Error:       No such file or directory (errno 2)
--------------------------------------------------------------------------
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** An error occurred in MPI_Win_allocate
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** reported by process [893255681,2]
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** MPI_ERR_WIN: invalid window
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19159] ***    and potentially your MPI job)
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19152] PMIX ERROR: UNREACHABLE in file ../../../../../../../opal/mca/pmix/pmix3x/pmix/src/server/pmix_server.c at line 2147
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19152] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[travis-job-daea1d28-3e8e-48bf-b0db-e18066dffe74:19152] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
test failed ! mpiexec --oversubscribe -np 4 /home/travis/build/pmodels/casper/test/win_allocate
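The help message above says an `open(2)` call failed with `ENOENT` (errno 2) during shared memory initialization. A quick way to check whether the CI environment is the culprit is a small POSIX probe that does what a file-backed shared-memory segment needs: create a backing file in a directory with `open(2)`, size it with `ftruncate`, and map it with `mmap`. This is a minimal sketch and not Open MPI's actual code. `probe_shm_dir` is a hypothetical helper, and which directory Open MPI really uses for its backing files depends on the version and configuration, so any path you pass in is an assumption. If the directory does not exist, `open` with `O_CREAT` fails with `ENOENT`, the same error as in the log.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI or Casper): try to create, size,
 * and map a small backing file in `dir`, roughly what a file-backed
 * shared-memory segment requires. Returns 0 on success, otherwise the errno
 * of the first system call that failed. */
int probe_shm_dir(const char *dir, size_t size)
{
    char path[4096];
    /* Per-process file name so concurrent probes do not collide. */
    int n = snprintf(path, sizeof path, "%s/shm_probe.%ld", dir, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof path)
        return ENAMETOOLONG;

    int fd = open(path, O_CREAT | O_RDWR | O_EXCL, 0600);
    if (fd < 0)
        return errno; /* ENOENT here means `dir` does not exist */

    int err = 0;
    if (ftruncate(fd, (off_t)size) != 0) {
        err = errno;
    } else {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            err = errno;
        } else {
            memset(p, 0, size); /* touch the pages to make sure they are usable */
            munmap(p, size);
        }
    }
    close(fd);
    unlink(path); /* remove the probe file whether or not the mapping worked */
    return err;
}
```

Calling `probe_shm_dir(dir, 4096)` from a step in the Travis job, with `dir` set to whatever temporary or shared-memory directory the job is assumed to use, and printing `strerror` of a nonzero result would show whether the environment itself rejects the call. If the probe succeeds there while the test still fails, that points back toward Open-MPI (or Casper) rather than the CI setup.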
