fix ima in causal_conv1d_channellast_fwd_kernel #101

Open
POI-WX wants to merge 1 commit into Dao-AILab:main from POI-WX:fix_ima_in_causal_conv1d_channellast_fwd_kernel

POI-WX commented Mar 25, 2026

  • Align the seq_idx boundary-condition handling with the bwd implementation to fix an out-of-bounds read of seq_idx in causal_conv1d_channellast_fwd_kernel. Related issue: Problem with seq_idx #67

  • This illegal memory access is masked during actual training by the allocator setting below. After unsetting it, we reproduced the error consistently with specific inputs and traced it to improper boundary-condition handling.

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export PYTORCH_ALLOC_CONF=expandable_segments:True

  • The unit test script below is built from the input that triggered the error. To reproduce the error on a single GPU, you also need to select the cudaMallocAsync allocator backend or disable memory caching, and run the script under compute-sanitizer with the memcheck tool.

replay_causal_conv1d_oob.py

export PYTORCH_ALLOC_CONF=backend:cudaMallocAsync

# export PYTORCH_NO_CUDA_MEMORY_CACHING=1

compute-sanitizer --tool memcheck --show-backtrace=yes  \
python3 replay_causal_conv1d_oob.py \
   --device cuda:0 \
   --batch 1 \
   --dim 10240 \
   --seqlen 4099 \
   --width 4 \
   --dtype bfloat16
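
For readers who have not looked at the kernel, the kind of boundary guard described above can be sketched in plain Python. This is not the kernel code: the chunk length of 64, the index formula, and the helper names `seq_idx_positions`, `load_seq_idx`, and `count_out_of_bounds` are assumptions made for illustration only. The sketch counts how many seq_idx positions a chunked reader would touch outside `[0, seqlen)` if it read a left halo of `width - 1` elements per chunk without any bounds check, and shows a guarded load that returns -1 for those positions instead.

```python
# Illustrative sketch only. CHUNK_L and the index formula are assumptions,
# not values taken from the causal_conv1d kernel source.
CHUNK_L = 64


def seq_idx_positions(chunk_id, chunk_l, width):
    """Positions one chunk would read from seq_idx, including the
    (width - 1)-element causal halo to the left of the chunk."""
    start = chunk_id * chunk_l - (width - 1)
    return range(start, (chunk_id + 1) * chunk_l)


def load_seq_idx(seq_idx, pos, seqlen):
    """Guarded load: return -1 for any position outside [0, seqlen)
    instead of indexing into seq_idx."""
    return seq_idx[pos] if 0 <= pos < seqlen else -1


def count_out_of_bounds(seqlen, chunk_l, width):
    """Number of reads an unguarded reader would issue outside [0, seqlen)."""
    num_chunks = (seqlen + chunk_l - 1) // chunk_l  # ceiling division
    return sum(
        1
        for chunk_id in range(num_chunks)
        for pos in seq_idx_positions(chunk_id, chunk_l, width)
        if not 0 <= pos < seqlen
    )


if __name__ == "__main__":
    # Shape taken from the reproduction command: seqlen 4099, width 4.
    print(count_out_of_bounds(seqlen=4099, chunk_l=CHUNK_L, width=4))
```

Under these assumptions, a sequence length that is not a multiple of the chunk length (such as 4099) leaves the final chunk extending past the end of seq_idx, which is where an upper-bound check matters in addition to the lower-bound check for the halo.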

POI-WX commented May 22, 2026

@tridao could you please take a look?
