[UIL] Fix triton LSA test library loading and allocator robustness #484

Open: MC952-arch wants to merge 1 commit into flagos-ai:main from MC952-arch:fix-triton-testcase

MC952-arch commented May 28, 2026

PR Category

UIL

PR Types

Bug Fixes

PR Description

This PR improves reliability of the Triton-based LSA (Local Shared Access) test by making FlagCX shared library discovery and the CUDA pluggable allocator setup more robust, reducing failures due to missing/incorrect library paths and partially-initialized wrapper objects.

Copilot AI left a comment

Pull request overview

This PR improves reliability of the Triton-based LSA (Local Shared Access) test by making FlagCX shared library discovery and the CUDA pluggable allocator setup more robust, reducing failures due to missing/incorrect library paths and partially-initialized wrapper objects.

Changes:

  • Add FLAGCX_LIB_PATH support and cache the compiled CUDA pluggable allocator + torch.cuda.MemPool, with clearer failure signaling in the LSA test.
  • Add default libflagcx.so discovery logic in FLAGCXLibrary (via $FLAGCX_PATH or repo-local build/lib) and harden destructor cleanup to avoid crashes on partial initialization.
  • Adjust shared library caching in FLAGCXLibrary to reuse loaded ctypes.CDLL instances per path.
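
The discovery and per-path caching described in the changes above could look roughly like the sketch below. It is not the PR's code: the function names, the `build/lib` layout under `$FLAGCX_PATH`, and the injectable `loader` parameter are assumptions made for illustration.

```python
import ctypes
import os
from pathlib import Path

LIB_NAME = "libflagcx.so"
_loaded = {}  # real path -> loaded library handle, one entry per distinct path


def find_flagcx_library(env=None, repo_root=None):
    """Return the first existing candidate path, or raise FileNotFoundError."""
    env = os.environ if env is None else env
    candidates = []
    flagcx_path = env.get("FLAGCX_PATH")
    if flagcx_path:
        # Assumption: the library lives under build/lib inside $FLAGCX_PATH.
        candidates.append(Path(flagcx_path) / "build" / "lib" / LIB_NAME)
    if repo_root is not None:
        # Fallback: a repo-local build directory.
        candidates.append(Path(repo_root) / "build" / "lib" / LIB_NAME)
    for candidate in candidates:
        if candidate.is_file():
            return str(candidate)
    tried = [str(c) for c in candidates]
    raise FileNotFoundError(f"{LIB_NAME} not found; tried: {tried}")


def load_library(path, loader=ctypes.CDLL):
    """Load a shared library once per real path and reuse the handle afterwards."""
    key = os.path.realpath(path)
    if key not in _loaded:
        _loaded[key] = loader(key)
    return _loaded[key]
```

Keying the cache on the real path means two spellings of the same file share one handle, and raising `FileNotFoundError` with the list of tried paths makes a bad environment visible before `ctypes` reports a less specific error.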

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| plugin/interservice/test_triton_lsa.py | Adds configurable linker path for libflagcx.so, caches allocator/mempool compilation results, and fails fast if the pool can’t be initialized. |
| plugin/interservice/flagcx_wrapper.py | Adds default library search logic and improves robustness around library loading and cleanup. |
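
The "cache the result, fail fast otherwise" behavior described for the test file can be sketched without any CUDA dependency. The `build_pool` callable below is a hypothetical stand-in for whatever compiles the allocator and creates the `torch.cuda.MemPool`; the PR's actual helper names are not shown in this conversation.

```python
_mem_pool = None  # cached after the first successful build


def get_mem_pool(build_pool):
    """Return the cached pool, building it once via the hypothetical build_pool callable."""
    global _mem_pool
    if _mem_pool is None:
        pool = build_pool()
        if pool is None:
            # Fail fast with a clear message instead of letting the test
            # continue with an unusable pool.
            raise RuntimeError("FlagCX memory pool could not be initialized")
        _mem_pool = pool
    return _mem_pool
```

A failed build leaves the cache empty, so a later call can retry rather than reusing a broken object.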

Comment on lines +139 to +141
import atexit
atexit.register(_cleanup_flagcx_mem_pool)
atexit.register(_cleanup_flagcx_allocator_wrapper)
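
One detail worth keeping in mind for these registrations: `atexit` runs handlers in reverse registration order, so with the lines above the allocator-wrapper cleanup would run before the mem-pool cleanup. Whether that order is the intended one is not stated in this conversation. The sketch below only demonstrates the ordering with stand-in functions in a child interpreter; the function names are simplified placeholders.

```python
import subprocess
import sys
import textwrap

# A tiny script that registers two handlers in the same order as the diff.
SCRIPT = textwrap.dedent("""
    import atexit

    def cleanup_mem_pool():
        print("mem_pool")

    def cleanup_allocator_wrapper():
        print("allocator_wrapper")

    atexit.register(cleanup_mem_pool)
    atexit.register(cleanup_allocator_wrapper)
""")


def exit_order():
    """Run SCRIPT in a fresh interpreter and return handler names in execution order."""
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```

Calling `exit_order()` returns the handler names last-registered first, which is a quick way to check that teardown order matches the dependency between the pool and the allocator.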
Comment on lines 468 to +471
     def __del__(self):
         # free flagcx handler
-        self.FLAGCX_CHECK(self._funcs["flagcxHandleFree"](self.handler))
+        if hasattr(self, '_funcs') and hasattr(self, 'handler'):
+            self.FLAGCX_CHECK(self._funcs["flagcxHandleFree"](self.handler))
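
The guard in this diff protects against `__del__` running on an object whose `__init__` raised before `_funcs` or `handler` were set. A minimal standalone sketch of the same pattern follows; `GuardedHandle` and its `"free"` entry are hypothetical names, not FlagCX APIs, and clearing the handler after release is an extra step that the diff itself does not show.

```python
class GuardedHandle:
    """Toy wrapper whose destructor tolerates partial initialization."""

    def __init__(self, funcs):
        self._funcs = funcs      # maps a name to a callable, like the wrapper's table
        self.handler = object()  # stand-in for the native handle

    def __del__(self):
        # getattr with a default avoids AttributeError when __init__ never
        # reached the assignments above.
        funcs = getattr(self, "_funcs", None)
        handler = getattr(self, "handler", None)
        if funcs is None or handler is None:
            return
        free = funcs.get("free")
        if free is not None:
            free(handler)
        # Clear the handle so a second __del__ call does not free it twice.
        self.handler = None
```

Python reports an exception raised inside `__del__` as an "Exception ignored" message on stderr rather than propagating it, so the guard mainly keeps teardown quiet and avoids calling into the native library with a missing handle.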