[Fix] destroy pg for Megatron trainer on cleanup #761

Draft
GeneDer wants to merge 4 commits into main from genesu/test-calling-destroy-process-group

Conversation

GeneDer (Member) commented Jun 10, 2026

No description provided.

Signed-off-by: Gene Der Su <e870252314@gmail.com>
Copilot AI review requested due to automatic review settings June 10, 2026 18:38

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the Megatron backend trainer cleanup path to explicitly tear down the active torch.distributed default process group during trainer shutdown, preventing lingering distributed resources after training completes.

Changes:

  • Add dist.destroy_process_group() to MegatronBaseTrainer.cleanup() when torch.distributed is initialized.
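
The guarded teardown described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `destroy_default_process_group` is hypothetical, and the `dist` argument is assumed to be `torch.distributed` (or any object exposing the same `is_available()`, `is_initialized()`, and `destroy_process_group()` calls), passed in so the helper can be exercised without a GPU cluster. Swallowing the exception during shutdown is also an assumption about the desired behavior.

```python
import logging

logger = logging.getLogger(__name__)


def destroy_default_process_group(dist) -> bool:
    """Tear down the default process group if one is active.

    `dist` is assumed to be `torch.distributed` or a compatible object
    (hypothetical injection point, used here for testability).
    Returns True only if a process group was actually destroyed.
    """
    # Nothing to do when distributed support is missing or never initialized.
    if not (dist.is_available() and dist.is_initialized()):
        return False

    try:
        dist.destroy_process_group()
    except Exception:
        # Assumption: a failed teardown should be logged, not crash shutdown.
        logger.exception("destroy_process_group() failed during cleanup")
        return False
    return True
```

With PyTorch installed, this would be called as `destroy_default_process_group(torch.distributed)` from the trainer's cleanup path. Because of the `is_initialized()` guard, calling it a second time is a no-op rather than an error.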

Comment on lines +51 to +53
# clean up torch pg resources on exit
if dist.is_initialized():
    dist.destroy_process_group()
GeneDer added 2 commits June 10, 2026 11:47
Signed-off-by: Gene Der Su <e870252314@gmail.com>
Signed-off-by: Gene Der Su <e870252314@gmail.com>
Copilot AI review requested due to automatic review settings June 10, 2026 18:54

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment on lines +51 to +55
# clean up torch pg resources on exit
if dist.is_initialized():
    log_rank_0("[MegatronBaseTrainer] calling dist.destroy_process_group()")
    try:
        dist.destroy_process_group()
Signed-off-by: Gene Der Su <e870252314@gmail.com>
2 participants