added a watcher to see if any of the tasks fails and exit on panic #32

Closed
purusang wants to merge 2 commits into main from fix/monitor-critical-task-failure

Conversation

@purusang

Collaborator

No description provided.

@purusang purusang requested a review from krsnapaudel November 10, 2025 15:43

@krsnapaudel krsnapaudel left a comment

Collaborator

Are we supposed to abort only in case of error? I see abort in both Ok and Err cases.

}

- async fn block_exists(&self, height: i64) -> bool {
+ pub async fn block_exists(&self, height: i64) -> bool {

Collaborator

Why was this made public?

Collaborator Author

Yeah, makes sense. Let me push a better solution for exiting.

Collaborator Author

I got confused for a while, but the logic is correct. All the tasks are supposed to run in an infinite loop, so regardless of which result we get we should abort, because both are bad: none of the infinite loops should ever exit. In the Err case we are sure the task panicked, and in the Ok() case the task completed, which should not have happened.
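To make the reasoning above concrete, here is a minimal sketch of treating both outcomes of a join as failures. It is not the code from this PR: it swaps tokio tasks for plain `std::thread` handles so that it needs no external crates, and the names `TaskExit` and `classify_exit` are hypothetical.

```rust
use std::thread;

/// Why a task that was supposed to run forever stopped.
/// (Hypothetical type, introduced only for this illustration.)
#[derive(Debug, PartialEq)]
enum TaskExit {
    /// The task returned normally, which a run-forever task must never do.
    ReturnedUnexpectedly,
    /// The task panicked; the string holds the panic message when one is available.
    Panicked(String),
}

/// Classify the result of joining a task. There is no "success" variant:
/// both `Ok` and `Err` mean the task is no longer running.
fn classify_exit<T>(result: thread::Result<T>) -> TaskExit {
    match result {
        Ok(_) => TaskExit::ReturnedUnexpectedly,
        Err(payload) => {
            // Panic payloads are usually a `&str` or a `String`.
            let message = if let Some(s) = payload.downcast_ref::<&str>() {
                (*s).to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "non-string panic payload".to_string()
            };
            TaskExit::Panicked(message)
        }
    }
}
```

Naming the two cases separately may also help with the log messages: the caller can report "returned unexpectedly" and "panicked" as distinct conditions while still shutting down in both.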

@krsnapaudel krsnapaudel Nov 11, 2025

Collaborator

You may be right, but both the code comment and the error logs are confusing. The comment does not make clear what "failure" means, and neither do the log messages.

Collaborator

Suggested by Cursor AI. Feel free to make it concise:

// Monitor critical background tasks that are expected to run indefinitely.
// These tasks should never complete normally - they run in infinite loops.
// If any task terminates (either by completing or panicking), we abort the application
// to prevent inconsistent state and ensure proper error handling.
let abort_handle = tokio::spawn(async move {
    loop {
        tokio::select! {
            // Monitor task 1: Block fetcher
            result = &mut block_fetcher_handle => {
                match result {
                    Ok(_) => {
                        // CRITICAL BUG: Task completed normally when it should run forever.
                        // This indicates the infinite loop exited unexpectedly, which should never happen.
                        // Possible causes:
                        // - Loop exit condition was added/triggered incorrectly
                        // - Early return statement in the task function
                        // - Refactoring that removed the infinite loop
                        error!(
                            task = "block_fetcher",
                            "Critical task 'block_fetcher' completed unexpectedly. This task should run in an infinite loop but returned normally. This is a programming error that requires immediate investigation."
                        );
                        abort_handle.abort();
                        return;
                    }
                    Err(e) => {
                        // CRITICAL ERROR: Task panicked or was aborted.
                        // Panics in critical tasks are fatal and indicate a runtime error.
                        // The JoinError contains information about the panic.
                        error!(
                            task = "block_fetcher",
                            error = ?e,
                            "Critical task 'block_fetcher' panicked or was aborted. This is a fatal runtime error that requires immediate attention. Application will abort."
                        );
                        abort_handle.abort();
                        return;
                    }
                }
            }
...
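The pattern of waiting on several long-running tasks and reacting to whichever stops first can be sketched without tokio as well. The version below is an assumption-laden analog, not the suggested code: it uses `std::thread` and an `mpsc` channel in place of `tokio::select!`, and `first_exit` and the task name `indexer` are hypothetical (only `block_fetcher` comes from the snippet above).

```rust
use std::sync::mpsc;
use std::thread::{self, JoinHandle};

/// Block until the first monitored task stops and report its name together
/// with whether it panicked (`true`) or returned normally (`false`).
/// Returns `None` only when no tasks were supplied.
fn first_exit(tasks: Vec<(&'static str, JoinHandle<()>)>) -> Option<(&'static str, bool)> {
    let (tx, rx) = mpsc::channel();
    for (name, handle) in tasks {
        let tx = tx.clone();
        // One small watcher thread per task: it blocks on `join` and then
        // reports the outcome. A send error only means nobody is listening.
        thread::spawn(move || {
            let panicked = handle.join().is_err();
            let _ = tx.send((name, panicked));
        });
    }
    // Drop the original sender so `recv` fails instead of blocking forever
    // when the task list is empty.
    drop(tx);
    rx.recv().ok()
}
```

A caller would typically log the returned name and outcome and then shut the process down, for example with `std::process::exit(1)`, since either outcome means a task that should run forever has stopped.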

Collaborator Author

No problem adding comments for clarity. Let me do that.

@purusang purusang requested review from krsnapaudel and removed request for krsnapaudel November 11, 2025 06:10
@purusang

Collaborator Author

This concern is handled in #36, so I'm closing this PR. cc: @krsnapaudel

@purusang purusang closed this Apr 15, 2026