added a watcher to see if any of the tasks fails and exit on panic #32
purusang wants to merge 2 commits into
krsnapaudel left a comment
Are we supposed to abort only in case of an error? I see an abort in both the Ok and Err cases.
     }

-    async fn block_exists(&self, height: i64) -> bool {
+    pub async fn block_exists(&self, height: i64) -> bool {
Why was this made public?
Yeah, makes sense. Let me push a better solution for exiting.
I was confused for a while, but the logic is correct. All the tasks are supposed to run in an infinite loop, so we should abort regardless of which result we get: both are bad, because none of these infinite loops should ever exit. In the Err case the task panicked (or was cancelled), and in the Ok() case the task completed, which should never have happened.
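The point above (any termination of a must-run-forever task is fatal, whether it returned or panicked) can be sketched with only the Rust standard library. This is an illustrative analog, not the PR's code: std threads plus panic::catch_unwind stand in for tokio::spawn and JoinError, and an mpsc channel stands in for tokio::select! over the JoinHandles. The names spawn_supervised, first_termination, Termination, and the task names are hypothetical.

```rust
use std::panic;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// How a task that was supposed to run forever ended (hypothetical type).
#[derive(Debug, PartialEq)]
enum Termination {
    /// The task body returned: its infinite loop exited, a programming error.
    Returned,
    /// The task body panicked; carries the panic message when it is a string.
    Panicked(String),
}

/// Which task ended, and how.
type Report = (&'static str, Termination);

/// Hypothetical helper: run `body` on its own thread and send a report on `tx`
/// when it terminates, whether it returned normally or panicked.
fn spawn_supervised<F>(name: &'static str, tx: mpsc::Sender<Report>, body: F)
where
    F: FnOnce() + Send + panic::UnwindSafe + 'static,
{
    thread::spawn(move || {
        let outcome = match panic::catch_unwind(body) {
            Ok(()) => Termination::Returned,
            Err(payload) => {
                // Panic payloads are usually &'static str or String.
                let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                    s.to_string()
                } else if let Some(s) = payload.downcast_ref::<String>() {
                    s.clone()
                } else {
                    "non-string panic payload".to_string()
                };
                Termination::Panicked(msg)
            }
        };
        // The watcher may have stopped listening already; ignore send errors.
        let _ = tx.send((name, outcome));
    });
}

/// Hypothetical helper: block until the first supervised task ends.
fn first_termination(rx: &mpsc::Receiver<Report>) -> Report {
    rx.recv()
        .expect("every supervised task reports before dropping its sender")
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Runs forever, like the PR's block fetcher.
    spawn_supervised("block_fetcher", tx.clone(), || loop {
        thread::sleep(Duration::from_millis(50));
    });
    // Simulated failure: panics shortly after starting.
    spawn_supervised("indexer", tx.clone(), || {
        thread::sleep(Duration::from_millis(10));
        panic!("lost database connection");
    });
    drop(tx); // only the task threads hold senders now

    // Both outcomes are fatal, but they get distinct log messages.
    match first_termination(&rx) {
        (task, Termination::Returned) => {
            eprintln!("critical task '{task}' returned; its infinite loop exited unexpectedly")
        }
        (task, Termination::Panicked(msg)) => {
            eprintln!("critical task '{task}' panicked: {msg}")
        }
    }
    // A real service would terminate here, e.g. with std::process::exit(1).
}
```

Reporting the two cases separately gives the logs the distinction asked for below: a return means the loop exited (a programming error), while a panic means a runtime failure. Note that catch_unwind only observes unwinding panics; a build with panic = "abort" terminates the process before any watcher can react.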
You may be right, but both the comment and the error logs are confusing. The comment does not say what counts as a failure, and neither do the log messages.
Suggested by Cursor AI. Feel free to make it concise:
// Monitor critical background tasks that are expected to run indefinitely.
// These tasks should never complete normally; they run in infinite loops.
// If any task terminates (either by completing or panicking), we abort the application
// to prevent inconsistent state and ensure proper error handling.
let watcher_handle = tokio::spawn(async move {
    loop {
        tokio::select! {
            // Monitor task 1: Block fetcher
            result = &mut block_fetcher_handle => {
                match result {
                    Ok(_) => {
                        // CRITICAL BUG: Task completed normally when it should run forever.
                        // This indicates the infinite loop exited unexpectedly, which should never happen.
                        // Possible causes:
                        // - Loop exit condition was added/triggered incorrectly
                        // - Early return statement in the task function
                        // - Refactoring that removed the infinite loop
                        error!(
                            task = "block_fetcher",
                            "Critical task 'block_fetcher' completed unexpectedly. This task should run in an infinite loop but returned normally. This is a programming error that requires immediate investigation."
                        );
                        // Terminate the whole process. The watcher cannot abort itself through
                        // its own JoinHandle, which is not in scope inside the spawned task.
                        std::process::exit(1);
                    }
                    Err(e) => {
                        // CRITICAL ERROR: Task panicked or was aborted.
                        // Panics in critical tasks are fatal and indicate a runtime error.
                        // The JoinError says whether the task panicked or was cancelled.
                        error!(
                            task = "block_fetcher",
                            error = ?e,
                            "Critical task 'block_fetcher' panicked or was aborted. This is a fatal runtime error that requires immediate attention. Application will abort."
                        );
                        std::process::exit(1);
                    }
                }
            }
            ...
No problem adding comments for clarity. Let me do that.
This concern is handled in #36, so I'm closing this PR. cc: @krsnapaudel
No description provided.