fix: set job to FAILED on exception in job_pipeline #3661
Abhishek9639 wants to merge 1 commit into intelowlproject:develop
Hi @mlodic, This PR fixes the stuck job issue reported in #3653. The except block in I aligned the except block with the same cleanup pattern that Let me know if anything needs changing. |
Hey @Abhishek9639, nice work getting this fix in. The core logic is correct and addresses the real gap in job_pipeline. I did a close review and have a few hardening suggestions below to make it more production-ready. Overall the direction is exactly right and aligned with run_plugin / set_final_status.
JobConsumer.serialize_and_send_job(job)


@shared_task(base=FailureLoggedTask, name="run_plugin", soft_time_limit=500)
Suggestion: Wrap cleanup in its own try/except
If any of these cleanup lines raise (e.g., job.get_root() hits a treebeard MultipleObjectsReturned edge case, or the WebSocket layer has a config issue), the original exception e gets swallowed and replaced with the cleanup exception. The original error would be lost.
Consider:
except Exception as e:
    logger.exception(e)
    # ... existing report cleanup ...
    try:
        job.status = Job.STATUSES.FAILED.value
        job.errors.append(str(e))
        job.finished_analysis_time = now()
        job.save(update_fields=["status", "errors", "finished_analysis_time"])
        if root_investigation := job.get_root().investigation:
            root_investigation.set_correct_status(save=True)
        JobConsumer.serialize_and_send_job(job)
    except Exception as cleanup_err:
        logger.exception(
            f"Failed to clean up job {job_id} after pipeline exception: {cleanup_err}"
        )
This ensures the original error is always logged regardless of cleanup failures.
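To illustrate the guarded-cleanup idea outside of Django, here is a minimal, framework-free sketch. FakeJob, run_pipeline, and broken_notify are hypothetical stand-ins invented for this example, not IntelOwl code, and the string statuses only approximate the real Job.STATUSES values. It shows that when the notification step itself fails, the job is still marked as failed and nothing propagates to the caller.

```python
import logging

logger = logging.getLogger("pipeline_sketch")


class FakeJob:
    """Hypothetical stand-in for the real Job model (no Django involved)."""

    def __init__(self):
        self.status = "running"
        self.errors = []

    def execute(self):
        # Simulates the pipeline failing after the job was set to running.
        raise RuntimeError("pipeline exploded")


def run_pipeline(job, notify):
    """Run the job; on failure, mark it failed inside a guarded cleanup block."""
    try:
        job.execute()
    except Exception as e:
        logger.exception(e)  # the original error is logged before any cleanup
        try:
            job.status = "failed"
            job.errors.append(str(e))
            notify(job)  # may itself raise, e.g. a misconfigured channel layer
        except Exception as cleanup_err:
            logger.exception("Cleanup failed after pipeline error: %s", cleanup_err)


def broken_notify(job):
    # Hypothetical notifier that always fails, to exercise the inner except.
    raise ConnectionError("websocket layer unavailable")


job = FakeJob()
run_pipeline(job, broken_notify)  # does not raise despite the failing notifier
```

Both tracebacks end up in the log, and the status change happens before the step most likely to fail, so a broken notifier cannot leave the job in a running state.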
        "api_app.models.Job.objects.get",
        return_value=_job,
    ),
):
Unnecessary mock — this can be removed
The Job.objects.get mock isn't needed here since _job already exists in the test DB, so the real Job.objects.get(pk=_job.pk) will work fine. Removing it makes the test more realistic and would catch regressions where someone accidentally switches from .save() to .filter().update() in the fix.
        "execute",
        side_effect=Exception(error_message),
    ),
    patch("api_app.websocket.JobConsumer.serialize_and_send_job"),
Missing assertion: WebSocket notification was actually sent
You're patching JobConsumer.serialize_and_send_job here, but never asserting it was called. Since the whole point of this fix (from the user's perspective) is that the frontend stops showing an infinite spinner, this is worth verifying:
with (
    patch.object(
        _job.__class__,
        "execute",
        side_effect=Exception(error_message),
    ),
    patch("api_app.websocket.JobConsumer.serialize_and_send_job") as mock_ws,
):
    job_pipeline(_job.pk)
mock_ws.assert_called_once()

_job.delete()
an.delete()


def test_job_pipeline_exception_sets_job_to_failed(self):
Suggestion: Add a second test case for the Investigation status update path
The root_investigation.set_correct_status(save=True) branch in the fix is currently untested because this job has no parent investigation. Consider adding a subTest (or separate test) that creates a job attached to an Investigation and asserts the investigation status transitions correctly after the pipeline failure. That way both branches of the if root_investigation conditional are covered.
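As a rough shape for such a test, here is a self-contained sketch using only unittest and unittest.mock. The fail_job helper is a hypothetical stand-in that mimics the cleanup in the except block, and the mocks replace the real Job and Investigation models, so this is an assumption-laden outline rather than the actual IntelOwl test. It uses subTest to cover both outcomes of the if root_investigation check and also asserts that the notifier was called.

```python
import unittest
from unittest.mock import MagicMock


def fail_job(job, error, notify):
    """Hypothetical stand-in for the cleanup performed in the except block."""
    job.status = "failed"
    job.errors.append(str(error))
    if root_investigation := job.get_root().investigation:
        root_investigation.set_correct_status(save=True)
    notify(job)


class FailJobBranchesTest(unittest.TestCase):
    def _make_job(self, investigation):
        # MagicMock stands in for a Job whose root may or may not have an investigation.
        job = MagicMock()
        job.errors = []
        job.get_root.return_value.investigation = investigation
        return job

    def test_both_investigation_branches(self):
        cases = {"no investigation": None, "with investigation": MagicMock()}
        for label, investigation in cases.items():
            with self.subTest(label):
                job = self._make_job(investigation)
                notify = MagicMock()
                fail_job(job, Exception("boom"), notify)
                self.assertEqual(job.status, "failed")
                self.assertEqual(job.errors, ["boom"])
                notify.assert_called_once_with(job)
                if investigation is not None:
                    investigation.set_correct_status.assert_called_once_with(save=True)
```

In the real suite the mocks would give way to a Job created under an Investigation in the test DB, with the assertion made on the investigation's stored status after job_pipeline runs.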
Closes #3653
Description
When job.execute() throws an exception inside job_pipeline, the except block only marks individual plugin reports as FAILED but never updates the Job object itself. Since execute() sets status = RUNNING before building the pipeline, any failure after that point leaves the job stuck in RUNNING forever.
This adds the missing cleanup to the except block: setting the job status to FAILED, recording the error, setting finished_analysis_time, updating the parent investigation, and sending a WebSocket notification. This mirrors the same pattern already used in run_plugin and set_final_status.
Type of change
Checklist
develop.
Ruff) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
tests folder). All the tests (new and old ones) gave 0 errors.