
fix: set job to FAILED on exception in job_pipeline #3661

Open

Abhishek9639 wants to merge 1 commit into intelowlproject:develop from Abhishek9639:stuck

Conversation

@Abhishek9639
Contributor

Closes #3653

Description

When job.execute() throws an exception inside job_pipeline, the except block only marks individual plugin reports as FAILED but never updates the Job object itself. Since execute() sets status = RUNNING before building the pipeline, any failure after that point leaves the job stuck in RUNNING forever.

This adds the missing cleanup to the except block: setting the job status to FAILED, recording the error, setting finished_analysis_time, updating the parent investigation, and sending a WebSocket notification. This mirrors the pattern already used in run_plugin and set_final_status.
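The cleanup pattern described above can be sketched with plain Python stubs. StubJob and its fields are simplified stand-ins for the real Django Job model, not the actual IntelOwl code; the point is only to show the state transition the fix guarantees when the pipeline raises.

```python
from datetime import datetime, timezone

class StubJob:
    """Simplified stand-in for the Django Job model (hypothetical)."""
    def __init__(self):
        self.status = "running"          # execute() sets RUNNING before the pipeline
        self.errors = []
        self.finished_analysis_time = None
        self.saved_fields = None

    def save(self, update_fields=None):
        # the real model would persist these fields to the database
        self.saved_fields = update_fields


def job_pipeline(job):
    try:
        # stands in for job.execute() raising partway through the pipeline
        raise RuntimeError("plugin pipeline failure")
    except Exception as e:
        # ...existing per-report cleanup runs here...
        # the fix: also move the Job itself out of RUNNING
        job.status = "failed"
        job.errors.append(str(e))
        job.finished_analysis_time = datetime.now(timezone.utc)
        job.save(update_fields=["status", "errors", "finished_analysis_time"])


job = StubJob()
job_pipeline(job)
print(job.status)  # → failed
```

Without the added except-block cleanup, the sketch above would end with `job.status` still `"running"` and `finished_analysis_time` still `None`, which is exactly the stuck state from #3653.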

Type of change

  • Bug fix (non-breaking change which fixes an issue).

Checklist

  • I have read and understood the rules about how to Contribute to this project
  • The pull request is for the branch develop.
  • Linters (Ruff) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
  • I have added tests for the feature/bug I solved (see tests folder). All the tests (new and old ones) gave 0 errors.

@Abhishek9639
Contributor Author

Hi @mlodic,

This PR fixes the stuck job issue reported in #3653. The except block in job_pipeline was only marking plugin reports as failed but never updating the job itself, so it stayed stuck in RUNNING with no WebSocket notification and a NULL finished_analysis_time.

I aligned the except block with the same cleanup pattern that run_plugin and set_final_status already follow. Added a regression test for it too.

Let me know if anything needs changing.
Thanks

Contributor

@gks281263 left a comment


Hey @Abhishek9639, nice work getting this fix in. The core logic is correct and addresses the real gap in job_pipeline. I did a close review and have a few hardening suggestions below to make it more production-ready. Overall the direction is exactly right and aligned with run_plugin / set_final_status.

Comment thread intel_owl/tasks.py
JobConsumer.serialize_and_send_job(job)


@shared_task(base=FailureLoggedTask, name="run_plugin", soft_time_limit=500)
Contributor


Suggestion: Wrap cleanup in its own try/except

If any of these cleanup lines raise (e.g., job.get_root() hits a treebeard MultipleObjectsReturned edge case, or the WebSocket layer has a config issue), the original exception e gets swallowed and replaced with the cleanup exception. The original error would be lost.

Consider:

except Exception as e:
    logger.exception(e)
    # ... existing report cleanup ...
    try:
        job.status = Job.STATUSES.FAILED.value
        job.errors.append(str(e))
        job.finished_analysis_time = now()
        job.save(update_fields=["status", "errors", "finished_analysis_time"])
        if root_investigation := job.get_root().investigation:
            root_investigation.set_correct_status(save=True)
        JobConsumer.serialize_and_send_job(job)
    except Exception as cleanup_err:
        logger.exception(
            f"Failed to clean up job {job_id} after pipeline exception: {cleanup_err}"
        )

This ensures the original error is always logged regardless of cleanup failures.

Comment thread tests/test_crons.py
"api_app.models.Job.objects.get",
return_value=_job,
),
):
Contributor


Unnecessary mock — this can be removed

The Job.objects.get mock isn't needed here since _job already exists in the test DB, so the real Job.objects.get(pk=_job.pk) will work fine. Removing it makes the test more realistic and would catch regressions where someone accidentally switches from .save() to .filter().update() in the fix.

Comment thread tests/test_crons.py
"execute",
side_effect=Exception(error_message),
),
patch("api_app.websocket.JobConsumer.serialize_and_send_job"),
Contributor


Missing assertion: WebSocket notification was actually sent

You're patching JobConsumer.serialize_and_send_job here, but never asserting it was called. Since the whole point of this fix (from the user's perspective) is that the frontend stops showing an infinite spinner, this is worth verifying:

with (
    patch.object(
        _job.__class__,
        "execute",
        side_effect=Exception(error_message),
    ),
    patch("api_app.websocket.JobConsumer.serialize_and_send_job") as mock_ws,
):
    job_pipeline(_job.pk)

mock_ws.assert_called_once()

Comment thread tests/test_crons.py
_job.delete()
an.delete()

def test_job_pipeline_exception_sets_job_to_failed(self):
Contributor


Suggestion: Add a second test case for the Investigation status update path

The root_investigation.set_correct_status(save=True) branch in the fix is currently untested because this job has no parent investigation. Consider adding a subTest (or separate test) that creates a job attached to an Investigation and asserts the investigation status transitions correctly after the pipeline failure. That way both branches of the if root_investigation conditional are covered.
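The suggested second test case could be structured roughly as below. This is a stub-based sketch: StubInvestigation and StubJob are hypothetical simplifications standing in for the real Django models, and `fail_job` condenses just the branch under discussion. A real test would create actual Job and Investigation instances in the test DB instead.

```python
import unittest

class StubInvestigation:
    """Hypothetical stand-in for the Investigation model."""
    def __init__(self):
        self.status = "running"

    def set_correct_status(self, save=True):
        # simplified: a failed job concludes the investigation
        self.status = "concluded"


class StubJob:
    """Hypothetical stand-in for the Job model."""
    def __init__(self, investigation=None):
        self.status = "running"
        self.investigation = investigation

    def get_root(self):
        return self  # a root job is its own tree root


def fail_job(job):
    # mirrors the conditional branch added in this PR
    job.status = "failed"
    if root_investigation := job.get_root().investigation:
        root_investigation.set_correct_status(save=True)


class TestPipelineFailureBranches(unittest.TestCase):
    def test_both_investigation_branches(self):
        for investigation in (None, StubInvestigation()):
            with self.subTest(has_investigation=investigation is not None):
                job = StubJob(investigation=investigation)
                fail_job(job)
                self.assertEqual(job.status, "failed")
                if investigation is not None:
                    self.assertEqual(investigation.status, "concluded")
```

The subTest loop exercises both the no-investigation path and the investigation-update path in a single test method, which matches the coverage goal described above.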
