[BUG]: Cleanup jobs silently skip roughly half of eligible documents due to OFFSET pagination over a self-mutating query #638

### Description of the Bug

File: backend/app/services/cleanup.py (`cleanup_stale_documents`, `cleanup_old_deleted_documents`, `cleanup_inactive_active_documents`)

All three scheduled cleanup functions paginate with the same pattern. Each loop iteration fetches `.filter(...).limit(_CLEANUP_BATCH_SIZE).offset(offset).all()`, mutates every row in the batch (either flips `status` to "failed" or calls `db.delete(doc)`), then advances with `offset += _CLEANUP_BATCH_SIZE`. Nothing is committed per batch: the surrounding `get_db_session()` context manager (see backend/app/database.py) commits only once, when the entire `with` block exits.

Because the SQLAlchemy session autoflushes by default, the pending UPDATE/DELETE statements from each batch are flushed to the database before the next query executes in the same transaction. Every row that was just mutated or deleted therefore drops out of the filter for the *next* iteration's query: a doc just marked `status = "failed"` no longer matches `Document.status == "processing"`, and a deleted row no longer matches at all. The matching result set shrinks by one full batch every iteration while `offset` still advances by a full batch, so each iteration after the first skips an entire batch's worth of still-eligible rows.

None of the three queries has an explicit `ORDER BY` either, so the result order is not guaranteed to be stable across calls within the mutating transaction.
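The skipping can be reproduced outside the app. The following is a minimal sketch using the standard-library `sqlite3` module rather than SQLAlchemy; the `documents` table, the `buggy_cleanup` and `make_db` helpers, and the batch size of 10 are assumptions made for illustration and are not taken from `cleanup.py`. An `ORDER BY id` is added so the demonstration is deterministic.

```python
import sqlite3

BATCH_SIZE = 10  # hypothetical stand-in for _CLEANUP_BATCH_SIZE


def make_db(n: int) -> sqlite3.Connection:
    """Create an in-memory table holding n rows with status 'processing'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO documents (status) VALUES (?)", [("processing",)] * n
    )
    return conn


def buggy_cleanup(conn: sqlite3.Connection) -> int:
    """Mirror the pattern described above: mutate a batch, then advance OFFSET."""
    processed = 0
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM documents WHERE status = 'processing' "
            "ORDER BY id LIMIT ? OFFSET ?",
            (BATCH_SIZE, offset),
        ).fetchall()
        if not rows:
            break
        for (doc_id,) in rows:
            # The mutated row stops matching the WHERE clause of the next SELECT.
            conn.execute(
                "UPDATE documents SET status = 'failed' WHERE id = ?", (doc_id,)
            )
        processed += len(rows)
        # The bug: the matching set already shrank by one batch, so this skips rows.
        offset += BATCH_SIZE
    return processed


conn = make_db(100)
processed = buggy_cleanup(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM documents WHERE status = 'processing'"
).fetchone()[0]
print(processed, remaining)  # prints "50 50"
```

With 100 eligible rows, the loop stops after five batches: by then 50 rows remain, and the sixth query asks for `OFFSET 50`, which returns nothing.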

### Steps to Reproduce

Proposed fix approach: do not advance `offset` across iterations at all. Since each batch's mutation removes those rows from the matching set, re-querying with `offset=0` each time converges correctly without tracking a manually advancing offset. Better still, order by primary key and commit per batch, so that the next `SELECT ... LIMIT` simply picks up the rows that remain.
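A minimal sketch of the proposed approach, again using the standard-library `sqlite3` module instead of the project's SQLAlchemy session. The table layout, the `cleanup_without_offset` name, and the batch size are hypothetical; only the shape of the loop (no offset, ordered by primary key, commit per batch) comes from the proposal above.

```python
import sqlite3

BATCH_SIZE = 10  # hypothetical stand-in for _CLEANUP_BATCH_SIZE


def cleanup_without_offset(conn: sqlite3.Connection) -> int:
    """Re-query from the start each time; mutated rows no longer match."""
    processed = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM documents WHERE status = 'processing' "
            "ORDER BY id LIMIT ?",
            (BATCH_SIZE,),
        ).fetchall()
        if not rows:
            break
        # rows is a list of 1-tuples, which executemany accepts as parameters.
        conn.executemany(
            "UPDATE documents SET status = 'failed' WHERE id = ?", rows
        )
        conn.commit()  # commit per batch so progress survives a later failure
        processed += len(rows)
    return processed


fixed_conn = sqlite3.connect(":memory:")
fixed_conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, status TEXT)")
fixed_conn.executemany(
    "INSERT INTO documents (status) VALUES (?)", [("processing",)] * 95
)
fixed_processed = cleanup_without_offset(fixed_conn)
fixed_remaining = fixed_conn.execute(
    "SELECT COUNT(*) FROM documents WHERE status = 'processing'"
).fetchone()[0]
print(fixed_processed, fixed_remaining)  # prints "95 0"
```

This loop terminates only if every row in a batch actually leaves the filter after it is mutated. If some rows could be left unchanged (for example, a per-row error that is caught and skipped), a keyset condition such as `id > last_seen_id` would be one way to guarantee forward progress.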


### Expected Behavior

Net effect: with N eligible rows where N > `_CLEANUP_BATCH_SIZE`, a single scheduled run processes roughly half of them and silently leaves the rest unprocessed. Stale "processing" documents that should be marked failed remain stuck, and old soft-deleted/inactive documents that should be purged (per `DOC_CLEANUP_MAX_AGE_DAYS` / `DOC_CLEANUP_INACTIVE_DAYS`) are retained beyond their configured retention window. This contradicts the docstring's stated retention guarantee and lets storage grow past what the retention settings allow.

### Screenshots / Logs

_No response_

### Environment

Windows

### GSSoC '26

- [x] Yes, I am participating in GirlScript Summer of Code and would like to fix this.
