Queue#upsertScheduler starts right away.... #75
-
Originally posted by @arthurvanl in #72: It's still not working at all:
@egeominotti this has been going on since the first time I started using this.
-
It also doesn't open the workers.
It also didn't add all workers:
My setup:
I'm starting to dislike the tool because of this. Please fix this.
-
This is an open source project. If you'd like to help with development, nothing should be taken for granted, since it's free; I hope you understand. You're asking for features that weren't in bunqueue, and it's a complex project. If something doesn't work and you don't like it, do as you wish. If you want to help, you have to paste the full code, not screenshots; from now on I will close every issue you open that only contains images.
-
I understand that. But it's pretty annoying to be told that something is fixed and then, after testing, find each time that nothing has changed. It's just a big frustration for me.
-
That seems fair. I understand.
-
If you want to help the project, please paste the actual code, because incomplete and partial examples lead me astray.
-
I thought the screenshots were enough proof, but next time I'll try to make a reproduction.
-
If you reproduce the problem for me with real code, we can solve it very easily; without that it is really hard to understand what is going on.
-
Okay, I'll do that in a sec!
-
@arthurvanl Thank you for the bugs you find, but you must help me reproduce them faithfully, otherwise I cannot fully understand every detail. This step is essential.
-
Fixed in the next release (cron scheduling part). The fix changes the default of one scheduler option. This also fixes the case where a job runs longer than the cron interval and the app restarts: the missed execution is skipped and the cron resumes on its normal schedule.

Regarding the workers closing immediately and not all of them appearing, a minimal reproduction script would be ideal. Thanks for your patience.
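For context, a minimal sketch of the scenario being described: an every-minute cron whose handler outlives the interval. It only uses the Queue/Worker calls that appear in the reproduction script later in this thread; the host, token, and durations below are placeholders, not values from the original report.

// Minimal sketch (placeholder connection details): an every-minute cron
// whose handler takes longer than a minute, mirroring the reproduction
// script further down in this thread.
import { sleep } from "bun";
import { Queue, Worker } from "bunqueue/client";

const opts = { connection: { host: "127.0.0.1", token: "example-token" }, embedded: false };

const worker = new Worker("testing", async (job) => {
  console.log("Job started at:", new Date().toISOString());
  await sleep(70_000); // outlives the one-minute cron interval
  await job.moveToCompleted("");
}, opts);
worker.run();

const queue = new Queue("testing", opts);
console.log("Before upsert:", new Date().toISOString());
// Other scheduler options omitted; only the pattern is needed for the sketch.
await queue.upsertJobScheduler("start-new-test-job-1", { pattern: "* * * * *" });
console.log("After upsert:", new Date().toISOString());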
-
Okay, I got a reproduction now @egeominotti.
-
Is this a good reproduction?
-
@arthurvanl Yes, thanks.
-
@arthurvanl Fixed in v2.6.98, published to npm.

Root cause: when a cron interval is shorter than the job processing time (for example, an every-minute pattern with a job that takes longer than a minute), the next tick fires while the previous job is still running.

Fix: added a preventOverlap option to the job scheduler, enabled by default, so overlapping executions are skipped.

No code changes needed on your side:

bun add bunqueue@2.6.98

If you ever need multiple cron jobs to run in parallel (intentional overlap), you can opt out with:

await queue.upsertJobScheduler('my-cron', {
  pattern: '* * * * *',
  preventOverlap: false,
});

@arthurvanl can you test with 2.6.98 and let us know if it fixes the issue? Thanks for reporting this.
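Conceptually, the overlap guard amounts to skipping a tick while the previous run of the same scheduler is still active. The sketch below is purely illustrative and is not bunqueue's internal code; the SchedulerState shape and shouldEnqueueTick helper are hypothetical.

// Illustrative only: a hypothetical overlap guard for a cron tick.
// `activeJobId` stands in for whatever bookkeeping tracks the previous run.
type SchedulerState = { activeJobId?: string };

function shouldEnqueueTick(state: SchedulerState, preventOverlap: boolean): boolean {
  // With preventOverlap (the default in v2.6.98+), skip the tick while the
  // previous job from this scheduler is still being processed.
  if (preventOverlap && state.activeJobId !== undefined) return false;
  return true;
}

// Example: the first tick enqueues, the next tick is skipped while the job is active.
console.log(shouldEnqueueTick({}, true));                       // true  -> enqueue
console.log(shouldEnqueueTick({ activeJobId: "job-1" }, true)); // false -> skip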
-
@egeominotti do you suggest I do my setup differently?
-
v2.6.106: Found the 6th (and final) code path

@arthurvanl Your timestamps confirmed the exact bug: "Job started at" comes BEFORE "After upsert". This means the job fires during the upsert call itself, not from disconnect or stall detection.

Root cause

Between Run 1's disconnect and Run 2's reconnect, a cron tick fires at the next minute boundary. The stale worker (still within the 30s heartbeat timeout) makes the skipIfNoWorker check pass, so the tick enqueues a job that sits in the waiting state until the reconnecting client's worker picks it up immediately.

Fix

On upsert, any stale waiting job left over from the same scheduler is now cleaned up before the new schedule is registered.

Summary of all 6 code paths fixed (v2.6.99–v2.6.106): stall detection, client tracking, background recovery, TCP disconnect, heartbeat filtering, and upsert cleanup.

Please update to 2.6.106 and test again with your reproduction script. If it still doesn't work, please provide the debug output (run with the DEBUG environment variable set) and the exact timestamps from both runs.
This will help us pinpoint whether there's a 7th code path we haven't found yet.
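To make the heartbeat interaction concrete, here is an illustrative liveness check; it is not bunqueue's actual implementation, and the WorkerRecord shape and hasLiveWorker helper are hypothetical. Only the 30-second timeout comes from the description above.

// Sketch of a heartbeat-based "is any worker alive?" check.
// A worker that disconnected 20 seconds ago still passes a 30-second timeout,
// which is how a skipIfNoWorker tick could be enqueued between Run 1 and Run 2.
const HEARTBEAT_TIMEOUT_MS = 30_000;

interface WorkerRecord {
  id: string;
  lastHeartbeatAt: number; // epoch milliseconds
}

function hasLiveWorker(workers: WorkerRecord[], now = Date.now()): boolean {
  return workers.some((w) => now - w.lastHeartbeatAt < HEARTBEAT_TIMEOUT_MS);
}

// A client killed 20 seconds ago still looks "alive" under this check:
console.log(hasLiveWorker([{ id: "w1", lastHeartbeatAt: Date.now() - 20_000 }])); // true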
-
@egeominotti I can test for another hour and 30 minutes before I need to leave.
-
I didn't see your comment about DEBUG=.
-
I added DEBUG= but it didn't show any console messages from TCP.

This is the server OS & name:

The server was never restarted while testing; only the client is restarted, allowing me to do a second run. I could not give you the output you asked for.
-
@arthurvanl This is the key issue: the server was never restarted while testing.

All the fixes (v2.6.99–v2.6.106) are server-side code. Updating the npm package doesn't update the running process; you need to restart the server for the fixes to take effect. Please:

1. Stop the running bunqueue server process.
2. Update the package on the server.
3. Start the server again.
4. Re-run the reproduction.

The fix won't work until the server process is restarted with the new code.
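As a quick sanity check, the snippet below prints the bunqueue version that is installed on disk where the server runs. This is a hypothetical helper, not a bunqueue API; it assumes it is executed from the project directory that contains node_modules, and a server process started before the update will still be running the old code regardless of what is on disk.

// Hypothetical check: print the bunqueue version installed on disk.
import { readFileSync } from "node:fs";

const pkg = JSON.parse(readFileSync("node_modules/bunqueue/package.json", "utf8"));
console.log("bunqueue version on disk:", pkg.version);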
-
@arthurvanl Important: you MUST restart the server process on your VPS. Simply updating the package does not replace the code the running process already has in memory.

Steps: stop the bunqueue server, update the package, start the server again, then re-run your reproduction.

I've also tested this on a remote Linux server myself and it works correctly: the cron job waits for the next scheduled minute after reconnect, with no immediate fire. Let me know once you've restarted the server and tested again.
-
@arthurvanl I just tested this on a remote Linux server (Debian, same as yours) with v2.6.106, server and client on the same machine.

Run 1 (clean start): the job waited for the next minute boundary.

Run 2 (after disconnect, server kept running): the job again waited for the next minute boundary; no immediate fire.

The fix works. The key is that your server process must be restarted after updating the package. Please restart the bunqueue server on your VPS and test again.
-
Test environment details: remote Linux server running Debian, bunqueue v2.6.106, server and client on the same machine.

Both runs correctly waited for the next scheduled minute boundary. No immediate fire on reconnect.
-
@egeominotti
-
Maybe it's because of how the server is set up. I updated the package on the server with the following command:

Server bq.service file:

Server logs from testing:

Waiting: 0

My client logs (the client now lives on the VPS as well, because that's how you tested):

My client test file:

import { sleep } from "bun";
import { closeSharedTcpClient, Queue, shutdownManager, Worker } from "bunqueue/client";

const BQ_TOKEN = '3dbd158a-4f37-4f00-be7a-3ec2de318ebb',
  BQ_HOST = '148.251.119.252'

//* ONLY USING SERVER MODE (AKA TCP MODE)
const EMBEDDED_MODE = false as boolean;

/**
 * Connection options for bunqueue (only used in production)
 */
const BQ_CONNECTION_CONFIG = !EMBEDDED_MODE ?
  {
    host: BQ_HOST,
    token: BQ_TOKEN
  } : undefined

const DEFAULT_JOB_OPTS = {
  removeOnComplete: true,
  removeOnFail: true
}

export const DEFAULT_QUEUE_OPTS = {
  connection: BQ_CONNECTION_CONFIG,
  embedded: EMBEDDED_MODE,
  defaultJobOptions: DEFAULT_JOB_OPTS
}

const disableQueueFeatures = async (...queues: Queue[]) => {
  for (let i = 0; i < queues.length; i++) {
    queues[i].setStallConfig({ enabled: false });
    queues[i].setDlqConfig({ autoRetry: false, maxAutoRetries: 0 });
  }
}

const worker = new Worker('testing', async (job) => {
  console.log('Job started at:', new Date().toISOString());
  await sleep((1000 * 60) + 10_000);
  console.log('done with job')
  await job.moveToCompleted('')
}, DEFAULT_QUEUE_OPTS);
worker.run();

const queue = new Queue('testing', DEFAULT_QUEUE_OPTS)
disableQueueFeatures(queue);

console.log('Before upsert:', new Date().toISOString());
await queue.upsertJobScheduler('start-new-test-job-1', {
  pattern: '* * * * *',
  timezone: 'Europe/Amsterdam',
  preventOverlap: true,
  skipIfNoWorker: true
});
console.log('After upsert:', new Date().toISOString());

const gracefulShutdown = async (signal: string) => {
  if (process.env.SHUTTING_DOWN === '1') { console.log('Triggered twice somehow'); return }
  process.env.SHUTTING_DOWN = '1'
  console.log(`Received ${signal}. Starting graceful shutdown...`);

  //todo: remove comments and replace it with actual queues
  //* stop accepting new jobs
  console.log('Pausing all workers...');
  // orderWorker.pause();
  worker.pause();

  //* sandboxed workers will only stop if active jobs are done
  // no sandboxed workers yet

  console.log('Closing queue connections...');
  queue.close();
  await worker.close();

  //* shutdown the embedded manager (flushes SQLite)
  shutdownManager();

  // TCP mode: close the shared TCP connection pool
  if (!EMBEDDED_MODE) {
    closeSharedTcpClient()
  }
}

process.on('SIGINT', () => gracefulShutdown('SIGINT'));
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'))
process.on('unhandledRejection', (err) => {
  console.log(err)
  gracefulShutdown('unhandledRejection')
})
process.on('uncaughtException', (err) => {
  console.log(err)
  gracefulShutdown('uncaughtException')
})

Procedure: Run 1 → timeout/kill client (simulating SIGINT) → Run 2 immediately after. The server was restarted before testing.
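As a side note, a tiny self-contained check (a hypothetical helper, not part of bunqueue or the script above) can make the expected log ordering explicit: the job should not start before the minute boundary that follows the "After upsert" timestamp.

// Hypothetical diagnostic: given the "After upsert" and "Job started at"
// timestamps from logs like the ones above, verify the job waited for the
// next minute boundary instead of firing immediately on (re)connect.
function nextMinuteBoundary(date: Date): Date {
  const next = new Date(date);
  next.setSeconds(0, 0);
  next.setMinutes(next.getMinutes() + 1);
  return next;
}

function startedTooEarly(afterUpsertIso: string, jobStartedIso: string): boolean {
  const boundary = nextMinuteBoundary(new Date(afterUpsertIso));
  return new Date(jobStartedIso).getTime() < boundary.getTime();
}

// Example with made-up timestamps: a job starting 2 seconds after the upsert
// (before the minute boundary) indicates the immediate-fire bug.
console.log(startedTooEarly("2024-01-01T12:00:30.000Z", "2024-01-01T12:00:32.000Z")); // true
console.log(startedTooEarly("2024-01-01T12:00:30.000Z", "2024-01-01T12:01:00.100Z")); // false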
-
@egeominotti I have included my credentials so you can test against my bunqueue TCP server.
-
@arthurvanl I tested your exact scenario on v2.6.111 with a server restart, a long-running job (70s), a kill during execution, and a reconnect: the cron correctly waits for the next minute boundary every time. I cannot reproduce the bug.

I suspect the issue is that your server is not actually running the latest version. Here's why: the problem with the way the package was updated on the server is that the running service may still be loading the old code.
-
Fix released in v2.6.113

After a deep investigation, we identified the root cause: two interacting bugs in code paths that previous fixes had not covered.

Bug 1: Lock expiration re-queued cron jobs

processExpiredLockInner in lockManager.ts treated cron jobs with preventOverlap the same as regular jobs. During graceful shutdown, _doClose() stops heartbeats immediately, so if the job takes longer than the lock TTL (30s default), the lock expires and the cron job gets re-queued, sitting in the waiting state for the next worker to pull.

Fix: Cron jobs (identified by uniqueKey.startsWith("cron:")) are now discarded on lock expiration instead of re-queued; the cron scheduler re-creates the job at its next scheduled time.

Bug 2: Batch ACK skipped stall recovery

Fix: Both batch ACK methods now run the stall-recovery cleanup as well.

Why previous fixes didn't resolve it

Versions v2.6.99–v2.6.112 correctly fixed 6 other code paths (stall detection, client tracking, background recovery, TCP disconnect, heartbeat filtering, upsert cleanup), but lock expiration was the 7th path that could re-queue a cron job, and the batch ACK gap meant even stall-retried jobs weren't cleaned up.

@arthurvanl Could you please update to v2.6.113 and test? This should resolve the immediate-fire-on-reconnect behavior you've been experiencing.

bun add bunqueue@2.6.113

Let us know if the issue persists; happy to investigate further if needed.
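A conceptual sketch of the Bug 1 behavior change follows. It is illustrative only, not the actual lockManager.ts code; the ExpiredJob shape and the requeue/discard callbacks are hypothetical, while the "cron:" uniqueKey prefix check comes from the description above.

// Sketch of the lock-expiration change: regular jobs are still re-queued
// when their lock expires, but cron jobs (recognizable by a "cron:" uniqueKey
// prefix) are discarded and re-created by the scheduler at the next tick.
interface ExpiredJob {
  id: string;
  uniqueKey?: string;
}

function onLockExpired(
  job: ExpiredJob,
  requeue: (job: ExpiredJob) => void,
  discard: (job: ExpiredJob) => void,
): void {
  if (job.uniqueKey?.startsWith("cron:")) {
    // Re-queueing here is what produced the "fires immediately on reconnect"
    // behavior: the stale cron job sat in waiting until a worker reconnected.
    discard(job);
    return;
  }
  requeue(job);
}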
-
This was a really complex bug to track down: 7 different code paths that could re-queue a cron job, plus the batch ACK gap that kept it alive. But it's finally working! 🎉 Thanks for your patience and all the detailed logs throughout the investigation @arthurvanl; they were essential to narrowing this down. Closing this as resolved.