Queue#upsertScheduler starts right away.... #75
-
Originally posted by @arthurvanl in #72: It's still not working at all:
@egeominotti this has been going on since the first time I started using this.
-
It also doesn't open the workers.
It also didn't add all workers:
My setup:
I'm starting to dislike the tool because of this. Please fix this.
-
This is an open source project. If you'd like to help with development, nothing should be taken for granted, since it's free; I hope you understand. You're asking for features that weren't in bunqueue, and it's a complex project. If something doesn't work and you don't like it, do as you wish. If you want to help, you have to paste the full code, not screenshots; from now on I will close every issue you open that only contains images.
-
I understand that. But it's pretty annoying to be told that something is fixed and then, after testing, find each time that nothing has changed. It's just a big frustration for me.
-
That seems fair. I understand.
-
If you want to help the project, please paste the actual code, because incomplete and partial examples lead me astray.
-
I thought the screenshots were enough proof, but next time I'll try to make a reproduction.
-
If you reproduce the problem for me with real code, we can solve it very easily; without that it is really hard to understand what is going on.
-
Okay, I'll do that in a sec!
-
@arthurvanl Thank you for the bugs you find, but you must help me reproduce them faithfully, otherwise I cannot fully understand every detail. This step is essential.
-
Fixed in the next release (cron scheduling part). The fix changes the default of one scheduler option. This also fixes the case where a job runs longer than the cron interval and the app restarts: the missed execution is skipped and the cron resumes on its normal schedule.

Regarding the workers closing immediately and not all of them appearing, a minimal reproduction script would be ideal. Thanks for your patience.
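For context, a minimal sketch of the scenario being described: an every-minute cron whose handler outlives the interval. It only uses the Queue/Worker calls that appear in the reproduction script later in this thread; the host, token, and durations below are placeholders, not values from the original report.

// Minimal sketch (placeholder connection details): an every-minute cron
// whose handler takes longer than a minute, mirroring the reproduction
// script further down in this thread.
import { sleep } from "bun";
import { Queue, Worker } from "bunqueue/client";

const opts = { connection: { host: "127.0.0.1", token: "example-token" }, embedded: false };

const worker = new Worker("testing", async (job) => {
  console.log("Job started at:", new Date().toISOString());
  await sleep(70_000); // outlives the one-minute cron interval
  await job.moveToCompleted("");
}, opts);
worker.run();

const queue = new Queue("testing", opts);
console.log("Before upsert:", new Date().toISOString());
// Other scheduler options omitted; only the pattern is needed for the sketch.
await queue.upsertJobScheduler("start-new-test-job-1", { pattern: "* * * * *" });
console.log("After upsert:", new Date().toISOString());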
-
Okay, I got a reproduction now @egeominotti.
-
Is this a good reproduction?
-
@arthurvanl Yes, thanks.
-
@arthurvanl Fixed in v2.6.98, published to npm.

Root cause: when a cron interval is shorter than the job processing time (for example, an every-minute pattern with a job that takes longer than a minute), the next tick fires while the previous job is still running.

Fix: added a preventOverlap option to the job scheduler, enabled by default, so overlapping executions are skipped.

No code changes needed on your side:

bun add bunqueue@2.6.98

If you ever need multiple cron jobs to run in parallel (intentional overlap), you can opt out with:

await queue.upsertJobScheduler('my-cron', {
  pattern: '* * * * *',
  preventOverlap: false,
});

@arthurvanl can you test with 2.6.98 and let us know if it fixes the issue? Thanks for reporting this.
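Conceptually, the overlap guard amounts to skipping a tick while the previous run of the same scheduler is still active. The sketch below is purely illustrative and is not bunqueue's internal code; the SchedulerState shape and shouldEnqueueTick helper are hypothetical.

// Illustrative only: a hypothetical overlap guard for a cron tick.
// `activeJobId` stands in for whatever bookkeeping tracks the previous run.
type SchedulerState = { activeJobId?: string };

function shouldEnqueueTick(state: SchedulerState, preventOverlap: boolean): boolean {
  // With preventOverlap (the default in v2.6.98+), skip the tick while the
  // previous job from this scheduler is still being processed.
  if (preventOverlap && state.activeJobId !== undefined) return false;
  return true;
}

// Example: the first tick enqueues, the next tick is skipped while the job is active.
console.log(shouldEnqueueTick({}, true));                       // true  -> enqueue
console.log(shouldEnqueueTick({ activeJobId: "job-1" }, true)); // false -> skip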
-
@egeominotti do you suggest I do my setup differently?
-
v2.6.106: Found the 6th (and final) code path

@arthurvanl Your timestamps confirmed the exact bug: "Job started at" comes BEFORE "After upsert". This means the job fires during the upsert call itself, not from disconnect or stall detection.

Root cause

Between Run 1's disconnect and Run 2's reconnect, a cron tick fires at the next minute boundary. The stale worker (still within the 30s heartbeat timeout) makes the skipIfNoWorker check pass, so the tick enqueues a job that sits in the waiting state until the reconnecting client's worker picks it up immediately.

Fix

On upsert, any stale waiting job left over from the same scheduler is now cleaned up before the new schedule is registered.

Summary of all 6 code paths fixed (v2.6.99–v2.6.106): stall detection, client tracking, background recovery, TCP disconnect, heartbeat filtering, and upsert cleanup.

Please update to 2.6.106 and test again with your reproduction script. If it still doesn't work, please provide the debug output (run with the DEBUG environment variable set) and the exact timestamps from both runs.
This will help us pinpoint whether there's a 7th code path we haven't found yet.
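To make the heartbeat interaction concrete, here is an illustrative liveness check; it is not bunqueue's actual implementation, and the WorkerRecord shape and hasLiveWorker helper are hypothetical. Only the 30-second timeout comes from the description above.

// Sketch of a heartbeat-based "is any worker alive?" check.
// A worker that disconnected 20 seconds ago still passes a 30-second timeout,
// which is how a skipIfNoWorker tick could be enqueued between Run 1 and Run 2.
const HEARTBEAT_TIMEOUT_MS = 30_000;

interface WorkerRecord {
  id: string;
  lastHeartbeatAt: number; // epoch milliseconds
}

function hasLiveWorker(workers: WorkerRecord[], now = Date.now()): boolean {
  return workers.some((w) => now - w.lastHeartbeatAt < HEARTBEAT_TIMEOUT_MS);
}

// A client killed 20 seconds ago still looks "alive" under this check:
console.log(hasLiveWorker([{ id: "w1", lastHeartbeatAt: Date.now() - 20_000 }])); // true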
-
@egeominotti I can test for another hour and 30 minutes before I need to leave.
-
I didn't see your comment about DEBUG=.
-
I added DEBUG= but it didn't show any console messages from TCP.

This is the server OS & name:

The server was never restarted while testing; only the client is restarted, allowing me to do a second run. I could not give you the output you asked for.
-
@arthurvanl This is the key issue: the server was never restarted while testing.

All the fixes (v2.6.99–v2.6.106) are server-side code. Updating the npm package doesn't update the running process; you need to restart the server for the fixes to take effect. Please:

1. Stop the running bunqueue server process.
2. Update the package on the server.
3. Start the server again.
4. Re-run the reproduction.

The fix won't work until the server process is restarted with the new code.
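As a quick sanity check, the snippet below prints the bunqueue version that is installed on disk where the server runs. This is a hypothetical helper, not a bunqueue API; it assumes it is executed from the project directory that contains node_modules, and a server process started before the update will still be running the old code regardless of what is on disk.

// Hypothetical check: print the bunqueue version installed on disk.
import { readFileSync } from "node:fs";

const pkg = JSON.parse(readFileSync("node_modules/bunqueue/package.json", "utf8"));
console.log("bunqueue version on disk:", pkg.version);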
-
@arthurvanl Important: you MUST restart the server process on your VPS. Simply updating the package does not replace the code the running process already has in memory.

Steps: stop the bunqueue server, update the package, start the server again, then re-run your reproduction.

I've also tested this on a remote Linux server myself and it works correctly: the cron job waits for the next scheduled minute after reconnect, with no immediate fire. Let me know once you've restarted the server and tested again.
-
@arthurvanl I just tested this on a remote Linux server (Debian, same as yours) with v2.6.106, server and client on the same machine.

Run 1 (clean start): the job waited for the next minute boundary.

Run 2 (after disconnect, server kept running): the job again waited for the next minute boundary; no immediate fire.

The fix works. The key is that your server process must be restarted after updating the package. Please restart the bunqueue server on your VPS and test again.
-
Test environment details: remote Linux server running Debian, bunqueue v2.6.106, server and client on the same machine.

Both runs correctly waited for the next scheduled minute boundary. No immediate fire on reconnect.
-
@egeominotti
-
Maybe it's because of how the server is set up. I updated the package on the server with the following command:

Server bq.service file:

Server logs from testing:

Waiting: 0

My client logs (the client now lives on the VPS as well, because that's how you tested):

My client test file:

import { sleep } from "bun";
import { closeSharedTcpClient, Queue, shutdownManager, Worker } from "bunqueue/client";

const BQ_TOKEN = '3dbd158a-4f37-4f00-be7a-3ec2de318ebb',
  BQ_HOST = '148.251.119.252'

//* ONLY USING SERVER MODE (AKA TCP MODE)
const EMBEDDED_MODE = false as boolean;

/**
 * Connection options for bunqueue (only used in production)
 */
const BQ_CONNECTION_CONFIG = !EMBEDDED_MODE ?
  {
    host: BQ_HOST,
    token: BQ_TOKEN
  } : undefined

const DEFAULT_JOB_OPTS = {
  removeOnComplete: true,
  removeOnFail: true
}

export const DEFAULT_QUEUE_OPTS = {
  connection: BQ_CONNECTION_CONFIG,
  embedded: EMBEDDED_MODE,
  defaultJobOptions: DEFAULT_JOB_OPTS
}

const disableQueueFeatures = async (...queues: Queue[]) => {
  for (let i = 0; i < queues.length; i++) {
    queues[i].setStallConfig({ enabled: false });
    queues[i].setDlqConfig({ autoRetry: false, maxAutoRetries: 0 });
  }
}

const worker = new Worker('testing', async (job) => {
  console.log('Job started at:', new Date().toISOString());
  await sleep((1000 * 60) + 10_000);
  console.log('done with job')
  await job.moveToCompleted('')
}, DEFAULT_QUEUE_OPTS);
worker.run();

const queue = new Queue('testing', DEFAULT_QUEUE_OPTS)
disableQueueFeatures(queue);

console.log('Before upsert:', new Date().toISOString());
await queue.upsertJobScheduler('start-new-test-job-1', {
  pattern: '* * * * *',
  timezone: 'Europe/Amsterdam',
  preventOverlap: true,
  skipIfNoWorker: true
});
console.log('After upsert:', new Date().toISOString());

const gracefulShutdown = async (signal: string) => {
  if (process.env.SHUTTING_DOWN === '1') { console.log('Triggered twice somehow'); return }
  process.env.SHUTTING_DOWN = '1'
  console.log(`Received ${signal}. Starting graceful shutdown...`);

  //todo: remove comments and replace it with actual queues
  //* stop accepting new jobs
  console.log('Pausing all workers...');
  // orderWorker.pause();
  worker.pause();

  //* sandboxed workers will only stop if active jobs are done
  // no sandboxed workers yet

  console.log('Closing queue connections...');
  queue.close();
  await worker.close();

  //* shutdown the embedded manager (flushes SQLite)
  shutdownManager();

  // TCP mode: close the shared TCP connection pool
  if (!EMBEDDED_MODE) {
    closeSharedTcpClient()
  }
}

process.on('SIGINT', () => gracefulShutdown('SIGINT'));
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'))
process.on('unhandledRejection', (err) => {
  console.log(err)
  gracefulShutdown('unhandledRejection')
})
process.on('uncaughtException', (err) => {
  console.log(err)
  gracefulShutdown('uncaughtException')
})

Procedure: Run 1 → timeout/kill client (simulating SIGINT) → Run 2 immediately after. The server was restarted before testing.
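As a side note, a tiny self-contained check (a hypothetical helper, not part of bunqueue or the script above) can make the expected log ordering explicit: the job should not start before the minute boundary that follows the "After upsert" timestamp.

// Hypothetical diagnostic: given the "After upsert" and "Job started at"
// timestamps from logs like the ones above, verify the job waited for the
// next minute boundary instead of firing immediately on (re)connect.
function nextMinuteBoundary(date: Date): Date {
  const next = new Date(date);
  next.setSeconds(0, 0);
  next.setMinutes(next.getMinutes() + 1);
  return next;
}

function startedTooEarly(afterUpsertIso: string, jobStartedIso: string): boolean {
  const boundary = nextMinuteBoundary(new Date(afterUpsertIso));
  return new Date(jobStartedIso).getTime() < boundary.getTime();
}

// Example with made-up timestamps: a job starting 2 seconds after the upsert
// (before the minute boundary) indicates the immediate-fire bug.
console.log(startedTooEarly("2024-01-01T12:00:30.000Z", "2024-01-01T12:00:32.000Z")); // true
console.log(startedTooEarly("2024-01-01T12:00:30.000Z", "2024-01-01T12:01:00.100Z")); // false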
-
@egeominotti I have included my credentials so you can test against my bunqueue TCP server.
-
@arthurvanl I tested your exact scenario on v2.6.111 with a server restart, a long-running job (70s), a kill during execution, and a reconnect: the cron correctly waits for the next minute boundary every time. I cannot reproduce the bug.

I suspect the issue is that your server is not actually running the latest version. Here's why: the problem with the way the package was updated on the server is that the running service may still be loading the old code.
-
Fix released in v2.6.113

After a deep investigation, we identified the root cause: two interacting bugs in code paths that previous fixes had not covered.

Bug 1: Lock expiration re-queued cron jobs

processExpiredLockInner in lockManager.ts treated cron jobs with preventOverlap the same as regular jobs. During graceful shutdown, _doClose() stops heartbeats immediately, so if the job takes longer than the lock TTL (30s default), the lock expires and the cron job gets re-queued, sitting in the waiting state for the next worker to pull.

Fix: Cron jobs (identified by uniqueKey.startsWith("cron:")) are now discarded on lock expiration instead of re-queued; the cron scheduler re-creates the job at its next scheduled time.

Bug 2: Batch ACK skipped stall recovery

Fix: Both batch ACK methods now run the stall-recovery cleanup as well.

Why previous fixes didn't resolve it

Versions v2.6.99–v2.6.112 correctly fixed 6 other code paths (stall detection, client tracking, background recovery, TCP disconnect, heartbeat filtering, upsert cleanup), but lock expiration was the 7th path that could re-queue a cron job, and the batch ACK gap meant even stall-retried jobs weren't cleaned up.

@arthurvanl Could you please update to v2.6.113 and test? This should resolve the immediate-fire-on-reconnect behavior you've been experiencing.

bun add bunqueue@2.6.113

Let us know if the issue persists; happy to investigate further if needed.
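A conceptual sketch of the Bug 1 behavior change follows. It is illustrative only, not the actual lockManager.ts code; the ExpiredJob shape and the requeue/discard callbacks are hypothetical, while the "cron:" uniqueKey prefix check comes from the description above.

// Sketch of the lock-expiration change: regular jobs are still re-queued
// when their lock expires, but cron jobs (recognizable by a "cron:" uniqueKey
// prefix) are discarded and re-created by the scheduler at the next tick.
interface ExpiredJob {
  id: string;
  uniqueKey?: string;
}

function onLockExpired(
  job: ExpiredJob,
  requeue: (job: ExpiredJob) => void,
  discard: (job: ExpiredJob) => void,
): void {
  if (job.uniqueKey?.startsWith("cron:")) {
    // Re-queueing here is what produced the "fires immediately on reconnect"
    // behavior: the stale cron job sat in waiting until a worker reconnected.
    discard(job);
    return;
  }
  requeue(job);
}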
-
This was a really complex bug to track down: 7 different code paths that could re-queue a cron job, plus the batch ACK gap that kept it alive. But it's finally working! 🎉 Thanks for your patience and all the detailed logs throughout the investigation @arthurvanl; they were essential to narrowing this down. Closing this as resolved.