feat: add cron workflow scheduling with overlap protection and schedule context #7

Open
marcelom97 wants to merge 3 commits into SokratisVidros:main from marcelom97:feat/cron-workflows

Conversation

@marcelom97
Contributor

Summary

  • Cron-based workflow scheduling via cron option (expression + optional timezone)
  • schedule context on cron runs with timestamp, lastTimestamp, timezone — enables incremental sync without manual state tracking
  • getRuns() supports triggerSource filter ('api' | 'cron')
  • Overlap protection via pg-boss schedule()

Usage

const sync = workflow('sync-data', async ({ step, schedule }) => {
  const since = schedule?.lastTimestamp ?? new Date(0);
  const data = await step.run('fetch', async () => fetchSince(since));
  await step.run('write', async () => writeToDB(data));
}, {
  cron: { expression: '*/15 * * * *', timezone: 'UTC' },
});

Test plan

  • 14 new tests covering cron scheduling, schedule context, timezone defaults, trigger source filtering, and post-start registration

@SokratisVidros
Owner

@marcelom97 thanks for submitting this. Can you please elaborate on the incremental sync?

Other than that, I need to think about the DX a bit more. I will start by adding inline comments.

Comment thread src/db/queries.ts Outdated
max_retries: number;
job_id: string | null;
trigger_source: 'api' | 'cron';
schedule_context: string | { timestamp: string; lastTimestamp?: string; timezone: string } | null;
Owner

How about using a cron: string column and a timezone column, both optional, to avoid a nested object?
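
A minimal sketch of the flat modeling suggested here. The row shape, the column names cron and timezone, the helper cronInfoFromRow, and the UTC fallback are all assumptions for illustration, not the PR's actual schema:

```typescript
// Hypothetical row shape: two optional flat columns instead of a nested
// schedule_context object. Column names are assumed for illustration.
interface WorkflowRunRow {
  id: string;
  cron: string | null;     // cron expression; null for runs not started by cron
  timezone: string | null; // IANA timezone name; null when unspecified
}

interface CronInfo {
  expression: string;
  timezone: string;
}

// Map the flat columns to an in-memory value.
// Rows without a cron expression yield undefined.
function cronInfoFromRow(row: WorkflowRunRow): CronInfo | undefined {
  if (row.cron === null) return undefined;
  // Assumed default: fall back to UTC when no timezone was stored.
  return { expression: row.cron, timezone: row.timezone ?? 'UTC' };
}
```

With flat columns, a run that was not triggered by cron is simply a row whose cron column is null, so no JSON parsing is needed when reading it back.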

Comment thread src/db/queries.ts Outdated
return mapRowToWorkflowRun(run);
}

export async function getLastCronCompletedAt(
Owner

Instead of a specific getLastCronCompletedAt method, I'd suggest introducing a getWorkflowLastRun that works for all workflows. This feels more generic and aligns with the naming of the rest of the methods.
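
One way to read this suggestion, sketched against an in-memory array rather than the real database layer. The WorkflowRun shape, the status values, and the choice to consider only completed runs are assumptions; only the name getWorkflowLastRun comes from the comment above:

```typescript
// Hypothetical run shape; the status values are assumed for illustration.
interface WorkflowRun {
  id: string;
  workflowId: string;
  status: 'pending' | 'running' | 'completed' | 'failed';
  completedAt: Date | null;
}

// In-memory stand-in for a database query: returns the most recently
// completed run of a workflow, regardless of how that run was triggered.
function getWorkflowLastRun(runs: WorkflowRun[], workflowId: string): WorkflowRun | undefined {
  let last: WorkflowRun | undefined;
  let lastTime = -Infinity;
  for (const run of runs) {
    if (run.workflowId !== workflowId) continue;
    if (run.status !== 'completed' || run.completedAt === null) continue;
    const time = run.completedAt.getTime();
    if (time > lastTime) {
      last = run;
      lastTime = time;
    }
  }
  return last;
}
```

Because the lookup is keyed only on the workflow ID, the same helper would serve cron-triggered and API-triggered workflows alike.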

Comment thread src/engine.ts Outdated
};
}

private async setupCronSchedule(wf: InternalWorkflowDefinition): Promise<void> {
Owner

Suggested change
- private async setupCronSchedule(wf: InternalWorkflowDefinition): Promise<void> {
+ private async scheduleCronWorkflow(wf: InternalWorkflowDefinition): Promise<void> {

Comment thread src/engine.ts Outdated

let run = await this.getRun({ runId, resourceId });

const schedule: ScheduleContext | undefined = run.scheduleContext
Owner

Please refer to my previous comment about modeling scheduleContext.

Comment thread src/engine.ts Outdated
const schedule: ScheduleContext | undefined = run.scheduleContext
? {
timestamp: run.scheduleContext.timestamp,
lastTimestamp: run.scheduleContext.lastTimestamp ?? undefined,
Owner

Following on the modeling, lastTimestamp comes from the workflow_runs table. We can replace it with the timestamp of the latest record.

Comment thread src/engine.ts Outdated
limit?: number;
statuses?: WorkflowStatus[];
workflowId?: string;
triggerSource?: 'api' | 'cron';
Owner

Do we need triggerSource? If not, let's remove it.

Comment thread AGENTS.md
| `WORKFLOW_RUN_WORKERS` | Number of worker processes | `3` |
| `WORKFLOW_RUN_EXPIRE_IN_SECONDS` | Job expiration time in seconds | `300` |

## Cron Workflows
Owner

Let's create the following DX:

  1. The cron expression can be a valid cron string or a human-friendly string, parsed by a library such as https://github.com/rainder/human-to-cron. Note that this library is quite old. Let's see if there is a modern one.

  2. Cron can be either a string or an object containing the expression and the timezone. If the timezone is not specified, we assume UTC or the current timezone of the running container.
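
The second point could be sketched roughly as follows. The names CronOption, NormalizedCron, and normalizeCron are hypothetical, and the sketch picks UTC as the default; the container's local timezone is the other option mentioned above:

```typescript
// Hypothetical option shape: either a bare expression or an object
// carrying the expression plus an optional timezone.
type CronOption = string | { expression: string; timezone?: string };

interface NormalizedCron {
  expression: string;
  timezone: string;
}

// Collapse both accepted forms into one internal representation.
function normalizeCron(cron: CronOption, defaultTimezone: string = 'UTC'): NormalizedCron {
  const expression = (typeof cron === 'string' ? cron : cron.expression).trim();
  if (expression.length === 0) {
    throw new Error('cron expression must not be empty');
  }
  const timezone =
    typeof cron === 'string' ? defaultTimezone : cron.timezone ?? defaultTimezone;
  return { expression, timezone };
}
```

To default to the container's timezone instead, a caller could pass Intl.DateTimeFormat().resolvedOptions().timeZone as defaultTimezone.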

Contributor Author

I looked into this: human-to-cron has critical bugs (e.g. "every hour" produces * */1 * * *, which fires every minute, and unrecognized input silently becomes * * * * *). I couldn't find a modern, well-maintained alternative either.

I'd suggest we defer this and stick with standard cron expressions for now. We can revisit if a reliable library comes along, or build a small parser ourselves for a limited set of human-friendly strings (e.g. "every 5 minutes", "daily at 9am").

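To illustrate what such a small parser might look like, here is a sketch that handles only three phrasings and rejects everything else rather than silently falling back to * * * * *. The function name humanToCron and the exact set of accepted phrases are assumptions, not part of the PR:

```typescript
// Translate a very limited set of human-friendly phrases into standard
// five-field cron expressions. Unrecognized input throws.
function humanToCron(input: string): string {
  const text = input.trim().toLowerCase();

  // "every N minutes" -> "*/N * * * *"
  const everyMinutes = /^every (\d+) minutes?$/.exec(text);
  if (everyMinutes) {
    const n = Number(everyMinutes[1]);
    if (n < 1 || n > 59) throw new Error(`Minute interval out of range: "${input}"`);
    return `*/${n} * * * *`;
  }

  // "every hour" -> minute 0 of every hour
  if (text === 'every hour') return '0 * * * *';

  // "daily at 9am" or "daily at 9:30pm" -> "M H * * *"
  const daily = /^daily at (\d{1,2})(?::(\d{2}))?\s*(am|pm)$/.exec(text);
  if (daily) {
    const hour12 = Number(daily[1]);
    const minute = daily[2] ? Number(daily[2]) : 0;
    if (hour12 < 1 || hour12 > 12 || minute > 59) {
      throw new Error(`Time out of range: "${input}"`);
    }
    // 12am maps to hour 0 and 12pm maps to hour 12.
    const hour = (hour12 % 12) + (daily[3] === 'pm' ? 12 : 0);
    return `${minute} ${hour} * * *`;
  }

  throw new Error(`Unrecognized schedule: "${input}"`);
}
```

Throwing on unknown input addresses the second bug described above: a typo surfaces as an error at registration time instead of a workflow that fires every minute.
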
@SokratisVidros
Owner

@marcelom97 Any updates on this?

@marcelom97
Contributor Author

Hey @SokratisVidros, thanks for the review and for your suggestions.

I will take care of this over the weekend.

@marcelom97
Contributor Author

@SokratisVidros incremental sync is about only processing data that changed since the last cron run, instead of re-processing everything each time.

The schedule context gives cron workflows a lastTimestamp (derived from the previous completed run's completedAt), so the workflow can use it as a cursor:

const sync = workflow('sync-data', async ({ step, schedule }) => {
  const since = schedule?.lastTimestamp ?? new Date(0);
  const data = await step.run('fetch', async () => fetchSince(since));
  await step.run('write', async () => writeToDB(data));
}, { cron: '*/15 * * * *' });

Without this, each cron run would need to either re-process all data or manually track its own high-water mark somewhere. lastTimestamp makes that built-in.
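
For illustration, the derivation described above might look roughly like this. The ScheduleContext field types, the helper names buildScheduleContext and sinceCursor, and the UTC default are assumptions; the sketch only shows how lastTimestamp would flow from the previous completed run into the next run's cursor:

```typescript
// Assumed shape of the schedule context handed to a cron-triggered run.
interface ScheduleContext {
  timestamp: Date;      // when this run was scheduled to fire
  lastTimestamp?: Date; // completedAt of the previous completed run, if any
  timezone: string;
}

// Hypothetical helper: build the context for a new cron run from the
// previous completed run (undefined on the very first run).
function buildScheduleContext(
  firedAt: Date,
  previous: { completedAt: Date } | undefined,
  timezone: string = 'UTC',
): ScheduleContext {
  return {
    timestamp: firedAt,
    ...(previous ? { lastTimestamp: previous.completedAt } : {}),
    timezone,
  };
}

// Cursor for incremental sync: fall back to the epoch on the first run,
// mirroring the `schedule?.lastTimestamp ?? new Date(0)` line above.
function sinceCursor(schedule: ScheduleContext | undefined): Date {
  return schedule?.lastTimestamp ?? new Date(0);
}
```

On the first run there is no previous completion, so the cursor is the epoch and everything is processed once; every later run starts from the previous run's completedAt.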

If you think it's not useful we can remove it, but I find it quite useful to have it.

@SokratisVidros
Owner

@marcelom97 Thanks for the updates. I will review them by the end of this week and get back to you.

@marcelom97 force-pushed the feat/cron-workflows branch from f158d41 to 913e073 on March 14, 2026 17:15
@marcelom97
Contributor Author

@SokratisVidros any news for this PR?
