Flowland is a Go-based distributed durable job system. Its first version is designed around reliable job execution, explicit queues, worker leases, retries, scheduling, and strongly consistent storage through a Raft-backed StorageService.
Flowland is not a full workflow engine in the first version. The architecture intentionally leaves room for future workflow support, where workflows can be represented as jobs connected by dependencies, but DAG scheduling and Temporal-style workflow replay are out of scope for now.
Flowland is under active design and implementation. The repository currently contains service skeletons, protobuf definitions, Cobra commands, client packages, and StorageService Raft-oriented tests. The accepted first-version architecture is documented in ADR 0001.
flowchart TB
users["Users / Admins / CI"]
workers["Workers"]
lb["NGINX / Envoy / Cloud LB<br/>TLS termination and routing"]
subgraph brokers["Stateless broker tier"]
broker1["Broker 1<br/>HTTP admin/job API<br/>Worker gRPC API<br/>Scheduler loop"]
broker2["Broker N<br/>HTTP admin/job API<br/>Worker gRPC API<br/>Scheduler loop"]
end
subgraph storage["StorageService cluster"]
ss["StorageService gRPC<br/>Explicit job-domain RPCs"]
raft["Babuza / Raft log<br/>Replicated command ordering"]
sm["StateMachine<br/>Memory state + core indexes"]
snap["Snapshots<br/>Canonical state + indexes"]
end
users -->|"HTTP / JSON"| lb
workers -->|"gRPC claim / heartbeat / complete / fail"| lb
lb --> broker1
lb --> broker2
broker1 -->|"StorageService gRPC commands"| ss
broker2 -->|"StorageService gRPC commands"| ss
broker1 -.->|"Acquire / renew scheduler lease"| ss
broker2 -.->|"Acquire / renew scheduler lease"| ss
ss --> raft
raft --> sm
sm --> snap
snap --> sm
The broker is stateless. It exposes public APIs, authenticates and authorizes callers, validates requests, generates job IDs and lease tokens, and calls StorageService. Durable job state is owned by StorageService.
StorageService exposes explicit job-domain RPCs instead of a generic key/value API. Internally, those RPCs are converted into deterministic StateMachine commands and replicated through Babuza/Raft.
The first-version StateMachine uses in-memory current state plus indexes, persisted through Babuza snapshots. This keeps the transition model simple while allowing deterministic recovery from Raft logs and snapshots.
Flowland's first version focuses on:
- Durable job creation, lookup, cancellation, and retention.
- Explicit namespaces and queues.
- Queue-level scheduling configuration and pause/resume.
- Pull-based worker execution.
- At-least-once execution with lease-token validation.
- Heartbeats and lease renewal.
- Retry policy with fixed or exponential backoff.
- Attempt timeout and lease expiration.
- Inline payloads and results with queue-level size limits.
- Scheduler leadership through a StorageService-backed lease and epoch fencing.
- Required idempotency keys for all mutating commands.
The first version intentionally excludes:
- Full workflow engine semantics.
- DAG scheduling.
- Temporal-style workflow replay.
- Broker-owned durable job state.
- A custom load-balancing proxy.
- Event sourcing as the StateMachine persistence model.
- Embedded KV or SQLite as the first StateMachine local engine.
Workers claim jobs from a broker. A successful claim returns a lease token. Worker mutations such as heartbeat, complete, and fail must include the current lease token, and StorageService rejects stale tokens.
Execution is at least once. Workers must make job handlers idempotent at the application level.
Ready jobs are ordered deterministically by:
run_after ASC, priority DESC, create_sequence ASC, job_id ASC
Higher priority numbers run first. The default priority is 0.
Every mutating StorageService command requires a command_id.
For external API calls, callers provide a stable client-visible idempotency key. The broker forwards or deterministically derives the StorageService command_id from that key, so retries of the same logical request reuse the same command ID.
StorageService caches command results for 24 hours using:
namespace + command_id
The cache stores a request fingerprint. Reusing the same command ID with a different request payload returns COMMAND_ID_CONFLICT.
Global scheduler commands use the reserved system namespace __flowland_system__ for command ID caching. Users cannot create or access that namespace.
The broker is the first-version authentication and authorization boundary. External load balancers may terminate TLS and route traffic, but they are not the authorization decision point.
StorageService is an internal service and accepts only broker-to-storage traffic. The broker authorizes access by namespace and queue:
- Namespace administration requires admin permission.
- Job creation requires producer permission on the target namespace and queue.
- Worker claim, heartbeat, complete, and fail require worker permission on the target namespace and queue.
- Read APIs require read permission.
- Cross-namespace access requires explicit grants for each namespace.
main.go CLI entrypoint
cmd/ Cobra commands for storage, broker, and CLI modes
broker/ Stateless broker server and RPC handlers
storage/ Storage server, RPC handlers, and StateMachine skeleton
client/ Client APIs and pooling
pkg/pb/ Protobuf files and generated Go/gRPC code
test/ StorageService Raft-oriented integration tests
scripts/ Protobuf generation workflow
docs/adr/ Architecture Decision Records
- Go
1.24 - Go toolchain
go1.24.2 buf,protoc-gen-go, andprotoc-gen-go-grpcwhen regenerating protobuf code
Run from the repository root.
Build all packages:
go build ./...Run compile-only validation:
go test ./... -run '^$'Run all tests:
go test ./...Some integration tests are Raft-oriented and may include long sleeps. For quick validation, prefer compile-only validation unless you are intentionally exercising integration behavior.
Run the CLI entrypoint:
go run .Run service commands:
go run . storage
go run . broker
go run . cliRegenerate protobuf code:
bash scripts/genProto.sh install
bash scripts/genProto.sh updateDo not hand-edit generated files under pkg/pb/*.pb.go or pkg/pb/*_grpc.pb.go. Change .proto files first, then regenerate.