feat: Add workspace locking using lease metadata inside workspace to serialize write operations #817

Open
knutties wants to merge 1 commit into main from claude/add-workspace-locking-middleware-l9R1k

Conversation

@knutties
Collaborator

@knutties knutties commented Jan 4, 2026

Problem

Currently, changes within a workspace are not serialized, which can lead to race conditions when version_state is generated. Write operations need to be serialized per workspace.

Solution

Serialize all write operations per workspace using a lease-based lock stored in the workspaces table itself.

When a write request comes in (POST, PUT, PATCH, DELETE), the WorkspaceWritePermit extractor:

  1. Checks out a DB connection
  2. Tries to UPDATE workspaces SET lock_columns WHERE lock is NULL or expired
  3. If updated_rows == 1 → we got the lock, proceed with the write
  4. If updated_rows == 0 → someone else holds the lock, return 409 Conflict with lock details
  5. When the request completes, the permit is dropped and the lock columns are set back to NULL
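
The steps above can be sketched with an in-memory model. This is not the PR's code: `LeaseStore`, `WritePermit`, and `try_acquire` are hypothetical names, a `HashMap` stands in for the workspaces table, and a missing entry stands in for NULL lock columns. It only illustrates the acquire-or-conflict flow and the release on drop.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// In-memory stand-in for the lock columns of one workspace row (assumption).
#[derive(Clone, Debug)]
struct Lease {
    lock_id: u64,
    operation: String,
    expires_at: Instant,
}

#[derive(Default)]
struct Inner {
    rows: HashMap<String, Lease>, // a missing key models NULL lock columns
    next_id: u64,
}

/// Hypothetical store that plays the role of the workspaces table.
#[derive(Clone, Default)]
struct LeaseStore {
    inner: Arc<Mutex<Inner>>,
}

/// Held for the duration of a write request; clears the lease when dropped.
struct WritePermit {
    store: LeaseStore,
    workspace: String,
    lock_id: u64,
}

impl LeaseStore {
    /// Steps 2-4: take the lease only if it is absent or expired.
    /// `Err` carries the current holder, which a handler could turn into a 409.
    fn try_acquire(&self, workspace: &str, operation: &str, ttl: Duration) -> Result<WritePermit, Lease> {
        let now = Instant::now();
        let mut inner = self.inner.lock().unwrap();
        if let Some(current) = inner.rows.get(workspace) {
            if current.expires_at > now {
                return Err(current.clone()); // models updated_rows == 0
            }
        }
        inner.next_id += 1;
        let lock_id = inner.next_id;
        let lease = Lease { lock_id, operation: operation.to_string(), expires_at: now + ttl };
        inner.rows.insert(workspace.to_string(), lease); // models updated_rows == 1
        Ok(WritePermit { store: self.clone(), workspace: workspace.to_string(), lock_id })
    }
}

impl Drop for WritePermit {
    /// Step 5: clear the lease, but only if it is still the one this permit took.
    fn drop(&mut self) {
        let mut inner = self.store.inner.lock().unwrap();
        let still_ours = inner.rows.get(&self.workspace).map(|l| l.lock_id) == Some(self.lock_id);
        if still_ours {
            inner.rows.remove(&self.workspace);
        }
    }
}
```

A second `try_acquire` on the same workspace fails while the first permit is alive and succeeds once it has been dropped.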

Read operations (GET) are not locked — they proceed as before.

Environment variable changes

| Variable | Default | Description |
|---|---|---|
| WORKSPACE_LOCK_DEFAULT_TTL_MS | 60000 (60s) | Lock TTL for normal write operations |
| WORKSPACE_LOCK_BATCH_TTL_MS | 1200000 (20min) | Lock TTL for long-running batch operations |
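
A minimal sketch of how these two variables could be read with their defaults. The helper names (`parse_ttl`, `ttl_from_env`, `load_lock_ttls`) are hypothetical, and falling back to the default on an unparsable value is an assumption; the PR does not say how invalid values are handled.

```rust
use std::env;
use std::time::Duration;

const DEFAULT_TTL_MS: u64 = 60_000; // 60s, default from the table above
const BATCH_TTL_MS: u64 = 1_200_000; // 20min, default from the table above

/// Parses a TTL in milliseconds. Falls back to `default_ms` when the value is
/// absent or not a valid integer (assumed behaviour, not confirmed by the PR).
fn parse_ttl(raw: Option<&str>, default_ms: u64) -> Duration {
    let ms = raw
        .and_then(|v| v.trim().parse::<u64>().ok())
        .unwrap_or(default_ms);
    Duration::from_millis(ms)
}

/// Reads one TTL from the process environment.
fn ttl_from_env(name: &str, default_ms: u64) -> Duration {
    parse_ttl(env::var(name).ok().as_deref(), default_ms)
}

/// Hypothetical holder for both TTLs.
struct LockTtls {
    default: Duration,
    batch: Duration,
}

fn load_lock_ttls() -> LockTtls {
    LockTtls {
        default: ttl_from_env("WORKSPACE_LOCK_DEFAULT_TTL_MS", DEFAULT_TTL_MS),
        batch: ttl_from_env("WORKSPACE_LOCK_BATCH_TTL_MS", BATCH_TTL_MS),
    }
}
```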

Pre-deployment activity

Run the migration: 2026-04-29-000001_workspace_table_lease_lock 

Post-deployment activity

NA

API changes

All workspace write endpoints now return 409 Conflict if the workspace is already locked by another write:

{
  "message": "Workspace is busy with another write request; retry later",
  "lock": {
    "lock_id": "uuid",
    "operation": "PUT /default-config/my-key",
    "locked_by": "user@example.com",
    "acquired_at": "2026-04-30T10:00:00Z",
    "expires_at": "2026-04-30T10:01:00Z"
  }
}

The GET /workspaces/{name} response now includes an optional workspace_lock field showing the current lock state (omitted when no lock is active).

FAQs

Why lock_id?

lock_id prevents a stale owner from clearing someone else's lock. Without it:

  1. Request A acquires lock (lock_id = X)
  2. Request A takes too long, lock expires
  3. Request B acquires lock (lock_id = Y) on the same workspace
  4. Request A finishes and releases; it would clear B's lock

The release_workspace_lease function filters on workspace_lock_id = lease.lock_id, so Request A's release would match 0 rows and not interfere with B's lock.
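
The scenario can be replayed on a toy model of a single row. This is only a sketch: `LockColumns`, `acquire`, and `release` are hypothetical stand-ins, integers replace UUIDs, and times are passed in explicitly as milliseconds. Both functions return the number of rows they would have updated.

```rust
/// Lock columns of one workspace row; `None` on the row models NULL columns.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LockColumns {
    lock_id: u32,
    expires_at_ms: u64,
}

/// Takes the lock if it is absent or expired; returns the rows updated (0 or 1).
fn acquire(row: &mut Option<LockColumns>, lock_id: u32, now_ms: u64, ttl_ms: u64) -> usize {
    let active = matches!(*row, Some(held) if held.expires_at_ms > now_ms);
    if active {
        return 0;
    }
    *row = Some(LockColumns { lock_id, expires_at_ms: now_ms + ttl_ms });
    1
}

/// Clears the lock only when the stored lock_id matches the caller's lease.
fn release(row: &mut Option<LockColumns>, lock_id: u32) -> usize {
    if matches!(*row, Some(held) if held.lock_id == lock_id) {
        *row = None;
        1
    } else {
        0
    }
}
```

With a 60s TTL, A (lock_id 1) acquires at t=0, B (lock_id 2) acquires at t=61s after expiry, and A's late `release` matches 0 rows, leaving B's lock in place.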

How is the lock acquisition atomic?

The entire check-and-set happens in a single SQL UPDATE statement:

UPDATE workspaces
SET workspace_lock_id = $1, workspace_lock_operation = $2, ...
WHERE organisation_id = $3
  AND workspace_name = $4
  AND (workspace_lock_expires_at IS NULL OR workspace_lock_expires_at <= now())

If the lock is active (not expired), updated_rows will be 0 and we return LockContended. If the lock is absent or expired, updated_rows will be 1 and we own the lock. There is no SELECT-then-UPDATE race condition because PostgreSQL evaluates the WHERE clause and applies the SET atomically within a single statement.
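
As an in-process analogy only (the PR relies on the database, not on this), the same "check and set in one step" idea can be shown with a compare-and-swap on an atomic. In this sketch, `try_acquire` and `race` are hypothetical names and an expiry of 0 models a NULL lock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// `expires_at_ms` holds the lock expiry; 0 models "no lock".
/// Returns true when this caller took the lock.
fn try_acquire(expires_at_ms: &AtomicU64, now_ms: u64, ttl_ms: u64) -> bool {
    let seen = expires_at_ms.load(Ordering::Acquire);
    if seen > now_ms {
        return false; // lock is active, like updated_rows == 0
    }
    // The swap succeeds only if nobody changed the value since it was read,
    // so the check and the set cannot be separated by another writer.
    expires_at_ms
        .compare_exchange(seen, now_ms + ttl_ms, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

/// Lets `threads` callers race for a free lock and counts the winners.
fn race(threads: usize) -> usize {
    let lock = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || try_acquire(&lock, 1_000, 60_000))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|won| *won)
        .count()
}
```

However many callers race, exactly one of them wins, which is the property the single UPDATE statement provides across service instances.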

Why have WorkspaceLock as a separate struct in the API?

The 5 lock columns (workspace_lock_id, workspace_lock_operation, etc.) are internal DB details. WorkspaceLock groups them into a clean API response type that:

  • Returns None when no lock is active (instead of 5 nullable fields)
  • Filters out expired locks (from_workspace returns None if expires_at <= now())
  • Is reused in the 409 Conflict error body so clients get meaningful feedback
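
A rough sketch of what such a conversion could look like. The struct layouts are assumptions: the column names `workspace_lock_locked_by` and `workspace_lock_acquired_at` are guesses based on the API fields, timestamps are plain milliseconds, and `from_workspace` takes `now_ms` as a parameter here for testability, which may differ from the real signature.

```rust
/// Hypothetical subset of a workspace row: the five nullable lock columns.
struct WorkspaceRow {
    workspace_lock_id: Option<String>,
    workspace_lock_operation: Option<String>,
    workspace_lock_locked_by: Option<String>, // column name is a guess
    workspace_lock_acquired_at: Option<u64>,  // column name is a guess
    workspace_lock_expires_at: Option<u64>,
}

/// API-facing view of an active lock.
#[derive(Debug, PartialEq)]
struct WorkspaceLock {
    lock_id: String,
    operation: String,
    locked_by: String,
    acquired_at_ms: u64,
    expires_at_ms: u64,
}

impl WorkspaceLock {
    /// Returns `None` when no lock is set or when the lock has expired.
    fn from_workspace(row: &WorkspaceRow, now_ms: u64) -> Option<Self> {
        let expires_at_ms = row.workspace_lock_expires_at?;
        if expires_at_ms <= now_ms {
            return None; // expired locks are filtered out
        }
        Some(Self {
            lock_id: row.workspace_lock_id.clone()?,
            operation: row.workspace_lock_operation.clone()?,
            locked_by: row.workspace_lock_locked_by.clone()?,
            acquired_at_ms: row.workspace_lock_acquired_at?,
            expires_at_ms,
        })
    }
}
```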

@knutties knutties requested a review from a team as a code owner January 4, 2026 18:03
@semanticdiff-com

semanticdiff-com Bot commented Jan 4, 2026

@coderabbitai
Contributor

coderabbitai Bot commented Jan 4, 2026

Walkthrough

A new workspace locking middleware using PostgreSQL advisory locks is introduced to serialize write operations on a per-workspace basis. The middleware is exported as a public module and integrated across multiple API route scopes in the main application to enforce workspace-level concurrency control.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Workspace lock middleware implementation: crates/service_utils/src/middlewares.rs, crates/service_utils/src/middlewares/workspace_lock.rs | New middleware for workspace-level concurrency control using PostgreSQL advisory locks. Exports WorkspaceLockMiddlewareFactory and WorkspaceLockMiddleware<S> to intercept write operations (POST, PUT, DELETE, PATCH), compute lock keys from org_id and workspace_id, and acquire/release two-argument advisory locks with exponential backoff retry logic (up to 10 attempts). Includes helper functions compute_lock_keys(), acquire_advisory_lock(), and release_advisory_lock(). Bypasses locking for read operations. Includes unit tests for lock key computation. Returns 500 on lock failures or missing app state. |
| Route integration: crates/superposition/src/main.rs | Integrates WorkspaceLockMiddlewareFactory as an additional middleware wrap on 13 route scopes: /context, /dimension, /default-config, /config, /audit, /function, /types, /experiments, /experiment-groups, /webhook, /variables, /resolve, and /auth. Middleware is applied after existing OrgWorkspaceMiddlewareFactory to enforce locking on workspace-specific write requests. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Middleware as WorkspaceLock<br/>Middleware
    participant DBPool as DB Pool
    participant PG as PostgreSQL
    participant Service as Inner Service

    Client->>Middleware: Request (POST/PUT/DELETE/PATCH)
    activate Middleware
    
    Note over Middleware: Extract org_id &<br/>workspace_id
    Middleware->>Middleware: compute_lock_keys()<br/>(org_key, workspace_key)
    
    Middleware->>DBPool: Get PgConnection
    activate DBPool
    DBPool-->>Middleware: Connection
    deactivate DBPool
    
    rect rgb(200, 220, 255)
    Note over Middleware,PG: Retry loop (up to 10 attempts)
    Middleware->>PG: pg_try_advisory_lock<br/>(org_key, workspace_key)
    alt Lock acquired
        PG-->>Middleware: Success
        Middleware->>Service: Forward request
        activate Service
        Service-->>Middleware: Response
        deactivate Service
        Middleware->>PG: pg_advisory_unlock<br/>(org_key, workspace_key)
        PG-->>Middleware: Released
    else Lock unavailable
        Note over Middleware: Exponential backoff retry
        PG-->>Middleware: Failed
    end
    end
    
    Middleware-->>Client: ServiceResponse
    deactivate Middleware

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop along with workspace locks so tight,
PostgreSQL advisory keeps writes in sight,
No races or chaos, just smooth coordination,
Each workspace's dance, precise serialization!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a workspace locking middleware using PostgreSQL advisory locks to serialize write operations. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
crates/superposition/src/main.rs (2)

173-175: Same middleware ordering issue applies here.

All these route scopes have the same incorrect middleware order. After fixing /context, apply the same fix consistently across /dimension, /default-config, /config, /audit, /function, and /types.

Also applies to: 180-182, 187-189, 194-196, 201-203, 208-210


215-217: Same middleware ordering issue applies here.

Apply the same middleware order fix to /experiments, /experiment-groups, /webhook, /variables, /resolve, and /auth.

Also applies to: 221-223, 237-239, 244-246, 251-253, 258-260

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7a0eda and 12ebef8.

📒 Files selected for processing (3)
  • crates/service_utils/src/middlewares.rs
  • crates/service_utils/src/middlewares/workspace_lock.rs
  • crates/superposition/src/main.rs
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-01-03T13:27:14.072Z
Learnt from: ayushjain17
Repo: juspay/superposition PR: 816
File: crates/frontend/src/pages/type_template.rs:82-87
Timestamp: 2026-01-03T13:27:14.072Z
Learning: In the frontend crate, both `Workspace` and `OrganisationId` types implement `Deref` trait (via `#[derive(Deref)]`), allowing automatic coercion from `&Workspace` to `&str` and `&OrganisationId` to `&str`. When passing these types to functions expecting `&str`, use `&workspace` or `&org` directly instead of `&workspace.0` or `&org.0`.

Applied to files:

  • crates/superposition/src/main.rs
📚 Learning: 2026-01-03T13:25:40.584Z
Learnt from: ayushjain17
Repo: juspay/superposition PR: 816
File: crates/frontend/src/pages/webhook.rs:136-137
Timestamp: 2026-01-03T13:25:40.584Z
Learning: In the superposition codebase (Rust frontend), the `Workspace` and `OrganisationId` newtype wrappers implement `Deref`, which allows `&Workspace` and `&OrganisationId` to be automatically coerced to `&str` when passed to functions expecting `&str` parameters. Manual `.0` dereferencing is not needed.

Applied to files:

  • crates/superposition/src/main.rs
📚 Learning: 2026-01-02T20:59:01.233Z
Learnt from: ayushjain17
Repo: juspay/superposition PR: 543
File: crates/service_utils/src/middlewares/auth_z.rs:141-152
Timestamp: 2026-01-02T20:59:01.233Z
Learning: In crates/service_utils/src/middlewares/auth_z.rs, the AuthZHandler::init function is intentionally designed to panic on startup if AUTH_Z_PROVIDER environment variable is missing or set to an unknown value. This fail-fast behavior is expected and preferred for this critical authorization configuration.

Applied to files:

  • crates/superposition/src/main.rs
🧬 Code graph analysis (1)
crates/superposition/src/main.rs (2)
crates/service_utils/src/middlewares/workspace_lock.rs (1)
  • new (26-28)
crates/service_utils/src/middlewares/workspace_context.rs (1)
  • new (28-33)
🔇 Additional comments (8)
crates/service_utils/src/middlewares.rs (1)

5-5: LGTM!

The new module export follows the existing pattern in this file.

crates/superposition/src/main.rs (1)

33-33: LGTM!

Import follows the existing import pattern for middleware factories.

crates/service_utils/src/middlewares/workspace_lock.rs (6)

1-54: LGTM!

The factory and Transform implementation follow standard actix-web middleware patterns correctly.


72-86: LGTM!

Write operation detection and early return for reads is well structured.


161-173: Hash stability note: DefaultHasher is not guaranteed stable across Rust versions.

For advisory locks within a single running process, this is fine. However, be aware that DefaultHasher can change between Rust versions. If you ever need cross-process or persistent lock key consistency (e.g., multiple service instances coordinating), consider a stable hash like xxhash or fnv.
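
For illustration, a stable alternative could look like the sketch below, which uses 32-bit FNV-1a with its standard constants. The signature of `compute_lock_keys` shown here is an assumption (two `i32` keys, matching the two-integer form of the PostgreSQL advisory lock functions); the PR's actual helper may differ.

```rust
/// 32-bit FNV-1a. The constants are fixed by the algorithm, so the result
/// does not depend on the Rust version or on the process.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV offset basis
    for b in bytes {
        hash ^= u32::from(*b);
        hash = hash.wrapping_mul(0x0100_0193); // FNV prime
    }
    hash
}

/// Hypothetical variant of compute_lock_keys: one key per identifier,
/// reinterpreted as i32 for the two-argument advisory lock functions.
fn compute_lock_keys(org_id: &str, workspace_id: &str) -> (i32, i32) {
    (
        fnv1a_32(org_id.as_bytes()) as i32,
        fnv1a_32(workspace_id.as_bytes()) as i32,
    )
}
```

As with any 32-bit hash, distinct identifiers can still collide; a collision would only make two workspaces share a lock, not break mutual exclusion.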


230-235: LGTM!

The helper struct correctly deserializes the pg_try_advisory_lock result.


237-248: LGTM!

The release function is straightforward. Ignoring the return value of pg_advisory_unlock is acceptable since we're just ensuring cleanup.


250-292: LGTM!

Good unit test coverage for compute_lock_keys. The uniqueness assertions have a theoretical (but negligible) chance of hash collision. Consider adding integration tests for the actual advisory lock acquire/release behavior if a test database is available.

Comment thread crates/service_utils/src/middlewares/workspace_lock.rs Outdated
Comment thread crates/service_utils/src/middlewares/workspace_lock.rs Outdated
Comment thread crates/superposition/src/main.rs Outdated
Comment on lines 166 to 168
.wrap(OrgWorkspaceMiddlewareFactory::new(true, true))
.wrap(WorkspaceLockMiddlewareFactory::new())
.service(context::endpoints()),
Contributor

⚠️ Potential issue | 🔴 Critical

Critical: Middleware order causes locking to be bypassed.

In actix-web, the last .wrap() runs first on incoming requests. Currently WorkspaceLockMiddlewareFactory is wrapped after OrgWorkspaceMiddlewareFactory, meaning the lock middleware runs before the org/workspace IDs are extracted and set in request extensions. The lock middleware finds no IDs and proceeds without acquiring a lock.

Swap the middleware order so OrgWorkspaceMiddlewareFactory runs first:

🔎 Proposed fix
 .service(
     scope("/context")
         .app_data(Resource::Context)
-        .wrap(OrgWorkspaceMiddlewareFactory::new(true, true))
-        .wrap(WorkspaceLockMiddlewareFactory::new())
+        .wrap(WorkspaceLockMiddlewareFactory::new())
+        .wrap(OrgWorkspaceMiddlewareFactory::new(true, true))
         .service(context::endpoints()),
 )

Apply the same fix to all other route scopes using both middlewares.
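
The ordering rule can be seen in a toy model. This is not actix-web code: `Scope` here is a hypothetical stand-in in which each `wrap` adds a new outermost layer, so the layer wrapped last sees the incoming request first.

```rust
/// Toy middleware stack; each `wrap` call adds a new outermost layer.
struct Scope {
    layers: Vec<&'static str>,
}

impl Scope {
    fn new() -> Self {
        Scope { layers: Vec::new() }
    }

    fn wrap(mut self, name: &'static str) -> Self {
        self.layers.push(name);
        self
    }

    /// Order in which layers see an incoming request:
    /// outermost (wrapped last) first, innermost (wrapped first) last.
    fn request_order(&self) -> Vec<&'static str> {
        self.layers.iter().rev().copied().collect()
    }
}

/// Order from the original diff: the lock layer runs before the IDs are set.
fn original_order() -> Vec<&'static str> {
    Scope::new().wrap("OrgWorkspace").wrap("WorkspaceLock").request_order()
}

/// Order from the proposed fix: the IDs are extracted first, then the lock runs.
fn fixed_order() -> Vec<&'static str> {
    Scope::new().wrap("WorkspaceLock").wrap("OrgWorkspace").request_order()
}
```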

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-.wrap(OrgWorkspaceMiddlewareFactory::new(true, true))
-.wrap(WorkspaceLockMiddlewareFactory::new())
-.service(context::endpoints()),
+.wrap(WorkspaceLockMiddlewareFactory::new())
+.wrap(OrgWorkspaceMiddlewareFactory::new(true, true))
+.service(context::endpoints()),
🤖 Prompt for AI Agents
In crates/superposition/src/main.rs around lines 166 to 168, the middleware
order is wrong: WorkspaceLockMiddlewareFactory is wrapped after
OrgWorkspaceMiddlewareFactory, so it runs before the IDs are extracted and
locking is bypassed. Swap the wraps so that WorkspaceLockMiddlewareFactory is
wrapped first and OrgWorkspaceMiddlewareFactory is wrapped last (the last
.wrap() runs first, so OrgWorkspaceMiddlewareFactory then executes before the
lock middleware on incoming requests). Apply the same swap to all other route
scopes that use both middlewares, so that locks are acquired only after the IDs
have been set in request extensions.

@knutties knutties force-pushed the claude/add-workspace-locking-middleware-l9R1k branch from 0c41930 to 46764cd Compare January 7, 2026 07:22
@knutties knutties force-pushed the claude/add-workspace-locking-middleware-l9R1k branch 4 times, most recently from 7f03c92 to d80a71f Compare January 28, 2026 16:27
@knutties knutties force-pushed the claude/add-workspace-locking-middleware-l9R1k branch from d80a71f to 1de2e62 Compare February 4, 2026 02:05
@knutties knutties force-pushed the claude/add-workspace-locking-middleware-l9R1k branch from 1de2e62 to ae3ed17 Compare February 14, 2026 11:14
Comment on lines +77 to +80
let is_write_operation = matches!(
req.method(),
&Method::POST | &Method::PUT | &Method::DELETE | &Method::PATCH
);
Collaborator

I don't remember exactly, but we have some read operations that use POST.
@ayushjain17 @Datron

"acquired advisory lock for workspace (org_key: {}, workspace_key: {})",
org_key, workspace_key
);
Some(AdvisoryLockGuard::new(&mut db_conn, org_key, workspace_key))
Collaborator

The guard can be returned by the acquire fn

"lock contention detected, retrying in {}ms (attempt {}/{}, org_key: {}, workspace_key: {})",
backoff_ms, attempt + 1, MAX_RETRIES, org_key, workspace_key
);
actix_web::rt::time::sleep(std::time::Duration::from_millis(backoff_ms)).await;
Collaborator

We can use condition variables to wake the sleeping tasks. The wake call can go in the Drop of LockGuard.
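
A minimal sketch of that idea with std primitives. Everything here is an assumption for illustration: `WorkspaceGate` and `LockGuard` are hypothetical names, the lock is a plain in-process flag rather than a database lock, and `std::sync::Condvar` blocks the thread, so async handlers would need an async notifier (for example `tokio::sync::Notify`) instead.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// In-process gate: the bool is true while some writer holds the lock.
#[derive(Clone, Default)]
struct WorkspaceGate {
    state: Arc<(Mutex<bool>, Condvar)>,
}

/// Releases the gate and wakes one waiter when dropped.
struct LockGuard {
    gate: WorkspaceGate,
}

impl WorkspaceGate {
    /// Blocks until the gate is free instead of sleeping with a backoff.
    fn acquire(&self) -> LockGuard {
        let (held, freed) = &*self.state;
        let mut locked = held.lock().unwrap();
        while *locked {
            locked = freed.wait(locked).unwrap(); // woken by LockGuard::drop
        }
        *locked = true;
        LockGuard { gate: self.clone() }
    }
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        let (held, freed) = &*self.gate.state;
        *held.lock().unwrap() = false;
        freed.notify_one(); // the wake call lives in drop, as suggested
    }
}

/// Runs `n` writers through the gate and returns how many completed.
fn run_writers(n: usize) -> usize {
    let gate = WorkspaceGate::default();
    let done = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let (gate, done) = (gate.clone(), Arc::clone(&done));
            thread::spawn(move || {
                let _guard = gate.acquire();
                *done.lock().unwrap() += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *done.lock().unwrap();
    total
}
```

This only coordinates waiters inside one process; contention between service instances would still have to be resolved by the database lock.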

Copilot AI review requested due to automatic review settings April 14, 2026 13:20
@sauraww sauraww force-pushed the claude/add-workspace-locking-middleware-l9R1k branch from e21bc07 to ae2bb08 Compare April 14, 2026 13:20
@sauraww sauraww force-pushed the claude/add-workspace-locking-middleware-l9R1k branch from ae2bb08 to 099e5fa Compare April 14, 2026 13:21
Copilot AI left a comment

Pull request overview

This PR introduces a new Actix middleware intended to serialize workspace-scoped write operations using PostgreSQL advisory locks, and also includes broad provider API refactors/FFI binding updates plus automation and generated “AI skills” documentation artifacts.

Changes:

  • Added WorkspaceLockMiddlewareFactory (Postgres advisory-lock based) middleware module in service_utils.
  • Refactored Rust + Python provider data source interfaces to make fetch_config a default wrapper over fetch_filtered_config, and updated experimentation filtering to support a new partial_apply flag across UniFFI bindings.
  • Added skills generation tooling (Make target + GitHub Action) and committed generated .agents/skills/** content.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
superposition.skill-seekers.json Adds skill-seekers configuration for generating Superposition “skills” from docs.
makefile Adds skills-update target to generate/package skills locally.
crates/superposition_provider/src/local_provider.rs Refactors to implement fetch_filtered_config directly and filter cached config in-place.
crates/superposition_provider/src/data_source/http.rs Moves config fetching into fetch_filtered_config implementation for the data source.
crates/superposition_provider/src/data_source/file.rs Refactors file data source to implement fetch_filtered_config directly and apply filters.
crates/superposition_provider/src/data_source.rs Makes fetch_config a default method delegating to fetch_filtered_config.
crates/superposition_core/src/ffi.rs Adds partial_apply parameter to experiment filtering selection logic.
crates/service_utils/src/middlewares/workspace_lock.rs Introduces new workspace locking middleware using PG advisory locks.
crates/service_utils/src/middlewares.rs Exposes the new workspace_lock middleware module.
clients/python/provider/superposition_provider/local_provider.py Makes LocalResolutionProvider implement SuperpositionDataSource and adds async fetch methods.
clients/python/provider/superposition_provider/http_data_source.py Refactors to implement fetch_filtered_config directly (removes helper wrapper methods).
clients/python/provider/superposition_provider/file_data_source.py Refactors to implement fetch_filtered_config directly (removes helper wrapper methods).
clients/python/provider/superposition_provider/data_source.py Makes fetch_config a concrete default delegating to fetch_filtered_config.
clients/python/bindings/superposition_bindings/superposition_client.py Updates UniFFI Python bindings for new partial_apply arg + checksum changes.
clients/java/bindings/src/main/kotlin/uniffi/superposition_client/superposition_client.kt Updates UniFFI Kotlin bindings for new partialApply arg + checksum changes.
.gitignore Ignores scripts/skill_templates/ outputs generated by skills tooling.
.github/workflows/update-skills.yml Adds workflow to generate/package/upload skills and commit .agents/skills/** updates.
.agents/skills/superposition/reference/documentation/overview/setup.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/overview/intro.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/lsp-support.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/k8s-staggered-releaser.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/intro.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/format-specification.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/experimentation.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/dimensions.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/deterministic-resolution.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/creating_client.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/context-expressions.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/config-file-compatibility.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/client_experimentation.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/client_context_aware_configuration.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/cascading-model.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/cac-toml.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/other/cac-redis-module.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/features/python.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/features/overview.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/features/javascript.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/features/java.md Adds generated skill reference documentation.
.agents/skills/superposition/reference/documentation/extraction_summary.json Adds generated extraction summary metadata.
.agents/skills/superposition/reference/dependencies/statistics.json Adds generated dependency stats metadata.
.agents/skills/superposition/reference/dependencies/dependency_graph.mmd Adds generated dependency graph metadata (Mermaid).
.agents/skills/superposition/reference/dependencies/dependency_graph.json Adds generated dependency graph metadata (JSON).
.agents/skills/superposition/reference/dependencies/dependency_graph.dot Adds generated dependency graph metadata (DOT).
.agents/skills/superposition/reference/config_patterns/config_patterns.md Adds generated config-pattern extraction report.
.agents/skills/superposition/SKILL.md Adds generated top-level skill manifest and index.


Comment thread crates/service_utils/src/middlewares.rs Outdated
pub mod auth_z;
pub mod request_response_logging;
pub mod workspace_context;
pub mod workspace_lock;
Copilot AI Apr 14, 2026

The PR description says the workspace lock middleware is registered on workspace-scoped endpoints, but there are currently no references/usages of WorkspaceLockMiddlewareFactory anywhere outside its own module (search across crates/**.rs). As-is, exporting the module won’t activate locking. The middleware needs to be wired into the Actix scopes (and ordered appropriately relative to OrgWorkspaceMiddlewareFactory).

Comment on lines +121 to +124
// Acquire advisory lock if we have a lock key and create guard
let _lock_guard = if let Some(lock_key) = lock_key {
match acquire_advisory_lock(&mut db_conn, lock_key).await {
Ok(guard) => {
Copilot AI Apr 14, 2026

_lock_guard holds a mutable reference to db_conn (via AdvisoryLockGuard<'a>), and the request is awaited afterwards (srv.call(req).await). Holding a &mut borrow across an .await will fail to compile in async Rust. Consider redesigning the guard to own the pooled connection (or otherwise avoid borrowing db_conn) so it can be kept alive for the request duration without a self-referential borrow.

Comment on lines +243 to +249
Err(diesel::result::Error::DatabaseError(
diesel::result::DatabaseErrorKind::Unknown,
Box::new(format!(
"Failed to acquire workspace lock after {} attempts (high contention)",
MAX_RETRIES
)),
))
Copilot AI Apr 14, 2026

This constructs diesel::result::Error::DatabaseError using Box::new(format!(...)), but Diesel expects a Box<dyn DatabaseErrorInformation> rather than a String. This won’t compile. Suggest returning a middleware-specific error type (or actix_web::Error) from acquire_advisory_lock and mapping it to an HTTP error response instead of trying to synthesize a Diesel DatabaseError.
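
One possible shape for such a middleware-specific error, as a sketch only. `WorkspaceLockError` and `status_code` are hypothetical names; mapping contention to 409 follows the PR description, while 500 for other failures follows the walkthrough's description of the earlier middleware.

```rust
use std::fmt;

/// Hypothetical error type for lock acquisition, independent of Diesel.
#[derive(Debug)]
enum WorkspaceLockError {
    /// Another write still holds the lock after all attempts.
    LockContended { attempts: u32 },
    /// Any underlying database failure, kept as a message for simplicity.
    Database(String),
}

impl fmt::Display for WorkspaceLockError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WorkspaceLockError::LockContended { attempts } => write!(
                f,
                "Failed to acquire workspace lock after {} attempts (high contention)",
                attempts
            ),
            WorkspaceLockError::Database(msg) => write!(f, "database error: {}", msg),
        }
    }
}

impl std::error::Error for WorkspaceLockError {}

impl WorkspaceLockError {
    /// HTTP status a handler or middleware could respond with (assumed mapping).
    fn status_code(&self) -> u16 {
        match self {
            WorkspaceLockError::LockContended { .. } => 409,
            WorkspaceLockError::Database(_) => 500,
        }
    }
}
```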

@sauraww sauraww force-pushed the claude/add-workspace-locking-middleware-l9R1k branch 6 times, most recently from d8ce45d to ed65f38 Compare April 22, 2026 11:03
@sauraww sauraww force-pushed the claude/add-workspace-locking-middleware-l9R1k branch 7 times, most recently from ed23317 to 328f7d8 Compare April 24, 2026 10:42
@sauraww sauraww force-pushed the claude/add-workspace-locking-middleware-l9R1k branch 6 times, most recently from ba51e2f to 881f539 Compare April 30, 2026 08:40
Comment on lines +590 to +599
{
let conn = write_permit.checkout();
let _ = put_config_in_redis(
&config_version,
&state,
&workspace_context.schema_name,
conn,
)
.await;
}
Collaborator

Block not needed, and taking mut ref and dropping is unnecessary.


impl WorkspaceWritePermit {
/// Returns a mutable reference to the database connection.
/// The workspace lock remains held while the connection is in use.
Collaborator

misleading comment

@sauraww sauraww force-pushed the claude/add-workspace-locking-middleware-l9R1k branch 4 times, most recently from 55f29ec to 7dbd29e Compare April 30, 2026 09:19
@sauraww sauraww changed the title from "feat: Add workspace locking middleware using PostgreSQL advisory locks" to "feat: Add workspace locking using lease metadata inside workspace to serialize write operations" Apr 30, 2026
Collaborator

@mahatoankitkumar mahatoankitkumar left a comment

Maybe in a subsequent PR, let's add a WorkspaceReadPermit, so that DB connections are acquired consistently throughout the repo.
Right now WorkspaceWritePermit holds the DB connection for writes, while reads access the DB connection directly.

Comment on lines +131 to +140
let conn = write_permit.checkout();

validate_change_reason(
&workspace_context,
&req.change_reason,
conn,
&state.master_encryption_key,
)
.await?;

Collaborator

Why was this moved?
It should stay where it was, at the start.

Collaborator

you would not even need to take a mutable reference to the connection if it fails early

Comment thread crates/service_utils/src/db.rs Outdated
// this internally to run queries/transactions with proper error handling and connection management
-fn get_connection(db_pool: &PgSchemaConnectionPool) -> result::Result<DBConnection> {
+/// Checks out a connection from the pool with statement caching disabled.
+pub fn checkout_connection(
Collaborator

why pub

}

-if let Some(I64Update::Add(version)) = request.config_version {
+if let Some(I64Update::Add(version)) = request.config_version.clone() {
Collaborator

why clone?

}
}

fn checkout_db_connection(
Collaborator

remove

) -> superposition::Result<Json<WorkspaceResponse>> {
let request = request.into_inner();
let workspace_name = workspace_name.into_inner();
let DbConnection(mut conn) = db_conn;
Collaborator

unnecessary changes in this file

Collaborator

@Datron Datron left a comment

Merge the error types and check the locking mechanism

ADD COLUMN IF NOT EXISTS key_rotated_at TIMESTAMPTZ;

ALTER TABLE superposition.workspaces
ADD COLUMN IF NOT EXISTS workspace_lock_id UUID,
Collaborator

What is the use of lock ID?

Comment thread package.json Outdated
],
"devDependencies": {
"@tsconfig/node18": "^18.2.4",
"axios": "^1.11.0",
Collaborator

remove this

Comment on lines +464 to +472

diesel::update(workspaces::table)
.filter(workspaces::organisation_id.eq(&org_id.0))
.filter(workspaces::workspace_name.eq(&workspace_name))
.set((
workspaces::last_modified_by.eq(&user_email),
workspaces::last_modified_at.eq(Utc::now()),
))
.execute(transaction_conn)?;
Collaborator

This seems redundant. It's done above anyway?

}

if let Some(I64Update::Add(version)) = request.config_version {
if let Some(I64Update::Add(version)) = request.config_version.clone() {
Collaborator

Why the clone? I don't see any other changes in this function.

Comment thread crates/service_utils/Cargo.toml Outdated
diesel = { workspace = true }
diesel-adapter = { version = "1.2.0" }
fred = { workspace = true, features = ["metrics"] }
fred = { workspace = true, features = ["metrics", "i-scripts"] }
Collaborator

Is this needed?

Comment thread crates/service_utils/src/db.rs Outdated
// this internally to run queries/transactions with proper error handling and connection management
fn get_connection(db_pool: &PgSchemaConnectionPool) -> result::Result<DBConnection> {
/// Checks out a connection from the pool with statement caching disabled.
pub fn checkout_connection(
Collaborator

get was nicer

Comment on lines +23 to +25
fn workspace_key(organisation_id: &str, workspace_id: &str) -> String {
format!("{}/{}", organisation_id, workspace_id)
}
Collaborator

Use the workspace schema name as the key

Collaborator

This is a workspace update, @Datron.
The workspace handlers do it the same way.
Though we don't need a key here; I can just pass the org and workspace.

Here is the reference from the handler:

            diesel::update(workspaces::table)
                .filter(workspaces::organisation_id.eq(&org_id.0))
                .filter(workspaces::workspace_name.eq(&workspace_name))

Comment on lines +118 to +129
pub(crate) fn map_acquire_lock_error(error: AcquireLockError) -> actix_web::Error {
match error {
AcquireLockError::LockContended(lock) => WorkspaceLockConflict::new(lock).into(),
AcquireLockError::Diesel(diesel::result::Error::NotFound) => {
actix_web::error::ErrorNotFound("Workspace not found")
}
AcquireLockError::Diesel(e) => {
log::error!("failed to acquire workspace lock: {}", e);
actix_web::error::ErrorInternalServerError("Failed to acquire workspace lock")
}
}
}
Collaborator

This can be part of our error enum, which would make `?` simpler.
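A minimal sketch of what this suggestion could look like, assuming a shared application error enum exists. `AppError`, `DbError`, and the string payloads here are hypothetical stand-ins, not the crate's actual types; the point is only that a `From` impl lets handlers use `?` instead of calling a mapping function.

```rust
/// Hypothetical stand-in for diesel::result::Error.
#[derive(Debug, PartialEq)]
enum DbError {
    NotFound,
    Other(String),
}

/// Simplified version of the PR's AcquireLockError.
#[derive(Debug, PartialEq)]
enum AcquireLockError {
    LockContended(String), // the PR carries lock details here
    Db(DbError),
}

/// Hypothetical application-wide error enum.
#[derive(Debug, PartialEq)]
enum AppError {
    Conflict(String),
    NotFound(String),
    Unexpected(String),
}

impl From<AcquireLockError> for AppError {
    fn from(error: AcquireLockError) -> Self {
        match error {
            AcquireLockError::LockContended(lock) => AppError::Conflict(lock),
            AcquireLockError::Db(DbError::NotFound) => {
                AppError::NotFound("Workspace not found".to_string())
            }
            AcquireLockError::Db(DbError::Other(_)) => {
                AppError::Unexpected("Failed to acquire workspace lock".to_string())
            }
        }
    }
}

/// Pretend lock acquisition that fails when the workspace is contended.
fn acquire(contended: bool) -> Result<(), AcquireLockError> {
    if contended {
        Err(AcquireLockError::LockContended("lock-1".to_string()))
    } else {
        Ok(())
    }
}

/// The `?` operator converts AcquireLockError into AppError via `From`.
fn handler(contended: bool) -> Result<&'static str, AppError> {
    acquire(contended)?;
    Ok("written")
}
```

Unlike `map_acquire_lock_error`, this sketch does not log the underlying database error; a real conversion would probably keep that logging.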

Comment thread crates/service_utils/src/workspace_lock.rs
@sauraww sauraww force-pushed the claude/add-workspace-locking-middleware-l9R1k branch 5 times, most recently from ebaeb11 to 527c24a Compare April 30, 2026 11:41
@Datron Datron self-requested a review April 30, 2026 11:54
…serialize write operations

This commit introduces a new middleware that serializes all write operations
(POST, PUT, DELETE, PATCH) per workspace using PostgreSQL advisory locks.

Changes:
- Created WorkspaceLockMiddleware that:
  - Extracts org_id and workspace_id from requests
  - Computes a unique lock key using hash of org_id:workspace_id
  - Acquires PostgreSQL advisory lock before processing write operations
  - Ensures lock is released after request completion
  - Skips locking for read operations (GET, etc.)

- Registered the middleware on all workspace-scoped endpoints:
  /context, /dimension, /default-config, /config, /audit, /function,
  /types, /experiments, /experiment-groups, /webhook, /variables,
  /resolve, /auth

This ensures write operations to the same workspace are serialized,
preventing race conditions and maintaining data consistency.

refactor: Use two-argument pg_advisory_lock for better lock space utilization

Changed from single-argument pg_advisory_lock(bigint) to two-argument
pg_advisory_lock(int, int) form for workspace locking.

Benefits:
- More natural mapping: org_id and workspace_id get separate hash spaces
- Better lock space utilization: each ID gets full 32-bit space
- Lower collision probability: separate hashing reduces conflicts
- Easier debugging: both components visible in pg_locks table

Implementation:
- compute_lock_keys() now returns (i32, i32) tuple
- org_id and workspace_id are hashed independently
- Updated acquire/release functions to use two-argument SQL
- Enhanced tests to verify component separation
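The key derivation described above could be sketched roughly as follows. This is an illustration, not the commit's code: `hash_to_i32` is a hypothetical helper, and `DefaultHasher` is used only because it ships with the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes one identifier into the 32-bit key space used by the
/// two-argument advisory lock form.
fn hash_to_i32(value: &str) -> i32 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    // Keep the low 32 bits of the 64-bit hash; negative keys are valid.
    hasher.finish() as u32 as i32
}

/// Returns (org_key, workspace_key), each hashed independently.
fn compute_lock_keys(org_id: &str, workspace_id: &str) -> (i32, i32) {
    (hash_to_i32(org_id), hash_to_i32(workspace_id))
}
```

One caveat: the algorithm behind `DefaultHasher` is not guaranteed to stay the same across Rust releases, so servers built with different toolchains could compute different keys for the same workspace. A real implementation would likely want a hash with a fixed, documented algorithm.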

feat: Add retry logic with exponential backoff for workspace locks

Changed from blocking pg_advisory_lock() to non-blocking pg_try_advisory_lock()
with intelligent retry logic to prevent indefinite request blocking.

**Previous Behavior:**
- pg_advisory_lock() blocks indefinitely until lock is available
- Requests could hang for extended periods during high contention
- No visibility into lock acquisition delays
- Risk of cascading timeouts

**New Behavior:**
- pg_try_advisory_lock() returns immediately with success/failure
- Exponential backoff retry: 10ms, 20ms, 40ms, 80ms... up to 500ms max
- Maximum 10 attempts (total ~5 seconds max wait)
- Clear error message after exhausting retries
- Logs retry attempts for observability

**Retry Configuration:**
- MAX_RETRIES: 10 attempts
- INITIAL_BACKOFF_MS: 10ms
- MAX_BACKOFF_MS: 500ms (cap to prevent excessive delays)

**Benefits:**
- Predictable maximum wait time (~5 seconds)
- Better user experience with faster failures
- Reduced risk of cascading timeouts
- Visibility into lock contention via logs
- Graceful degradation under high load
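As a worked check of the retry configuration above, the backoff schedule can be sketched like this. The constant names come from the commit message; `backoff_ms` and `worst_case_sleep_ms` are hypothetical helpers, and the sketch assumes a sleep happens only between consecutive attempts.

```rust
const MAX_RETRIES: u32 = 10;
const INITIAL_BACKOFF_MS: u64 = 10;
const MAX_BACKOFF_MS: u64 = 500;

/// Delay before retry number `attempt` (0-based): doubles each time,
/// capped at MAX_BACKOFF_MS.
fn backoff_ms(attempt: u32) -> u64 {
    INITIAL_BACKOFF_MS
        .saturating_mul(1u64 << attempt.min(32))
        .min(MAX_BACKOFF_MS)
}

/// Total sleep time if every attempt fails, assuming MAX_RETRIES - 1 sleeps.
fn worst_case_sleep_ms() -> u64 {
    (0..MAX_RETRIES - 1).map(backoff_ms).sum()
}
```

Under these assumptions the delays run 10, 20, 40, 80, 160, 320, 500, 500, 500 ms, which sums to 2130 ms of sleeping. That sits well inside the roughly 5-second bound quoted above, which corresponds to ten sleeps at the 500 ms cap; time spent in the lock queries themselves is extra.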

fix: Replace blocking sleep with async sleep in lock retry logic

Fixed critical async/blocking issues flagged by code review:

**Issue 1: Blocking sleep in async context**
- Changed std::thread::sleep() to actix_web::rt::time::sleep().await
- Using blocking sleep in async middleware would block the entire worker thread
- This prevented other requests from being processed on that thread
- Now properly yields control back to the async executor during backoff

**Issue 2: Made acquire_advisory_lock async**
- Function signature changed from sync to async
- Properly propagates async behavior through the call chain
- Maintains non-blocking execution throughout retry attempts

**Impact:**
- Before: Worker threads would be blocked during lock retry delays
- After: Worker threads can process other requests while waiting
- Much better concurrency and throughput under lock contention

fix: Add RAII guard to ensure lock release even on panic

Implemented AdvisoryLockGuard using RAII pattern to guarantee lock
release in all code paths, including when handlers panic.

**Problem:**
Previous implementation would skip lock release if the handler panicked:
```rust
acquire_lock();
handler();      // <-- If this panics...
release_lock(); // <-- ...this never runs!
```

This would leave locks held until DB connection closes, potentially
causing deadlocks or severe contention.

**Solution:**
Created AdvisoryLockGuard struct that implements Drop:
```rust
struct AdvisoryLockGuard<'a> {
    conn: &'a mut PgConnection,
    org_key: i32,
    workspace_key: i32,
}

impl Drop for AdvisoryLockGuard<'_> {
    fn drop(&mut self) {
        // Always releases lock, even on panic
        release_advisory_lock(...)
    }
}
```

**How it works:**
1. Acquire lock
2. Create guard (holds mutable reference to connection)
3. Call handler
4. Guard is automatically dropped when scope ends
   - On normal return: guard drops, lock released
   - On panic: guard drops during unwinding, lock released
   - On early return: guard drops, lock released

**Benefits:**
- Guaranteed lock cleanup in all code paths
- Panic-safe resource management
- Prevents lock leaks that could cause deadlocks
- Follows Rust RAII best practices
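The panic-safety claim can be demonstrated with a small self-contained sketch. `MockConn` is a hypothetical stand-in for the database connection that simply records whether the lock is held; the real guard holds a `&mut PgConnection` and issues SQL on drop.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a connection; tracks whether the lock is held.
struct MockConn {
    locked: Cell<bool>,
}

/// Guard that releases the lock when it goes out of scope.
struct LockGuard<'a> {
    conn: &'a MockConn,
}

impl<'a> LockGuard<'a> {
    /// Acquires the lock and returns the guard in one step.
    fn acquire(conn: &'a MockConn) -> Self {
        conn.locked.set(true);
        LockGuard { conn }
    }
}

impl Drop for LockGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal return, early return, and panic unwinding.
        self.conn.locked.set(false);
    }
}

/// Simulated handler: holds the lock for its whole body.
fn run_handler(conn: &MockConn, should_panic: bool) {
    // Binding to `_guard` (not `_`) keeps the guard alive until scope end.
    let _guard = LockGuard::acquire(conn);
    if should_panic {
        panic!("handler failed");
    }
}
```

Wrapping `run_handler(&conn, true)` in `std::panic::catch_unwind` shows `locked` back at `false` after the panic. This relies on the default unwinding panic strategy; with `panic = "abort"` no destructors run.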

fix: moved initialization of lock guard to acquire call
@sauraww sauraww force-pushed the claude/add-workspace-locking-middleware-l9R1k branch from 527c24a to 672e074 Compare April 30, 2026 12:23

6 participants