Skip to content

[BUG] [v0.0.7] locked_read_modify_write() loses flock after atomic_write replaces file inode via rename #174

@climax-dev-1

Description

@climax-dev-1

Bug Report

Project

cortex

Description

In cortex-common/src/file_locking.rs, the locked_read_modify_write() function acquires an exclusive flock on a file, reads it, modifies it, then calls atomic_write() to write the result. However, atomic_write() uses a write-to-temp-then-rename pattern that replaces the original file's inode. Since flock on Linux/Unix is associated with the file descriptor (which points to the old inode), the lock becomes ineffective the moment the rename completes — the lock is now held on an orphaned inode with no directory entry.

Location: src/cortex-common/src/file_locking.rs, lines 524-543

pub fn locked_read_modify_write<T, F>(path: impl AsRef<Path>, modify: F) -> FileLockResult<T>
where
    F: FnOnce(&str) -> (String, T),
{
    let path = path.as_ref();
    let config = LockConfig::default();

    // Acquire exclusive lock on the file (locks the inode via fd)
    let mut guard = acquire_lock(path, LockMode::Exclusive, &config)?;

    // Read current content
    let content = guard.read_to_string()?;

    // Apply modification
    let (new_content, result) = modify(&content);

    // BUG: atomic_write renames a temp file over the original path,
    // creating a NEW inode. The flock held by `guard` is on the OLD inode.
    // After this call, the lock is effectively released because the old
    // inode no longer has a directory entry.
    atomic_write(path, new_content.as_bytes())?;

    Ok(result)
}

The race condition:

  1. Process A: acquires flock on config.toml (inode 100)
  2. Process A: reads content, modifies it
  3. Process A: atomic_write writes to .config.toml.tmp.PID, then renames it to config.toml — now config.toml points to inode 101
  4. Process A's flock is still on inode 100 (orphaned, no directory entry)
  5. Process B: opens config.toml (inode 101), acquires flock immediately (no contention!)
  6. Process B: reads and modifies the file
  7. Process B: writes — potentially overwriting Process A's changes or causing corruption

Error Message

No error message — this is a silent race condition. The flock succeeds but provides no mutual exclusion after the atomic rename.

Debug Logs

N/A — source code analysis.

System Information

Linux (Ubuntu), source code review. This affects all Unix platforms using flock().

Steps to Reproduce

  1. Have two processes (or threads using separate file descriptors) call locked_read_modify_write() on the same file concurrently
  2. Process A acquires the lock first and proceeds to atomic_write
  3. After the rename in atomic_write, Process B can immediately acquire a lock on the new inode
  4. Both processes may now be modifying the file concurrently, defeating the purpose of the lock

Expected Behavior

The lock should provide mutual exclusion for the entire read-modify-write cycle. The fix should use the locked file descriptor directly for writing instead of atomic_write, or use a separate lock file (e.g., config.toml.lock) whose inode is never replaced:

Option 1 — Write through the locked fd (already available via guard.write_all):

pub fn locked_read_modify_write<T, F>(path: impl AsRef<Path>, modify: F) -> FileLockResult<T>
where
    F: FnOnce(&str) -> (String, T),
{
    let path = path.as_ref();
    let config = LockConfig::default();
    let mut guard = acquire_lock(path, LockMode::Exclusive, &config)?;
    let content = guard.read_to_string()?;
    let (new_content, result) = modify(&content);
    // Write directly through the locked fd — preserves inode and lock
    guard.write_all(new_content.as_bytes())?;
    Ok(result)
}

Option 2 — Use a separate .lock file whose inode is never replaced.

Actual Behavior

The flock is silently invalidated when atomic_write replaces the file via rename. Concurrent callers of locked_read_modify_write can overlap, causing lost updates or data corruption in config/session files.

Additional Context

The FileLockGuard already has a write_all() method (line 142-150) that writes through the locked file descriptor, which would preserve the inode and maintain the lock. The locked_read_modify_write function should use this instead of atomic_write.

Note that the FileLockManager with its process-local mutexes (line 564) mitigates this for threads within the same process, but does NOT protect against concurrent access from separate processes (e.g., multiple cortex CLI instances editing the same config file).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions