bug: buildSessionMeta preserves stale entry_count when meta.json exists #532

@galexy

Symptom

ox session upload's buildSessionMeta reads any existing meta.json and reuses it as-is, including the entry_count field. It never recomputes entry_count from the actual raw.jsonl on disk. As a result, any session whose raw.jsonl gained or lost entries after meta.json was first written keeps reporting the old count until meta.json is manually deleted.

Evidence

During the #519 investigation:

  1. ox agent <id> session import wrote meta.json with entry_count: 50 (the 50 entries it imported).
  2. raw.jsonl was rebuilt in place to 235 entries.
  3. ox session upload ran and preserved entry_count: 50 in meta.json even though raw.jsonl now had 235 entries.
  4. Dashboard and ox session list reported 50 entries for a 235-entry session, long after upload.

The stale count persisted through several upload attempts until meta.json was explicitly deleted before upload.

Root cause

cmd/ox/session_upload_cmd.go:141-177:

func buildSessionMeta(sessionPath, sessionName, projectRoot string, fileRefs map[string]lfs.FileRef) (*lfs.SessionMeta, error) {
    // try reading existing meta.json first
    meta, err := lfs.ReadSessionMeta(sessionPath)
    if err == nil {
        // update the file manifest
        if fileRefs != nil {
            meta.Files = fileRefs
        } else if meta.Files == nil {
            meta.Files = make(map[string]lfs.FileRef)
        }
        return meta, nil
    }
    // ... fallback: count entries from raw.jsonl, build fresh meta
}

Only Files is updated on the "existing meta.json" path. entry_count, summary, and other fields are preserved as-is. The fallback branch (when no meta.json exists) correctly calls countJSONLLines(filepath.Join(sessionPath, ledgerFileRaw)), but it only runs when there's no existing meta.

Impact

  • Any session re-uploaded after its raw.jsonl changed size publishes wrong metadata.
  • ox session list and the dashboard show stale entry counts.
  • The stale count obscures whether a session was modified or re-captured: users can't tell from metadata alone.
  • Downstream tooling (summaries, stats, session filtering) that trusts entry_count is silently wrong.

Fix direction

buildSessionMeta should always recompute entry_count from raw.jsonl on disk, regardless of whether the existing meta.json had a value. It's a cheap file-line count, so there is no reason to preserve the stale number.

Consider generalizing: any field in meta.json that's derivable from content files (entry_count, summary) should be recomputed at upload time, not preserved.

Acceptance

  • Modifying raw.jsonl and running upload writes a meta.json with the correct line count.
  • A test fixture exercises: existing meta.json with entry_count=N, modify raw.jsonl to have M entries, call upload, assert meta.json now reports M.
