Backup / restore-backup / trash — data-safety v2 (8 tracks, 276 tests) #62

Merged: phitoduck merged 14 commits into main from delete-safeguards on Jun 13, 2026
Conversation

phitoduck (Owner) commented Jun 13, 2026

Summary

Implements the data-safety spec at docs/backup-spec.md across the parallel tracks T0-T8 (the eight planned tracks plus T0, added after the bulk range endpoint was confirmed). 276/276 unit tests green. Version 0.7.0.

What ships

  • loseit backup — captures the diary into a per-grain TOON archive (~/.local/share/loseit/backup/YYYY/MM.toon etc.). Resumable, atomic per-grain, one RPC per grain by default via the bulk-fetch endpoint.
  • loseit restore-backup — restores from an archive. Default safe mode is entry-level upsert (food_id + modified_at ± 10m); cheap mode (--skip-restore-on-nonempty-grain-time-ranges) skips grains the server already has data for.
  • loseit delete now routes through a TrashSink before the wire delete fires. Default sink (LocalFileTrashSink) appends a JSONL record to ~/.local/share/loseit/trash.jsonl (mode 0600). --no-trash requires --i-know-this-is-unrecoverable.
  • loseit restore-trash — replays the most recent (or --line N) trash record.

Track-by-track PR trail (already merged into delete-safeguards)

| Track | PR | Subject |
| --- | --- | --- |
| T0 | #65 | diary_range SDK method (1 RPC per grain) |
| T1 | #63 | TOON file format library + atomic-write helper |
| T2 | #68 | grain fetch primitive (range-RPC + recursive split-and-retry) |
| T3 | #67 | earliest-day discovery via yearly range probes |
| T4 | #64 | surface created_at / modified_at on FoodLogEntry |
| T5 | #66 | TrashSink protocol + delete-via-sink + restore-trash |
| T6 | #70 | orchestrator + cheap-mode restore |
| T7 | #69 | upsert match pure-function (food_id + modified_at ± 10m) |
| T8 | #71 | backup + restore-backup CLI + safe-mode upsert restore wiring |

Empirical findings during implementation

  • Bulk-fetch endpoint exists and works. Captured live by clicking "My Week" on the Lose It! home dashboard. getDailyDetailsIncludingPendingForDateRange returns one DailyDetails per day in the requested range. Spec §6.1 was rewritten so this is the default; per-day getDailyDetailsIncludingPendingForDate is the recursion floor.
  • ⚠️ FoodLogEntry.f4 is NOT a real "created" timestamp. Empirical analysis of captured fixtures shows values clustering around 1970-02-15 (decoded as epoch-ms). Only f5 (modified_at) is a real timestamp (~June 2026 in the fixtures). The spec called for (food_id, created_at ± 10m) as the upsert key; the implementation uses (food_id, modified_at ± 10m) instead. Field is still surfaced for forward-compat. A future live re-capture should verify what f4 actually is — could be a counter, hash, or different epoch.
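To make the f4 finding concrete, here is a minimal sketch of decoding an epoch-ms long into an aware UTC datetime. The helper name `decode_epoch_ms` and both sample values are illustrative assumptions (they are not taken from the SDK or from the captured fixtures). They only show why a value of a few billion milliseconds lands in mid-February 1970, while a genuine mid-2026 timestamp is on the order of 1.78e12.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_epoch_ms(ms: int) -> datetime:
    """Interpret an integer as milliseconds since the Unix epoch (UTC)."""
    return EPOCH + timedelta(milliseconds=ms)


# Hypothetical values, chosen for illustration only.
suspicious_f4 = 3_888_000_000          # exactly 45 days of milliseconds
plausible_f5 = 1_781_308_800_000       # midnight UTC on 2026-06-13

print(decode_epoch_ms(suspicious_f4).isoformat())
print(decode_epoch_ms(plausible_f5).isoformat())
```

A value three orders of magnitude smaller than a current epoch-ms timestamp is a strong hint that the field is something else, which is consistent with the decision to key the upsert on modified_at.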

What did NOT ship (deliberately, per spec)

  • Cloud mirroring (Drive/Dropbox/S3). Local folder is canonical.
  • Compression flag. Grep-ability wins.
  • Schema-migration tooling. When we bump schema_version, then we design.
  • Async TrashSink (spec §9.7 open question).
  • year grain (oversized for the bulk endpoint by design).

Test plan for the reviewer

⛔ DO NOT MERGE until reviewed by Eric.

Per Eric's explicit instruction: this PR is for review only. Eric will merge it manually once satisfied.

phitoduck and others added 12 commits on June 12, 2026 18:16

Captures the design for `loseit backup` / `loseit restore-backup` and
the trash-on-delete safety, plus a parallel-tracked implementation
plan with BDD acceptance scenarios written in CLI language.

The plan was originally drafted under the assumption that Lose It! had
only a per-day diary endpoint. A live capture (clicking "My Week" on
the home dashboard) confirmed that
`getDailyDetailsIncludingPendingForDateRange` is real and dispatched in
production. Adding T0 to decode it flips the fetch primitive from N
RPCs/grain to 1.

Sanitized request/response fixtures included; both have user_id/
user_name replaced with the test-config placeholders so the conformance
suite can byte-compare against them.

Reviewer asked for the BDDs to show what files should look like, not
just that they exist. Each scenario now spells out:

  - Exact TOON top-level keys in order (schema_version, account,
    grain, generated_at, entries[N]), with placeholder values for
    fields whose exact contents are environmental (timestamps, hex
    IDs, email-like strings).
  - Concrete stdout snippets per scenario — the "fetch / fallback /
    skip" rows and the trailing summary block.
  - The trash.jsonl line shape as a JSON object with every required
    key, including the nested entry block.
  - The diary --output json shape including created_at / modified_at.

The goal is interface validation: each BDD now tells a reviewer the
exact surface a passing implementation must produce, not just "the
file exists." Specific values (counts, paths) are exact where they
must be; environmental values are abstract.

Implements track T1 of the backup-spec: the on-disk contract for
grain files, foods cache, and index file (spec §4). Adds:

- `lose_it.backup._fs` — dataclasses (GrainDoc, FoodsDoc, IndexDoc,
  GrainEntry, FoodCacheEntry, AccountRef, GrainBounds), read/write
  pairs through `toon_format`, schema-version guard
  (`SchemaVersionMismatch`), and an atomic-write primitive
  (`atomic_write_text`: tmp -> fsync -> os.replace, leaves no
  `*.tmp*` siblings).
- Entries are canonically sorted on write by
  (day_num asc, meal_ordinal asc, created_at asc) so diffs between
  two snapshots are stable.
- Top-level key order is pinned to spec §4 in every writer so the
  CLI BDDs that grep the first lines hold.
- Bumps `version.txt` to 0.3.0 (minor: new module).

Tests: 13 hermetic conformance cases covering round-trip,
key-order, sort-on-write, schema-version refusal on both reader
and writer, and the atomic-write postcondition.
Adds the wire decoder for getDailyDetailsIncludingPendingForDateRange
and exposes it through LoseIt.diary_range so callers can pull a window
of diary days with a single RPC instead of N per-day fetches. Also
defines TooMuchData, which signals a 413/429/5xx response so the T2
fetch primitive can catch it and retry with a smaller grain.

- core.daily: build_range_payload, get_daily_details_range,
  parse_entries_by_day, TooMuchData
- core.init: get_init_day_keys (parses full day_num/day_key window
  from a single getInitializationData response)
- client: LoseIt.diary_range bootstraps the day-key cache lazily; a
  full backup loop now costs 1 init + N range RPCs, not N*per-day
- conformance fixture for the request (byte-pinned) + response
)

Routes every delete through a trash sink BEFORE the wire call fires,
so deleted entries are always recoverable. Per docs/backup-spec.md §9.

New module: src/lose_it/trash.py
  - TrashSink Protocol (runtime_checkable)
  - TrashReceipt, DeleteResult, DeleteSafetyError
  - LocalFileTrashSink — appends JSONL to
    ~/.local/share/loseit/trash.jsonl with chmod 600.
  - ConsoleTrashSink — TOON/JSON to stdout/stderr.
  - ChainedTrashSink — fan-out; all-must-succeed, no rollback (§9.7 q3).

SDK rewrite:
  - LoseIt.delete_entry(entry, *, trash_sink=..., acknowledge_no_trash,
    confirm) -> DeleteResult. Default sink: LocalFileTrashSink.
    Stash succeeds first; only then does the wire delete fire. If the
    sink raises, the wire call is skipped and the exception propagates.
  - LoseIt.restore_trash(*, trash_file, line, keep, dry_run) — replays
    a trash record through log_food and (optionally) consumes the line.

CLI:
  - loseit delete grows --trash-file / --print-deleted / --no-trash /
    --i-know-this-is-unrecoverable. --no-trash without ack exits 2 with
    the exact BDD-pinned stderr.
  - loseit restore-trash — new command. Default consumes the last line.

Tests:
  - tests/conformance/test_trash.py — unit tests pin the stash-before-
    wire-call invariant + the chained-sink no-rollback contract.
  - tests/conformance/test_cli_trash.py — Typer CliRunner integration.
  - test_cli{,_toon}.py env fixtures isolate HOME so the default sink
    writes inside tmp_path instead of the developer's real homedir.

Version: 0.2.0 -> 0.3.0 (minor — adds visible CLI surface).
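The stash-before-wire-call invariant can be sketched as below. Everything here is a hypothetical stand-in: the `stash` method name, `JsonlTrashSink`, `delete_with_trash`, and the record shape are assumptions for illustration, not the API in src/lose_it/trash.py.

```python
import json
import os
from pathlib import Path
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class TrashSink(Protocol):
    """Anything that can durably record an entry before it is deleted."""

    def stash(self, record: dict) -> None: ...


class JsonlTrashSink:
    """Appends one JSON object per line to a local file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def stash(self, record: dict) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Request mode 0600 at creation time so the file is user-private.
        fd = os.open(self.path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        with os.fdopen(fd, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(record, sort_keys=True) + "\n")


def delete_with_trash(
    entry: dict, sink: TrashSink, wire_delete: Callable[[dict], None]
) -> None:
    """Stash first; only call the wire delete if the stash succeeded."""
    sink.stash(entry)    # if this raises, wire_delete is never reached
    wire_delete(entry)
```

Because the wire call sits after the stash with no exception handling in between, a failing sink leaves the server untouched, which is the property the conformance tests are described as pinning.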
T4 of the backup-spec implementation plan. Pulls FLE.f4 / FLE.f5
(GWT epoch-ms longs) into ``FoodLogEntry.created_at`` /
``modified_at`` as aware UTC datetimes, and projects them into
``to_dict()`` as ISO 8601 strings. Unblocks T7 (safe-mode restore),
whose upsert match key is (food_id, created_at +/- 10 minutes).

Bumps version 0.2.0 -> 0.3.0 (visible field surface on a public
dataclass). New fields default to ``None`` so existing fixtures
and tests continue to construct ``FoodLogEntry`` without supplying
timestamps.

main shipped v0.3.0 via PR #60 while delete-safeguards was being
integrated; bumping to 0.4.0 keeps the tag-on-merge CI happy when
the eventual delete-safeguards -> main PR lands.

Adds `lose_it.backup.discover_earliest_day` (T3 of the backup-spec impl
plan): a pure-logic probe that finds the earliest day a user has diary
entries on. The algorithm uses the bulk range RPC (T0) hierarchically:

- one yearly `diary_range(Jan-1, Dec-31)` probe per candidate year,
- once a year hits, monthly `diary_range(month_start, month_end)` probes,
- once a month hits, day-by-day `diary(d)` walk to the exact day.

Day-by-day rather than binary-search is load-bearing (spec §5.2): a
user who only logged Aug-14 and Aug-15 of their first month would be
silently skipped by a midpoint probe.

Falls back to a 12-month monthly fan-out when a yearly probe raises
`TooMuchData` (the heavy-logger spec §5 branch).

Hermetic conformance tests cover the four spec-called-out scenarios
(typical profile, no entries ever, late-year start, oversize year)
plus the day-by-day walk invariant, driven by a `FakeLoseIt` double.

Bumps version.txt to 0.5.0 ahead of the backup feature integration.
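The year -> month -> day probe order can be sketched as follows. This is a simplified illustration, not `discover_earliest_day` itself: the name `find_earliest_day`, the boolean `probe` callable, and the explicit year bounds are assumptions, and the `TooMuchData` monthly fan-out is left out.

```python
import calendar
from datetime import date, timedelta
from typing import Callable, Optional

# Hypothetical probe: returns True if any diary entry exists in [start, end].
RangeProbe = Callable[[date, date], bool]


def find_earliest_day(
    probe: RangeProbe, first_year: int, last_year: int
) -> Optional[date]:
    """Return the earliest day with entries, or None if there are none."""
    for year in range(first_year, last_year + 1):
        if not probe(date(year, 1, 1), date(year, 12, 31)):
            continue  # nothing logged this year
        for month in range(1, 13):
            last_dom = calendar.monthrange(year, month)[1]
            if not probe(date(year, month, 1), date(year, month, last_dom)):
                continue  # nothing logged this month
            # Walk every day in order; a midpoint probe could miss a
            # month whose only entries sit on a couple of days.
            day = date(year, month, 1)
            while day.month == month:
                if probe(day, day):
                    return day
                day += timedelta(days=1)
    return None
```

For a user whose only entries are on Aug-14 and Aug-15, the walk checks Aug-1 through Aug-14 in order and returns Aug-14.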
…retry) (#68)

Adds T2 of the backup track: src/lose_it/backup/_fetch.py with the
Grain value type (day/week/month constructors + canonical splitter),
fetch_grain (one diary_range call per grain, recurses on TooMuchData
through month -> weeks -> days, re-raises if the day-grain floor
fails), update_food_cache (spec §6.3 once-per-UTC-day describe gate
with today_utc injectable for tests), and the to_grain_entry /
grain_entry_sort_key helpers used by the T6 orchestrator.

Sort key substitutes modified_at for created_at: T4's empirical
analysis showed FoodLogEntry.created_at (FLE.f4) is not a real
timestamp; only modified_at (FLE.f5) is a real UTC epoch. The
substitution lives in grain_entry_sort_key with a comment, and
test_to_grain_entry_uses_modified_at_when_created_at_is_bogus pins
the decision.

Tests: 20 new hermetic conformance cases covering clean-fetch RPC
count, month->week and week->day recursive fallback, day-grain
abort, describe cadence (gate hit + gate miss + dedupe + new id),
and the sort-key invariants. Full suite is 232 passing.
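A rough sketch of the split-and-retry recursion follows. The names `Span` and `fetch_span`, the 7-day chunking, and the local `TooMuchData` class are assumptions made for this example; the real `Grain` splitter may well use calendar weeks and a different return type.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


class TooMuchData(Exception):
    """Local stand-in for the SDK exception of the same name."""


@dataclass(frozen=True)
class Span:
    start: date
    end: date  # inclusive

    def split(self) -> list[Span]:
        """Split into 7-day chunks, or single days once at most a week long."""
        step = 7 if (self.end - self.start).days >= 7 else 1
        parts, cursor = [], self.start
        while cursor <= self.end:
            chunk_end = min(cursor + timedelta(days=step - 1), self.end)
            parts.append(Span(cursor, chunk_end))
            cursor = chunk_end + timedelta(days=1)
        return parts


def fetch_span(span: Span, fetch_range: Callable[[date, date], list]) -> list:
    """Try one range call; on TooMuchData, recurse into smaller spans."""
    try:
        return fetch_range(span.start, span.end)
    except TooMuchData:
        if span.start == span.end:
            raise  # day-grain floor: nothing smaller to try
        entries: list = []
        for part in span.split():
            entries.extend(fetch_span(part, fetch_range))
        return entries
```

In the common case this costs one call per grain; the extra calls appear only for ranges the server rejects.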
#69)

Adds `lose_it.backup._upsert` with the pure-function half of T7:
upsert_match (boolean) and plan_day (per-day matched/missing partition)
for safe-mode restore. Per the empirical FoodLogEntry analysis,
modified_at substitutes for created_at as the time half of the key — the
captured f4 values are not real epoch-ms timestamps. plan_day enforces
the additive-only contract (spec §7.4): server-only entries are never
enumerated, and each server entry can claim at most one archive entry.

Bumps version.txt to 0.5.0.
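The matching rule and the additive-only partition can be sketched like this. The function names come from the commit message, but the signatures, the `Entry` type, the inclusive window boundary, and the first-match claiming order are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Entry:
    food_id: str
    modified_at: datetime


def upsert_match(
    archive: Entry, server: Entry, window: timedelta = timedelta(minutes=10)
) -> bool:
    """Same food, and modified_at within the window (inclusive here)."""
    return (
        archive.food_id == server.food_id
        and abs(archive.modified_at - server.modified_at) <= window
    )


def plan_day(
    archive_entries: list[Entry],
    server_entries: list[Entry],
    window: timedelta = timedelta(minutes=10),
) -> tuple[list[Entry], list[Entry]]:
    """Partition archive entries into (matched, missing) for one day."""
    unclaimed = list(server_entries)
    matched: list[Entry] = []
    missing: list[Entry] = []
    for entry in archive_entries:
        hit = next((s for s in unclaimed if upsert_match(entry, s, window)), None)
        if hit is None:
            missing.append(entry)   # would be re-logged by the restore
        else:
            unclaimed.remove(hit)   # a server entry is claimed at most once
            matched.append(entry)
    # Server-only entries remain in `unclaimed` and are never returned.
    return matched, missing
```

Only the `missing` list drives writes, so a restore built on this partition can add entries but never touches ones that exist only on the server.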
Composes T1 + T2 + T3 into the end-to-end ``LoseIt.backup`` flow plus
the cheap-mode restore (``--skip-restore-on-nonempty-grain-time-ranges``).
Safe-mode restore raises ``NotImplementedError`` with a clear pointer at
the cheap-mode flag until T7 ships.

* ``src/lose_it/backup/_orchestrator.py`` — new module owning the
  backup walk, resume logic, discovery cache handshake, and cheap-mode
  restore loop. Silent — all per-grain decisions go through an optional
  ``progress(report)`` callback.
* ``client.py`` — adds ``LoseIt.backup`` and ``LoseIt.restore_backup``
  façades; both are 1-1 forwarders to the orchestrator.
* ``tests/conformance/test_backup_orchestrator.py`` — 16 hermetic
  tests using a structural ``FakeLoseIt`` double.
* ``version.txt`` → 0.6.0 per the impl plan's track-T6 bookkeeping.
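For contrast with safe mode, the cheap-mode rule (skip any grain whose time range already has server data) can be sketched as a short loop. The `ArchivedGrain` shape, the two injected callables, and the summary keys are hypothetical; the orchestrator's real report types are not shown in this PR text.

```python
from datetime import date
from typing import Callable, Iterable, TypedDict


class ArchivedGrain(TypedDict):
    start: date
    end: date
    entries: list[dict]


def restore_cheap(
    grains: Iterable[ArchivedGrain],
    server_has_entries: Callable[[date, date], bool],
    log_entry: Callable[[dict], None],
) -> dict[str, int]:
    """Restore whole grains only where the server range is empty."""
    summary = {"restored": 0, "skipped": 0, "entries_logged": 0}
    for grain in grains:
        if server_has_entries(grain["start"], grain["end"]):
            summary["skipped"] += 1  # non-empty range: leave it alone
            continue
        for entry in grain["entries"]:
            log_entry(entry)
            summary["entries_logged"] += 1
        summary["restored"] += 1
    return summary
```

The trade-off is granularity: one existence check per grain is cheap, but a grain with even a single server entry is skipped entirely.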
…store (#71)

* sdk: hide raw PK arrays from FoodLogEntry.to_dict(); add round-trip functional test (#60)

* sdk: hide raw PK arrays from FoodLogEntry.to_dict(); add round-trip functional test

`FoodLogEntry.to_dict()` projected the food/entry SimplePrimaryKeys
verbatim as 16-int byte arrays — noise that no caller of the JSON/TOON
output actually consumed:

- Food PK: never crosses the SDK boundary as bytes. The only external
  use case is round-tripping a food reference, which `food_id` (hex)
  already serves via `LoseIt.{get_food,describe_food,log_food}`.
- Entry PK: no LoseIt RPC accepts it as input on its own. Even
  `deleteFoodLogEntry` requires the full entry body, which forces a
  fresh diary read; the entry PK comes along for the ride. There is
  no external workflow where a caller can act on an entry PK alone.

So `to_dict()` now emits `food_id` (32-char hex, same shape as
`FoodSearchResult.food_id`) and drops both raw `pk` arrays + the
entry-side hex entirely. The raw bytes stay on the dataclass where
the envelope builders need them.

Added a live-API functional test (`tests/functional/test_readme_example.py`)
mirroring the README's SDK example: pinned to 2018-03-15 for isolation,
asserts empty → log → present → delete → empty using only the documented
SDK surface (`search`/`log_food`/`diary`/`delete_entry`). Matching
diary entries by `food_id` proves that the externally visible identifier
is sufficient for the round-trip — no PK bytes ever escape the SDK.

Also extended the README example to use `food_id` for the match-and-delete
loop instead of a name substring, so the docs and the test stay aligned.

* test(functional): use servings-based second log; gram path needs gram-stored food

The README's SDK example logs `serving_amount=61, serving_unit=ServingUnit.g`
as the second log, but the first hit for `li.search("tortilla")` is the
Xtreme Wellness wrap — serving-stored with no per-serving-g cross-class
slot — so the gram path raises `PortionError` before the diary read-back
even runs.

The round-trip test isn't gating on the unit-conversion code (that's
covered by `test_entries_serving_unit.py` in the conformance suite). It's
gating on `search → log → diary → delete → diary` only needing `food_id`
to round-trip — no PK arrays. So the second log now uses `servings=2.0`
instead, which works regardless of the food's native unit.

Verified live against the real API on 2018-03-15: passes.

* chore: bump version to 0.3.0

* feat(cli+sdk): backup / restore-backup CLI + safe-mode upsert restore

Wires T7's pure plan_day function into the orchestrator as
restore_backup_safe, exposes it through LoseIt.restore_backup (replacing
the prior NotImplementedError stub), and adds the loseit backup and
loseit restore-backup CLI commands per spec §3.

* src/lose_it/backup/_orchestrator.py: new restore_backup_safe + a
  SafeRestoreGrainReport report shape; RestoreSummary gains per-day
  counters that safe mode populates.
* src/lose_it/client.py: restore_backup default routes through safe
  mode, with skip_restore_on_nonempty_grain_time_ranges=True falling
  back to cheap mode. New upsert_window kwarg surfaces the ±10m fuzz.
* src/lose_it/cli.py: adds backup, restore-backup commands. Renders
  per-grain rows + summary block per spec §3.1 / §3.2; supports
  --dry-run, --quiet-skips, -o text|json|toon.
* tests/conformance/test_backup_restore_safe.py: 7 unit tests for
  safe-mode (missing → log, idempotent re-run, ±10m window, additive
  vs. server-only, dry-run, strict_account, pk round-trip).
* tests/conformance/test_cli_backup.py: 7 CLI integration tests
  hermetic via monkeypatched _open_loseit.
* version.txt: 0.6.0 → 0.7.0.

phitoduck changed the title from "docs(backup): spec + impl plan + range-RPC wire confirm" to "Backup / restore-backup / trash — data-safety v2 (8 tracks, 276 tests)" on Jun 13, 2026

Main shipped PR #60 (0.3.0) during the 8-track integration.
delete-safeguards bumped through 0.4.0, 0.5.0, 0.6.0, 0.7.0 in
successive track merges. Resolving the version.txt conflict in favor
of 0.7.0 since CI tags v{version} on merge to main, and v0.3.0 is
already taken.

…xpand delete

Documents the new commands shipped in delete-safeguards (8-track impl):
- delete: trash-sink behavior + recovery flow + --trash-file / --no-trash flags
- backup: rolling archive to ~/.local/share/loseit/backup/YYYY/MM.toon,
  one RPC per grain via the bulk-fetch endpoint, --quiet-skips, --dry-run
- restore-backup: safe-mode (upsert by food_id + modified_at ± 10m, the
  empirical correction for FoodLogEntry.f4 not being a real timestamp) +
  cheap-mode
- restore-trash: undo the most recent loseit delete, --line/--keep/--dry-run

Also updates the TOC and the loseit --help command-list rendering.

phitoduck merged commit 4a21859 into main on Jun 13, 2026
3 of 4 checks passed
phitoduck added a commit that referenced this pull request Jun 13, 2026
Two ruff failures kept CI red on main after the delete-safeguards
merge (PR #62) and the path-consolidation merge (PR #72):

  - UP035: src/lose_it/backup/_upsert.py imported Iterable from typing.
    Pyupgrade rule says use collections.abc.Iterable.
  - I001: after the swap, the import block became un-sorted.

Both fixed; ruff check is now clean. Tests stay at 276/276.
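For reference, the shape of the UP035 fix looks like the snippet below. The function is a made-up example rather than code from `_upsert.py`; only the import line reflects the change described.

```python
# Before (flagged by UP035): from typing import Iterable
from collections.abc import Iterable


def count_items(items: Iterable[str]) -> int:
    """Count the elements of any iterable of strings."""
    return sum(1 for _ in items)
```

Moving the import also changes where it sorts within the import block, which is why the I001 fix followed from the same change.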