Design breeder data model/import strategy to preserve dog-level legacy breeder details #269

@asku1990

Summary

The current legacy import preserves bearek_id.KASVA as text on the dog record, but loses the dog-level breeder address fields whenever no canonical Breeder link is created. We need a production-safe model that preserves all per-dog legacy breeder facts while still supporting canonical kennel data from the kennel table.

Problem

  • Canonical Breeder is imported from the kennel table.
  • The dog-level breeder link rate is currently extremely low.
  • For unlinked dogs, the profile has only the breeder name text and lacks the address/person details that exist in the legacy dog rows.
  • We are mixing two semantics:
    • canonical kennel registry data
    • dog-record breeder snapshot data

Current observations

  • Dog.breederId links are rare.
  • Dog.breederNameText exists, but no per-dog breeder address snapshot fields are imported from bearek_id.
  • The kennel table may represent only the current canonical list, whereas dog rows hold historical per-dog registry snapshot values.
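
To put a number on "rare", the link rate could be measured before and after any model change. The following is a minimal sketch, assuming dog rows have already been loaded into memory; only breederId and breederNameText come from this issue, while DogRow, LinkStats, and breederLinkStats are hypothetical names used for illustration.

```typescript
// Hypothetical minimal dog shape; only breederId and breederNameText
// are taken from the issue, everything else is an assumption.
interface DogRow {
  id: number;
  breederId: number | null;
  breederNameText: string | null;
}

interface LinkStats {
  total: number;
  linked: number; // dogs with a canonical Breeder link
  nameTextOnly: number; // unlinked dogs that still carry breeder name text
  noBreederInfo: number; // unlinked dogs with no breeder text at all
  linkRate: number; // linked / total, 0 when there are no dogs
}

function breederLinkStats(dogs: DogRow[]): LinkStats {
  let linked = 0;
  let nameTextOnly = 0;
  let noBreederInfo = 0;

  for (const dog of dogs) {
    const hasName = (dog.breederNameText ?? "").trim() !== "";
    if (dog.breederId !== null) {
      linked++;
    } else if (hasName) {
      nameTextOnly++;
    } else {
      noBreederInfo++;
    }
  }

  const total = dogs.length;
  return {
    total,
    linked,
    nameTextOnly,
    noBreederInfo,
    linkRate: total === 0 ? 0 : linked / total,
  };
}
```

The nameTextOnly bucket is the population that would benefit most from a snapshot, since those dogs have breeder text but no canonical link to carry address details.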

Proposal to evaluate

  • Keep canonical Breeder (kennel source) as-is.
  • Add a dedicated dog-level snapshot entity (e.g. DogBreederSnapshot) sourced from bearek_id (KASVA, KOSO, KOSOPOS, KOSOPAI, etc.).
  • Always persist a snapshot for every dog record, whether or not a canonical link exists.
  • Keep canonical link optional with explicit match metadata:
    • canonicalBreederId (nullable)
    • matchMethod
    • matchConfidence
  • The UI should show the snapshot as the source of truth for dog history, and the canonical link as enrichment when available.
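
One possible shape for the proposed entity is sketched below. It is not a decided schema: the legacy column names (KASVA, KOSO, KOSOPOS, KOSOPAI) and the fields canonicalBreederId, matchMethod, and matchConfidence come from this issue, while the value types, the MatchMethod values, and the helpers clean and buildSnapshot are assumptions for illustration. The legacy values are kept under their raw column names so the sketch does not guess at their meaning.

```typescript
// Legacy row from bearek_id. Column names are from the issue;
// the string-or-null value types are an assumption.
interface LegacyDogRow {
  KASVA?: string | null;
  KOSO?: string | null;
  KOSOPOS?: string | null;
  KOSOPAI?: string | null;
}

// Hypothetical set of match methods; the issue only names the field.
type MatchMethod = "none" | "exact_name" | "normalized_name" | "manual";

interface DogBreederSnapshot {
  dogId: number;
  // Raw legacy values, one field per source column, so nothing is lost.
  kasva: string | null;
  koso: string | null;
  kosopos: string | null;
  kosopai: string | null;
  // Optional canonical link with explicit match metadata.
  canonicalBreederId: number | null;
  matchMethod: MatchMethod;
  matchConfidence: number | null;
}

// Trim whitespace and turn empty or missing values into null.
function clean(value: string | null | undefined): string | null {
  const trimmed = (value ?? "").trim();
  return trimmed === "" ? null : trimmed;
}

// Build a snapshot for every dog, starting out unlinked.
function buildSnapshot(dogId: number, row: LegacyDogRow): DogBreederSnapshot {
  return {
    dogId,
    kasva: clean(row.KASVA),
    koso: clean(row.KOSO),
    kosopos: clean(row.KOSOPOS),
    kosopai: clean(row.KOSOPAI),
    canonicalBreederId: null,
    matchMethod: "none",
    matchConfidence: null,
  };
}
```

Creating the snapshot unlinked and filling in the canonical link in a later step keeps the import independent of matching quality, so a failed or skipped match never drops legacy data.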

Acceptance criteria

  • Decision doc for target data model and semantics (canonical vs snapshot).
  • Chosen migration approach for existing imported data.
  • Defined import/backfill strategy for missing dog-level breeder fields.
  • Defined admin/public UI behavior when canonical link exists vs missing.
  • Defined matching strategy + safety threshold + review/reporting.
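
As a starting point for the matching criterion, the sketch below shows how a safety threshold and a review flag could fit together. It assumes name-only matching against canonical breeders; the confidence scores (1.0 for an exact match, 0.8 for a normalized match), the default threshold of 0.9, and the names CanonicalBreeder, MatchResult, normalizeName, and matchBreeder are all illustrative assumptions rather than agreed values.

```typescript
// Hypothetical minimal canonical breeder shape.
interface CanonicalBreeder {
  id: number;
  name: string;
}

interface MatchResult {
  canonicalBreederId: number | null; // set only when auto-linking is safe
  matchMethod: "none" | "exact_name" | "normalized_name";
  matchConfidence: number;
  needsReview: boolean; // candidates exist, but not safe enough to auto-link
}

// Lowercase and collapse whitespace; a real normalizer may need more rules.
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/\s+/g, " ").trim();
}

function matchBreeder(
  snapshotName: string | null,
  breeders: CanonicalBreeder[],
  autoLinkThreshold = 0.9, // assumed default, to be decided in the doc
): MatchResult {
  const none: MatchResult = {
    canonicalBreederId: null,
    matchMethod: "none",
    matchConfidence: 0,
    needsReview: false,
  };
  if (snapshotName === null || snapshotName.trim() === "") return none;

  const wanted = normalizeName(snapshotName);
  const exact = breeders.filter((b) => b.name === snapshotName);
  const normalized = breeders.filter((b) => normalizeName(b.name) === wanted);
  const candidates = exact.length > 0 ? exact : normalized;
  if (candidates.length === 0) return none;

  const method: MatchResult["matchMethod"] =
    exact.length > 0 ? "exact_name" : "normalized_name";
  // Illustrative scoring: ambiguity divides the base score.
  const base = method === "exact_name" ? 1.0 : 0.8;
  const confidence = base / candidates.length;

  // Auto-link only a single candidate at or above the threshold.
  let canonicalBreederId: number | null = null;
  const only = candidates.length === 1 ? candidates[0] : undefined;
  if (only !== undefined && confidence >= autoLinkThreshold) {
    canonicalBreederId = only.id;
  }

  return {
    canonicalBreederId,
    matchMethod: method,
    matchConfidence: confidence,
    needsReview: canonicalBreederId === null,
  };
}
```

Results with needsReview set could feed the review/reporting output, while results below the threshold leave the snapshot unlinked rather than risk a wrong canonical link.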

Open questions

  • Do we trust kennel as “current canonical” only, or also as a historical record?
  • Should the snapshot live 1:1 with the dog, or as versioned history rows?
  • Which fields are required in MVP snapshot schema?
  • How should mismatch/conflict be surfaced in admin UI?
  • Do we need manual linking workflow/tools?

Priority

High (pre-production data integrity risk).

Labels

bug
