Skip to content

fix: Reduce synthetic single-parent FAMS in GEDCOM roundtrip#581

Open
isaacschepp wants to merge 26 commits intomainfrom
fix/fams-synthetic-family-roundtrip
Open

fix: Reduce synthetic single-parent FAMS in GEDCOM roundtrip#581
isaacschepp wants to merge 26 commits intomainfrom
fix/fams-synthetic-family-roundtrip

Conversation

@isaacschepp
Copy link
Copy Markdown
Collaborator

What and why

Fixes the +124 FAMS surplus in the queen test file and +22 in british-royalty during GED → GLX → GED roundtrip. The root cause: when a child has only one parent-child relationship (the other parent's link wasn't imported), the GEDCOM export created a synthetic single-parent FAM instead of placing the child in the existing marriage FAM.

The fix

In buildFamilyGroups, when a single-parent child's parent has families but all were skipped (because they have pair-matched children), use the first family as best-effort placement instead of falling through to createSyntheticFamily.

Before: child with missing father-link → skip marriage FAM (has paired children) → create synthetic FAM → extra FAMS on mother
After: child with missing father-link → skip marriage FAM → no unpaired family found → use first family (the marriage) → no extra FAMS

Trade-off

Children from genuinely different unions (e.g., father has children outside marriage) will now be placed in the marriage FAM rather than a separate synthetic FAM. The algorithm cannot distinguish "missing parent-link" from "different union" without metadata. This trade-off favors roundtrip fidelity (the common case) over the edge case.

Related issues

Fixes #486

Testing

  • New test: TestReconstructFamilies_SingleParentChildJoinsExistingFamily — verifies princess joins existing marriage FAM instead of getting synthetic one
  • Updated test: TestReconstructFamilies_SingleParentWithExistingFamily — expectations updated to match new behavior
  • All existing tests pass (0 regressions)

Breaking changes

None. GEDCOM export produces fewer spurious FAM records, which is strictly an improvement for roundtrip fidelity.

…thetic ones

When a child has only one parent-child relationship and that parent's
families all have pair-matched children, the export algorithm now uses
the first family as best-effort placement instead of creating a
synthetic single-parent FAM. This fixes the +124 FAMS surplus in the
queen test file roundtrip.

Trade-off: children from genuinely different unions (father has children
outside marriage) will now be placed in the marriage FAM rather than a
separate synthetic FAM. This is less accurate for that edge case but
eliminates spurious FAMS references that break roundtrip fidelity for
the common case (missing parent-link, not different union).

Fixes #486
Copilot AI review requested due to automatic review settings March 31, 2026 05:20
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages bot commented Mar 31, 2026

Deploying genealogix with  Cloudflare Pages  Cloudflare Pages

Latest commit: 4295423
Status: ✅  Deploy successful!
Preview URL: https://6683e075.genealogix.pages.dev
Branch Preview URL: https://fix-fams-synthetic-family-ro.genealogix.pages.dev

View logs

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reduces spurious synthetic single-parent FAM records produced during GLX → GEDCOM export, improving GED → GLX → GED roundtrip fidelity by preferentially placing single-parent children into an existing spouse family when appropriate (fix for #486).

Changes:

  • Adjusted reconstructFamilies fallback logic to use an existing family (best-effort) instead of creating a synthetic single-parent family in the “all families skipped due to paired children” scenario.
  • Updated an existing single-parent reconstruction test to reflect the new (roundtrip-fidelity-biased) behavior.
  • Added a new regression test covering the missing-second-parent-link case that previously triggered synthetic family creation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
go-glx/gedcom_export_family.go Updates family matching fallback to avoid creating synthetic single-parent families when a suitable existing family is available.
go-glx/gedcom_export_family_test.go Updates expectations for the changed behavior and adds a regression test for #486.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot correctly noted the skip-then-fallback was redundant. Simplified
to gate the continue on len(familyIndices) > 1 — when a parent has only
one family, always place the child there. Comment updated to match.
@isaacschepp
Copy link
Copy Markdown
Collaborator Author

Both Copilot items addressed in d6289c4:

  1. Skip logic simplified — Adopted Copilot's suggestion: gate the continue on len(familyIndices) > 1 instead of skip-then-fallback. When a parent has only one family, always place the child there. Removes 7 lines of fallback code. Comment updated to match.

  2. PR description function namebuildFamilyGroups should be reconstructFamilies. Will fix the PR body.

Copilot AI review requested due to automatic review settings March 31, 2026 13:44
@isaacschepp isaacschepp review requested due to automatic review settings March 31, 2026 13:44
Copilot AI review requested due to automatic review settings March 31, 2026 15:03
@isaacschepp isaacschepp review requested due to automatic review settings March 31, 2026 15:03
Copilot AI review requested due to automatic review settings March 31, 2026 17:57
@isaacschepp isaacschepp review requested due to automatic review settings March 31, 2026 17:57
Copilot AI review requested due to automatic review settings March 31, 2026 19:28
@isaacschepp isaacschepp review requested due to automatic review settings March 31, 2026 19:28
@isaacschepp
Copy link
Copy Markdown
Collaborator Author

The queen.ged test file is at glx/testdata/gedcom/5.5.1/large-files/queen.ged (105K lines, 4683 INDI, 2863 FAM, 4871 FAMS references).

The "+124 FAMS" number came from the original todo.md — it was a manual measurement from an earlier version of the import/export code. There is no automated E2E test that roundtrips queen.ged and compares FAMS counts.

The unit test in this PR validates the algorithm fix at the reconstructFamilies level (single-parent child joins existing family instead of creating synthetic FAM). To verify against queen.ged specifically, you'd need to:

glx import glx/testdata/gedcom/5.5.1/large-files/queen.ged -o /tmp/queen.glx
glx export /tmp/queen.glx -o /tmp/queen-rt.ged
# Compare FAMS counts
grep -c '1 FAMS' glx/testdata/gedcom/5.5.1/large-files/queen.ged
grep -c '1 FAMS' /tmp/queen-rt.ged

If the counts don't match, the remaining gap may be from a different root cause than the one this PR fixes (e.g., the ASSO import in #589 may also affect participant reconstruction). Happy to add an E2E roundtrip FAMS count assertion if you want.

@noahbjohnson
Copy link
Copy Markdown
Contributor

noahbjohnson commented Mar 31, 2026

Review — tested against all GEDCOM test files + westeros archive

I rebuilt both main (c7c0981) and this branch (dc283dd), then roundtripped every GEDCOM in glx/testdata/gedcom/5.5.1/ plus several from the corpus. Results:

File Persons main vs PR
queen.ged 4,683 Identical (MD5 match)
habsburg.ged 24,473 Identical (FAM/FAMS counts match)
british-royalty.ged 188 Identical
royal92.ged 3,010 Identical
AROYALTree.ged 3,010 Identical
bullinger.ged 948 Identical
kennedy.ged 70 Identical
shakespeare.ged 31 Identical
bronte.ged 14 Identical

The PR produces byte-identical output for every GEDCOM roundtrip test. The +124 FAMS number from the todo.md appears to be stale — the current import/export code doesn't exhibit that surplus on any of these files.

Where it does have an effect: hand-built GLX archives

Exporting the westeros archive (not a GEDCOM roundtrip — entities were created directly in GLX):

  • main: 182 families
  • PR: 180 families (2 synthetic FAMs eliminated)

The 2 eliminated families are Oberyn Martell's Sand Snakes. On main, Oberyn has two FAMs:

  • @F74@: Oberyn + Ellaria Sand → their 4 children (Dorea, Elia, Loreza, Obella)
  • @F168@: Oberyn alone → 4 Sand Snakes from different mothers (Nymeria, Obara, Sarella, Tyene)

On the PR branch, all 8 children are merged into @F74@ (Oberyn + Ellaria). This falsely implies Ellaria is the mother of the Sand Snakes — she isn't, each has a different paramour as their mother. The GLX data correctly has only an Oberyn→child parent-child link for each Sand Snake with no second parent, and the current code correctly represents that as a separate single-parent FAM.

Why the roundtrips are unaffected

The GEDCOM importer creates two parent-child relationships per child when the FAM has both HUSB and WIFE. So after a roundtrip, every child that was in a two-parent FAM has two parent links, and the pair-matching in reconstructFamilies succeeds. The len(parentIDs) == 1 branch only fires for children with genuinely one parent link — which are already in single-parent FAMs in the original GEDCOM.

Recommendation

I'd hold this PR. The bug it targets doesn't reproduce on current code, and the behavioral change it introduces is a net negative for archives with legitimate single-parent children (like Oberyn's bastards). The +124 number was probably valid against an older version of the import/export code but has since been fixed by other changes.

Copilot AI review requested due to automatic review settings April 1, 2026 02:00
@isaacschepp isaacschepp review requested due to automatic review settings April 1, 2026 02:00
Copilot AI review requested due to automatic review settings April 1, 2026 02:04
@isaacschepp isaacschepp review requested due to automatic review settings April 1, 2026 02:04
Copilot AI review requested due to automatic review settings April 2, 2026 05:34
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 05:34
Copilot AI review requested due to automatic review settings April 2, 2026 15:19
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 15:19
Copilot AI review requested due to automatic review settings April 2, 2026 15:26
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 15:26
Copilot AI review requested due to automatic review settings April 2, 2026 15:39
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 15:39
Copilot AI review requested due to automatic review settings April 2, 2026 15:48
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 15:48
Copilot AI review requested due to automatic review settings April 2, 2026 15:59
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 15:59
Copilot AI review requested due to automatic review settings April 2, 2026 16:16
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 16:16
Copilot AI review requested due to automatic review settings April 2, 2026 16:23
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 16:23
Copilot AI review requested due to automatic review settings April 2, 2026 16:29
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 16:29
Copilot AI review requested due to automatic review settings April 2, 2026 16:36
@isaacschepp isaacschepp review requested due to automatic review settings April 2, 2026 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GEDCOM roundtrip: extra FAMS references from synthetic single-parent families

3 participants