
Preserve within-file duplicates and simplify matching logic#34

Draft
danischm wants to merge 6 commits into main from ds-merge

@danischm
Member

Summary

Improves YAML list merging behavior to preserve within-file duplicates while simplifying the
matching logic. This makes the behavior more predictable and intuitive.

Changes

1. Preserve Within-File Duplicates (Order-Independent)

Previously, deduplicate=True would remove ALL duplicate list items globally, including
duplicates within a single file.

New behavior: If ANY file contains duplicates in a list, that entire list is concatenated
(no merging) to preserve all duplicates. This ensures:

  • ✅ All within-file duplicates preserved (from any file)
  • ✅ Order-independent results (same output regardless of file load order)
  • ✅ Predictable behavior ("duplicates disable merging")

Example:

# file1.yaml
devices:
  - name: switch1
  - name: switch1  # duplicate

# file2.yaml
devices:
  - name: switch1
    ip: 192.168.1.1

Before: 1 device (file1 duplicates lost)
After: 3 devices (all preserved, no merging)
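The "duplicates disable merging" rule can be sketched in Python as below. This is a minimal illustration, not the project's code: `items_match`, `has_duplicates`, and `merge_lists` are hypothetical names. The matching rule is an assumption: two dict items match when they share at least one primitive key with an equal value and no shared primitive key has a different value; non-dict items match on equality. Matched dicts are combined shallowly for brevity.

```python
from typing import Any


def items_match(a: Any, b: Any) -> bool:
    """Assumed matching rule (hypothetical sketch, not the project's code).

    Dicts match when they share at least one primitive key with an equal
    value and no shared primitive key has a different value. Keys present
    on only one side are ignored. Non-dict items match on equality.
    """
    if not (isinstance(a, dict) and isinstance(b, dict)):
        return a == b
    shared_equal = False
    for key, value in a.items():
        if key not in b:
            continue  # key unique to one side: does not block a match
        other = b[key]
        if isinstance(value, (dict, list)) or isinstance(other, (dict, list)):
            continue  # nested values are not used for matching
        if value != other:
            return False  # conflicting primitive value
        shared_equal = True
    return shared_equal


def has_duplicates(items: list[Any]) -> bool:
    """True if any two items in the same list match each other."""
    return any(
        items_match(items[i], items[j])
        for i in range(len(items))
        for j in range(i + 1, len(items))
    )


def merge_lists(dest: list[Any], src: list[Any]) -> list[Any]:
    """Merge src into a copy of dest; duplicates in either list disable merging."""
    if has_duplicates(dest) or has_duplicates(src):
        return dest + src  # plain concatenation keeps every item
    result = [dict(x) if isinstance(x, dict) else x for x in dest]
    for item in src:
        for existing in result:
            if items_match(item, existing):
                if isinstance(existing, dict):
                    existing.update(item)  # shallow combine for brevity
                break  # equal non-dict items are dropped (deduplicated)
        else:
            result.append(item)  # no match found: keep as a new item
    return result


file1 = [{"name": "switch1"}, {"name": "switch1"}]
file2 = [{"name": "switch1", "ip": "192.168.1.1"}]
print(len(merge_lists(file1, file2)), len(merge_lists(file2, file1)))  # 3 3
```

Because the duplicate check runs on both lists, swapping the arguments still produces a plain concatenation: the same three items, only in a different order.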

2. Relaxed Matching Logic

Previously, dict items would NOT merge if both sides had unique primitive keys. This prevented
useful scenarios like combining complementary configuration data.

New behavior: Items merge as long as they share at least one primitive key with matching values.
Both sides can have unique keys; those keys are combined into the merged item.

Example:

# file1.yaml
devices:
  - name: switch1
    vlan: 100

# file2.yaml
devices:
  - name: switch1
    port: eth0

Before: 2 separate items (both have unique keys)
After: 1 merged item {name: switch1, vlan: 100, port: eth0}
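A small sketch contrasting the old and new matching rules as described above. The function names (`compare`, `matches_old`, `matches_new`) are hypothetical, and treating a shared primitive key with differing values as a mismatch under both rules is an assumption that this PR does not state. Only primitive (non-dict, non-list) values take part in matching.

```python
from typing import Any


def _primitives(item: dict[str, Any]) -> dict[str, Any]:
    """Keep only primitive (non-dict, non-list) values; only these take part in matching."""
    return {k: v for k, v in item.items() if not isinstance(v, (dict, list))}


def compare(a: dict[str, Any], b: dict[str, Any]) -> tuple[bool, bool, bool, bool]:
    """Return (shared_equal, conflict, unique_a, unique_b) over primitive keys."""
    pa, pb = _primitives(a), _primitives(b)
    shared = pa.keys() & pb.keys()
    shared_equal = any(pa[k] == pb[k] for k in shared)
    conflict = any(pa[k] != pb[k] for k in shared)  # assumed to block a merge
    unique_a = bool(pa.keys() - pb.keys())
    unique_b = bool(pb.keys() - pa.keys())
    return shared_equal, conflict, unique_a, unique_b


def matches_old(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """Previous rule: unique primitive keys on BOTH sides prevented a merge."""
    shared_equal, conflict, unique_a, unique_b = compare(a, b)
    return shared_equal and not conflict and not (unique_a and unique_b)


def matches_new(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """Relaxed rule: one matching shared primitive key is enough."""
    shared_equal, conflict, _, _ = compare(a, b)
    return shared_equal and not conflict


item1 = {"name": "switch1", "vlan": 100}
item2 = {"name": "switch1", "port": "eth0"}
print(matches_old(item1, item2), matches_new(item1, item2))  # False True
if matches_new(item1, item2):
    print({**item1, **item2})  # {'name': 'switch1', 'vlan': 100, 'port': 'eth0'}
```

The only difference between the two predicates is the `not (unique_a and unique_b)` term, which is exactly what the relaxed rule drops.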

Implementation Details

  • Added _has_duplicates_in_list() helper function to detect duplicates using the same matching
    logic as item merging
  • Modified merge_dict() to check both source and destination lists for duplicates before merging
  • Updated docstrings with clear examples of the new behavior

Breaking Changes

  1. Duplicate preservation: Lists containing duplicates in ANY file will now be concatenated
    instead of merged, preserving all within-file duplicates. This may result in more items than
    before if you have duplicates and previously relied on cross-file merging.

  2. Relaxed matching: Items now merge when they share at least one primitive key with a matching
    value, even if both sides have unique keys. Complementary configuration data will be combined
    instead of kept separate. Items that previously stayed separate may now merge.
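To find lists affected by the duplicate-preservation change, a helper along these lines could report every list that contains matching items. It is a hypothetical sketch (`find_lists_with_duplicates` is not part of the project) that works on already-loaded YAML data and uses an assumed matching rule: dicts match when they share at least one primitive key with an equal value and no shared primitive key differs; other items match on equality. Reported lists are the ones that would presumably be concatenated instead of merged when another file defines the same list.

```python
from typing import Any, Iterator


def items_match(a: Any, b: Any) -> bool:
    """Assumed matching rule: dicts match on at least one equal shared primitive
    key and no conflicting one; other items match on equality."""
    if not (isinstance(a, dict) and isinstance(b, dict)):
        return a == b
    shared_equal = False
    for key in a.keys() & b.keys():
        x, y = a[key], b[key]
        if isinstance(x, (dict, list)) or isinstance(y, (dict, list)):
            continue  # nested values are ignored for matching
        if x != y:
            return False  # conflicting primitive value
        shared_equal = True
    return shared_equal


def find_lists_with_duplicates(data: Any, path: str = "") -> Iterator[str]:
    """Yield a dotted path for every list (at any depth) containing matching items."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from find_lists_with_duplicates(value, f"{path}.{key}" if path else str(key))
    elif isinstance(data, list):
        if any(
            items_match(data[i], data[j])
            for i in range(len(data))
            for j in range(i + 1, len(data))
        ):
            yield path
        for index, item in enumerate(data):
            yield from find_lists_with_duplicates(item, f"{path}[{index}]")


data = {
    "devices": [{"name": "switch1"}, {"name": "switch1"}],
    "vrfs": [{"name": "a"}, {"name": "b"}],
}
print(list(find_lists_with_duplicates(data)))  # ['devices']
```

With PyYAML installed, passing the result of `yaml.safe_load` on the first example's file1.yaml should likewise report `['devices']`.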

Testing

  • ✅ All existing unit tests pass (with updated expectations)
  • ✅ New test cases for both-sides-unique-keys scenario
  • ✅ Verified duplicate preservation from any file
  • ✅ Verified order-independence
  • ✅ Ruff linting passes
  • ✅ Mypy type checking passes

danischm marked this pull request as draft on November 12, 2025 10:42