Skip to content

fix(dashboard): drop ELK edges that reference nodes removed by orphan/cycle repair passes#456

Open
tirth8205 wants to merge 2 commits into
Egonex-AI:mainfrom
tirth8205:fix/elk-layout-orphan-edges
Open

fix(dashboard): drop ELK edges that reference nodes removed by orphan/cycle repair passes#456
tirth8205 wants to merge 2 commits into
Egonex-AI:mainfrom
tirth8205:fix/elk-layout-orphan-edges

Conversation

@tirth8205

Copy link
Copy Markdown
Contributor

Problem

  • repairElkInput is supposed to hand ELK a self-consistent graph, but the orphan-edge pass (step 4) validates edges against allIds, which is computed once by walking childrenB — the child set BEFORE step 3 removes orphan children and BEFORE step 5 removes containment-cycle nodes. So an edge whose endpoint is a node that gets dropped by step 3 or step 5 is NOT dropped: it survives in the returned edges array pointing at a node that no longer exists in children. Verified by reproducing the logic: input children [a, orphan(parentId:ghost)] with edge a -> orphan; after repair children is [a] (orphan dropped) yet edges still contains e1: a -> orphan. This dangling edge is exactly the kind of malformed input that can make elk.layout() throw, which then trips the elk-layout-failed fatal path the repair function exists to avoid.

Fix

  • Move the orphan-edge pass (step 4) to run AFTER the containment-cycle pass (step 5), and recompute the id set from the final child tree (childrenD) instead of reusing the stale allIds from childrenB. Concretely: rename the step-4 allIds use to a freshly-walked set, e.g. const finalIds = new Set(); const walkFinal = (children: ElkChild[]) => { for (const c of children) { finalIds.add(c.id); if (c.children) walkFinal(c.children); } }; walkFinal(childrenD); let orphanEdges = 0; const…

Testing

Adds unit test(s) that fail before the change and pass after. The full dashboard test suite, eslint, and tsc --noEmit all pass locally on this branch.

Found via a static correctness audit of the dashboard ELK layout repair.

🤖 Generated with Claude Code

…/cycle repair passes

The orphan-edge pass validated edges against an id set walked from the
pre-removal child tree (childrenB), so an edge whose endpoint is a node
later dropped by the orphan-child or containment-cycle passes survived,
pointing at a node no longer present in children. ELK can throw on such
dangling input, tripping the elk-layout-failed fatal path.

Move the orphan-edge pass to run after the containment-cycle pass and
validate edges against a freshly-walked id set from the final child tree
(childrenD).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@thejesh23 thejesh23 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few concerns before this lands.

1. Layout is still the de-facto deduper. Step 3 (orphan-child) and step 4 (cycle) know they're removing nodes but don't drop their own dangling edges — they leave it to step 5 to clean up. Worse, when step 3 filters a parent it drops c.children wholesale, so every edge into that subtree's descendants gets silently dropped by step 5 too. If the orphan/cycle passes returned {children, removedIds} the edge filter could be local to each pass and the reason for each drop would be recoverable.

2. Observability is lossy. The issue message is "Dropped N edge(s) referencing nonexistent nodes" and the category is elk-orphan-edge regardless of whether the missing endpoint was an original ghost, an orphan-child drop, or a cycle drop. Combined with appendLayoutIssues deduping by level|message (store.ts:776), a second layout run that drops a different single edge produces the same string and is swallowed — so diff-impact re-layouts won't surface fresh losses. At minimum, include edge ids or the missing-endpoint id in the message.

3. Test gaps. The new test covers the orphan-child case but not the cycle case (step 4 removing a node that an edge targets), nor the cascading case where step 3 drops a parent and an edge into a now-vanished grandchild gets quietly dropped by step 5. Both are reachable from the same root cause and would lock the contract this PR is establishing.

Nit: // 4. dropCircularContainment comment block was renumbered but the inline doc on step 5 says "step 3 or step 4" — fine until someone renumbers again. Consider naming the steps instead of numbering them.

…-pass contract

Include the dropped edge ids in the elk-orphan-edge issue message so
distinct losses produce distinct strings and survive store.ts
appendLayoutIssues' `level|message` dedupe — previously a re-layout that
dropped a different single edge yielded a byte-identical message and the
fresh loss was swallowed.

Document that dropOrphanEdges (step 5) is the single reconciliation point
that validates edges against the final post-removal child tree, covering
ghost endpoints plus orphan-child subtree drops and containment-cycle
drops; per-pass attribution is deliberately not tracked.

Add tests for the containment-cycle endpoint case and the cascading
orphan-child parent drop carrying out a nested grandchild edge target, and
assert distinct dropped-edge messages across runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tirth8205

Copy link
Copy Markdown
Contributor Author

2 (lossy observability / dedupe): Fixed. dropOrphanEdges now collects the dropped edge ids while filtering and appends them to the message, e.g. Dropped 1 edge(s) referencing nonexistent nodes: e1. (list capped at 20 with a +N more suffix). Distinct single-edge losses across re-layouts now produce distinct strings, so appendLayoutIssues' level|message dedupe no longer swallows a fresh loss. No change needed in store.ts.

3 (test gaps): Added both missing cases. (a) Containment-cycle endpoint: two nodes mutually contained via shared ids at different levels so the cycle pass removes both, then asserts the edge targeting one of them is dropped with its id in the message. (b) Cascading orphan-child drop: a parent with a missing parentId whose nested grandchild is an edge target — the parent (and its subtree) is dropped by the orphan-child pass and step 5 reconciles the dangling edge. Also added a regression test asserting two different dropped edges yield non-equal messages.

1 (layout as de-facto deduper / lost drop reason): Kept the current step-5 approach. The correctness invariant the PR promises holds — step 5 validates edges against the final post-removal child tree (childrenD), so edges into a dropped parent's vanished subtree and into cycle-removed nodes are all dropped before ELK sees them. The {children, removedIds} refactor that would make each pass filter its own edges with a specific reason is a three-pass return-shape redesign beyond this narrow reorder fix; I've documented in code that step 5 is the single reconciliation point for ghost/orphan-child/cycle removals and that per-pass drop attribution is deliberately not tracked, and we can file a follow-up if attribution becomes load-bearing.

Nit (numeric step cross-references): The current comments are accurate post-renumber; I left the numbering but expanded the step-5 comment to also name the passes (dropOrphanChildren/dropCircularContainment/dropOrphanEdges) so the cross-reference is anchored to stable names.

Full dashboard suite: 46 passed. tsc --noEmit clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants