Skip to content

Replace Inactive: chained regex replacements can corrupt output when concept IDs collide #2

@dionmcm

Description

@dionmcm

Summary

AppDelegate.replaceInactiveConceptsInSelection (around lines 1125–1138 in Codeagogo/AppDelegate.swift) applies replacements sequentially against an accumulating result string:

```swift
var result = text
for (conceptId, replacement) in replacementsByCode {
let pattern = "(?<![0-9])" + NSRegularExpression.escapedPattern(for: conceptId) + "(?![0-9])" + "(\s*\|[^|]*\|)?"
if let regex = try? NSRegularExpression(pattern: pattern) {
let range = NSRange(result.startIndex..., in: result)
result = regex.stringByReplacingMatches(in: result, options: [], range: range, withTemplate: "...")
}
}
```

Each replacement comes from buildReplacementText, which emits \"<targetCode> |<display>|\" — and the target code is itself an SCTID (a string of digits). The digit-boundary anchors (?<![0-9]) and (?![0-9]) prevent substring matches but do not prevent matches inside text that was just inserted by a prior iteration.

Edge case

If concept A's replacement target SCTID matches concept B's inactive ID being processed in the same selection, the second iteration's regex matches inside the first's just-inserted replacement text.

Synthetic reproducer (numbers chosen for clarity, not real SCTIDs):

  • Selection: \"<< 11111111 OR << 22222222\"
  • Inactive 11111111 → suggested replacement 22222222
  • Inactive 22222222 → suggested replacement 99999999

Sequential processing on accumulating result:

  1. After replacing 11111111: \"<< 22222222 OR << 22222222\" ← both occurrences now match the second pattern
  2. After replacing 22222222: \"<< 99999999 OR << 99999999\" ← incorrect; the original 11111111 effectively got replaced by 99999999, skipping its actual target

The user sees their original 11111111 mapped to 99999999, which is the wrong concept and the wrong association.

Why it has not surfaced before

It requires two inactive concepts in the same selection where one's replacement target happens to also be the other's inactive code. Unlikely but reachable in clinical data with multiple retired concepts from the same module.

Suggested fix

Mirror the existing replaceSelection pattern (in the same file, around lines 600–650 in v1.1.0): collect all (startIndex, length, replacement) tuples from the original text (before any mutation), then apply them in reverse-index order against an NSMutableString. This makes each replacement immune to text inserted by other iterations.

Cross-port impact

The same issue exists in the Windows port (codeagogo-win) at TrayAppContext.cs:868–872, where the chained-regex approach was ported verbatim. Once this is fixed on Mac, the Windows port will be updated to mirror the position-based approach (tracked separately on the Windows side).

Discovered

During the v1.1.2 parity work on the Windows port — a multi-stage code review caught the edge case while reviewing the regex helpers.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions