Tighten update_versions.py regex to match <version> and <*.version> tags #49267

Draft
jeet1995 wants to merge 1 commit into main from fix/update-versions-regex-custom-properties

@jeet1995
Member

Problem

update_versions.py uses external_dependency_version_regex (in eng/versioning/utils.py) to find and replace version values when processing {x-version-update} comments. The old regex:

r'(?<=<version>).+?(?=</version>)'

only matches content inside <version>...</version> elements. Custom Maven property tags like:

<scala-jackson.version>2.18.6</scala-jackson.version> <!-- {x-version-update;...;external_dependency} -->

carry valid {x-version-update} comments, but the regex substitution silently no-ops on them. This caused bannedDependencies CI failures when Jackson was bumped to 2.18.7 (see PR #49263 for the immediate fix).
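
The silent no-op can be reproduced in isolation. This is a minimal sketch, not the actual code in eng/versioning/utils.py: the `bump` helper is hypothetical and the sample lines are illustrative, with only the old pattern copied from the description above.

```python
import re

# The old pattern as quoted in this description.
OLD_REGEX = re.compile(r'(?<=<version>).+?(?=</version>)')


def bump(line: str, new_version: str) -> str:
    """Hypothetical helper: replace the version value on a single POM line."""
    return OLD_REGEX.sub(new_version, line)


plain = '<version>2.18.6</version> <!-- {x-version-update;...;external_dependency} -->'
custom = ('<scala-jackson.version>2.18.6</scala-jackson.version> '
          '<!-- {x-version-update;...;external_dependency} -->')

bumped_plain = bump(plain, '2.18.7')    # value is replaced
bumped_custom = bump(custom, '2.18.7')  # nothing matches, line comes back unchanged
```

Because `re.sub` returns the input unchanged when nothing matches, no error or warning is raised for the custom property line.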

Fix

Replace the regex with:

r'(?<=>)[^<]+(?=</(?:[\w.\-]+\.)?version>)'

Component breakdown

| Component | Purpose |
| --- | --- |
| `(?<=>)` | Fixed-width lookbehind: anchors the match right after a `>` (Python `re` requires fixed-width lookbehinds) |
| `[^<]+` | Matches the version content (one or more characters, up to the next `<`) |
| `(?=</` | Start of the lookahead: the next characters must be `</` |
| `(?:[\w.\-]+\.)?` | Optional non-capturing group: one or more word/dot/hyphen characters followed by a dot (e.g., `scala-jackson.`) |
| `version>)` | Literal `version>`, which ends the closing tag and the lookahead |
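
The same breakdown can be written as an annotated pattern with `re.VERBOSE`. This is only a sketch for readability; the name `VERSION_VALUE` is made up here and the PR itself uses the compact one-line form.

```python
import re

# Compact form, as proposed in this PR.
COMPACT = re.compile(r'(?<=>)[^<]+(?=</(?:[\w.\-]+\.)?version>)')

# Equivalent annotated form; whitespace and comments are ignored under re.VERBOSE.
VERSION_VALUE = re.compile(r'''
    (?<=>)               # fixed-width lookbehind: previous character is a closing angle bracket
    [^<]+                # the version text, up to the next opening angle bracket
    (?=</                # lookahead: a closing tag must start here
        (?:[\w.\-]+\.)?  # optional property prefix ending in a dot, e.g. scala-jackson.
        version>         # tag name ends with the literal text version
    )
''', re.VERBOSE)

sample = '<scalatest.version>3.2.3</scalatest.version>'
value = VERSION_VALUE.search(sample).group(0)
```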

What it matches / rejects

| Input | Old regex | New regex |
| --- | --- | --- |
| `<version>2.18.7</version>` | ✅ match | ✅ match |
| `<scala-jackson.version>2.18.6</scala-jackson.version>` | ❌ no match | ✅ match |
| `<scalatest.version>3.2.3</scalatest.version>` | ❌ no match | ✅ match |
| `<description>Some text</description>` | ❌ no match | ❌ no match |
| `<subversion>1.0</subversion>` | ❌ no match | ❌ no match |
| `<!-- version 1.0 -->` | ❌ no match | ❌ no match |
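
The rows above can be checked mechanically. The sketch below is a standalone check, not part of the repository's test suite; `CASES` and `found` are illustrative names, and `None` stands for "no match".

```python
import re

OLD = re.compile(r'(?<=<version>).+?(?=</version>)')
NEW = re.compile(r'(?<=>)[^<]+(?=</(?:[\w.\-]+\.)?version>)')

# (input, value the old regex finds, value the new regex finds)
CASES = [
    ('<version>2.18.7</version>', '2.18.7', '2.18.7'),
    ('<scala-jackson.version>2.18.6</scala-jackson.version>', None, '2.18.6'),
    ('<scalatest.version>3.2.3</scalatest.version>', None, '3.2.3'),
    ('<description>Some text</description>', None, None),
    ('<subversion>1.0</subversion>', None, None),
    ('<!-- version 1.0 -->', None, None),
]


def found(pattern, text):
    """Return the matched version text, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


results = [(text, found(OLD, text), found(NEW, text)) for text, _, _ in CASES]
```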

Python re limitation

Alan's suggested regex (?<=<((?:[\w-.]+\.)?version)>).+?(?=<\/\1>) relies on a variable-length lookbehind (its capture group is then backreferenced from the lookahead), which Python's re module does not support (error: look-behind requires fixed-width pattern). The replacement regex instead places all variable-width logic in the lookahead, which has no such restriction.
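
The restriction is easy to confirm with the standard library. The sketch below uses a simplified variable-width lookbehind written for this illustration, not the exact suggested pattern, and a hypothetical `compiles` helper.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Simplified illustration: the optional prefix makes the lookbehind width vary.
VARIABLE_LOOKBEHIND = r'(?<=<(?:[\w.\-]+\.)?version>)[^<]+'

# The approach taken in this PR: one-character lookbehind, variable-width lookahead.
FIXED_LOOKBEHIND = r'(?<=>)[^<]+(?=</(?:[\w.\-]+\.)?version>)'

variable_ok = compiles(VARIABLE_LOOKBEHIND)  # rejected by re
fixed_ok = compiles(FIXED_LOOKBEHIND)        # accepted by re
```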

Validation

Ran update_versions.py --sr across all 874 POM files:

| Regex | Files changed | Files |
| --- | --- | --- |
| Old (`<version>` only) | 0 | none |
| New (tightened) | 4 | azure-cosmos-spark_3/pom.xml, spark_3-5/pom.xml, spark_3-5_2-12/pom.xml, spark_4/pom.xml |

All 4 changes are scala-jackson.version property bumps from stale values (2.18.4 / 2.18.6) to 2.18.7 — identical to the manual fix in PR #49263.

Known limitation

Tags like <my-custom-property> (no .version suffix) would still be skipped. This is intentional — the script should only auto-update version-related properties.

…ersion> tags

The regex previously only matched version values inside <version>...</version>
elements. Custom Maven property tags like <scala-jackson.version>2.18.6
</scala-jackson.version> that carry valid {x-version-update} comments were
silently skipped, causing version drift and bannedDependencies failures
(see PR #49263 for the immediate fix).

The new regex uses a lookahead restricted to closing tags that are either
</version> or </something.version> (e.g. </scala-jackson.version>),
rejecting unrelated XML elements like </description> or </subversion>.

Python's re module does not support variable-length lookbehinds, so the
pattern uses a fixed-width lookbehind (?<=>) and places the restriction
entirely in the lookahead (?=</(?:[\w.\-]+\.)?version>).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>