Skip to content

Comments

fix: reject control characters in job names#263

Merged
crowecawcaw merged 3 commits intoOpenJobDescription:mainlinefrom
crowecawcaw:fix-control-chars
Feb 20, 2026
Merged

fix: reject control characters in job names#263
crowecawcaw merged 3 commits intoOpenJobDescription:mainlinefrom
crowecawcaw:fix-control-chars

Conversation

@crowecawcaw
Copy link
Contributor

What was the problem/requirement? (What/Why)

The spec bans control characters in job names, but this package allowed them.

Spec reference: https://github.com/OpenJobDescription/openjd-specifications/blob/mainline/wiki/2023-09-Template-Schemas.md#111-jobname

What was the solution? (How)

Add a regex to reject them.

What is the impact of this change?

How was this change tested?

Add unit tests. Also by running the conformance tests against the changes: OpenJobDescription/openjd-specifications#103

Was this change documented?

n/a

Is this a breaking change?

No

Does this change impact security?

No

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Signed-off-by: Stephen Crowe <6042774+crowecawcaw@users.noreply.github.com>
@crowecawcaw crowecawcaw requested a review from a team as a code owner February 11, 2026 22:07
@crowecawcaw crowecawcaw changed the title fix: reject control characteres in job names fix: reject control characters in job names Feb 11, 2026
# Max length is validated after resolution in Job model, not here
# because the template name can contain format strings
# All unicode except the [Cc] (control characters) category
_regex = f"(?-m:^[^{_Cc_characters}]+\\Z)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this and _standard_string_regex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model uses two regex engines: one is Rust's that's embedded in Pydantic and the other is Python's. They have slightly different syntax. _standard_string_regex is formatted for Pydantic/Rust's engine with \z and this regex is formatted for Python with \Z. This regex in this class is eventually evaluated here:

Here's another example of Python regexes in use in the model:

_regex = f"(?-m:^[^{_Cc_characters}]+\\Z)"

Not sure there's a good reason for it. I'm guessing it is an outcome of upgrading from Pydantic v1 to v2.

Copy link
Contributor

@joel-wong-aws joel-wong-aws Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. From Kiro:

_standard_string_regex = rf"(?-m:^[^{_Cc_characters}]+\z)"

\\Z (uppercase Z): Matches at the end of the string OR just before a newline at the end

"hello\n" ✅ matches
"hello" ✅ matches

\z (lowercase z): Matches only at the absolute end of the string

"hello\n" ❌ does not match
"hello" ✅ matches

The lowercase \z seems to be the correct choice here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there being something different between \z and \Z between pydantic v2 and Python regexes, because pydantic v2 switched to using the Rust regex crate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The \Z vs \z difference is intentional due to two different regex engines being in play:

  • _standard_string_regex (used by JobName, StepName, EnvironmentName via pydantic StringConstraints) uses lowercase \z — this is correct because pydantic v2 validates these patterns via pydantic-core, which uses the Rust regex crate. In Rust regex, \z is the absolute end-of-string anchor. Ref

  • JobTemplateName._regex (used by DynamicConstrainedStr._validate) uses uppercase \Z — this is correct because validation runs through Python's re.match(). In Python's re module, \Z matches only at the absolute end of the string (it does NOT match before a trailing newline, unlike $). Python doesn't support lowercase \z in versions before 3.14 — it throws re.PatternError: bad escape \z. Ref

So both are equivalent in behavior (absolute end-of-string), just spelled differently for their respective engines.

I've added additional tests to verify newlines are handled correctly.

@leongdl
Copy link

leongdl commented Feb 18, 2026

I think we should add a fuzz test suite to OpenJD and have broader testing. Maybe a good backlog item for the OpenJD team.

Signed-off-by: Stephen Crowe <6042774+crowecawcaw@users.noreply.github.com>
@crowecawcaw
Copy link
Contributor Author

We actually have a new-ish fuzzer that uncovered a couple issues: https://github.com/OpenJobDescription/openjd-model-for-python/blob/mainline/test/openjd/model/test_fuzz.py

We could definitely extend it for better coverage. In this particular case, I don't think it would have helped because the problem was not that a regex was faulty, it's that the library was missing a validation.

@crowecawcaw crowecawcaw enabled auto-merge (squash) February 20, 2026 18:01
@crowecawcaw crowecawcaw disabled auto-merge February 20, 2026 18:01
@crowecawcaw crowecawcaw enabled auto-merge (squash) February 20, 2026 18:01
@sonarqubecloud
Copy link

@crowecawcaw crowecawcaw merged commit 0d428c4 into OpenJobDescription:mainline Feb 20, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants