YAML: prevent block-scalar value mutations from corrupting siblings#8154
Open
steve-aom-elliott wants to merge 5 commits into
Open
YAML: prevent block-scalar value mutations from corrupting siblings#8154steve-aom-elliott wants to merge 5 commits into
steve-aom-elliott wants to merge 5 commits into
Conversation
`Yaml.Scalar.value` carries different content depending on `style`: - For PLAIN and quoted scalars, `value` is just the body content. - For FOLDED (`>`, `>-`, `>+`) and LITERAL (`|`, `|-`, `|+`) scalars, `value` is everything after the `>` or `|` indicator: the chomp indicator (if any), the newline terminating the block-scalar header, the indented body, AND the trailing whitespace that bounds the block from the next sibling mapping entry. For block scalars, the boundary newline that visually separates the scalar from the next sibling lives inside the previous scalar's `value`, not in the next entry's `prefix`. A naive `scalar.withValue(...)` replacement clobbers the block envelope and lets the printer glue the next sibling key onto the same line. This affected several built-in recipes: `ChangePropertyValue`, `ChangeValue`, `DeleteProperty` (when a block scalar precedes the deleted entry), and the `YamlValue` / `YamlReference` traits. Adds an internal `BlockScalarUtils` helper with `getBody` / `withBody` that knows about the block envelope, refactors the affected recipes through it, and documents `Yaml.Scalar.value`'s per-style semantics. Helpers are intentionally kept internal for now to avoid promoting public API that downstream recipes could call into and then fail to load on older CLI bundles; promoting them onto `Yaml.Scalar` itself is left as a follow-up once adoption catches up. `MergeYaml` and `CoalesceProperties` exhibit related but distinct structural bugs that need more involved fixes; left for follow-up PRs.
`BlockScalarUtils.getBody` split the body on `\n` only, leaving stray `\r`s in the returned body on CRLF-authored files, and `withBody` always joined interior lines with bare `\n`, producing mixed `\r\n`/`\n` block scalars when editing on Windows. `getBody` now splits on any line-ending form and rejoins with `\n` for a platform-neutral body; `withBody` detects the existing value's line-ending convention from the block header and uses it for interior breaks and the empty-trailing fallback. Adds a focused BlockScalarUtilsTest that exercises each helper in isolation: the recipe path calls getBody then withBody, where a `\r` leaked by getBody is re-absorbed by withBody, masking the bug end-to-end. Also adds CRLF cases to ChangePropertyValueTest, ChangeValueTest, and DeletePropertyKeyTest.
- Drop explanatory inline comments in ChangePropertyValue/ChangeValue where the call to BlockScalarUtils tells the story. - Compress DeleteProperty's prefix-strip comment to one line. - Tighten the Javadoc on Yaml.Scalar.value, BlockScalarUtils, and BlockScalarUtilsTest; drop tangential prose. - Stop linking to Lombok-generated method signatures (withValue, getValue); reference the underlying field instead, with a note about the generated accessor. - Reformat the new block-scalar tests in ChangePropertyValueTest to match the cleaner indentation used in ChangeValueTest / DeletePropertyKeyTest.
jkschneider
reviewed
Jun 30, 2026
| * including them — promoting now would {@code NoSuchMethodError} customer recipes that | ||
| * adopted the new API but load against an older bundled {@code Yaml.Scalar}. | ||
| */ | ||
| public final class BlockScalarUtils { |
Member
There was a problem hiding this comment.
Not super comfortable with Utils classes. Can you search for where else we would put such convience classes often?
Contributor
Author
There was a problem hiding this comment.
There's already a StringUtils in the same package. In the ideal, these would have just been put directly into Yaml.Scalar and not had a utility class at all, but given that changes the available method list on a core tree type, it would mean that newer recipes using those methods but running on an older CLI would more than likely end up throwing NoSuchMethodError, unless I've misunderstood the whole parent-loading thing.
jkschneider
reviewed
Jun 30, 2026
4 tasks
The fallback added a '\n' whenever the existing scalar's value had no trailing whitespace. For block scalars parsed directly from source the value always carries trailing whitespace, so the fallback never fires in the consumers shipped in this PR. For scalars produced upstream by autoFormat (used by the merge-family recipes touched in the follow-up), the boundary newline lives in the next entry's prefix and the existing value legitimately has no trailing whitespace — adding one there introduces an extra blank line between the scalar and its successor. Preserve the existing trailing whitespace exactly. No behavioral change for any consumer in this PR; surfaces the bug to be fixed in the MergeYaml follow-up where it actually matters.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Several built-in
rewrite-yamlrecipes silently corrupt YAML files containing FOLDED (>,>-,>+) or LITERAL (|,|-,|+) block scalars by clobbering the block envelope when the mapped property's value is replaced.This PR fixes the value-replacement path through a new internal
BlockScalarUtilshelper, refactors the affected built-ins to use it, and documents the semantic asymmetry onYaml.Scalar.value.Root cause
Yaml.Scalar.valuecarries different content depending onstyle:>or|indicator — the chomp indicator (if any), the newline terminating the block-scalar header, the indented body, and the trailing whitespace that bounds the block from the next sibling mapping entry.For block scalars, the boundary newline that visually separates the scalar from the next sibling lives inside the previous scalar's
value, not in the next entry'sprefix. A naïvescalar.withValue(...)replacement clobbers the block envelope; the printer then concatenates[> or |] + newPlainString + [next entry's empty prefix] + [next key], producing one-line corruption likekey: >replacedafter: tail.Round-trip without edits is byte-clean — this is purely a value-mutation problem in recipes, not a parser or printer bug.
What's fixed
ChangePropertyValueChangeValueDeleteProperty(delete entry after block scalar)trait/YamlValue.withValue(String)BlockScalarUtils.withBodytrait/YamlReference.rename(...)BlockScalarUtils.withBody14 new parameterized tests across
ChangePropertyValueTest,ChangeValueTest,DeletePropertyKeyTest.API surface
The helpers (
getBody,withBody) are intentionally placed inorg.openrewrite.yaml.internal.BlockScalarUtilsrather than exposed as instance methods onYaml.Scalar. The Javadoc onBlockScalarUtilsrecords the future intent to promote them ontoYaml.Scalaronce enough downstream environments have rolled forward — exposing them now risks downstream recipes depending on methods that older bundledrewrite-yamlversions don't provide.Yaml.Scalargets a Javadoc-only update on thevaluefield documenting the per-style semantics and pointing toBlockScalarUtils. No new fields, no new public methods, no constructor changes.What's deliberately not fixed
MergeYamlVisitor— the value-replacement path inmergeScalarhas the same shape, but fixing it in isolation regressesaddLiteralStyleBlockWhichDoesAlreadyExistbecause the surrounding merge-insertion logic computes new-entry prefixes assuming the broken value semantics. Needs a coordinated fix to bothmergeScalarand the insertion prefix calculation. Follow-up PR.CoalescePropertiesVisitor— different bug entirely: the recipe doesn't reach steady state when block scalars are present; restructured mapping indentation is wrong. Not awithValueclobber. Follow-up PR.Test plan
rewrite-yamltests passChangePropertyValue, FOLDED/LITERAL/KEEP forChangeValue, delete-after/-before/literal-keep cases forDeleteProperty)MergeYamlTest,CoalescePropertiesTest, or other yaml tests