Skip to content

YAML: prevent block-scalar value mutations from corrupting siblings#8154

Open
steve-aom-elliott wants to merge 5 commits into
mainfrom
fix-yaml-block-scalar-value-mutation
Open

YAML: prevent block-scalar value mutations from corrupting siblings#8154
steve-aom-elliott wants to merge 5 commits into
mainfrom
fix-yaml-block-scalar-value-mutation

Conversation

@steve-aom-elliott

Copy link
Copy Markdown
Contributor

Summary

Several built-in rewrite-yaml recipes silently corrupt YAML files containing FOLDED (>, >-, >+) or LITERAL (|, |-, |+) block scalars by clobbering the block envelope when the mapped property's value is replaced.

This PR fixes the value-replacement path through a new internal BlockScalarUtils helper, refactors the affected built-ins to use it, and documents the semantic asymmetry on Yaml.Scalar.value.

Root cause

Yaml.Scalar.value carries different content depending on style:

  • For PLAIN and quoted scalars: just the body content.
  • For FOLDED/LITERAL: everything after the > or | indicator — the chomp indicator (if any), the newline terminating the block-scalar header, the indented body, and the trailing whitespace that bounds the block from the next sibling mapping entry.

For block scalars, the boundary newline that visually separates the scalar from the next sibling lives inside the previous scalar's value, not in the next entry's prefix. A naïve scalar.withValue(...) replacement clobbers the block envelope; the printer then concatenates [> or |] + newPlainString + [next entry's empty prefix] + [next key], producing one-line corruption like key: >replacedafter: tail.

Round-trip without edits is byte-clean — this is purely a value-mutation problem in recipes, not a parser or printer bug.

What's fixed

Recipe Before After
ChangePropertyValue corrupted all 6 chomp variants preserves block envelope, re-indents body
ChangeValue corrupted all 6 chomp variants preserves block style when existing is FOLDED/LITERAL
DeleteProperty (delete entry after block scalar) left an extra blank line strips redundant prefix newline when block-scalar predecessor already supplies the separator
trait/YamlValue.withValue(String) clobbered envelope uses BlockScalarUtils.withBody
trait/YamlReference.rename(...) clobbered envelope uses BlockScalarUtils.withBody

14 new parameterized tests across ChangePropertyValueTest, ChangeValueTest, DeletePropertyKeyTest.

API surface

The helpers (getBody, withBody) are intentionally placed in org.openrewrite.yaml.internal.BlockScalarUtils rather than exposed as instance methods on Yaml.Scalar. The Javadoc on BlockScalarUtils records the future intent to promote them onto Yaml.Scalar once enough downstream environments have rolled forward — exposing them now risks downstream recipes depending on methods that older bundled rewrite-yaml versions don't provide.

Yaml.Scalar gets a Javadoc-only update on the value field documenting the per-style semantics and pointing to BlockScalarUtils. No new fields, no new public methods, no constructor changes.

What's deliberately not fixed

  • MergeYamlVisitor — the value-replacement path in mergeScalar has the same shape, but fixing it in isolation regresses addLiteralStyleBlockWhichDoesAlreadyExist because the surrounding merge-insertion logic computes new-entry prefixes assuming the broken value semantics. Needs a coordinated fix to both mergeScalar and the insertion prefix calculation. Follow-up PR.
  • CoalescePropertiesVisitor — different bug entirely: the recipe doesn't reach steady state when block scalars are present; restructured mapping indentation is wrong. Not a withValue clobber. Follow-up PR.

Test plan

  • All 642 existing rewrite-yaml tests pass
  • 14 new tests pass (6 chomp variants for ChangePropertyValue, FOLDED/LITERAL/KEEP for ChangeValue, delete-after/-before/literal-keep cases for DeleteProperty)
  • Multi-line replacement re-indents correctly
  • Regex replacement mode operates on body only (no longer matches against chomp indicator or trailing whitespace)
  • No regressions in MergeYamlTest, CoalescePropertiesTest, or other yaml tests

`Yaml.Scalar.value` carries different content depending on `style`:

- For PLAIN and quoted scalars, `value` is just the body content.
- For FOLDED (`>`, `>-`, `>+`) and LITERAL (`|`, `|-`, `|+`) scalars,
  `value` is everything after the `>` or `|` indicator: the chomp
  indicator (if any), the newline terminating the block-scalar header,
  the indented body, AND the trailing whitespace that bounds the block
  from the next sibling mapping entry.

For block scalars, the boundary newline that visually separates the
scalar from the next sibling lives inside the previous scalar's
`value`, not in the next entry's `prefix`. A naive
`scalar.withValue(...)` replacement clobbers the block envelope and
lets the printer glue the next sibling key onto the same line.

This affected several built-in recipes: `ChangePropertyValue`,
`ChangeValue`, `DeleteProperty` (when a block scalar precedes the
deleted entry), and the `YamlValue` / `YamlReference` traits.

Adds an internal `BlockScalarUtils` helper with `getBody` /
`withBody` that knows about the block envelope, refactors the
affected recipes through it, and documents `Yaml.Scalar.value`'s
per-style semantics. Helpers are intentionally kept internal for
now to avoid promoting public API that downstream recipes could
call into and then fail to load on older CLI bundles; promoting
them onto `Yaml.Scalar` itself is left as a follow-up once adoption
catches up.

`MergeYaml` and `CoalesceProperties` exhibit related but distinct
structural bugs that need more involved fixes; left for follow-up
PRs.
`BlockScalarUtils.getBody` split the body on `\n` only, leaving stray
`\r`s in the returned body on CRLF-authored files, and `withBody` always
joined interior lines with bare `\n`, producing mixed `\r\n`/`\n` block
scalars when editing on Windows.

`getBody` now splits on any line-ending form and rejoins with `\n` for a
platform-neutral body; `withBody` detects the existing value's line-ending
convention from the block header and uses it for interior breaks and the
empty-trailing fallback.

Adds a focused BlockScalarUtilsTest that exercises each helper in
isolation: the recipe path calls getBody then withBody, where a `\r`
leaked by getBody is re-absorbed by withBody, masking the bug end-to-end.
Also adds CRLF cases to ChangePropertyValueTest, ChangeValueTest, and
DeletePropertyKeyTest.
@steve-aom-elliott steve-aom-elliott added bug Something isn't working test provided Already replicated with a unit test, using JUnit pioneer's ExpectedToFail yaml labels Jun 30, 2026
@steve-aom-elliott steve-aom-elliott marked this pull request as ready for review June 30, 2026 18:35
@steve-aom-elliott steve-aom-elliott added the recipe Requested Recipe label Jun 30, 2026
@steve-aom-elliott steve-aom-elliott moved this from In Progress to Ready to Review in OpenRewrite Jun 30, 2026
- Drop explanatory inline comments in ChangePropertyValue/ChangeValue where
  the call to BlockScalarUtils tells the story.
- Compress DeleteProperty's prefix-strip comment to one line.
- Tighten the Javadoc on Yaml.Scalar.value, BlockScalarUtils, and
  BlockScalarUtilsTest; drop tangential prose.
- Stop linking to Lombok-generated method signatures (withValue, getValue);
  reference the underlying field instead, with a note about the generated
  accessor.
- Reformat the new block-scalar tests in ChangePropertyValueTest to match
  the cleaner indentation used in ChangeValueTest / DeletePropertyKeyTest.
* including them — promoting now would {@code NoSuchMethodError} customer recipes that
* adopted the new API but load against an older bundled {@code Yaml.Scalar}.
*/
public final class BlockScalarUtils {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super comfortable with Utils classes. Can you search for where else we would put such convience classes often?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a StringUtils in the same package. In the ideal, these would have just been put directly into Yaml.Scalar and not had a utility class at all, but given that changes the available method list on a core tree type, it would mean that newer recipes using those methods but running on an older CLI would more than likely end up throwing NoSuchMethodError, unless I've misunderstood the whole parent-loading thing.

Comment thread rewrite-yaml/src/main/java/org/openrewrite/yaml/tree/Yaml.java
The fallback added a '\n' whenever the existing scalar's value had no
trailing whitespace. For block scalars parsed directly from source the
value always carries trailing whitespace, so the fallback never fires in
the consumers shipped in this PR. For scalars produced upstream by
autoFormat (used by the merge-family recipes touched in the follow-up),
the boundary newline lives in the next entry's prefix and the existing
value legitimately has no trailing whitespace — adding one there
introduces an extra blank line between the scalar and its successor.

Preserve the existing trailing whitespace exactly. No behavioral change
for any consumer in this PR; surfaces the bug to be fixed in the
MergeYaml follow-up where it actually matters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working parser recipe Requested Recipe test provided Already replicated with a unit test, using JUnit pioneer's ExpectedToFail yaml

Projects

Status: Ready to Review

Development

Successfully merging this pull request may close these issues.

3 participants