Identify unexpected nodes in error recovery scenarios

In error recovery scenarios (for example for invalid Ruby code) Prism can return unexpected node types in fields they wouldn't be for valid Ruby. For example:

```rb
module def foo end
```

Prism parses this as a `ModuleNode` with a `DefNode` in the `constant_path` field. The allowed/expected nodes are listed in `config.yml` [here](https://github.com/ruby/prism/blob/801d98222a5e78765b5dae48d9ee88f8c2933784/config.yml#L3698-L3703) and we use `on error:` to signal that in error recovery scenarios there might be something other than `ConstantReadNode` or `ConstantPathNode`. In this case, we only have `on error: MissingNode` but it can actually be `DefNode` or something else (perhaps this is an oversight and all possible error nodes should be listed here?).

Clients using Prism may want to deal with invalid Ruby syntax in all kinds of different ways, but I suspect the most common approach would be to simply ignore unexpected nodes and maybe stop traversing that branch of the AST. In any case, clients cannot expect only valid node types in fields, as shown above.

One way for clients to deal with unexpected node types would be to guard based on each of the expected node types, for example:

```cpp
if (!PM_NODE_TYPE(node, PM_CONSTANT_READ_NODE) && !PM_NODE_TYPE(node, PM_CONSTANT_PATH_NODE)) {
  // handle error case
}
```

In some cases this might be quite verbose, with many valid node types.

Another approach (assuming the `on error` indicators in `config.yml` are exhaustive) would be to check for known error cases:

```cpp
if (PM_NODE_TYPE(node, PM_MISSING_NODE) || PM_NODE_TYPE(node, PM_DEF_NODE)) {
  // handle error case
}
```

Again, this could potentially be quite verbose and I'm not sure if `config.yml` is currently expected to exhaustively list all possible `on error` cases.

## Proposal

Prism could introduce an `UnexpectedNode` type (similar to `MissingNode`) that would be returned in error recovery scenarios, wrapping the original node type.

```cpp
if (PM_NODE_TYPE(node, PM_MISSING_NODE) || PM_NODE_TYPE(node, PM_UNEXPECTED_NODE)) {
  // handle error case
}
```

Or introduce a helper covering both cases:

```cpp
if (PM_ERROR_NODE(node)) {
  // handle error case
}
```

The client could more easily just ignore error recovery cases, or they could do something appropriate with `node->child` in the case of `UnexpectedNode`.

I'm sure there are other approaches too, but this one seemed quite straightforward. However, introducing a new node type would be a breaking change, so perhaps there is a non-breaking alternative, such as adding a flag to existing node types?

## Prototype

I prototyped it [here](https://github.com/ruby/prism/compare/main...thomasmarshall:prism:validate-node-types) but I thought I should open an issue first, to see if this is something you would consider, rather than spending more time tidying up the code for a PR. That prototype also ensures `MissingNode` is always added in cases where a required field is null, but that's potentially a separate issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Identify unexpected nodes in error recovery scenarios #3828

Proposal

Prototype

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Identify unexpected nodes in error recovery scenarios #3828

Description

Proposal

Prototype

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions