@matejrisek matejrisek commented Nov 27, 2025

Introduce on_failure attribute to the action_trigger block.
This attribute allows consumers to define the behavior when an action fails.
As of now we default to failing the run if an error is returned as part of the action invocation.
We'll keep that as the default behavior but introduce two options:

on_failure: fail
on_failure: continue

fail is the default behavior - we're just making our choice explicit.
continue is the new behavior we're adding - it instructs Terraform to ignore errors from the action invocation, log that an error occurred, and continue with the run execution.

The on_failure attribute takes inspiration from the namesake attribute in provisioners.

Example configuration

resource "tfcoremock_simple_resource" "test_resource" {
  id     = "test-resource"
  string = "This is a message"

  lifecycle {
    action_trigger {
      events = [before_create]
      actions = [
        action.tfcoremock_simple_resource.example
      ]
      on_failure = continue
    }
  }
}

action "tfcoremock_simple_resource" "example" {
  config {
    string = "Hello from action"
  }
}

Testing

To manually test this feature, one should use a provider that allows for controlled erroring of actions. For that purpose I've used a modified tfcoremock provider which always fails on action invocation.

Fixes #

Target Release

1.15.x

Rollback Plan

  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

CHANGELOG entry

Add on_failure attribute to the action_trigger block to allow defining different behavior on action failure. For now we support the default fail keyword as well as the new continue keyword, which instructs Terraform to ignore action failures and continue with the run.

  • This change is user-facing and I added a changelog entry.
  • This change is not user-facing.

@matejrisek matejrisek requested a review from a team as a code owner November 27, 2025 13:34

@DanielMSchmidt DanielMSchmidt left a comment

Great work, I think the basis looks really good, there are a couple of "edge" topics that are still open, but it looks like the right course to me.

},
},

"trigger on_failure set to 'fail' fails the resource": {

I think the test cases could be more extensive:

  • we should assert that the resource change has been made (so the right RPC call on the provider has been made)
  • we should check that on_failure = continue continues with the other actions in the actions list and in other action_trigger blocks.
  • we should make sure that the behavior is consistent with expectations when multiple action triggers mix different on_failure settings
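
The mixed case in the last bullet could be exercised with a config along these lines (a sketch; the action names and the second trigger block are made up for illustration):

```hcl
resource "tfcoremock_simple_resource" "test_resource" {
  id = "test-resource"

  lifecycle {
    # The first trigger tolerates failure; the second does not. A failure
    # in action.tfcoremock_simple_resource.a should be logged and skipped,
    # while a failure in action.tfcoremock_simple_resource.b should still
    # fail the run.
    action_trigger {
      events     = [before_create]
      actions    = [action.tfcoremock_simple_resource.a]
      on_failure = continue
    }
    action_trigger {
      events     = [before_create]
      actions    = [action.tfcoremock_simple_resource.b]
      on_failure = fail
    }
  }
}
```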

Member Author

Hey, thanks for these pointers.

They should have been addressed in this commit: 2a46bea.

Just for clarification:

we should assert that the resource change has been made (so the right RPC call on the provider has been made)

Do you mean we should assert on the action invocations, or should we do something else to verify the changes?

) tfdiags.Diagnostics {
	switch aii.ActionTrigger.TriggerOnFailure() {
	case configs.ActionTriggerOnFailureContinue:
		if currentDiags.HasErrors() {

Instead of printing the errors we want to continue with we probably want to wrap them in warning level diagnostics so that consumers that work directly with the returned diagnostics (e.g. the stacks runtime) can appropriately handle these diagnostics as well.

Also, we should only do this for errors coming from the provider's completion event; if a hook sends a diagnostic, it is unrelated to the on_failure behavior and we should still pass it through.
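
A minimal sketch of that demotion, using simplified stand-in types rather than the real internal/tfdiags API (note it demotes all errors; per the comment above, a real implementation would apply this only to diagnostics from the provider's completion event):

```go
package main

import "fmt"

// Severity is a simplified stand-in for Terraform's diagnostic
// severities; the real types live in internal/tfdiags and differ
// in detail.
type Severity rune

const (
	Error   Severity = 'E'
	Warning Severity = 'W'
)

// Diagnostic is a simplified stand-in for a single diagnostic.
type Diagnostic struct {
	Severity Severity
	Summary  string
}

// demoteErrorsToWarnings rewraps error-level diagnostics as warnings so
// that consumers working directly with the returned diagnostics (e.g.
// the stacks runtime) see the action failure as non-fatal when
// on_failure = continue. Non-error diagnostics pass through unchanged.
func demoteErrorsToWarnings(diags []Diagnostic) []Diagnostic {
	out := make([]Diagnostic, len(diags))
	for i, d := range diags {
		if d.Severity == Error {
			d.Severity = Warning
			d.Summary = "Action failed (continuing): " + d.Summary
		}
		out[i] = d
	}
	return out
}

func main() {
	in := []Diagnostic{
		{Error, "provider returned an error"},
		{Warning, "unrelated hook warning"},
	}
	for _, d := range demoteErrorsToWarnings(in) {
		fmt.Printf("%c %s\n", d.Severity, d.Summary)
	}
}
```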

ActionTriggerBlockIndex: at.actionTriggerBlockIndex,
ActionsListIndex:        at.actionListIndex,
ActionTriggerEvent:      triggeringEvent,
ActionTriggerOnFailure:  at.onFailure,

This is now part of plans.LifecycleActionTrigger, so it needs to also be handled in the JSON representation:

case *plans.LifecycleActionTrigger:

Also I think we need to handle this in plans.ActionInvocationInstanceSrc and in the serialization to and from the planfile:

ret.ActionTrigger = &plans.LifecycleActionTrigger{


@mildwonkey mildwonkey left a comment

👋🏻 hi, sorry! This isn't an actual review - I just wanted to make sure we're not actually merging any changes to actions before the PRD gets approved and RFCs are written and approved. We are ready for prototypes and RFCs, not mergeable PRs 😁

jbardin commented Dec 1, 2025

Just throwing some context in here for thought as the RFCs are finalized. These may not end up being the final requirements, but they are important to consider when trying to move provisioners to this new model.

In comparison to provisioners:

  • on_failure = fail marks the resource as tainted if there's an error. The provisioner is considered part of the resource, and failure indicates that the status of the resource is unknown.
  • on_failure = fail implies that dependency processing will halt, and no dependencies of the resource will be applied.

The current implementation can't fulfill either of those two points, because the separate apply nodes are not in the dependency chain and are not evaluated until after the resource has already been recorded as complete.
