Check allocation success in plan before creating an actual job #185

@arsiesys

Description

What feature do you want to see added?

Hi,

Context
Two cloud providers are configured, in this order:

  • Nomad (on-prem nodes)
  • Google Cloud Compute (cloud nodes)

Problem
When Jenkins detects excess workload, it asks Nomad first, and a job will ALWAYS be created in Nomad. If Nomad is out of resources or has no available node to allocate the job, we end up in an infinite loop where Jenkins deletes and then recreates this job.
The provisioning request never reaches the secondary node provider (GCP).

Solution

  • We could run a "nomad plan" before deciding to create a node, and check whether Nomad would be able to allocate the requested workload.
    For example, in the plan response returned by the Nomad API, the dict "FailedTGAllocs" contains data when the allocation would fail.
    When the allocation would succeed, it contains "null".
  • Then, if the plan shows that the allocation would fail, we stop the loop that creates the nodes.
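
To make the check above concrete, here is a minimal sketch in Java of how a plan response could be inspected. It is not the code from the commit referenced in this issue: the class name `PlanCheck`, the method name `planWouldAllocate`, and the sample payloads are hypothetical, and a regular expression stands in for a real JSON parser to keep the example self-contained. It also assumes that an empty object should be treated the same as "null".

```java
import java.util.regex.Pattern;

public class PlanCheck {
    // Hypothetical helper: matches "FailedTGAllocs" when its value is null
    // or (assumption) an empty object. A real implementation should use a
    // JSON parser instead of a regular expression.
    private static final Pattern NO_FAILED_ALLOCS =
            Pattern.compile("\"FailedTGAllocs\"\\s*:\\s*(null\\b|\\{\\s*\\})");

    /**
     * Returns true when the plan response reports no failed task group
     * allocations, i.e. Nomad expects to be able to place the job.
     * A response without the key is treated as a failure, so that the
     * provisioning request can fall through to the next cloud.
     */
    public static boolean planWouldAllocate(String planResponseJson) {
        return NO_FAILED_ALLOCS.matcher(planResponseJson).find();
    }

    public static void main(String[] args) {
        // Illustrative payloads only, trimmed to the field of interest.
        String ok = "{\"FailedTGAllocs\": null, \"JobModifyIndex\": 0}";
        String failed = "{\"FailedTGAllocs\": {\"worker\": {\"NodesEvaluated\": 3}}}";

        System.out.println(planWouldAllocate(ok));     // prints "true"
        System.out.println(planWouldAllocate(failed)); // prints "false"
    }
}
```

In the provisioning loop, the plugin could call such a check before creating each node and stop as soon as it returns false, which would let Jenkins move on to the next configured cloud.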

I made an example implementation that I am experimenting with for our needs:
c80712c

Someone with more Java skills could probably make it cleaner and even add an option to enable or disable this behavior. This would make the plugin compatible with another cloud provider.

Resources
https://developer.hashicorp.com/nomad/api-docs/jobs#create-job-plan
c80712c

Upstream changes

No response
