feat(core): Replace fixed parameters with environment variables#3083
feat(core): Replace fixed parameters with environment variables#3083yx9o wants to merge 1 commit intoAutoMQ:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request replaces a hardcoded failover protection period with a configurable environment variable, enabling dynamic adjustment of the timeout without code modification. The main change allows operators to control when a node should be eligible for failover after entering a new epoch.
- Replaced
TimeUnit.MINUTES.toMillis(1)withSystems.getEnvLong()for dynamic configuration - Removed unused
TimeUnitimport and addedSystemsutility import - Default value remains 60000ms (1 minute) when environment variable is not set
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", 60L * 1000); | ||
| private final int id; |
There was a problem hiding this comment.
The Systems.getEnvLong method can throw NumberFormatException if the environment variable AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS is set to an invalid value (e.g., non-numeric string). This would cause the class to fail during static initialization, potentially crashing the application at startup. Consider adding validation or catching the exception with a fallback to the default value, or document the expected format clearly.
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", 60L * 1000); | |
| private final int id; | |
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = getDontFailoverAfterNewEpochMs(); | |
| private final int id; | |
| private static long getDontFailoverAfterNewEpochMs() { | |
| final long defaultValue = 60L * 1000; | |
| try { | |
| return Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", defaultValue); | |
| } catch (NumberFormatException e) { | |
| // Optionally log the error here if a logger is available | |
| return defaultValue; | |
| } | |
| } |
| @@ -40,7 +41,7 @@ public final class NodeRuntimeMetadata { | |||
| * @see ClusterControlManager#getNextNodeId() | |||
| */ | |||
| private static final int MAX_CONTROLLER_ID = 1000 - 1; | |||
There was a problem hiding this comment.
The new environment variable AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS should be documented with a comment explaining its purpose, acceptable values (non-negative long in milliseconds), and the default value (60000ms = 1 minute). This helps operators understand how to configure this parameter.
| private static final int MAX_CONTROLLER_ID = 1000 - 1; | |
| private static final int MAX_CONTROLLER_ID = 1000 - 1; | |
| /** | |
| * The minimum time (in milliseconds) after a node starts a new epoch before failover is allowed. | |
| * Controlled by the environment variable {@code AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS}. | |
| * <p> | |
| * Purpose: Prevents failover immediately after a node restarts, allowing time for recovery. | |
| * Acceptable values: non-negative long (milliseconds). | |
| * Default value: 60000ms (1 minute). | |
| */ |
| */ | ||
| private static final int MAX_CONTROLLER_ID = 1000 - 1; | ||
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = TimeUnit.MINUTES.toMillis(1); | ||
| private static final long DONT_FAILOVER_AFTER_NEW_EPOCH_MS = Systems.getEnvLong("AUTOMQ_CONTROLLER_DONT_FAILOVER_AFTER_NEW_EPOCH_MS", 60L * 1000); |
There was a problem hiding this comment.
The behavior of DONT_FAILOVER_AFTER_NEW_EPOCH_MS now depends on an environment variable, but there are no tests verifying that shouldFailover() works correctly with different values of this parameter. Consider adding tests that validate the failover logic with various timeout values to ensure the environment variable configuration works as expected.
The failover protection period parameter can be dynamically configured. This time can be adjusted by setting environment variables without requiring code modification and recompilation.