Our current Janitor implementation could be improved by taking into account the queue mode that we use (`STICKY_POLLING`, `STICKY_EVENTS`, or `POLLING`).
- For `STICKY_EVENTS` (default config for the bus), any rebalancing of entries will not work unless we restart the nodes, as the code maintains an in-memory inflight queue.
- For `POLLING`, it is also questionable whether we should enable the janitor at all, given that all nodes are equal and will pick up entries that have not completed anyway (once the claim time has expired).
So, the main use case for Janitor seems to be STICKY_POLLING (default for notification queue).
Current implementation:
- The code identifies late entries for the current node which it ignores √
- Then, there is a default `else` statement, a sort of catch-all that can be broken into 2 separate sub-cases:
  - `entryLeftBehind.getProcessingOwner() != null`: This is a legitimate case of a stuck entry. It means that either a node started to process an entry and never completed, or that a node disappeared √
  - `!owner.equals(entryLeftBehind.getCreatingOwner())`: In this case, we can't tell whether the entry is merely late or actually stuck.
For the `else` (catch-all statement), we should instead distinguish the 3 remaining cases not covered by the `if`:
- `entryLeftBehind.getProcessingOwner() != null && owner.equals(entryLeftBehind.getCreatingOwner())`: A real case of a stuck entry, i.e. the current node is clearly alive but the entry has been `IN_PROCESSING` for a very long time. -> This should be WARNed.
- `entryLeftBehind.getProcessingOwner() != null && !owner.equals(entryLeftBehind.getCreatingOwner())`: Likely a case where a node disappeared. If not, then the WARN from the previous case would be seen by the target node `entryLeftBehind.getCreatingOwner()`.
- `entryLeftBehind.getProcessingOwner() == null && !owner.equals(entryLeftBehind.getCreatingOwner())`: Also likely a case where a node disappeared. If not, then the WARN would be seen by the target node `entryLeftBehind.getCreatingOwner()`.
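The case split above could be sketched roughly as follows. This is only an illustration, not the actual janitor code: the class `JanitorCaseSketch`, the enum `EntryCase`, and the `classify` method are hypothetical names, owners are modeled as plain strings, and the first branch (no processing owner, created by the current node) is inferred as the existing `if` because it is the only combination not listed among the 3 remaining cases.

```java
// Hypothetical sketch: classifies a left-behind entry into the existing
// "late" case plus the 3 remaining cases described in this issue.
final class JanitorCaseSketch {

    enum EntryCase {
        LATE_ON_CURRENT_NODE,        // assumed to be the existing `if`: ignored
        STUCK_ON_CURRENT_NODE,       // claimed, created here: should be WARNed
        CLAIMED_CREATED_ELSEWHERE,   // claimed, created elsewhere: node likely gone
        UNCLAIMED_CREATED_ELSEWHERE  // not claimed, created elsewhere: node likely gone
    }

    // owner: the current node; creatingOwner / processingOwner stand in for
    // entryLeftBehind.getCreatingOwner() / entryLeftBehind.getProcessingOwner().
    static EntryCase classify(String owner, String creatingOwner, String processingOwner) {
        boolean createdHere = owner.equals(creatingOwner);
        boolean claimed = processingOwner != null;

        if (!claimed && createdHere) {
            return EntryCase.LATE_ON_CURRENT_NODE;
        } else if (claimed && createdHere) {
            return EntryCase.STUCK_ON_CURRENT_NODE;
        } else if (claimed) {
            return EntryCase.CLAIMED_CREATED_ELSEWHERE;
        } else {
            return EntryCase.UNCLAIMED_CREATED_ELSEWHERE;
        }
    }
}
```

With this shape, only `STUCK_ON_CURRENT_NODE` would log a WARN on the current node; the two "created elsewhere" cases would be the candidates for reassignment.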
So in summary:
- Disable the janitor for `STICKY_EVENTS`.
- Probably disable the janitor for `POLLING` as well.
- For `STICKY_POLLING`, implement finer granularity logic as described above.
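As a rough sketch of the summary, the janitor could be gated on the queue mode. The `QueueMode` enum below is a local stand-in that only mirrors the mode names from this issue, and `JanitorModeSketch.isJanitorEnabled` is a hypothetical helper, not an existing API. Disabling for `POLLING` reflects the "probably" above and is an assumption.

```java
// Hypothetical sketch: decide whether the janitor should run for a queue mode.
final class JanitorModeSketch {

    // Local stand-in mirroring the mode names used in this issue.
    enum QueueMode { POLLING, STICKY_POLLING, STICKY_EVENTS }

    static boolean isJanitorEnabled(QueueMode mode) {
        switch (mode) {
            case STICKY_POLLING:
                // Main use case: entries are tied to a node and can be left behind.
                return true;
            case STICKY_EVENTS:
                // In-memory inflight queue: rebalancing has no effect without a restart.
                return false;
            case POLLING:
                // Assumption: all nodes pick up unfinished entries once the claim expires.
                return false;
            default:
                return false;
        }
    }
}
```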