Janitor improvements #176

@sbrossie

Our current Janitor implementation could be improved by taking into account the queue mode in use (STICKY_POLLING, STICKY_EVENTS, or POLLING).

  1. For STICKY_EVENTS (the default config for the bus), rebalancing entries will not work unless we restart the nodes, because the code maintains an in-memory inflight queue.
  2. For POLLING, it is questionable whether we should enable the Janitor at all, given that all nodes are equal and will pick up entries that have not completed anyway (once the claim time has expired).

So, the main use case for Janitor seems to be STICKY_POLLING (default for notification queue).

Current implementation:

  • The code identifies late entries for the current node, which it ignores √
  • Then there is a default else branch, a sort of catch-all, which can be broken into 2 separate sub-cases:
    • entryLeftBehind.getProcessingOwner() != null: This is a legitimate case of a stuck entry. It means that either a node started to process an entry and never completed it, or that a node disappeared √
    • ! owner.equals(entryLeftBehind.getCreatingOwner()): In this case, we can't tell whether the entry is merely late or actually stuck.

For the else branch (the catch-all), we should instead distinguish the following 3 remaining cases that fall outside the if:

  • entryLeftBehind.getProcessingOwner() != null && owner.equals(entryLeftBehind.getCreatingOwner()): A real case of a stuck entry, i.e. the current node is clearly alive but the entry has been IN_PROCESSING for a very long time. -> This should be logged as a WARN.
  • entryLeftBehind.getProcessingOwner() != null && ! owner.equals(entryLeftBehind.getCreatingOwner()): Likely a case where a node disappeared. If not, the WARN from the previous case would be seen on the target node entryLeftBehind.getCreatingOwner().
  • entryLeftBehind.getProcessingOwner() == null && ! owner.equals(entryLeftBehind.getCreatingOwner()): Also likely a case where a node disappeared. If not, a WARN would be seen on the target node entryLeftBehind.getCreatingOwner().
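The case split above could be sketched roughly as follows. This is only an illustration, not the existing Janitor code: `JanitorClassifier`, `Kind`, and the minimal `Entry` class are hypothetical names introduced here, and the condition for the initial if (no processing owner, created by the current node) is inferred from the three remaining cases listed above.

```java
// Hypothetical sketch of the proposed case split; not the actual Janitor code.
final class JanitorClassifier {

    /** Illustrative labels for the outcomes discussed in this issue. */
    enum Kind { LATE_ON_THIS_NODE, STUCK_ON_THIS_NODE, OWNER_LIKELY_GONE }

    /** Minimal stand-in for a queue entry, exposing only the two owners used here. */
    static final class Entry {
        private final String creatingOwner;
        private final String processingOwner; // null when no node has claimed the entry

        Entry(String creatingOwner, String processingOwner) {
            this.creatingOwner = creatingOwner;
            this.processingOwner = processingOwner;
        }

        String getCreatingOwner() { return creatingOwner; }
        String getProcessingOwner() { return processingOwner; }
    }

    /** Classifies an entry left behind, as seen from the node named {@code owner}. */
    static Kind classify(String owner, Entry entryLeftBehind) {
        boolean claimed = entryLeftBehind.getProcessingOwner() != null;
        boolean createdHere = owner.equals(entryLeftBehind.getCreatingOwner());

        if (createdHere) {
            // Claimed and created here: stuck on a live node, the caller should WARN.
            // Unclaimed and created here: just late, assumed to be the existing "if" case.
            return claimed ? Kind.STUCK_ON_THIS_NODE : Kind.LATE_ON_THIS_NODE;
        }
        // Created by another node, claimed or not: that node has likely disappeared.
        return Kind.OWNER_LIKELY_GONE;
    }
}
```

Returning a label rather than acting directly keeps the decision separate from whatever the Janitor then does with the entry (ignore, WARN, or take over), which this sketch deliberately leaves to the caller.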

So in summary:

  1. Disable janitor for STICKY_EVENTS
  2. Probably disable janitor for POLLING as well
  3. For STICKY_POLLING, implement finer granularity logic as described above.
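As a rough illustration of points 1 and 2, the decision to run the Janitor could be gated on the queue mode. The `JanitorPolicy` class, its nested `QueueMode` enum, and the `enableForPolling` flag are hypothetical names used only for this sketch; the flag reflects that disabling the Janitor for POLLING is still an open question.

```java
// Hypothetical sketch of gating the Janitor on the queue mode.
final class JanitorPolicy {

    /** Local mirror of the three queue modes named in this issue. */
    enum QueueMode { STICKY_EVENTS, STICKY_POLLING, POLLING }

    /**
     * Decides whether the Janitor should run for the given mode.
     *
     * @param enableForPolling assumed opt-in switch for POLLING, since the issue
     *                         only says the Janitor should "probably" be disabled there
     */
    static boolean isJanitorEnabled(QueueMode mode, boolean enableForPolling) {
        switch (mode) {
            case STICKY_POLLING:
                return true;             // main use case for the Janitor
            case POLLING:
                return enableForPolling; // nodes pick up expired claims on their own
            case STICKY_EVENTS:
            default:
                return false;            // in-memory inflight queue defeats rebalancing
        }
    }
}
```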
