Implement Log4j filter to deduplicate log messages #19173

Open
andsel wants to merge 5 commits into elastic:main from andsel:feature/deduplication_of_logs

@andsel andsel commented May 28, 2026

Release notes

[rn:skip]

What does this PR do?

Implements a Log4j filter that uses Guava's Bloom filter to check whether a log message at a given level has already been emitted and, if so, suppresses the repeat. The filter can be attached to either a logger or an appender.
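
To illustrate the idea described above, here is a minimal sketch of Bloom-filter-based deduplication keyed on level plus message. It is not the PR's code: the PR uses Guava's Bloom filter inside a Log4j filter, whereas this stand-in uses a hand-rolled filter over `java.util.BitSet` so that it runs without dependencies. The class name `DedupSketch`, the method `shouldEmit`, the FNV-1a hashing, and the sizing parameters are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative stand-in for a Bloom-filter-backed deduplicator (hypothetical, not the PR's class). */
final class DedupSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    DedupSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** 64-bit FNV-1a hash, used here only to derive bit positions. */
    private static long fnv1a(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    /**
     * Returns true if the (level, message) pair looks new and should be logged,
     * false if every probed bit was already set, meaning it is treated as a repeat.
     * A false result can be a false positive; a true result is always correct.
     */
    synchronized boolean shouldEmit(String level, String message) {
        byte[] key = (level + ":" + message).getBytes(StandardCharsets.UTF_8);
        long h = fnv1a(key);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // odd step for double hashing
        boolean allSet = true;
        for (int i = 0; i < numHashes; i++) {
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                allSet = false;
                bits.set(idx); // record the key while probing
            }
        }
        return !allSet;
    }
}
```

In a real Log4j filter the boolean would presumably map to a neutral or deny result for the log event. Because the key includes the level, the same text logged at a different level is looked up as a separate key.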

Why is it important/What is the impact to the user?

Lets the user configure a drop filter for log lines that become very noisy when repeated many times.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

  1. configure a logger to use the filter:
    logger.periodic_flusher.name = org.logstash.execution.PeriodicFlush
    logger.periodic_flusher.level = DEBUG
    logger.periodic_flusher.filter.dedup.type = DeduplicationFilter
  2. run Logstash with:
    bin/logstash -e 'input { generator { message => "test" } } output { sink{} }' --debug
  3. verify that the following line appears in the logs just once instead of every 5 seconds:
    [org.logstash.execution.PeriodicFlush][main] Pushing flush onto pipeline 
    

Related issues

@andsel andsel self-assigned this May 28, 2026
@github-actions

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

@mergify

mergify Bot commented May 28, 2026

This pull request does not have a backport label. Could you fix it @andsel? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

@andsel andsel changed the title from "Implemented a log4j filter to deduplicate log mesages" to "Implement log4j filter to deduplicate log messages" May 28, 2026
@andsel andsel changed the title from "Implement log4j filter to deduplicate log messages" to "Implement Log4j filter to deduplicate log messages" May 28, 2026
@andsel andsel requested a review from Copilot May 28, 2026 14:45

Copilot AI left a comment

Pull request overview

Adds a Log4j2 filter plugin to suppress repeated log lines (per-level + formatted message) using a Guava Bloom filter, addressing the need for a standard deduplication mechanism to reduce log spam in Logstash.

Changes:

  • Introduces DeduplicationFilter Log4j2 plugin that denies log events whose deduplication key has been seen before.
  • Adds unit tests covering basic dedup behavior and false-positive-probability validation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
logstash-core/src/main/java/org/logstash/log/DeduplicationFilter.java | Implements the Bloom-filter-backed Log4j2 filter and exposes a configurable false-positive probability.
logstash-core/src/test/java/org/logstash/log/DeduplicationFilterTest.java | Adds unit tests for first-seen vs. repeated messages and parameter validation.

Comment thread logstash-core/src/main/java/org/logstash/log/DeduplicationFilter.java Outdated
Comment thread logstash-core/src/test/java/org/logstash/log/DeduplicationFilterTest.java Outdated
@andsel andsel requested a review from Copilot May 28, 2026 15:16
@andsel andsel marked this pull request as ready for review May 28, 2026 15:17

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

andsel added 5 commits May 28, 2026 17:27
… a log message at certain level was already emitted
… only if not previously seen so that it avoid the false positive, being on the safe side of filter, where it's certain if an element is not part of the filter.
@andsel andsel force-pushed the feature/deduplication_of_logs branch from 682dd8c to 4687dcb Compare May 28, 2026 15:28
@elasticmachine

💚 Build Succeeded

History

cc @andsel

@alexcams alexcams left a comment

Just a couple of questions that I think are worth double-checking, but no need to block the PR for them!

}

    static double resolveFalsePositiveProbability(final double falsePositiveProbability) {
        if (falsePositiveProbability > 0.0 && falsePositiveProbability < 1.0) {

Should the upper bound be smaller here? A falsePositiveProbability of 0.5 means dropping roughly 50% of new log lines because they are wrongly detected as duplicates. I find it difficult to think of a realistic use case for that. Maybe cap it at 0.1 or 0.05 instead of 1.0? WDYT?
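
To put numbers on this question, here is a small sketch of the standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n expected insertions at false-positive probability p. The class and method names (`BloomSizing`, `optimalBits`, `optimalHashes`) are hypothetical, and this is not a claim about how the PR or Guava sizes its filter internally.

```java
/** Textbook Bloom filter sizing; names are illustrative only. */
final class BloomSizing {
    /** Bits needed for n expected insertions at false-positive probability p (0 < p < 1). */
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Number of hash functions that minimizes the false-positive rate for m bits and n insertions. */
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

With these formulas, p = 0.5 needs only about 1.44 bits per expected element and a single hash function, while p = 0.01 needs about 9.6 bits per element and 7 hash functions. Note that p is the rate expected once the filter holds the planned number of insertions; a mostly empty filter produces fewer false positives.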

        if (falsePositiveProbability > 0.0 && falsePositiveProbability < 1.0) {
            return falsePositiveProbability;
        }
        return DEFAULT_FALSE_POSITIVE_PROBABILITY;

Would it be possible to add a warn log here so the fallback isn't silent? I'm not sure, since this is a logger config file...
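
One possible shape for a non-silent fallback, as a sketch only: the validation takes a warning sink so the caller decides where the message goes. The class name `FppResolver`, the `Consumer<String>` parameter, and the default value 0.01 are assumptions (the PR's actual `DEFAULT_FALSE_POSITIVE_PROBABILITY` is not shown in this hunk). In a Log4j2 plugin the sink could plausibly be Log4j's own `StatusLogger`, which reports configuration problems without going through the loggers being configured.

```java
import java.util.function.Consumer;

/** Hypothetical variant of the validation that reports the fallback instead of staying silent. */
final class FppResolver {
    // Assumed default for illustration; the real constant's value is not visible in the hunk.
    static final double DEFAULT_FALSE_POSITIVE_PROBABILITY = 0.01;

    static double resolve(double configured, Consumer<String> warn) {
        if (configured > 0.0 && configured < 1.0) {
            return configured;
        }
        // Out-of-range values (and NaN, for which both comparisons are false) end up here.
        warn.accept("Invalid falsePositiveProbability " + configured
                + ", falling back to " + DEFAULT_FALSE_POSITIVE_PROBABILITY);
        return DEFAULT_FALSE_POSITIVE_PROBABILITY;
    }
}
```

Passing the sink in keeps the check easy to unit test: a test can collect warnings in a list and assert that exactly one was produced for an invalid value.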

Development

Successfully merging this pull request may close these issues.

Introduce a standard mechanism to avoid logging multiple times the same line.

4 participants