Implement Log4j filter to deduplicate log messages #19173
This pull request does not have a backport label. Could you fix it @andsel? 🙏
Pull request overview
Adds a Log4j2 filter plugin to suppress repeated log lines (per-level + formatted message) using a Guava Bloom filter, addressing the need for a standard deduplication mechanism to reduce log spam in Logstash.
Changes:
- Introduces a DeduplicationFilter Log4j2 plugin that denies log events whose deduplication key has been seen before.
- Adds unit tests covering basic dedup behavior and false-positive-probability validation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| logstash-core/src/main/java/org/logstash/log/DeduplicationFilter.java | Implements the Bloom-filter-backed Log4j2 filter and exposes a configurable false-positive probability. |
| logstash-core/src/test/java/org/logstash/log/DeduplicationFilterTest.java | Adds unit tests for first-seen vs. repeated messages and parameter validation. |
… a log message at certain level was already emitted
… only if not previously seen, so that it avoids false positives by staying on the safe side of the filter, where it is certain that an element is not part of the filter.
682dd8c to 4687dcb
💚 Build Succeeded
cc @andsel
alexcams left a comment
Just a couple of questions that I think are worth double-checking, but no need to block the PR for that!
    }

    static double resolveFalsePositiveProbability(final double falsePositiveProbability) {
        if (falsePositiveProbability > 0.0 && falsePositiveProbability < 1.0) {
Should the upper bound be smaller here? A falsePositiveProbability of 0.5 means randomly dropping about 50% of new log messages because they are wrongly detected as already seen. I find it hard to think of a realistic use case for that. Maybe cap it at 0.1 or 0.05 instead of 1.0? WDYT?
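For context on what the configured probability costs, the standard Bloom filter estimates are reproduced here as a rough guide. They are textbook approximations, not figures taken from this PR: $m$ is the number of bits, $k$ the number of hash functions, and $n$ the number of distinct keys inserted.

$$
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\ln 2,
\qquad
\frac{m}{n} = -\frac{\ln p}{(\ln 2)^{2}}
$$

With these formulas, $p = 0.5$ needs about 1.44 bits per key, $p = 0.1$ about 4.79, $p = 0.05$ about 6.24, and $p = 0.01$ about 9.59. The target $p$ is only reached once the filter holds its expected number of insertions; while it holds fewer keys, the actual false-positive rate is lower.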
        if (falsePositiveProbability > 0.0 && falsePositiveProbability < 1.0) {
            return falsePositiveProbability;
        }
        return DEFAULT_FALSE_POSITIVE_PROBABILITY;
Would it be possible to add a warn log here so that the fallback is not silent? I'm not sure, since this is a logger config file...
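To make the two suggestions in this thread concrete, here is a minimal sketch of a tighter upper bound combined with a non-silent fallback. It is not the PR's code: the class name, the 0.01 default, and the 0.1 cap are hypothetical, since the excerpt shows neither the real default nor any cap. `System.err` stands in for whatever reporting channel fits a logging plugin (Log4j2 components often use `StatusLogger` for this, but that choice is an assumption here).

```java
/** Sketch only: validation with a tighter cap and a visible fallback. */
final class FppValidationSketch {
    // Hypothetical values; the PR excerpt does not show the real default or a cap.
    static final double DEFAULT_FALSE_POSITIVE_PROBABILITY = 0.01;
    static final double MAX_FALSE_POSITIVE_PROBABILITY = 0.1;

    static double resolveFalsePositiveProbability(final double falsePositiveProbability) {
        // NaN fails both comparisons, so it also falls through to the default.
        if (falsePositiveProbability > 0.0
                && falsePositiveProbability <= MAX_FALSE_POSITIVE_PROBABILITY) {
            return falsePositiveProbability;
        }
        // Stand-in for a real warning so the fallback is not silent.
        System.err.printf(
                "Invalid falsePositiveProbability %s, expected a value in (0, %s]; using default %s%n",
                falsePositiveProbability,
                MAX_FALSE_POSITIVE_PROBABILITY,
                DEFAULT_FALSE_POSITIVE_PROBABILITY);
        return DEFAULT_FALSE_POSITIVE_PROBABILITY;
    }
}
```

With these assumed constants, 0.05 is accepted as is, while 0.5, 0.0, and NaN all fall back to the default and print a warning.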
Release notes
[rn:skip]
What does this PR do?
Implements a Log4j filter that uses Guava's Bloom filter to check whether a log message at a given level was already emitted and, if so, avoids emitting it again. The filter can be attached either to a logger or to an appender.
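The idea can be illustrated with a small, self-contained sketch. It is not the PR's implementation: to stay runnable without dependencies it swaps Guava's `BloomFilter` for a hand-rolled `BitSet` Bloom filter, and the class names, key format (`level:message`), bit count, and hash count are all illustrative assumptions. Like Guava's `BloomFilter.put`, the `put` below returns true only when at least one bit changed, which means the key was definitely not seen before.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Sketch only: deduplicate log lines keyed by level plus formatted message. */
final class DedupSketch {

    /** Minimal stand-in for Guava's BloomFilter; sizes are illustrative. */
    static final class TinyBloom {
        private final BitSet bits;
        private final int size;
        private final int hashes;

        TinyBloom(final int size, final int hashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashes = hashes;
        }

        /** Sets the key's bits; returns true if any bit was newly set (key is definitely new). */
        synchronized boolean put(final String key) {
            final byte[] data = key.getBytes(StandardCharsets.UTF_8);
            final int h1 = fnv1a(data, 0x811C9DC5);
            final int h2 = fnv1a(data, 0x01000193) | 1; // odd step for double hashing
            boolean changed = false;
            for (int i = 0; i < hashes; i++) {
                final int idx = Math.floorMod(h1 + i * h2, size);
                if (!bits.get(idx)) {
                    bits.set(idx);
                    changed = true;
                }
            }
            return changed;
        }

        /** FNV-1a style hash over the bytes, starting from the given seed. */
        private static int fnv1a(final byte[] data, final int seed) {
            int h = seed;
            for (final byte b : data) {
                h ^= (b & 0xFF);
                h *= 0x01000193;
            }
            return h;
        }
    }

    private final TinyBloom seen = new TinyBloom(1 << 16, 3);

    /** Returns true if the event should be emitted, false if it looks like a repeat. */
    boolean shouldEmit(final String level, final String formattedMessage) {
        return seen.put(level + ":" + formattedMessage);
    }
}
```

In a real Log4j2 filter the boolean would presumably map to a filter result (deny for repeats, neutral otherwise). Because a Bloom filter can report a false positive, an occasional first-time message may be dropped, but a repeated message is never let through.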
Why is it important/What is the impact to the user?
Lets the user configure a drop filter for log lines that would be very noisy if repeated too often.
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding change to the default configuration files (and/or docker env variables)
Author's Checklist
How to test this PR locally
bin/logstash -e 'input { generator { message => "test" } } output { sink{} }' --debug
Related issues