dnsdist: Optionally keep all data in LRU cache #16692

Draft
karelbilek wants to merge 2 commits into PowerDNS:master from karelbilek:kb/alwaysKeepStaleData

Conversation

@karelbilek (Contributor) commented Dec 31, 2025

Short description

To keep all data in the cache and not delete expired entries, we need another mechanism to evict from the cache; LRU is the most useful here, as the actively used elements will stay in the cache.

MaybeLruCache either behaves like a regular map (if isLRU is false), or additionally maintains a linked list + map for constant-time putFront.

Another change is that a cache lookup can now require a write lock.

I have hand-tested this code and it works for my purposes. I have not yet written any unit tests or regression tests.

For my local testing, I had asserts in the code checking that all three containers have the same size at all times, but I have removed them here.
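For illustration only (the names LruCache, put and get are hypothetical, not the PR's actual MaybeLruCache interface), the linked-list-plus-map pattern behind constant-time putFront can be sketched as:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Sketch of an LRU cache: a doubly-linked list holds keys in recency
// order; the map stores the value plus an iterator into the list, so
// moving an entry to the front on access ("putFront") is O(1).
template <typename K, typename V>
class LruCache
{
public:
  explicit LruCache(size_t capacity) : d_capacity(capacity) {}

  void put(const K& key, V value)
  {
    auto mapIt = d_map.find(key);
    if (mapIt != d_map.end()) {
      mapIt->second.first = std::move(value);
      touch(mapIt);
      return;
    }
    if (d_map.size() >= d_capacity) {
      // evict the least-recently-used entry (the tail of the list)
      d_map.erase(d_order.back());
      d_order.pop_back();
    }
    d_order.push_front(key);
    d_map.emplace(key, std::make_pair(std::move(value), d_order.begin()));
  }

  // note: even a lookup reorders the recency list, which is why a
  // find can require a write lock, as the description above mentions
  std::optional<V> get(const K& key)
  {
    auto mapIt = d_map.find(key);
    if (mapIt == d_map.end()) {
      return std::nullopt;
    }
    touch(mapIt);
    return mapIt->second.first;
  }

private:
  using Order = std::list<K>;
  using Map = std::unordered_map<K, std::pair<V, typename Order::iterator>>;

  void touch(typename Map::iterator mapIt)
  {
    // splice moves the node to the front without invalidating iterators
    d_order.splice(d_order.begin(), d_order, mapIt->second.second);
    mapIt->second.second = d_order.begin();
  }

  Order d_order;
  Map d_map;
  size_t d_capacity;
};
```

On access, the entry's list node is spliced to the front in O(1); eviction pops the tail, so the actively used elements survive.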

Checklist

I have:

  • read the CONTRIBUTING.md document
  • read and accepted the Developer Certificate of Origin document, including the AI Policy, and added a "Signed-off-by" to my commits
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

}
}

std::pair<bool, bool> DNSDistPacketCache::getWriteLocked(MaybeLruCache<uint32_t, CacheValue>& map, DNSQuestion& dnsQuestion, bool& stale, PacketBuffer& response, time_t& age, uint32_t key, bool recordMiss, time_t now, uint32_t allowExpired, bool receivedOverUDP, bool dnssecOK, const std::optional<Netmask>& subnet, bool truncatedOK, uint16_t queryId, const DNSName::string_t& dnsQName)
@karelbilek (Contributor, Author) commented Dec 31, 2025

These are all new functions, but the code is moved from get; it is here just so it can work with both a read lock and a write lock. The function signature is very ugly, though.

@rgacogne rgacogne self-requested a review January 5, 2026 10:58
@karelbilek karelbilek force-pushed the kb/alwaysKeepStaleData branch from 6139fca to d83bf64 on January 12, 2026 05:18
@coveralls commented Jan 12, 2026

Pull Request Test Coverage Report for Build 21174868678

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 127 of 199 (63.82%) changed or added relevant lines in 5 files are covered.
  • 45 unchanged lines in 12 files lost coverage.
  • Overall coverage increased (+4.8%) to 71.55%

Changes missing coverage (covered lines / changed+added lines / %):
  • pdns/dnsdistdist/dnsdist-cache.cc: 78 / 101 (77.23%)
  • pdns/dnsdistdist/dnsdist-cache.hh: 41 / 90 (45.56%)
Files with coverage reduction (new missed lines / %):
  • pdns/backends/gsql/gsqlbackend.hh: 1 (97.77%)
  • pdns/dnsdistdist/dnsdist-cache.cc: 1 (79.57%)
  • pdns/misc.cc: 1 (61.5%)
  • pdns/signingpipe.cc: 1 (85.83%)
  • pdns/dnsdistdist/dnsdist-async.cc: 2 (79.59%)
  • pdns/recursordist/aggressive_nsec.cc: 2 (66.39%)
  • pdns/opensslsigners.cc: 3 (61.41%)
  • pdns/remote_logger.cc: 3 (55.98%)
  • pdns/recursordist/pdns_recursor.cc: 4 (75.61%)
  • pdns/recursordist/lwres.cc: 7 (66.0%)
Totals Coverage Status
Change from base Build 21173332132: 4.8%
Covered Lines: 128870
Relevant Lines: 165197

💛 - Coveralls

@rgacogne (Member)

Just to keep you updated, I have not forgotten about this PR! The code is actually a lot cleaner than I feared, and I'm trying to see if we can make it more generic so it would also fit with a S3-FIFO experiment that I have had in a branch for a while.

@karelbilek (Contributor, Author)

I have looked at S3-FIFO before and it went a bit over my head. LRU was simpler to understand and implement.

I will look at what you have implemented. S3-FIFO is lock-free, which is great.

@karelbilek (Contributor, Author)

I am looking at your branch: https://github.com/rgacogne/pdns/commits/ddist-s3-fifo-rebased

With that branch, I am not sure how much of my branch would even be necessary.

@rgacogne (Member)

Note that:

  • it has not been updated for a while, I need to rebase it
  • it has been very lightly tested, I would not run it in production

@karelbilek (Contributor, Author)

The more I read about S3-FIFO, the more I realize it's not really the right fit for my use-case here.

Broadly speaking, my use-case here is: I want to use dnsdist as a local stub resolver - and in the case where the upstream is down, I want to return at least something; even in case the TTL is old.

I do not care about throughput (it's minimal anyway in the local case); I want to keep things in cache as long as possible. I don't care about lock contention that much, because there aren't that many requests anyway. The LRU with write locking on each operation, as I did in my implementation, is in the end ideal (for my use-case).

The only upside of potentially using S3-FIFO would be that it would already be implemented in dnsdist and there wouldn't need to be another cache eviction algorithm :)

@rgacogne (Member) commented Jan 19, 2026

Yup, I agree that S3-FIFO is not really the silver bullet I was hoping for, which is why this branch is dusty :) It works very nicely in a lot of cases but is not optimal in others.
What I would really like to have, and I think your current branch is a step in the right direction, is to make it easy to switch between the current behaviour, LRU and S3-FIFO, with minimal code duplication and complexity.

@karelbilek (Contributor, Author)

I have realized an LRU cache might be good even when you don't have keepStaleData,

so I have changed the code somewhat: instead of the ugly-named alwaysKeepStaleData, I have added an lru option that keeps the LRU behavior regardless of the keepStaleData value. Still, when lru is set to true AND keepStaleData is true, the expiring thread doesn't remove the old entries. So, to always keep stale data, you now need to set both keepStaleData and lru.

This seems less ugly to me.

@karelbilek karelbilek force-pushed the kb/alwaysKeepStaleData branch 3 times, most recently from 224d787 to 9c9277e on January 20, 2026 13:42
@rgacogne rgacogne added this to the dnsdist-2.1.0 milestone Jan 29, 2026
@rgacogne (Member)

I'm tentatively putting this in the 2.1.0 milestone and I'll do my best to make it land for 2.1.0, but it might have to wait until 2.2.0.

@rgacogne (Member)

Note to self: also consider SIEVE which is much simpler than S3-FIFO and still has the property of not requiring a write-lock on a cache-hit: https://cachemon.github.io/SIEVE-website/

@karelbilek (Contributor, Author)

SIEVE looks very easy to implement indeed. I will have a look at it.

@rgacogne (Member)

There is only one thing that requires special attention: the hand iterator needs to be properly updated when the queue is modified.

@karelbilek (Contributor, Author)

@rgacogne do you think it would be better to allow both SIEVE and LRU? That is more configurable, but will make the code slightly more complex for doubtful benefit. I cannot decide.

Looking at other implementations, BIND seems to implement just SIEVE after recent changes.

@rgacogne (Member)

I think supporting only SIEVE is enough.

@karelbilek (Contributor, Author)

Looking more closely at SIEVE, it seems it might have some surprising properties that might not be ideal.

If the "hand" comes all the way to the "head", SIEVE will keep evicting the latest entry unless it's visited at least once. Which might be fine! But it can be surprising. I am not entirely sure under what circumstances that would happen.

I have implemented SIEVE already (not committed yet) but I am thinking I might keep both in the end.
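For illustration only (SieveCache, put and get are hypothetical names, not the PR's code), a minimal single-threaded SIEVE sketch. Note how a hit only sets a visited flag, so in a concurrent version that flag can be an atomic and hits need no write lock, and how the hand iterator has to be moved off a node before that node is erased, which is the special attention point mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Sketch of a SIEVE cache: a FIFO queue of nodes, a per-node visited
// flag, and a "hand" iterator that scans from tail toward head looking
// for an unvisited victim, clearing flags as it passes.
template <typename K, typename V>
class SieveCache
{
public:
  explicit SieveCache(size_t capacity) : d_capacity(capacity) {}

  // a hit only sets the visited flag; nothing moves in the queue
  std::optional<V> get(const K& key)
  {
    auto mapIt = d_map.find(key);
    if (mapIt == d_map.end()) {
      return std::nullopt;
    }
    mapIt->second->visited = true;
    return mapIt->second->value;
  }

  void put(const K& key, V value)
  {
    auto mapIt = d_map.find(key);
    if (mapIt != d_map.end()) {
      mapIt->second->value = std::move(value);
      mapIt->second->visited = true;
      return;
    }
    if (d_map.size() >= d_capacity) {
      evict();
    }
    d_queue.push_front(Node{key, std::move(value), false});
    d_map.emplace(key, d_queue.begin());
  }

private:
  struct Node
  {
    K key;
    V value;
    bool visited;
  };
  using Queue = std::list<Node>;

  void evict()
  {
    // start at the tail if the hand is unset or has run off the queue
    if (d_hand == d_queue.end()) {
      d_hand = std::prev(d_queue.end());
    }
    // skip visited entries, clearing their flags; wrap from the head
    // back to the tail (this is where a fresh entry at the head can
    // become the next victim if it was never visited)
    while (d_hand->visited) {
      d_hand->visited = false;
      d_hand = (d_hand == d_queue.begin()) ? std::prev(d_queue.end())
                                           : std::prev(d_hand);
    }
    // the hand must be moved off the node we are about to erase
    auto victim = d_hand;
    d_hand = (victim == d_queue.begin()) ? d_queue.end() : std::prev(victim);
    d_map.erase(victim->key);
    d_queue.erase(victim);
  }

  Queue d_queue;
  std::unordered_map<K, typename Queue::iterator> d_map;
  typename Queue::iterator d_hand = d_queue.end();
  size_t d_capacity;
};
```

The wrap-around in evict() is also where the surprising behaviour described above comes from: once the hand reaches the head, it can evict the newest entry unless that entry has been visited at least once.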

struct DNSQuestion;

// lru cache is NOT locked; we need to use shared lock
template <typename K, typename V>
@karelbilek (Contributor, Author)

As I am editing the code more, I am starting to wonder whether the templating is worth it, given that it's never used anywhere else. I will keep it for now, though.

@rgacogne (Member)

I'd say it's not useful at the moment, and we can add it back later if we reuse that code with different types.

This does not make any actual change yet; it is just to make the next commit's diff smaller.

Signed-off-by: Karel Bilek <kb@karelbilek.com>
Signed-off-by: Karel Bilek <kb@karelbilek.com>
@karelbilek karelbilek force-pushed the kb/alwaysKeepStaleData branch from ec7a948 to a40bf9e on February 19, 2026 15:31
@karelbilek (Contributor, Author)

I have reworked the PR very significantly.

I have made the cache container abstract and made the methods virtual; that allowed me to implement all 3 caches (the default "non-eviction" map-based one, LRU and SIEVE).

The virtual dispatch might add some overhead, but my intuition is that the burtle hashing will be more expensive anyway, so the virtual dispatch won't matter. I have not done any larger measurements or profiling.

The SIEVE container still goes through every element for TTL expiration. The SIEVE authors mention in the paper that a possible optimization is to use TTL range buckets as in Segcache (another cache system by some of the same authors); I have not done that here and leave it for later.

@karelbilek (Contributor, Author)

When I try to run regression tests with sieve eviction, TestCachingCacheFull fails; but that's to be expected.

static bool receivedOverUDP = true;

static void test_packetcache_simple(bool shuffle)
static void test_packetcache_simple(bool shuffle, DNSDistPacketCache::EvictionType eviction)

Check warning (Code scanning / CodeQL): Poorly documented large function

Poorly documented function: fewer than 2% comments for a function of 114 lines.
@rgacogne (Member) left a comment

I've only done a very high-level scan but I like it, thanks a lot!
The next step would be to test the new eviction algorithms in the regression tests as well, I think?
It's too late in the cycle for 2.1.0 but I'd like to have this in 2.2 :)


if (d_shards.at(shardIndex).d_entriesCount >= (d_settings.d_maxEntries / d_settings.d_shardCount)) {
return;
// with LRU cache, we don't check size; we always insert
@rgacogne (Member)

Suggested change
// with LRU cache, we don't check size; we always insert
// with LRU or SIEVE cache, we don't check size; we always insert


value = newValue;

map.visit(key);
@rgacogne (Member)

Would it make sense to pass value to visit()? It would prevent a lookup in the SIEVE case.

@karelbilek (Contributor, Author)

As written now, this code won't have the visited atomic flag in value here, so I don't see how we would skip a lookup.

@rgacogne rgacogne modified the milestones: dnsdist-2.1.0, dnsdist-2.2.0 Feb 20, 2026
@karelbilek (Contributor, Author)

I refrained from doing any extensive testing before the general shape is agreed.

Yeah I should add some regression tests + some general tests of the cache eviction logic.

I am not entirely sure whether to leave the none case in the code. Non-eviction can be simulated with just SIEVE and the speed should be the same, but we would spend extra memory on the linked list and the flag that we don't use.

@rgacogne (Member)

I refrained from doing any extensive testing before the general shape is agreed.

Yeah I should add some regression tests + some general tests of the cache eviction logic.

Definitely don't spend time on new tests specific to the eviction logic yet; my idea was to see whether the existing tests still pass with the new eviction logic. I expect some won't, because they rely on the existing behaviour, but most of them should pass.

I am not entirely sure whether to leave the none case in the code. Non-eviction can be simulated with just SIEVE and the speed should be the same, but we would spend extra memory on the linked list and the flag that we don't use.

The only reason it would make sense for me to keep the "none" case around is if the new eviction algorithms have a noticeable impact on performance that we cannot solve. Otherwise I don't see any reason to keep it, the memory overhead is likely negligible.

@karelbilek (Contributor, Author) commented Feb 23, 2026

Yeah, I did test it manually with the regression tests (by manually replacing the text) and they pass, except for TestCachingCacheFull.

There shouldn't be a big performance impact, but I have not tested that yet.

@karelbilek (Contributor, Author) commented Mar 2, 2026

I have realized that the code in this PR doesn't regularly delete the expired entries in the background thread; however, when an entry is older than d_staleCacheEntriesTTL, there is not much point in keeping it around: it will never pass the check and is dead weight. It will also be "reactivated" by accessing it (via the LRU order / the visited flag), so it will not be deleted, at least not soon.

I don't know how to solve that, though. I don't want to go through the whole cache in the background thread; I like that I got rid of that for SIEVE/LRU, and I want to keep the idea of TTL buckets for later.

One solution is to just go all-in and implement only SIEVE plus the TTL buckets (as they don't really work with LRU).
