ECS Scope Preservation and NODATA‑Scoped Caching for Incoming EDNS Subnet#16819
ECS Scope Preservation and NODATA‑Scoped Caching for Incoming EDNS Subnet#16819jd82k wants to merge 1 commit intoPowerDNS:masterfrom
Conversation
|
Thanks. I'll try to look at this PR the coming time. In the meantime, please make sure the at least PR compiles. |
Pull Request Test Coverage Report for Build 21971072129Details
💛 - Coveralls |
That sounds like a very bad idea, especially for (but not limited to) CNAME chains. An authoritative server can serve a scoped CNAME to some clients and a different one (or A, AAAA, whatever) to others.
This might work in some very specific environments where you can ensure that clients are always sending a very narrow source (/32 and /128), otherwise you might be sending a badly scoped response. None of this should be enabled by default and I'm not convinced we should merge it at all. |
|
@rgacogne 1)
2)
|
Are you sure this is true? I don't think it is:
|
|
Also: see https://www.rfc-editor.org/rfc/rfc7871#section-7.4 which basically says: negative answers should/can be treated as having scope /0. |
|
For CNAME chains, see https://www.rfc-editor.org/rfc/rfc7871#section-7.2.1:
|
|
@rgacogne @omoerbeek 1) RFC7871 §7.4 (negative answers and scope /0) 2) RFC7871 §7.2.1 (CNAME chains)
So the change is not contradicting the RFC; it’s adding a switch to avoid a real operational failure mode when ECS is used. If a deployment wants strict RFC7871 semantics, they keep the default behavior. If they need ECS correctness across scopes, they can disable it. |
|
Example scenario: For
PowerDNS currently ignores that non‑zero scope for NODATA and inserts the negative answer into the global negcache. So when 149.22.95.0/24 queries AAAA again, it receives the same NODATA/SOA result as 111.32.70.0/24, even though the authoritative response intended different answers for different ECS scopes. That is why I added the option: it allows administrators to decide whether NODATA should be forced to scope /0 (current behavior) or whether ECS‑scoped NODATA should not be inserted into the global negcache. The default keeps existing behavior for backward compatibility. |
|
The option |
|
I'm not disagreeing on the problem statement that right now chaining caching resolvers with ECS does not work, but I'm unconvinced about some details of the proposed solution. Otto will do a proper review in the coming days in any case, as he stated already. |
I get That is not a NODATA + SOA response. It isn't the final complete answer, but rec follows the target and constructs the complete response. |
| } | ||
| sendit:; | ||
|
|
||
| if (g_useIncomingECS && comboWriter->d_ecsFound && !resolver.wasVariable() && !variableAnswer) { |
There was a problem hiding this comment.
This is wrong. It will break setting variable from Lua scripts and also cases where SyncRes indicates the answer should not be packetcached.
There was a problem hiding this comment.
In the old implementation, we always returned ECS with scope=0 in that if block.
A scope=0 ECS tells downstream caches the answer is broadly reusable.
That is why we had to exclude variable answers before:
if a variable answer was returned with scope=0, downstream might cache and reuse it too broadly, even though we intentionally do not packet-cache it locally.
My change is different: we no longer force scope=0.
We now return the scope learned from upstream ECS.
So downstream gets the real ECS scope and can shard cache correctly by subnet, instead of being told “global” by scope=0.
Because of that, the old !wasVariable()/!variableAnswer guard is no longer needed for this specific ECS path.
That guard was mainly protecting against the incorrect scope=0 behavior.
| QType d_qtype; // 2 | ||
| mutable vState d_state{vState::Indeterminate}; // 1 | ||
| bool d_auth; // 1 | ||
| uint8_t d_ecsScope{0}; |
There was a problem hiding this comment.
This field seems to add double bookkeeping of scope, which is also in d_netmask.
There was a problem hiding this comment.
d_ecsScope is not duplicate bookkeeping of d_netmask.
d_netmask is a cache selection key (which entry matches a client subnet), while d_ecsScope preserves the response ECS scope to be returned downstream. They can diverge (for example we clamp the stored netmask to min(scope, source-prefix) and in routing-tagged entries netmask may be empty by design), so scope cannot be reliably reconstructed from d_netmask alone.
There was a problem hiding this comment.
I don't think you are right. srcmask in asycnresolve is set to the received ECS including scope. It is an in/out argument to asyncresolve. It is further processed as ednsmask in doResolveAtThisIP and end up finally in updateCacheFromRecords.
Cache entries store the received scope, as it is indicative which client IP match.
There was a problem hiding this comment.
Thanks. I agree with your flow description, but one key point is incorrect: srcmask is not a full representation of the received ECS scope.
Code evidence:
- In lwres.cc, srcmask is reset before parsing the reply (lwres.cc:688).
- The received scope is stored in lwr->d_ednsECScope (lwres.cc:870).
- srcmask is only set when scope != 0, and it is clamped with min(scope, source-prefix) (lwres.cc:875-879).
So srcmask cannot represent all received scope states (notably scope=0).
Also, cache selection uses d_netmask, not d_ecsScope:
- ECS index / best-match is done on netmask (recursor_cache.cc:330+).
- entryMatches() checks d_netmask.match(who) (recursor_cache.cc:406-407).
And d_netmask can intentionally differ from scope:
- cache key netmask is empty when routingTag is present (recursor_cache.cc:670),
- while d_ecsScope is still stored (recursor_cache.cc:717).
Therefore d_ecsScope is not duplicate bookkeeping of d_netmask; they carry different semantics (selection key vs returned ECS scope metadata).
|
For the caching of scoped negative answers: I agree that it can lead to issues, but I don't agree that the recursor is the place to fix this. The single example you mention does not reproduce and the RFC is clear about it (even though it does not use MUST): ECS handling already comes with quite a high cost and complexity. Conditionally not caching negative scoped answers will make this cost and complexity even higher. Not caching some negative answers will cause a potential big increase of outgoing traffic. |
|
I have made some comments on detail, but let's look at the main logic of your PR. An ECS EDSN record has thee fields: address (specifying a network), prefixlength and scope. Desired behavior: If a client sends an ECS (address and prefixlength set) it should receive in the anwer the same address and prefixlength, but with a specific scope added to it. The scope is received from the auth and may be wider that the prefixlength sent. The received scope is also what gets cached and determines if a lookup with ECS set matches in the recordcache. If an answer is constructed from multiple records received from auths e.g. a CNAME chain), it should set a scope that corresponds to the most narrow scope of the records involved. The ECS received from the client is available form Currently the scope is not returned by the recordcache
But more importantly: The scope end result to be sent to the client has to be computed taking into account that multiple records (originally received from multiple auths/the result of multiple recordcache lookups) can be involved. But I don't see that code. Instead I see to code to set the scope based on characteristics of the produced recordset and a single scope value that looks to be set from that last answer record involved. So the main logic of your PR is not doing the right thing. The (recursive) calls to |
Thanks for the detailed review. I agree with your main point, and I have now changed the implementation to compute the final ECS scope as the narrowest scope (largest prefix length) across all RRsets that contribute to the final answer. What is fixed now:
Example:
Example:
Example:
|
|
I need to find some time to confirm your observations and update to the PR. That will not be this week. |
Thanks for the update — no problem. Take your time to verify. I’ve already adjusted the PR logic to aggregate ECS scope as the narrowest scope across all contributing RRsets, and I can help rework any remaining parts once you’ve had time to review. I’ll wait for your follow-up. |
|
The unit test runner no longer links: |
Now it should work. I changed the code. |
|
I was confused why there was no subnet information in the response, so I spent a day troubleshooting until I saw this issue from two years ago...
How to use the information returned by the authoritative/upstream server is the user's business. In any case, hiding the information returned by the upstream server without authorization is not what a recursor should do... |
Thanks for raising this, and I agree with the main point: a recursor should not silently drop ECS information returned by upstream. I have updated the logic so that when incoming ECS is present (and ECS return is enabled), we include ECS in the downstream response. The address/prefix is taken from the incoming client ECS, and the scope is taken from upstream/auth data. For multi-step answers (for example CNAME chains / multiple cache hits), the final scope is now aggregated as the narrowest one (largest prefix length), instead of being overwritten by the last record. So we no longer hide upstream ECS scope by default, and we avoid bad scope selection in composed answers. |
**Options** 1. `ecs-scope-zero-on-no-record` (`scope_zero_on_no_record`), default **true** - Meaning: when enabled, the recursor forces ECS scope to **0** for NODATA responses (including CNAME‑chain NODATA), regardless of the authoritative scope. - When disabled, the authoritative scope is preserved when available. - **Note:** when `scope_zero_on_no_record=false` and the authoritative scope is **0**, negative caching remains **global** (negcache is not ECS‑aware), so there is no ECS scoping effect in that case. 2. `return-incoming-edns-subnet` (`return_incoming_edns_subnet`), default **true** - Meaning: when enabled (and `use-incoming-edns-subnet` is enabled), the recursor echoes an ECS option back to the client. - Purpose: returning ECS in the response conveys the applicable scope, so downstream resolvers and caches can avoid reusing a subnet‑specific answer for unrelated clients. This prevents incorrect upstream/downstream cache reuse. **Record Cache (ECS scope preservation)** 1. **Record cache entries now store ECS scope** - Added `d_ecsScope` to `MemRecursorCache::CacheEntry`. - `replace()` writes scope into the cache entry (only when the entry is ECS‑specific). - `get()` and `handleHit()` return the stored scope via an output parameter. 2. **Cache hits now propagate ECS scope back to the response path** - `SyncRes::doCacheCheck()` and `SyncRes::doCNAMECacheCheck()` read ECS scope from record cache hits and set `d_answerECSScope`, so cached answers return the correct scope. 3. **Cache dump/load includes ECS scope** - `PBCacheEntry` now carries `optional_uint32_ecsScope` so ECS scope survives `getRecordSets()` / `putRecordSets()`. **Negative Cache (NODATA + ECS)** 1. **Avoid global negcache pollution for ECS‑specific NODATA** - In `SyncRes::processRecords()`, when `scope_zero_on_no_record=false` and the authoritative NODATA response contains a **non‑zero** ECS scope, we skip inserting the NODATA entry into `negcache`. - This avoids a global negative cache entry that would incorrectly apply to all scopes. **Response ECS Scope when `scope_zero_on_no_record=true`** 1. **Scope forced to 0 only for true NODATA** - In `pdns_recursor.cc`, ECS scope is forced to 0 only when the final response is NODATA (NoError with no relevant RRSet for the requested QTYPE), including CNAME‑chain NODATA. - Other responses keep the authoritative scope. Signed-off-by: Miaosen Wang <secretandanon@gmail.com>
|
@omoerbeek @rgacogne Please review this PR. Thanks! |
|
It's on our list, spamming does not help. |
Options
ecs-scope-zero-on-no-record(scope_zero_on_no_record), default truescope_zero_on_no_record=falseand the authoritative scope is 0, negative caching remains global (negcache is not ECS‑aware), so there is no ECS scoping effect in that case.return-incoming-edns-subnet(return_incoming_edns_subnet), default trueuse-incoming-edns-subnetis enabled), the recursor echoes an ECS option back to the client.Record Cache (ECS scope preservation)
Record cache entries now store ECS scope
d_ecsScopetoMemRecursorCache::CacheEntry.replace()writes scope into the cache entry (only when the entry is ECS‑specific).get()andhandleHit()return the stored scope via an output parameter.Cache hits now propagate ECS scope back to the response path
SyncRes::doCacheCheck()andSyncRes::doCNAMECacheCheck()read ECS scope from record cache hits and setd_answerECSScope, so cached answers return the correct scope.Cache dump/load includes ECS scope
PBCacheEntrynow carriesoptional_uint32_ecsScopeso ECS scope survivesgetRecordSets()/putRecordSets().Negative Cache (NODATA + ECS)
SyncRes::processRecords(), whenscope_zero_on_no_record=falseand the authoritative NODATA response contains a non‑zero ECS scope, we skip inserting the NODATA entry intonegcache.Response ECS Scope when
scope_zero_on_no_record=truepdns_recursor.cc, ECS scope is forced to 0 only when the final response is NODATA (NoError with no relevant RRSet for the requested QTYPE), including CNAME‑chain NODATA.Short description
Preserves authoritative ECS scope across record cache hits, returns ECS to downstream clients, and avoids global caching of ECS‑specific NODATA responses.
Checklist
I have: