feat: SHEK-16 — in-cluster Kubernetes auto-discovery from ConfigMap by arieradle · Pull Request #29 · arieradle/shekel

arieradle · 2026-06-04T12:12:11Z

Summary

- New `shekel/integrations/kubernetes.py` module with `is_k8s_environment()`, `apply_k8s_config()`, and the `KubernetesPoller` and `KubernetesSpendReporter` daemon threads
- `Budget._record_spend()` raises `BudgetPausedError` (a subclass of `BudgetExceededError`) immediately when the poller sets `_paused_externally`
- `Budget.__exit__` / `__aexit__` stop the poller and reporter on context exit; the threads restart on re-entry
- Redis backend (`backend=redis` in the ConfigMap) is wired into `_record_spend()` for distributed cross-pod enforcement
- Adds a `[k8s]` optional extra (`kubernetes>=28.0`) in `pyproject.toml`; included in `[all]`
- 97 tests, 97% overall coverage

Known issues — tracked for follow-up

Code review identified 9 bugs. Eight are fixed; SHEK-34 (Medium) remains open pending design clarification:

| Ticket | Summary | Priority | Status |
|---|---|---|---|
| SHEK-26 | Per-pod cap `Budget` construction recurses infinitely when `SHEKEL_BUDGET_NAME` is set | High | ✅ Fixed |
| SHEK-27 | K8s poller/reporter threads not restarted when a `Budget` is reused across multiple `with`-blocks | High | ✅ Fixed |
| SHEK-28 | K8s config errors silently swallowed; a misconfigured ConfigMap disables K8s features with no log | Medium | ✅ Fixed |
| SHEK-29 | `BudgetExceededError` raised on external pause is indistinguishable from normal budget exhaustion | Medium | ✅ Fixed |
| SHEK-30 | Redis backend configured from ConfigMap is stored but never used in `_record_spend` | Medium | ✅ Fixed |
| SHEK-31 | K8s poller thread leaks in tests that don't enter the budget context | Low | ✅ Fixed |
| SHEK-32 | K8s spend reporter misses the exceeding call's cost when a budget limit raises | Medium | ✅ Fixed |
| SHEK-33 | `_check_per_pod_limit()` raises unconditionally, ignoring `warn_only` mode | Medium | ✅ Fixed |
| SHEK-34 | `scope_mode` / `scope_group_by` stored from ConfigMap but never used | Medium | 🔲 Open |

Test plan

- `pytest tests/test_kubernetes_integration.py`: all 97 tests pass
- `pytest --cov=shekel --cov-report=term-missing`: 97% overall coverage
- Install without the `[k8s]` extra and confirm there are no import errors
- All linters pass: black, isort, ruff, mypy
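The "install without `[k8s]`" check depends on the integration importing `kubernetes` lazily. A minimal sketch of that soft-import pattern follows; `load_k8s_client()` is a hypothetical helper name, not a function from this PR, and the real module may structure the check differently.

```python
import importlib
import importlib.util
import logging
from types import ModuleType
from typing import Optional

logger = logging.getLogger(__name__)


def load_k8s_client() -> Optional[ModuleType]:
    """Return the kubernetes.client module, or None when the [k8s] extra is absent.

    Hypothetical helper illustrating the soft-import pattern; not the PR's code.
    """
    if importlib.util.find_spec("kubernetes") is None:
        # Optional dependency not installed: K8s features stay off and nothing raises.
        logger.debug("kubernetes package not installed; K8s auto-discovery disabled")
        return None
    return importlib.import_module("kubernetes.client")
```

A CI job that installs only the base package could then assert that `load_k8s_client()` returns `None` and that `import shekel` succeeds.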

🤖 Generated with Claude Code

New module shekel/integrations/kubernetes.py:
- is_k8s_environment(): detects KUBERNETES_SERVICE_HOST + SHEKEL_BUDGET_NAME
- _fetch_configmap(): loads shekel-budget-{name} from the pod's namespace via kubernetes.client.CoreV1Api; soft-imports kubernetes (no crash if absent)
- apply_k8s_config(): applies ConfigMap values to Budget fields where still None (priority: explicit kwarg > AGENT_BUDGET_USD env var > ConfigMap)
- KubernetesPoller: daemon thread that polls the paused key every SHEKEL_POLL_INTERVAL_SECONDS (default 10s); sets _paused_externally

Budget._record_spend(): raises BudgetExceededError immediately when _paused_externally is True (before spend accumulation).
Budget.__exit__ / __aexit__: stop the poller thread on context exit.
pyproject.toml: add [k8s] extra (kubernetes>=28.0); add to [all]; add kubernetes mypy override.

36 tests; 100% coverage on kubernetes.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
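The detection gate and the value-priority rule described in this commit can be sketched as two pure functions. Both names (`detect_k8s`, `resolve_max_usd`) are hypothetical, the `env` parameter exists only for testability, and the ConfigMap key `max_usd` is an assumption; the commit message does not name the key.

```python
import os
from typing import Mapping, Optional


def detect_k8s(env: Mapping[str, str] = os.environ) -> bool:
    """Auto-discovery needs both the in-cluster marker and a budget name."""
    return bool(env.get("KUBERNETES_SERVICE_HOST")) and bool(env.get("SHEKEL_BUDGET_NAME"))


def resolve_max_usd(
    explicit: Optional[float],
    env: Mapping[str, str],
    configmap: Mapping[str, str],
) -> Optional[float]:
    """Priority: explicit kwarg > AGENT_BUDGET_USD env var > ConfigMap (assumed key "max_usd")."""
    if explicit is not None:
        return explicit
    if env.get("AGENT_BUDGET_USD"):
        return float(env["AGENT_BUDGET_USD"])
    if configmap.get("max_usd"):
        return float(configmap["max_usd"])
    return None  # nothing configured: track-only budget
```

Checking `explicit is not None` (rather than truthiness) keeps an explicit `max_usd=0.0` from falling through to lower-priority sources.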

codecov · 2026-06-04T12:14:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


New KubernetesSpendReporter daemon thread (shekel/integrations/kubernetes.py):
- Active when the ConfigMap has backend=k8s; skipped for backend=redis or when absent
- Accumulates cumulative LLM spend/calls under a threading.Lock (the lock is never held across a network call)
- Flush triggers: flush_every_seconds (time-based), flush_every_usd (USD threshold on the delta since the last flush), Budget.__exit__/__aexit__ (always, including on exception)
- Patch-or-create ConfigMap shekel-spend-{HOSTNAME}: patch first, create on a 404 ApiException; any failure logs a WARNING and never raises to the caller
- After a successful write, updates _last_flush_spent so the next flush computes the correct delta; on failure the baseline is unchanged, so the full cumulative total is retried
- HOSTNAME absent → flush silently skipped
- Correct labels: shekel.dev/spend-report, shekel.dev/budget, shekel.dev/group (omitted when SHEKEL_GROUP_VALUE is empty)

Budget._record_spend: calls reporter.on_spend(cost) after each LLM call.
Budget.__exit__ / __aexit__: calls reporter.flush_and_stop() on context exit.

41 new tests; 100% coverage on kubernetes.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
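The reporter's flush bookkeeping (cumulative totals, a delta-based USD trigger, and a baseline that only advances after a successful write) can be sketched without any Kubernetes calls. `SpendAccumulator` is a hypothetical class, `write` stands in for the patch-or-create ConfigMap call and returns `True` on success, and the time-based `flush_every_seconds` trigger is omitted.

```python
import threading
from typing import Callable


class SpendAccumulator:
    """Hypothetical sketch of the reporter's flush bookkeeping (not KubernetesSpendReporter itself)."""

    def __init__(self, write: Callable[[float, int], bool], flush_every_usd: float) -> None:
        self._write = write  # stand-in for patch-or-create; True means the write succeeded
        self._flush_every_usd = flush_every_usd
        self._lock = threading.Lock()
        self._spent = 0.0
        self._calls = 0
        self._last_flush_spent = 0.0  # baseline for the delta-based trigger

    def on_spend(self, cost: float) -> None:
        with self._lock:
            self._spent += cost
            self._calls += 1
            due = self._spent - self._last_flush_spent >= self._flush_every_usd
        if due:
            self.flush()  # lock already released: the write may be a slow network call

    def flush(self) -> bool:
        # Snapshot cumulative totals under the lock, then write outside it.
        with self._lock:
            spent, calls = self._spent, self._calls
        try:
            ok = bool(self._write(spent, calls))
        except Exception:
            ok = False  # a failed write must never propagate to the LLM caller
        if ok:
            with self._lock:
                # Advance the baseline only on success; max() guards overlapping flushes.
                self._last_flush_spent = max(self._last_flush_spent, spent)
        return ok
```

Because `write` always receives the cumulative total, a failed flush loses nothing: the next successful write carries the full amount.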

arieradle · 2026-06-10T07:36:23Z

/improve

qodo-code-review · 2026-06-10T07:36:30Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (3) 📜 Skill insights (0)

Context used

✅ Tickets: 🎫 shekel library: in-cluster auto-discovery from ConfigMap

✅ Compliance rules (platform): 7 rules

1. per_pod_cap not child Budget 📎 Requirement gap ≡ Correctness

Description

per_pod_cap is applied by setting budget._per_pod_cap_usd and enforcing it via
_check_per_pod_limit(), instead of creating a child Budget(max_usd=float(v)) as required by the
ConfigMap-to-Budget mapping. This breaks the specified behavior for per-pod limiting and the
accompanying tests explicitly assert the non-child implementation.

Code

shekel/integrations/kubernetes.py[114]
+        budget._per_pod_cap_usd = float(cm["per_pod_cap"])

Evidence
The checklist mapping explicitly requires per_pod_cap to create a child
Budget(max_usd=float(v)). The new code sets budget._per_pod_cap_usd from the ConfigMap and
enforces it directly in Budget._check_per_pod_limit(), and the new tests assert that no per-pod
child budget exists (assert not hasattr(b, "_per_pod_budget")).
ConfigMap keys are correctly mapped to Budget parameters and behaviors
shekel/integrations/kubernetes.py[112-115]
shekel/_budget.py[814-823]
tests/test_kubernetes_integration.py[577-583]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The `per_pod_cap` ConfigMap key is required to create a child `Budget(max_usd=float(v))` for per-pod limiting, but the current implementation stores a float on the parent budget and enforces it directly.

## Issue Context
Compliance requires the mapping behavior: `per_pod_cap` → create a child `Budget(max_usd=float(v))` when per-pod limiting is used. Current tests also encode the non-child behavior, so they will need updating alongside the implementation.

## Fix Focus Areas
- shekel/integrations/kubernetes.py[112-115]
- shekel/_budget.py[703-706]
- shekel/_budget.py[814-823]
- tests/test_kubernetes_integration.py[577-593]


2. ~~Per-pod cap bypasses warn_only~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

Budget._check_per_pod_limit() unconditionally raises BudgetExceededError when the per-pod cap is
exceeded, even if warn_only=True. This violates the existing warn_only contract used by other limit
checks and can cause unexpected exceptions in environments relying on warn-only behavior.

Code

shekel/_budget.py[R814-823]

+    def _check_per_pod_limit(self) -> None:
+        """Enforce the per-pod USD cap set via ConfigMap per_pod_cap (SHEK-26)."""
+        if self._per_pod_cap_usd is None:
+            return
+        if self._spent > self._per_pod_cap_usd:
+            from shekel.exceptions import BudgetExceededError  # noqa: PLC0415
+
+            raise BudgetExceededError(
+                self._spent, self._per_pod_cap_usd, self._last_model, self._last_tokens
+            )

Evidence
Other enforcement checks explicitly suppress exceptions when warn_only=True, but the new per-pod cap
check has no such guard, so it will raise in warn-only mode.
shekel/_budget.py[755-807]
shekel/_budget.py[814-823]
tests/test_budget_warn_only.py[13-34]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`Budget._check_per_pod_limit()` raises even when `Budget.warn_only=True`, unlike `_check_limit()` / `_check_call_limit()` which suppress exceptions in warn-only mode.

### Issue Context
`warn_only=True` is documented and tested to mean “enforce silently, never raise”. Per-pod cap should follow the same semantics for consistency.

### Fix Focus Areas
- shekel/_budget.py[814-823]

### Suggested fix
- Mirror `_check_limit()` behavior:
 - call `self._emit_budget_exceeded_event()` when cap exceeded
 - if `self.warn_only`: optionally fire `_check_warn()` and `return`
 - else: raise `BudgetExceededError(...)`

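The suggested fix amounts to an early return on `warn_only` after the exceeded event fires. A minimal sketch on a hypothetical stand-in class; `PerPodCapSketch` and the simplified `BudgetExceededError(spent, limit)` signature are assumptions, not shekel's actual types.

```python
from typing import Optional


class BudgetExceededError(Exception):
    """Stand-in for shekel.exceptions.BudgetExceededError (simplified signature)."""

    def __init__(self, spent: float, limit: float) -> None:
        super().__init__(f"Budget of ${limit:.2f} exceeded (${spent:.2f} spent)")
        self.spent = spent
        self.limit = limit


class PerPodCapSketch:
    """Hypothetical minimal budget showing only the per-pod cap path."""

    def __init__(self, per_pod_cap_usd: Optional[float], warn_only: bool = False) -> None:
        self._per_pod_cap_usd = per_pod_cap_usd
        self.warn_only = warn_only
        self._spent = 0.0
        self.exceeded_events = 0

    def _emit_budget_exceeded_event(self) -> None:
        self.exceeded_events += 1  # stands in for the real observability hook

    def _check_per_pod_limit(self) -> None:
        cap = self._per_pod_cap_usd
        if cap is None or self._spent <= cap:
            return
        self._emit_budget_exceeded_event()  # fire the event in both modes
        if self.warn_only:
            return  # warn_only: record the breach, never raise
        raise BudgetExceededError(self._spent, cap)
```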

3. RedisBackend missing name arg 📎 Requirement gap ≡ Correctness

Description

When backend=="redis" and REDIS_URL is set, the code instantiates RedisBackend(url=redis_url)
but does not pass name=redis_key as required. This can break correct Redis key naming and violates
the specified ConfigMap-to-budget mapping for Redis backend activation.

Code

shekel/integrations/kubernetes.py[R119-127]

+    if cm.get("backend") == "redis":
+        redis_url = os.environ.get("REDIS_URL")
+        if redis_url:
+            try:
+                from shekel.backends.redis import RedisBackend  # noqa: PLC0415
+
+                budget._k8s_redis_backend = RedisBackend(url=redis_url)
+                budget._k8s_redis_name = cm.get("redis_key", f"shekel:{namespace}:{budget_name}")
+            except ImportError:

Evidence
The rules require that when ConfigMap backend is redis and REDIS_URL is set, RedisBackend
must be created with name=redis_key. The new code creates RedisBackend(url=redis_url) and stores
the key separately in _k8s_redis_name, which does not satisfy the required constructor usage.
Redis backend activation follows ConfigMap/backend and REDIS_URL rules
ConfigMap key-to-budget parameter mapping is implemented as specified
shekel/integrations/kubernetes.py[119-127]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Redis backend activation must instantiate `RedisBackend(url=REDIS_URL, name=redis_key)` (using `redis_key` from ConfigMap, or the default `shekel:{namespace}:{budget_name}`), but the current code does not pass `name=`.

## Issue Context
The compliance spec explicitly requires the Redis backend to use the `redis_key` naming so controller/materialized budgets map to the intended Redis budget key.

## Fix Focus Areas
- shekel/integrations/kubernetes.py[118-131]

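Resolving the Redis key before constructing the backend keeps the ConfigMap override and the default in one place. A sketch, assuming (as the review states) that `RedisBackend` accepts a `name=` argument; `resolve_redis_name` is a hypothetical helper.

```python
from typing import Mapping


def resolve_redis_name(configmap: Mapping[str, str], namespace: str, budget_name: str) -> str:
    """ConfigMap redis_key wins; otherwise fall back to shekel:{namespace}:{budget_name}."""
    # `or` (rather than a .get default) also treats an empty redis_key as unset.
    return configmap.get("redis_key") or f"shekel:{namespace}:{budget_name}"


# The resolved name would then be passed to the constructor, as the review asks
# (the name= parameter is assumed from the review, not verified against RedisBackend):
#   budget._k8s_redis_backend = RedisBackend(
#       url=redis_url, name=resolve_redis_name(cm, namespace, budget_name)
#   )
```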

4. ~~Paused error missing message~~ ✓ Resolved 📎 Requirement gap ≡ Correctness

Description

Budget._record_spend() raises BudgetExceededError when an external pause is active, but it has three
problems: it reads self._paused_externally directly instead of guarding with
`getattr(self, "_paused_externally", False)`; it omits the required "Budget paused by Kubernetes
controller" reason; and it passes limit=self.max_usd or 0.0, which renders as a misleading "Budget of
$0.00 exceeded" message when max_usd is unset. This breaks the documented pause-enforcement semantics
and makes pause-triggered failures hard to distinguish from real budget exhaustion (including in
track-only mode and for nested budgets), which confuses downstream handling and logging.

Code

shekel/_budget.py[R662-671]

    def _record_spend(self, cost: float, model: str, tokens: dict[str, int]) -> None:
+        if self._paused_externally:
+            from shekel.exceptions import BudgetExceededError  # noqa: PLC0415
+
+            raise BudgetExceededError(
+                spent=self._spent,
+                limit=self.max_usd or 0.0,
+                model=model,
+                tokens=tokens,
+            )

Evidence

The compliance rule requires _record_spend() to begin with a paused check using
`getattr(self, "_paused_externally", False)` and to raise `BudgetExceededError` with the specific
message/reason "Budget paused by Kubernetes controller", but the current implementation checks
self._paused_externally directly and raises BudgetExceededError without any pause-specific
reason. Additionally, the paused path supplies limit=self.max_usd or 0.0, and since
BudgetExceededError.__str__ formats errors as "Budget of ${limit:.2f} exceeded", paused budgets
without max_usd will be rendered as "$0.00 exceeded", masking the kill-switch condition and
potentially misreporting the effective limit for nested budgets.

Paused enforcement check is executed at the top of Budget._record_spend()
shekel/_budget.py[662-671]
shekel/exceptions.py[52-92]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Fix `Budget._record_spend()` so that external pause enforcement uses the required `getattr(self, "_paused_externally", False)` guard at the top of the method and raises a `BudgetExceededError` that clearly indicates "Budget paused by Kubernetes controller" (and does not format as a misleading normal "Budget of $X exceeded" exhaustion error, especially when `max_usd` is unset).

## Issue Context
The Kubernetes kill-switch compliance spec requires the paused check to run first in `_record_spend()` and to raise the specified error/reason. Today the code references `self._paused_externally` directly and raises `BudgetExceededError` without the required paused-specific reason; it also commonly sets `limit=self.max_usd or 0.0`, which—given `BudgetExceededError.__str__` formats as "Budget of ${limit:.2f} exceeded"—causes paused budgets (notably when `max_usd` is unset / track-only) to appear as "$0.00 exceeded" and makes pause vs exceed indistinguishable to logs and downstream handlers.

## Fix Focus Areas
- shekel/_budget.py[662-705]
- shekel/exceptions.py[52-92]

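One way to satisfy both the `getattr` guard and a distinguishable message is the `BudgetPausedError` subclass that the PR summary mentions. The sketch below uses simplified stand-in classes; the constructor signature, message wording, and the `check_paused` helper are assumptions, not copied from `shekel/exceptions.py` or `_budget.py`.

```python
from typing import Optional


class BudgetExceededError(Exception):
    """Simplified stand-in; the real class lives in shekel/exceptions.py."""

    def __init__(self, spent: float, limit: float, model: str = "", tokens: Optional[dict] = None) -> None:
        super().__init__(spent, limit, model, tokens)
        self.spent = spent
        self.limit = limit
        self.model = model
        self.tokens = tokens or {}

    def __str__(self) -> str:
        return f"Budget of ${self.limit:.2f} exceeded (${self.spent:.2f} spent)"


class BudgetPausedError(BudgetExceededError):
    """Kill-switch error: still caught by `except BudgetExceededError`, but reads differently."""

    reason = "Budget paused by Kubernetes controller"

    def __str__(self) -> str:
        # Never format the (meaningless) limit, so no "$0.00 exceeded" text appears.
        return f"{self.reason} (${self.spent:.2f} spent so far)"


def check_paused(budget: object, spent: float, model: str, tokens: dict) -> None:
    # getattr guard: budgets created without the K8s integration have no flag at all.
    if getattr(budget, "_paused_externally", False):
        raise BudgetPausedError(spent=spent, limit=0.0, model=model, tokens=tokens)
```

Because the subclass still inherits from `BudgetExceededError`, existing handlers keep working, while callers that care can catch `BudgetPausedError` first.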

5. ~~Poller not restarted~~ ✓ Resolved 🐞 Bug ☼ Reliability

Description

Budget.__exit__/__aexit__ stop the K8s poller/reporter, but those threads are only started once
during Budget.__init__, so reusing a Budget across multiple with blocks leaves kill-switch
polling and spend reporting permanently disabled after the first exit. This contradicts the
documented “session budget” usage where the same Budget instance is entered multiple times.

Code

shekel/_budget.py[R491-494]

+        if self._k8s_reporter is not None:
+            self._k8s_reporter.flush_and_stop()
+        if self._k8s_poller is not None:
+            self._k8s_poller.stop()

Evidence
The codebase documents reusing a single Budget instance across multiple with blocks, but the new
K8s threads are started only in __init__ and are explicitly stopped on every context exit; there
is no corresponding restart path on __enter__/__aenter__.
shekel/_budget.py[128-134]
shekel/_budget.py[439-495]
shekel/_budget.py[584-639]
shekel/_budget.py[303-319]
shekel/integrations/kubernetes.py[140-157]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`__exit__`/`__aexit__` call `stop()`/`flush_and_stop()` on the K8s poller/reporter, but the only place they are created/started is `Budget.__init__` via `apply_k8s_config(self)`. Budgets are documented as reusable across multiple `with` blocks, so after the first context exit the background threads will never be restarted and K8s kill-switch/reporting silently stops working.

## Issue Context
The `Budget` docstring shows a “Session budget (accumulates across multiple with-blocks)” pattern, which re-enters the same instance multiple times.

## Fix Focus Areas
- shekel/_budget.py[128-134]
- shekel/_budget.py[439-495]
- shekel/_budget.py[584-639]
- shekel/_budget.py[303-319]
- shekel/integrations/kubernetes.py[140-157]

## Suggested fix approach
- Make poller/reporter lifecycle match budget lifecycle:
 - Either **do not stop** the poller/reporter in `__exit__/__aexit__` (let them run for the object’s lifetime),
 - OR, if stopping on exit is desired, then **restart** them on `__enter__/__aenter__` when K8s mode is active and the prior thread is stopped.
- If restarting: set `_k8s_poller/_k8s_reporter` to `None` after stopping, and ensure start logic is idempotent (don’t spawn duplicates on nested enters).
- Consider joining threads (bounded) if you require a clean shutdown before returning from exit.

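The second approach (stop on exit, restart on enter) can be sketched with a depth counter so nested `with` blocks neither spawn duplicate threads nor stop the poller early. `PollerSketch` and `SessionBudgetSketch` are hypothetical stand-ins for `KubernetesPoller` and `Budget`; the sketch is not thread-safe across concurrent enters.

```python
import threading
from typing import Callable, Optional


class PollerSketch(threading.Thread):
    """Hypothetical stand-in for KubernetesPoller: calls `poll` every `interval` seconds."""

    def __init__(self, poll: Callable[[], None], interval: float) -> None:
        super().__init__(daemon=True)
        self._poll = poll
        self._interval = interval
        self._stop_event = threading.Event()

    def run(self) -> None:
        # wait() returns True as soon as stop() sets the event, ending the loop.
        while not self._stop_event.wait(self._interval):
            self._poll()

    def stop(self) -> None:
        self._stop_event.set()


class SessionBudgetSketch:
    """Re-enterable context: a fresh poller per outermost `with` block."""

    def __init__(self, poll: Callable[[], None], interval: float = 10.0) -> None:
        self._poll = poll
        self._interval = interval
        self._depth = 0
        self._k8s_poller: Optional[PollerSketch] = None

    def _restart_k8s_threads(self) -> None:
        # Idempotent: keep a poller that is still alive (e.g. on nested enters).
        if self._k8s_poller is not None and self._k8s_poller.is_alive():
            return
        self._k8s_poller = PollerSketch(self._poll, self._interval)
        self._k8s_poller.start()

    def __enter__(self) -> "SessionBudgetSketch":
        self._depth += 1
        self._restart_k8s_threads()
        return self

    def __exit__(self, *exc: object) -> None:
        self._depth -= 1
        # Only the outermost exit stops and discards the thread.
        if self._depth == 0 and self._k8s_poller is not None:
            self._k8s_poller.stop()
            self._k8s_poller.join(timeout=1.0)
            self._k8s_poller = None
```

A stopped `threading.Thread` cannot be started again, which is why `_restart_k8s_threads()` builds a new instance rather than calling `start()` on the old one.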


6. ~~Per-pod cap recursion~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

apply_k8s_config() creates a new Budget for per_pod_cap, but every Budget.__init__ also
calls apply_k8s_config() in a K8s environment, causing recursive Budget construction until a
RecursionError / huge nested object chain. This can make Budget() construction extremely slow or
unstable whenever the ConfigMap includes per_pod_cap.

Code

shekel/integrations/kubernetes.py[R112-117]

+    # --- per_pod_cap ---
+    if "per_pod_cap" in cm:
+        from shekel._budget import Budget as _Budget  # noqa: PLC0415
+
+        budget._per_pod_budget = _Budget(max_usd=float(cm["per_pod_cap"]))
+

Evidence

apply_k8s_config() constructs a Budget when per_pod_cap is present, and Budget.__init__
unconditionally invokes apply_k8s_config(self) (under the env gate). Since the env gate is
process-wide, the child budget re-enters the same path and repeats the construction.

shekel/integrations/kubernetes.py[31-33]
shekel/integrations/kubernetes.py[69-117]
shekel/_budget.py[303-319]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`apply_k8s_config()` constructs a child `Budget` when `per_pod_cap` is present, but `Budget.__init__` always calls `apply_k8s_config()` again under the same env gate. This creates unbounded recursive construction and can lead to `RecursionError`, excessive ConfigMap reads, and runaway object graphs.

## Issue Context
- K8s integration is activated purely by process env vars (`KUBERNETES_SERVICE_HOST` + `SHEKEL_BUDGET_NAME`), so any internal `Budget(...)` created in-process inherits the same K8s activation.
- The per-pod cap concept does not require a fully auto-discovered `Budget` instance; it can be represented as a plain float cap or a `Budget` constructed with a “skip k8s” guard.

## Fix Focus Areas
- shekel/integrations/kubernetes.py[69-117]
- shekel/_budget.py[303-319]

## Suggested fix approach
- Option A (simplest): store `per_pod_cap` as a float on the budget (e.g. `_per_pod_cap_usd: float | None`) instead of constructing a new `Budget`.
- Option B: add an internal constructor flag (e.g. `Budget(..., _skip_k8s=True)` or similar) so internal helper budgets do not invoke `apply_k8s_config`.
- Ensure the chosen approach cannot recurse when `per_pod_cap` is present in the ConfigMap.

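Option B would also reconcile this fix with finding 1 above, which asks for a child `Budget(max_usd=float(v))`: the child can exist as long as it skips auto-discovery. A sketch on a hypothetical `GuardedBudget` class; the `_skip_k8s` keyword and the hard-coded ConfigMap are illustrative assumptions.

```python
import os
from typing import Optional


def _in_k8s() -> bool:
    return bool(os.environ.get("KUBERNETES_SERVICE_HOST")) and bool(os.environ.get("SHEKEL_BUDGET_NAME"))


class GuardedBudget:
    """Hypothetical budget showing Option B: internal helper budgets pass _skip_k8s=True."""

    construct_count = 0  # class-level counter, only to demonstrate there is no recursion

    def __init__(self, max_usd: Optional[float] = None, *, _skip_k8s: bool = False) -> None:
        GuardedBudget.construct_count += 1
        self.max_usd = max_usd
        self._per_pod_budget: Optional["GuardedBudget"] = None
        if not _skip_k8s and _in_k8s():
            self._apply_config({"per_pod_cap": "0.10"})  # stand-in for the fetched ConfigMap

    def _apply_config(self, cm: dict) -> None:
        if "per_pod_cap" in cm:
            # The child skips auto-discovery, so it cannot re-enter this method.
            self._per_pod_budget = GuardedBudget(max_usd=float(cm["per_pod_cap"]), _skip_k8s=True)
```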

7. K8s paused flag not locked 📎 Requirement gap ☼ Reliability

Description

Kubernetes polling writes to budget._paused_externally from a background thread without any
synchronization, and the budget does not define or use a lock to protect concurrent reads/writes.
This violates the thread-safety requirement and can cause race conditions during spend recording.

Code

shekel/integrations/kubernetes.py[R185-189]

+        while not self._stop_event.wait(self._interval):
+            cm = _fetch_configmap(self._budget_name, self._namespace)
+            if cm is not None:
+                self._budget._paused_externally = cm.get("paused") == "true"
+

Evidence
The rule requires thread-safe access to _paused_externally (writes protected by a lock) and
correct async scheduling behavior. The new code assigns budget._paused_externally directly in both
initial ConfigMap application and in the poller thread loop, with no lock shown on the budget or
around these writes.
Background poll runs as daemon and is stopped safely on context exit; thread safety is maintained
shekel/_budget.py[303-320]
shekel/integrations/kubernetes.py[85-88]
shekel/integrations/kubernetes.py[153-158]
shekel/integrations/kubernetes.py[185-189]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The K8s poller mutates `budget._paused_externally` from a background thread without a lock, and `Budget._record_spend()` reads it without synchronization. The compliance rule requires writes to `_paused_externally` to be protected by a `threading.Lock` (or equivalent) and also specifies async budgets should use `asyncio.create_task` when an event loop is running.

## Issue Context
`KubernetesPoller` runs in a daemon thread and periodically updates the paused flag. Without a lock, concurrent access can race with `_record_spend()`.

## Fix Focus Areas
- shekel/_budget.py[303-320]
- shekel/_budget.py[662-671]
- shekel/integrations/kubernetes.py[85-88]
- shekel/integrations/kubernetes.py[153-158]
- shekel/integrations/kubernetes.py[160-189]

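A lock-guarded holder for the paused flag is one way to meet this rule. The sketch below is hypothetical (`PausedFlag`, `apply_poll_result`); `apply_poll_result` mirrors the poller loop body quoted above, updating the flag only when the ConfigMap fetch succeeded. The `asyncio.create_task` part of the rule is not covered here.

```python
import threading
from typing import Mapping, Optional


class PausedFlag:
    """Hypothetical lock-guarded holder for the poller-controlled pause state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._paused = False

    def set(self, paused: bool) -> None:
        with self._lock:  # writer side: the poller thread
            self._paused = paused

    def is_set(self) -> bool:
        with self._lock:  # reader side: _record_spend()
            return self._paused


def apply_poll_result(flag: PausedFlag, cm: Optional[Mapping[str, str]]) -> None:
    # A failed fetch (None) leaves the previous state unchanged.
    if cm is not None:
        flag.set(cm.get("paused") == "true")
```

In CPython, assigning a single boolean attribute is already atomic under the GIL, so the lock mainly makes the contract explicit and portable; a `threading.Event` would serve equally well.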

8. ~~Reporter skips spend on raises~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

Budget._record_spend() calls KubernetesSpendReporter.on_spend(cost) only after running checks that
may raise (USD limit, call limit, per-pod cap). When an exception is raised, the reporter never
records the cost of that already-incurred LLM call, under-reporting spend.

Code

shekel/_budget.py[R700-705]

        self._check_warn()
        self._check_limit()
        self._check_call_limit()
+        self._check_per_pod_limit()
+        if self._k8s_reporter is not None:
+            self._k8s_reporter.on_spend(cost)

Evidence
_record_spend runs several limit checks before calling the reporter; any of those checks can raise
after spend was accumulated. The reporter’s totals only advance via on_spend(), so skipping it loses
that call’s cost.
shekel/_budget.py[681-706]
shekel/_budget.py[774-812]
shekel/integrations/kubernetes.py[221-228]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Spend reporting (`_k8s_reporter.on_spend(cost)`) happens after enforcement checks that can raise, so the final (exceeding) call’s cost is not included in reported totals.

### Issue Context
`_record_spend` increments `self._spent` first, so the cost is real and should be reported even if a limit is exceeded on that call.

### Fix Focus Areas
- shekel/_budget.py[681-706]

### Suggested fix
- Ensure `on_spend(cost)` is executed after `self._spent += cost` but before any checks that may raise, **or** wrap the check section in a `try: ... finally:` that calls `on_spend(cost)` when `_k8s_reporter` is set.
- Keep semantics consistent with existing spend/accounting: reporter should reflect the same `_spent` that enforcement uses.

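The `try`/`finally` variant of the suggested fix can be sketched as follows. `RecordSpendSketch` and `LimitError` are hypothetical stand-ins for `Budget` and `BudgetExceededError`, and `on_spend` represents `self._k8s_reporter.on_spend`.

```python
from typing import Callable, Optional


class LimitError(Exception):
    """Stand-in for BudgetExceededError."""


class RecordSpendSketch:
    def __init__(self, max_usd: float, on_spend: Optional[Callable[[float], None]] = None) -> None:
        self.max_usd = max_usd
        self._spent = 0.0
        self._on_spend = on_spend  # stands in for self._k8s_reporter.on_spend

    def _check_limit(self) -> None:
        if self._spent > self.max_usd:
            raise LimitError(f"spent {self._spent:.2f} > {self.max_usd:.2f}")

    def _record_spend(self, cost: float) -> None:
        self._spent += cost  # the LLM call already happened, so its cost is real
        try:
            self._check_limit()  # may raise on the exceeding call
        finally:
            if self._on_spend is not None:
                self._on_spend(cost)  # runs whether or not a check raised
```

If `on_spend` itself raised inside the `finally`, it would replace the limit error, so `on_spend` should never raise.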

9. ~~K8s errors swallowed~~ ✓ Resolved 🐞 Bug ◔ Observability

Description

Budget.__init__ wraps apply_k8s_config(self) in except Exception: pass, so parsing bugs,
recursion issues, or other unexpected failures can silently disable K8s
discovery/kill-switch/reporting with no log or signal. This makes production misconfiguration and
regressions extremely difficult to detect and debug.

Code

shekel/_budget.py[R314-319]

+        try:
+            from shekel.integrations.kubernetes import apply_k8s_config  # noqa: PLC0415
+
+            apply_k8s_config(self)
+        except Exception:
+            pass

Evidence

The new try/except Exception: pass around apply_k8s_config will suppress any exception escaping
config application, even though config parsing includes float()/int() conversions that can raise
and would otherwise indicate bad ConfigMap data or a logic bug.

shekel/_budget.py[303-319]
shekel/integrations/kubernetes.py[89-110]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`Budget.__init__` swallows all exceptions from `apply_k8s_config`, which can hide real failures (e.g., value parsing errors, recursion errors, reporter startup issues) and lead to silent loss of K8s features.

## Issue Context
`_fetch_configmap()` already handles and logs many “can’t talk to Kubernetes” failures; the outer blanket `except Exception` mainly hides bugs and config parsing issues inside `apply_k8s_config()`.

## Fix Focus Areas
- shekel/_budget.py[303-319]
- shekel/integrations/kubernetes.py[69-158]

## Suggested fix approach
- Narrow the exception handling:
 - Catch `ImportError` (or a small known set) to preserve the “optional dependency” behavior.
 - For other exceptions, at minimum `logger.warning(..., exc_info=True)` so failures are visible.
- Avoid swallowing `RecursionError`/`KeyboardInterrupt`/`SystemExit`.
- Consider returning early with a warning when ConfigMap values are invalid (e.g., float/int conversion fails) so users can fix their ConfigMap.

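The narrowed handling suggested above can be sketched as a small wrapper. `safe_apply` is a hypothetical name; the commit that fixed this (SHEK-28) splits `ImportError` from other exceptions, while the explicit `RecursionError` re-raise follows the review's suggestion and may not match the merged code.

```python
import logging
from typing import Callable

logger = logging.getLogger("shekel")


def safe_apply(apply: Callable[[object], None], budget: object) -> bool:
    """Hypothetical wrapper: returns True if the K8s config was applied."""
    try:
        apply(budget)
    except ImportError:
        return False  # [k8s] extra not installed: stay silent
    except RecursionError:
        raise  # a logic bug, not a config problem: surface it
    except Exception:
        # Bad ConfigMap values (e.g. float("abc")) end up here, visible to operators.
        logger.warning("Kubernetes auto-discovery disabled: failed to apply ConfigMap", exc_info=True)
        return False
    return True
```

`KeyboardInterrupt` and `SystemExit` derive from `BaseException`, not `Exception`, so `except Exception` already lets them through; `RecursionError` is a `RuntimeError` subclass and needs the explicit re-raise.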

10. ~~Per-pod tests leave pollers~~ ✓ Resolved 🐞 Bug ☼ Reliability

Description

The new per-pod cap tests construct K8s-enabled Budgets without stopping the background poller
thread afterwards. This can leave extra daemon threads running across the test suite and introduce
nondeterministic background activity/noise.

Code

tests/test_kubernetes_integration.py[R573-582]

+    def test_per_pod_cap_stored_as_float(self) -> None:
+        b = _budget_with_k8s({"per_pod_cap": "0.25"})
+        assert b._per_pod_cap_usd == pytest.approx(0.25)
+
+    def test_per_pod_cap_does_not_recurse(self) -> None:
+        # Regression for SHEK-26: constructing a Budget with per_pod_cap in the
+        # ConfigMap must not trigger infinite recursion via nested Budget.__init__ calls.
+        b = _budget_with_k8s({"per_pod_cap": "0.10"})
+        assert b._per_pod_cap_usd == pytest.approx(0.10)
+        assert not hasattr(b, "_per_pod_budget")

Evidence
The helper returns a Budget without cleanup, and the new per-pod cap tests call it without using the
context manager, so no stop path is triggered for the poller.
tests/test_kubernetes_integration.py[37-74]
tests/test_kubernetes_integration.py[572-599]
shekel/integrations/kubernetes.py[151-156]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The per-pod cap tests create budgets via `_budget_with_k8s()` and never stop the K8s poller thread for tests that don’t enter/exit the budget context.

### Issue Context
`_budget_with_k8s()` returns a constructed `Budget` and does not perform any cleanup; `Budget.__exit__` is the path that stops the poller.

### Fix Focus Areas
- tests/test_kubernetes_integration.py[573-599]

### Suggested fix
- In tests that don’t use `with b:`, explicitly stop (and ideally `join`) the poller after assertions:
 - `if b._k8s_poller: b._k8s_poller.stop(); b._k8s_poller.join(timeout=1)`
- Alternatively, use `with b:` in these tests to ensure `__exit__` runs and stops the poller.


qodo-code-review · 2026-06-10T07:41:13Z

+        while not self._stop_event.wait(self._interval):
+            cm = _fetch_configmap(self._budget_name, self._namespace)
+            if cm is not None:
+                self._budget._paused_externally = cm.get("paused") == "true"
+


2. K8s paused flag not locked 📎 Requirement gap ☼ Reliability

Kubernetes polling writes to budget._paused_externally from a background thread without any synchronization, and the budget does not define or use a lock to protect concurrent reads/writes. This violates the thread-safety requirement and can cause race conditions during spend recording.

Agent Prompt

## Issue description
The K8s poller mutates `budget._paused_externally` from a background thread without a lock, and `Budget._record_spend()` reads it without synchronization. The compliance rule requires writes to `_paused_externally` to be protected by a `threading.Lock` (or equivalent) and also specifies async budgets should use `asyncio.create_task` when an event loop is running.

## Issue Context
`KubernetesPoller` runs in a daemon thread and periodically updates the paused flag. Without a lock, concurrent access can race with `_record_spend()`.

## Fix Focus Areas
- shekel/_budget.py[303-320]
- shekel/_budget.py[662-671]
- shekel/integrations/kubernetes.py[85-88]
- shekel/integrations/kubernetes.py[153-158]
- shekel/integrations/kubernetes.py[160-189]


qodo-code-review · 2026-06-10T07:41:13Z

+    if cm.get("backend") == "redis":
+        redis_url = os.environ.get("REDIS_URL")
+        if redis_url:
+            try:
+                from shekel.backends.redis import RedisBackend  # noqa: PLC0415
+
+                budget._k8s_redis_backend = RedisBackend(url=redis_url)
+                budget._k8s_redis_name = cm.get("redis_key", f"shekel:{namespace}:{budget_name}")
+            except ImportError:


3. RedisBackend missing name arg 📎 Requirement gap ≡ Correctness

When backend=="redis" and REDIS_URL is set, the code instantiates RedisBackend(url=redis_url) but does not pass name=redis_key as required. This can break correct Redis key naming and violates the specified ConfigMap-to-budget mapping for Redis backend activation.

Agent Prompt

## Issue description
Redis backend activation must instantiate `RedisBackend(url=REDIS_URL, name=redis_key)` (using `redis_key` from ConfigMap, or the default `shekel:{namespace}:{budget_name}`), but the current code does not pass `name=`.

## Issue Context
The compliance spec explicitly requires the Redis backend to use the `redis_key` naming so controller/materialized budgets map to the intended Redis budget key.

## Fix Focus Areas
- shekel/integrations/kubernetes.py[118-131]


…ate infinite recursion

Constructing a child Budget for per_pod_cap caused unbounded recursion because KUBERNETES_SERVICE_HOST/SHEKEL_BUDGET_NAME are still set in the process, triggering apply_k8s_config() on every child __init__. Store the cap as _per_pod_cap_usd: float | None instead, and enforce it via a new _check_per_pod_limit() method called inside _record_spend(). Adds 4 regression tests including an explicit no-recursion guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

arieradle · 2026-06-10T08:33:01Z

/review

qodo-code-review · 2026-06-10T08:33:08Z

Code review by qodo was updated up to the latest commit f2cb46e

After __exit__ stops the K8s threads, re-entering the same Budget instance (session budget pattern) left K8s polling and spend reporting permanently dead. Python threads cannot be restarted, so a new instance must be created. Persist budget_name, namespace, and poll_interval on the budget during apply_k8s_config(), then call _restart_k8s_threads() from __enter__ and __aenter__ to rebuild any stopped threads idempotently. Adds 6 regression tests including sync and async re-entry, idempotency, and no-K8s no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

qodo-code-review · 2026-06-10T08:36:31Z

+
+    # --- per_pod_cap ---
+    if "per_pod_cap" in cm:
+        budget._per_pod_cap_usd = float(cm["per_pod_cap"])


1. per_pod_cap not child budget 📎 Requirement gap ≡ Correctness

per_pod_cap is applied by setting budget._per_pod_cap_usd and enforcing it via _check_per_pod_limit(), instead of creating a child Budget(max_usd=float(v)) as required by the ConfigMap-to-Budget mapping. This breaks the specified per-pod limiting behavior, and the accompanying tests explicitly assert the non-child implementation.

Agent Prompt

## Issue description
The `per_pod_cap` ConfigMap key is required to create a child `Budget(max_usd=float(v))` for per-pod limiting, but the current implementation stores a float on the parent budget and enforces it directly.
## Issue Context
Compliance requires the mapping behavior: `per_pod_cap` → create a child `Budget(max_usd=float(v))` when per-pod limiting is used. Current tests also encode the non-child behavior, so they will need updating alongside the implementation.
## Fix Focus Areas
- shekel/integrations/kubernetes.py[112-115]
- shekel/_budget.py[703-706]
- shekel/_budget.py[814-823]
- tests/test_kubernetes_integration.py[577-593]


Blanket except Exception: pass hid bad ConfigMap values, recursion bugs, and any other apply_k8s_config failure with no log or signal. Split into ImportError (silent — optional dep not installed) and Exception (warning with exc_info so operators can diagnose misconfigured ConfigMaps). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
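A minimal sketch of the split exception handling, assuming the config step is passed in as a zero-argument callable. `try_apply_k8s_config` and the logger name are hypothetical, not shekel's actual code.

```python
import logging
from typing import Callable

logger = logging.getLogger("shekel.sketch")


def try_apply_k8s_config(apply: Callable[[], None]) -> bool:
    """Run the config step; return True if it succeeded."""
    try:
        apply()
    except ImportError:
        # Optional [k8s] extra not installed: stay silent.
        return False
    except Exception:
        # Anything else (bad ConfigMap value, recursion, ...) is surfaced with
        # the traceback attached so operators can diagnose it.
        logger.warning(
            "Kubernetes config could not be applied; K8s features disabled",
            exc_info=True,
        )
        return False
    return True
```

The `ImportError` clause has to come first: it is a subclass of `Exception`, so the broader clause would otherwise catch the missing-extra case and log it.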


arieradle · 2026-06-10T14:32:04Z

@qodo-code-review[bot] - did the last commit resolve the gaps?

- Fix `_check_redis_limit` raise path (lines 859-861): the test used `max_usd=0.05`, which caused `_check_limit` to fire first, so the Redis path was never reached
- Add `chain()` tests to cover the happy path and the invalid-arg guard (lines 1362-1365)
- Add a `_litellm_available()` skipif to `TestLiteLLMPatching` and `TestLiteLLMCostRecording` so they skip when the litellm optional dep is absent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

qodo-code-review Bot reviewed Jun 10, 2026


arieradle and others added 7 commits June 10, 2026 08:41

fix: SHEK-33 — _check_per_pod_limit() now respects warn_only mode

02354fd

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: SHEK-32 — report spend before enforcement checks so reporter cap…

8c20312

…tures limit-exceeding call cost Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: SHEK-31 — stop leaked K8s threads after each test via autouse fi…

00e71c8

…xture; fix black line-length Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: SHEK-29 — add BudgetPausedError subclass to distinguish operato…

ecd0b9e

…r kill-switch from budget exhaustion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: SHEK-30 — wire _k8s_redis_backend into _record_spend() for distr…

397fd22

…ibuted enforcement Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: SHEK-34 — wire scope_group_by (pod-label group detection) and s…

6645684

…cope_mode=shared (group-scoped Redis key) into runtime Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
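The SHEK-29, SHEK-32 and SHEK-33 commits above all touch `_record_spend()`. A sketch of how those three behaviors can compose, under assumptions: the classes below are local stand-ins rather than shekel's own, and the position of the pause check relative to reporting is assumed, not taken from the diff.

```python
import warnings
from typing import Optional


class BudgetExceededError(Exception):
    """Stand-in for shekel's BudgetExceededError."""


class BudgetPausedError(BudgetExceededError):
    """Stand-in: raised when an operator pauses the budget externally."""


class SpendReporterSketch:
    """Hypothetical reporter that accumulates spend for the next push."""

    def __init__(self) -> None:
        self.reported_usd = 0.0

    def add(self, cost_usd: float) -> None:
        self.reported_usd += cost_usd


class RecordSpendSketch:
    def __init__(self, per_pod_cap_usd: Optional[float] = None, warn_only: bool = False) -> None:
        self.spent_usd = 0.0
        self._per_pod_cap_usd = per_pod_cap_usd
        self.warn_only = warn_only
        self._paused_externally = False  # set by the poller in the real code
        self._reporter = SpendReporterSketch()

    def _check_per_pod_limit(self) -> None:
        cap = self._per_pod_cap_usd
        if cap is None or self.spent_usd <= cap:
            return
        msg = f"per-pod cap ${cap:.2f} exceeded (spent ${self.spent_usd:.2f})"
        if self.warn_only:
            # SHEK-33: warn instead of raising in warn_only mode.
            warnings.warn(msg, RuntimeWarning, stacklevel=2)
        else:
            raise BudgetExceededError(msg)

    def _record_spend(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
        # SHEK-32: report before any enforcement check, so the call that
        # breaks a limit is still included in the reported spend.
        self._reporter.add(cost_usd)
        # SHEK-29: an external pause gets its own subclass.
        if self._paused_externally:
            raise BudgetPausedError("budget paused by operator")
        self._check_per_pod_limit()
```

Callers that want to tell an operator kill-switch apart from ordinary exhaustion would catch `BudgetPausedError` before `BudgetExceededError`; in the reverse order the parent clause would handle both.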

arieradle merged commit 4163b71 into main Jun 17, 2026
9 checks passed


feat: SHEK-16 — in-cluster Kubernetes auto-discovery from ConfigMap #29

arieradle merged 12 commits into main from feat/shek-16

arieradle commented Jun 4, 2026 (edited by atlassian Bot)

codecov Bot commented Jun 4, 2026 (edited)

arieradle commented Jun 10, 2026

qodo-code-review Bot commented Jun 10, 2026 (edited)

qodo-code-review Bot Jun 10, 2026

qodo-code-review Bot Jun 10, 2026

arieradle commented Jun 10, 2026

qodo-code-review Bot commented Jun 10, 2026 (edited)

qodo-code-review Bot Jun 10, 2026

arieradle commented Jun 10, 2026
