perf(security): resource management leaks, cache penetration, timing attacks, and validation allocations

## Background

The `internal/security` module handles API key authentication, OIDC provider integration, password login verification, CORS/headers, and path/SSRF validation. This cycle reviews the module for Resource Management (Phase 2), Performance (Phase 3), and Scalability (Phase 3).

**Scope**: resource-mgmt, performance, scalability — cycle 205 (模块分析通过 3)
**Key files**: `apikey_resolver.go`, `auth.go`, `oauth_provider.go`, `local_account_provider.go`

---

## Finding Summary

| Category | Critical | High | Medium | Low |
|----------|----------|------|--------|-----|
| resource-mgmt | 0 | 1 | 1 | 0 |
| performance | 0 | 2 | 0 | 0 |
| scalability | 0 | 1 | 0 | 0 |
| **合计** | **0** | **4** | **1** | **0** |

---

## Findings

### Scalability

#### apikey-dbresolver-cache-penetration

**Severity**: High | **Confidence**: High | **ROI**: High
**Location**: `apikey_resolver.go:93-123`

**Problem**: When an invalid or non-existent API key is provided, `DBResolver.Resolve` queries the database. If the query returns `sql.ErrNoRows`, it returns `"", false` but does not cache this negative lookup result. An attacker sending requests with invalid keys will bypass the cache completely and trigger database queries for every request, saturating SQLite/Postgres connection limits.

**Current Pattern**:
```go
	var userID string
	err := r.db.QueryRowContext(ctx,
		r.dialect.Rebind("SELECT user_id FROM api_key_users WHERE api_key = ?"),
		key,
	).Scan(&userID)
	if err != nil {
		if !errors.Is(err, sql.ErrNoRows) {
			slog.Warn("security: DBResolver query failed", "error", err)
		}
		return "", false
	}
	r.cache.Store(key, &cacheEntry{
		userID:    userID,
		expiresAt: time.Now().Add(60 * time.Second),
	})
```

**Proposed Fix**: Cache negative lookup results (`sql.ErrNoRows`) for a short duration (e.g. 5 seconds) to protect the DB from repeated queries.
```go
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			r.cache.Store(key, &cacheEntry{
				userID:    "",
				expiresAt: time.Now().Add(5 * time.Second),
			})
		} else {
			slog.Warn("security: DBResolver query failed", "error", err)
		}
		return "", false
	}
```

**Estimated Impact**: Protects database resources against API key brute-force / scanning Denial-of-Service.

**Acceptance Criteria**:
- [ ] Modify `DBResolver.Resolve` to cache negative lookup results (`sql.ErrNoRows`) for a duration of 5 seconds.
- [ ] Add a unit test verifying that consecutive resolves for a non-existent API key query the DB exactly once within the negative TTL window.

---

### Resource Management

#### apikey-dbresolver-cache-memory-leak

**Severity**: Medium | **Confidence**: High | **ROI**: High
**Location**: `apikey_resolver.go:61-123`

**Problem**: `DBResolver` uses a standard `sync.Map` to cache key resolutions. Although expired cache entries are marked with a TTL, they are only removed when they are queried again (passive eviction). If keys are frequently rotated or single-use, expired entries accumulate in memory indefinitely, causing a memory leak.

**Current Pattern**:
```go
type DBResolver struct {
	db      *sql.DB
	dialect dbutil.Dialect
	cache   sync.Map // key → *cacheEntry
}
```

**Proposed Fix**: Implement an active background cleanup loop using a time ticker to delete expired entries periodically.
```go
func (r *DBResolver) StartCleanupLoop(ctx context.Context) {
	ticker := time.NewTicker(5 * time.Minute)
	go func() {
		for {
			select {
			case <-ticker.C:
				now := time.Now()
				r.cache.Range(func(k, v any) bool {
					if e, ok := v.(*cacheEntry); ok && now.After(e.expiresAt) {
						r.cache.Delete(k)
					}
					return true
				})
			case <-ctx.Done():
				ticker.Stop()
				return
			}
		}
	}()
}
```

**Estimated Impact**: Places a bound on API key cache memory usage, preventing slow memory exhaustion.

**Acceptance Criteria**:
- [ ] Add active eviction logic to `DBResolver` (either a background worker loop or size-capped cache map).
- [ ] Ensure the loop goroutine terminates cleanly on system shutdown/cancellation.
- [ ] Add a unit test verifying that expired entries are successfully swept automatically.

---

#### oidc-discovery-timeout-hang

**Severity**: High | **Confidence**: High | **ROI**: High
**Location**: `oauth_provider.go:60-64`

**Problem**: `NewOAuthProvider` calls `oidc.NewProvider(ctx, cfg.Issuer)` which executes external HTTP requests to perform provider discovery. Without configuring a custom HTTP client in the context via `oidc.ClientContext`, it defaults to Go's `http.DefaultClient` which has no timeout. If the issuer endpoint hangs, the configuration hot-reload loop blocks indefinitely.

**Current Pattern**:
```go
func NewOAuthProvider(ctx context.Context, cfg OAuthProviderConfig, callbackURL string) (*OAuthProvider, error) {
	provider, err := oidc.NewProvider(ctx, cfg.Issuer)
```

**Proposed Fix**: Create a dedicated `http.Client` with a strict timeout (e.g. 10s) and set it in the context before calling discovery.
```go
	httpClient := &http.Client{Timeout: 10 * time.Second}
	clientCtx := oidc.ClientContext(ctx, httpClient)
	provider, err := oidc.NewProvider(clientCtx, cfg.Issuer)
```

**Estimated Impact**: Prevents OIDC client discovery from hanging configuration hot-reload.

**Acceptance Criteria**:
- [ ] Inject an HTTP client with a strict timeout (max 10 seconds) using `oidc.ClientContext` before performing OIDC provider discovery.
- [ ] Verify that a mock slow/unreachable issuer aborts discovery within the configured timeout window rather than blocking indefinitely.

---

### Performance

#### apikey-verification-loop-allocation

**Severity**: High | **Confidence**: High | **ROI**: High
**Location**: `auth.go:158-173`

**Problem**:
- `Authenticator.authenticateKey` performs string-to-byte-slice conversions (`[]byte(k)` and `[]byte(key)`) inside a sequential loop on every HTTP/WS request, allocating heap memory and stressing GC.
- The early return in `authenticateKey` defeats constant-time execution as matching times vary based on the index position.
- Lookup complexity scales as $O(N)$ with the number of keys.

**Current Pattern**:
```go
func (a *Authenticator) authenticateKey(key string) bool {
	for k := range a.validKey {
		if subtle.ConstantTimeCompare([]byte(k), []byte(key)) == 1 {
			return true
		}
	}
	for k := range a.dbKeys {
		if subtle.ConstantTimeCompare([]byte(k), []byte(key)) == 1 {
			return true
		}
	}
	return false
}
```

**Proposed Fix**: Pre-calculate SHA256 hashes of all valid API keys at loading/CRUD time and store them in a hash map. Compute the SHA256 hash of the incoming key once and perform a fast, constant-time $O(1)$ lookup in the pre-hashed map.
```go
// In Authenticator struct:
// validKeyHashes map[[32]byte]bool

func (a *Authenticator) authenticateKey(key string) bool {
	hash := sha256.Sum256([]byte(key))
	return a.validKeyHashes[hash]
}
```

**Estimated Impact**: Reduces API key validation time to $O(1)$, completely eliminates timing attacks, and removes heap allocations from request paths.

**Acceptance Criteria**:
- [ ] Pre-calculate and store SHA256 hashes of API keys in a map (`map[[32]byte]bool`) during startup/reload/CRUD, rather than raw strings.
- [ ] Perform lookup in `authenticateKey` by hashing the input key once and doing a direct map lookup.
- [ ] Verify all authenticator unit tests pass with the new hashing model.

---

#### login-username-enumeration-timing

**Severity**: High | **Confidence**: High | **ROI**: High
**Location**: `local_account_provider.go:43-59`

**Problem**: If the username does not exist, `LocalAccountProvider.Authenticate` returns immediately (0ms). If it exists but the password is wrong, it performs a compute-heavy `bcrypt.CompareHashAndPassword` which takes ~200ms. Attackers can easily identify valid usernames by measuring response time.

**Current Pattern**:
```go
	u, err := p.store.GetUserByUsername(ctx, lc.Username)
	if errors.Is(err, ErrUserNotFound) || u == nil {
		return "", errInvalidCredentials
	}
	// ...
	if err := bcrypt.CompareHashAndPassword([]byte(u.PasswordHash), []byte(lc.Password)); err != nil {
		return "", errInvalidCredentials
	}
```

**Proposed Fix**: If a user is not found or has no password hash, perform a dummy bcrypt comparison against a pre-computed dummy hash to match the time taken by a real comparison.
```go
	var passwordHash = dummyHash
	var checkBcrypt = false
	u, err := p.store.GetUserByUsername(ctx, lc.Username)
	if err != nil && !errors.Is(err, ErrUserNotFound) {
		return "", err
	}
	if u != nil && u.PasswordHash != "" {
		passwordHash = u.PasswordHash
		checkBcrypt = true
	}
	err = bcrypt.CompareHashAndPassword([]byte(passwordHash), []byte(lc.Password))
	if !checkBcrypt || err != nil {
		return "", errInvalidCredentials
	}
```

**Estimated Impact**: Secures the password login route against timing-based username enumeration.

**Acceptance Criteria**:
- [ ] If a user does not exist or has no password hash, perform a dummy bcrypt comparison to match the time taken by a real comparison.
- [ ] Verify that login response times for existing vs non-existing usernames are statistically indistinguishable.

---

## Implementation Priority

| Finding | Priority | Effort | Risk | Impact |
|---------|----------|--------|------|--------|
| login-username-enumeration-timing | P0 | Small | Low | Fixes username timing enumeration |
| apikey-verification-loop-allocation | P0 | Small | Low | $O(1)$ API key lookup & zero allocations |
| oidc-discovery-timeout-hang | P1 | Small | Low | Prevents reload lock-up on unreachable IdP |
| apikey-dbresolver-cache-penetration | P1 | Small | Low | Prevents DB query exhaustion DoS |
| apikey-dbresolver-cache-memory-leak | P2 | Medium | Low | Cures long-term API key cache memory leak |

**Recommended starting point**: Start with `login-username-enumeration-timing` and `apikey-verification-loop-allocation` as they offer maximum security and performance return for minimal effort.

---

## Out of Scope

- Modifying the password cost factor itself (BcryptCostDefault = 12).
- Rewriting the general token signing model.

---

## Verification

- [ ] \`make test\` passes with zero regressions.
- [ ] \`make lint\` passes with no new warnings.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(security): resource management leaks, cache penetration, timing attacks, and validation allocations #761

Background

Finding Summary

Findings

Scalability

apikey-dbresolver-cache-penetration

Resource Management

apikey-dbresolver-cache-memory-leak

oidc-discovery-timeout-hang

Performance

apikey-verification-loop-allocation

login-username-enumeration-timing

Implementation Priority

Out of Scope

Verification

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Finding	Priority	Effort	Risk	Impact
login-username-enumeration-timing	P0	Small	Low	Fixes username timing enumeration
apikey-verification-loop-allocation	P0	Small	Low	$O(1)$ API key lookup & zero allocations
oidc-discovery-timeout-hang	P1	Small	Low	Prevents reload lock-up on unreachable IdP
apikey-dbresolver-cache-penetration	P1	Small	Low	Prevents DB query exhaustion DoS
apikey-dbresolver-cache-memory-leak	P2	Medium	Low	Cures long-term API key cache memory leak

perf(security): resource management leaks, cache penetration, timing attacks, and validation allocations #761

Description

Background

Finding Summary

Findings

Scalability

apikey-dbresolver-cache-penetration

Resource Management

apikey-dbresolver-cache-memory-leak

oidc-discovery-timeout-hang

Performance

apikey-verification-loop-allocation

login-username-enumeration-timing

Implementation Priority

Out of Scope

Verification

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions