Skip to content

perf(security): resource management leaks, cache penetration, timing attacks, and validation allocations #761

@hrygo

Description

@hrygo

Background

The internal/security module handles API key authentication, OIDC provider integration, password login verification, CORS/headers, and path/SSRF validation. This cycle reviews the module for Resource Management (Phase 2), Performance (Phase 3), and Scalability (Phase 3).

Scope: resource-mgmt, performance, scalability — cycle 205 (模块分析通过 3)
Key files: apikey_resolver.go, auth.go, oauth_provider.go, local_account_provider.go


Finding Summary

Category Critical High Medium Low
resource-mgmt 0 1 1 0
performance 0 2 0 0
scalability 0 1 0 0
合计 0 4 1 0

Findings

Scalability

apikey-dbresolver-cache-penetration

Severity: High | Confidence: High | ROI: High
Location: apikey_resolver.go:93-123

Problem: When an invalid or non-existent API key is provided, DBResolver.Resolve queries the database. If the query returns sql.ErrNoRows, it returns "", false but does not cache this negative lookup result. An attacker sending requests with invalid keys will bypass the cache completely and trigger database queries for every request, saturating SQLite/Postgres connection limits.

Current Pattern:

	var userID string
	err := r.db.QueryRowContext(ctx,
		r.dialect.Rebind("SELECT user_id FROM api_key_users WHERE api_key = ?"),
		key,
	).Scan(&userID)
	if err != nil {
		if !errors.Is(err, sql.ErrNoRows) {
			slog.Warn("security: DBResolver query failed", "error", err)
		}
		return "", false
	}
	r.cache.Store(key, &cacheEntry{
		userID:    userID,
		expiresAt: time.Now().Add(60 * time.Second),
	})

Proposed Fix: Cache negative lookup results (sql.ErrNoRows) for a short duration (e.g. 5 seconds) to protect the DB from repeated queries.

	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			r.cache.Store(key, &cacheEntry{
				userID:    "",
				expiresAt: time.Now().Add(5 * time.Second),
			})
		} else {
			slog.Warn("security: DBResolver query failed", "error", err)
		}
		return "", false
	}

Estimated Impact: Protects database resources against API key brute-force / scanning Denial-of-Service.

Acceptance Criteria:

  • Modify DBResolver.Resolve to cache negative lookup results (sql.ErrNoRows) for a duration of 5 seconds.
  • Add a unit test verifying that consecutive resolves for a non-existent API key query the DB exactly once within the negative TTL window.

Resource Management

apikey-dbresolver-cache-memory-leak

Severity: Medium | Confidence: High | ROI: High
Location: apikey_resolver.go:61-123

Problem: DBResolver uses a standard sync.Map to cache key resolutions. Although expired cache entries are marked with a TTL, they are only removed when they are queried again (passive eviction). If keys are frequently rotated or single-use, expired entries accumulate in memory indefinitely, causing a memory leak.

Current Pattern:

type DBResolver struct {
	db      *sql.DB
	dialect dbutil.Dialect
	cache   sync.Map // key → *cacheEntry
}

Proposed Fix: Implement an active background cleanup loop using a time ticker to delete expired entries periodically.

func (r *DBResolver) StartCleanupLoop(ctx context.Context) {
	ticker := time.NewTicker(5 * time.Minute)
	go func() {
		for {
			select {
			case <-ticker.C:
				now := time.Now()
				r.cache.Range(func(k, v any) bool {
					if e, ok := v.(*cacheEntry); ok && now.After(e.expiresAt) {
						r.cache.Delete(k)
					}
					return true
				})
			case <-ctx.Done():
				ticker.Stop()
				return
			}
		}
	}()
}

Estimated Impact: Places a bound on API key cache memory usage, preventing slow memory exhaustion.

Acceptance Criteria:

  • Add active eviction logic to DBResolver (either a background worker loop or size-capped cache map).
  • Ensure the loop goroutine terminates cleanly on system shutdown/cancellation.
  • Add a unit test verifying that expired entries are successfully swept automatically.

oidc-discovery-timeout-hang

Severity: High | Confidence: High | ROI: High
Location: oauth_provider.go:60-64

Problem: NewOAuthProvider calls oidc.NewProvider(ctx, cfg.Issuer) which executes external HTTP requests to perform provider discovery. Without configuring a custom HTTP client in the context via oidc.ClientContext, it defaults to Go's http.DefaultClient which has no timeout. If the issuer endpoint hangs, the configuration hot-reload loop blocks indefinitely.

Current Pattern:

func NewOAuthProvider(ctx context.Context, cfg OAuthProviderConfig, callbackURL string) (*OAuthProvider, error) {
	provider, err := oidc.NewProvider(ctx, cfg.Issuer)

Proposed Fix: Create a dedicated http.Client with a strict timeout (e.g. 10s) and set it in the context before calling discovery.

	httpClient := &http.Client{Timeout: 10 * time.Second}
	clientCtx := oidc.ClientContext(ctx, httpClient)
	provider, err := oidc.NewProvider(clientCtx, cfg.Issuer)

Estimated Impact: Prevents OIDC client discovery from hanging configuration hot-reload.

Acceptance Criteria:

  • Inject an HTTP client with a strict timeout (max 10 seconds) using oidc.ClientContext before performing OIDC provider discovery.
  • Verify that a mock slow/unreachable issuer aborts discovery within the configured timeout window rather than blocking indefinitely.

Performance

apikey-verification-loop-allocation

Severity: High | Confidence: High | ROI: High
Location: auth.go:158-173

Problem:

  • Authenticator.authenticateKey performs string-to-byte-slice conversions ([]byte(k) and []byte(key)) inside a sequential loop on every HTTP/WS request, allocating heap memory and stressing GC.
  • The early return in authenticateKey defeats constant-time execution as matching times vary based on the index position.
  • Lookup complexity scales as $O(N)$ with the number of keys.

Current Pattern:

func (a *Authenticator) authenticateKey(key string) bool {
	for k := range a.validKey {
		if subtle.ConstantTimeCompare([]byte(k), []byte(key)) == 1 {
			return true
		}
	}
	for k := range a.dbKeys {
		if subtle.ConstantTimeCompare([]byte(k), []byte(key)) == 1 {
			return true
		}
	}
	return false
}

Proposed Fix: Pre-calculate SHA256 hashes of all valid API keys at loading/CRUD time and store them in a hash map. Compute the SHA256 hash of the incoming key once and perform a fast, constant-time $O(1)$ lookup in the pre-hashed map.

// In Authenticator struct:
// validKeyHashes map[[32]byte]bool

func (a *Authenticator) authenticateKey(key string) bool {
	hash := sha256.Sum256([]byte(key))
	return a.validKeyHashes[hash]
}

Estimated Impact: Reduces API key validation time to $O(1)$, completely eliminates timing attacks, and removes heap allocations from request paths.

Acceptance Criteria:

  • Pre-calculate and store SHA256 hashes of API keys in a map (map[[32]byte]bool) during startup/reload/CRUD, rather than raw strings.
  • Perform lookup in authenticateKey by hashing the input key once and doing a direct map lookup.
  • Verify all authenticator unit tests pass with the new hashing model.

login-username-enumeration-timing

Severity: High | Confidence: High | ROI: High
Location: local_account_provider.go:43-59

Problem: If the username does not exist, LocalAccountProvider.Authenticate returns immediately (0ms). If it exists but the password is wrong, it performs a compute-heavy bcrypt.CompareHashAndPassword which takes ~200ms. Attackers can easily identify valid usernames by measuring response time.

Current Pattern:

	u, err := p.store.GetUserByUsername(ctx, lc.Username)
	if errors.Is(err, ErrUserNotFound) || u == nil {
		return "", errInvalidCredentials
	}
	// ...
	if err := bcrypt.CompareHashAndPassword([]byte(u.PasswordHash), []byte(lc.Password)); err != nil {
		return "", errInvalidCredentials
	}

Proposed Fix: If a user is not found or has no password hash, perform a dummy bcrypt comparison against a pre-computed dummy hash to match the time taken by a real comparison.

	var passwordHash = dummyHash
	var checkBcrypt = false
	u, err := p.store.GetUserByUsername(ctx, lc.Username)
	if err != nil && !errors.Is(err, ErrUserNotFound) {
		return "", err
	}
	if u != nil && u.PasswordHash != "" {
		passwordHash = u.PasswordHash
		checkBcrypt = true
	}
	err = bcrypt.CompareHashAndPassword([]byte(passwordHash), []byte(lc.Password))
	if !checkBcrypt || err != nil {
		return "", errInvalidCredentials
	}

Estimated Impact: Secures the password login route against timing-based username enumeration.

Acceptance Criteria:

  • If a user does not exist or has no password hash, perform a dummy bcrypt comparison to match the time taken by a real comparison.
  • Verify that login response times for existing vs non-existing usernames are statistically indistinguishable.

Implementation Priority

Finding Priority Effort Risk Impact
login-username-enumeration-timing P0 Small Low Fixes username timing enumeration
apikey-verification-loop-allocation P0 Small Low $O(1)$ API key lookup & zero allocations
oidc-discovery-timeout-hang P1 Small Low Prevents reload lock-up on unreachable IdP
apikey-dbresolver-cache-penetration P1 Small Low Prevents DB query exhaustion DoS
apikey-dbresolver-cache-memory-leak P2 Medium Low Cures long-term API key cache memory leak

Recommended starting point: Start with login-username-enumeration-timing and apikey-verification-loop-allocation as they offer maximum security and performance return for minimal effort.


Out of Scope

  • Modifying the password cost factor itself (BcryptCostDefault = 12).
  • Rewriting the general token signing model.

Verification

  • `make test` passes with zero regressions.
  • `make lint` passes with no new warnings.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2High: affects many users, daily occurrencesarchitectureDomain: design patterns, coupling, separation of concerns

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions