Non-deterministic output at temperature=0 — possible memory corruption #62

## Description

Sending the identical prompt twice to the same running server with `temperature=0.0` produces different outputs. The second response often contains corrupted text (Cyrillic characters, garbled tokens), suggesting uninitialized memory or state that leaks from one request into the next.

## Steps to Reproduce

```bash
# Start server
./build-metal/quant-server SmolLM2-1.7B-Instruct-Q8_0.gguf -p 8080

# First request
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}'
# Response 1: "2+2 is equal to 4." (coherent)

# Second request (identical)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}'
# Response 2: "2+2 = 4\nОтвет: 4" (Cyrillic corruption)
```
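To check for the divergence over more than two requests, the repro can be looped. This is a minimal sketch. It assumes the server from the steps above is running on port 8080, that `jq` is installed, and that responses follow the OpenAI chat-completions shape (`.choices[0].message.content`). The helper names are hypothetical. Completions are kept as JSON strings, so an embedded newline (as in Response 2) does not split one completion across lines.

```bash
#!/usr/bin/env bash
# Sketch: repeat the greedy request from the repro N times and count the
# distinct completions. Assumes the server is already running on $PORT,
# that jq is installed, and an OpenAI-style response schema (assumptions).

PORT="${PORT:-8080}"
BODY='{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}'

# Print the completion as a JSON string so embedded newlines stay escaped.
send_once() {
  curl -s "http://localhost:${PORT}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$BODY" | jq '.choices[0].message.content'
}

# Count distinct lines on stdin (byte-wise comparison via LC_ALL=C).
count_distinct() {
  LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Send N identical requests; return success only if all completions match.
check_determinism() {
  local n="${1:-10}" i distinct
  distinct=$(for ((i = 0; i < n; i++)); do send_once; done | count_distinct)
  echo "distinct completions over ${n} requests: ${distinct}"
  [ "$distinct" -eq 1 ]
}
```

With the server running, `check_determinism 10` should report 1 distinct completion and return success if greedy decoding is deterministic. Based on the two responses above, the current build would be expected to report at least 2.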

## Expected Behavior

With `temperature=0.0` (greedy decoding), identical inputs must produce identical outputs every time.

## Impact

- **Severity: P0** — Breaks reproducibility, a fundamental requirement for testing and production
- Suggests memory corruption or uninitialized state in the KV cache between requests
- May be related to the KV cache reuse feature (chat-mode optimization)

## Root Cause Hypothesis

The KV cache from the previous request may not be fully cleared/reset before the next request. If the cache reuse logic incorrectly detects a "match" or leaves stale data, the attention computation reads corrupted values, producing non-deterministic output.
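One way to probe this hypothesis without modifying the code is to compare the first request of a freshly started process with the second request in the same process. Suppose first responses are identical across restarts, but the second response in a process differs from the first. In that case the divergence tracks state carried between requests, not run-to-run nondeterminism in the kernels. The sketch below reuses the binary, model, and `-p` flag from the repro. The readiness poll (any HTTP answer on `/`), the roughly 60-second timeout, and `jq` are assumptions, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: tell "fresh process" output apart from "second request" output.
# SERVER, MODEL and -p come from the repro; the readiness poll and the use
# of jq are assumptions. Function names are hypothetical.

SERVER=./build-metal/quant-server
MODEL=SmolLM2-1.7B-Instruct-Q8_0.gguf
PORT=8080
URL="http://localhost:${PORT}/v1/chat/completions"
BODY='{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}'

# Send the repro request; print the completion as a one-line JSON string.
ask() {
  curl -s "$URL" -H "Content-Type: application/json" -d "$BODY" |
    jq '.choices[0].message.content'
}

# Poll for up to ~60 s until the server accepts a connection and returns
# any HTTP response (curl exits 0 even on 404 without -f).
wait_ready() {
  local i
  for ((i = 0; i < 60; i++)); do
    curl -s -o /dev/null "http://localhost:${PORT}/" && return 0
    sleep 1
  done
  return 1
}

# Print "same" if two completions match, "differ" otherwise.
compare_pair() {
  if [ "$1" = "$2" ]; then echo same; else echo differ; fi
}

# Start a fresh server, send the request twice, then stop the server.
trial() {
  local pid first second
  "$SERVER" "$MODEL" -p "$PORT" >/dev/null 2>&1 &
  pid=$!
  if ! wait_ready; then
    kill "$pid" 2>/dev/null
    return 1
  fi
  first=$(ask)
  second=$(ask)
  kill "$pid"
  wait "$pid" 2>/dev/null
  echo "first=${first}"
  echo "second=${second}"
  compare_pair "$first" "$second"
}
```

For example, run `for t in 1 2 3; do trial; done` and compare the `first=` lines across trials with the `same`/`differ` verdict of each trial.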

## Suggested Investigation

1. Check if the KV cache is properly reset between unrelated requests
2. Verify that `memset`/zero-initialization happens on all state buffers
3. Test with KV cache reuse disabled to isolate the issue
4. Run under AddressSanitizer (`-fsanitize=address`) to detect memory issues
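For step 4, a separate AddressSanitizer build can be configured alongside `build-metal` and then hit with the two identical requests from the repro. This is a minimal sketch. The `build-asan` directory name is hypothetical, and the binary is assumed to land at the same relative path as in `build-metal`. `TQ_BUILD_METAL` is taken from the Environment section; the remaining options are standard CMake cache variables. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: configure, build, and launch an AddressSanitizer build of the
# server. build-asan is a hypothetical directory name; the binary path
# mirrors build-metal (assumption). DRY_RUN=1 only prints the commands.

SAN_FLAGS="-fsanitize=address -fno-omit-frame-pointer -g"

# Run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Configure and build from the repository root.
build_asan() {
  run cmake -S . -B build-asan \
    -DTQ_BUILD_METAL=ON \
    -DCMAKE_BUILD_TYPE=Debug \
    "-DCMAKE_C_FLAGS=${SAN_FLAGS}" \
    "-DCMAKE_CXX_FLAGS=${SAN_FLAGS}" \
    "-DCMAKE_EXE_LINKER_FLAGS=-fsanitize=address" &&
    run cmake --build build-asan -j
}

# ASan reports to stderr and, by default, stops at the first invalid access.
run_asan_server() {
  run ./build-asan/quant-server SmolLM2-1.7B-Instruct-Q8_0.gguf -p 8080
}
```

ASan has two limits worth keeping in mind here. First, it instruments CPU-side accesses only, so stale data read by Metal kernels from GPU buffers will not be reported. Second, it does not detect reads of allocated but uninitialized memory. A clean ASan run therefore would not by itself rule out the KV cache hypothesis.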

## Environment

- quant.cpp: latest main (49c6605)
- Model: SmolLM2-1.7B-Instruct-Q8_0.gguf
- Build: `cmake -DTQ_BUILD_METAL=ON`
- OS: macOS 15 (Apple M3)

---
*Reported by ClawTeam Claw-5 (Researcher persona)*
