Non-deterministic output at temperature=0 — possible memory corruption #62

@unamedkr

Description

Sending an identical prompt twice to the same server with temperature=0.0 produces different outputs. The second response often contains corrupted text (Cyrillic characters, garbled tokens), which suggests uninitialized memory or state corruption between requests.

Steps to Reproduce

# Start server
./build-metal/quant-server SmolLM2-1.7B-Instruct-Q8_0.gguf -p 8080

# First request
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}'
# Response 1: "2+2 is equal to 4." (coherent)

# Second request (identical)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}'
# Response 2: "2+2 = 4\nОтвет: 4" (Cyrillic corruption)

Expected Behavior

With temperature=0.0 (greedy decoding), identical inputs must produce identical outputs every time.
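
The expectation above can be checked mechanically by repeating the request and counting distinct replies. The following is a minimal sketch, not part of quant.cpp: `send_request` and `check_determinism` are hypothetical helper names, `jq` is assumed to be installed, and the response is assumed to follow the OpenAI-style shape `.choices[0].message.content` (only the reply text is compared, because other response fields could legitimately differ between calls).

```shell
#!/bin/sh
# Sketch: send the same greedy request several times and count distinct replies.

URL="${URL:-http://localhost:8080/v1/chat/completions}"
BODY='{"messages":[{"role":"user","content":"What is 2+2?"}],"temperature":0.0,"max_tokens":30}'

# Hypothetical helper: print the reply text of one request.
# Assumes an OpenAI-style response shape and that jq is available.
send_request() {
  curl -s "$URL" -H "Content-Type: application/json" -d "$BODY" \
    | jq -r '.choices[0].message.content'
}

# Call send_request N times (default 5), print a checksum per reply,
# and return non-zero if more than one distinct reply was seen.
check_determinism() {
  runs="${1:-5}"
  i=1
  sums=""
  while [ "$i" -le "$runs" ]; do
    sum="$(send_request | cksum | cut -d' ' -f1)"
    echo "run $i: $sum"
    sums="$sums $sum"
    i=$((i + 1))
  done
  # $sums is left unquoted on purpose so each checksum lands on its own line.
  distinct="$(printf '%s\n' $sums | sort -u | wc -l | tr -d ' ')"
  echo "distinct replies: $distinct"
  [ "$distinct" -eq 1 ]
}
```

With the server running, `check_determinism 10` should report a single distinct reply if greedy decoding is deterministic; any other count reproduces the bug without reading the responses by eye.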

Impact

  • Severity: P0. Breaks reproducibility, which both testing and production use depend on
  • Suggests memory corruption or uninitialized state in the KV cache between requests
  • May be related to the KV cache reuse feature (the chat-mode optimization)

Root Cause Hypothesis

The KV cache from the previous request may not be fully cleared or reset before the next request. If the cache reuse logic incorrectly detects a "match", or leaves stale entries behind, the attention computation reads values left over from the earlier request, so the output depends on what the server processed before rather than on the prompt alone.

Suggested Investigation

  1. Check whether the KV cache is properly reset between unrelated requests
  2. Verify that every state buffer is zero-initialized (e.g. with memset) before first use
  3. Test with KV cache reuse disabled to isolate the issue
  4. Run under AddressSanitizer (-fsanitize=address) to detect memory issues
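
For step 4, a sanitizer build might look like the sketch below. It uses the standard CMake variables `CMAKE_C_FLAGS`, `CMAKE_CXX_FLAGS` and `CMAKE_EXE_LINKER_FLAGS`; whether the project's CMake setup honors them unchanged is an assumption, and `build-asan` and `build_asan` are hypothetical names. Note that AddressSanitizer reports out-of-bounds accesses and use-after-free in CPU code. It does not flag reads of uninitialized or stale-but-valid memory, and it does not instrument Metal kernels, so a clean run would not rule out the KV cache hypothesis.

```shell
#!/bin/sh
# Sketch of an AddressSanitizer build; build-asan is a hypothetical directory name.
ASAN_FLAGS="-fsanitize=address -fno-omit-frame-pointer -g"

# Configure and build with ASan enabled for both C and C++ sources.
# -DTQ_BUILD_METAL=ON matches the build option listed under Environment.
build_asan() {
  cmake -B build-asan -DTQ_BUILD_METAL=ON -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_C_FLAGS="$ASAN_FLAGS" \
    -DCMAKE_CXX_FLAGS="$ASAN_FLAGS" \
    -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address" &&
  cmake --build build-asan
}
```

After running `build_asan` from the repository root, start the server from the new build directory (assuming the binary lands at `./build-asan/quant-server`, mirroring the `build-metal` layout) and repeat the two requests from the reproduction steps.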

Environment

  • quant.cpp: latest main (49c6605)
  • Model: SmolLM2-1.7B-Instruct-Q8_0.gguf
  • Build: cmake -DTQ_BUILD_METAL=ON
  • OS: macOS 15 (Apple M3)

Reported by ClawTeam Claw-5 (Researcher persona)

Labels: bug