[BUG] [v0.1.0] create_simple_summary() panics on multi-byte UTF-8 due to byte-index slicing and String::truncate (context/compaction.rs) #166

@climax-dev-1

Bug Description

create_simple_summary() in cortex-engine/src/context/compaction.rs has two byte-index operations that panic on multi-byte UTF-8 content:

  1. Lines 357-358: &text[..97] is a byte-index slice that panics if byte 97 falls inside a multi-byte character
  2. Line 372: summary.truncate(max_length - 3) calls String::truncate(), which panics if the index is not on a UTF-8 char boundary

Location

src/cortex-engine/src/context/compaction.rs, lines 344–377

Root Cause

Bug 1: Byte-index slice on line 358

fn create_simple_summary(messages: &[Message], max_length: usize) -> String {
    // ...
    for msg in messages {
        if let Some(text) = msg.content.as_text() {
            // Truncate long messages
            let text = if text.len() > 100 {
                format!("{}...", &text[..97])  // BUG: byte-index slice
            } else {
                text.to_string()
            };
            // ...
        }
    }

text.len() returns byte length, and &text[..97] slices at byte position 97. If the text contains multi-byte UTF-8 characters, byte 97 may fall in the middle of a character, causing a panic.
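The gap between byte length and character count can be checked directly. The sketch below is illustrative only: checked_prefix and byte_and_char_len are hypothetical helper names, not part of the codebase. It relies on str::get, which returns None where the indexing operator would panic.

```rust
/// Hypothetical helper: returns the prefix of `text` ending at `byte_idx`,
/// or None when `byte_idx` is out of range or not on a char boundary.
/// Unlike `&text[..byte_idx]`, this never panics.
fn checked_prefix(text: &str, byte_idx: usize) -> Option<&str> {
    text.get(..byte_idx)
}

/// Hypothetical helper: (byte length, character count) of `text`.
fn byte_and_char_len(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}
```

For "日".repeat(34), the byte length is 102 while the character count is 34, and byte 97 is not a char boundary, so checked_prefix returns None at 97 but a valid prefix at 96.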

Bug 2: String::truncate on line 372

    if summary.len() > max_length {
        summary.truncate(max_length - 3);  // BUG: panics if not char boundary
        summary.push_str("...");
    }

String::truncate() panics if the given index is not on a UTF-8 char boundary. Since max_length - 3 is an arbitrary byte offset, it can land inside a multi-byte character.

Reproduction

// Bug 1: CJK text where byte 97 is mid-character
// Each CJK char is 3 bytes. 33 chars = 99 bytes.
// text.len() = 99 < 100, so no panic. But with 34 chars = 102 bytes:
// text.len() = 102 > 100, &text[..97] splits the 33rd char → PANIC
let text = "日".repeat(34); // 102 bytes

// Bug 2: summary with multi-byte chars where max_length-3 is mid-char
let mut summary = "日本語".repeat(100); // 900 bytes
// If max_length = 502, then max_length-3 = 499
// Byte 499 is inside a 3-byte char → PANIC
summary.truncate(499);
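To confirm both panics without aborting the process, the snippets above can be wrapped in std::panic::catch_unwind. This is a sketch for a scratch test; panics, slice_panics, and truncate_panics are hypothetical names introduced here.

```rust
use std::panic;

/// Hypothetical helper: runs `f` and reports whether it panicked.
fn panics<F: FnOnce() + panic::UnwindSafe>(f: F) -> bool {
    panic::catch_unwind(f).is_err()
}

/// Bug 1: slicing 102 bytes of CJK text at byte 97.
fn slice_panics() -> bool {
    let text = "日".repeat(34); // 102 bytes
    panics(move || {
        // Byte 97 is inside the 33rd character.
        let _n = text[..97].len();
    })
}

/// Bug 2: truncating at byte 499 of a 900-byte string.
fn truncate_panics() -> bool {
    let mut summary = "日本語".repeat(100); // 900 bytes
    // Byte 499 is inside a 3-byte character.
    panics(move || summary.truncate(499))
}
```

Both functions are expected to return true. The default panic hook still prints each panic message to stderr, and catch_unwind only works when the build unwinds on panic (not with panic = "abort").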

Fix

Bug 1:

let text = if text.chars().count() > 100 {
    // Compare and cut in characters, so the limit and the cut use the same unit
    let truncated: String = text.chars().take(97).collect();
    format!("{}...", truncated)
} else {
    text.to_string()
};
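An alternative for Bug 1 that avoids collecting an intermediate String is to locate the cut point with char_indices and slice there, which is safe because char_indices only yields char boundaries. This is a sketch; shorten_chars is a hypothetical helper name, and it measures the limit in characters rather than bytes.

```rust
/// Hypothetical helper: if `text` has more than `max_chars` characters,
/// keeps the first `max_chars - 3` characters and appends "...".
/// Intended for `max_chars >= 3`.
fn shorten_chars(text: &str, max_chars: usize) -> String {
    // A character at index `max_chars` exists only if the text is too long.
    if text.char_indices().nth(max_chars).is_none() {
        return text.to_string();
    }
    let keep = max_chars.saturating_sub(3);
    // Byte offset where character number `keep` starts: always a boundary.
    let cut = text
        .char_indices()
        .nth(keep)
        .map_or(text.len(), |(i, _)| i);
    format!("{}...", &text[..cut])
}
```

With max_chars = 100, a string of 101 CJK characters becomes 97 characters plus "...", while a 34-character string is returned unchanged.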

Bug 2:

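if summary.len() > max_length {
    // saturating_sub avoids an underflow panic when max_length < 3
    // floor_char_boundary requires a toolchain on which it is stable
    let end = summary.floor_char_boundary(max_length.saturating_sub(3));
    summary.truncate(end);
    summary.push_str("...");
}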
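If str::floor_char_boundary is not available on the project's toolchain, the same effect can be had with str::is_char_boundary, which has long been stable. The following is a minimal sketch; truncate_with_ellipsis is a hypothetical helper name, not an existing function in the codebase.

```rust
/// Hypothetical helper: if `s` is longer than `max_len` bytes, cuts it at the
/// nearest char boundary at or below `max_len - 3` and appends "...".
/// For `max_len >= 3` the result is at most `max_len` bytes.
fn truncate_with_ellipsis(s: &mut String, max_len: usize) {
    if s.len() <= max_len {
        return;
    }
    // saturating_sub avoids underflow when max_len < 3.
    let mut end = max_len.saturating_sub(3);
    // Step back until `end` is a char boundary; index 0 always is one,
    // so the loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    s.truncate(end);
    s.push_str("...");
}
```

For the 900-byte string from the reproduction and max_len = 502, the cut moves from byte 499 back to byte 498, giving a 501-byte result. A UTF-8 character is at most 4 bytes, so the loop steps back at most 3 times.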

Impact

When compacting conversations that contain multi-byte UTF-8 content (internationalized text, emoji, accented characters), the compaction process panics, potentially crashing the application or leaving the conversation in an inconsistent state. Any compaction strategy that calls create_simple_summary, such as Summarize or Hybrid, can trigger it.

Note

This is distinct from issue #161, which covers generate_summary() in compaction.rs (the top-level module). This bug is in context/compaction.rs, a different file with a different function (create_simple_summary).

Version

v0.1.0
