
P0-B: Verify and confirm text chunking implementation is complete#48

Merged
devlux76 merged 1 commit into main from
copilot/p0-b-text-chunking-implementation
Mar 13, 2026

Conversation

Contributor

Copilot AI commented Mar 13, 2026

P0-B required a token-aware text chunker backed by ModelProfile limits, with sentence-boundary awareness and full edge-case handling. This PR confirms the implementation is already in place and fully correct.

What's in place

  • hippocampus/Chunker.ts — exports chunkText(text, profile) (delegates to profile.maxChunkTokens) and the lower-level chunkTextWithMaxTokens(text, maxChunkTokens):

    • Whitespace-token budget enforcement — never emits a chunk exceeding maxChunkTokens tokens
    • Sentence boundary heuristic via lookbehind regex on . ! ? — keeps sentences whole when they fit
    • Oversized sentences split at token boundaries across consecutive chunks
    • Empty / whitespace-only input returns []; huge inputs handled iteratively (no stack growth)
  • tests/hippocampus/Chunker.test.ts — 8 tests covering empty input, single-token, 10k-token scale, multi-chunk split, sentence-boundary preference, oversized sentence, and ModelProfile integration via chunkText
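For reviewers, the behavior in the bullets above can be sketched roughly as follows. This is an illustrative sketch only, not the repository's actual `Chunker.ts`; it assumes whitespace tokenization and the lookbehind split on `. ! ?` described in the summary:

```typescript
// Sketch of the described chunking behavior (not the repo's implementation).
// Tokens are whitespace-separated words; sentences split via lookbehind on . ! ?
export function chunkTextWithMaxTokens(
  text: string,
  maxChunkTokens: number
): string[] {
  const trimmed = text.trim();
  if (trimmed === "") return []; // empty / whitespace-only input -> []

  // Split into sentences at terminal punctuation followed by whitespace.
  const sentences = trimmed.split(/(?<=[.!?])\s+/);

  const chunks: string[] = [];
  let current: string[] = [];

  const flush = () => {
    if (current.length > 0) {
      chunks.push(current.join(" "));
      current = [];
    }
  };

  for (const sentence of sentences) {
    const tokens = sentence.split(/\s+/);
    if (tokens.length > maxChunkTokens) {
      // Oversized sentence: split at token boundaries across chunks.
      flush();
      for (let i = 0; i < tokens.length; i += maxChunkTokens) {
        chunks.push(tokens.slice(i, i + maxChunkTokens).join(" "));
      }
    } else if (current.length + tokens.length > maxChunkTokens) {
      // Sentence fits on its own but not in the current chunk: start a new one,
      // keeping the sentence whole.
      flush();
      current = tokens;
    } else {
      current.push(...tokens);
    }
  }
  flush();
  return chunks;
}
```

The iterative loop (no recursion) mirrors the "no stack growth" claim above, and no emitted chunk can exceed `maxChunkTokens` whitespace tokens.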

Usage

import { chunkText } from "./hippocampus/Chunker";

const chunks = chunkText(rawText, profile); // profile.maxChunkTokens drives the budget

Both P0-B1 and P0-B2 exit criteria are satisfied: 203/203 unit tests pass, lint clean, TypeScript build clean.

Original prompt

This section details the original issue you should resolve

<issue_title>P0-B: Text Chunking (BLOCKS: ingest orchestration)</issue_title>
<issue_description>Why: Must split text into page-sized chunks respecting ModelProfile token limits.

  • P0-B1: Implement hippocampus/Chunker.ts

    • Token-aware splitting (use ModelProfile maxContextLength)
    • Respect sentence boundaries where possible
    • Handle edge cases (empty input, single-token input, huge paragraphs)
  • P0-B2: Add chunker test coverage

    • tests/hippocampus/Chunker.test.ts
    • Test various text lengths (short, medium, long, huge)
    • Test boundary conditions

Exit Criteria: Can reliably split arbitrary text into page chunks.</issue_description>

<agent_instructions>I think some of this was done already, if so verify completeness and make sure it aligns with the design and plan, if not then fix it please.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title from "[WIP] [P0-B] Implement text chunking for ingestion orchestration" to "P0-B: Verify and confirm text chunking implementation is complete" Mar 13, 2026
Copilot AI requested a review from devlux76 March 13, 2026 08:20
@devlux76 devlux76 marked this pull request as ready for review March 13, 2026 08:22
Copilot AI review requested due to automatic review settings March 13, 2026 08:22
Contributor

Copilot AI left a comment

Copilot wasn't able to review any files in this pull request.

@devlux76 devlux76 merged commit d46bbf6 into main Mar 13, 2026
2 checks passed
@devlux76 devlux76 deleted the copilot/p0-b-text-chunking-implementation branch March 13, 2026 08:25


Development

Successfully merging this pull request may close these issues.

P0-B: Text Chunking (BLOCKS: ingest orchestration)

3 participants