Contributor

@cjbell cjbell commented Jan 22, 2026

Description

This PR introduces a new indexing strategy for the docs, which will now include headings and content within a page in addition to the page title/tags that we previously indexed.

Note: right now we're not indexing API content within this result set, but I can easily change that if we'd like!

Phase 1 of Algolia search improvements:

- Add EnhancedDocsSearchItem type with new fields:
  - pageTitle: Always the parent page title
  - description: From frontmatter (page-level only)
  - content: Text content (truncated ~2000 chars)
  - headingLevel: 0 for page, 2 for H2, 3 for H3
  - isPageLevel: True if page-level record (not a heading)
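As a rough illustration of the record shape described above, a minimal sketch might look like the following. Only `pageTitle`, `description`, `content`, `headingLevel`, and `isPageLevel` come from this description; `objectID`, `title`, `path`, and the two helper functions are assumptions added for the example and may not match the actual type in the PR.

```typescript
// Hypothetical sketch of the enhanced search record.
// Fields marked "assumed" are NOT taken from the PR description.
interface EnhancedDocsSearchItem {
  objectID: string; // assumed: unique record id
  title: string; // assumed: page title or heading text
  path: string; // assumed: page path, plus #anchor for heading records
  pageTitle: string; // always the parent page title
  description?: string; // from frontmatter, page-level records only
  content: string; // text content, truncated to roughly 2000 chars
  headingLevel: 0 | 2 | 3; // 0 for page, 2 for H2, 3 for H3
  isPageLevel: boolean; // true for page-level records, false for headings
}

// Assumed limit, based on the "~2000 chars" figure above.
const MAX_CONTENT_CHARS = 2000;

// Hypothetical helper: cap content so each record stays small.
function truncateContent(text: string, max: number = MAX_CONTENT_CHARS): string {
  return text.length <= max ? text : text.slice(0, max);
}

// Hypothetical sanity check: a record is page-level exactly when headingLevel is 0.
function hasConsistentLevel(item: EnhancedDocsSearchItem): boolean {
  return item.isPageLevel === (item.headingLevel === 0);
}
```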

- Create scripts/indexDocsForSearch.ts:
  - Parses all MDX/MD content files
  - Extracts frontmatter using remark
  - Creates page-level records with intro content
  - Extracts H2/H3 headings with surrounding content
  - Creates heading-level records with anchor links
  - Batches uploads to Algolia (1000 per batch)
  - Gracefully handles missing Algolia credentials
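The heading extraction and batching steps above could be sketched roughly as follows. This is not the code from scripts/indexDocsForSearch.ts: the regex, the simplified `slugify`, the `url` field, and all function names are assumptions for illustration, and the sketch does not skip headings that appear inside fenced code blocks.

```typescript
// Hypothetical sketch: find H2/H3 headings, build one record per heading,
// and split records into batches of a fixed size.
interface HeadingMatch {
  index: number; // character offset of the heading line
  level: number; // 2 for H2, 3 for H3
  title: string; // raw heading text
}

interface HeadingRecord {
  title: string;
  pageTitle: string;
  url: string; // assumed field: page path plus #anchor
  content: string;
  headingLevel: number;
  isPageLevel: boolean;
}

function findHeadings(markdown: string): HeadingMatch[] {
  const headingPattern = /^(#{2,3})[ \t]+(.+)$/gm; // assumed pattern
  const matches: HeadingMatch[] = [];
  let match: RegExpExecArray | null;
  while ((match = headingPattern.exec(markdown)) !== null) {
    matches.push({ index: match.index, level: match[1].length, title: match[2].trim() });
  }
  return matches;
}

// Simplified slug; the real anchor algorithm used by the docs site may differ.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, "")
    .replace(/\s+/g, "-");
}

function buildHeadingRecords(pagePath: string, pageTitle: string, markdown: string): HeadingRecord[] {
  const headings = findHeadings(markdown);
  return headings.map((heading, i) => {
    // A section runs from the end of the heading line to the next heading (or end of file).
    const end = i + 1 < headings.length ? headings[i + 1].index : markdown.length;
    const lineEnd = markdown.indexOf("\n", heading.index);
    const bodyStart = lineEnd === -1 ? end : lineEnd + 1;
    return {
      title: heading.title,
      pageTitle,
      url: `${pagePath}#${slugify(heading.title)}`,
      content: markdown.slice(bodyStart, end).trim().slice(0, 2000),
      headingLevel: heading.level,
      isPageLevel: false,
    };
  });
}

// Split records into batches, e.g. 1000 per upload request.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch returned by `chunk(records, 1000)` would then be passed to whatever Algolia upload call the script uses, which is not shown here.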

- Update package.json:
  - Add 'index-docs' script
  - Run new indexer in prebuild before index-apis

This enables:
- Deep linking to specific sections via #anchor URLs
- Better relevance for specific queries
- Smaller, more focused search records
- Content-based search (not just titles)

Co-authored-by: chris <chris@knock.app>
@cursor

cursor bot commented Jan 22, 2026

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.

@vercel

vercel bot commented Jan 22, 2026

The latest updates on your projects.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Jan 27, 2026 9:17pm |

@cjbell cjbell changed the title from "Docs search content indexing" to "feat: improve docs search" Jan 23, 2026
@cjbell cjbell marked this pull request as ready for review January 23, 2026 22:09

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

  </Text>
  <Text as="span" size="1" color="gray" weight="regular">
-   {item.section}
+   {item.pageTitle ? `${item.pageTitle as string} •` : ""} {item.section}
Page-level results redundantly display title twice

Low Severity

For page-level search results, pageTitle equals title (both are set to frontmatter.title in the indexing script), causing the same title to appear twice in the UI - once as the main title and again in the subtitle. The isPageLevel field exists on EnhancedDocsSearchItem specifically to distinguish page-level from heading-level records, but the display logic doesn't use it. The condition should check !item.isPageLevel && item.pageTitle to only show pageTitle for heading-level results where it provides useful parent-page context.
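A minimal sketch of the suggested condition, pulled out into a plain function so it can be checked in isolation. The `SearchResultItem` shape and the `buildSubtitle` name are assumptions for illustration; the actual component renders this inline in JSX.

```typescript
// Hypothetical, simplified shape of a search hit as used by the result row.
interface SearchResultItem {
  title: string;
  section: string;
  pageTitle?: string;
  isPageLevel?: boolean;
}

// Show the parent page title only for heading-level results,
// where it adds context instead of repeating the main title.
function buildSubtitle(item: SearchResultItem): string {
  const showPageTitle = !item.isPageLevel && Boolean(item.pageTitle);
  return showPageTitle ? `${item.pageTitle} • ${item.section}` : item.section;
}
```

With this check, a page-level hit shows only its section, while a heading-level hit shows the parent page title followed by the section.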

matches.push({
index: match.index,
level: match[1].length,
title: match[2].trim(),
Heading titles retain raw markdown formatting characters

Medium Severity

Heading titles captured by the regex at line 183 are stored directly without cleaning markdown formatting. While heading content is properly cleaned via extractTextContent() at line 200, the title field never is. Headings like "## Using `config` variables" or "## **Important** notes" will display with literal backticks, asterisks, or link syntax visible in search results. The title needs the same markdown cleanup applied to it.
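One possible shape for that cleanup is sketched below. `stripInlineMarkdown` is a hypothetical helper, not the script's extractTextContent() (whose implementation is not shown here); reusing that existing function on the title may well be the simpler fix. The sketch only handles asterisk emphasis, so underscores in identifiers such as `user_id` are left alone.

```typescript
// Hypothetical helper: remove common inline markdown from a heading title.
function stripInlineMarkdown(title: string): string {
  return title
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // images: keep alt text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // links: keep link text
    .replace(/`([^`]+)`/g, "$1") // inline code: drop backticks
    .replace(/\*\*(.+?)\*\*/g, "$1") // bold with asterisks
    .replace(/\*(.+?)\*/g, "$1") // italics with asterisks
    .trim();
}
```

Applied where the title is captured, this would look like `title: stripInlineMarkdown(match[2].trim())`.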

Additional Locations (1)

@cjbell cjbell requested a review from samseely January 27, 2026 19:50
@samseely
Contributor

@cjbell should I be able to test this on the preview link? I'm trying it out but not getting any results when querying for headings or content
