feat: improve docs search #1277
base: main
Phase 1 of Algolia search improvements:

- Add EnhancedDocsSearchItem type with new fields:
  - pageTitle: Always the parent page title
  - description: From frontmatter (page-level only)
  - content: Text content (truncated ~2000 chars)
  - headingLevel: 0 for page, 2 for H2, 3 for H3
  - isPageLevel: True if page-level record (not a heading)
- Create scripts/indexDocsForSearch.ts:
  - Parses all MDX/MD content files
  - Extracts frontmatter using remark
  - Creates page-level records with intro content
  - Extracts H2/H3 headings with surrounding content
  - Creates heading-level records with anchor links
  - Batches uploads to Algolia (1000 per batch)
  - Gracefully handles missing Algolia credentials
- Update package.json:
  - Add 'index-docs' script
  - Run new indexer in prebuild before index-apis

This enables:

- Deep linking to specific sections via #anchor URLs
- Better relevance for specific queries
- Smaller, more focused search records
- Content-based search (not just titles)

Co-authored-by: chris <chris@knock.app>
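As a rough illustration of the record-splitting approach described in the commit message, the sketch below builds one page-level record plus one record per H2/H3 heading, then chunks the result for batched upload. It is not the code from scripts/indexDocsForSearch.ts: the `objectID` and `path` fields, the `slugify` helper, and the `buildRecords` and `chunk` function names are assumptions made for this example, and the regex does not skip headings inside fenced code blocks.

```typescript
// Hypothetical record shape; field names beyond those listed in the commit
// message (objectID, path, title) are assumptions for illustration.
interface EnhancedDocsSearchItem {
  objectID: string;
  path: string;
  title: string;
  pageTitle: string;
  description?: string;
  content: string;
  headingLevel: number; // 0 for page, 2 for H2, 3 for H3
  isPageLevel: boolean;
}

const MAX_CONTENT_CHARS = 2000;

// Naive anchor slug: lowercase, runs of non-alphanumerics become hyphens.
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function buildRecords(
  path: string,
  pageTitle: string,
  description: string | undefined,
  body: string,
): EnhancedDocsSearchItem[] {
  // Match only H2 and H3 ATX headings at the start of a line.
  const headingRe = /^(#{2,3})[ \t]+(.+)$/gm;
  const matches: { index: number; end: number; level: number; title: string }[] = [];
  let match: RegExpExecArray | null;
  while ((match = headingRe.exec(body)) !== null) {
    matches.push({
      index: match.index,
      end: match.index + match[0].length,
      level: match[1].length,
      title: match[2].trim(),
    });
  }

  // Page-level record: everything before the first heading is the intro.
  const introEnd = matches.length > 0 ? matches[0].index : body.length;
  const records: EnhancedDocsSearchItem[] = [
    {
      objectID: path,
      path,
      title: pageTitle,
      pageTitle,
      description,
      content: body.slice(0, introEnd).trim().slice(0, MAX_CONTENT_CHARS),
      headingLevel: 0,
      isPageLevel: true,
    },
  ];

  // Heading-level records: content runs until the next H2/H3 (or end of page).
  matches.forEach((m, i) => {
    const sectionEnd = i + 1 < matches.length ? matches[i + 1].index : body.length;
    const anchor = slugify(m.title);
    records.push({
      objectID: `${path}#${anchor}`,
      path: `${path}#${anchor}`,
      title: m.title,
      pageTitle,
      content: body.slice(m.end, sectionEnd).trim().slice(0, MAX_CONTENT_CHARS),
      headingLevel: m.level,
      isPageLevel: false,
    });
  });
  return records;
}

// Split records into fixed-size batches (the PR mentions 1000 per batch).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}
```

Each batch returned by `chunk(records, 1000)` would then be handed to whatever upload call the indexer uses; that call is left out here because the PR text does not show it.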
Co-authored-by: chris <chris@knock.app>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
    </Text>
    <Text as="span" size="1" color="gray" weight="regular">
    {item.section}
    {item.pageTitle ? `${item.pageTitle as string} •` : ""} {item.section}
Page-level results redundantly display title twice
Low Severity
For page-level search results, `pageTitle` equals `title` (both are set to `frontmatter.title` in the indexing script), so the same title appears twice in the UI: once as the main title and again in the subtitle. The `isPageLevel` field exists on `EnhancedDocsSearchItem` specifically to distinguish page-level from heading-level records, but the display logic doesn't use it. The condition should check `!item.isPageLevel && item.pageTitle` so that `pageTitle` is shown only for heading-level results, where it provides useful parent-page context.
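One way the suggested condition could look, pulled out into a small helper so it can be checked in isolation. The `SearchItemLike` type and the `subtitleFor` function are hypothetical names for this sketch; the actual component in the PR renders the subtitle inline in JSX.

```typescript
// Hypothetical minimal shape of a search hit; only the fields the subtitle needs.
interface SearchItemLike {
  title: string;
  section: string;
  pageTitle?: string;
  isPageLevel?: boolean;
}

// Show the parent page title only for heading-level hits, where it adds context.
// For page-level hits, pageTitle duplicates title, so only the section is shown.
function subtitleFor(item: SearchItemLike): string {
  const showPageTitle = !item.isPageLevel && Boolean(item.pageTitle);
  return showPageTitle ? `${item.pageTitle} • ${item.section}` : item.section;
}
```

With this check, a heading-level hit such as "Setup" on the "Overview" page would read "Overview • Guides", while the page-level "Overview" hit would read just "Guides".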
    matches.push({
    index: match.index,
    level: match[1].length,
    title: match[2].trim(),
Heading titles retain raw markdown formatting characters
Medium Severity
Heading titles captured by the regex at line 183 are stored directly without cleaning markdown formatting. While heading content is properly cleaned via `extractTextContent()` at line 200, the `title` field never is. Headings like ``## Using `config` variables`` or `## Important notes` will display with literal backticks, asterisks, or link syntax visible in search results. The title needs the same markdown cleanup applied to it.
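A minimal sketch of the kind of cleanup the comment asks for, assuming a regex-based approach is acceptable for short heading text. `cleanHeadingTitle` is a hypothetical helper, not the PR's `extractTextContent()`; it leaves underscores alone so identifiers like `my_var` survive, and it may mangle asterisks that appear inside inline code.

```typescript
// Strip common inline markdown from a heading title so search results
// show plain text. Regex-based and intentionally conservative.
function cleanHeadingTitle(raw: string): string {
  return raw
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1") // links and images -> label text
    .replace(/`([^`]*)`/g, "$1")               // inline code -> its contents
    .replace(/\*\*(.+?)\*\*/g, "$1")           // bold (asterisk form only)
    .replace(/\*(.+?)\*/g, "$1")               // italic (asterisk form only)
    .replace(/\s+/g, " ")                      // collapse whitespace
    .trim();
}
```

In the indexer this would wrap the captured group, for example `title: cleanHeadingTitle(match[2])`. Because the script reportedly already uses remark, reusing its parsed tree for titles may be more robust than regexes.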
@cjbell should I be able to test this on the preview link? I'm trying it out but not getting results when querying for headers or content.
Description
This PR introduces a new indexing strategy for the docs, which will now include headings and content within a page in addition to the page title/tags that we previously indexed.
Note: right now we're not indexing API content within this result set, but I can easily change that if we'd like!