
feature: add language server implementation #18

Open
16bit-ykiko wants to merge 2 commits into main from language-server-2


16bit-ykiko commented Feb 17, 2026

Summary by CodeRabbit

  • New Features

    • Added comprehensive LSP types and serialization support, including enum↔string handling and improved map-key parsing/serialization.
    • Added a Python code generator to produce LSP headers from schema files.
    • Introduced name-normalization and key/string conversion utilities for consistent protocol formatting.
  • Chores

    • Updated formatter configuration and .gitignore to exclude generated LSP schema artifacts.

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Adds LSP generation and runtime support: a Python LSP schema code generator, new C++ LSP type definitions, comprehensive spelling/map-key utilities, enum-string serialization support, improved map key parsing, and related serializer/deserializer and reflection helpers.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| LSP Type Definitions | `include/language/ts.h` | Adds LSP public types, optional/nullable helpers, `LSPVariant`/`LSPAny`/`LSPArray`/`LSPObject`, and primitive/URI aliases. |
| Code Generator | `scripts/lsp_codegen.py` | New CLI Python generator that parses/fetches the LSP metaModel JSON and emits a single C++23 header (`language::protocol`) with ordered structs/enums/aliases and docs. |
| Name Transformation & Key Handling | `include/serde/spelling.h` | Introduces rename policies, enum↔string mapping, map key stringification and parsing, the `parseable_map_key` concept, and low-level ASCII/parse helpers. |
| Serialization Core & Attributes | `include/serde/attrs.h`, `include/serde/serde.h` | Adds the `enum_string` annotation and constructors/assignments for the `annotate` wrapper, integrates `enum_string` and `enum_policy` into (de)serialization, and updates map key parsing to use `parse_map_key`. |
| Simdjson Serializer/Deserializer | `include/serde/simdjson/serializer.h`, `include/serde/simdjson/deserializer.h` | Deserializer: adds `invalid_key()` to signal invalid map keys. Serializer: delegates map key stringification to `serde::spelling::map_key_to_string` and removes the in-class `map_key_to_string` implementation. |
| Reflection Utilities | `include/reflection/enum.h` | Adds `template<enum_type E> constexpr std::optional<E> enum_value(std::string_view)` for name→enum lookup. |
| Config & Ignore | `.gitignore`, `pixi.toml` | Ignores the Python cache and generated schema JSON; excludes `include/language/protocol.h` from the clang-format task. |
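The LSP type definitions row lists `LSPAny`, `LSPArray` and `LSPObject`, which are mutually recursive, and an inline comment further down flags using `LSPAny` inside containers while it is still incomplete. For context: `std::vector` has permitted incomplete element types since C++17, but `std::map` and `std::unordered_map` give no such guarantee. A minimal sketch of one well-defined layout, assuming a plain `std::variant` payload (the member layout and `make_example` are illustrative, not the PR's `include/language/ts.h`): the containers hold `std::unique_ptr<LSPAny>`, so they never need the complete `LSPAny` at the point where they are named.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Illustrative only: a recursive JSON-like value in the spirit of LSPAny.
struct LSPAny;  // incomplete until the definition below

// The containers store pointers, so LSPAny itself is never used as an
// element or mapped type while it is incomplete.
using LSPArray = std::vector<std::unique_ptr<LSPAny>>;
using LSPObject = std::map<std::string, std::unique_ptr<LSPAny>>;

struct LSPAny {
    // null, boolean, integer, decimal, string, array, object
    std::variant<std::nullptr_t, bool, std::int32_t, double, std::string, LSPArray, LSPObject> value;
};

// Hypothetical helper: builds {"line": 3, "tags": ["a"]} to show nesting.
inline LSPAny make_example() {
    LSPArray tags;
    tags.push_back(std::make_unique<LSPAny>(LSPAny{std::string("a")}));

    LSPObject object;
    object.emplace("line", std::make_unique<LSPAny>(LSPAny{std::int32_t{3}}));
    object.emplace("tags", std::make_unique<LSPAny>(LSPAny{std::move(tags)}));
    return LSPAny{std::move(object)};
}
```

The trade-off is that `LSPAny` becomes move-only; a copyable design would need a small value-semantic box (deep-copying pointer wrapper) instead of `std::unique_ptr`.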

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Fetcher as SchemaFetcher
    participant Parser as SchemaParser
    participant Generator as Generator
    participant Renderer as TypeRenderer
    participant Writer as FileWriter

    CLI->>Fetcher: fetch_schema(--fetch-schema / --schema)
    Fetcher-->>Parser: raw JSON
    Parser->>Generator: SchemaModel
    Generator->>Generator: build_name_map(), build_node_dependencies()
    Generator->>Generator: topological_order(nodes,deps)
    loop for each node
        Generator->>Renderer: render_type / render_or
        Renderer-->>Generator: C++ type strings
        alt struct
            Generator->>Generator: emit_struct()
        else enum
            Generator->>Generator: emit_enum()
        else alias
            Generator->>Generator: emit_alias()
        end
    end
    Generator->>Writer: make_header(includes, blocks)
    Writer->>Writer: write_file(output_path, content)
    Writer-->>CLI: summary metrics
sequenceDiagram
    participant Serializer as Serializer
    participant Meta as AnnotationMeta
    participant Spelling as SpellingUtil
    participant Output as OutputStream

    Serializer->>Meta: query enum_string & enum_policy
    Meta-->>Serializer: enum_string=true, policy
    Serializer->>Spelling: map_enum_to_string(value, policy)
    Spelling-->>Serializer: string
    Serializer->>Output: write field key/value

    Serializer->>Meta: check map key parseable type
    Meta-->>Serializer: parseable_map_key<Key>
    Serializer->>Spelling: map_key_to_string(key)
    Spelling-->>Serializer: stringified key
    Serializer->>Output: write key-value pair
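The second diagram has the serializer stringify map keys, and the deserializer gained `invalid_key()` for keys that fail to parse. JSON object keys are always strings, so a map such as `std::map<int, T>` needs a round trip between integer and text. The sketch below uses hypothetical names (`key_to_string`, `parse_key`, `KeyValueReader`, `read_int_map`) rather than the PR's `map_key_to_string`/`parse_map_key`, supports only `std::string` and integral keys, and models the reader as a toy in-memory stream. It also illustrates the point raised in the inline comment on `include/serde/serde.h`: when an unparseable key is tolerated, the paired value must still be consumed before the next key is requested.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical: turn a map key into the string used as a JSON object key.
template <typename K>
std::string key_to_string(const K& key) {
    if constexpr (std::is_same_v<K, std::string>) {
        return key;
    } else {
        static_assert(std::is_integral_v<K> && !std::is_same_v<K, bool>, "unsupported key type");
        return std::to_string(key);
    }
}

// Hypothetical: parse a JSON object key back; nullopt means "invalid key".
template <typename K>
std::optional<K> parse_key(std::string_view text) {
    if constexpr (std::is_same_v<K, std::string>) {
        return std::string(text);
    } else {
        K value{};
        const char* end = text.data() + text.size();
        auto [ptr, ec] = std::from_chars(text.data(), end, value);
        if (ec != std::errc{} || ptr != end) return std::nullopt;  // reject trailing junk
        return value;
    }
}

// Toy stand-in for a streaming map reader: every next_key() must be followed
// by exactly one read_value() or skip_value() before the next next_key().
class KeyValueReader {
public:
    explicit KeyValueReader(std::vector<std::pair<std::string, int>> entries)
        : entries_(std::move(entries)) {}

    std::optional<std::string> next_key() {
        if (has_pending_value_) throw std::logic_error("previous value was not consumed");
        if (pos_ == entries_.size()) return std::nullopt;
        has_pending_value_ = true;
        return entries_[pos_].first;
    }
    int read_value() { has_pending_value_ = false; return entries_[pos_++].second; }
    void skip_value() { has_pending_value_ = false; ++pos_; }

private:
    std::vector<std::pair<std::string, int>> entries_;
    std::size_t pos_ = 0;
    bool has_pending_value_ = false;
};

// Lenient decoding: entries whose key does not parse are dropped, but their
// value is skipped first so the reader stays in sync.
inline std::map<int, int> read_int_map(KeyValueReader& reader) {
    std::map<int, int> out;
    while (auto key = reader.next_key()) {
        auto parsed = parse_key<int>(*key);
        if (!parsed) {
            reader.skip_value();
            continue;
        }
        out.emplace(*parsed, reader.read_value());
    }
    return out;
}
```

Removing the `skip_value()` call makes the second `next_key()` throw in this toy reader, which is the same desynchronization the review describes for `has_pending_value`.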

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

🐇 I hopped through JSON fields in the night,
Generated types with nimble delight.
Enums now speak in polished string,
Keys parse true — what joy they bring!
A little rabbit stamped: protocol takes flight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 2.56%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feature: add language server implementation' directly aligns with the primary changes in the PR, which introduce comprehensive LSP support infrastructure including type definitions, serialization/deserialization utilities, and code generation tooling. |


github-actions bot left a comment

Remaining comments that could not be posted as review comments because of the GitHub rate limit:

fmt

[fmt] reported by reviewdog 🐶

constexpr static inline std::string_view label = "label";


[fmt] reported by reviewdog 🐶

enum class TokenFormat {
Relative
};


[fmt] reported by reviewdog 🐶

/// An optional string which is rendered less prominently directly after {@link CompletionItem.label label},
/// without any spacing. Should be used for function signatures and type annotations.


[fmt] reported by reviewdog 🐶

/// An optional string which is rendered less prominently after {@link CompletionItem.detail}. Should be used
/// for fully qualified names and file paths.


[fmt] reported by reviewdog 🐶

/// via the client capability [`general.positionEncodings`](https://microsoft.github.io/language-server-protocol/specifications/specification-current/#clientCapabilities).


[fmt] reported by reviewdog 🐶

/// [`capabilities.positionEncoding`](https://microsoft.github.io/language-server-protocol/specifications/specification-current/#serverCapabilities). If the string value
/// `utf-16` is missing from the client's capability `general.positionEncodings`


[fmt] reported by reviewdog 🐶

/// Whether implementation supports dynamic registration for selection range providers. If this is set to `true`
/// the client supports the new `SelectionRangeRegistrationOptions` return value for the corresponding server
/// capability as well.


[fmt] reported by reviewdog 🐶

/// These trigger characters are only active when signature help is already showing. All trigger characters
/// are also counted as re-trigger characters.


[fmt] reported by reviewdog 🐶

/// An optional token that a server can use to report work done progress. (Inherited from [WorkDoneProgressParams])


[fmt] reported by reviewdog 🐶

/// diagnostic request for the same document. (Inherited from [UnchangedDocumentDiagnosticReport])


[fmt] reported by reviewdog 🐶

/// The zero-based start line of the range to fold. The folded area starts after the line's last character.
/// To be valid, the end must be zero or larger and smaller than the number of lines in the document.


[fmt] reported by reviewdog 🐶

/// The zero-based character offset from where the folded range starts. If not defined, defaults to the length of the start line.


[fmt] reported by reviewdog 🐶

/// The zero-based end line of the range to fold. The folded area ends with the line's last character.
/// To be valid, the end must be zero or larger and smaller than the number of lines in the document.


[fmt] reported by reviewdog 🐶

/// The zero-based character offset before the folded range ends. If not defined, defaults to the length of the end line.


[fmt] reported by reviewdog 🐶

/// If the kind is generic, such as `CodeActionKind.Refactor`, the documentation will be shown whenever any
/// refactorings are returned. If the kind is more specific, such as `CodeActionKind.RefactorExtract`, the
/// documentation will only be shown when extract refactoring code actions are returned.


[fmt] reported by reviewdog 🐶

/// - `{}` to group sub patterns into an OR expression. (e.g. `**​/*.{ts,js}` matches all TypeScript and JavaScript files)
/// - `[]` to declare a range of characters to match in a path segment (e.g., `example.[0-9]` to match on `example.0`, `example.1`, …)
/// - `[!...]` to negate a range of characters to match in a path segment (e.g., `example.[!0-9]` to match on `example.a`, `example.b`, but not `example.0`)


[fmt] reported by reviewdog 🐶

/// issues. See https://help.github.com/articles/creating-and-highlighting-code-blocks/#syntax-highlighting


[fmt] reported by reviewdog 🐶

/// Open and close notifications are sent to the server. If omitted open close notification should not
/// be sent.


[fmt] reported by reviewdog 🐶

/// Change notifications are sent to the server. See TextDocumentSyncKind.None, TextDocumentSyncKind.Full
/// and TextDocumentSyncKind.Incremental. If omitted it defaults to TextDocumentSyncKind.None.


[fmt] reported by reviewdog 🐶

/// If present will save notifications are sent to the server. If omitted the notification should not be
/// sent.


[fmt] reported by reviewdog 🐶

/// If present will save wait until requests are sent to the server. If omitted the request should not be
/// sent.


[fmt] reported by reviewdog 🐶

/// If present save notifications are sent to the server. If omitted the notification should not be
/// sent.


[fmt] reported by reviewdog 🐶

/// An optional token that a server can use to report work done progress. (Inherited from [WorkDoneProgressParams])


[fmt] reported by reviewdog 🐶

/// Whether the client accepts diagnostics with related information. (Inherited from [DiagnosticsCapabilities])


[fmt] reported by reviewdog 🐶

/// Whether the client accepts diagnostics with related information. (Inherited from [DiagnosticsCapabilities])


[fmt] reported by reviewdog 🐶

/// Its intended use case is to highlight the parameter label part in the `SignatureInformation.label`.


[fmt] reported by reviewdog 🐶

/// - Code actions of `kind` are requested by the editor. In this case, the editor will show the documentation that
/// most closely matches the requested code action kind. For example, if a provider has documentation for
/// both `Refactor` and `RefactorExtract`, when the user requests code actions for `RefactorExtract`,
/// the editor will use the documentation for `RefactorExtract` instead of the documentation for `Refactor`.


[fmt] reported by reviewdog 🐶

/// The range enclosing this symbol not including leading/trailing whitespace but everything else, e.g. comments and code.


[fmt] reported by reviewdog 🐶

/// The range that should be selected and revealed when this symbol is being picked, e.g. the name of a function.
/// Must be contained by the {@link CallHierarchyItem.range `range`}.


[fmt] reported by reviewdog 🐶

/// A document link is a range in a text document that links to an internal or external resource, like another
/// text document or a web site.


[fmt] reported by reviewdog 🐶

/// If a tooltip is provided, it will be displayed in a string that includes instructions on how to
/// trigger the link, such as `{0} (ctrl + click)`. The specific instructions vary depending on OS,
/// user settings, and localization.


[fmt] reported by reviewdog 🐶

/// An optional token that a server can use to report work done progress. (Inherited from [WorkDoneProgressParams])


[fmt] reported by reviewdog 🐶

/// An optional token that a server can use to report work done progress. (Inherited from [WorkDoneProgressParams])


[fmt] reported by reviewdog 🐶

/// The range enclosing this symbol not including leading/trailing whitespace but everything else
/// like comments. This information is typically used to determine if the clients cursor is


[fmt] reported by reviewdog 🐶

/// The range that should be selected and revealed when this symbol is being picked, e.g the name of a function.
/// Must be contained by the `range`.


[fmt] reported by reviewdog 🐶

/// An optional token that a server can use to report work done progress. (Inherited from [WorkDoneProgressParams])


[fmt] reported by reviewdog 🐶

/// An inline completion item represents a text snippet that is proposed inline to complete text that is being typed.


[fmt] reported by reviewdog 🐶

/// A text that is used to decide if this inline completion should be shown. When `falsy` the {@link InlineCompletionItem.insertText} is used.


[fmt] reported by reviewdog 🐶

/// Represents the connection of two locations. Provides additional metadata over normal {@link Location locations},
/// including an origin range.


[fmt] reported by reviewdog 🐶

/// The full target range of this link. If the target for example is a symbol then target range is the
/// range enclosing this symbol not including leading/trailing whitespace but everything else
/// like comments. This information is typically used to highlight the range in the editor.


[fmt] reported by reviewdog 🐶

/// The range that should be selected and revealed when this link is being followed, e.g the name of a function.
/// Must be contained by the `targetRange`. See also `DocumentSymbol#range`


[fmt] reported by reviewdog 🐶

/// The parent selection range containing this range. Therefore `parent.range` must contain `this.range`.


[fmt] reported by reviewdog 🐶

/// Represents an outgoing call, e.g. calling a getter from a method or a method from a constructor etc.


[fmt] reported by reviewdog 🐶

/// The range at which this item is called. This is the range relative to the caller, e.g the item
/// passed to {@link CallHierarchyItemProvider.provideCallHierarchyOutgoingCalls `provideCallHierarchyOutgoingCalls`}
/// and not {@link CallHierarchyOutgoingCall.to `this.to`}.


[fmt] reported by reviewdog 🐶

/// Represents a collection of {@link InlineCompletionItem inline completion items} to be presented in the editor.


[fmt] reported by reviewdog 🐶

/// An optional token that a server can use to report work done progress. (Inherited from [WorkDoneProgressParams])


[fmt] reported by reviewdog 🐶

using InlineValue = variant<InlineValueText, InlineValueVariableLookup, InlineValueEvaluatableExpression>;


[fmt] reported by reviewdog 🐶

/// Provides additional metadata over normal {@link Location location} declarations, including the range of
/// the declaring symbol.


[fmt] reported by reviewdog 🐶

/// Provides additional metadata over normal {@link Location location} definitions, including the range of
/// the defining symbol


[fmt] reported by reviewdog 🐶

/// Provides information about the currently selected item in the autocomplete widget if it is visible.


[fmt] reported by reviewdog 🐶

using TextDocumentContentChangeEvent = variant<TextDocumentContentChangePartial, TextDocumentContentChangeWholeDocument>;


[fmt] reported by reviewdog 🐶

/// selecting this color presentation. Edits must not overlap with the main {@link ColorPresentation.textEdit edit} nor with themselves.


[fmt] reported by reviewdog 🐶

/// An optional set of characters that when pressed while this completion is active will accept it first and
/// then type that character. *Note* that all commit characters should have `length=1` and that superfluous
/// characters will be ignored.


[fmt] reported by reviewdog 🐶

/// An optional {@link Command command} that is executed *after* inserting this completion. *Note* that
/// additional modifications to the current document should be described with the


[fmt] reported by reviewdog 🐶

/// Depending on the client capability `workspace.workspaceEdit.resourceOperations` document changes
/// are either an array of `TextDocumentEdit`s to express changes to n different text documents
/// where each text document edit addresses a specific version of a text document. Or it can contain
/// above `TextDocumentEdit`s mixed with create, rename and delete file / folder operations.


[fmt] reported by reviewdog 🐶

/// If a client neither supports `documentChanges` nor `workspace.workspaceEdit.resourceOperations` then
/// only plain `TextEdit`s using the `changes` property are supported.
optional<std::vector<variant<TextDocumentEdit, CreateFile, RenameFile, DeleteFile>>> document_changes = {};


[fmt] reported by reviewdog 🐶

/// A map of change annotations that can be referenced in `AnnotatedTextEdit`s or create, rename and
/// delete file / folder operations.


[fmt] reported by reviewdog 🐶

/// Whether clients honor this property depends on the client capability `workspace.changeAnnotationSupport`.


[fmt] reported by reviewdog 🐶

using NotebookDocumentFilter = variant<NotebookDocumentFilterNotebookType, NotebookDocumentFilterScheme, NotebookDocumentFilterPattern>;


[fmt] reported by reviewdog 🐶

/// - `{}` to group sub patterns into an OR expression. (e.g. `**​/*.{ts,js}` matches all TypeScript and JavaScript files)
/// - `[]` to declare a range of characters to match in a path segment (e.g., `example.[0-9]` to match on `example.0`, `example.1`, …)
/// - `[!...]` to negate a range of characters to match in a path segment (e.g., `example.[!0-9]` to match on `example.a`, `example.b`, but not `example.0`)
///
/// @sample A language filter that applies to typescript files on disk: `{ language: 'typescript', scheme: 'file' }`
/// @sample A language filter that applies to all package.json paths: `{ language: 'json', pattern: '**package.json' }`


[fmt] reported by reviewdog 🐶

using TextDocumentFilter = variant<TextDocumentFilterLanguage, TextDocumentFilterScheme, TextDocumentFilterPattern>;


[fmt] reported by reviewdog 🐶

/// Retriggers occurs when the signature help is already active and can be caused by actions such as
/// typing a trigger character, a cursor move, or document content changes.


[fmt] reported by reviewdog 🐶

optional_variant<TextDocumentContentOptions, TextDocumentContentRegistrationOptions> text_document_content = {};


[fmt] reported by reviewdog 🐶

/// A CodeAction must set either `edit` and/or a `command`. If both are supplied, the `edit` is applied first, then the `command` is executed.


[fmt] reported by reviewdog 🐶

/// Marks this as a preferred action. Preferred actions are used by the `auto fix` command and can be targeted
/// by keybindings.


[fmt] reported by reviewdog 🐶

/// A refactoring should be marked preferred if it is the most reasonable choice of actions to take.


[fmt] reported by reviewdog 🐶

/// - Disabled code actions are not shown in automatic [lightbulbs](https://code.visualstudio.com/docs/editor/editingevolved#_code-action)


[fmt] reported by reviewdog 🐶

/// - Disabled actions are shown as faded out in the code action menu when the user requests a more specific type


[fmt] reported by reviewdog 🐶

/// - If the user has a [keybinding](https://code.visualstudio.com/docs/editor/refactoring#_keybindings-for-code-actions)
/// that auto applies a code action and only disabled code actions are returned, the client should show the user an
/// error message with `reason` in the editor.


[fmt] reported by reviewdog 🐶

/// An optional token that a server can use to report work done progress. (Inherited from [WorkDoneProgressParams])


[fmt] reported by reviewdog 🐶

/// to send this using the client capability `textDocument.signatureHelp.contextSupport === true`


[fmt] reported by reviewdog 🐶

std::map<DocumentUri, variant<FullDocumentDiagnosticReport, UnchangedDocumentDiagnosticReport>> related_documents;


[fmt] reported by reviewdog 🐶

optional<std::map<DocumentUri, variant<FullDocumentDiagnosticReport, UnchangedDocumentDiagnosticReport>>> related_documents = {};


[fmt] reported by reviewdog 🐶

/// diagnostic request for the same document. (Inherited from [UnchangedDocumentDiagnosticReport])


[fmt] reported by reviewdog 🐶

optional<std::map<DocumentUri, variant<FullDocumentDiagnosticReport, UnchangedDocumentDiagnosticReport>>> related_documents = {};


[fmt] reported by reviewdog 🐶

using WorkspaceDocumentDiagnosticReport = variant<WorkspaceFullDocumentDiagnosticReport, WorkspaceUnchangedDocumentDiagnosticReport>;


[fmt] reported by reviewdog 🐶

std::vector<variant<NotebookDocumentFilterWithNotebook, NotebookDocumentFilterWithCells>> notebook_selector;


[fmt] reported by reviewdog 🐶

using DocumentDiagnosticReport = variant<RelatedFullDocumentDiagnosticReport, RelatedUnchangedDocumentDiagnosticReport>;


[fmt] reported by reviewdog 🐶

/// @sample `let sel:DocumentSelector = [{ language: 'typescript' }, { language: 'json', pattern: '**∕tsconfig.json' }]`;


[fmt] reported by reviewdog 🐶

/// the document selector provided on the client side will be used. (Inherited from [TextDocumentRegistrationOptions])


[fmt] reported by reviewdog 🐶

optional_variant<NotebookDocumentSyncOptions, NotebookDocumentSyncRegistrationOptions> notebook_document_sync = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, DeclarationOptions, DeclarationRegistrationOptions> declaration_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, TypeDefinitionOptions, TypeDefinitionRegistrationOptions> type_definition_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, ImplementationOptions, ImplementationRegistrationOptions> implementation_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, DocumentColorOptions, DocumentColorRegistrationOptions> color_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, DocumentRangeFormattingOptions> document_range_formatting_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, FoldingRangeOptions, FoldingRangeRegistrationOptions> folding_range_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, SelectionRangeOptions, SelectionRangeRegistrationOptions> selection_range_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, CallHierarchyOptions, CallHierarchyRegistrationOptions> call_hierarchy_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, LinkedEditingRangeOptions, LinkedEditingRangeRegistrationOptions> linked_editing_range_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<SemanticTokensOptions, SemanticTokensRegistrationOptions> semantic_tokens_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, TypeHierarchyOptions, TypeHierarchyRegistrationOptions> type_hierarchy_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, InlineValueOptions, InlineValueRegistrationOptions> inline_value_provider = {};


[fmt] reported by reviewdog 🐶

optional_variant<boolean, InlayHintOptions, InlayHintRegistrationOptions> inlay_hint_provider = {};


[fmt] reported by reviewdog 🐶

def generate_files(schema_path: pathlib.Path, output_file: pathlib.Path) -> dict[str, object]:


[fmt] reported by reviewdog 🐶

parser.add_argument(
"--schema", type=pathlib.Path, default=DEFAULT_SCHEMA_PATH
)

coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (6)
include/serde/spelling.h (2)

240-286: Robust enum-string round-trip logic with good fallback strategies.

The multi-strategy parsing in map_string_to_enum (keyword-suffixed names, digit-prefixed names, camelCase retries) handles the naming quirks of LSP code generation well.

One minor observation: the function is declared constexpr but calls non-constexpr helpers (normalize_to_lower_snake, snake_to_camel), so it can never be evaluated at compile time. This is not a bug, since the compiler simply treats it as a regular function, but the constexpr specifier is misleading.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/serde/spelling.h` around lines 240 - 286, The function
map_string_to_enum is marked constexpr but calls non-constexpr helpers
(detail::snake_to_camel and other normalization helpers), so change its
declaration to remove the constexpr specifier; update the signature of
map_string_to_enum<...> to be a normal function (not constexpr) so the
declaration matches its non-constexpr behavior and avoids misleading callers,
referencing the function name map_string_to_enum and the helper calls
detail::snake_to_camel and any normalize_to_lower_snake usages when making the
edit.

113-136: Minor: digits after underscores don't trigger capitalization of subsequent alpha characters.

In snake_to_camel, when a digit follows an underscore, capitalize_next is consumed by the digit (which is not alpha), so the subsequent alpha character won't be capitalized. For example, "foo_2bar" becomes "foo2bar" rather than "foo2Bar". If this is intentional for LSP enum naming, it's fine; otherwise the digit should not reset capitalize_next.

Potential fix if the behavior is unintended
         if(c == '_') {
             capitalize_next = true;
             continue;
         }
         if(capitalize_next && is_ascii_alpha(c)) {
             out.push_back(ascii_upper(c));
-        } else if(!seen_output) {
+            capitalize_next = false;
+        } else if(capitalize_next && is_ascii_digit(c)) {
+            out.push_back(c);
+            // keep capitalize_next = true for next alpha
+        } else if(!seen_output) {
             out.push_back(upper_first ? ascii_upper(c) : ascii_lower(c));
+            capitalize_next = false;
         } else {
             out.push_back(c);
+            capitalize_next = false;
         }
-        capitalize_next = false;
         seen_output = true;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/serde/spelling.h` around lines 113 - 136, The snake_to_camel function
currently consumes capitalize_next for any non-underscore character (including
digits), so "foo_2bar" becomes "foo2bar" instead of "foo2Bar"; update
snake_to_camel (and its local flags capitalize_next and seen_output) so that
capitalize_next is only cleared when an alphabetic character is processed.
Specifically, leave the underscore handling as-is, but when iterating characters
ensure digits (and other non-alpha non-underscore chars) are appended without
resetting capitalize_next, and only set capitalize_next = false inside the
branches that handle alphabetic conversion (where you call is_ascii_alpha and
ascii_upper/ascii_lower), preserving seen_output behavior; refer to symbols
snake_to_camel, normalize_to_lower_snake, capitalize_next, seen_output,
is_ascii_alpha, ascii_upper, and ascii_lower.
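The suggested fix can be checked in isolation. The sketch below re-creates `snake_to_camel` with the branch order from the suggested diff, using hypothetical stand-ins for the ASCII helpers and assuming `upper_first` selects PascalCase output; the real function in `include/serde/spelling.h` may differ in details the review does not show.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-ins for the helpers referenced in the review.
constexpr bool is_ascii_alpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
constexpr bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }
constexpr char ascii_upper(char c) { return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c; }
constexpr char ascii_lower(char c) { return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c; }

// snake_case -> camelCase (or PascalCase when upper_first is true).
// A digit right after '_' keeps the pending capitalization, so the next
// letter is still upper-cased.
inline std::string snake_to_camel(std::string_view text, bool upper_first) {
    std::string out;
    out.reserve(text.size());
    bool capitalize_next = false;
    bool seen_output = false;
    for (char c : text) {
        if (c == '_') {
            capitalize_next = true;
            continue;
        }
        if (capitalize_next && is_ascii_alpha(c)) {
            out.push_back(ascii_upper(c));
            capitalize_next = false;
        } else if (capitalize_next && is_ascii_digit(c)) {
            out.push_back(c);  // keep capitalize_next for the next letter
        } else if (!seen_output) {
            out.push_back(upper_first ? ascii_upper(c) : ascii_lower(c));
            capitalize_next = false;
        } else {
            out.push_back(c);
            capitalize_next = false;
        }
        seen_output = true;
    }
    return out;
}
```

With this ordering, `snake_to_camel("foo_2bar", false)` yields `"foo2Bar"`, while inputs without digits, such as `"text_document"`, convert exactly as before.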
include/serde/serde.h (1)

369-388: Silent fallback to default enum value on unknown strings.

When map_string_to_enum returns std::nullopt (unrecognized string), the code silently assigns enum_t{} (zero-initialized). This is likely intentional for LSP forward-compatibility (unknown enum values from newer protocol versions), but it could mask protocol errors or data corruption.

Consider whether logging/tracing the fallback or providing an opt-in strict mode would be valuable for debugging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/serde/serde.h` around lines 369 - 388, The code currently falls back
silently to enum_t{} when serde::detail::map_string_to_enum<enum_t, typename
meta::enum_policy>(enum_text) returns nullopt; change this to either emit a
trace/log and then fallback or to fail fast in a strict opt-in mode: when
parsed.has_value() is false, call a logging/tracing helper (e.g.
serde::detail::report_unknown_enum(enum_text, typeid(enum_t).name()) or similar)
and keep the fallback, and additionally honor a strict flag on typename
meta::enum_policy (e.g. meta::enum_policy::strict) so that when strict is true
the code returns std::unexpected(...) instead of assigning enum_t{}; make these
changes inside the block that uses enum_t, map_string_to_enum, deserialize_value
and d_struct so behavior is controlled by meta::enum_policy.
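One way to make the fallback opt-in rather than silent, sketched with hypothetical names: `TraceValue` mirrors the LSP `TraceValue` enumeration (`"off"`, `"messages"`, `"verbose"`), `lookup_trace` stands in for `map_string_to_enum`, and the `Strict` template flag stands in for a policy member such as the `meta::enum_policy::strict` suggested above. To stay within C++17 it signals rejection with an empty `std::optional`, whereas the review suggests returning `std::unexpected(...)` in the real deserializer.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Mirrors the LSP TraceValue enumeration: "off" | "messages" | "verbose".
enum class TraceValue { off, messages, verbose };

// Hypothetical stand-in for serde::detail::map_string_to_enum.
inline std::optional<TraceValue> lookup_trace(std::string_view text) {
    if (text == "off") return TraceValue::off;
    if (text == "messages") return TraceValue::messages;
    if (text == "verbose") return TraceValue::verbose;
    return std::nullopt;
}

// Strict mode rejects unknown strings (empty optional); lenient mode keeps
// the current behavior of falling back to the zero-initialized enumerator.
template <bool Strict>
std::optional<TraceValue> decode_trace(std::string_view text) {
    if (auto value = lookup_trace(text)) return value;
    if constexpr (Strict) {
        return std::nullopt;
    } else {
        // A real implementation could log `text` here before falling back.
        return TraceValue{};
    }
}
```

Choosing strictness at compile time keeps the lenient path, which the review suggests is likely intended for forward compatibility with newer protocol versions, unchanged for existing callers.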
scripts/lsp_codegen.py (3)

344-344: Audit: urlopen accepts arbitrary URL schemes.

The url parameter comes from --fetch-url CLI arg and is passed directly to urlopen, which accepts file:// and other schemes. For a dev-only code generator this is low risk, but consider validating the scheme if this script could be invoked in CI with untrusted input.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` at line 344, The call to
urllib.request.urlopen(source, ...) accepts arbitrary schemes and the CLI
--fetch-url value is used directly; before calling urlopen (the line using
urlopen and the variable source), parse the URL with urllib.parse.urlparse and
enforce allowed schemes (e.g. "http" and "https"), returning an error/exit if
the scheme is missing or not allowed; update the code path that handles fetching
(the code that constructs/uses source) to validate and reject unsafe schemes
rather than passing them to urlopen.

888-890: Dead/misleading ternary branch; would be an IndexError if reached.

`comments` is guaranteed non-empty by lines 885–886, so the `if comments` test is always true. However, if that guarantee ever broke, the `else suffix` path would not help: the assignment still targets `comments[-1]`, which raises IndexError on an empty list. Simplify:

         if inherited_from is not None:
             suffix = f"(Inherited from [{inherited_from}])"
-            comments[-1] = f"{comments[-1]} {suffix}" if comments else suffix
+            comments[-1] = f"{comments[-1]} {suffix}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` around lines 888 - 890, The conditional ternary
around updating comments is misleading and invalid if comments were ever empty;
since lines above ensure comments is non-empty, remove the unnecessary if/else
and directly append the suffix to the last comment: compute suffix from
inherited_from and set comments[-1] = f"{comments[-1]} {suffix}" (use the
existing inherited_from/suffix/comments variables). Alternatively, if you want
defensive code, replace the ternary with a simple if comments: comments[-1] =
f"{comments[-1]} {suffix}" else: comments.append(suffix).

803-806: Add strict=True to zip() for defensive consistency.

The length check on line 801 makes this safe, but adding strict=True documents the invariant and guards against future refactors that might remove the length check.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` around lines 803 - 806, The zip over child_items and
parent_items in the generator passed to all(...) should be called with
strict=True to enforce that both iterables have equal length; update the
expression that yields self.is_type_subtype(child_item, parent_item) for
child_item, parent_item in zip(child_items, parent_items) to use
zip(child_items, parent_items, strict=True) so that the existing length
invariant is documented and guarded against future refactors (this concerns the
code that calls self.is_type_subtype with child_items and parent_items).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/language/ts.h`:
- Around line 51-73: The aliases LSPArray and LSPObject currently use LSPAny
while LSPAny is incomplete, which is undefined behavior on some toolchains;
change them to hold an indirection to a complete type (e.g., use
std::vector<std::unique_ptr<LSPAny>> and std::unordered_map<std::string,
std::unique_ptr<LSPAny>>) or alternatively std::unique_ptr<LSPVariant> if you
prefer to store the variant type; update any code constructing or accessing
these containers to allocate and dereference the unique_ptrs and keep the type
names (LSPArray, LSPObject, LSPAny, LSPVariant) so callers can be updated
consistently.

In `@include/serde/serde.h`:
- Around line 686-701: The loop handling parsed_key failure doesn't consume the
corresponding value when d_map->invalid_key(**key) returns success (lenient
mode), leaving has_pending_value true and breaking the next next_key() call;
before the continue in the invalid_key success branch inside the parsed_key
handling, call the routine that discards the pending value (invoke skip_value()
/ the function used to skip the current pending value in this parser) so the
pending value is cleared and control can safely continue; update the branch
handling d_map->invalid_key(**key) success in the code around parsed_key and
has_pending_value/next_key to call skip_value() before continuing.

In `@scripts/lsp_codegen.py`:
- Around line 1348-1349: Change the misleading print in the else branch that
currently outputs "[WARNING] no suspicious optional-bool defaults detected" to
use the "[INFO]" prefix instead, i.e. update the print statement that prints "no
suspicious optional-bool defaults detected" so it matches the informational
pattern used on the earlier line (the same message prefix as on line 1342).
- Around line 497-501: The branch handling inline literal types currently
collapses every literal (when kind == "literal") to "LspEmptyObject" even if
type_expr.get("value", {}).get("properties", []) is non-empty; change this so
non-empty inline object literals emit a meaningful type instead of silently
dropping structure: either generate an anonymous struct type from the properties
(preferred) or fall back to a more honest representation such as "json::Value"
and include a clear // TODO comment explaining the fallback; update the code
paths that currently return "LspEmptyObject" to inspect properties and return
the generated anonymous struct name or the fallback, keeping the empty-case
return as-is.
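A minimal sketch of the suggested branch, assuming the metaModel literal shape read by the generator (type_expr["value"]["properties"]). The function name literal_type_name and the INLINE_LITERAL_FALLBACK constant are hypothetical; "json::Value" is the reviewer's example fallback, not a confirmed type in this repository.

```python
from __future__ import annotations

from typing import Any

# Hypothetical fallback for non-empty inline literals until anonymous
# struct generation exists; "json::Value" is only the reviewer's example.
INLINE_LITERAL_FALLBACK = "json::Value"


def literal_type_name(type_expr: dict[str, Any]) -> str:
    """Map a metaModel `literal` type expression to a C++ type name."""
    if type_expr.get("kind") != "literal":
        raise ValueError(f"expected a literal type, got {type_expr.get('kind')!r}")
    properties = type_expr.get("value", {}).get("properties", [])
    if not properties:
        # A genuinely empty object literal keeps the existing mapping.
        return "LspEmptyObject"
    # TODO: emit an anonymous struct from `properties` instead of a fallback.
    return INLINE_LITERAL_FALLBACK
```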

---

Nitpick comments:
In `@include/serde/serde.h`:
- Around line 369-388: The code currently falls back silently to enum_t{} when
serde::detail::map_string_to_enum<enum_t, typename meta::enum_policy>(enum_text)
returns nullopt; change this to either emit a trace/log and then fallback or to
fail fast in a strict opt-in mode: when parsed.has_value() is false, call a
logging/tracing helper (e.g. serde::detail::report_unknown_enum(enum_text,
typeid(enum_t).name()) or similar) and keep the fallback, and additionally honor
a strict flag on typename meta::enum_policy (e.g. meta::enum_policy::strict) so
that when strict is true the code returns std::unexpected(...) instead of
assigning enum_t{}; make these changes inside the block that uses enum_t,
map_string_to_enum, deserialize_value and d_struct so behavior is controlled by
meta::enum_policy.
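The proposed policy (log and fall back by default, fail fast when strict) can be illustrated with a Python analogue. This does not reproduce serde.h: parse_enum_text, UnknownEnumValue, and the logger name are hypothetical, and the first enum member stands in for the value-initialized enum_t{}. The TraceValue example uses the values of LSP's trace setting ("off", "messages", "verbose").

```python
from __future__ import annotations

import logging
from enum import Enum

logger = logging.getLogger("lsp.serde")  # hypothetical logger name


class TraceValue(Enum):
    OFF = "off"
    MESSAGES = "messages"
    VERBOSE = "verbose"


class UnknownEnumValue(ValueError):
    """Raised in strict mode when the text matches no enum member."""


def parse_enum_text(enum_cls: type[Enum], text: str, *, strict: bool = False) -> Enum:
    try:
        return enum_cls(text)  # value lookup; raises ValueError if unknown
    except ValueError:
        pass
    if strict:
        # Analogue of returning std::unexpected(...) under a strict policy.
        raise UnknownEnumValue(f"unknown {enum_cls.__name__} value: {text!r}")
    # Analogue of report-then-fallback; the first member stands in for enum_t{}.
    logger.warning("unknown %s value %r, falling back to default", enum_cls.__name__, text)
    return next(iter(enum_cls))
```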

In `@include/serde/spelling.h`:
- Around line 240-286: The function map_string_to_enum is marked constexpr but
calls non-constexpr helpers (detail::snake_to_camel and other normalization
helpers), so change its declaration to remove the constexpr specifier; update
the signature of map_string_to_enum<...> to be a normal function (not constexpr)
so the declaration matches its non-constexpr behavior and avoids misleading
callers, referencing the function name map_string_to_enum and the helper calls
detail::snake_to_camel and any normalize_to_lower_snake usages when making the
edit.
- Around line 113-136: The snake_to_camel function currently consumes
capitalize_next for any non-underscore character (including digits), so
"foo_2bar" becomes "foo2bar" instead of "foo2Bar"; update snake_to_camel (and
its local flags capitalize_next and seen_output) so that capitalize_next is only
cleared when an alphabetic character is processed. Specifically, leave the
underscore handling as-is, but when iterating characters ensure digits (and
other non-alpha non-underscore chars) are appended without resetting
capitalize_next, and only set capitalize_next = false inside the branches that
handle alphabetic conversion (where you call is_ascii_alpha and
ascii_upper/ascii_lower), preserving seen_output behavior; refer to symbols
snake_to_camel, normalize_to_lower_snake, capitalize_next, seen_output,
is_ascii_alpha, ascii_upper, and ascii_lower.
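The intended flag handling can be sketched in Python to pin down the expected outputs. This is an illustration, not the C++ snake_to_camel: it assumes leading underscores are dropped and that letters other than the capitalized one are lowercased, which may differ from the existing underscore and normalization handling.

```python
def snake_to_camel(name: str) -> str:
    """Sketch: lower_snake_case -> lowerCamelCase, digit-aware.

    capitalize_next is cleared only by an alphabetic character, so a digit
    right after an underscore no longer swallows the capitalization.
    """
    out: list[str] = []
    capitalize_next = False
    seen_output = False
    for ch in name:
        if ch == "_":
            # Assumption: leading underscores are dropped; later ones
            # request a capital for the next letter.
            capitalize_next = seen_output
            continue
        if ch.isascii() and ch.isalpha():
            out.append(ch.upper() if capitalize_next else ch.lower())
            capitalize_next = False
        else:
            # Digits and other characters pass through and keep the flag.
            out.append(ch)
        seen_output = True
    return "".join(out)
```

With this rule, "foo_2bar" becomes "foo2Bar", while ordinary names such as "text_document" still become "textDocument".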

In `@scripts/lsp_codegen.py`:
- Line 344: The call to urllib.request.urlopen(source, ...) accepts arbitrary
schemes and the CLI --fetch-url value is used directly; before calling urlopen
(the line using urlopen and the variable source), parse the URL with
urllib.parse.urlparse and enforce allowed schemes (e.g. "http" and "https"),
returning an error/exit if the scheme is missing or not allowed; update the code
path that handles fetching (the code that constructs/uses source) to validate
and reject unsafe schemes rather than passing them to urlopen.
- Around line 888-890: The conditional ternary around updating comments is
misleading and invalid if comments were ever empty; since lines above ensure
comments is non-empty, remove the unnecessary if/else and directly append the
suffix to the last comment: compute suffix from inherited_from and set
comments[-1] = f"{comments[-1]} {suffix}" (use the existing
inherited_from/suffix/comments variables). Alternatively, if you want defensive
code, replace the ternary with a simple if comments: comments[-1] =
f"{comments[-1]} {suffix}" else: comments.append(suffix).
- Around line 803-806: The zip over child_items and parent_items in the
generator passed to all(...) should be called with strict=True to enforce that
both iterables have equal length; update the expression that yields
self.is_type_subtype(child_item, parent_item) for child_item, parent_item in
zip(child_items, parent_items) to use zip(child_items, parent_items,
strict=True) so that the existing length invariant is documented and guarded
against future refactors (this concerns the code that calls self.is_type_subtype
with child_items and parent_items).

Comment on lines +1348 to +1349
else:
print("[WARNING] no suspicious optional-bool defaults detected")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Misleading [WARNING] prefix for an informational "all clear" message.

When there are no suspicious defaults, printing [WARNING] no suspicious optional-bool defaults detected is confusing — it looks like a warning when it's actually an all-clear status. Use [INFO] instead, matching the pattern on line 1342.

     else:
-        print("[WARNING] no suspicious optional-bool defaults detected")
+        print("[INFO] no suspicious optional-bool defaults detected")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` around lines 1348 - 1349, Change the misleading print
in the else branch that currently outputs "[WARNING] no suspicious optional-bool
defaults detected" to use the "[INFO]" prefix instead, i.e. update the print
statement that prints "no suspicious optional-bool defaults detected" so it
matches the informational pattern used on the earlier line (the same message
prefix as on line 1342).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (6)
scripts/lsp_codegen.py (6)

803-806: zip() without strict=True.

The length check on line 801 makes this safe at runtime, but adding strict=True would make the intent self-documenting and guard against future refactors that might remove the length check.

             return all(
                 self.is_type_subtype(child_item, parent_item)
-                for child_item, parent_item in zip(child_items, parent_items)
+                for child_item, parent_item in zip(child_items, parent_items, strict=True)
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` around lines 803 - 806, Update the zip call in the
is_type_subtype logic to use strict=True so the pairwise iteration enforces
equal lengths even if the earlier length check is removed in the future;
specifically change the generator in the return statement that iterates over
child_items and parent_items (the zip(...) inside the is_type_subtype-related
code) to pass strict=True. Ensure the environment supports Python 3.10+ where
zip(strict=True) is available.

888-890: Dead else branch in ternary — comments is always non-empty here.

Lines 885–886 guarantee that comments has at least one element, so the false branch of the "if comments else suffix" ternary is unreachable. Simplify for clarity:

         if inherited_from is not None:
             suffix = f"(Inherited from [{inherited_from}])"
-            comments[-1] = f"{comments[-1]} {suffix}" if comments else suffix
+            comments[-1] = f"{comments[-1]} {suffix}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` around lines 888 - 890, The ternary assigning to
comments[-1] is dead because comments is guaranteed non-empty when
inherited_from is not None; replace the ternary expression in the block handling
inherited_from with a direct append to the last comment: compute suffix =
f"(Inherited from [{inherited_from}])" and set comments[-1] = f"{comments[-1]}
{suffix}" (remove the "if comments else suffix" branch) to simplify the code
around the inherited_from handling.

1228-1237: Return type dict[str, object] forces # type: ignore at every call site.

The generate_files return value has a known shape but is typed as dict[str, object], requiring # type: ignore[assignment] on lines 1337, 1344, 1351, and 1355. A TypedDict or a @dataclass would eliminate those suppressions and make the contract explicit.

Sketch using a dataclass
from dataclasses import dataclass


@dataclass
class GenerationSummary:
    struct_count: int
    enum_count: int
    alias_count: int
    output_file: str
    keyword_hits: list[str]
    bool_warnings: list[str]
    unsafe_overrides: list[str]
    member_collisions: list[str]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` around lines 1228 - 1237, The current generate_files
function returns an untyped dict (with keys like "struct_count", "enum_count",
"alias_count", "output_file", and lists from generator.keyword_hits,
generator.bool_default_warnings, generator.unsafe_override_warnings,
generator.member_collision_warnings) which forces callers to use # type: ignore;
define a concrete return type (either a TypedDict or a `@dataclass` named e.g.
GenerationSummary) with typed fields matching those keys (struct_count:int,
enum_count:int, alias_count:int, output_file:str, keyword_hits:list[str],
bool_warnings:list[str], unsafe_overrides:list[str],
member_collisions:list[str]), update generate_files to return that type instead
of dict[str, object], adjust import typing/dataclasses, and remove the # type:
ignore suppressions at the call sites (ensure callers expect GenerationSummary
and access the same attributes).

1209-1210: Duplicate of build_node_order() logic.

generator.build_node_order() (line 713) already wraps build_node_dependencies() + topological_order(). Consider using it directly:

-    nodes, node_deps = generator.build_node_dependencies()
-    node_order = topological_order(nodes, node_deps)
+    node_order = generator.build_node_order()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` around lines 1209 - 1210, Replace the manual two-step
sequence that calls generator.build_node_dependencies() and topological_order()
with the single helper generator.build_node_order() call: remove the nodes,
node_deps = generator.build_node_dependencies() and node_order =
topological_order(nodes, node_deps) lines and instead call
generator.build_node_order() to obtain the node_order (assign its return to the
same node_order variable and update any downstream usage if the helper returns a
different shape). This eliminates the duplicate logic and uses the existing
build_node_order() wrapper.

344-344: URL scheme is not validated before opening.

urllib.request.urlopen accepts file:// and other non-HTTP schemes. Since the URL can be user-supplied via --fetch-url, consider restricting to https:// (or http://) to avoid unintended local file reads, even though this is a developer-facing CLI tool.

Proposed fix
+    parsed_url = urllib.parse.urlparse(source)
+    if parsed_url.scheme not in ("http", "https"):
+        raise RuntimeError(f"unsupported URL scheme: {parsed_url.scheme!r}")
+
     try:
         with urllib.request.urlopen(source, timeout=timeout) as response:

This would also require adding import urllib.parse at the top of the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` at line 344, The call to
urllib.request.urlopen(source, timeout=timeout) is unguarded and can open
non-HTTP schemes (e.g., file://); update the fetch routine that reads the
user-supplied --fetch-url (the code using the variable source and calling
urllib.request.urlopen) to validate the URL scheme first using
urllib.parse.urlparse and only allow http or https schemes (explicitly reject or
raise an error for other schemes), and add import urllib.parse at the top of the
file; ensure the error message clearly indicates the invalid scheme and that
only http/https are permitted.
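A standalone version of the proposed check, for reference. The names validate_fetch_url, fetch_text, and ALLOWED_SCHEMES, and the 30-second default timeout, are hypothetical; only the urlparse-based scheme test and the http/https allow-list come from the finding.

```python
from __future__ import annotations

import urllib.parse
import urllib.request

ALLOWED_SCHEMES = frozenset({"http", "https"})


def validate_fetch_url(source: str) -> str:
    """Reject schemes urlopen would accept but a schema fetch should not (file://, ftp://, none)."""
    scheme = urllib.parse.urlparse(source).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise RuntimeError(
            f"unsupported URL scheme {scheme!r} in {source!r}; only http and https are permitted"
        )
    return source


def fetch_text(source: str, timeout: float = 30.0) -> str:
    # Validation happens before any I/O, so file:// paths are never opened.
    with urllib.request.urlopen(validate_fetch_url(source), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A bare path such as "metaModel.json" parses with an empty scheme and is rejected as well, which covers the "scheme is missing" case from the first review.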

395-400: Cycles in the dependency graph are silently absorbed.

When len(ordered) != len(nodes), the remaining (cyclically-dependent) nodes are appended without any diagnostic. This can produce a generated header with forward-reference errors that are hard to trace back to a dependency cycle. Consider emitting a warning listing the affected nodes.

Proposed fix
     if len(ordered) != len(nodes):
         existing = set(ordered)
+        cycle_nodes = sorted(n for n in nodes if n not in existing)
+        import sys as _sys
+        print(
+            f"[WARNING] dependency cycle detected; appending {len(cycle_nodes)} node(s) in arbitrary order: "
+            + ", ".join(f"{k}:{n}" for k, n in cycle_nodes),
+            file=_sys.stderr,
+        )
         for node in sorted(nodes):
             if node not in existing:
                 ordered.append(node)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lsp_codegen.py` around lines 395 - 400, The topological-sort logic
silently appends remaining cyclic nodes (when len(ordered) != len(nodes));
change it to detect the cycle, compute the cyclic set as remaining =
sorted(set(nodes) - set(ordered)), log or warn (e.g., using logging.warning or
sys.stderr) with a clear message that names the affected nodes before
proceeding, then (if desired) append them as the fallback; update the block that
currently builds existing/ordered to emit this warning and include the list of
remaining node names for diagnostics.
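For reference, a self-contained Kahn's-algorithm sort with the suggested diagnostic. It may differ from the actual topological_order in scripts/lsp_codegen.py: it assumes nodes are sortable strings, that deps maps each node to the nodes it depends on, and it ignores self-dependencies.

```python
from __future__ import annotations

import sys
from collections import deque
from collections.abc import Iterable, Mapping


def topological_order(nodes: Iterable[str], deps: Mapping[str, Iterable[str]]) -> list[str]:
    """Order nodes so each one follows its dependencies (Kahn's algorithm).

    Nodes left over because they sit on, or behind, a dependency cycle are
    reported on stderr and then appended in sorted order as a fallback.
    """
    node_list = sorted(set(nodes))
    node_set = set(node_list)
    indegree = {n: 0 for n in node_list}
    dependents: dict[str, list[str]] = {n: [] for n in node_list}
    for node in node_list:
        for dep in set(deps.get(node, ())):
            if dep in node_set and dep != node:
                indegree[node] += 1
                dependents[dep].append(node)

    ready = deque(n for n in node_list if indegree[n] == 0)
    ordered: list[str] = []
    while ready:
        node = ready.popleft()
        ordered.append(node)
        for dependent in dependents[node]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)

    if len(ordered) != len(node_list):
        existing = set(ordered)
        remaining = [n for n in node_list if n not in existing]
        print(
            f"[WARNING] dependency cycle detected; appending {len(remaining)} "
            f"node(s) in sorted order: {', '.join(remaining)}",
            file=sys.stderr,
        )
        ordered.extend(remaining)
    return ordered
```

Importing sys at module level, as here, also avoids the function-local "import sys as _sys" in the proposed diff.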
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@scripts/lsp_codegen.py`:
- Around line 497-501: The literal-type branch currently collapses both empty
and non-empty inline object literals to "LspEmptyObject"; update the branch
under kind == "literal" (the code that reads properties via
type_expr.get("value", {}).get("properties", [])) so that only the
empty-properties case returns "LspEmptyObject" and non-empty inline object
literals are preserved (either add a TODO comment explaining this limitation on
the non-empty branch or return a safe fallback such as "serde_json::Value" /
"json::Value" instead of "LspEmptyObject").
- Around line 1348-1349: Replace the misleading print in the else branch that
emits "[WARNING] no suspicious optional-bool defaults detected" with an
informational message; update the string to "[INFO] no suspicious optional-bool
defaults detected" where the print call is in scripts/lsp_codegen.py (the else
branch that prints the all-clear message) so it matches the "[INFO]" pattern
used earlier around line 1342.

---

Nitpick comments:
In `@scripts/lsp_codegen.py`:
- Around line 803-806: Update the zip call in the is_type_subtype logic to use
strict=True so the pairwise iteration enforces equal lengths even if the earlier
length check is removed in the future; specifically change the generator in the
return statement that iterates over child_items and parent_items (the zip(...)
inside the is_type_subtype-related code) to pass strict=True. Ensure the
environment supports Python 3.10+ where zip(strict=True) is available.
- Around line 888-890: The ternary assigning to comments[-1] is dead because
comments is guaranteed non-empty when inherited_from is not None; replace the
ternary expression in the block handling inherited_from with a direct append to
the last comment: compute suffix = f"(Inherited from [{inherited_from}])" and
set comments[-1] = f"{comments[-1]} {suffix}" (remove the "if comments else
suffix" branch) to simplify the code around the inherited_from handling.
- Around line 1228-1237: The current generate_files function returns an untyped
dict (with keys like "struct_count", "enum_count", "alias_count", "output_file",
and lists from generator.keyword_hits, generator.bool_default_warnings,
generator.unsafe_override_warnings, generator.member_collision_warnings) which
forces callers to use # type: ignore; define a concrete return type (either a
TypedDict or a `@dataclass` named e.g. GenerationSummary) with typed fields
matching those keys (struct_count:int, enum_count:int, alias_count:int,
output_file:str, keyword_hits:list[str], bool_warnings:list[str],
unsafe_overrides:list[str], member_collisions:list[str]), update generate_files
to return that type instead of dict[str, object], adjust import
typing/dataclasses, and remove the # type: ignore suppressions at the call sites
(ensure callers expect GenerationSummary and access the same attributes).
- Around line 1209-1210: Replace the manual two-step sequence that calls
generator.build_node_dependencies() and topological_order() with the single
helper generator.build_node_order() call: remove the nodes, node_deps =
generator.build_node_dependencies() and node_order = topological_order(nodes,
node_deps) lines and instead call generator.build_node_order() to obtain the
node_order (assign its return to the same node_order variable and update any
downstream usage if the helper returns a different shape). This eliminates the
duplicate logic and uses the existing build_node_order() wrapper.
- Line 344: The call to urllib.request.urlopen(source, timeout=timeout) is
unguarded and can open non-HTTP schemes (e.g., file://); update the fetch
routine that reads the user-supplied --fetch-url (the code using the variable
source and calling urllib.request.urlopen) to validate the URL scheme first
using urllib.parse.urlparse and only allow http or https schemes (explicitly
reject or raise an error for other schemes), and add import urllib.parse at the
top of the file; ensure the error message clearly indicates the invalid scheme
and that only http/https are permitted.
- Around line 395-400: The topological-sort logic silently appends remaining
cyclic nodes (when len(ordered) != len(nodes)); change it to detect the cycle,
compute the cyclic set as remaining = sorted(set(nodes) - set(ordered)), log or
warn (e.g., using logging.warning or sys.stderr) with a clear message that names
the affected nodes before proceeding, then (if desired) append them as the
fallback; update the block that currently builds existing/ordered to emit this
warning and include the list of remaining node names for diagnostics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant