
@timosachsenberg (Collaborator) commented Jan 9, 2026

This change allows conversion providers (like StdStringUnicodeConverter) to work inside containers (like std::vector) by adding a delegation mechanism.

Changes:

  • Add supports_delegation() method to TypeConverterBase (default: False)
  • Enable delegation for StdStringUnicodeConverter and StdStringUnicodeOutputConverter
  • Modify StdVectorConverter to delegate to element converters when supports_delegation() returns True
  • Add test for UTF-8 string vectors

This removes the limitation that UTF-8 string converters were not picked up for element types inside containers. Vectors of libcpp_utf8_string or libcpp_utf8_output_string now correctly encode/decode their elements as UTF-8.
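The following is a minimal Python sketch of the opt-in hook, built on simplified classes. Only TypeConverterBase, StdStringUnicodeConverter and supports_delegation() come from this PR. PlainIntConverter and vector_strategy() are hypothetical: they stand in for an ordinary non-delegating converter and for the decision StdVectorConverter makes.

```python
# Sketch of the opt-in delegation hook. The bodies are hypothetical
# simplifications, not autowrap's actual implementation.


class TypeConverterBase:
    def supports_delegation(self) -> bool:
        """Return True if this converter may be reused for container elements."""
        return False  # default: existing converters keep their old behaviour


class StdStringUnicodeConverter(TypeConverterBase):
    def supports_delegation(self) -> bool:
        return True  # UTF-8 strings opt in to per-element conversion


class PlainIntConverter(TypeConverterBase):  # hypothetical non-delegating converter
    pass


def vector_strategy(element_converter: TypeConverterBase) -> str:
    """Hypothetical: how a vector converter would handle its element type."""
    if element_converter.supports_delegation():
        return "per-element loop via element converter"
    return "default Cython container coercion"
```

Because the default returns False, existing converters keep using Cython's built-in container coercion. Only converters that explicitly opt in are called once per element.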

Summary by CodeRabbit

Release Notes

  • New Features

    • Improved UTF-8 string handling within container types (e.g., vectors) through delegation-based conversion, so each element is encoded/decoded as UTF-8.
  • Tests

    • Added comprehensive test coverage for UTF-8 string vector conversion, validating both retrieval and input handling of multi-language strings.

coderabbitai bot (Contributor) commented Jan 9, 2026

Walkthrough

This PR introduces delegation support for type converters, enabling UTF-8 string converters to handle per-element conversion within container types. A new supports_delegation() method lets a converter declare that it can be reused for container elements. Container conversion logic now detects such delegating converters and uses them in both the input and output conversion paths. The PR also adds test coverage.

Changes

  • Core Delegation Feature: autowrap/ConversionProvider.py
    Added a supports_delegation() -> bool method to TypeConverterBase (default False) and overrode it in StdStringUnicodeConverter and StdStringUnicodeOutputConverter to return True. Added an internal helper, _has_delegating_converter(), that detects delegating converters. Modified the input/output conversion paths for container types to detect delegation support and emit explicit per-element conversion loops instead of relying on Cython's default container handling.
  • C++ Test Fixture: tests/test_files/libcpp_utf8_string_vector_test.hpp
    New test class Utf8VectorTest with three methods: get_greetings(), which returns a vector of UTF-8 strings (Hello, World, Привет, 你好); echo(), for round-trip vector conversion; and count_strings(), for verifying vector length.
  • Cython Bindings & Test: tests/test_files/libcpp_utf8_string_vector_test.pxd
    New Cython declaration file (language_level=3) with extern C++ bindings for the Utf8VectorTest class. It maps the C++ methods to Cython types, including the libcpp_utf8_string and libcpp_utf8_output_string aliases.
  • Test Implementation: tests/test_code_generator.py
    Added test_utf8_string_vector_conversion(), which validates UTF-8 string vector handling: retrieval of the greetings list with non-ASCII characters, input conversion of str and bytes lists, and count verification.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • jpfeuffer

Poem

🐰 UTF-8 strings now hop through containers with care,
Delegation flags wave through the code everywhere,
From Russian to Chinese, each character finds its way,
Vector conversions dance in a brand new ballet! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 30.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Design conversion provider delegation system' directly and specifically describes the main change: introducing a delegation mechanism so conversion providers work inside containers.


✨ Finishing touches
  • 📝 Docstrings were successfully generated.

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6236c3d and 8b5ebfb.

📒 Files selected for processing (4)
  • autowrap/ConversionProvider.py
  • tests/test_code_generator.py
  • tests/test_files/libcpp_utf8_string_vector_test.hpp
  • tests/test_files/libcpp_utf8_string_vector_test.pxd
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_code_generator.py (2)
  • autowrap/Utils.py (1)
      • compile_and_import (65-148)
  • tests/test_files/libcpp_utf8_string_vector_test.hpp (1)
      • Utf8VectorTest (6-6)
autowrap/ConversionProvider.py (2)
  • tests/test_code_generator_minimal.py (2)
      • input_conversion (69-75)
      • output_conversion (77-78)
  • tests/test_files/converters/IntHolderConverter.py (2)
      • input_conversion (18-31)
      • output_conversion (33-34)
🪛 Clang (14.0.6)
tests/test_files/libcpp_utf8_string_vector_test.hpp

[error] 1-1: 'string' file not found

(clang-diagnostic-error)

⏰ Context from 8 checks was skipped due to a timeout of 90000 ms. The timeout can be increased in the CodeRabbit configuration to a maximum of 15 minutes (900000 ms).
  • GitHub Check: test (==3.1.0, 3.10)
  • GitHub Check: test (==3.2.0, 3.11)
  • GitHub Check: test (==3.2.0, 3.12)
  • GitHub Check: test (==3.1.0, 3.13)
  • GitHub Check: test (==3.1.0, 3.12)
  • GitHub Check: test (==3.2.0, 3.10)
  • GitHub Check: test (==3.2.0, 3.13)
  • GitHub Check: test (==3.1.0, 3.11)
🔇 Additional comments (9)
tests/test_files/libcpp_utf8_string_vector_test.hpp (1)

1-19: LGTM! Clean test fixture implementation.

The test fixture correctly implements methods for testing UTF-8 string vector handling, including non-ASCII characters (Cyrillic "Привет" and Chinese "你好"). The structure is appropriate for validating the delegation mechanism.

Note: The static-analysis error about the missing 'string' header is a false positive. It occurs because the analyzer lacks the full C++ include-path context, and the file compiles correctly when the proper include directories are provided.

tests/test_files/libcpp_utf8_string_vector_test.pxd (1)

1-11: LGTM! Correct Cython declarations for UTF-8 vector testing.

The .pxd file correctly:

  • Uses libcpp_utf8_output_string and libcpp_utf8_string type aliases to trigger the delegation-supporting converters
  • Declares method signatures matching the C++ header
  • Follows the established pattern for enabling UTF-8 string conversion within vectors

This properly exercises the new delegation mechanism where StdStringUnicodeConverter and StdStringUnicodeOutputConverter now return True from supports_delegation().

tests/test_code_generator.py (1)

401-450: LGTM! Comprehensive test coverage for UTF-8 vector delegation.

The test properly validates:

  • Output conversion (lines 420-430): Vectors of UTF-8 strings become Python list[str] with correct handling of non-ASCII characters (Russian "Привет", Chinese "你好")
  • Input conversion with str (lines 432-440): Python list of strings is correctly passed to C++ and echoed back
  • Input conversion with bytes (lines 442-445): Bytes input is also handled correctly
  • Simple input path (lines 447-449): count_strings validates the basic input conversion

The test structure follows the established pattern in this file and exercises both directions of the delegation mechanism.

autowrap/ConversionProvider.py (6)

137-146: LGTM! Clean delegation hook design.

The new supports_delegation() method provides a clean opt-in mechanism for converters to enable per-element conversion within containers. The default False ensures backward compatibility with existing converters.


1682-1690: LGTM! Safe delegation detection helper.

The _has_delegating_converter() helper correctly:

  • Performs defensive checks for registry availability
  • Safely handles lookup failures with exception handling
  • Returns a clear boolean result
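The three properties listed above can be sketched in Python as follows. This is not the real helper: ConverterRegistry, its get() method raising KeyError, and the two small converter classes are hypothetical stand-ins, since the review does not show autowrap's registry API.

```python
from typing import Dict, Optional


class ConverterRegistry:
    """Hypothetical stand-in for autowrap's converter registry."""

    def __init__(self) -> None:
        self._by_base_type: Dict[str, object] = {}

    def register(self, base_type: str, converter: object) -> None:
        self._by_base_type[base_type] = converter

    def get(self, base_type: str) -> object:
        # In this sketch an unknown type raises KeyError.
        return self._by_base_type[base_type]


class Utf8Converter:  # hypothetical delegating converter
    def supports_delegation(self) -> bool:
        return True


class IntConverter:  # hypothetical non-delegating converter
    def supports_delegation(self) -> bool:
        return False


def has_delegating_converter(registry: Optional[ConverterRegistry], element_type: str) -> bool:
    """Return True only if a converter for element_type exists and opts in."""
    if registry is None:  # defensive check: no registry available
        return False
    try:
        converter = registry.get(element_type)
    except KeyError:  # lookup failure: treat as "no delegation"
        return False
    return bool(converter.supports_delegation())
```

Any failure falls back to False, so an unregistered element type simply keeps the pre-existing container handling.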

1910-1981: LGTM! Delegation logic correctly implements per-element conversion.

The new Case 5 properly:

  • Detects when element types support delegation (line 1910)
  • Delegates to the element converter for input conversion (lines 1918-1920)
  • Handles both simple and complex element conversion code paths (lines 1930-1952)
  • Implements proper cleanup for reference parameters with output conversion (lines 1955-1979)

The implementation follows the established patterns in the file and correctly integrates element-level conversion into the container conversion flow.
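To make the Case 5 behaviour concrete, here is a plain-Python emulation of what the generated input conversion does at run time for a vector-of-UTF-8-strings reference argument. A Python list of bytes stands in for the C++ std::vector<std::string>. The function names are hypothetical, and the write-back step assumes that the reference-parameter cleanup copies the possibly modified C++ elements back into the caller's list.

```python
from typing import Callable, List, Union


def utf8_element_in(item: Union[str, bytes]) -> bytes:
    """Element-level input conversion: str is UTF-8 encoded, bytes pass through."""
    if isinstance(item, str):
        return item.encode("utf-8")
    if isinstance(item, bytes):
        return item
    raise TypeError(f"expected str or bytes, got {type(item).__name__}")


def utf8_element_out(raw: bytes) -> str:
    """Element-level output conversion: decode UTF-8 bytes into str."""
    return raw.decode("utf-8")


def call_with_vector_ref(py_list: list, cpp_func: Callable[[List[bytes]], None]) -> None:
    """Emulate passing a Python list as a vector<string>& argument."""
    # per-element input conversion into the stand-in for the C++ vector
    cpp_vec: List[bytes] = [utf8_element_in(x) for x in py_list]
    cpp_func(cpp_vec)  # the C++ side may modify the vector in place
    # assumed cleanup: write the (possibly modified) elements back, decoded
    py_list[:] = [utf8_element_out(b) for b in cpp_vec]
```

For example, with `lst = ["Привет", b"World"]` and a callee that appends `"你好".encode("utf-8")`, `lst` afterwards holds three str elements, because the bytes input is normalized to str on the way back.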


2079-2106: LGTM! Output conversion delegation mirrors input conversion design.

The output conversion delegation correctly:

  • Detects delegating converters (line 2079)
  • Retrieves element-level output conversion (lines 2087-2093)
  • Handles different output conversion return types (None, renderable, string)
  • Generates proper iteration loop to build Python list (lines 2095-2106)

The symmetric design with input_conversion ensures consistent behavior in both directions.
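The list above mentions three possible return types for an element converter's output conversion. Below is a rough sketch of how a generator could normalize them into the body of a list-building loop. Renderable, element_output_code() and vector_output_loop() are hypothetical. Treating None as "plain assignment" is an assumption, and the emitted text is only Cython-style (typed declarations are omitted).

```python
from typing import List, Union


class Renderable:
    """Hypothetical stand-in for a code object that exposes render()."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = lines

    def render(self) -> str:
        return "\n".join(self.lines)


ElementCode = Union[None, str, Renderable]


def element_output_code(result: ElementCode, in_var: str, out_var: str) -> str:
    """Normalize an element converter's output-conversion result to source text."""
    if result is None:  # assumption: no conversion needed, plain assignment
        return f"{out_var} = {in_var}"
    if isinstance(result, str):
        return result
    return result.render()


def vector_output_loop(result: ElementCode, cpp_vec: str, py_list: str) -> str:
    """Emit a loop that builds a Python list element by element."""
    body = element_output_code(result, "elem", "py_elem")
    indented = "\n".join("    " + line for line in body.splitlines())
    return "\n".join([
        f"{py_list} = []",
        f"for elem in {cpp_vec}:",
        indented,
        f"    {py_list}.append(py_elem)",
    ])
```

Normalizing to plain source text first keeps the loop template the same regardless of which form the element converter returns.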


2540-2541: LGTM! Enables UTF-8 string conversion delegation.

The override correctly enables StdStringUnicodeConverter to handle UTF-8 string conversions when strings are element types inside containers. This directly addresses the PR objective to support UTF-8 string vectors.


2582-2583: LGTM! Completes UTF-8 output delegation support.

The override enables StdStringUnicodeOutputConverter to handle UTF-8 string output conversions within containers, completing the bidirectional delegation capability for UTF-8 string vectors.



coderabbitai bot (Contributor) commented Jan 9, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #244

coderabbitai bot added a commit that referenced this pull request Jan 9, 2026
Docstrings generation was requested by @timosachsenberg.

* #243 (comment)

The following files were modified:

* `autowrap/ConversionProvider.py`
@jpfeuffer (Contributor)

Thanks a lot, Timo!!
Sorry, but I just had another idea! I think what actually happens is that for annotations like libcpp_vector[libcpp_string], or any other libcpp standard type, Cython's automatic conversion is used, and that is currently set to ASCII:

#cython: c_string_encoding=ascii

One could try to change that, but I think it is a global setting, so it couldn't be changed for some functions but not for others. Probably not that important.
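To illustrate why the ASCII setting matters, assume Cython's automatic std::string coercion under c_string_encoding=ascii behaves like Python's ascii codec. Under that assumption, the non-ASCII greetings from the test fixture cannot be converted, while UTF-8 handles all of them. The example is plain Python, not generated Cython, and the helper names are illustrative.

```python
def encodes_as(text: str, encoding: str) -> bool:
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def decodes_as(raw: bytes, encoding: str) -> bool:
    """Return True if raw bytes can be decoded with the given codec."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


greetings = ["Hello", "World", "Привет", "你好"]  # same strings as the test fixture

ascii_ok = [g for g in greetings if encodes_as(g, "ascii")]  # ['Hello', 'World']
utf8_ok = [g for g in greetings if encodes_as(g, "utf-8")]  # all four greetings
```

The failure goes both ways: UTF-8 bytes coming back from C++ (for example `"Привет".encode("utf-8")`) also fail to decode as ASCII.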

I think this PR looks OK, but I wonder whether it works for deeper nesting and, if not, whether deeper nesting is worth supporting. Deep nesting in autowrap is generally very unstable and untested, especially when mixing containers 😅
In any case, any nesting limitations should be documented better.
Maybe I'll let the AI do that at some point.
