Contributor

Copilot AI commented Jan 9, 2026

  • Investigate existing test infrastructure for std::vector<std::string>
  • Check for existing UTF-8 tests (found single-string UTF-8 tests but NOT vector tests)
  • Add test for std::vector<std::string> to list of UTF-8 strings
    • Create C++ header file with function returning/accepting std::vector<std::string> with UTF-8 content
    • Create corresponding Cython .pxd declaration file
    • Add test case in test_code_generator.py
  • Run test to verify the behavior
  • Add documentation explaining why libcpp_string is used instead of libcpp_utf8_string
    • The UTF-8 converters explicitly do NOT work inside containers like std::vector
  • Run code review (passed)
Original prompt

check if we have a test that check if std::vector< std::string > to a list of utf8 strings




…sion

Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>
cdef extern from "libcpp_utf8_string_vector_test.hpp":
    cdef cppclass Utf8StringVectorTest:
        Utf8StringVectorTest()
        libcpp_vector[libcpp_string] get_greetings()
Contributor

Why are you not using the libcpp_utf8_string as usual?

Collaborator

Does this need the output variant, libcpp_utf8_output_string?

Collaborator

@copilot Why use the old libcpp_string, which does not support automatic conversion?

Contributor Author

Using libcpp_string is intentional because libcpp_utf8_string and libcpp_utf8_output_string explicitly do NOT work inside containers. The converters in ConversionProvider.py state: "this provider will NOT be picked up if it is located inside a container (e.g. std::vector aka libcpp_vector)".

Added documentation explaining this in commit 43d1ba6.

Copilot AI changed the title from "[WIP] Check test for std::vector to list of UTF8 strings" to "Add test for std::vector<std::string> to list of UTF-8 strings conversion" on Jan 9, 2026
Copilot AI requested a review from timosachsenberg January 9, 2026 15:01
@timosachsenberg
Collaborator

@copilot Why use the old libcpp_string, which does not support automatic conversion?

…cpp_utf8_string for vectors

Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 9, 2026

@copilot Why use the old libcpp_string, which does not support automatic conversion?

As I explained in the code review thread, using libcpp_string is intentional and correct here. The UTF-8 converters (libcpp_utf8_string and libcpp_utf8_output_string) explicitly do NOT work inside containers like std::vector.

The converter documentation in ConversionProvider.py (lines 2409-2411 and 2449-2451) explicitly states: "this provider will NOT be picked up if it is located inside a container (e.g. std::vector aka libcpp_vector)".

So for std::vector<std::string>, we must use libcpp_string - the automatic UTF-8 conversion simply isn't available for strings inside vectors by design.
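For the accepting direction, this means the caller prepares the bytes. A minimal sketch in plain Python, assuming the generated wrapper expects a list of bytes objects for a libcpp_vector[libcpp_string] argument; the helper name encode_all is hypothetical and not part of autowrap:

```python
from typing import Iterable, List, Union


def encode_all(values: Iterable[Union[str, bytes]]) -> List[bytes]:
    """Encode str items as UTF-8 and pass bytes items through unchanged."""
    return [v.encode("utf-8") if isinstance(v, str) else v for v in values]


greetings = ["Hello", "Grüße", "こんにちは"]

# Assumption: this is the shape a libcpp_vector[libcpp_string] argument
# would need, since no automatic str -> bytes conversion happens in containers.
raw = encode_all(greetings)
```

Letting bytes pass through unchanged keeps the helper safe to call on values that were already encoded.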

@timosachsenberg
Collaborator

@jpfeuffer

To make this work automatically, you'd need to create a new converter like:

class StdVectorUtf8StringConverter(TypeConverterBase):
    def get_base_types(self):
        return "libcpp_vector",

    def matches(self, cpp_type):
        inner_t, = cpp_type.template_args
        return inner_t == "libcpp_utf8_output_string"

    def output_conversion(self, cpp_type, input_cpp_var, output_py_var):
        # ... decode each string to UTF-8
        raise NotImplementedError

But that's extra complexity. The current .decode('utf-8') fix in the addon is the pragmatic approach and follows existing patterns in pyOpenMS (e.g., PeptideIdentificationList.pyx).
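The shape of that workaround can be sketched roughly as follows. This is an illustration only: real autowrap addons are code snippets injected into the generated class, whereas this sketch uses a subclass of a hypothetical stand-in so that it runs without a compiled module. The class GeneratedUtf8StringVectorTest and the method name get_greetings_decoded are assumptions, not code from the PR.

```python
class GeneratedUtf8StringVectorTest:
    """Hypothetical stand-in for the generated wrapper; returns raw bytes."""

    def get_greetings(self):
        # Assumption: libcpp_vector[libcpp_string] comes back as list[bytes].
        return ["Hello".encode("utf-8"), "Grüße".encode("utf-8")]


class Utf8StringVectorTest(GeneratedUtf8StringVectorTest):
    def get_greetings_decoded(self):
        """Addon-style helper: decode every element to str."""
        return [g.decode("utf-8") for g in self.get_greetings()]


decoded = Utf8StringVectorTest().get_greetings_decoded()
```

The decoding lives in one place next to the wrapped method, so callers never see bytes.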

@jpfeuffer
Contributor

I think it is easier than that: the recursive ConversionProviders just need to search for other ConversionProviders, not only wrapped classes, when inspecting the inner types. The ConversionProviders may need to allow overriding some extra functions that define what happens when the type is used as input or output inside a container. But the workaround could be fine if it is documented there.
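A toy model of that idea, for illustration only. It is not autowrap's API: real converters generate Cython code rather than converting values at runtime, and every class name and the registry dict here are hypothetical. The point is only that the container converter looks up a converter for its inner type instead of handling the elements itself.

```python
class PlainStringConverter:
    """Models libcpp_string: bytes are handed back unchanged."""
    type_name = "libcpp_string"

    def to_python(self, value):
        return value


class Utf8OutputStringConverter:
    """Models libcpp_utf8_output_string: bytes are decoded to str."""
    type_name = "libcpp_utf8_output_string"

    def to_python(self, value):
        return value.decode("utf-8")


class VectorConverter:
    """Container converter that delegates each element to the inner converter."""

    def __init__(self, registry):
        self.registry = registry

    def to_python(self, inner_type, values):
        inner = self.registry[inner_type]  # look up a converter, not only a wrapped class
        return [inner.to_python(v) for v in values]


registry = {c.type_name: c() for c in (PlainStringConverter, Utf8OutputStringConverter)}
vector_converter = VectorConverter(registry)

raw = [b"Hello", "Grüße".encode("utf-8")]
as_bytes = vector_converter.to_python("libcpp_string", raw)
as_text = vector_converter.to_python("libcpp_utf8_output_string", raw)
```

With this kind of delegation, the choice of inner type in the declaration alone decides whether the list holds bytes or str.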

@jpfeuffer jpfeuffer closed this Jan 9, 2026
@jpfeuffer jpfeuffer reopened this Jan 9, 2026
@timosachsenberg
Collaborator

Would this go into autowrap (I guess so) or into the pyOpenMS conversion providers?

@jpfeuffer
Contributor

Yes right, autowrap
