Skip to content

Conversation

@Shamzik
Copy link
Contributor

@Shamzik Shamzik commented Dec 29, 2025

In this PR I'm adding pcre2-functions.h and pcre2-functions.cpp, into which I've extracted all PCRE2 specifics.

@Shamzik Shamzik added this to the next milestone Dec 29, 2025
@Shamzik Shamzik self-assigned this Dec 29, 2025
@Shamzik Shamzik added the k2 k2 related label Dec 29, 2025
@Shamzik Shamzik marked this pull request as ready for review December 29, 2025 12:27
@Shamzik Shamzik force-pushed the kshamazov/pcre2_functions branch 3 times, most recently from a0b27ae to 5082d72 Compare January 12, 2026 09:43
@Shamzik Shamzik force-pushed the kshamazov/pcre2_functions branch from 5082d72 to b7c8ac8 Compare January 12, 2026 10:00
@Shamzik Shamzik requested a review from apolyakov January 12, 2026 10:52

static constexpr size_t MAX_SUBPATTERNS_COUNT{512};

struct compiled_regex {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be private as it's used in the public API

const regex_pcre2_compile_context_t compile_context;
const regex_pcre2_match_context_t match_context;
regex_pcre2_match_data_t regex_pcre2_match_data;
const kphp::pcre2::regex_general_context_t regex_pcre2_general_context;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks odd having these members const while they are used in a non-const contexts, e.g. pcre2_match_8

const kphp::pcre2::regex_match_context_t match_context;
kphp::pcre2::regex_match_data_t regex_pcre2_match_data;

RegexInstanceState() noexcept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about moving the constructor definition into this header?

#include "runtime-light/coroutine/task.h"
#include "runtime-light/coroutine/type-traits.h"
#include "runtime-light/stdlib/diagnostics/logs.h"
#include "runtime-light/stdlib/string/regex-include.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment that this include is actually used

struct info final {
const string& regex;
const string& subject;
string replacement;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason replacement's type differs from regex and subject ones?

using reference = kphp::pcre2::group_name;

group_name_iterator() = delete;
group_name_iterator(const PCRE2_UCHAR8* current_entry, size_t entry_size) noexcept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that:

  1. this API can be made more safe by at least checking if the pointer is nullptr
  2. this API should not be public

m_entry_size{entry_size} {}

kphp::pcre2::group_name operator*() const noexcept {
enum class index_bytes { upper, lower, count };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a really good use case for enums? I think it can be simplified a lot with simple constants


class group_name_iterator {
const PCRE2_UCHAR8* m_ptr{nullptr};
size_t m_entry_size{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be marked const I think

}

bool operator==(const group_name_iterator& other) const noexcept {
return m_ptr == other.m_ptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to also check m_entry_size?

}
};

group_name_range names() const noexcept {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to group_names? I think names is meaningless here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

k2 k2 related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants