feat: add support for per-level alphabet in NFT #618

Open
tmokenc wants to merge 3 commits into VeriFIT:devel from tmokenc:nft-per-level-alphabet
@tmokenc

@tmokenc tmokenc commented Apr 15, 2026

This PR adds support for per-level alphabets in mata::nft::Nft.

The main addition is:

  • std::vector<Alphabet*> level_alphabets

with the following semantics:

  • If the vector is empty, Nft falls back to the inherited nfa::Nfa::alphabet.
  • If different levels use different alphabets, Nft::alphabet is set to nullptr and the per-level alphabets are stored in level_alphabets.
  • When all levels share the same alphabet, Nft can still expose a shared alphabet through the inherited nfa::Nfa::alphabet field (not sure if this is necessary).

The PR also adds two helper functions:

  • alphabet_of_level returns the alphabet for a given level and falls back to the inherited Nfa alphabet when no level-specific alphabet is available.
  • set_level_alphabets assigns or updates the per-level alphabets.
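
For readers skimming the thread, the fallback semantics described above can be sketched roughly as follows. This is only an illustration under assumptions: `NftSketch` and the trivial `Alphabet` struct are hypothetical stand-ins, not the types from the PR, and the real helpers may differ in signature and behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in; the real mata::Alphabet is a richer class.
struct Alphabet { int id; };

// Hypothetical, simplified model of the fields described in the PR text.
struct NftSketch {
    Alphabet* alphabet = nullptr;            // shared alphabet (inherited from Nfa in the PR)
    std::vector<Alphabet*> level_alphabets;  // empty: every level uses `alphabet`

    // Return the alphabet of `level`, falling back to the shared alphabet
    // when no level-specific alphabet is available.
    Alphabet* alphabet_of_level(std::size_t level) const {
        if (level < level_alphabets.size() && level_alphabets[level] != nullptr) {
            return level_alphabets[level];
        }
        return alphabet;
    }

    // Assign per-level alphabets. The shared pointer stays usable only when
    // all levels agree; otherwise it is reset to nullptr.
    void set_level_alphabets(std::vector<Alphabet*> alphabets) {
        level_alphabets = std::move(alphabets);
        if (level_alphabets.empty()) { return; }
        bool all_same = true;
        for (Alphabet* a : level_alphabets) {
            if (a != level_alphabets.front()) { all_same = false; break; }
        }
        alphabet = all_same ? level_alphabets.front() : nullptr;
    }
};

inline void level_fallback_demo() {
    Alphabet input{1}, output{2};
    NftSketch nft;
    nft.alphabet = &input;
    assert(nft.alphabet_of_level(3) == &input);   // no per-level data: fallback
    nft.set_level_alphabets({&input, &output});
    assert(nft.alphabet == nullptr);              // levels differ
    assert(nft.alphabet_of_level(1) == &output);
}
```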

This PR does not yet introduce any changes to NFT operations or to the .mata format for NFTs. The .mata format part likely deserves a separate discussion.

tmokenc added 2 commits April 16, 2026 01:35
This is a very naive implementation, not sure if it is actually correct
or I missed something.

(cherry picked from commit 705db38)
@tmokenc tmokenc requested a review from Adda0 as a code owner April 15, 2026 23:42
@Adda0
Collaborator

Adda0 commented Apr 16, 2026

I think it would be cleaner to keep a single Alphabet* alphabet argument, as we have had until now, and add one Alphabet class implementation, which could be called LevelAlphabet, that contains the vector of alphabets.

The abstract alphabet would probably have to allow optionally translating on a specific level, so Alphabet::translate_symb(Symbol symbol, std::optional<Level> level = std::nullopt), and all alphabets except the level one would ignore the level (and translate the symbol as normal).

Most, if not all, of the additional functions here would then be unnecessary.
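
A rough sketch of the shape suggested here, for illustration only. All types below are hypothetical and simplified: the translation takes a symbol name as a string, which is an assumption, and throwing when a LevelAlphabet is asked to translate without a level is one possible policy rather than anything decided in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Symbol = unsigned;
using Level = std::size_t;

// Hypothetical abstract alphabet with a level-aware translation.
class Alphabet {
public:
    virtual ~Alphabet() = default;
    virtual Symbol translate_symb(const std::string& name,
                                  std::optional<Level> level = std::nullopt) = 0;
};

// An ordinary alphabet ignores the level and numbers symbols on first use.
class SimpleAlphabet : public Alphabet {
public:
    Symbol translate_symb(const std::string& name,
                          std::optional<Level> /*level*/ = std::nullopt) override {
        auto [it, inserted] = symbols_.try_emplace(name, next_);
        if (inserted) { ++next_; }
        return it->second;
    }
private:
    std::unordered_map<std::string, Symbol> symbols_;
    Symbol next_ = 0;
};

// The level alphabet only holds one (non-owning) alphabet per level and delegates.
class LevelAlphabet : public Alphabet {
public:
    explicit LevelAlphabet(std::vector<Alphabet*> levels) : levels_(std::move(levels)) {}

    Symbol translate_symb(const std::string& name,
                          std::optional<Level> level = std::nullopt) override {
        if (!level.has_value()) {
            throw std::invalid_argument("LevelAlphabet needs a level to translate");
        }
        if (*level >= levels_.size()) {
            throw std::out_of_range("LevelAlphabet: level out of range");
        }
        return levels_[*level]->translate_symb(name);
    }
private:
    std::vector<Alphabet*> levels_;
};

inline void level_alphabet_demo() {
    SimpleAlphabet input, output;
    output.translate_symb("x");  // "x" becomes symbol 0 of the output alphabet
    LevelAlphabet transducer({&input, &output});
    assert(transducer.translate_symb("a", Level{0}) == 0);  // first symbol on level 0
    assert(transducer.translate_symb("a", Level{1}) == 1);  // second symbol on level 1
}
```

With this shape, an NFT would keep holding a single alphabet pointer, and code that never passes a level would keep working with ordinary alphabets.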

@tmokenc
Author

tmokenc commented Apr 16, 2026

I agree that it would be cleaner, reduce a lot of the confusion and potentially be better for the serialisation/deserialisation of the .mata format later on. I will work on it as soon as possible.

@tmokenc
Author

tmokenc commented Apr 17, 2026

@Adda0 Please take a look. I refactored the code to use LevelAlphabet and moved the per-level alphabet logic into it. I also added level-aware functions to Alphabet. For normal alphabets, these simply ignore the level and use their non-level counterparts.
This makes it possible for NFTs to use a regular alphabet as though all levels shared the same one. When different levels need different alphabets, they can use LevelAlphabet.

This is my first C++ project, so I may have missed some conventions or done something in a way that is uncommon in C++.

@codecov

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 56.06061% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.83%. Comparing base (b5f27b8) to head (9e1d342).
⚠️ Report is 31 commits behind head on devel.

Files with missing lines    Patch %   Lines
src/nft/nft.cc              14.28%    12 Missing ⚠️
include/mata/alphabet.hh    59.25%    11 Missing ⚠️
src/alphabet.cc             71.42%    3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #618      +/-   ##
==========================================
- Coverage   72.90%   72.83%   -0.08%     
==========================================
  Files          45       45              
  Lines        6795     6832      +37     
  Branches     1538     1541       +3     
==========================================
+ Hits         4954     4976      +22     
- Misses       1227     1239      +12     
- Partials      614      617       +3     

Comment thread include/mata/alphabet.hh
Comment on lines +70 to +73
virtual std::string reverse_translate_symbol(Symbol symbol, size_t level) const {
    (void)level;
    return reverse_translate_symbol(symbol);
}
Collaborator

Side note: We definitely need to rename these monstrosities when we are refactoring alphabets. It is awful to work with this in a function call, e.g., something like minimize(nfa, nfa->alphabet.reverse_translate_symbol(symbol), true)...

Comment thread include/mata/alphabet.hh
* @param[in] level Level on which the symbols should be retrieved.
* @return Set of symbols known to the alphabet on the given level.
*/
virtual utils::OrdVector<Symbol> get_alphabet_symbols(size_t level) const {
Collaborator

Might be easier to just use std::optional<Level> level and have one function for each of the operations. Also, definitely use Level instead of size_t everywhere for levels.

Comment thread include/mata/alphabet.hh
*
* @param[in] level Level on which the alphabet should be cleared.
*/
virtual void clear(size_t level) {
Collaborator

For all of these functions, I would really rather just have one function with std::optional<Level> level = std::nullopt (including the default). Makes it a bit more concise and readable, IMO.
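
As a sketch of what a single entry point per operation could look like: the class and member names below are hypothetical, and treating an absent level as "all levels" for a per-level alphabet is an assumption made for the example, not something specified in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <utility>
#include <vector>

using Symbol = unsigned;
using Level = std::size_t;

class Alphabet {
public:
    virtual ~Alphabet() = default;
    // One public function per operation; the default argument lives only here.
    void clear(std::optional<Level> level = std::nullopt) { do_clear(level); }
    std::size_t size(std::optional<Level> level = std::nullopt) const { return do_size(level); }
protected:
    virtual void do_clear(std::optional<Level> level) = 0;
    virtual std::size_t do_size(std::optional<Level> level) const = 0;
};

// An ordinary alphabet ignores the level entirely.
class FlatAlphabet : public Alphabet {
public:
    explicit FlatAlphabet(std::set<Symbol> symbols) : symbols_(std::move(symbols)) {}
protected:
    void do_clear(std::optional<Level>) override { symbols_.clear(); }
    std::size_t do_size(std::optional<Level>) const override { return symbols_.size(); }
private:
    std::set<Symbol> symbols_;
};

// A per-level alphabet acts on one level, or on every level when none is given.
class PerLevelAlphabet : public Alphabet {
public:
    explicit PerLevelAlphabet(std::vector<std::set<Symbol>> levels) : levels_(std::move(levels)) {}
protected:
    void do_clear(std::optional<Level> level) override {
        if (level) { levels_.at(*level).clear(); return; }
        for (auto& symbols : levels_) { symbols.clear(); }
    }
    std::size_t do_size(std::optional<Level> level) const override {
        if (level) { return levels_.at(*level).size(); }
        std::size_t total = 0;  // sum of per-level sizes, not the size of their union
        for (const auto& symbols : levels_) { total += symbols.size(); }
        return total;
    }
private:
    std::vector<std::set<Symbol>> levels_;
};

inline void optional_level_demo() {
    std::vector<std::set<Symbol>> levels;
    levels.push_back({0u, 1u});
    levels.push_back({2u, 3u, 4u});
    PerLevelAlphabet alphabet(levels);
    alphabet.clear(Level{0});       // clears only level 0
    assert(alphabet.size() == 3);   // level 1 is untouched
}
```

The public functions are deliberately non-virtual here: default arguments of virtual functions are taken from the static type of the call expression, so keeping the default in one non-virtual wrapper avoids overrides silently disagreeing about it.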

Comment thread include/mata/alphabet.hh
* @return Resolved alphabet reference.
* @throws std::runtime_error If the level is invalid.
*/
const Alphabet& resolve_alphabet(size_t level) const;
Collaborator

Maybe get_alphabet_for_level(Level level)? resolve sounds like it actually does something complex rather than just returning an element from a vector.
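
To illustrate how small such a getter can be, here is a minimal sketch with hypothetical types. It only assumes what the doc comment above states, namely that an invalid level results in std::runtime_error; the storage as a vector of non-owning pointers is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Level = std::size_t;

// Hypothetical stand-in for an alphabet; only a name so the demo can tell them apart.
struct Alphabet { std::string name; };

class LevelAlphabet {
public:
    explicit LevelAlphabet(std::vector<const Alphabet*> levels) : levels_(std::move(levels)) {}

    // Plain lookup: return the alphabet stored for `level`, or throw if there is none.
    const Alphabet& get_alphabet_for_level(Level level) const {
        if (level >= levels_.size() || levels_[level] == nullptr) {
            throw std::runtime_error("LevelAlphabet: no alphabet for level " + std::to_string(level));
        }
        return *levels_[level];
    }
private:
    std::vector<const Alphabet*> levels_;  // non-owning, one entry per level
};

inline void getter_demo() {
    Alphabet input{"input"}, output{"output"};
    LevelAlphabet alphabet({&input, &output});
    assert(alphabet.get_alphabet_for_level(1).name == "output");
}
```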

@Adda0
Collaborator

Adda0 commented Apr 20, 2026

It looks like you can ignore the failing CI actions. Not related.

The approach and the interface look good to me overall. It is a shame we cannot hide the invalid operations (with/without levels) for specific alphabet types without something like std::variant etc. Maybe even exploring something like std::variant might be worth it... I have not thought about it yet, though.
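
One way the std::variant idea could look, purely as an exploratory sketch. Every type and function name here is hypothetical. The point is only that each alphabet type exposes just the operations that make sense for it, and the decision whether a level is needed is made once, at the visit site.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>
#include <variant>
#include <vector>

using Symbol = unsigned;
using Level = std::size_t;

// A flat alphabet has no notion of levels, so it has no level-taking overload.
struct FlatAlphabet {
    std::set<Symbol> symbols;
    bool contains(Symbol symbol) const { return symbols.count(symbol) != 0; }
};

// A level alphabet can only be queried with a level.
struct LevelAlphabet {
    std::vector<FlatAlphabet> levels;
    bool contains(Symbol symbol, Level level) const { return levels.at(level).contains(symbol); }
};

using AnyAlphabet = std::variant<FlatAlphabet, LevelAlphabet>;

// Callers that work with levels dispatch on the stored type.
inline bool contains_on_level(const AnyAlphabet& alphabet, Symbol symbol, Level level) {
    return std::visit([&](const auto& concrete) {
        using T = std::decay_t<decltype(concrete)>;
        if constexpr (std::is_same_v<T, LevelAlphabet>) {
            return concrete.contains(symbol, level);
        } else {
            return concrete.contains(symbol);  // flat alphabet: same answer on every level
        }
    }, alphabet);
}

inline void variant_demo() {
    LevelAlphabet per_level;
    per_level.levels.push_back(FlatAlphabet{{1u}});
    per_level.levels.push_back(FlatAlphabet{{7u}});
    AnyAlphabet alphabet = per_level;
    assert(contains_on_level(alphabet, 7, 1));
    assert(!contains_on_level(alphabet, 7, 0));
}
```

Compared with virtual functions that take an optional level, calling `contains(symbol)` on a LevelAlphabet fails to compile instead of failing at run time, at the cost of a closed set of alphabet types.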
