
Feat/readability metric #8

Open
Noel-Bastubbe wants to merge 15 commits into main from feat/readability-metric

Conversation

@Noel-Bastubbe
Collaborator

No description provided.

@Noel-Bastubbe Noel-Bastubbe marked this pull request as ready for review February 27, 2026 12:02
@Noel-Bastubbe Noel-Bastubbe requested a review from lisehr February 27, 2026 12:03
@Tomic-Riedel
Collaborator

Hey, Lisa tasked me with checking the PRs for framework conventions and I found a few things to address before we can merge this:

metis/metric/readability/__init__.py and README.md
Neither of these should exist inside a dimension folder. None of the other dimensions (completeness/, consistency/, etc.) have them, and it's important to keep that consistent. Please remove both files and move any relevant documentation to docs/ or the main README.

Class naming
I see that ReadabilityLLM and ReadabilityWordNet are the "real" classes, with readability_llm and readability_wordnet as thin alias subclasses at the bottom just for registry registration. The convention (which all four existing metrics follow) is that the metric class itself carries the {dimension}_{technique} name and there shouldn't be an intermediate PascalCase class at all. Could you flatten these into a single class each?
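For illustration, the suggested flattening could look roughly like the sketch below. Note that `Metric`, the `register` decorator, and the `assess` body are placeholders standing in for metis's actual base class and registry mechanism, which are not shown in this PR excerpt.

```python
# Toy stand-ins for the framework's base class and registry (assumptions,
# not metis's real API).
REGISTRY: dict = {}

def register(cls):
    """Register a metric class under its class name."""
    REGISTRY[cls.__name__] = cls
    return cls

class Metric:
    def assess(self, data):
        raise NotImplementedError

# After flattening: a single class that both implements the metric and
# carries the {dimension}_{technique} registry name directly, with no
# intermediate PascalCase class.
@register
class readability_wordnet(Metric):
    def assess(self, data):
        # placeholder scoring logic
        return 0.0
```

The point is only the shape: the registry sees exactly one class per metric, named `{dimension}_{technique}`.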

DQmetric values
DQmetric="llm" and DQmetric="readability_wordnet" don't follow the PascalCase short-name convention we use elsewhere (e.g. "NullRatio", "CountFDViolations", "DuplicateCount"). Please update to something like "LLM" and "WordNet".

Helper files in the dimension folder
llm_backend.py, scorers.py, and tokenization.py are utility modules sitting inside metis/metric/readability/. Was there a particular reason for keeping them there? If not, the convention is that helpers live in metis/utils/. In that case I'd suggest moving them there, possibly under a metis/utils/readability/ subfolder if they're readability-specific.

DQgranularity="schema"
This introduces a new granularity level that doesn't exist anywhere else in the framework right now. Before we add a new one I think we should loop in @lisehr to check whether a "schema" granularity level makes sense to add to the framework.

Docstrings
Both assess() methods are missing docstrings. The base class has a detailed template for exactly this, so please add them.
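As a rough illustration of the expected shape, an `assess()` docstring might look like the following. This is a hypothetical sketch; the base class's actual template in metis may prescribe different sections.

```python
class readability_wordnet:
    def assess(self, data):
        """Assess the readability of ``data`` using WordNet-based scoring.

        Args:
            data: The dataset (or column) to evaluate. (Hypothetical
                signature; the real parameter shape is an assumption.)

        Returns:
            A float readability score, where higher means more readable.
        """
        return 0.0
```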

Thanks!

@@ -0,0 +1,51 @@
from __future__ import annotations

import re

Please add a short comment explaining what this does and why the built-in Python tokenization facilities couldn't be used.
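For context, a regex-based word tokenizer along these lines typically looks like the sketch below. This is illustrative only; the actual pattern used in this PR is not visible in the excerpt and is an assumption.

```python
import re

def tokenize_words(text: str) -> list[str]:
    """Split text into word tokens, keeping internal apostrophes.

    Sketch only; the regex used in the PR is an assumption.
    """
    return re.findall(r"[A-Za-z']+", text)
```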


@Tomic-Riedel thanks for all the comments. We agreed that a "schema" granularity makes sense for readability, since we might add more schema-level metrics in the future. For now, we focused on the data level by default.

# Utilities
# ----------------------------

def load_abbreviations(abbr_csv: Optional[str]) -> Dict[str, str]:

Also here, please add a short comment explaining what this function does.
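A documented version of `load_abbreviations` might look like the following. This is a sketch only: the real CSV layout and error handling in this PR are assumptions, not taken from the actual diff.

```python
import csv
from typing import Dict, Optional

def load_abbreviations(abbr_csv: Optional[str]) -> Dict[str, str]:
    """Load an abbreviation -> expansion mapping from a two-column CSV.

    Sketch only: the file layout (abbreviation in column 1, expansion in
    column 2) is an assumption. Returns an empty mapping when no path
    is given.
    """
    if not abbr_csv:
        return {}
    mapping: Dict[str, str] = {}
    with open(abbr_csv, newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh):
            if len(row) >= 2:
                mapping[row[0].strip()] = row[1].strip()
    return mapping
```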


@lisehr lisehr left a comment


LGTM and free to merge together with the other files; the small comments can be fixed later on.

@lisehr lisehr self-assigned this Mar 22, 2026


4 participants