Skip to content

refactor: remove redundant fields from TaskSolution and ValidationResult#50

Merged
kohankhaki merged 3 commits into
mainfrom
refactor/remove-redundant-schema-fields
Jan 23, 2026
Merged

refactor: remove redundant fields from TaskSolution and ValidationResult#50
kohankhaki merged 3 commits into
mainfrom
refactor/remove-redundant-schema-fields

Conversation

@kohankhaki
Copy link
Copy Markdown
Contributor

@kohankhaki kohankhaki commented Jan 21, 2026

PR Type

Fix

Short Description

Remove redundant task_id and task string fields from TaskSolution and ValidationResult dataclasses. These fields duplicated information already available in the nested objects (task_obj and task_solution). Added convenience properties to maintain backward compatibility for accessing these values. Also clarified documentation for the reasoning field to distinguish it from Bloom's taxonomy level.

Tests Added

None


This change is Reviewable

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes redundant data duplication in TaskSolution and ValidationResult dataclasses by eliminating direct task_id and task string fields that duplicated information from nested objects. The refactor maintains backward compatibility through convenience properties and updates documentation to clarify field purposes.

Changes:

  • Removed redundant task_id and task fields from TaskSolution, keeping only the task object (renamed from task_obj)
  • Removed redundant task_id and task fields from ValidationResult, relying on the nested task_solution object
  • Added convenience properties (task_id and task_text) to both classes for backward-compatible access

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/schemas/solution_schemas.py Refactored TaskSolution to remove redundant fields and rename task_obj to task, added convenience properties
src/schemas/validation_schemas.py Refactored ValidationResult to remove redundant fields, added convenience properties
src/schemas/GENERATION_PIPELINE_SCHEMAS.md Updated documentation to reflect schema changes and clarify the reasoning field purpose
src/base_stages/solve_tasks.py Updated TaskSolution instantiation to use new schema structure
src/base_stages/validate_tasks.py Updated references from task_obj to task and from task to task_text

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/schemas/solution_schemas.py
- `task_obj`: Task (required, Task dataclass object with full hierarchy)
- `task`: Task (required, Task dataclass object with full hierarchy including task_id, task text, capability, etc.)
- `solution`: String (required, the final answer/solution to the task)
- `reasoning`: String (required, the step-by-step explanation generated by the solver showing how the solution was derived. This is the LLM's chain-of-thought when solving the task, NOT Bloom's taxonomy level which is stored in `task.bloom_level`)
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clarification about Bloom's taxonomy appears defensive and may indicate past confusion. Consider simplifying to focus on what 'reasoning' IS rather than what it is NOT. For example: 'the step-by-step explanation generated by the solver showing how the solution was derived (chain-of-thought)'

Suggested change
- `reasoning`: String (required, the step-by-step explanation generated by the solver showing how the solution was derived. This is the LLM's chain-of-thought when solving the task, NOT Bloom's taxonomy level which is stored in `task.bloom_level`)
- `reasoning`: String (required, the step-by-step explanation generated by the solver showing how the solution was derived; this is the model's chain-of-thought for the task. The Bloom's taxonomy level for the task is stored separately in `task.bloom_level`.)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

@afkanpour afkanpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afkanpour reviewed 5 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @kohankhaki and @saidul-islam98).


src/schemas/validation_schemas.py line 26 at r1 (raw file):

    def task_id(self) -> str:
        """Get task_id from the task_solution for convenience."""
        return self.task_solution.task_id

Does task_solution contain task_id?

Code quote:

task_id

src/schemas/solution_schemas.py line 31 at r1 (raw file):

    def task_text(self) -> str:
        """Get task text from the task object for convenience."""
        return self.task.task

Does this refer to the problem statement? If so, can you rename it to task_statement?

Code quote:

task

Copy link
Copy Markdown
Contributor Author

@kohankhaki kohankhaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kohankhaki made 2 comments.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @afkanpour and @saidul-islam98).


src/schemas/solution_schemas.py line 31 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

Does this refer to the problem statement? If so, can you rename it to task_statement?

Updated all of the task variable name to task_statement wherever we are referring to problem statement


src/schemas/validation_schemas.py line 26 at r1 (raw file):

Previously, afkanpour (Arash) wrote…

Does task_solution contain task_id?

Yes, TaskSolution has a task_id property that returns self.task.task_id.

@kohankhaki kohankhaki force-pushed the refactor/remove-redundant-schema-fields branch from 37e01b2 to 68d9e4c Compare January 22, 2026 18:24
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our current pipeline, we are storing the following information in each tasks:

{
      "task_id": "Easy_Remember__task_000",
      "task_statement": "<task_statement>",
      "choices": {
        "A": "<option_text>",
        "B": "<option_text>",
        "C": "<option_text>",
        "D": "<option_text>"
      },
      "correct_answer": "<correct answer option>",
      "chapter_id": "<domain_book_name_chapter_number>",
      "chapter_relpath": "<bookname/domain_book_name_chapter_number>", # relative path of the chapter
      "difficulty": "Easy/Medium/Hard",
      "blooms_level": "Remember/Understand/Apply/Analyze/Evaluate/Create",
    }

Could you please add "correct_answer", "chapter_id", and "chapter_relpath" fields? (Or maybe I can add and update the schema them in my PR if you think its more appropriate). Also, can the choices be just a dictionary (as there can only be 4 choices in an MCQ question)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the schema should stay as-is to keep it general, but let me know what you think:

Re: choices format
The current List[Dict] format ([{"label": "A", "solution": "..."}]) is more flexible—it allows adding metadata per choice if needed later (e.g., explanation for why an option is wrong). A simple dict works for basic MCQ but limits extensibility.

Re: correct_answer
This belongs in TaskSolution, not Task. The Task represents the problem before it's solved (Stage 3). The correct answer is determined in Stage 4 (solution generation) and stored in TaskSolution.solution.

Re: chapter_id, chapter_relpath
These are domain-specific (textbook structure). Use generation_metadata for these:

Task(
    task_id="Easy_Remember__task_000",
    task_statement="...",
    choices=[{"label": "A", "solution": "..."}, ...],
    difficulty="Easy",
    bloom_level="Remember",
    generation_metadata={
        "chapter_id": "domain_book_name_chapter_number",
        "chapter_relpath": "bookname/domain_book_name_chapter_number",
    }
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I will update my saving function accordingly. Thanks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to use the standard save_tasks(), load_tasks() etc. from io_utils.py - no need for a custom saving function.

The only difference is how you construct the objects:

task = Task(
    task_id="Easy_Remember__task_000",
    task_statement="...",
    choices=[{"label": "A", "solution": "..."}, ...],
    difficulty="Easy",
    bloom_level="Remember",
    capability=capability,
    generation_metadata={
        "chapter_id": "...",
        "chapter_relpath": "...",
    }
)

# Standard save/load works
save_tasks([task], metadata, path)
tasks, metadata = load_tasks(path)

# Access your custom fields via generation_metadata
chapter_id = tasks[0].generation_metadata.get("chapter_id")

Copy link
Copy Markdown
Collaborator

@afkanpour afkanpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afkanpour reviewed 7 files and all commit messages, and resolved 2 discussions.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @kohankhaki and @saidul-islam98).

@kohankhaki kohankhaki merged commit 6bb8af1 into main Jan 23, 2026
0 of 2 checks passed
@kohankhaki kohankhaki deleted the refactor/remove-redundant-schema-fields branch January 23, 2026 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants