
Add comprehensive documentation for MindGYM logic implementation in codebase #1

Draft
Copilot wants to merge 2 commits into main from copilot/find-mindgym-logic-in-code

Conversation


Copilot AI commented Dec 25, 2025

The user asked where the MindGYM (NeurIPS'25) implementation lives in the codebase. MindGYM is a question synthesis system for thinking-centric fine-tuning that achieved a 16% improvement on MathVision using only 400 samples.

Documentation Added

Created MindGYM_LOGIC_DOCUMENTATION.md (273 lines) mapping the complete implementation:

Core Components

  • Difficulty Scoring: data_juicer/ops/filter/llm_difficulty_score_filter.py - Multi-dimensional difficulty evaluation (linguistic complexity, conceptual depth, prior knowledge, step complexity, ambiguity), each rated on a 1-5 scale
  • QA Generation from Examples: data_juicer/ops/mapper/generate_qa_from_examples_mapper.py - Self-synthesis with ROUGE-L deduplication
  • QA Generation from Text: data_juicer/ops/mapper/generate_qa_from_text_mapper.py - Text-to-QA conversion
  • Query Optimization: data_juicer/ops/mapper/optimize_query_mapper.py - Question refinement
  • Query Calibration: data_juicer/ops/mapper/calibrate_query_mapper.py - Reference-based question adjustment
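
To make the "ROUGE-L deduplication" step concrete, here is a minimal, dependency-free sketch of the idea: a newly generated question is kept only if its ROUGE-L F1 against every existing question stays below a threshold. This is an illustration, not the code in generate_qa_from_examples_mapper.py. The function names (`lcs_length`, `rouge_l_f1`, `is_novel`), the whitespace tokenization, and the strict `<` comparison are all assumptions; only the 0.7 default mirrors the `similarity_threshold` shown in this PR.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over lowercased whitespace tokens (a simplification)."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def is_novel(question, existing, threshold=0.7):
    """Hypothetical helper: True if the question is not a near-duplicate."""
    return all(rouge_l_f1(question, q) < threshold for q in existing)
```

With this sketch, `is_novel("What is 2 + 2 ?", ["What is 2 + 3 ?"])` is rejected as a near-duplicate, while a question sharing only one token with the pool is kept.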

Typical Pipeline

process:
  - generate_qa_from_examples_mapper:
      hf_model: 'Qwen/Qwen2.5-7B-Instruct'
      seed_file: 'seeds.jsonl'
      similarity_threshold: 0.7
  
  - llm_difficulty_score_filter:
      api_or_hf_model: 'qwen2.5-72b-instruct'
      min_score: 0.6  # Keep moderate to high difficulty
      max_score: 1.0
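
The config above filters on a 0-1 range while the difficulty dimensions are described as rated 1-5, so some normalization must sit between the two. The sketch below shows one plausible reading: average the five ratings and divide by 5. That normalization is an assumption made here for illustration and should be checked against llm_difficulty_score_filter.py. The names `DIMENSIONS`, `normalized_difficulty`, and `keep_sample` are hypothetical.

```python
# Dimension names taken from the PR description; the key spelling is assumed.
DIMENSIONS = (
    "linguistic_complexity",
    "conceptual_depth",
    "prior_knowledge",
    "step_complexity",
    "ambiguity",
)


def normalized_difficulty(scores):
    """Average the five 1-5 ratings and rescale to 0-1 (assumed: divide by 5)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise KeyError(f"missing difficulty dimensions: {missing}")
    mean = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    return mean / 5.0


def keep_sample(scores, min_score=0.6, max_score=1.0):
    """Range check corresponding to min_score / max_score in the config."""
    return min_score <= normalized_difficulty(scores) <= max_score
```

Under this reading, `min_score: 0.6` corresponds to an average rating of 3 out of 5, which matches the "moderate to high difficulty" comment in the config.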

Documentation includes file locations, usage examples, configuration reference, workflow diagrams, and test file mappings.

Original prompt

Help me find where the MindGYM logic is in the code.



Co-authored-by: JingofXin <31361630+JingofXin@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Identify MindGYM logic within the codebase" to "Add comprehensive documentation for MindGYM logic implementation in codebase" on Dec 25, 2025
Copilot AI requested a review from JingofXin December 25, 2025 11:33

2 participants