I work on AI training data and evaluation, with a focus on LLM data quality, annotation systems, preference data, synthetic data, data governance, and financial-domain AI evaluation.
My public work is intentionally centered on resources that can be reviewed, reused, and improved without relying on private company data or proprietary workflows.
- Training data quality engineering for LLM systems
- Dataset cleaning, deduplication, inspection, and documentation
- Annotation quality, agreement, adjudication, and reviewer calibration
- Human preference data, RLHF / DPO data, and synthetic data evaluation
- Financial-domain LLM benchmarks, risk-aware evaluation, and data governance
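Annotation agreement comes up in the focus areas above. As an illustration, here is a minimal sketch of Cohen's kappa for two annotators who labeled the same items. The helper name `cohens_kappa` and the example labels are hypothetical. Kappa is only one of several agreement statistics: Krippendorff's alpha, for example, also handles more than two raters and missing labels.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example: two reviewers disagree on one of six items.
reviewer_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
reviewer_2 = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.667
```

Raw percent agreement here is 5/6. Kappa is lower because half of the agreement would be expected by chance given each reviewer's label mix. This is why calibration reviews usually report a chance-corrected statistic alongside raw agreement.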
- awesome-llm-training-data - A curated bilingual Awesome list for LLM training data quality, annotation, preference data, synthetic data, governance, and evaluation.
- Maintaining Awesome LLM Training Data v0.1.0, including an LLM training data operating model, Claw-style agent evaluation notes, a Harbor repeated-trial metric example, practitioner guides, and automated resource-format audits.
- Tracking upstream documentation proposals for LLM data workflows:
  - huggingface/datatrove#485 - dataset-audit example using filters, rejected-sample capture, metadata, and summary stats.
  - argilla-io/argilla#5861 - annotation QA workflow using guidelines, suggestions, filters, and adjudication.
  - harbor-framework/harbor#1700 - Claw-style trajectory-aware evaluation pattern with repeated attempts and safety evidence.
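The repeated-trial metric example and the repeated-attempts evaluation pattern above are not reproduced here. The sketch below shows one common way to summarize several attempts per task, and it is an assumption rather than Harbor's actual implementation. It uses the unbiased pass@k estimator popularized by the HumanEval paper (at least one of k attempts succeeds) and a pass^k variant (all k attempts succeed), which rewards consistency. The function names and trial data are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """P(at least one success) when drawing k of n recorded attempts, c of which succeeded."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so tasks with few failures score 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n, c, k):
    """P(all k drawn attempts succeed); a stricter, consistency-oriented view."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)


def summarize(trials, k):
    """Average both metrics over tasks; `trials` maps task id -> list of bool outcomes."""
    any_scores, all_scores = [], []
    for outcomes in trials.values():
        n, c = len(outcomes), sum(outcomes)
        any_scores.append(pass_at_k(n, c, k))
        all_scores.append(pass_hat_k(n, c, k))
    return {
        "pass@k": sum(any_scores) / len(any_scores),
        "pass^k": sum(all_scores) / len(all_scores),
    }


# Hypothetical data: five attempts per task.
trials = {
    "task-a": [True, False, True, False, False],
    "task-b": [False, False, False, False, False],
}
print(summarize(trials, k=2))
```

Reporting both numbers keeps a lucky single success from looking like reliable behavior. A model can score well on pass@k while scoring poorly on pass^k.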
- Prefer primary sources, reproducible resources, and practical engineering value.
- Avoid private company data, real user data, and proprietary workflows.
- Treat financial-domain AI evaluation as a governance problem, not a leaderboard exercise.
- Make data quality work visible through documentation, checklists, issues, and small useful contributions.
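I focus on AI training data and evaluation engineering, with an emphasis on LLM data quality, annotation systems, preference data, synthetic data, data governance, and financial-domain AI evaluation.
My public projects rely, wherever possible, on public materials that can be reviewed, reused, and continuously improved, and they contain no private company data, real user data, or proprietary workflows.
I currently maintain Awesome LLM Training Data and am gradually building out an LLM training data operating model, Claw-style agent evaluation notes, a Harbor repeated-run metric example, quality checklists, and governance documentation for financial-domain evaluation.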