Alfonsobang/README.md

Hi, I'm Alfonsobang

I work on AI training data and evaluation, with a focus on LLM data quality, annotation systems, preference data, synthetic data, data governance, and financial-domain AI evaluation.

My public work is intentionally centered on resources that can be reviewed, reused, and improved without relying on private company data or proprietary workflows.

Current Focus

  • Training data quality engineering for LLM systems
  • Dataset cleaning, deduplication, inspection, and documentation
  • Annotation quality, agreement, adjudication, and reviewer calibration
  • Human preference data, RLHF / DPO data, and synthetic data evaluation
  • Financial-domain LLM benchmarks, risk-aware evaluation, and data governance

Public Projects

  • awesome-llm-training-data - A curated bilingual Awesome list for LLM training data quality, annotation, preference data, synthetic data, governance, and evaluation.

Current Public Work

Open-source Principles

  • Prefer primary sources, reproducible resources, and practical engineering value.
  • Avoid private company data, real user data, and proprietary workflows.
  • Treat financial-domain AI evaluation as a governance problem, not a leaderboard exercise.
  • Make data quality work visible through documentation, checklists, issues, and small useful contributions.

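Chinese Introduction

I focus on AI training data and evaluation engineering, with an emphasis on LLM data quality, annotation systems, preference data, synthetic data, data governance, and financial-domain AI evaluation.

My public projects use, wherever possible, public material that can be reviewed, reused, and continuously improved. They contain no private company data, real user data, or proprietary workflows.

I currently maintain Awesome LLM Training Data as my main project, and I am gradually building up an LLM training data operating model, Claw-style agent evaluation notes, Harbor multi-run metrics examples, quality checklists, and governance documentation for financial-domain evaluation.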

Pinned

  1. awesome-llm-training-data (Public)

    Curated tools, papers, datasets, and practices for LLM training data engineering.

    Python 1