feat(recall): make the RRF rank constant configurable #279
Merged
Fang0415 added a commit that referenced this pull request on Jul 4, 2026
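Ever since PR #279 was merged, the dev branch has had two files under migrations/versions/ that both declare revision="0031" (0031_20260701_dataset_recall_rrf_k.py and 0031_20260702_provider_icon_fields.py, both forking from down_revision="0030"). This gives alembic two parallel heads, so `alembic upgrade head` fails with "Multiple head revisions are present". That CI check has been failing since #279; PR #289 hit the same problem when it was merged into dev, but the failure did not block the merge.

Fix in this commit:
- provider_icon_fields is moved to 0032 (its down_revision now points at 0031, the rrf_k migration).
- chunk_type_contract is moved from 0032 to 0033.
- The two stale migration-number references in scripts/db/init.sql are updated to match.

Verification: started an isolated, temporary mysql:8.0 container (to avoid polluting the local development database) and reproduced the full CI step sequence (import the migrations/db.sql baseline → alembic stamp 0001 → alembic upgrade head ×2 to verify idempotency → alembic current). All migrations 0001→0033 applied in order without errors; alembic heads now reports a single 0033 (head).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>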
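The duplicate-revision condition described in the commit message could in principle be caught before merge with a small pre-check. The following is a minimal sketch, not the project's actual CI step: it assumes every migration file assigns `revision` as a string literal at the start of a line, and the helper names (`load_sources`, `parse_revisions`, `find_duplicate_revisions`) are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches top-level assignments such as `revision = "0031"`,
# `revision="0031"` or `revision: str = "0031"`.
# Assumption: the id is a plain string literal on a single line.
_REVISION_RE = re.compile(
    r"""^(down_revision|revision)\b[^=\n]*=\s*["']([^"']+)["']""",
    re.MULTILINE,
)


def load_sources(versions_dir: str) -> dict[str, str]:
    """Read every *.py file in the versions directory into {file name: text}."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in Path(versions_dir).glob("*.py")
    }


def parse_revisions(sources: dict[str, str]) -> dict[str, str]:
    """Map each file name to the revision id it declares (files without one are skipped)."""
    parsed = {}
    for name, text in sources.items():
        fields = dict(_REVISION_RE.findall(text))
        if "revision" in fields:
            parsed[name] = fields["revision"]
    return parsed


def find_duplicate_revisions(parsed: dict[str, str]) -> dict[str, list[str]]:
    """Return {revision id: [file names]} for ids declared by more than one file."""
    files_by_revision = defaultdict(list)
    for name, revision in parsed.items():
        files_by_revision[revision].append(name)
    return {
        revision: sorted(names)
        for revision, names in files_by_revision.items()
        if len(names) > 1
    }
```

Under these assumptions, `find_duplicate_revisions(parse_revisions(load_sources("migrations/versions")))` returns an empty dict when no two files share a revision id, and a non-empty result could be used to fail the check early.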
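Background
Implements Linear LINK-218: promotes the RRF rank constant from a code-level default to an explicit configuration option, supporting a system-level `RECALL_RRF_K` and a dataset-level `recall_config.rrf_k`. The default remains 60.

Main changes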
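- System-level `RECALL_RRF_K=60`, validated as a positive integer.
- `RecallConfig` gains a dataset-level `rrf_k`, which overrides the pipeline's assembly-time default via `RecallRequest.rrf_k_override`.
- At execution time, `RecallPipeline` selects the effective `rrf_k` by the rule "dataset-level override > system-level default"; RRF computes each contribution as `1 / (rrf_k + rank)`.
- The `weighted_score` strategy continues to ignore `rrf_k`; the existing weighted-fusion semantics are unchanged.
- … `rrf_k`, so that strategy parameters remain controlled solely by server-side configuration.

Database migration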
- `0031_20260701_dataset_recall_rrf_k.py`.
- `dataset_parse_config.recall_config` JSON column COMMENT, updated from 14 items to 15.
- `scripts/db/init.sql`, the current full schema snapshot.

Configuration and documentation sync
- `.env.example`.
- `docs/ops/configure.md`.
- `docs/api/schemas/mysql.md`.

Tests
Passed:
Additional notes:
    /Users/kawauso/Documents/Projects/LinkRag/.venv/bin/python -m mypy src/config.py src/application/recall_pipeline_provider.py src/core/dataset_config/models.py src/core/pipeline/recall/models.py src/core/pipeline/recall/pipeline.py
    # Success: no issues found in 5 source files
Full-check status:
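- `mypy src` still reports type issues that already existed in the repository; the current output is 279 errors / 66 files, concentrated in existing areas such as the parser, MQ, and SQLAlchemy model typing.
- `pytest tests/unit -q` currently reports 813 passed, 4 failed; the failures are confined to code this PR does not touch: the producer parameter signature in `test_rag_stream_resilience.py` and the mock signature in `test_sparse_indexing_pipeline.py`.

Known risks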
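- `rrf_k` is a JSON configuration field, and the database applies no JSON schema constraint to it; invalid values are validated on the Python side by `RecallConfig` / `RecallPipeline` and raised to the caller.
- … `recall_config`.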
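The precedence rule and the `1 / (rrf_k + rank)` contribution listed under the main changes, together with the Python-side validation mentioned in the risks, can be illustrated with a short sketch. This is not the code from this PR: the function names (`validate_rrf_k`, `effective_rrf_k`, `rrf_fuse`) and the constant `SYSTEM_RRF_K` are hypothetical stand-ins, and ranks are assumed to start at 1, which the description does not state.

```python
from typing import Optional, Sequence

# Stand-in for the system-level RECALL_RRF_K default (assumption: read elsewhere from config).
SYSTEM_RRF_K = 60


def validate_rrf_k(value: object) -> int:
    """Accept only positive integers; bool is rejected even though it subclasses int."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"rrf_k must be a positive integer, got {value!r}")
    return value


def effective_rrf_k(
    dataset_override: Optional[int], system_default: int = SYSTEM_RRF_K
) -> int:
    """Dataset-level override wins; otherwise fall back to the system-level default."""
    chosen = system_default if dataset_override is None else dataset_override
    return validate_rrf_k(chosen)


def rrf_fuse(rankings: Sequence[Sequence[str]], rrf_k: int) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: each list adds 1 / (rrf_k + rank) per document.

    Assumption: rank is 1-based within each ranking.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, with `rrf_k = effective_rrf_k(None)` and two rankings `["a", "b"]` and `["b", "c"]`, document `b` appears in both lists and ends up first. A smaller `rrf_k` gives top-ranked hits proportionally more weight, which is one reason a dataset-level override may be useful.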