Description
Operating system and version
ubuntu VERSION="22.04.5 LTS (Jammy Jellyfish)"
Python environment where the tool is installed
Python environment inside a Docker container
Python version
3.11
AISBench tool version
Latest version, downloaded on March 18
AISBench command executed
ais_bench --work-dir /workspace/test --models vllm_api_general_chat --custom-dataset-path /home/dataset/customdataset/cus_test_mcq.jsonl --num-prompts 10 --dump-eval-details
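Model configuration file or custom configuration file contents
Only the dataset was customized; it is the sample from the documentation: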
{"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}
{"question": "368+959+918+653+978=", "A": "3876", "B": "3878", "C": "3880", "answer": "A"}
{"question": "776+208+589+882+571+996+515+726=", "A": "5213", "B": "5263", "C": "5383", "answer": "B"}
{"question": "803+862+815+100+409+758+262+169=", "A": "4098", "B": "4128", "C": "4178", "answer": "C"}
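As a sanity check that the expected 100% is reasonable, the answer keys in the sample can be verified independently of the model. The following is a minimal sketch, not part of AISBench; the helper name `check_row` is hypothetical, and it assumes every question is a plain sum of integers ending in `=`.

```python
import json

# The four sample rows from the custom dataset, copied verbatim.
SAMPLE = """
{"question": "165+833+650+615=", "A": "2258", "B": "2263", "C": "2281", "answer": "B"}
{"question": "368+959+918+653+978=", "A": "3876", "B": "3878", "C": "3880", "answer": "A"}
{"question": "776+208+589+882+571+996+515+726=", "A": "5213", "B": "5263", "C": "5383", "answer": "B"}
{"question": "803+862+815+100+409+758+262+169=", "A": "4098", "B": "4128", "C": "4178", "answer": "C"}
"""


def check_row(row):
    """Return True if the option named by row["answer"] equals the actual sum."""
    expr = row["question"].rstrip("=")
    total = sum(int(term) for term in expr.split("+"))
    return row[row["answer"]] == str(total)


rows = [json.loads(line) for line in SAMPLE.strip().splitlines()]
results = [check_row(row) for row in rows]
print(results)  # [True, True, True, True]
```

All four keys match the arithmetic, so any mismatch in the report comes from the model output or from answer extraction, not from the dataset.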
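Expected behavior
The accuracy test should report 100%.
Actual behavior
The reported accuracy is only 25%. Even when the model's prediction is correct, the extracted answer is wrong: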
{
"example_abbr": "cus_test_mcq_test_0",
"pred": [
"To solve the problem:\n\n$$\n165 + 833 + 650 + 615\n$$\n\nwe will add the numbers step by step to ensure accuracy.\n\n---\n\n### Step 1: Add 165 and 833\n\n$$\n165 + 833 = 998\n$$\n\n---\n\n### Step 2: Add 650 to the result\n\n$$\n998 + 650 = 1648\n$$\n\n---\n\n### Step 3: Add 615 to the new total\n\n$$\n1648 + 615 = 2263\n$$\n\n---\n\n### Verification via Column Addition\n\nLet’s verify the final addition using column-wise addition:\n\n\n 1648\n+ 615\n--------\n 2263\n\n\n- Units place: 8 + 5 = 13 → write 3, carry 1 \n- Tens place: 4 + 1 + 1 = 6 \n- Hundreds place: 6 + 6 = 12 → write 2, carry 1 \n- Thousands place: 1 + 1 = 2 \n\nResult: 2263\n\n---\n\n### Final Answer\n\n$$\n\boxed{B}\n$$"
],
"parsed": [
"A"
],
"refr": [
"B"
],
"correct": [
false
]
}
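In this record the prediction ends with `\boxed{B}`, yet `parsed` is `A`. The actual extraction rule AISBench applies is not shown here, so the cause is unconfirmed. For comparison, the sketch below shows one possible extraction strategy that would return `B` for this output: prefer the last `\boxed{...}` letter, and fall back to the last standalone option letter. The function name `extract_choice` and the `options` parameter are hypothetical and do not come from AISBench.

```python
import re

# Matches \boxed{X} where X is a single letter, allowing inner whitespace.
BOXED = re.compile(r"\\boxed\{\s*([A-Za-z])\s*\}")


def extract_choice(text, options="ABC"):
    """Return the chosen option letter, or None if none can be found.

    Assumption: the model states its final answer last, ideally in \\boxed{}.
    """
    boxed = [m for m in BOXED.findall(text) if m in options]
    if boxed:
        return boxed[-1]
    # Fallback: last option letter that stands alone as a word, so the
    # "A" in "Add" is not mistaken for option A.
    standalone = re.findall(rf"\b([{options}])\b", text)
    return standalone[-1] if standalone else None


sample = r"### Step 1: Add 165 and 833 ... ### Final Answer $$ \boxed{B} $$"
print(extract_choice(sample))  # B
```

The fallback is deliberately simple and can still misfire, for example on the English article "A" at the start of a sentence, so it is only a starting point for discussing the extractor's behavior.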
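Pre-submission checklist
- I have read the Quick Start in the main documentation, and it does not resolve the problem
- I have searched the FAQ and found no duplicate
- I have searched the existing issues and found no duplicate
- I have updated to the latest version, and the problem persists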