
small fixes for LLM config#1130

Merged
farook-edev merged 5 commits into submission-v6.0 from config-fix
Apr 22, 2026

Conversation

@farook-edev
Contributor

This PR fixes the incorrect model filename for 3B and 8B benchmarks, and disables all benchmark set options by default.

@farook-edev farook-edev requested review from a team and anhappdev as code owners April 20, 2026 09:33
@github-actions

github-actions Bot commented Apr 20, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@farook-edev farook-edev mentioned this pull request Apr 20, 2026
10 tasks
@Mostelk

Mostelk commented Apr 21, 2026

@farook-edev @mohitmundhragithub @freedomtan This PR solves the IC offline issue, but LLM accuracy with IFEval still doesn't work. Error: "throughput must be a finite number". It must be related to the previous PR, in which we reported throughput as 10^12 / mean token output time. Interestingly, this issue shows up only in accuracy mode and not in performance mode.

@freedomtan
Contributor

> @farook-edev @mohitmundhragithub @freedomtan This PR solves the IC offline issue, but LLM accuracy with IFEval still doesn't work. Error: "throughput must be a finite number". It must be related to the previous PR, in which we reported throughput as 10^12 / mean token output time. Interestingly, this issue shows up only in accuracy mode and not in performance mode.

This is consistent with my tests conducted at #1098 (comment) and a previous macOS CLI test. That is, the C++ part is mostly working fine.

@farook-edev
Contributor Author

farook-edev commented Apr 21, 2026

@freedomtan @Mostelk I ran the app and it errors out at the very end of the MMLU benchmark, which is why there were no IFEval logs. The problem is in the Flutter code; I'm working on a fix and will push it ASAP. (I suspect it happens in accuracy mode because no throughput numbers are reported.)

@farook-edev
Contributor Author

farook-edev commented Apr 21, 2026

Update: I'm still running the test, but it just passed MMLU and is progressing through IFEval.

I pushed the fix for further testing.


The problem was that token latency came through as 0 because the value doesn't exist in loadgen's accuracy logs. My previous PR takes the reciprocal of that value, which is ∞.

For some reason, the app emits an error if the throughput is not a finite number, regardless of run mode. Hence the error.
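The failure mode described above can be illustrated with a minimal sketch. This is not the app's Flutter code or the actual fix (neither is shown in this thread); the function names and the `SCALE` constant are hypothetical, with `SCALE` taken from the "10^12 / mean token output time" formula mentioned earlier. Python raises `ZeroDivisionError` on float division by zero, so the infinity that a double division by zero would produce is modeled explicitly here.

```python
import math
from typing import Optional

# Hypothetical constant, based on the "10^12 / mean token output time"
# formula mentioned in this thread.
SCALE = 1e12


def throughput_from_latency(mean_token_latency: float) -> float:
    """Reciprocal-style throughput; a latency of 0 maps to infinity.

    Python would raise ZeroDivisionError here, so the infinite result
    of a double division by zero is returned explicitly instead.
    """
    if mean_token_latency == 0:
        return math.inf
    return SCALE / mean_token_latency


def validate_throughput(value: float) -> float:
    """Strict check that rejects non-finite values, as in the reported error."""
    if not math.isfinite(value):
        raise ValueError(f"throughput must be a finite number: {value}")
    return value


def safe_throughput(mean_token_latency: float) -> Optional[float]:
    """One possible guard: report no throughput when no latency was logged."""
    value = throughput_from_latency(mean_token_latency)
    return value if math.isfinite(value) else None
```

With a latency of 0, `validate_throughput(throughput_from_latency(0))` raises, which mirrors the reported error, while `safe_throughput(0)` returns `None` so that a caller could skip the throughput field in accuracy mode.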

@Mostelk

Mostelk commented Apr 21, 2026

I think we can adopt 2 out of 3 commits. Commit 3e51658 was not the reason for the previous crash, so we don't need it. The error still persists:

I flutter : Error: throughput must be a finite number: Infinity
I flutter : #0 new RunInfo (package:mlperfbench/benchmark/run_info.dart:19)

@farook-edev
Contributor Author

@Mostelk I don't believe the CI artifact for the latest commit has been built yet. Could you please test once CI is finished? If all is well, I'll re-add the accuracy.txt part.

@farook-edev
Contributor Author

The latest tflite-only CI build (before re-adding accuracy.txt) is available here; the others are available from Action 151's artifacts.


@farook-edev
Contributor Author

farook-edev commented Apr 21, 2026

@freedomtan @Mostelk @mohitmundhragithub I was able to complete the LLM-1B and LLM-1B-Instruct benchmarks on my S25 in Accuracy mode, and got results for both benchmarks with no errors or crashes on 0492023. The later commit shouldn't affect this result, since it was already tested and confirmed to be working separately.

Please test the latest CI artifact and let me know if any errors occur. APK Build number is 802 and can be found here.

@freedomtan
Contributor

freedomtan commented Apr 22, 2026

> @freedomtan @Mostelk @mohitmundhragithub I was able to complete the LLM-1B and LLM-1B-Instruct benchmarks on my S25 in Accuracy mode, and got results for both benchmarks with no errors or crashes on 0492023. The later commit shouldn't affect this result, since it was already tested and confirmed to be working separately.
>
> Please test the latest CI artifact and let me know if any errors occur. APK Build number is 802 and can be found here.

Running the 802 tflite app on Pixel 10 Pro now, will report the results later.

update:

  • Yes, accuracy mode for 1B model finished running and reported expected numbers.

Contributor

@freedomtan freedomtan left a comment


LGTM.

@farook-edev farook-edev merged commit 5e5472d into submission-v6.0 Apr 22, 2026
29 of 30 checks passed
@farook-edev farook-edev deleted the config-fix branch April 22, 2026 05:03
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 22, 2026


3 participants