Your job is to improve one figure: a single, wide visual for a Substack post that explains the worker-critic pattern in one glance.
Raise the score reported by prepare.py.
The current evaluation averages four 0-10 criteria:
- semantic_fidelity
- one_glance_clarity
- readability_layout
- visual_coherence
The score is computed by the script, not by you. A run is only considered accepted if:
- the average score is at least 8.5, and
- every individual criterion is at least 8.0.
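The acceptance rule above can be sketched as a small check. This is only an illustration: the function name `is_accepted`, the dict-of-scores input, and the example values are assumptions, not part of prepare.py, which remains the source of truth for the score.

```python
from statistics import mean

# Thresholds taken from the acceptance rule above.
MIN_AVERAGE = 8.5
MIN_CRITERION = 8.0


def is_accepted(scores: dict[str, float]) -> bool:
    """Return True when the average and every single criterion clear their thresholds."""
    average = mean(scores.values())
    weakest = min(scores.values())
    return average >= MIN_AVERAGE and weakest >= MIN_CRITERION


# Hypothetical scores, for illustration only.
example = {
    "semantic_fidelity": 9.0,
    "one_glance_clarity": 8.5,
    "readability_layout": 9.0,
    "visual_coherence": 7.5,
}

# The average is 8.5, but visual_coherence is below 8.0, so the run is rejected.
print(is_accepted(example))  # False
```

The example shows why both conditions matter: a run can reach the required average and still fail because one criterion is too weak.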
- plot.py is the only file you should modify.
- prepare.py is the fixed evaluation harness. Do not edit it.
- artifacts/autoresearch/current/figure.svg and figure.png are the outputs of plot.py.
- artifacts/autoresearch/current/review.json and review.md are the outputs of prepare.py.
The figure should make these ideas visible:
- a worker produces drafts;
- a critic reviews those drafts;
- feedback flows back into revision;
- the worker and critic are persistent roles, not one-shot calls;
- the loop stops only when the result is approved.
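To keep the pattern itself in view while drawing it, the five ideas above can be sketched as a toy loop. Everything here is hypothetical: the `Worker`, `Critic`, and `Review` classes, the `approve_after` rule, and `run_loop` are illustrative names, and the critic's approval rule is a stand-in for real judgment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Review:
    approved: bool
    feedback: str


@dataclass
class Worker:
    """Persistent role: remembers every draft it has produced."""
    history: list = field(default_factory=list)

    def draft(self, task: str, feedback: Optional[str]) -> str:
        version = len(self.history) + 1
        text = f"{task} v{version}"
        if feedback is not None:
            text += f" (revised after: {feedback})"
        self.history.append(text)
        return text


@dataclass
class Critic:
    """Persistent role: remembers every review it has given."""
    approve_after: int = 3  # toy rule: approve the Nth draft it sees
    reviews: list = field(default_factory=list)

    def review(self, draft: str) -> Review:
        round_number = len(self.reviews) + 1
        approved = round_number >= self.approve_after
        feedback = "approved" if approved else f"tighten round {round_number}"
        result = Review(approved, feedback)
        self.reviews.append(result)
        return result


def run_loop(task: str, worker: Worker, critic: Critic) -> str:
    """Draft, review, revise; stop only when the critic approves."""
    feedback = None
    while True:
        draft = worker.draft(task, feedback)  # worker produces a draft
        review = critic.review(draft)         # critic reviews it
        if review.approved:                   # the only exit is approval
            return draft
        feedback = review.feedback            # feedback flows back into revision


worker, critic = Worker(), Critic(approve_after=3)
final = run_loop("figure", worker, critic)
print(final)
```

The same two objects are reused on every round, which is the "persistent roles, not one-shot calls" idea the figure needs to convey.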
The ideal figure is:
- understandable in about five seconds;
- legible at publication scale;
- visually coherent, not cluttered;
- explicit about the direction of information flow.
- Edit plot.py.
- Run uv run python plot.py.
- Run uv run python prepare.py.
- Read review.md and review.json.
- Keep changes only if the score improves, or if the average ties but the weakest criterion improves.
- Do not change the task.
- Do not change the scoring rubric.
- Do not add extra files unless they are outputs inside artifacts/autoresearch/current/.
- Prefer clearer structure over more decoration.
- Prefer fewer words over denser explanations.
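
The keep rule from the steps above (keep a change only if the score improves, or if the average ties but the weakest criterion improves) can be sketched as a comparison of two score sets. The function name `should_keep` and the dict-of-scores inputs are assumptions for illustration; the layout of review.json is not specified here, so the sketch does not read it.

```python
from statistics import mean


def should_keep(old: dict[str, float], new: dict[str, float]) -> bool:
    """Decide whether a change to plot.py is worth keeping.

    Assumes both dicts map the same criterion names to 0-10 scores and that
    the harness reports scores at a fixed precision, so an exact tie is meaningful.
    """
    old_avg, new_avg = mean(old.values()), mean(new.values())
    if new_avg > old_avg:
        return True  # the average improved
    # Tie on average: keep only if the weakest criterion improved.
    return new_avg == old_avg and min(new.values()) > min(old.values())


# Hypothetical score sets, for illustration only.
before = {"semantic_fidelity": 9.0, "visual_coherence": 7.0}
after = {"semantic_fidelity": 8.0, "visual_coherence": 8.0}

# Same average (8.0), but the weakest criterion rose from 7.0 to 8.0.
print(should_keep(before, after))  # True
```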