Pattern: Evaluator-Optimizer Loop
Category: Reasoning
Source: Internal usage (Autoresearch program), FOR-0012 (Reflection)
Status: Active
When to Use
Use this pattern when output quality matters and a first draft is unlikely to be sufficient. The agent generates a draft, evaluates it against explicit criteria, and iteratively improves it until a quality threshold is met or a maximum iteration count is reached. It is well suited to creative tasks, complex reasoning, and any output that benefits from self-critique.
How It Works
- Generate a first draft or attempt
- Evaluate the output against defined quality criteria (scoring rubric, test cases, acceptance criteria)
- If the score meets the target threshold, accept the output
- If not, diagnose what failed and why
- Hypothesize a single change that would improve the score
- Apply the change, re-evaluate, and decide: keep if improved, revert if not
- Repeat until target met, max iterations reached, or plateau detected (no improvement for N iterations)
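The steps above can be sketched as a small loop. This is a minimal illustration, not the Autoresearch implementation: the names `evaluator_optimizer_loop`, `LoopResult`, `propose_change`, and `patience` (the N in "no improvement for N iterations") are assumptions, and the keyword-counting evaluator is a toy stand-in for a real scoring rubric.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LoopResult:
    output: Any
    score: float
    iterations: int
    stop_reason: str  # "target_met", "max_iterations", or "plateau"


def evaluator_optimizer_loop(
    draft: Any,
    evaluate: Callable[[Any], float],
    propose_change: Callable[[Any, float], Any],
    target: float,
    max_iterations: int = 10,
    patience: int = 3,
) -> LoopResult:
    """Improve `draft` one change at a time, keeping only changes that raise the score."""
    best, best_score = draft, evaluate(draft)
    iterations = 0
    stale = 0  # consecutive iterations without improvement

    while True:
        if best_score >= target:
            return LoopResult(best, best_score, iterations, "target_met")
        if iterations >= max_iterations:
            return LoopResult(best, best_score, iterations, "max_iterations")
        if stale >= patience:
            return LoopResult(best, best_score, iterations, "plateau")

        # Diagnose and hypothesize a single change (delegated to the caller).
        candidate = propose_change(best, best_score)
        candidate_score = evaluate(candidate)
        iterations += 1

        if candidate_score > best_score:
            best, best_score = candidate, candidate_score  # keep
            stale = 0
        else:
            stale += 1  # revert: `best` is left unchanged


# Toy usage: the "rubric" counts how many required keywords the text contains.
REQUIRED = ["scope", "inputs", "outputs", "limits"]


def evaluate(text: str) -> int:
    return sum(keyword in text for keyword in REQUIRED)


def propose_change(text: str, score: float) -> str:
    missing = [keyword for keyword in REQUIRED if keyword not in text]
    return f"{text} {missing[0]}" if missing else text


result = evaluator_optimizer_loop("scope", evaluate, propose_change, target=4)
print(result.stop_reason, result.iterations, result.score)
```

Keeping the best-so-far output separate from the candidate is what makes revert-on-failure trivial: a rejected candidate is simply discarded.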
Example
The Autoresearch program in the Role Factory: a role scoring 11/15 enters the auto-improve loop. The optimizer diagnoses which eval cases failed, hypothesizes one change (e.g., sharpen the agent description), modifies one file, re-runs all eval cases, and keeps or reverts. It loops until the score hits 13/15 or 10 iterations pass.
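To make the keep-or-revert decision concrete, the following sketch simulates a run with the numbers from this example (start at 11/15, target 13/15, at most 10 iterations). The case IDs, the list of candidate changes, and which cases each change fixes or breaks are all invented for illustration; they are not taken from the actual Autoresearch program.

```python
TARGET = 13
MAX_ITERATIONS = 10

# Hypothetical starting state: eval cases 1..11 pass, 12..15 fail (11/15).
baseline_passing = set(range(1, 12))

# Hypothetical candidate changes: (description, cases fixed, cases broken).
candidate_changes = [
    ("sharpen agent description", {12}, set()),
    ("rewrite tool list", {13}, {1, 2}),  # fixes one case but breaks two
    ("add output format rule", {13}, set()),
    ("tighten scope statement", {14}, set()),
]


def run_auto_improve(passing, changes):
    """Apply one change per iteration; keep it only if the score improves."""
    passing = set(passing)
    log = []
    for iteration, (description, fixed, broken) in enumerate(
        changes[:MAX_ITERATIONS], start=1
    ):
        if len(passing) >= TARGET:
            break  # target met, stop early
        trial = (passing | fixed) - broken  # re-run all eval cases
        kept = len(trial) > len(passing)
        if kept:
            passing = trial
        log.append((iteration, description, len(trial), "keep" if kept else "revert"))
    return len(passing), log


final_score, log = run_auto_improve(baseline_passing, candidate_changes)
for entry in log:
    print(entry)
print(f"final score: {final_score}/15")
```

In this simulated run the second change is reverted because it drops the score from 12 to 11, and the loop stops after the third change reaches 13/15 without trying the fourth.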
Tradeoffs
| Pro | Con |
|---|---|
| Systematically improves output quality | Cost scales with iteration count |
| One change per iteration makes debugging easy | Can plateau without reaching target |
| Revert-on-failure prevents quality regression | Requires well-defined evaluation criteria |
| Works autonomously without human intervention | Diminishing returns after several iterations |
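As a rough illustration of how cost scales with iteration count, assume every iteration re-runs the full eval set, as in the example above. The worst case is then one initial evaluation plus one per iteration. The function name below is hypothetical, and the count ignores the cost of generating each change.

```python
def worst_case_case_runs(num_cases: int, max_iterations: int) -> int:
    """Initial evaluation plus one full re-evaluation per iteration."""
    return num_cases * (1 + max_iterations)


# 15 eval cases, up to 10 iterations, as in the Role Factory example.
print(worst_case_case_runs(15, 10))  # 165
```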
Factory Usage
- Role Factory auto-improve stage (workflows/role-factory/autoresearch-program.md): The core self-improvement loop for roles that score between 10-12/15.
- QA re-validation: Quinn re-runs quality checks after modifications, enabling iterative quality improvement.