Pattern: Evaluator-Optimizer Loop

Category: Reasoning
Source: Internal usage (Autoresearch program), FOR-0012 (Reflection)
Status: Active

When to Use

Use this pattern when output quality matters and a first draft is unlikely to be sufficient. The agent generates an output, evaluates it against explicit criteria, and improves it iteratively until a quality threshold is met or a maximum iteration count is reached. It suits creative tasks, complex reasoning, and any output that benefits from self-critique.

How It Works

Generate a first draft or attempt
Evaluate the output against defined quality criteria (scoring rubric, test cases, acceptance criteria)
If the score meets the target threshold, accept the output
If not, diagnose what failed and why
Hypothesize a single change that would improve the score
Apply the change, re-evaluate, and decide: keep if improved, revert if not
Repeat until target met, max iterations reached, or plateau detected (no improvement for N iterations)
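
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Autoresearch implementation: the names evaluator_optimizer_loop, LoopResult, propose_change, and patience are hypothetical, the score is assumed to be a single number where higher is better, and the toy keyword-counting evaluator exists only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass
class LoopResult(Generic[T]):
    output: T
    score: float
    iterations: int
    stop_reason: str  # "target_met", "max_iterations", or "plateau"


def evaluator_optimizer_loop(
    draft: T,
    evaluate: Callable[[T], float],
    propose_change: Callable[[T, float], T],
    target: float,
    max_iterations: int = 10,
    patience: int = 3,  # hypothetical name for "N iterations without improvement"
) -> LoopResult[T]:
    """Improve `draft` one change at a time, keeping only changes that raise the score."""
    best, best_score = draft, evaluate(draft)
    iterations = 0
    stale = 0  # consecutive iterations that did not improve the score

    while best_score < target:
        if iterations >= max_iterations:
            return LoopResult(best, best_score, iterations, "max_iterations")
        if stale >= patience:
            return LoopResult(best, best_score, iterations, "plateau")

        iterations += 1
        # Diagnose and hypothesize: the caller returns a candidate with one change applied.
        candidate = propose_change(best, best_score)
        candidate_score = evaluate(candidate)

        if candidate_score > best_score:
            best, best_score = candidate, candidate_score  # keep the change
            stale = 0
        else:
            stale += 1  # revert: the candidate is discarded and `best` stays as it was

    return LoopResult(best, best_score, iterations, "target_met")


# Toy usage (assumption): a draft scores one point per required keyword it contains.
REQUIRED = ("scope", "inputs", "outputs")


def evaluate(text: str) -> float:
    return float(sum(word in text for word in REQUIRED))


def propose_change(text: str, score: float) -> str:
    missing = [word for word in REQUIRED if word not in text]
    return f"{text} {missing[0]}" if missing else text


result = evaluator_optimizer_loop("scope", evaluate, propose_change, target=3)
print(result.stop_reason, result.iterations, result.output)
```

Because a candidate is kept only when its score strictly improves, the best score never decreases. The cost is one extra evaluation per iteration, including iterations whose change is then discarded.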

Example

In the Role Factory's Autoresearch program, a role scoring 11/15 enters the auto-improve loop. The optimizer diagnoses which eval cases failed, hypothesizes one change (e.g., sharpen the agent description), modifies one file, re-runs all eval cases, and keeps the change if the score improved or reverts it if not. The loop ends when the score reaches 13/15 or after 10 iterations.

Tradeoffs

Pro	Con
Systematically improves output quality	Cost scales with iteration count
One change per iteration makes debugging easy	Can plateau without reaching target
Revert-on-failure prevents quality regression	Requires well-defined evaluation criteria
Works autonomously without human intervention	Diminishing returns after several iterations

Factory Usage

Role Factory auto-improve stage (workflows/role-factory/autoresearch-program.md): the core self-improvement loop for roles that score between 10 and 12 out of 15.
QA re-validation: Quinn re-runs quality checks after modifications, enabling iterative quality improvement.