Pattern: Evaluator-Optimizer Loop

Category: Reasoning
Source: Internal usage (Autoresearch program), FOR-0012 (Reflection)
Status: Active

When to Use

Use this pattern when output quality matters and a first draft is unlikely to be sufficient. The agent generates a draft, evaluates it against criteria, and iteratively improves it until a quality threshold is met or a maximum iteration count is reached. It suits creative tasks, complex reasoning, and any output that benefits from self-critique.

How It Works

  • Generate a first draft or attempt
  • Evaluate the output against defined quality criteria (scoring rubric, test cases, acceptance criteria)
  • If the score meets the target threshold, accept the output
  • If not, diagnose what failed and why
  • Hypothesize a single change that would improve the score
  • Apply the change, re-evaluate, and decide: keep if improved, revert if not
  • Repeat until target met, max iterations reached, or plateau detected (no improvement for N iterations)
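The steps above can be sketched as a small loop. This is a minimal illustration and not the Autoresearch program's actual code: the function name `optimize`, the callables `evaluate` and `propose_change`, and the `patience` parameter (the "N" in plateau detection) are all assumptions introduced for the example.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def optimize(
    draft: T,
    evaluate: Callable[[T], float],
    propose_change: Callable[[T, float], T],
    target: float,
    max_iterations: int = 10,
    patience: int = 3,
) -> Tuple[T, float, str]:
    """Hypothetical evaluator-optimizer loop.

    Returns (best_output, best_score, stop_reason), where stop_reason is
    "target_met", "plateau", or "max_iterations".
    """
    best, best_score = draft, evaluate(draft)
    stale = 0  # consecutive iterations without improvement

    for _ in range(max_iterations):
        if best_score >= target:
            return best, best_score, "target_met"
        if stale >= patience:
            return best, best_score, "plateau"

        # Diagnose and hypothesize: the caller produces ONE changed candidate.
        candidate = propose_change(best, best_score)
        score = evaluate(candidate)

        if score > best_score:
            # Keep the change and reset the plateau counter.
            best, best_score, stale = candidate, score, 0
        else:
            # Revert: discard the candidate and keep the previous best.
            stale += 1

    reason = "target_met" if best_score >= target else "max_iterations"
    return best, best_score, reason
```

Because a candidate is kept only when its score strictly improves, the best score never decreases across iterations, which is what makes revert-on-failure safe. Returning the stop reason lets a caller distinguish a met target from a plateau or an exhausted budget.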

Example

The Autoresearch program in the Role Factory: a role scoring 11/15 enters the auto-improve loop. The optimizer diagnoses which eval cases failed, hypothesizes one change (e.g., sharpen the agent description), modifies one file, re-runs all eval cases, and keeps or reverts. It loops until the score hits 13/15 or 10 iterations pass.
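To make the "n out of 15" score and the diagnosis step concrete, the sketch below scores a role by counting passed eval cases and reports which ones failed. The `EvalCase` type, the `score_role` function, and the three sample checks are hypothetical stand-ins; the real eval cases and file layout are not described here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical eval case: a name plus a pass/fail check on the role text."""
    name: str
    passes: Callable[[str], bool]


def score_role(role_text: str, cases: List[EvalCase]) -> Tuple[int, List[str]]:
    """Return (number of cases passed, names of the cases that failed)."""
    failed = [case.name for case in cases if not case.passes(role_text)]
    return len(cases) - len(failed), failed


# Illustrative checks only; a real suite would have 15 cases.
cases = [
    EvalCase("mentions_scope", lambda text: "scope" in text),
    EvalCase("has_description", lambda text: "description" in text.lower()),
    EvalCase("under_200_chars", lambda text: len(text) < 200),
]

score, failed = score_role("Description: reviews pull requests.", cases)
print(f"{score}/{len(cases)} passed; failed: {failed}")
```

The list of failed case names is what the optimizer would inspect before hypothesizing its single change for the next iteration.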

Tradeoffs

Pro                                              | Con
Systematically improves output quality           | Cost scales with iteration count
One change per iteration makes debugging easy    | Can plateau without reaching target
Revert-on-failure prevents quality regression    | Requires well-defined evaluation criteria
Works autonomously without human intervention    | Diminishing returns after several iterations

Factory Usage

  • Role Factory auto-improve stage (workflows/role-factory/autoresearch-program.md): The core self-improvement loop for roles that score between 10 and 12 out of 15.
  • QA re-validation: Quinn re-runs quality checks after modifications, enabling iterative quality improvement.