Research Log — ai-layer-kit/agents/explorer.md

Research Log — ai-layer-kit/agents/explorer.md

Target: production-lines/digital-talent/templates/ai-layer-kit/agents/explorer.md Started: 2026-05-26


Iteration #0 — 2026-05-26 (baseline)

Score: 10/12

Failing criteria

  • Q1 — Frontmatter description started with a noun phrase ("Read-only code/content exploration.") instead of an action verb, weakening trigger-matching for the parent agent.
  • Q11 — Body intro restated "You never write, edit, or delete." That theme was already covered by the description AND by Rule 1 (which names Write/Edit/NotebookEdit specifically). One layer was redundant.

Iteration #1 — 2026-05-26

Score: 12/12 (was 10/12, +2) Status: Kept

Changes applied

  1. Q1 fix — Rewrote the description opener from "Read-only code/content exploration." to "Explore code or content read-only." Verb-first; same information; slightly more agent-actionable when the parent scans descriptions.
  2. Q11 fix — Removed the sentence "You never write, edit, or delete." from the body intro. The no-write guarantee now lives in (a) the description, (b) the tools: list omission, and (c) Rule 1 which names the forbidden tools explicitly. Three layers, no redundancy.

Criteria results

# Question (short) Before After Delta
1 Description verb-first NO YES +1
2 "Use when" trigger YES YES
3 Non-action stated YES YES
4 Tools no-write YES YES
5 Body names Write/Edit by name YES YES
6 Bash allow/forbid defined YES YES
7 Rules numbered YES YES
8 "When NOT to call" present YES YES
9 Output format literal YES YES
10 < 50 content lines YES YES
11 No body↔description duplication NO YES +1
12 Numeric threshold / example YES YES

Observations

  • Defense-in-depth ≠ duplication. The criteria correctly distinguished intentional belt-and-suspenders (Rule 1 naming the forbidden tools, complementing the tools: omission) from lazy restating (body intro repeating the description's no-write theme). Worth keeping that distinction in future agent-definition reviews.
  • Verb-first descriptions look trivial but compound: the parent agent's subagent-selection heuristic seems to weight the leading token. Default this for all future agent definitions in the kit.

Next direction

12/12. Possible criteria extensions if we want to keep pushing:

  • "Is the output format compatible with downstream parsing (e.g., distinguishable Markdown sections)?"
  • "Does the file declare the model choice's rationale (model: sonnet — why not haiku?)?" Neither would change the current file; both would extend the eval.