Stage 5: Quality Gate / Testing

Owner: Quinn (QA Engineer)
Blueprint stage: Stage 5 -- Quality Gate / Testing


Trigger

Build artifacts submitted by Pablo with completed assembly checklist (Stage 4 gate passed).

Inputs

Source | What
Pablo (Stage 4) | Complete agent repository, assembly checklist, test data, known limitations
Pablo (Stage 4) | Test data package: sample inputs per skill, synthetic data with planted gaps (for validation skills), expected output examples
Work order | Product-specific test cases (Section 8 or equivalent), acceptance criteria

Prerequisites (Added for Beta)

  • Phase G documentation package complete:
    • User guide
    • Configuration guide
    • Skill reference card
    • Quick-start cheat sheet
  • QA CANNOT pass if Phase G docs are missing or incomplete

Activities

5.0 Test Environment Setup

  • Start a fresh Claude Code session (no prior conversation context)
  • Use the model specified in the work order for each skill (or default: Sonnet for analysis, Haiku for execution)
  • Create a clean test working directory separate from the build directory
  • Do not reuse outputs between test cases unless testing pipeline integration

5.1 Structural Validation

Verify: CLAUDE.md (all required sections), all skills in .claude/commands/ match solution spec, all templates present, reference materials loaded, documentation complete.
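Part of the structural check above can be scripted. The following is a minimal sketch, not the team's actual tooling: the function name `structural_report` and the `expected_skills` set (taken by hand from the solution spec) are hypothetical, and it assumes each skill is a single Markdown file named `<skill>.md` in .claude/commands/. It does not check CLAUDE.md sections, templates, or reference materials.

```python
from pathlib import Path


def structural_report(repo_root: Path, expected_skills: set[str]) -> dict[str, list[str]]:
    """Compare the repository layout against the skills named in the solution spec."""
    commands_dir = repo_root / ".claude" / "commands"
    # Assumption: one Markdown file per skill, named <skill>.md.
    found = {p.stem for p in commands_dir.glob("*.md")} if commands_dir.is_dir() else set()
    return {
        # CLAUDE.md must exist at the repository root.
        "missing_files": [] if (repo_root / "CLAUDE.md").is_file() else ["CLAUDE.md"],
        # Skills in the spec with no command file.
        "missing_skills": sorted(expected_skills - found),
        # Command files that the spec does not mention.
        "unexpected_skills": sorted(found - expected_skills),
    }
```

An empty list under every key means this subset of the structural check is clean; any non-empty list is a finding to record in the test execution results.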

Test data requirement: Pablo must provide test data as part of the Stage 4 handoff. At minimum: one sample input per functional test case, synthetic data with known planted issues for validation/quality skills, and expected output format examples.

5.2 Functional Testing

Test cases are derived from the work order and solution specification. For each capability in scope, verify that the corresponding skill:

  • Accepts the specified inputs
  • Produces the expected output format
  • Follows naming conventions
  • Uses correct domain terminology and language

Test each skill in isolation first (unit testing). Then test multi-step pipelines end-to-end (integration testing). Record which mode each test used.

5.3 Edge Case Testing

Standard edge cases (apply to all products):

# | Test Case | Expected Behavior | Concrete Scenario
E1 | Empty input | Reports missing input, no garbage output | Invoke skill with no file argument, or with a path to an empty file
E2 | Ambiguous request | Asks clarifying questions or escalates | Provide input that could apply to 2+ skills
E3 | Conflicting requirements | Identifies conflict, flags for human | Input contains contradictory constraints
E4 | Large input | Handles without truncation or degradation | Provide input exceeding 5 pages / 2000 words
E5 | Wrong skill invoked | Redirects to correct skill or reports mismatch | Call a requirements skill with solution-phase input

Product-specific edge cases from the work order are added to this list.
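The input files for E1 and E4 can be generated rather than written by hand. This is an illustrative sketch only: the function name `make_edge_inputs`, the file names, and the filler text are hypothetical. The 2000-word figure is the E4 threshold from the list above.

```python
from pathlib import Path


def make_edge_inputs(test_dir: Path, min_words: int = 2000) -> dict[str, Path]:
    """Create an empty file for E1 and an oversized file for E4 in test_dir."""
    test_dir.mkdir(parents=True, exist_ok=True)

    # E1: a path to an empty file.
    empty = test_dir / "e1_empty.txt"
    empty.write_text("")

    # E4: synthetic filler that exceeds the word threshold by one word.
    large = test_dir / "e4_large.txt"
    large.write_text(" ".join(f"word{i}" for i in range(min_words + 1)))

    return {"E1": empty, "E4": large}
```

Generated filler only exercises size handling. Checking E4 for truncation or degradation still requires content whose expected output is known.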

5.4 Documentation Testing

# | Test Case | Method
D1 | User guide accuracy | Follow workflows, verify behavior matches
D2 | Skill reference accuracy | Compare descriptions to actual I/O
D3 | CLAUDE.md accuracy | Verify skills table matches .claude/commands/

5.5 Scoring

PASS: Functional 100% AND edge case >= 80% AND documentation 100%
CONDITIONAL: Functional 100% AND edge case >= 60% with remediation plan
FAIL: Any functional failure OR edge case < 60%

Note: Product-specific thresholds in the work order override these defaults when specified.
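The default scoring rules can be expressed as a small function. This is a sketch under stated assumptions, not a definitive implementation: the names `Thresholds` and `qa_status` are hypothetical. The rules above do not say what happens when a run meets the CONDITIONAL numbers but has no remediation plan, or when edge cases reach 80% but documentation is below 100%. The sketch treats the first as FAIL and the second as CONDITIONAL (when a plan exists); both choices are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Default edge-case thresholds; a work order may override them."""
    edge_pass: float = 80.0
    edge_conditional: float = 60.0


def qa_status(functional_pct: float, edge_pct: float, docs_pct: float,
              has_remediation_plan: bool = False,
              t: Thresholds = Thresholds()) -> str:
    # FAIL: any functional failure OR edge case below the conditional floor.
    if functional_pct < 100 or edge_pct < t.edge_conditional:
        return "FAIL"
    # PASS: functional 100% AND edge case at or above pass bar AND documentation 100%.
    if edge_pct >= t.edge_pass and docs_pct >= 100:
        return "PASS"
    # CONDITIONAL requires a remediation plan.
    if has_remediation_plan:
        return "CONDITIONAL"
    # Assumption: without a remediation plan the result is treated as FAIL.
    return "FAIL"
```

For example, `qa_status(100, 70, 100, has_remediation_plan=True)` returns `"CONDITIONAL"`, and passing `t=Thresholds(edge_pass=90)` models a work order that raises the pass bar.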

5.6 Remediation

On FAIL or CONDITIONAL: Quinn documents each failing test case with the specific failure observed. Pablo fixes the issues and re-submits with a change delta (what was modified and why). Quinn re-tests only the previously failing cases plus any cases affected by the changes. Each remediation loop increments an iteration count; after 3 failed iterations, escalate to Clara (CTO).
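The re-test scope and escalation rule can be sketched as follows. The function names and test case IDs are hypothetical. Which cases count as "affected by the changes" is a judgment Quinn makes from Pablo's change delta, so it is passed in as data here rather than computed.

```python
MAX_FAILED_ITERATIONS = 3  # escalate to Clara (CTO) after 3 failed iterations


def retest_scope(previously_failing: set[str], affected_by_changes: set[str]) -> list[str]:
    """Cases to re-run: earlier failures plus anything the change delta touches."""
    return sorted(previously_failing | affected_by_changes)


def next_step(failed_iterations: int) -> str:
    """Decide where a failed remediation loop goes next."""
    if failed_iterations >= MAX_FAILED_ITERATIONS:
        return "escalate to Clara (CTO)"
    return "return to Pablo for fixes"
```

For instance, `retest_scope({"E2", "F1"}, {"F1", "D3"})` yields the three cases D3, E2, and F1, with F1 listed once even though it appears in both sets.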

Outputs

Deliverable | Format | Destination
QA certification | Markdown | PASS -> Diego, FAIL -> Pablo
Test execution results | Markdown | Attached to certification
Remediation plan (if CONDITIONAL) | Markdown | Tracked by Pablo

Quality Gate

Gate: QA PASS certification.
Evidence: QA certification with status, all test results, scoring.