Stage 2 — Solution Spec (WO-0008 / Maya)

Owner: Pablo
Started: 2026-05-26 (Stage 1 close kickoff)
Status: OPENED (Stage 1 closes 2026-06-02; Stage 2 active 2026-06-03 → 2026-06-09)
Inputs: order.md, TFD-0023, stage-1-decisions.md, stage-1-claude-md-draft.md, sandbox-report.md


1. Architecture overview

maya/                                 # Bundle root (ships per deployment)
├── CLAUDE.md                         # Schema (from stage-1-claude-md-draft.md)
├── corpus.config.yaml                # Per-deployment config (the only file that changes M1 → M2 → client N)
├── manifest.json                     # Generated index
├── agent/
│   └── agent.md                      # Maya agent definition (multi-turn, bilingual, citation, deflection)
├── skills/
│   ├── maya-ingest.md                # Ingest workflow (post-commit + full rebuild)
│   ├── maya-query.md                 # Query / answer loop
│   ├── maya-deflect.md               # Telegram + JSM routing
│   ├── maya-reindex.md               # Manual reindex command
│   └── lint-wiki.md                  # The 7 named lint checks (also ships standalone in toolkit-catalog)
├── widget/
│   ├── maya.astro                    # Astro component (beta-portal + JCT)
│   ├── maya.bundle.js                # Standalone bundle for non-Astro hosts
│   └── styles.css                    # CSS variables for per-deployment override
└── hooks/
    └── post-commit-ingest.sh         # Git hook installed by deployment script

External:

  • wiki/ — sibling folder, not in the bundle, generated and maintained by Maya.
  • MODEL_BASE_URL — env var for LLM endpoint (Anthropic-compatible).
  • DEFLECTION_TARGET — env var (telegram://... or jsm://...).

2. Manifest schema (manifest.json)

{
  "version": 1,
  "generated_at": "2026-05-26T10:00:00Z",
  "corpus_root": "C:/Projects/talent-factory/",
  "files": [
    {
      "path": "company/decisions/TFD-0019-ea-catalog-storage-delivery.md",
      "id": "TFD-0019",
      "title": "EA Catalog Storage & Delivery Model",
      "summary": "Per-customer Cloudflare D1 + Pages with CSV-validated lifecycle; replaces per-DAE CSV→HTML sprawl.",
      "language": "en",
      "date": "2026-05-15",
      "tags": ["decision", "infrastructure", "ea"],
      "wiki_pages": ["concepts/foundry-independence.md", "concepts/customer-data-isolation.md", "entities/cloudflare-d1-pages.md"],
      "last_ingested": "2026-05-26T09:58:00Z",
      "hash": "sha256:..."
    }
  ]
}

At query time, the manifest is Maya's only index of what is in the corpus. Answers are drawn from the wiki first; source files are read only through the manifest fallback described in Section 3.
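To make the manifest's role concrete, the sketch below types the example entry and runs a naive keyword lookup over title, summary, and tags. The interface fields mirror the JSON above; the function name and the term-overlap scoring are assumptions standing in for whatever search the manifest fallback ends up using.

```typescript
// Field names mirror the example manifest.json; everything else is hypothetical.
interface ManifestFile {
  path: string; id: string; title: string; summary: string; language: string;
  date: string; tags: string[]; wiki_pages: string[]; last_ingested: string; hash: string;
}
interface Manifest {
  version: number; generated_at: string; corpus_root: string; files: ManifestFile[];
}

// Naive stand-in for a manifest summary search: rank entries by how many
// query terms appear in title, summary, or tags. Not the spec's algorithm.
function searchManifest(manifest: Manifest, query: string): ManifestFile[] {
  const terms = query.toLowerCase().split(/[\s,.;:!?()"']+/).filter((t) => t.length > 2);
  const scored = manifest.files.map((file) => {
    const haystack = [file.title, file.summary].concat(file.tags).join(' ').toLowerCase();
    const score = terms.filter((t) => haystack.indexOf(t) !== -1).length;
    return { file, score };
  });
  return scored
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((s) => s.file);
}

const exampleManifest: Manifest = {
  version: 1,
  generated_at: '2026-05-26T10:00:00Z',
  corpus_root: 'C:/Projects/talent-factory/',
  files: [{
    path: 'company/decisions/TFD-0019-ea-catalog-storage-delivery.md',
    id: 'TFD-0019',
    title: 'EA Catalog Storage & Delivery Model',
    summary: 'Per-customer Cloudflare D1 + Pages with CSV-validated lifecycle; replaces per-DAE CSV→HTML sprawl.',
    language: 'en',
    date: '2026-05-15',
    tags: ['decision', 'infrastructure', 'ea'],
    wiki_pages: ['concepts/foundry-independence.md'],
    last_ingested: '2026-05-26T09:58:00Z',
    hash: 'sha256:...',
  }],
};

const manifestHits = searchManifest(exampleManifest, 'Cloudflare storage');
```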

3. Query loop (multi-turn + citation)

User → Widget → Maya agent
                    │
                    ├─► Detect language (FR/EN/mixed)
                    ├─► Search wiki/INDEX.md + concept/entity titles (semantic via Claude Haiku)
                    ├─► If match: read top 2-3 wiki pages → synthesize answer (Sonnet/Opus)
                    │     └─► Return: answer + citations [{page, anchor, source_path}]
                    ├─► If no match in wiki: fall back to manifest summary search
                    │     ├─► If match: read source files from corpus, answer, propose adding concept page
                    │     └─► If still no match: propose deflection
                    └─► Persist non-trivial cross-cutting answers to wiki/answers/<slug>.md
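The branching in the diagram can be sketched as a single function with its dependencies injected. Every name here is hypothetical, the dependencies are synchronous only to keep the sketch short (real model calls would be asynchronous), and the final persistence step to wiki/answers/ is left out.

```typescript
// All names below are hypothetical; the spec defines the flow, not these signatures.
interface WikiHit { page: string; anchor?: string; source_path: string; }

interface QueryDeps {
  detectLanguage(message: string): 'fr' | 'en' | 'mixed';
  searchWiki(message: string): WikiHit[];     // INDEX.md + concept/entity titles
  searchManifest(message: string): string[];  // matching source paths
  synthesize(message: string, inputs: string[], language: string): string;
}

type QueryOutcome =
  | { kind: 'wiki_answer'; answer: string; citations: WikiHit[] }
  | { kind: 'corpus_answer'; answer: string; sources: string[]; proposeConceptPage: true }
  | { kind: 'deflect' };

function runQuery(message: string, deps: QueryDeps): QueryOutcome {
  const language = deps.detectLanguage(message);

  // 1. Wiki first: read at most the top 3 matching pages.
  const hits = deps.searchWiki(message).slice(0, 3);
  if (hits.length > 0) {
    const answer = deps.synthesize(message, hits.map((h) => h.page), language);
    return { kind: 'wiki_answer', answer, citations: hits };
  }

  // 2. Fallback: manifest summary search, then answer from source files.
  const sources = deps.searchManifest(message);
  if (sources.length > 0) {
    const answer = deps.synthesize(message, sources, language);
    return { kind: 'corpus_answer', answer, sources, proposeConceptPage: true };
  }

  // 3. Nothing matched anywhere: propose deflection.
  return { kind: 'deflect' };
}
```

Modelling the result as a tagged union keeps the three branches of the diagram explicit for the caller; that is a design preference of this sketch, not something the spec prescribes.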

4. Widget contract

// What the widget calls
interface MayaQueryRequest {
  message: string;
  conversation_id: string;     // for multi-turn
  language_hint?: 'fr' | 'en';
}

interface MayaQueryResponse {
  answer: string;              // markdown, language matches user input
  citations: Citation[];
  conversation_id: string;
  follow_ups?: string[];       // suggested next questions
  deflection_proposed?: {
    target: 'telegram' | 'jsm' | 'none';
    payload_preview: { title: string; body: string };
  };
}

interface Citation {
  wiki_page: string;           // 'concepts/foundry-independence.md'
  wiki_anchor?: string;        // section anchor
  source_path: string;         // 'company/decisions/TFD-0019-...md'
  display_id: string;          // 'TFD-0019 §3'
  click_url: string;           // resolved to intranet HTML if available, else raw markdown
}
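As a usage illustration of the Citation shape, the sketch below renders a response's citations as a markdown list, one link per distinct click_url. The helper name and the example URL are placeholders, not part of the contract.

```typescript
// Same shape as the Citation interface in the contract above.
interface Citation {
  wiki_page: string;
  wiki_anchor?: string;
  source_path: string;
  display_id: string;
  click_url: string;
}

// Hypothetical widget-side helper: one markdown link per distinct click_url,
// in first-seen order.
function formatCitations(citations: Citation[]): string {
  const seen: { [url: string]: boolean } = {};
  const lines: string[] = [];
  citations.forEach((c) => {
    if (seen[c.click_url]) return;
    seen[c.click_url] = true;
    lines.push(`- [${c.display_id}](${c.click_url})`);
  });
  return lines.join('\n');
}

// The URL is a placeholder; real values depend on the deployment's intranet.
const sampleCitation: Citation = {
  wiki_page: 'concepts/foundry-independence.md',
  source_path: 'company/decisions/TFD-0019-ea-catalog-storage-delivery.md',
  display_id: 'TFD-0019 §3',
  click_url: 'https://intranet.example/decisions/TFD-0019',
};

const citationList = formatCitations([sampleCitation, sampleCitation]);
```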

5. Corpus filter spec (M1 implementation of Q1)

Stored in corpus.config.yaml:

corpus_path: C:\Projects\talent-factory
wiki_path: C:\Projects\talent-factory-maya-wiki
language_primary: fr
deflection_target: telegram://factory-ops

include:
  - company/**/*.md
  - departments/*/role.md
  - departments/*/agent.md
  - departments/**/requests/**/request.md
  - departments/**/requests/**/*-brief.md
  - departments/**/requests/**/output/closing-memo.md
  - departments/**/intake-log.md
  - departments/**/publishing-log.md
  - references/videos/RD-*/notes/*.md
  - references/videos/RD-*/out/flashcard_*.md
  - production-lines/orders/WO-PROD-*/order.md
  - production-lines/orders/WO-PROD-*/process/*-complete.md
  - production-lines/digital-talent/frameworks/**/SPEC.md
  - production-lines/digital-talent/frameworks/**/README.md
  - workflows/**/*.md

exclude:
  - .claude/
  - node_modules/
  - "**/dist/"
  - "**/_publish/"
  - intranet/dist/
  - "**/*.test.md"

6. Deflection payload format

{
  "title": "Question on rendering layer ownership for Macroscope A100",
  "body_markdown": "## Conversation\n\n**User:** ...\n**Maya:** ...\n\n## Wiki pages consulted\n- entities/francois-framework-specialist.md\n- entities/framework-library.md\n\n## Why deflected\nMaya could not produce a confident answer because: ...",
  "citations": [...],
  "language": "fr",
  "deflection_target": "telegram://factory-ops",
  "user_id": "anonymous-widget-session-<id>",
  "conversation_id": "..."
}
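One way the payload above could be assembled from a conversation is sketched here. The input type and function name are hypothetical; the output field names and the three body sections follow the example payload.

```typescript
// Hypothetical input shape; only the returned fields come from the example payload.
interface Turn { speaker: 'User' | 'Maya'; text: string; }

interface DeflectionInput {
  title: string;
  turns: Turn[];
  wikiPages: string[];
  reason: string;
  citations: unknown[];
  language: 'fr' | 'en';
  deflectionTarget: string;
  sessionId: string;
  conversationId: string;
}

function buildDeflectionPayload(input: DeflectionInput) {
  const conversation = input.turns.map((t) => `**${t.speaker}:** ${t.text}`).join('\n');
  const pages = input.wikiPages.map((p) => `- ${p}`).join('\n');
  const body = [
    '## Conversation', '', conversation, '',
    '## Wiki pages consulted', pages, '',
    '## Why deflected',
    `Maya could not produce a confident answer because: ${input.reason}`,
  ].join('\n');
  return {
    title: input.title,
    body_markdown: body,
    citations: input.citations,
    language: input.language,
    deflection_target: input.deflectionTarget,
    user_id: `anonymous-widget-session-${input.sessionId}`,
    conversation_id: input.conversationId,
  };
}

// Illustrative values only.
const samplePayload = buildDeflectionPayload({
  title: 'Question on rendering layer ownership for Macroscope A100',
  turns: [
    { speaker: 'User', text: 'Who owns the rendering layer?' },
    { speaker: 'Maya', text: 'I am not sure.' },
  ],
  wikiPages: ['entities/framework-library.md'],
  reason: 'no wiki page covers ownership',
  citations: [],
  language: 'fr',
  deflectionTarget: 'telegram://factory-ops',
  sessionId: 'abc123',
  conversationId: 'conv-1',
});
```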

7. Stage 2 deliverables (target: 2026-06-09)

  • agent/agent.md — full agent definition (system prompt, behaviors, examples)
  • Manifest generator spec (Python or Node — choice TBD; aligned with model-config-pattern)
  • Skill specs (maya-ingest, maya-query, maya-deflect, maya-reindex, lint-wiki) — header + AC + I/O contract
  • Widget API contract frozen (Section 4 above is the v0; refine if needed)
  • corpus.config.yaml schema validated against 3 real factory paths
  • Deflection payload + endpoint contract for Telegram (M1) and JSM (M2 preview)
  • Test plan for Stage 5 QA (AC checklist from order.md mapped to test cases)

8. Stage 2 acceptance gate

  • All 7 deliverables above checked off
  • Pablo + Quinn (QA) review the spec end-to-end
  • Diego confirms lint-wiki skill spec fits the toolkit-catalog bundle convention (RD-0031 alignment)
  • Kai confirms lint check semantics

9. Risks (Stage 2 specific)

  • Risk: Manifest generator language choice creates a dependency conflict with the existing factory Python/Node stack.
    Mitigation: Audit existing toolchains in the first Stage 2 session; default to whichever has more factory precedent (likely Python, since leanix-catalog-extract is Python).
  • Risk: Widget styling drift across deployments.
    Mitigation: CSS variables only; ship 1 reference deployment (beta-portal) as the visual ground truth.
  • Risk: Bilingual policy edge cases (e.g., FR source quoted in EN page).
    Mitigation: Decide an explicit rule on Stage 2 Day 1; current default = preserve source-language verbatim, synthesis language follows source majority.
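The current bilingual default ("synthesis language follows source majority") could be read as the small function below. This is only one possible interpretation: the function name is hypothetical, and falling back to language_primary on a tie or on empty input is an assumption the spec does not state.

```typescript
type Lang = 'fr' | 'en';

// Pick the synthesis language from the languages of the consulted sources.
// Assumption: a tie (including no sources at all) falls back to the
// deployment's language_primary.
function synthesisLanguage(sourceLanguages: Lang[], primary: Lang): Lang {
  const fr = sourceLanguages.filter((l) => l === 'fr').length;
  const en = sourceLanguages.length - fr;
  if (fr === en) return primary;
  return fr > en ? 'fr' : 'en';
}

const chosenLanguage = synthesisLanguage(['fr', 'en', 'en'], 'fr');
```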