Production Brief — WO-0008: Maya (Conversational KB Talent)

Stage: 1 (Intake; internal pilot, dogfooding)
Owner: Pablo (Production Line Architect)
Date: 2026-05-26
Decision ref: TFD-0023
Technical canon: RD-0017 (Karpathy LLM Wiki)

Client Profile (M1 — internal pilot)

Field	Value
Client	Talent Factory (internal, dogfooding)
Segment	Camille (CS triage), Riley (R&D retrieval), Oscar (CEO knowledge queries)
Domain	Factory knowledge — decisions, requests, KB learnings, HTML deliverables
Methodology	Karpathy LLM Wiki pattern (RD-0017)
Language	FR primary, EN mixed
AI platform	Claude API (Haiku retrieval, Opus/Sonnet synthesis), pluggable per `model-config-pattern`
Corpus path	`C:\Projects\talent-factory\` (filtered — see below)
Documentation platform	Same repo (markdown + HTML)
Naming	`maya-*` skills, `WO-0008` per TFD-0009

Discovery Summary

Business Context

The factory has accumulated ~150 markdown files (TFDs, requests, R&D analyses, KB) plus the intranet's 405 Astro pages and ~30 HTML EA deliverables. Camille currently does client triage by manual grep and memory. Riley re-reads the same RD analyses every R&D session. The CEO asks recurring questions whose answers are spread across 4 to 6 files each. JSM-Confluence deflection (CON-0006 Stack A) was the prior plan; TFD-0023 supersedes it with Maya.

Pain Points

No semantic retrieval over the corpus: keyword search misses bilingual rephrasings and cross-file concepts.
No connection layer: each query starts from zero, so nothing compounds.
Confluence cost and format mismatch: it is paid per seat, cannot render the factory's HTML deliverables (capsules, diagrams), and has weak FR/EN support.
No deflection mechanism: every question becomes a CEO interrupt.

Current State

Markdown corpus: company/decisions/, departments/*/requests/, references/videos/RD-*/, KB lessons.
HTML deliverables: intranet/dist/, EA pages under client OneDrive (out of scope for Maya-Factory M1; in scope for Maya-STM M2).
Beta-portal live (project memory: beta-portal) — host for the Maya widget.
JSM live on jackson-creek-tech.atlassian.net — ticket route target for deflection.

Product Definition

Product Type

A digital talent packaged as a deployable bundle: agent definition + skills + widget + corpus config. Two deployment profiles share the same core (see TFD-0023 Action 1):

M1 — Maya-Factory: corpus = factory repo (filtered). Users = factory team. Host = beta-portal widget.
M2 — Maya-Client (STM POC): corpus = OneDrive-STM/agent-ea/. Users = STM staff via JCT portal. Bundled with EA handover #1.

Architecture (RD-0017 canonical pattern)

maya/
├── raw/                 ← read-only sources (symlink or copy from corpus_path)
├── wiki/                ← agent-maintained markdown KB (the synthesized layer)
├── CLAUDE.md            ← schema: purpose, folders, ingest workflow, formatting, QA
├── corpus.config.yaml   ← {corpus_path, filters, language, deflection_target}
├── manifest.json        ← generated index (titles, summaries, paths)
└── widget/              ← embeddable JS (Astro + standalone bundle)
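
The four keys listed for `corpus.config.yaml` can be pictured as a small typed record. The sketch below is illustrative only: the class name `CorpusConfig`, the helper `changed_fields`, the value formats (`"fr"`, `"telegram"`, `"jsm"`) and the shape of `filters` are all assumptions, not the schema Stage 2 will freeze. It is written as plain Python objects so it runs without a YAML parser.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CorpusConfig:
    """Hypothetical in-memory form of corpus.config.yaml (key names from the brief)."""
    corpus_path: str
    filters: dict
    language: str
    deflection_target: str


# Assumed values for illustration; only the paths and targets come from the brief.
M1 = CorpusConfig(
    corpus_path=r"C:\Projects\talent-factory",
    filters={"include": ["company/"], "exclude": [".claude/"]},
    language="fr",
    deflection_target="telegram",
)
M2 = CorpusConfig(
    corpus_path="OneDrive-STM/agent-ea/",
    filters={},
    language="fr",
    deflection_target="jsm",
)


def changed_fields(a: CorpusConfig, b: CorpusConfig) -> list[str]:
    """Return the names of the config keys that differ between two deployments."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]
```

Comparing the two profiles with `changed_fields(M1, M2)` makes the intent visible: everything that distinguishes M1 from M2 lives in this one record, while the agent, skills and widget stay identical.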

Three behaviors driven by the schema:

Ingest — on add/update of raw source: extract concepts, update existing wiki pages, create new pages, link, log changes.
Query — multi-turn FR/EN; consult wiki first (not raw); cite source paths; flag uncertainty.
Lint — /lint-wiki: contradictions, orphan pages, outdated claims, concepts without page. Folded into RD-0031 toolkit-catalog bundle per TFD-0023.
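
Two of the four lint checks (orphan pages and concepts without a page) are purely structural and can be sketched without a model call. The example below is a minimal illustration, assuming wiki pages link to each other with `[[page-name]]` syntax and that a page named `index` is the entry point; neither convention is fixed by this brief, and the function name `lint_wiki` is hypothetical. Contradictions and outdated claims need the agent itself and are not covered here.

```python
import re

# Assumed link syntax: [[page-name]], optionally [[page-name|label]] or [[page-name#section]].
LINK = re.compile(r"\[\[([^\]|#]+)")


def lint_wiki(pages: dict[str, str], roots: tuple[str, ...] = ("index",)) -> dict[str, list[str]]:
    """Structural lint over a wiki given as {page name: markdown text}."""
    # Outbound links per page, ignoring self-links.
    outbound = {
        name: {target.strip() for target in LINK.findall(text)} - {name}
        for name, text in pages.items()
    }
    linked = set().union(*outbound.values())
    # Orphans: pages nobody links to (entry-point pages are exempt).
    orphans = sorted(n for n in pages if n not in linked and n not in roots)
    # Concepts without a page: link targets that do not exist yet.
    missing = sorted(linked - set(pages))
    return {"orphans": orphans, "missing_pages": missing}
```

A report of this shape could be one section of the `/lint-wiki` output; the model-driven checks would append their own findings.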

Core Capabilities (from order.md, re-prioritized for M1)

#	Capability	M1 priority	Notes
1	Corpus ingestion → manifest + wiki	Must	Manifest-based; no vector DB needed below 10k docs
2	Conversational retrieval (multi-turn FR/EN)	Must	Claude long context + manifest
3	Citation by paragraph	Must	Native Claude API citations
4	Deflection → Telegram (M1) / JSM (M2)	Must	M1 uses Telegram (existing channel); M2 uses JSM
5	Embeddable widget	Must	Astro component for beta-portal
6	Bilingual native	Must	No language toggle
7	Per-deployment corpus	Must	`corpus.config.yaml` is the only thing that changes between deployments
8	Re-indexing on commit	Should	Git hook → manifest refresh <60s
9	`/lint-wiki` skill	Should	Co-developed with RD-0031
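
To make capability 1 concrete, here is a minimal sketch of a manifest build over markdown files, producing the three fields the architecture names (titles, summaries, paths). It rests on assumptions: the title is taken from the first `#` heading (falling back to the file name), the summary is the first non-heading line, and the function names and the 200-character cap are hypothetical. The real manifest schema is a Stage 2 deliverable.

```python
import json
from pathlib import Path, PurePosixPath


def manifest_entry(rel_path: str, text: str, max_summary: int = 200) -> dict:
    """Build one manifest record from a markdown file's relative path and content."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Title: first heading line with its leading '#' marks removed, else the file stem.
    title = next(
        (ln.lstrip("# ").strip() for ln in lines if ln.startswith("#")),
        PurePosixPath(rel_path).stem,
    )
    # Summary: first non-empty line that is not a heading, truncated.
    summary = next((ln for ln in lines if ln and not ln.startswith("#")), "")
    return {"path": rel_path, "title": title, "summary": summary[:max_summary]}


def build_manifest(root: Path) -> list[dict]:
    """Walk a corpus root and return manifest records for every markdown file."""
    return [
        manifest_entry(p.relative_to(root).as_posix(), p.read_text(encoding="utf-8"))
        for p in sorted(root.rglob("*.md"))
    ]


def write_manifest(root: Path, out: Path) -> None:
    """Serialize the manifest; ensure_ascii=False keeps French accents readable."""
    out.write_text(json.dumps(build_manifest(root), ensure_ascii=False, indent=2), encoding="utf-8")
```

Because the build is a single pass over files, a git hook (capability 8) could simply call `write_manifest` after each commit; whether that stays under the 60-second target on the full corpus is something the sandbox would need to confirm.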

Out of M1 scope

Vector DB, authentication (host portal handles it), analytics dashboard (JSM handles deflection rate), HTML deliverables ingestion beyond markdown extraction (M2 problem).

Scope

In: Conversational RAG with citations, deflection routing, embeddable widget, FR/EN, manifest-based indexing, wiki ingest workflow, /lint-wiki.
Out (v1): Vector DB, auth, analytics, multi-tenant (each Maya is single-corpus by design).
Compliance: Each instance reads only its configured corpus. No cross-tenant leakage. Citations always include source path.
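
The compliance rule (read only the configured corpus, always cite a source path) can be enforced with a path check before any file is read or cited. The sketch below is one possible guard, assuming citations carry paths relative to the corpus root; the function name `resolve_in_corpus` is hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_in_corpus(corpus_root: Path, cited_path: str) -> Path:
    """Resolve a cited path and refuse anything that lands outside the corpus root."""
    root = corpus_root.resolve()
    # Joining then resolving collapses '..' segments; an absolute cited_path
    # replaces the root entirely, so it is caught by the same check.
    target = (root / cited_path).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"citation escapes corpus: {cited_path}")
    return target
```

Running every citation through a guard like this before it reaches the widget would turn "no cross-tenant leakage" from a convention into a hard failure.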

Feasibility Assessment

Risk: Low. The pattern is proven by RD-0017. The stack is factory-native (markdown + Claude Code + Astro), so no new infrastructure is needed. order.md is fully specified (AC + DoD already written). The 4-week sequence per TFD-0023 has slack: the week-4 STM POC depends only on M1 and on STM corpus access (already available).

Open Questions for Stage 2

Corpus filters for M1 — which paths in talent-factory/ are in vs out? Proposal: include company/, departments/*/requests/, references/videos/, production-lines/orders/*/order.md. Exclude .claude/, node_modules/, intranet/dist/, OneDrive client folders.
Wiki location — does the wiki live in the repo (maya/wiki/ committed) or in a sibling folder? Pablo to decide based on git noise tolerance.
Re-index cadence — git hook (every commit) or scheduled (hourly)? Cheap to try both.
Widget styling — match Anthropic warm cream / Trustworthy Blue per docs-design-system?
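
The corpus filter proposal in the first question can be tried out before it is decided. The sketch below encodes the proposed include and exclude paths as patterns anchored at the repo root, where a trailing `/` means "anything under this folder" and `*` matches exactly one path segment. These matching rules and the names `INCLUDE`, `EXCLUDE` and `in_corpus` are assumptions for illustration; the OneDrive client folders are left out because the brief gives no path for them.

```python
from fnmatch import fnmatchcase

# Patterns taken from the proposal; all are relative to the repo root.
INCLUDE = [
    "company/",
    "departments/*/requests/",
    "references/videos/",
    "production-lines/orders/*/order.md",
]
EXCLUDE = [".claude/", "node_modules/", "intranet/dist/"]


def _matches(path: str, pattern: str) -> bool:
    """Segment-wise match so that '*' never crosses a '/' boundary."""
    parts = path.split("/")
    pat = pattern.rstrip("/").split("/")
    if pattern.endswith("/"):
        # Folder pattern: the path must sit strictly below the folder.
        if len(parts) <= len(pat):
            return False
        parts = parts[: len(pat)]
    elif len(parts) != len(pat):
        # File pattern: segment counts must agree exactly.
        return False
    return all(fnmatchcase(p, q) for p, q in zip(parts, pat))


def in_corpus(path: str) -> bool:
    """Exclusions win over inclusions; anything unmatched is out."""
    if any(_matches(path, p) for p in EXCLUDE):
        return False
    return any(_matches(path, p) for p in INCLUDE)
```

Running `in_corpus` over the output of `git ls-files` would give a quick count of what M1 would ingest under this proposal, which may help settle the question in stage-1-decisions.md.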

4-Week Sequence (per TFD-0023)

Week	Stage	Owner	Output
1 (now → 2026-06-02)	Stage 1 close + Stage 2 design	Pablo + Riley	Sandbox proof (3 TFDs) + Stage 2 solution spec
2 (2026-06-03 → 09)	Stage 3 pattern selection + Stage 4 build start	Pablo	CLAUDE.md schema frozen; ingest + query skills built
3 (2026-06-10 → 16)	Stage 4 build complete + Stage 5 QA	Pablo + Quinn	Maya-Factory live for Camille; QA cert
4 (2026-06-17 → 23)	Stage 6 deploy + Stage 7 delivery (STM POC)	Diego + Dana	Maya-STM bundled with EA handover #1

Week-1 Action List (Pablo)

Run Riley's RD-0017 sandbox (~25 min): create process/sandbox/{raw,wiki,CLAUDE.md}, ingest TFD-0019, TFD-0021 and TFD-0022, then run a cross-cutting question and /lint-wiki. Capture the transcript in process/sandbox/sandbox-report.md. This is the Stage 1 acceptance gate.
Decide the 4 open questions above — drop a one-pager process/stage-1-decisions.md.
Adapt CLAUDE.md schema from Karpathy starter to factory context (FR primary, citation format machine-parseable for widget, deflection to Telegram for M1, JSM for M2).
Open Stage 2 — process/stage-2-solution-spec.md covering: manifest schema, query loop (multi-turn + citation), widget contract, corpus filter spec, deflection payload format.
Sync with Diego on RD-0031 bundle convention so /lint-wiki ships in the right channel from day 1.
Sync with Riley on what RD-0017 did not answer: write any residual unknowns as REQ-EXEC tickets; do not absorb them silently into the build.

References

production-lines/orders/WO-0008-maya-conversational-kb-talent/order.md — full AC + DoD
company/decisions/TFD-0023-maya-load-bearing-infrastructure-talent.md — sequencing & ownership
references/videos/RD-0017/out/flashcard_rd-0017.md — technical pattern (canon)
references/videos/RD-0017/intrants/transcript_rd-0017.md — source material
TFD-0009 (request folder standard), TFD-0012 (R&D pipeline), TFD-0021 (toolkit-catalog Go)
Memory: maya-rag-wiki-pattern, delivery-model-foundry-not-hosted, documentation-format, model-config-pattern

Stage 1 Acceptance Gate

Stage 1 closes when:

Sandbox proof runs end-to-end on 3 TFDs and produces a non-trivial wiki + lint report
4 open questions answered in stage-1-decisions.md
Adapted CLAUDE.md schema committed as process/stage-1-claude-md-draft.md
Stage 2 solution spec opened (an empty header is enough)

Target close date: 2026-06-02 EOD.