R&D Scan Digest

Source: departments/executive/rd-analyst-riley/scans/2026-05-11-scan.md

Date: 2026-05-11
Analyst: R&D Analyst (Riley)
Coverage window: 2026-03-30 → 2026-05-11 (since prior scan)

Context

The last scan was on 2026-03-29, when Computer Use, Claude Dispatch, and MCP Channels were promoted to Evaluate. Since then Anthropic held the Code w/ Claude developer event (May 6) and shipped a major release wave on April 16 that included Opus 4.7 GA. The factory is already running on Opus 4.7 (1M context), so this scan focuses on shifts in the surrounding ecosystem.

Discoveries

Evaluate

| # | Discovery | Type | Why It Matters |
|---|---|---|---|
| 1 | Claude Managed Agents (public beta) | Platform | Cloud-hosted agents at scale: sandboxing, long-running sessions, scoped permissions, tracing, webhooks, multi-agent orchestration. $0.08/runtime-hour + token costs. Could become the delivery substrate for digital talents we ship to SMBs: instead of installing Claude Code on each client's machine, we host the talent. Directly affects our deployment model. Source |
| 2 | MCP Tool Search (lazy loading) | Feature | Auto-activates when MCP tool definitions exceed 10% of context. Cuts MCP context cost by roughly 89-95% (12K→600 tokens on 3 servers; 77K→8.7K on 50 tools). Accuracy on Opus 4 jumped 49→74% on MCP evals. Factory uses Atlassian, Telegram, Figma, Context7, Gmail MCPs, so this is an immediate context budget win. Source |
| 3 | Dreaming (Managed Agents research preview) | Feature | Scheduled background process that reviews past agent sessions, extracts recurring patterns, and curates memory stores. Harvey saw 6× task-completion improvement. Direct parallel to our /method-improve and /ci loops; Anthropic now provides this as infrastructure. Worth comparing to our auto-memory system. Source |
| 4 | Task Budgets (Opus 4.7) | Feature | Model receives a token-target budget for the full agentic loop and prioritizes work as the countdown runs. Relevant for /role-factory, /auto-research, and any long-horizon production-line stage where we currently lose runs to context exhaustion. Source |
| 5 | Plugin .zip + --plugin-url loading | Feature | --plugin-dir accepts .zip; --plugin-url fetches a plugin archive for the session. This is the distribution mechanic for shipping digital talents: package the talent as a zip, host it, client runs claude --plugin-url .... Pairs with watch item #7 (official plugin marketplace). Source |
| 6 | Auto-mode hard deny rules | Feature | New settings layer that blocks specific actions unconditionally in auto mode. Critical guardrail for the "autonomous execution" pattern (memory: autonomous-execution). Also opens safer client-side deployment. Source |
| 7 | Claude Cowork GA + enterprise features | Platform | RBAC, group spend limits, expanded usage analytics, OpenTelemetry, Zoom MCP, per-tool connector controls. Relevant when factory or a delivered talent serves multiple seats at a client. Maps directly to STM's enterprise requirements. Source |
| 8 | Multi-agent orchestration in Managed Agents (public beta) | Feature | Promoted from Watch (Agent Teams, 2026-04-28). Re-eval triggered. Same primitive but now positioned as a hosted product, not just an experimental flag. Even though the user prefers subagents internally, this is the supply chain for delivering coordinated digital talents to clients. Source |
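
The context-savings figures quoted for MCP Tool Search can be checked with a few lines of arithmetic. The sketch below uses only the token counts given in the Evaluate table; the function name is illustrative.

```python
def context_reduction(before_tokens: float, after_tokens: float) -> float:
    """Return the fraction of context saved when tool definitions shrink."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return 1 - after_tokens / before_tokens


# Token counts quoted in the MCP Tool Search row of the Evaluate table.
three_servers = context_reduction(12_000, 600)
fifty_tools = context_reduction(77_000, 8_700)

print(f"3 servers: {three_servers:.1%}")
print(f"50 tools: {fifty_tools:.1%}")
```

The 3-server case works out to 95%, while the 50-tool case lands nearer 89%, so real factory savings are worth measuring against our own MCP stack rather than taken from the headline figure.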

Watch

| # | Discovery | Type | Re-eval Date | Note |
|---|---|---|---|---|
| 1 | High-resolution vision (Opus 4.7) | Feature | 2026-06-10 | 2576px / 3.75MP images supported (up from 1568px / 1.15MP). Useful for DXF/floor-plan extraction (CON-0004) and screenshot-driven UX work. Test next time a vision task fails on resolution. Source |
| 2 | Filesystem memory improvements (Opus 4.7) | Feature | 2026-06-10 | Model is "better at writing and using file-system-based memory." Our auto-memory system should benefit passively; revisit if we want to evolve memory structure. |
| 3 | Hooks see effort level ($CLAUDE_EFFORT) | Feature | 2026-06-10 | Hooks can branch on effort level. Minor, but enables effort-aware guardrails (e.g., skip expensive checks in effort low). |
| 4 | worktree.baseRef config | Feature | 2026-06-10 | Choose remote-default vs. local HEAD for new worktrees. Quality-of-life for our production-line worktree pattern. |
| 5 | Plugin marketplace explosion (4,200+ skills, 770+ MCPs) | Market | 2026-06-10 | Several third-party catalogs now exist (claudemarketplaces.com, SkillsMP, tonsofskills.com). Volume signal: worth scanning for high-quality skills before building our own. |
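
The effort-aware guardrail idea in watch item 3 can be sketched as a small branching helper. This assumes the hook process can read the effort level from the `CLAUDE_EFFORT` environment variable and that `low` is one of its values, both taken from the note above; the check names and the `checks_to_run` helper are hypothetical.

```python
import os

# Hypothetical check names; real guardrails would be factory-specific.
CHEAP_CHECKS = ["lint"]
EXPENSIVE_CHECKS = ["full_test_suite", "security_scan"]


def checks_to_run(env=None):
    """Pick guardrail checks based on the effort level exposed to hooks."""
    env = os.environ if env is None else env
    # Assumption: an unset variable is treated like a non-low effort level.
    effort = env.get("CLAUDE_EFFORT", "").strip().lower()
    if effort == "low":
        return list(CHEAP_CHECKS)
    return CHEAP_CHECKS + EXPENSIVE_CHECKS


if __name__ == "__main__":
    print(checks_to_run())
```

Passing the environment as a parameter keeps the decision testable without touching the real process environment.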

Skip

| # | Discovery | Type | Reason Skipped |
|---|---|---|---|
| 1 | /buddy (April 1 release) | Feature | April Fools cosmetic. |
| 2 | Higgsfield MCP, Meta Ads MCP, Google Ads MCP, Klaviyo MCP, Shopify AI Toolkit | MCP | Vertical marketing/commerce, outside factory scope. Re-evaluate only if a client needs them. |
| 3 | Memory leak fixes, CJK history fix, iTerm2 /copy, MCP startup auto-retry, session-ID header | Bugfix | Stability/QoL. No action. |
| 4 | VCS exclusions for Jujutsu/Sapling | Bugfix | We use git only. |
| 5 | Claude Opus 4.7 model alias rename ("opus" → "default") | API | Affects ACP downstream apps only; no factory impact. |

Watch List Re-evaluation (auto-promotion)

Today is 2026-05-11. Items 1–8 and 10 on the existing watch list had re-eval dates of 2026-04-25 or 2026-04-28, so all are past due. Resolution:

| Old Watch # | Topic | Decision | Rationale |
|---|---|---|---|
| 1 | Voice Mode | Demote → drop | Convenience-only, no new evidence. Stop tracking. |
| 2 | Google Colab MCP | Hold (re-add) | No new evidence. Re-watch with date 2026-08-11 (only relevant once we have an ML production line). |
| 3 | Worktree Sparse Checkout | Drop | Repo growth has not become a constraint. |
| 4 | MCP Enterprise Readiness | Promote → Evaluate #7 | Now embodied in Cowork GA (RBAC, OpenTelemetry). Folded into Evaluate row #7. |
| 5 | Context Engineering | Drop | Factory already practices this implicitly; no specific artifact to evaluate. |
| 6 | Agent Teams (experimental) | Promote → Evaluate #8 | Now part of Managed Agents multi-agent orchestration. Folded into Evaluate row #8. |
| 7 | Official Plugin Marketplace | Promote → Evaluate #5 | Combined with plugin .zip / --plugin-url distribution mechanic. |
| 8 | Effort Frontmatter for Skills | Drop | Available and trivial to use; if needed, set per-skill on demand. No evaluation required. |
| 10 | Hyper Agents (Meta AI) | Hold (re-add) | No production tooling yet. Re-watch with date 2026-08-11. |

Item 9 (Computer Use, re-eval 2026-05-28) remains on hold because its re-eval date has not yet passed.
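
The past-due rule applied in this section reduces to a date comparison. A minimal sketch, using only the two re-eval dates stated explicitly in this digest and assuming the 2026-04-28 date attached to Agent Teams is its re-eval date; the `WatchItem` type and `is_past_due` helper are illustrative, not part of the factory tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WatchItem:
    number: int
    topic: str
    re_eval: date


def is_past_due(item: WatchItem, today: date) -> bool:
    """An item needs a decision once its re-eval date has passed."""
    return item.re_eval < today


today = date(2026, 5, 11)
items = [
    WatchItem(6, "Agent Teams (experimental)", date(2026, 4, 28)),
    WatchItem(9, "Computer Use", date(2026, 5, 28)),
]

for item in items:
    status = "past due" if is_past_due(item, today) else "on hold"
    print(f"#{item.number} {item.topic}: {status}")
```

Whether an item whose re-eval date equals the scan date counts as due is a policy choice; the strict comparison here follows the "date not yet passed" wording.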

Radar Cross-Reference

| Radar Item | Current Ring | Suggested Change | Rationale |
|---|---|---|---|
| Claude Code (Opus 4.6) | Adopt | Update label to Opus 4.7 | We are already running 4.7 (1M ctx) per system prompt. Radar entry is stale. |
| Paperclip | Assess | No change | No new evidence. |

Potential new radar entries (pending evaluation outcomes):

  • Claude Managed Agents → Assess (Platforms) — likely delivery substrate for hosted digital talents.
  • MCP Tool Search → Trial (Tools) — low-risk, immediate context budget win; can be turned on now.
  • Dreaming → Assess (Techniques) — compare against our existing memory + /method-improve loop.
  • Plugin .zip / --plugin-url distribution → Assess (Techniques) — pairs with delivery pattern.
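
The packaging half of the .zip distribution idea can be sketched with the Python standard library. This assumes a talent lives in a single directory; the directory layout and the `package_talent` helper are hypothetical, and the sketch makes no claim about what structure Claude Code expects inside a plugin archive.

```python
import zipfile
from pathlib import Path


def package_talent(source_dir: str, archive_path: str) -> list[str]:
    """Zip every file under source_dir, storing paths relative to it."""
    root = Path(source_dir)
    if not root.is_dir():
        raise NotADirectoryError(source_dir)
    archive = Path(archive_path).resolve()
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself if it sits inside root.
            if path.is_file() and path.resolve() != archive:
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

Returning the list of archived names makes it easy to log or diff exactly what was shipped to a client.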

Recommended Next Steps

In priority order — each is a /rd-evaluate candidate:

  1. /rd-evaluate "Claude Managed Agents (public beta)" — Highest priority. Could re-shape the delivery model: instead of installing Claude Code per-client, we host digital talents. Cost ($0.08/runtime-hour + tokens), security model, multi-tenancy, and how it interacts with our current OneDrive-handover pattern all need scoring.

  2. /rd-evaluate "MCP Tool Search lazy loading" — Quick win. Likely Trial-by-default decision; mostly about confirming it auto-activates correctly with our MCP stack (Atlassian, Telegram, Figma, Context7, Gmail) and measuring real factory context savings.

  3. /rd-evaluate "Dreaming for factory memory + continuous improvement" — Strategic. We already have /ci, /method-improve, and an auto-memory system. Question: does Dreaming subsume any of these, or stack on top? Worth a head-to-head.

  4. /rd-evaluate "Plugin .zip / --plugin-url distribution model" — Folds in watch item #7. Maps directly to the digital-talent packaging decision (/deploy-package) and could replace bespoke deployment scripts.

  5. /rd-evaluate "Task budgets in Opus 4.7" — Tactical. Test on /role-factory and /toolkit:auto-research to see whether explicit budgets change long-horizon behavior.

  6. /rd-evaluate "Cowork GA enterprise features for STM/multi-seat clients" — Lower urgency but unlocks the enterprise sales motion. Folds in old Watch #4 (MCP Enterprise Readiness).

Items 6 and 8 in the Evaluate table can be bundled into evaluation #1 (Managed Agents) — they are sub-capabilities of the same platform shift.
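
For the cost dimension of evaluation #1, the runtime component is simple to model. A rough sketch, assuming the $0.08/runtime-hour rate quoted in this digest and that billing is linear in runtime hours; token spend is passed in separately because this digest does not state token prices, and the function name and example hours are illustrative.

```python
RUNTIME_RATE_USD_PER_HOUR = 0.08  # rate quoted in this digest


def managed_agent_cost(runtime_hours: float, token_cost_usd: float = 0.0) -> float:
    """Estimate hosted-agent spend: runtime fee plus separately measured token cost."""
    if runtime_hours < 0 or token_cost_usd < 0:
        raise ValueError("inputs must be non-negative")
    return runtime_hours * RUNTIME_RATE_USD_PER_HOUR + token_cost_usd


# Illustrative: one talent kept running for a 30-day month, runtime fee only.
always_on = managed_agent_cost(24 * 30)
print(f"Runtime fee for 720 hours: ${always_on:.2f}")
```

At this rate the runtime fee for an always-on talent is small, so token spend is likely the figure the evaluation needs to measure most carefully.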