Claude agents can now dream. Between active sessions, they review their past work, spot recurring mistakes, and rewrite their own memory so they do better next time.
It is one of four updates Anthropic shipped to Claude Managed Agents, and together they change what an AI agent can be trusted to do without a human watching.
Here is what landed.
Dreaming
This is the headline feature. Between active work sessions, an agent now reviews its past work, looks for patterns, spots recurring mistakes, and updates its own memory. Anthropic calls it dreaming because it happens between sessions, almost like the way our brains consolidate the day during sleep.
Why this matters: most AI agents today start fresh every time. They forget what worked, what failed, and what your team actually prefers. Dreaming changes that. The agent gets better at your specific work over weeks and months, not just inside a single conversation.
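To make the consolidation idea concrete, here is a toy sketch of the pattern: scan past session logs for recurring failure tags and promote the repeat offenders into persistent memory notes. This is purely an illustration of the concept; the function names and log shape are invented here, and Anthropic's actual dreaming mechanism runs inside Claude Managed Agents and is not public.

```python
from collections import Counter

def consolidate(session_logs: list[dict], threshold: int = 2) -> list[str]:
    """Toy 'dreaming' pass: find mistakes that recur across sessions
    and turn them into notes worth keeping in long-term memory."""
    errors = Counter(
        tag for log in session_logs for tag in log.get("error_tags", []))
    # Only promote issues seen at least `threshold` times; one-off
    # mistakes are noise, repeated ones are patterns worth remembering.
    return [f"recurring issue: {tag} (seen {n}x)"
            for tag, n in errors.items() if n >= threshold]
```

The key design point is the threshold: consolidation is about separating patterns from noise, which is why it runs over many sessions rather than inside one.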
Dreaming is currently a research preview, so access is gated. You can request access through Anthropic.

Outcomes
You write a rubric describing what good looks like. A separate grader checks the agent's output against that rubric. If the work falls short, the agent tries again. This loop continues until the output passes the bar.
Think of it as a senior reviewer sitting next to a junior who keeps revising until the work is ready to ship.
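The loop itself is simple enough to sketch. This is a minimal illustration of the generate→grade→retry pattern, not Anthropic's Outcomes API: the rubric format, the toy grader, and `run_until_passing` are all invented for this example, and the real feature does its grading server-side with a model, not string matching.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GradeResult:
    passed: bool
    feedback: str

def grade(output: str, rubric: list[str]) -> GradeResult:
    # Toy grader: the output passes only if it covers every rubric item.
    missing = [item for item in rubric if item.lower() not in output.lower()]
    if missing:
        return GradeResult(False, f"missing: {', '.join(missing)}")
    return GradeResult(True, "ok")

def run_until_passing(generate: Callable[[str], str],
                      rubric: list[str], max_attempts: int = 3) -> str:
    feedback = ""
    for _ in range(max_attempts):
        output = generate(feedback)      # agent drafts, seeing last feedback
        result = grade(output, rubric)   # separate grader checks the rubric
        if result.passed:
            return output
        feedback = result.feedback       # failure reasons drive the retry
    return output  # best effort after exhausting attempts
```

The important detail is that the grader is separate from the generator, and its feedback flows into the next attempt, so the retries converge on the rubric instead of repeating the same mistake.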
Anthropic shared internal numbers. Outcomes improved task success rates by up to 10 percentage points compared to standard prompting. File generation saw the biggest gains, with docx quality up 8.4% and pptx quality up 10.1% in their tests.
For any business that has been frustrated by AI output that is almost good but not quite, this is a real shift. You stop reviewing every draft and start reviewing only finished work.
Multiagent Orchestration
A lead agent breaks a complex job into smaller pieces and assigns them to specialist subagents that run in parallel. Each specialist has its own model, prompt, and tools. Think of it as a project manager coordinating a small team rather than one generalist trying to do everything.
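The shape of that pattern can be sketched in a few lines. Everything below is a stand-in: `plan`, the specialist functions, and `orchestrate` are hypothetical illustrations of lead-agent fan-out, not Anthropic's orchestration API.

```python
from concurrent.futures import ThreadPoolExecutor

def plan(job: str) -> list[dict]:
    # Lead agent: break the job into subtasks, each routed to a specialist.
    return [
        {"specialist": "researcher", "task": f"gather facts for: {job}"},
        {"specialist": "writer", "task": f"draft a report on: {job}"},
    ]

# Each specialist would have its own model, prompt, and tools;
# here they are toy functions that just tag their output.
SPECIALISTS = {
    "researcher": lambda task: f"[facts] {task}",
    "writer": lambda task: f"[draft] {task}",
}

def orchestrate(job: str) -> str:
    subtasks = plan(job)
    # Specialists run in parallel rather than one after another.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda st: SPECIALISTS[st["specialist"]](st["task"]), subtasks))
    # In the real system the lead agent synthesizes these results.
    return "\n".join(results)
```

The payoff is the parallel fan-out: independent subtasks finish concurrently, and the lead agent only has to plan and merge.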
Netflix is already using this to analyze build logs at scale. Wisedocs reports their document reviews now run 50% faster.
You can trace every step in the Claude Console, so you see which subagent did what and why.
Webhooks
Less flashy, but important. Your systems get notified when an agent finishes a task. This is what makes fire-and-forget workflows actually work in production. You kick off a long job, walk away, and your tools ping you when the result is ready.
The bigger picture
These four updates all point in one direction. Anthropic is building agents that you can trust to run in the background, on real production work, without constant supervision. Memory stays useful over time. Quality is verifiable. Long jobs get split across specialists. Existing systems get notified when things finish.
For teams thinking about putting AI into their workflows, the practical question changes. It is no longer whether the agent can do the task. It is whether your processes are clear enough to write a rubric, whether your data is ready for a memory system that learns from it, and whether your stack can handle async results.
That is exactly the gap between an interesting demo and a production deployment, and it is where most AI projects quietly stall.
