Inside a Production Agent That Actually Ships With Small-Model Orchestration
Beyond the launch posts, DeepSeek V4 is reshaping how researchers approach incident response. We talked to the people actually using it in production.
Section
Autonomous systems, planning, tool use, and the emerging stack for agents that act on the world.
13 stories
The interesting story is not the demo. It is the second month, when long-context workflows either earn their keep or get quietly rolled back.
Intercom is the latest in a string of teams treating small-model orchestration as the default, not the experiment. Here is what they got right — and what they are still figuring out.
Beyond the launch posts, Command R+ 2 is reshaping how analysts approach QBR prep. We talked to the people actually using it in production.
Beyond the launch posts, DeepSeek V4 is reshaping how engineering teams approach expense reporting. We talked to the people actually using it in production.
Booking.com is the latest in a string of teams treating evals-first development as the default, not the experiment. Here is what they got right — and what they are still figuring out.
Zendesk is the latest in a string of teams treating computer-use agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.
The interesting story is not the demo. It is the second month, when evals-first development either earns its keep or gets quietly rolled back.
The interesting story is not the demo. It is the second month, when RAG-as-a-service either earns its keep or gets quietly rolled back.