Red-Teaming GPT-5.1: What We Learned
Beyond the launch posts, Llama 4 is reshaping how sales teams approach incident response. We talked to the people actually using it in production.
The archive
100 stories from the Inference Daily desk.
The interesting story is not the demo. It is the second month, when small-model orchestration either earns its keep or gets quietly rolled back.
The interesting story is not the demo. It is the second month, when tool-first agents either earn their keep or get quietly rolled back.
Stripe is the latest in a string of teams treating tool-first agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.
The interesting story is not the demo. It is the second month, when tool-first agents either earn their keep or get quietly rolled back.
Atlassian is the latest in a string of teams treating long-context workflows as the default, not the experiment. Here is what they got right — and what they are still figuring out.
The interesting story is not the demo. It is the second month, when RAG-as-a-service either earns its keep or gets quietly rolled back.
Beyond the launch posts, Claude 4.5 Sonnet is reshaping how small studios approach code review. We talked to the people actually using it in production.
The interesting story is not the demo. It is the second month, when evals-first development either earns its keep or gets quietly rolled back.