Evals Are the New Unit Tests: Notes From Ramp
The interesting story is not the demo. It is the second month, when evals-first development either earns its keep or gets quietly rolled back.
The archive
100 stories from the Inference Daily desk.
Beyond the launch posts, Qwen 3 is reshaping how engineering teams approach customer support. We talked to the people actually using it in production.
The interesting story is not the demo. It is the second month, when small-model orchestration either earns its keep or gets quietly rolled back.
Beyond the launch posts, Gemini 3 Pro is reshaping how analysts approach pricing analysis. We talked to the people actually using it in production.
Figma is the latest in a string of teams treating computer-use agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.
The interesting story is not the demo. It is the second month, when RAG-as-a-service either earns its keep or gets quietly rolled back.
HubSpot is the latest in a string of teams treating computer-use agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.
Beyond the launch posts, Grok 4 is reshaping how sales teams approach contract review. We talked to the people actually using it in production.
Snowflake is the latest in a string of teams treating tool-first agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.