Shipping AI You Can Defend: A QA Field Report
Beyond the launch posts, Llama 4 is reshaping how analysts approach research synthesis. We talked to the people actually using it in production.
The archive
100 stories from the Inference Daily desk.
Beyond the launch posts, DeepSeek V4 is reshaping how engineering teams approach expense reporting. We talked to the people actually using it in production.
Duolingo is the latest in a string of teams treating small-model orchestration as the default, not the experiment. Here is what they got right — and what they are still figuring out.
Booking.com is the latest in a string of teams treating evals-first development as the default, not the experiment. Here is what they got right — and what they are still figuring out.
Beyond the launch posts, Gemini 3 Pro is reshaping how founders approach pricing analysis. We talked to the people actually using it in production.
Snowflake is the latest in a string of teams treating RAG-as-a-service as the default, not the experiment. Here is what they got right — and what they are still figuring out.
Zendesk is the latest in a string of teams treating computer-use agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.
Beyond the launch posts, Phi-4 is reshaping how engineering teams approach code review. We talked to the people actually using it in production.
The interesting story is not the demo. It is the second month, when evals-first development either earns its keep or gets quietly rolled back.