The archive

All articles

100 stories from the Inference Daily desk.

AI QA·May 12, 2026

Evals Are the New Unit Tests: Notes From Ramp

The interesting story is not the demo. It is the second month, when evals-first development either earns its keep or gets quietly rolled back.

Priya Raman·3 min read

AI Agents·May 12, 2026

Long-Context Workflows: The Pattern Behind the Best Agents This Quarter

Beyond the launch posts, Qwen 3 is reshaping how engineering teams approach customer support. We talked to the people actually using it in production.

Mira Castellanos·3 min read

AI in Business·May 11, 2026

When the Board Asks About AI: A Practical Answer From Ramp

The interesting story is not the demo. It is the second month, when small-model orchestration either earns its keep or gets quietly rolled back.

Jonas Halvorsen·3 min read

LLMs·May 10, 2026

The Quiet Architecture Shift Behind Phi-4

Beyond the launch posts, Gemini 3 Pro is reshaping how analysts approach pricing analysis. We talked to the people actually using it in production.

Elena Brost·3 min read

AI Agents·May 9, 2026

Why Structured Outputs Beat Bigger Models for Real-World Agents

Figma is the latest in a string of teams treating computer-use agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.

Elena Brost·3 min read

AI QA·May 7, 2026

Evals Are the New Unit Tests: Notes From Snowflake

The interesting story is not the demo. It is the second month, when RAG-as-a-service either earns its keep or gets quietly rolled back.

Yuki Tanabe·3 min read

LLMs·May 6, 2026

Why Llama 4 Feels Different, Even When the Numbers Don't

HubSpot is the latest in a string of teams treating computer-use agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.

Daniel Okafor·3 min read

AI QA·May 6, 2026

Beyond Vibes: Measuring GPT-5.1 in Production

Beyond the launch posts, Grok 4 is reshaping how sales teams approach contract review. We talked to the people actually using it in production.

Mira Castellanos·3 min read

Automation·May 6, 2026

How Booking.com Cut Onboarding From Hours to Seconds

Snowflake is the latest in a string of teams treating tool-first agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.

Yuki Tanabe·3 min read