LLMs

Model releases, benchmarks, scaling laws, and the architectural shifts driving large language models forward.

13 stories

LLMs·Jun 23, 2026

Cohere Trims the Stack: Mistral Large 3 and the Cost Curve

Duolingo is the latest in a string of teams treating long-context workflows as the default, not the experiment. Here is what they got right — and what they are still figuring out.

Mira Castellanos·3 min read

LLMs·Jun 22, 2026

The Quiet Architecture Shift Behind GPT-5.1

The interesting story is not the demo. It is the second month, when computer-use agents either earn their keep or get quietly rolled back.

Priya Raman·3 min read

LLMs·May 19, 2026

Open Weights, Closed Margins: Reading GPT-5.1

Beyond the launch posts, Llama 4 is reshaping how engineering teams approach QBR prep. We talked to the people actually using it in production.

Jonas Halvorsen·3 min read

LLMs·May 10, 2026

The Quiet Architecture Shift Behind Phi-4

Beyond the launch posts, Gemini 3 Pro is reshaping how analysts approach pricing analysis. We talked to the people actually using it in production.

Elena Brost·3 min read

LLMs·May 6, 2026

Why Llama 4 Feels Different, Even When the Numbers Don't

HubSpot is the latest in a string of teams treating computer-use agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.

Daniel Okafor·3 min read

LLMs·Apr 23, 2026

The Quiet Architecture Shift Behind Mistral Large 3

The interesting story is not the demo. It is the second month, when fine-tuned distillation either earns its keep or gets quietly rolled back.

Elena Brost·3 min read

LLMs·Apr 7, 2026

Open Weights, Closed Margins: Reading Phi-4

Beyond the launch posts, Command R+ 2 is reshaping how operators approach incident response. We talked to the people actually using it in production.

Yuki Tanabe·3 min read

LLMs·Apr 6, 2026

Inflection Trims the Stack: Claude 4.5 Sonnet and the Cost Curve

The interesting story is not the demo. It is the second month, when structured outputs either earn their keep or get quietly rolled back.

Daniel Okafor·3 min read

LLMs·Mar 19, 2026

Mixture-of-Experts, Long Context, and the Gemini 3 Pro Era

Stripe is the latest in a string of teams treating tool-first agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.

Elena Brost·3 min read