Cohere Trims the Stack: Mistral Large 3 and the Cost Curve
Duolingo is the latest in a string of teams treating long-context workflows as the default, not the experiment. Here is what they got right — and what they are still figuring out.
Section
Model releases, benchmarks, scaling laws, and the architectural shifts driving large language models forward.
13 stories
Duolingo is the latest in a string of teams treating long-context workflows as the default, not the experiment. Here is what they got right — and what they are still figuring out.
The interesting story is not the demo. It is the second month, when computer-use agents either earns its keep or gets quietly rolled back.
Beyond the launch posts, Llama 4 is reshaping how engineering teams approach QBR prep. We talked to the people actually using it in production.
Beyond the launch posts, Gemini 3 Pro is reshaping how analysts approach pricing analysis. We talked to the people actually using it in production.
HubSpot is the latest in a string of teams treating computer-use agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.
The interesting story is not the demo. It is the second month, when fine-tuned distillation either earns its keep or gets quietly rolled back.
Beyond the launch posts, Command R+ 2 is reshaping how operators approach incident response. We talked to the people actually using it in production.
The interesting story is not the demo. It is the second month, when structured outputs either earns its keep or gets quietly rolled back.
Stripe is the latest in a string of teams treating tool-first agents as the default, not the experiment. Here is what they got right — and what they are still figuring out.