How Databricks Rebuilt Search Around Long Context

Inside the quiet rewiring of data cleanup at Booking.com.

By Daniel Okafor · Apr 18, 2026 · 3 min read

There is a version of this story that is mostly hype. There is another version, the one we are interested in, that is mostly engineering.

Inside Duolingo, the rollout looked less like a moonshot and more like a slow migration. A pilot, a champion, a quiet expansion, a budget line.

Eval harnesses, once an afterthought, are becoming the most important piece of code in many AI projects. Linear's team treats theirs the way an SRE team treats a runbook.

The cost curve matters here. Qwen 3 is roughly an order of magnitude cheaper per token than an equivalent model was 18 months ago, and that changes which problems are worth automating at all.
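To make the arithmetic concrete, here is a rough break-even sketch. Every number in it is a hypothetical placeholder (the token count, both prices, and the value per task are assumptions for illustration, not figures from any vendor's price list); the only thing carried over from the reporting is the roughly tenfold price drop.

```python
def cost_per_task(tokens_per_task: int, price_per_million_tokens: float) -> float:
    """Model spend for one task, in the same currency as the price."""
    return tokens_per_task / 1_000_000 * price_per_million_tokens


def worth_automating(value_per_task: float,
                     tokens_per_task: int,
                     price_per_million_tokens: float) -> bool:
    """A task clears the bar only if its value exceeds its model cost."""
    return value_per_task > cost_per_task(tokens_per_task, price_per_million_tokens)


# Hypothetical inputs, chosen only to illustrate a 10x price drop.
TOKENS_PER_TASK = 50_000   # assumed size of one long-context request
OLD_PRICE = 10.0           # assumed price per million tokens, 18 months ago
NEW_PRICE = 1.0            # assumed price per million tokens, today
VALUE_PER_TASK = 0.20      # assumed value of one automated task

old_ok = worth_automating(VALUE_PER_TASK, TOKENS_PER_TASK, OLD_PRICE)
new_ok = worth_automating(VALUE_PER_TASK, TOKENS_PER_TASK, NEW_PRICE)
```

Under these assumed numbers the same task goes from costing more than it is worth to costing a fraction of it, which is the shift the cost curve describes. Swap in real prices and task sizes before drawing any conclusion from it.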

Teams that win with tool-first agents tend to share a habit: they write the evals before they write the prompts. Everything else follows from that.
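What "evals before prompts" can look like in practice is small. The sketch below is a minimal illustration, not any team's actual harness: `EvalCase`, `run_evals`, `pass_rate`, and `stub_model` are hypothetical names, and the model is a stand-in function rather than a real API call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One expectation, written down before any prompt exists."""
    name: str
    task_input: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the model and record pass or fail by name."""
    return {case.name: case.check(model(case.task_input)) for case in cases}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of cases that passed; 0.0 when there are no cases."""
    return sum(results.values()) / len(results) if results else 0.0


# Hypothetical cases: the checks are deliberately simple string tests.
CASES = [
    EvalCase(
        name="refund_mentions_policy",
        task_input="Customer asks for a refund after 40 days.",
        check=lambda out: "policy" in out.lower(),
    ),
    EvalCase(
        name="no_empty_answer",
        task_input="Customer says hello.",
        check=lambda out: bool(out.strip()),
    ),
]


def stub_model(text: str) -> str:
    """Stand-in for a real model call (assumption: swap in your own client)."""
    if "refund" in text:
        return "Per our refund policy, here is what we can do."
    return "Hello!"


results = run_evals(stub_model, CASES)
score = pass_rate(results)
```

The point of the design is ordering: the cases exist first, so every later prompt or model change is judged against the same fixed list instead of against a fresh impression.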

What Mistral actually shipped with Mistral Large 3, and what xAI shipped with Grok 4, is less a single capability and more a cluster of small, compounding improvements, the kind that only show up when you put a real workflow on top.

Inside Zendesk, the rollout looked less like a moonshot and more like a slow migration. A pilot, a champion, a quiet expansion, a budget line.

None of this guarantees a clean story. xAI could ship a model next month that rearranges the assumptions in this piece. But the direction of travel, for now, is clear enough to plan around.

