Context engineering for platform teams means deciding, by default, what goes into an AI coding assistant’s context window, instead of leaving it to whatever each developer happened to configure. Get this right and Claude Code or Copilot spend can often drop by half or more without hurting output quality.
Most teams find out they have a context problem the same way: someone in finance asks why the AI tooling bill tripled, and nobody can point to a cause. It’s never one runaway prompt. It’s a hundred small defaults nobody owns — an MCP server someone enabled six months ago and forgot about, a CLAUDE.md file that’s quietly grown to four thousand words, a habit of never clearing a session. Individually invisible. Added up, it’s real money.
That’s a platform problem, not a developer problem. Nobody expects each engineer to write their own CI pipeline from scratch. Nobody should be writing their own context strategy from scratch either.
Why this is a cost problem, not just a quality one
A context window is everything the model holds for one request: instructions, tool definitions, files, conversation history. Anthropic’s engineering team put it well — context is finite, and the job is curating the smallest set of high-signal tokens that gets the behavior you want, not maximizing what gets loaded in.
Two things make that expensive at scale.
Nothing drops off the bill. A file read on turn three of a session gets re-sent, and re-billed, on turn fifty. Prompt caching makes those repeats cheaper, but cached context typically expires after a few minutes of inactivity. A stop-and-start session therefore costs more than a focused one, and that pattern is set by defaults, not willpower.
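To make the re-billing concrete, here is a rough model of one session. It assumes a fixed number of new tokens per turn and ignores caching discounts. Both are simplifications for illustration, not measured behavior of either tool. Each request re-sends the whole history, so a file read early in the session is paid for on every later turn.

```python
# Rough model of how a file read early in a session keeps getting billed.
# Assumptions (illustrative, not measured): a fixed per-turn message size and
# no caching discount; real pricing and cache behavior vary by tool and plan.

def billed_input_tokens(turns: int, file_tokens: int, read_on_turn: int,
                        tokens_per_turn: int) -> int:
    """Total input tokens sent across a session.

    Every request re-sends the whole history, so each turn's input is the
    sum of all messages so far plus the file once it has been read.
    """
    total = 0
    history = 0
    for turn in range(1, turns + 1):
        history += tokens_per_turn          # this turn's new messages
        if turn == read_on_turn:
            history += file_tokens          # the file stays in context from here on
        total += history                    # the whole history is billed again
    return total


with_file = billed_input_tokens(50, 5_000, 3, 500)
without_file = billed_input_tokens(50, 0, 3, 500)
# The single 5,000-token read on turn 3 is paid for on 48 requests (turns 3..50).
print(with_file - without_file)  # 240000
```

The cost of that one read grows with every later turn. This is why scoping and clearing matter more than any single prompt.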
Quality also drops before the window is even full. Long-input research now has a name for this: “context rot.” A session at 80% capacity isn’t as reliable as one at 30%, even though neither has hit the limit. Nobody notices until the assistant’s been quietly worse for weeks.
The numbers back it up. Enterprise Claude Code usage averages around $13 per developer per day — but teams with decent context habits run closer to $5-15/day for the same work that costs $20-40/day without them. Across fifty engineers, that gap is a budget line whether anyone meant it to be or not.
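The fifty-engineer gap works out roughly as follows. This back-of-envelope sketch assumes the midpoints of the quoted ranges ($10/day and $30/day) and a hypothetical 220 working days a year. Substitute your own usage data before putting a figure in front of finance.

```python
# Back-of-envelope annual gap between "good habits" and "no habits" spend.
# Assumptions: midpoints of the quoted ranges and 220 working days a year
# (both illustrative); replace with real per-team numbers.

def annual_gap(engineers: int, good_per_day: float, poor_per_day: float,
               working_days: int = 220) -> float:
    """Yearly difference in spend across a team."""
    return engineers * (poor_per_day - good_per_day) * working_days

good = (5 + 15) / 2    # midpoint of $5-15/day
poor = (20 + 40) / 2   # midpoint of $20-40/day
print(f"${annual_gap(50, good, poor):,.0f} per year")  # $220,000 per year
```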
A context engineering framework for platform teams
Same idea as a CI/CD template: ship the default, don’t write a memo asking people to behave better.
1. Default to a cheaper model
Org default should be a mid-tier model, with an easy escalation path to something bigger when a task genuinely needs deep reasoning. Cheapest lever available, and it costs nothing to implement.
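A default-plus-escalation policy can be as small as the sketch below. The model aliases, the task tags, and the idea of tagging tasks at all are hypothetical, so map them to whatever your plan offers. The settings output assumes a Claude Code-style settings.json with a top-level "model" key. Check the current settings documentation before distributing it.

```python
# Hypothetical org model policy: default to a mid-tier model, escalate only
# for tasks tagged as needing deep reasoning. Aliases and tags are illustrative.
import json

DEFAULT_MODEL = "sonnet"      # assumption: mid-tier alias
ESCALATION_MODEL = "opus"     # assumption: larger model for hard tasks
ESCALATION_TAGS = {"architecture", "complex-debugging", "migration-plan"}

def pick_model(task_tags: set[str]) -> str:
    """Return the escalation model only when a task carries an escalation tag."""
    return ESCALATION_MODEL if task_tags & ESCALATION_TAGS else DEFAULT_MODEL

def default_settings() -> str:
    # Assumes a settings.json with a top-level "model" key; verify the key name
    # against the tool's current docs before shipping it org-wide.
    return json.dumps({"model": DEFAULT_MODEL}, indent=2)

print(pick_model({"refactor"}))        # sonnet
print(pick_model({"architecture"}))    # opus
print(default_settings())
```

With this design, escalation is an explicit, reviewable exception rather than each developer's personal default.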
2. Scope agents, don’t let them roam
Point agents at specific directories, not whole repos. Every extra file pulled in is tokens you’re paying for again on every later turn.
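Here is a minimal sketch of directory scoping as a filter that runs before anything reaches the context window. The scope list and file paths are hypothetical examples. The check compares whole path components, so a sibling directory like `services/billing-old` does not sneak in on a string prefix match.

```python
# Minimal sketch of directory scoping: drop any file outside the directories
# an agent was pointed at. Scopes and paths below are hypothetical.
from pathlib import PurePosixPath

def in_scope(path: str, scopes: list[str]) -> bool:
    """True if path sits under one of the scope directories (component-wise)."""
    p = PurePosixPath(path)
    return any(p.is_relative_to(PurePosixPath(s)) for s in scopes)

def filter_to_scope(paths: list[str], scopes: list[str]) -> list[str]:
    return [p for p in paths if in_scope(p, scopes)]

candidates = [
    "services/billing/api.py",
    "services/billing/tests/test_api.py",
    "services/search/index.py",
    "docs/architecture.md",
]
print(filter_to_scope(candidates, ["services/billing"]))
# ['services/billing/api.py', 'services/billing/tests/test_api.py']
```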
3. Load just-in-time, not just-in-case
A session-start hook that pulls in docs only when a task actually needs them beats pre-loading “just in case,” every time.
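Just-in-time loading can be approximated with plain keyword matching. This sketch assumes a hypothetical index from doc paths to trigger keywords, and it assumes that whatever it returns is what a session-start hook would inject. A real hook might use richer signals, but the principle is the same: no match, no load.

```python
# Just-in-time doc selection, sketched as keyword matching. The index and
# paths are hypothetical; the return value is what a hook would inject.

DOC_INDEX = {
    "docs/payments-api.md": {"payment", "invoice", "refund"},
    "docs/auth-flow.md": {"login", "oauth", "session"},
    "docs/design-system.md": {"component", "button", "css"},
}

def docs_for_task(task: str, index: dict[str, set[str]] = DOC_INDEX) -> list[str]:
    """Return only the docs whose keywords appear in the task description."""
    words = {w.strip(".,:;!?").lower() for w in task.split()}
    return sorted(path for path, keys in index.items() if words & keys)

print(docs_for_task("Fix the refund rounding bug"))   # ['docs/payments-api.md']
print(docs_for_task("Rename a local variable"))       # []
```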
4. Tier your context, once, centrally
Split context into three tiers and write them down in one shared CLAUDE.md template instead of leaving them to memory:
- Tier 1, always loaded, under 800 tokens: project name, purpose, core rules.
- Tier 2, loaded on demand: API references, component docs.
- Tier 3, never loaded, just linked.
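A CI check can hold Tier 1 to its budget. This sketch assumes about 4 characters per token, a common rough heuristic for English text. Use your provider's token counter for any number you report. The sample Tier 1 text is a hypothetical example.

```python
# Check a shared CLAUDE.md's Tier 1 section against the 800-token budget.
# Assumption: ~4 characters per token (rough heuristic, not an exact count).

TIER1_BUDGET = 800
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

def check_tier1(text: str, budget: int = TIER1_BUDGET) -> tuple[bool, int]:
    """Return (within_budget, estimated_tokens)."""
    tokens = estimate_tokens(text)
    return tokens <= budget, tokens

tier1 = """# Project: billing-service
Purpose: invoices and refunds for the web store.
Rules: run tests before committing; never edit generated/ files.
"""
ok, tokens = check_tier1(tier1)
print(ok, tokens)
```

Run in CI against the shared template, this turns "keep it short" from advice into a failing build.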
5. Compact at 60%, not 95%
Waiting until a session is forced into compaction leaves no room for a useful summary. Set it to trigger earlier, by default — not by whoever remembers to run the command.
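The trigger logic itself is trivial. Putting it in distributed config is what makes it a default. Below is a hypothetical check that a platform-distributed wrapper or hook could run each turn. How "tokens used" is read and how compaction is actually invoked depend on the tool and are not shown here.

```python
# Hypothetical early-compaction check: fire at 60% of the window rather than
# waiting for the tool's own near-full trigger.

COMPACT_AT = 0.60

def should_compact(tokens_used: int, window: int, threshold: float = COMPACT_AT) -> bool:
    if window <= 0:
        raise ValueError("window must be positive")
    return tokens_used / window >= threshold

print(should_compact(90_000, 200_000))   # False (45%)
print(should_compact(120_000, 200_000))  # True (60%)
```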
6. Audit MCP servers like any other dependency
This is the one most guides skip, and it’s the one platform teams are best positioned to own. One documented audit found a single cloud-provider MCP plugin, unused in that session, eating 31% of the session’s context before anyone typed a word. Maintain a reviewed allowlist the same way you’d review an approved base image.
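An allowlist audit can start as a diff between configured servers and approved ones. This sketch assumes servers are declared under a top-level "mcpServers" object, as in Claude Code's project-level .mcp.json, so adjust the key for other tools. The allowlist, server names, and commands are hypothetical placeholders.

```python
# Minimal allowlist audit for a project's MCP config. Assumes a top-level
# "mcpServers" object; names and commands below are placeholders.
import json

ALLOWLIST = {"github", "postgres-readonly"}

def unapproved_servers(config_text: str, allowlist: set[str] = ALLOWLIST) -> list[str]:
    """List configured MCP servers that are not on the reviewed allowlist."""
    servers = json.loads(config_text).get("mcpServers", {})
    return sorted(name for name in servers if name not in allowlist)

example = """
{
  "mcpServers": {
    "github": {"command": "github-mcp"},
    "cloud-everything": {"command": "cloud-mcp"}
  }
}
"""
print(unapproved_servers(example))  # ['cloud-everything']
```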
7. Split long workflows into sessions on purpose
Make session breaks a deliberate part of the workflow template itself, rather than an accident of someone forgetting to clear context halfway through a multi-day task.
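One way to make session boundaries deliberate is to plan them up front. This sketch packs a task's steps, in order, into a per-session token budget and starts a new session when the next step would overflow. The step names, estimates, and budget are hypothetical. A step larger than the budget simply gets a session of its own.

```python
# Sketch: plan a multi-day task as deliberate sessions by packing ordered
# steps into a per-session token budget. All figures are hypothetical.

def plan_sessions(steps: list[tuple[str, int]], budget: int) -> list[list[str]]:
    sessions: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, est in steps:
        if current and used + est > budget:
            sessions.append(current)      # planned boundary: hand off, then clear
            current, used = [], 0
        current.append(name)
        used += est
    if current:
        sessions.append(current)
    return sessions

steps = [("survey schema", 30_000), ("write migration", 50_000),
         ("update models", 40_000), ("fix tests", 60_000)]
print(plan_sessions(steps, budget=100_000))
# [['survey schema', 'write migration'], ['update models', 'fix tests']]
```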
Claude Code vs. GitHub Copilot CLI: what a platform team needs to govern
| | Claude Code | GitHub Copilot CLI |
|---|---|---|
| Check usage | /context | /context or /usage |
| Manual compaction | /compact | /compact |
| Full reset | /clear | /clear |
| Auto-compaction trigger | Reserves roughly 16-20% of the window as a buffer | Begins around 80% capacity, pauses near 95% if needed |
| MCP tool schema handling | Tool definitions can be deferred until a tool is actually used | Tool schemas register at session start; reserved output buffer alone can claim ~30% on some configurations |
| Platform team’s highest-leverage move | Maintain a central MCP allowlist, ship a default model policy | Maintain a central plugin allowlist, watch the System/Tools breakdown org-wide |
The numbers shift between releases, so check GitHub’s docs on managing context in Copilot CLI before committing to them. But the pattern holds on both tools: a chunk of every session is fixed overhead that has nothing to do with the actual task, and that’s the layer a platform team can own centrally.
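The fixed-overhead arithmetic is simple enough to keep in a shared dashboard. In this sketch the category names and token figures are hypothetical. Read real values from /context in either tool.

```python
# Sketch of the fixed-overhead arithmetic: how much of the window is left for
# the task once system prompt, tool schemas, and reserved buffers are counted.
# Categories and figures are hypothetical.

def task_headroom(window: int, overhead: dict[str, int]) -> tuple[int, float]:
    """Return (tokens left for the task, fixed-overhead share of the window)."""
    fixed = sum(overhead.values())
    return window - fixed, fixed / window

free, share = task_headroom(200_000, {
    "system prompt": 3_000,
    "tool schemas": 17_000,
    "reserved buffer": 40_000,
})
print(free, f"{share:.0%}")  # 140000 30%
```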
This is the Backstage story again
Nobody pitches an internal developer platform by saying engineers asked for it. It gets built because tooling sprawl was already costing time and money, just invisibly, spread across every team — the same logic that justified Apna’s Backstage IDP.
AI coding assistants are earlier in that exact same arc. The cost is real today. It’s just split across a hundred individual usage bills instead of sitting in one place anyone can point at.
Where to start
- Ship a default CLAUDE.md and a default model policy as part of onboarding — not as optional advice.
- Maintain one reviewed MCP allowlist. Adding anything outside it requires a request.
- Set compaction to trigger around 60%, baked into whatever config the platform team distributes.
- Run a quarterly context audit on the same cadence as a dependency or security audit, using the budget-and-threshold approach already used for AI agent monitoring.
- Report token spend per team the way you’d report compute spend. Visibility first, restrictions only where the data actually justifies it.
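A per-team spend report can start as a simple aggregation. This sketch assumes usage records have already been exported with a team label and token counts. The records and the blended price are hypothetical. Teams are sorted by volume so the biggest line items come first.

```python
# Minimal per-team token spend report, in the spirit of a compute-spend report.
# Records and the blended price are hypothetical placeholders.
from collections import defaultdict

PRICE_PER_MTOK = 5.00  # hypothetical blended $/million tokens

records = [
    {"team": "payments", "tokens": 1_200_000},
    {"team": "search", "tokens": 300_000},
    {"team": "payments", "tokens": 800_000},
]

def spend_by_team(rows, price_per_mtok=PRICE_PER_MTOK):
    """Sum tokens per team and convert to dollars, largest spender first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["team"]] += row["tokens"]
    return {team: round(tok / 1_000_000 * price_per_mtok, 2)
            for team, tok in sorted(totals.items(), key=lambda kv: -kv[1])}

print(spend_by_team(records))  # {'payments': 10.0, 'search': 1.5}
```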
Worth adding to the stack
- Caveman (JuliusBrussee/caveman): trims a large share of output tokens while keeping the technical accuracy.
- mcpick or similar: toggle on only the MCP servers a session actually needs.
- Built-in breakdown commands (/context, /doctor, /cost in Claude Code; /context, /usage in Copilot CLI): see where tokens are actually going before standardizing anything.
FAQ
What does context engineering mean for a platform team, specifically?
For one developer, it’s scoping a single session well. For a platform team, it’s owning the defaults — the shared CLAUDE.md, the approved MCP list, the model policy — so the right habit is the easy habit, not something each person has to figure out alone.
Why should this sit with the platform team instead of individual developers?
Because the cost driver is mostly configuration, not behavior. Which MCP servers are on, what model is default, how big the shared instructions file is — that’s the same category of thing platform teams already own for CI/CD and base images.
What’s the fastest first move?
Audit your heaviest-usage teams and find which MCP plugin is quietly eating context. One documented case won back over 30% of a session’s context by fixing exactly that.
Does a 1M-token context window make this moot?
Not really. It means compacting less often, but accuracy still drops as the window fills, and every ungoverned token is still billed on every turn, however big the window is.
The teams getting the most out of Claude Code and Copilot aren’t asking developers to try harder. They’ve made the cheap, boring choice the default one.