Every AI conversation in 2026 still opens with the same question: "which model are you using?" It's the wrong first question. In the agentic systems we've shipped over the last year — internal copilots, document pipelines, ops assistants — swapping Opus for Sonnet, or Claude for a competitor, moves the needle by single-digit percentages. Swapping the tool surface the model can reach moves it by an order of magnitude.
The Model Context Protocol (MCP) is the quiet reason this is finally visible. Before MCP, every team built bespoke glue: a function-calling schema here, a custom adapter there, a fragile JSON contract that broke the moment the model provider changed a field. MCP gave us a common socket. Once tools became portable, the comparison stopped being "which model is smartest" and started being "whose toolbox is sharper."
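To make "common socket" concrete: here is roughly what one portable tool looks like with the official Python MCP SDK. The server name, the lookup_invoice tool, and its stubbed payload are illustrative assumptions, not a real billing integration; the point is that any MCP-capable client can discover and call it without bespoke glue.

```python
# A minimal sketch of one MCP tool, using the official Python SDK
# (pip install mcp). lookup_invoice and its payload are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("billing-tools")

@mcp.tool()
def lookup_invoice(invoice_id: str) -> dict:
    """Look up an invoice and return its status and amount."""
    # Stubbed: a real server would query the billing system here.
    return {"invoice_id": invoice_id, "status": "open", "amount_cents": 4200}

if __name__ == "__main__":
    mcp.run()  # serves over stdio; any MCP client can connect
```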
The model is a commodity. The toolbox isn't.
A frontier model in 2026 is a remarkable but interchangeable component. Five labs ship models within a few points of each other on every public benchmark that matters. None of them know anything about your Jira project, your warehouse schema, your customer's tier, or the fact that your finance team closes the books on the second Wednesday of the month.
The tools you expose to the model are what encode all of that. They are the actual product surface. And almost nobody invests in them the way they invest in prompt engineering or model selection.
What a good MCP toolbox looks like
The strongest agentic systems we've shipped share four properties that have nothing to do with the model behind them.
Narrow, well-named tools. lookup_invoice, apply_credit, escalate_to_human: tools scoped like this get used correctly without prompting heroics. Resist the parameterised mega-tool.
Errors that teach. A failure that says "credit limit exceeded; call issue_refund instead" tells the model exactly what to do next. Error messages are part of the prompt, so review them like product UX. There's a sketch of this after the list.
Instrumented calls. If you can't see which tool the agent reached for and what came back, you can't diagnose the toolbox. Trace every call.
Contracts treated like a public API. Your tools outlive any specific model you wire them to; version and document them accordingly.
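Here is a sketch of the "errors that teach" property, again with the Python MCP SDK. The apply_credit and issue_refund names and the credit cap are hypothetical, and we're relying on the SDK's behaviour of reporting a raised exception back to the client as a tool error, so the message text is exactly what the model reads next.

```python
# A sketch of an error message written as product UX.
# apply_credit, issue_refund, and CREDIT_CAP_CENTS are illustrative
# assumptions, not from any real billing system.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("billing-tools")

CREDIT_CAP_CENTS = 50_000  # hypothetical policy limit

@mcp.tool()
def apply_credit(invoice_id: str, amount_cents: int) -> str:
    """Apply a credit to an open invoice, up to the credit cap."""
    if amount_cents > CREDIT_CAP_CENTS:
        # The error is part of the prompt: name the limit and the next tool.
        raise ValueError(
            f"Credit limit exceeded (cap is {CREDIT_CAP_CENTS} cents). "
            "For larger amounts, call issue_refund instead."
        )
    return f"Applied {amount_cents} cents of credit to invoice {invoice_id}."
```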
The shift in where projects fail
A year ago, a stuck AI project usually meant a stuck prompt. The team would iterate on system prompts for weeks, tune temperature, try a different model, swap to chain-of-thought. Today, when a project is stuck, nine times out of ten the model is doing the best it can with a bad toolbox. The tools are too coarse, or too leaky, or return data the model can't use, or are missing entirely for the step where the agent fails.
Diagnosing this is harder than diagnosing a prompt problem. Prompts are visible. Tool design is structural. But the payoff is real: we've taken pilots that were "the model isn't smart enough" and shipped them in production by leaving the model untouched and rebuilding the tool layer.
What this means for teams building now
Stop opening planning meetings with the model question. Open them with: what does the agent need to do in the world, and what is the smallest set of well-named, well-scoped, well-instrumented tools that lets it do that? The model picks itself once that's clear.
Invest in your tool surface the way you'd invest in a public API. Version it. Document it. Treat its errors as product copy. Trace every call. This is the work that compounds — your tools outlive any specific model you wire them to, and the team that has the best toolbox in your domain has a moat that doesn't evaporate when the next model release drops.
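As a minimal sketch of "trace every call", assuming FastMCP-style tools like the ones above: a plain decorator that logs tool name, arguments, outcome, and latency on every invocation. The traced helper and its log shape are our own convention, not an SDK feature; stacked under @mcp.tool(), functools.wraps preserves the signature the SDK inspects.

```python
# A sketch of per-call tracing for a toolbox. traced() is a hypothetical
# convention layered on top of your tools, not part of the MCP SDK.
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("toolbox")

def traced(fn):
    """Log every tool call: name, arguments, outcome, and duration."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "ok"
        try:
            return fn(*args, **kwargs)
        except Exception:
            outcome = "error"  # the exception still propagates to the caller
            raise
        finally:
            log.info(json.dumps({
                "tool": fn.__name__,
                "args": list(args),
                "kwargs": kwargs,
                "outcome": outcome,
                "ms": round((time.perf_counter() - start) * 1000, 1),
            }, default=str))
    return wrapper
```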
The MCP shift is not really about a protocol. It's about admitting that the interesting engineering in agentic AI was never inside the model — it was always in the seams between the model and your business. MCP just made the seams visible. The teams that take that hint will look, in eighteen months, like they're working in a different field from the teams still A/B-testing model IDs.
Graffitecs is a small, loud studio in Lahore, Pakistan. Est. 2017. We build software, occasionally ship products of our own (Vero, DocEngine), and write things down when we have something to say.