Every AI conversation in 2026 still opens with the same question: "which model are you using?" It's the wrong first question. In the agentic systems we've shipped over the last year — internal copilots, document pipelines, ops assistants — swapping Opus for Sonnet, or Claude for a competitor, moves the needle by single-digit percentages. Swapping the tool surface the model can reach moves it by an order of magnitude.
The Model Context Protocol (MCP) is the quiet reason this is finally visible. Before MCP, every team built bespoke glue: a function-calling schema here, a custom adapter there, a fragile JSON contract that broke the moment the model provider changed a field. MCP gave us a common socket. Once tools became portable, the comparison stopped being "which model is smartest" and started being "whose toolbox is sharper."
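To make "common socket" concrete: here is roughly what one portable tool looks like with the official Python MCP SDK. The server name, the lookup_invoice tool, and its stubbed payload are illustrative assumptions, not a real billing integration; the point is that any MCP-capable client can discover and call it without bespoke glue.

```python
# A minimal sketch of one MCP tool, using the official Python SDK
# (pip install mcp). lookup_invoice and its payload are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("billing-tools")

@mcp.tool()
def lookup_invoice(invoice_id: str) -> dict:
    """Look up an invoice and return its status and amount."""
    # Stubbed: a real server would query the billing system here.
    return {"invoice_id": invoice_id, "status": "open", "amount_cents": 4200}

if __name__ == "__main__":
    mcp.run()  # serves over stdio; any MCP client can connect
```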
The model is a commodity. The toolbox isn't.
A frontier model in 2026 is a remarkable but interchangeable component. Five labs ship models within a few points of each other on every public benchmark that matters. None of them know anything about your Jira project, your warehouse schema, your customer's tier, or the fact that your finance team closes the books on the second Wednesday of the month.
The tools you expose to the model are what encode all of that. They are the actual product surface. And almost nobody invests in them the way they invest in prompt engineering or model selection.
What a good MCP toolbox looks like
The strongest agentic systems we've shipped share four properties that have nothing to do with the model behind them.
Narrow, well-named tools. lookup_invoice, apply_credit, escalate_to_human: tools scoped like this get used correctly without prompting heroics. Resist the parameterised mega-tool.
Errors that teach. A failure that says "credit limit exceeded; call issue_refund instead" tells the model exactly what to do next. Error messages are part of the prompt, so review them like product UX. There's a sketch of this after the list.
Instrumented calls. If you can't see which tool the agent reached for and what came back, you can't diagnose the toolbox. Trace every call.
Contracts treated like a public API. Your tools outlive any specific model you wire them to; version and document them accordingly.
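Here is a sketch of the "errors that teach" property, again with the Python MCP SDK. The apply_credit and issue_refund names and the credit cap are hypothetical, and we're relying on the SDK's behaviour of reporting a raised exception back to the client as a tool error, so the message text is exactly what the model reads next.

```python
# A sketch of an error message written as product UX.
# apply_credit, issue_refund, and CREDIT_CAP_CENTS are illustrative
# assumptions, not from any real billing system.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("billing-tools")

CREDIT_CAP_CENTS = 50_000  # hypothetical policy limit

@mcp.tool()
def apply_credit(invoice_id: str, amount_cents: int) -> str:
    """Apply a credit to an open invoice, up to the credit cap."""
    if amount_cents > CREDIT_CAP_CENTS:
        # The error is part of the prompt: name the limit and the next tool.
        raise ValueError(
            f"Credit limit exceeded (cap is {CREDIT_CAP_CENTS} cents). "
            "For larger amounts, call issue_refund instead."
        )
    return f"Applied {amount_cents} cents of credit to invoice {invoice_id}."
```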
The shift in where projects fail
A year ago, a stuck AI project usually meant a stuck prompt. The team would iterate on system prompts for weeks, tune temperature, try a different model, swap to chain-of-thought. Today, when a project is stuck, nine times out of ten the model is doing the best it can with a bad toolbox. The tools are too coarse, or too leaky, or return data the model can't use, or are missing entirely for the step where the agent fails.
Diagnosing this is harder than diagnosing a prompt problem. Prompts are visible. Tool design is structural. But the payoff is real: we've taken pilots that were "the model isn't smart enough" and shipped them in production by leaving the model untouched and rebuilding the tool layer.
What this means for teams building now
Stop opening planning meetings with the model question. Open them with: what does the agent need to do in the world, and what is the smallest set of well-named, well-scoped, well-instrumented tools that lets it do that? The model picks itself once that's clear.
Invest in your tool surface the way you'd invest in a public API. Version it. Document it. Treat its errors as product copy. Trace every call. This is the work that compounds — your tools outlive any specific model you wire them to, and the team that has the best toolbox in your domain has a moat that doesn't evaporate when the next model release drops.
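As a minimal sketch of "trace every call", assuming FastMCP-style tools like the ones above: a plain decorator that logs tool name, arguments, outcome, and latency on every invocation. The traced helper and its log shape are our own convention, not an SDK feature; stacked under @mcp.tool(), functools.wraps preserves the signature the SDK inspects.

```python
# A sketch of per-call tracing for a toolbox. traced() is a hypothetical
# convention layered on top of your tools, not part of the MCP SDK.
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("toolbox")

def traced(fn):
    """Log every tool call: name, arguments, outcome, and duration."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "ok"
        try:
            return fn(*args, **kwargs)
        except Exception:
            outcome = "error"  # the exception still propagates to the caller
            raise
        finally:
            log.info(json.dumps({
                "tool": fn.__name__,
                "args": list(args),
                "kwargs": kwargs,
                "outcome": outcome,
                "ms": round((time.perf_counter() - start) * 1000, 1),
            }, default=str))
    return wrapper
```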
The MCP shift is not really about a protocol. It's about admitting that the interesting engineering in agentic AI was never inside the model — it was always in the seams between the model and your business. MCP just made the seams visible. The teams that take that hint will look, in eighteen months, like they're working in a different field from the teams still A/B-testing model IDs.
Graffitecs is a small, loud studio in Lahore, Pakistan. Est. 2017. We build software, occasionally ship products of our own (Vero, DocEngine), and write things down when we have something to say.