Ready to Work Together?
Let's discuss how our expertise can help transform your business.
Jordan Saunders · Jun 17, 2026
There are two clocks running in AI right now, and most finance teams are reading the wrong one.
The first clock is the price of frontier intelligence. That is the number that makes headlines. Cost per token keeps falling, every lab undercuts the last announcement, and the chart only goes one direction. If you watch this clock, AI looks like a deflationary miracle and budgeting for it looks easy.
The second clock is what your company actually spends on inference each month. For most companies shipping real AI features, that number is climbing. Not because anyone got ripped off, but because consumption is growing faster than prices are falling. Cheaper tokens do not mean smaller bills. Cheaper tokens mean you do more with them, the same way cheaper compute never once lowered anyone's cloud spend.
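The arithmetic behind the two clocks can be sketched in a few lines. The prices and volumes below are hypothetical round numbers chosen for illustration, not measured figures: the per-token price halves while usage quadruples.

```python
def monthly_bill(price_per_million_tokens: float, tokens_in_millions: float) -> float:
    """Monthly inference spend: unit price times volume consumed."""
    return price_per_million_tokens * tokens_in_millions


# Hypothetical figures, for illustration only.
bill_before = monthly_bill(price_per_million_tokens=10.0, tokens_in_millions=500.0)
bill_after = monthly_bill(price_per_million_tokens=5.0, tokens_in_millions=2000.0)

# Clock one (price) fell by half; clock two (the bill) doubled.
bill_ratio = bill_after / bill_before
print(f"before: ${bill_before:,.0f}  after: ${bill_after:,.0f}  ratio: {bill_ratio:.1f}x")
```

Under these assumed numbers, engineering can truthfully report a 50% price cut in the same month that finance sees the line item double.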
I run a consultancy that builds AI-enabled software for mid-market companies, and the gap between those two clocks is becoming the most common source of friction between engineering and finance that we see. Engineering reads clock one and says the economics are improving. Finance reads clock two and asks why the line item doubled. They are both right, and they are not having the same conversation.
Here is the part that genuinely breaks the old mental models, and it has nothing to do with price. Traditional software has deterministic unit costs. A request hits your API, it burns a predictable slice of compute, you multiply by expected traffic, and that is your capacity plan. CFOs have run that math for decades. It works because the software does the same thing every time.
An agentic AI feature does not do the same thing every time. It decides, at runtime, how much work the job in front of it requires. One request gets answered in a single model call. The next one, which looks identical from the outside, kicks off twelve tool calls, three retries, and a context window stuffed with documents. The system is choosing its own token budget per request, and you find out what it chose when the invoice arrives.
You cannot capacity-plan a thing that picks its own appetite. The median cost per request might be a fraction of a cent while the worst requests cost ten thousand times that, and your bill lives in the tail. So the forecasting model most finance teams are using, which assumes stable unit economics and predictable scaling, is quietly wrong for this category of software. Not slightly wrong. Structurally wrong.
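A small worked example makes the tail effect concrete. The traffic mix here is invented for illustration: 9,990 requests at a tenth of a cent each and 10 requests that cost ten thousand times as much. The sketch compares a forecast built on the typical request with the bill those same requests would actually produce.

```python
from statistics import median

# Hypothetical month of traffic (illustrative numbers, not measured data).
cheap_requests = [0.001] * 9_990  # typical requests: a tenth of a cent each
heavy_requests = [10.0] * 10      # rare agentic blowups: 10,000x the typical cost
costs = cheap_requests + heavy_requests

# The "deterministic unit cost" forecast: typical cost times request count.
median_cost = median(costs)
naive_forecast = median_cost * len(costs)

# What the invoice would actually say, and how much of it the tail explains.
actual_bill = sum(costs)
tail_share = sum(heavy_requests) / actual_bill

print(f"median cost per request: ${median_cost:.3f}")
print(f"forecast from the median: ${naive_forecast:.2f}")
print(f"actual bill: ${actual_bill:.2f}")
print(f"share of bill from 0.1% of requests: {tail_share:.0%}")
```

In this made-up mix, one request in a thousand accounts for roughly nine-tenths of the spend, and the median-based forecast is off by about a factor of eleven.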
Now layer the maturity problem on top. Companies are investing in AI faster than they are building the discipline to run it, and I do not think most of them will close that gap before it costs them. The default behavior we see in the field is to grab the biggest model, give it the whole job, and ship. Nobody routes simple tasks to small models. Nobody caches. Nobody caps the loops.
Nothing is wrong, exactly. The feature works. It just costs five or ten times what it should, and as model capabilities rise, the jobs people hand to them get bigger, so the overconsumption compounds. Multiply that across every feature a company ships over the next three years and it stops being an engineering footnote and starts being a P&L problem.
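For readers who want to see what the three missing habits look like in code, here is a minimal sketch, not a production design. Everything in it is an assumption made for illustration: the `BudgetedAgent` class, the `is_simple` check, and the stub models, which stand in for real model clients and return a `(text, done)` pair.

```python
from typing import Callable, Dict, Tuple

# A model stub takes the current context and returns (text, done).
Model = Callable[[str], Tuple[str, bool]]


class BudgetedAgent:
    """Hypothetical wrapper that routes, caches, and caps agent loops."""

    def __init__(self, small_model: Model, large_model: Model,
                 is_simple: Callable[[str], bool], max_steps: int = 5) -> None:
        self.small_model = small_model
        self.large_model = large_model
        self.is_simple = is_simple
        self.max_steps = max_steps
        self.cache: Dict[str, str] = {}
        self.model_calls = 0  # proxy for spend

    def answer(self, prompt: str) -> str:
        # Caching: an identical prompt never pays twice.
        if prompt in self.cache:
            return self.cache[prompt]
        # Routing: simple jobs go to the small model.
        model = self.small_model if self.is_simple(prompt) else self.large_model
        context, result = prompt, ""
        # Loop cap: the agent may iterate, but only max_steps times.
        for _ in range(self.max_steps):
            self.model_calls += 1
            result, done = model(context)
            if done:
                break
            context = result
        self.cache[prompt] = result
        return result


# Stub models for the demo: the small one finishes at once,
# the large one never declares itself done.
agent = BudgetedAgent(
    small_model=lambda ctx: ("small:" + ctx, True),
    large_model=lambda ctx: (ctx + "+", False),
    is_simple=lambda prompt: len(prompt) < 20,
    max_steps=3,
)

first = agent.answer("hi")
calls_after_first = agent.model_calls
second = agent.answer("hi")          # served from cache
calls_after_repeat = agent.model_calls
runaway = agent.answer("x" * 30)     # large model, stopped by the cap
calls_after_runaway = agent.model_calls
```

The length-based `is_simple` check is a placeholder; a real router would need a better signal for task difficulty. The point of the sketch is only that each guard is a few lines, and that a request which would otherwise loop indefinitely stops at a known ceiling.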
The cloud already taught this lesson once. In the early years, cheap elastic compute produced a generation of six-figure surprise bills, and an entire discipline called FinOps had to be invented to clean up the mess. We are rerunning that movie, except faster, because a VM never decided on its own to spin up twelve friends.
The good news is that the gap between what these systems cost by default and what they cost when run with discipline is enormous, which means the savings are sitting right there. The teams that capture them are not doing anything exotic.
That architecture is its own topic, and it is the next article.
The price of intelligence is falling. The cost of using it carelessly is not.
Author at NextLink Labs