AI Is Getting Cheaper. Your AI Bill Is Going Up. Both Are True.

Jordan Saunders · Jun 17, 2026

There are two clocks running in AI right now, and most finance teams are reading the wrong one.

The first clock is the price of frontier intelligence. That is the number that makes headlines. Cost per token keeps falling, every lab undercuts the last announcement, and the chart only goes one direction. If you watch this clock, AI looks like a deflationary miracle and budgeting for it looks easy.

The second clock is what your company actually spends on inference each month. For most companies shipping real AI features, that number is climbing. Not because anyone got ripped off, but because consumption is growing faster than prices are falling. Cheaper tokens do not mean smaller bills. Cheaper tokens mean you do more with them, the same way cheaper compute rarely lowered anyone's cloud spend.
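The arithmetic is simple enough to sketch. The figures below are illustrative assumptions, not measurements: a token price that falls by half while monthly token volume grows fivefold.

```python
# Illustrative numbers only: both the prices and the volumes are assumptions.
def monthly_bill(price_per_million_tokens: float, tokens_per_month: float) -> float:
    """Spend is unit price times consumption."""
    return price_per_million_tokens * tokens_per_month / 1_000_000

# Clock one: the price per million tokens drops from $10 to $5.
# Clock two: usage grows from 2 billion to 10 billion tokens a month.
before = monthly_bill(price_per_million_tokens=10.0, tokens_per_month=2_000_000_000)
after = monthly_bill(price_per_million_tokens=5.0, tokens_per_month=10_000_000_000)

print(f"before: ${before:,.0f}")
print(f"after:  ${after:,.0f}")
print(f"change: {after / before:.1f}x")
```

Under those assumed numbers the unit price halves and the bill still grows 2.5 times. Whether your own bill rises or falls depends entirely on which of the two rates is larger.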

I run a consultancy that builds AI-enabled software for mid-market companies, and the gap between those two clocks is becoming the most common source of friction between engineering and finance that we see. Engineering reads clock one and says the economics are improving. Finance reads clock two and asks why the line item doubled. They are both right, and they are not having the same conversation.

Why the Old Mental Models Break

Here is the part that genuinely breaks the old mental models, and it has nothing to do with price. Traditional software has deterministic unit costs. A request hits your API, it burns a predictable slice of compute, you multiply by expected traffic, and that is your capacity plan. CFOs have run that math for decades. It works because the software does the same thing every time.

An agentic AI feature does not do the same thing every time. It decides, at runtime, how much work the job in front of it requires. One request gets answered in a single model call. The next one, which looks identical from the outside, kicks off twelve tool calls, three retries, and a context window stuffed with documents. The system is choosing its own token budget per request, and you find out what it chose when the invoice arrives.

You cannot capacity-plan a thing that picks its own appetite. The median cost per request might be a fraction of a cent while the worst requests cost ten thousand times that, and your bill lives in the tail. So the forecasting model most finance teams are using, which assumes stable unit economics and predictable scaling, is quietly wrong for this category of software. Not slightly wrong. Structurally wrong.
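A small synthetic example makes the tail visible. The cost list below is invented for illustration: 9,990 requests that cost a fifth of a cent each, and ten that cost ten thousand times as much.

```python
from statistics import median

# Hypothetical per-request costs in dollars: most requests are one cheap
# model call, a handful are long agentic runs costing 10,000x as much.
costs = [0.002] * 9_990 + [20.0] * 10

typical = median(costs)                          # what a "normal" request costs
total = sum(costs)                               # what the invoice says
tail_total = sum(c for c in costs if c > 1.0)    # spend from the expensive runs

print(f"median cost per request: ${typical:.3f}")
print(f"forecast from median:    ${typical * len(costs):,.2f}")
print(f"actual total:            ${total:,.2f}")
print(f"share from the tail:     {tail_total / total:.0%}")
```

With these made-up numbers, a forecast built on the median lands near $20 while the actual total is about $220, and ten requests out of ten thousand account for roughly nine-tenths of the spend. Real distributions will differ, but the shape of the error is the same: the typical request tells you very little about the bill.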

The Maturity Problem

Now layer the maturity problem on top. Companies are investing in AI faster than they are building the discipline to run it, and I do not think most of them will close that gap before it costs them. The default behavior we see in the field is to grab the biggest model, give it the whole job, and ship. Nobody routes simple tasks to small models. Nobody caches. Nobody caps the loops.

Nothing is wrong, exactly. The feature works. It just costs five or ten times what it should, and as model capabilities rise, the jobs people hand to them get bigger, so the overconsumption compounds. Multiply that across every feature a company ships over the next three years and it stops being an engineering footnote and starts being a P&L problem.

The cloud already taught this lesson once. In the early years, cheap elastic compute produced a generation of six-figure surprise bills, and an entire discipline called FinOps had to be invented to clean up the mess. We are rerunning that movie, except faster, because a VM never decided on its own to spin up twelve friends.

What to Actually Do With This

If you run finance: stop asking "what is our AI spend" and start asking what each AI feature costs per unit of work it does, and which ones have unbounded cost behavior. If nobody can answer, that is your answer.

If you run engineering: your job now includes making cost a first-class metric next to latency and errors, because finance cannot govern what it cannot see, and what finance cannot see eventually gets cut.
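One way to make both questions answerable is to tag every unit of work with the feature that triggered it and roll cost up per feature. What follows is a minimal sketch, not a prescribed design: the `Usage` record, the feature names, and the prices are all hypothetical, and a real system would read token counts from its provider's responses.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in dollars per million tokens; substitute real rates.
PRICE_PER_MILLION = {"input": 3.0, "output": 15.0}

@dataclass
class Usage:
    feature: str         # which product feature did the work
    input_tokens: int    # total input tokens for one unit of work
    output_tokens: int   # total output tokens for one unit of work

def cost_of(u: Usage) -> float:
    """Dollar cost of one unit of work at the assumed prices."""
    return (u.input_tokens * PRICE_PER_MILLION["input"]
            + u.output_tokens * PRICE_PER_MILLION["output"]) / 1_000_000

def cost_per_feature(events: list[Usage]) -> dict[str, dict[str, float]]:
    """Roll usage up into count, total, mean, and max cost for each feature."""
    by_feature: dict[str, list[float]] = defaultdict(list)
    for e in events:
        by_feature[e.feature].append(cost_of(e))
    return {
        name: {"requests": len(c), "total": sum(c),
               "mean": sum(c) / len(c), "max": max(c)}
        for name, c in by_feature.items()
    }

events = [
    Usage("summarize", 2_000, 500),
    Usage("summarize", 2_500, 400),
    Usage("research_agent", 400_000, 20_000),
]
report = cost_per_feature(events)
for name, row in report.items():
    print(name, row)
```

The `max` column is the one to watch. A feature whose maximum sits far above its mean is a candidate for the unbounded cost behavior finance should be asking about.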

The good news is that the gap between what these systems cost by default and what they cost when run with discipline is enormous, which means the savings are sitting right there. The teams that capture them are not doing anything exotic.

They tier their models.

They cache.

They budget per feature.

They watch the tail.

That architecture is its own topic, and it is the next article.

The price of intelligence is falling. The cost of using it carelessly is not.

Jordan Saunders

Author at NextLink Labs
