How to keep LLM spend predictable

Meet Mary, your senior engineer

Mary is a senior engineer at your organization, a mid-size SaaS company.

Her team runs on Claude Code. Everyone has a seat, and it’s part of the standard dev toolkit. Mary uses it all day: writing functions, debugging, reviewing PRs. She hits her daily limit regularly, usually around 6 pm.

A few months ago her team got a new assignment from leadership: build an SRE agent that monitors the stack, detects anomalies, and surfaces recommendations without waiting for a human to notice.

The agent is an iterative project. For weeks, Mary is prompting, testing, and refining the code, with some sessions running two or three hours. The context grows every week as the agent gets more capable.

Two months in, Mary stops hitting her Claude Code ceiling as often, but she doesn’t think much of it. The development of the new SRE agent is going well.

What Mary doesn’t realize is that the economics have changed. Six weeks earlier, a teammate configured the team’s dev environment to route directly to the API, which made the agent’s runtime calls faster to set up. Nobody changed it back. Mary’s coding sessions have been hitting the API ever since, each one billed at full API rates. On top of that, there was no budget set on the API key and no rate limit on the agent.

The invoice arrives. $15,000 for the month.

Her manager is forced to escalate. When Mary pulls the logs, she can see the calls but not a cost breakdown, so she can’t tell which sessions drove the spike or whether she or the agent was making the calls. She can’t tell which day the spend accelerated or what triggered it.

Surprise! Tokens aren’t free

Does the above scenario sound familiar? Or trigger some concerns?

If so, you’re not alone. Only 26% of companies say they have a comprehensive view of their AI costs, while 50% have some visibility and 22% report no visibility or visibility only after billing, according to a yet-to-be-released KPMG survey reported by the Wall Street Journal.

According to Steve Chase, global head of AI for KPMG, “It’s [tokens] a new resource that needs to be managed that didn’t exist quite that way, and we’re seeing exponential growth.”

More and more companies are blowing through their token and cloud computing budgets in a matter of months. Most find out they’ve overspent on LLM tokens only when the bill arrives.

Reddit posts are filled with developers – usually the biggest spenders of tokens – trying to understand their usage costs. Here’s a snapshot of some of the numbers reported by various organizations:

$15K/month for a 4-person team, mostly coding agents and internal tools
~$600/developer/month on agentic workflows with sub-agents
$500/month on mostly Opus and some Sonnet
$1,000 in 6 days on an enterprise API plan before switching back to a subscription (“the difference is crazy”)
One person spending $160K in tokens in a month

One Redditor observes that their AI bill “doubles every quarter and nobody owns it.”

Most organizations are blowing through token spend and receiving surprise invoices

How Barndoor could have helped Mary

Mary didn’t spend $15K on purpose, nor was it the result of poor judgement. It was simply the result of a configuration change that no one reverted. With Barndoor, the right guardrails and policies can ensure surprise invoices and overspending are a thing of the past.

Route to the right model for the work

As one customer puts it, “I don’t need Albert Einstein to tutor my middle schooler.” This thinking applies to model usage: picking the right one for the right job.

Within Barndoor LLM Gateway, a model access policy created by your admin would have routed routine agent monitoring calls to another model that’s faster and cheaper, reserving a more expensive model for the reasoning-heavy tasks. This mix of models would have cut the bill significantly.
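To make the idea concrete, here is a minimal sketch of task-based model selection. It is not Barndoor’s policy engine or configuration format: the model names, the task categories, and the `pick_model` helper are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical model tiers; these names are placeholders, not real model IDs.
CHEAP_MODEL = "small-fast-model"
PREMIUM_MODEL = "large-reasoning-model"

# Assumed categories of routine SRE-agent work that do not need deep reasoning.
ROUTINE_TASKS = {"health_check", "log_summary", "metric_poll"}


def pick_model(task_type: str) -> str:
    """Send routine tasks to the cheap tier and everything else to the premium tier."""
    return CHEAP_MODEL if task_type in ROUTINE_TASKS else PREMIUM_MODEL


if __name__ == "__main__":
    for task in ("metric_poll", "root_cause_analysis"):
        print(task, "->", pick_model(task))
```

In a setup like Mary’s, the agent’s continuous monitoring calls would fall into the routine set, so only the occasional reasoning-heavy request would reach the expensive model.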

Put a hard ceiling on team spend

Barndoor LLM Gateway enables you to set daily, weekly, or monthly budgets scoped across your organization by group, role, or user. In Mary’s case, a monthly budget cap on the engineering group would have stopped spending before it reached a painful number. She and her manager could have watched real-time spend across the team on the dashboard instead of discovering a number on an invoice.

Put a rate limit on an agent

Mary needs her agent running; it just can’t burn through the monthly budget overnight. A rate limit on the agent’s API key would have throttled its continuous monitoring calls to a sustainable pace, so the budget couldn’t be exhausted before Mary got back to her office.

Subscription vs. API pricing

One more aspect of Mary’s story is that few people, especially on engineering teams, recognize the disconnect between subscription and API pricing.

Developers using Claude Code, Codex, or similar AI tools may not realize that much of their actual token spend is subsidized by the LLM providers themselves. For example, a $200/month seat may represent several thousand dollars’ worth of usage at API rates. The daily limits Mary hit around 6 pm were the LLM provider applying a hard spend cap.

When her teammate switched the dev environment to the API, which allowed the agent to keep working and making calls, that cap went away, even though the team kept using the same model and the same prompts. Everything remained the same except for the economics.
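A rough back-of-the-envelope calculation shows how quickly the gap opens up. Every number below is an assumption made for illustration: the per-token prices are hypothetical round figures rather than any provider’s actual rates, and the daily token volumes are a guess at what long agentic coding sessions with a growing context might consume.

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of a workload at pay-as-you-go rates quoted per million tokens."""
    return (input_tokens / 1_000_000 * usd_per_m_input
            + output_tokens / 1_000_000 * usd_per_m_output)


# All values are illustrative assumptions, not real prices or measured usage.
SEAT_USD_PER_MONTH = 200.0        # flat subscription price from the example above
USD_PER_M_INPUT = 5.0             # hypothetical API price per million input tokens
USD_PER_M_OUTPUT = 25.0           # hypothetical API price per million output tokens
DAILY_INPUT_TOKENS = 20_000_000   # long sessions resend a large context many times
DAILY_OUTPUT_TOKENS = 1_000_000
WORKING_DAYS = 20

daily_usd = api_cost_usd(DAILY_INPUT_TOKENS, DAILY_OUTPUT_TOKENS,
                         USD_PER_M_INPUT, USD_PER_M_OUTPUT)
monthly_api_usd = daily_usd * WORKING_DAYS
ratio = monthly_api_usd / SEAT_USD_PER_MONTH

if __name__ == "__main__":
    print(f"API-rate cost: ${monthly_api_usd:,.0f}/month, "
          f"{ratio:.1f}x the ${SEAT_USD_PER_MONTH:,.0f} seat")
```

Under these assumed numbers, one developer’s usage comes to $2,500 a month at API rates against a $200 seat. That is before counting an agent making calls around the clock.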

Barndoor LLM Gateway closes this gap. Sitting between your developers, users, and LLM providers, it would have allowed Mary to apply budget controls based on her team’s API keys to enforce a monthly ceiling the moment the teammate switched to the API. A model access policy would have also made sure the agent didn’t use a more expensive model if a cheaper one was just as appropriate.

Create a budget for your AI

Setting it up: how an admin configures LLM cost controls in Barndoor

These controls aren’t hypothetical; an admin can set them up on the Barndoor platform. Here’s what the admin can configure.

Step 1: Add credentials

Enter the API key from your LLM provider – Anthropic, OpenAI, etc. This is the key the provider gave you to programmatically access their models. You can have multiple credentials for multiple providers.

Step 2: Create a provider

Create a named provider (e.g. “Anthropic Finance”) and attach one of the credentials you just added. You can also scope which models are available under that provider. For example, if this provider is for a finance team, you might exclude Opus models and only enable Haiku and Sonnet. You can enable, disable, or remove models at any time.

This is also where pricing lives. Barndoor imports the provider’s publicly advertised pricing by default. If your organization has a negotiated discount with Anthropic or another provider, you can override the default pricing here to reflect your actual rate. Pricing must be set for any model you want budget tracking to cover and there’s an “enforce pricing” flag that ensures no model gets used without a price attached.
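The sketch below illustrates the idea of default pricing, a negotiated override, and an enforce-pricing check. It is a conceptual model only, not Barndoor’s implementation: the `Price` class, the `request_cost` function, the model name, and all prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    usd_per_m_input: float   # dollars per million input tokens
    usd_per_m_output: float  # dollars per million output tokens


# Hypothetical list prices and a hypothetical negotiated 25% discount.
DEFAULT_PRICES = {"model-a": Price(4.0, 20.0)}
OVERRIDES = {"model-a": Price(3.0, 15.0)}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 enforce_pricing: bool = True) -> float:
    """Price one request, preferring a negotiated override over the list price."""
    price = OVERRIDES.get(model) or DEFAULT_PRICES.get(model)
    if price is None:
        if enforce_pricing:
            # Refuse to serve a model whose spend could not be tracked.
            raise ValueError(f"no price configured for {model!r}")
        return 0.0  # untracked spend: the gap that enforcement is meant to close
    return (input_tokens / 1_000_000 * price.usd_per_m_input
            + output_tokens / 1_000_000 * price.usd_per_m_output)
```

Without the enforcement check, a model with no price attached would show up as zero spend, which is how a budget can look healthy while the real bill grows.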

Step 3: Create routes

Routes are ordered lists of models with fallback logic. You create a route, give it a name (e.g. “finance”), add your primary model, and then add fallback models in priority order. If the primary model has an outage or a budget is exhausted, traffic automatically fails over to the next model in the route, then the next.

You can build multiple routes for different purposes. For example, you might build a cheaper route for simple, high-volume tasks and another for complex work, then assign different teams or API keys to different routes.
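As a rough illustration of ordered fallback, the sketch below walks a route in priority order and skips any model that is out of budget or unavailable. The `call_with_fallback` function, the `ModelUnavailable` exception, and the `send` callback are hypothetical names for this example and do not reflect how Barndoor implements routes.

```python
from typing import Callable


class ModelUnavailable(Exception):
    """Raised by the caller-supplied send function when a model is down."""


def call_with_fallback(route: list[str],
                       send: Callable[[str], str],
                       exhausted: set[str]) -> tuple[str, str]:
    """Try each model in priority order and return (model_used, response)."""
    for model in route:
        if model in exhausted:
            continue  # this model's budget is spent, move down the route
        try:
            return model, send(model)
        except ModelUnavailable:
            continue  # outage, move down the route
    raise RuntimeError("no model in the route is currently available")
```

The caller never needs to know which model answered unless it asks, which is what lets an admin reorder a route without touching application code.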

Step 4: Set up model access policies

Before setting budgets, decide which models each team or group can access. Go to Model Access and create a policy. You can allowlist or denylist at the model, provider, or provider+model level, scoped to a group or role. This ensures teams can’t route around your budget by switching to a more expensive model.
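One way to picture an allowlist or denylist at the model, provider, or provider+model level is the sketch below. The `AccessPolicy` class and the entry syntax (`"provider"`, `"provider/model"`, `"*/model"`) are assumptions invented for this example, and the model names are shorthand labels rather than real model IDs.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    group: str   # the group or role the policy is scoped to
    mode: str    # "allow" (allowlist) or "deny" (denylist)
    # Entries may be "provider", "provider/model", or "*/model" (any provider).
    entries: set = field(default_factory=set)


def is_allowed(policy: AccessPolicy, provider: str, model: str) -> bool:
    """Check one provider+model pair against an allowlist or denylist."""
    if policy.mode not in ("allow", "deny"):
        raise ValueError(f"unknown policy mode: {policy.mode!r}")
    hit = (provider in policy.entries
           or f"{provider}/{model}" in policy.entries
           or f"*/{model}" in policy.entries)
    return hit if policy.mode == "allow" else not hit


# Example: a finance group limited to two cheaper models from one provider.
finance = AccessPolicy("finance", "allow", {"anthropic/haiku", "anthropic/sonnet"})
```

With this policy, `is_allowed(finance, "anthropic", "opus")` returns `False`, so a request for the more expensive model is rejected before it can spend anything.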

Step 5: Create budgets

For each budget you will:

Name it – e.g. “Engineering Monthly”
Set the period – daily, weekly, or monthly
Set the scope – Organization, Group, Role, or individual User
Choose budget type – Spend (dollar amount), Tokens, or both. Spend maps directly to actual cost, so you don’t have to work through ambiguous token math
Set a budget target (optional) – This will make the budget apply just to a specific Provider, Route or Model. This enables budget routing where specific models can hit their set budget ceiling while other models are still within budget.
Set the limit – e.g. $2000/month for a team, $100 for an individual
Set action on exhaust (when you’ve hit your budget ceiling) – Block Requests or Warn Only
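The fields above can be pictured as a small record plus a check that runs before each request. This is a simplified sketch under stated assumptions, not Barndoor’s data model: the `Budget` class and `check_and_record` function are hypothetical, only the Spend budget type is modeled, and resetting the counter at the end of each period is left out.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str           # e.g. "Engineering Monthly"
    period: str         # "daily", "weekly", or "monthly"
    scope: str          # e.g. "group:engineering" (hypothetical scope syntax)
    limit_usd: float
    on_exhaust: str     # "block" or "warn"
    spent_usd: float = 0.0


def check_and_record(budget: Budget, cost_usd: float) -> str:
    """Return "ok", "warn", or "block" for a request with the given cost."""
    if budget.spent_usd + cost_usd > budget.limit_usd:
        if budget.on_exhaust == "block":
            return "block"          # request rejected, nothing is spent
        budget.spent_usd += cost_usd
        return "warn"               # request proceeds, but the overage is flagged
    budget.spent_usd += cost_usd
    return "ok"
```

The difference between the two exhaust actions is visible in the code: a blocking budget never lets recorded spend exceed the limit, while a warn-only budget keeps counting past it.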

Step 6: Set rate limits

Rate limits are separate from budgets and address the pace of consumption within a period rather than total spend. A budget protects the daily, weekly, or monthly ceiling; a rate limit prevents an agent or script from burning through that ceiling in a short window of time.
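A common way to implement this kind of limit is a token bucket, sketched below. The source does not say which algorithm Barndoor uses, so treat this as one illustrative technique; the `TokenBucket` class and its parameters are hypothetical. Here a token stands for permission to make one request, not an LLM token.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 now: Optional[float] = None) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True and consume one token if a request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

An agent limited to, say, one request per second with a burst of two could be guarded with `TokenBucket(rate_per_sec=1.0, capacity=2)` and a call to `allow()` before each API request. The optional `now` argument exists only to make the behavior easy to test with fixed timestamps.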

Avoid sticker shock

Don’t get caught with surprise overages on AI spend. Set independent spend caps that match how teams operate, without slowing anyone down. See how to achieve this through a live Barndoor demo – reach out today.
