Most teams treat LLM costs as an optimization problem:

  • reduce tokens
  • pick a cheaper model
  • cache more
  • compress prompts

Those are useful. They also miss a more important reality:

In agentic AI, cost is a security boundary.

When an agent can spend money autonomously—by looping, retrying, escalating models, or calling tools—cost becomes a form of privilege. And privilege must be governed.

This post explains why runaway spend is not “just finance,” how cost failures happen in production, and what controls actually work.


Why cost belongs in the security conversation

In traditional systems, “security incidents” often mean:

  • data exposure
  • unauthorized access
  • integrity compromise
  • service disruption

Agentic AI introduces another category:

Unbounded resource consumption that happens fast, silently, and sometimes adversarially.

When an agent is allowed to:

  • call external APIs,
  • use paid tools,
  • run multi-step workflows,
  • or self-retry without strict bounds,

then spend becomes an attack surface.

Even without a malicious actor, agents can create financial impact through:

  • misunderstanding intent,
  • failing to converge,
  • or interacting poorly with flaky tools.

That is not “optimization.” That is blast radius.


The uncomfortable truth: cost failures scale with autonomy

A single LLM call that is too expensive is annoying.

An agentic workflow is different:

  • it plans,
  • it calls tools,
  • it ingests responses,
  • it replans,
  • it retries,
  • it escalates.

That means cost can grow as a function of:

  • steps
  • retries
  • tool fan-out
  • context growth
  • model tier escalation

Autonomy multiplies spend.

If you don’t put hard limits on that multiplication, you don’t have a budget—you have a hope.


How runaway spend actually happens (real patterns)

1) Retry storms (the “it almost worked” loop)

Tool calls fail intermittently. The agent retries. The agent retries again. Then it retries with more context. Then it retries with a bigger model.

From the logs it looks like “resilience.” From the invoice it looks like an incident.

Containment controls:

  • retry ceilings
  • exponential backoff + hard stop
  • per-request “max attempts” policies
  • circuit breakers on tool failure rates

2) Tool fan-out (the “just to be safe” explosion)

An agent is asked a question. It calls search. Then it calls search again. Then it calls three connectors. Then it summarizes everything.

This is common in:

  • RAG over multiple sources
  • “research agent” workflows
  • multi-tool orchestrators

Containment controls:

  • max tools per request
  • max external calls per step
  • max parallel tool calls
  • “read-only by default” tool modes

3) Context bloat (the “carry everything forward” tax)

Agent frameworks tend to accumulate context:

  • tool results
  • intermediate reasoning
  • retrieved documents
  • conversation history

Context grows. Token cost grows. Latency grows. Quality often does not grow proportionally.

Containment controls:

  • context windows per stage
  • summarization gates
  • truncation rules tied to sensitivity
  • “budget-aware retrieval” (retrieve less when budget is tight)

4) Model tier escalation (the “switch to the expensive model” trap)

When the agent fails to solve something, it escalates:

  • from small model → large model
  • from cheap tier → premium tier

That can be useful. But if escalation is uncontrolled, it becomes an easy exploit: a user can repeatedly trigger “hard mode.”

Containment controls:

  • model allowlists by workflow
  • “premium model only with approval”
  • per-identity tier limits
  • step-based escalation policies (only after N steps, not immediately)

5) Adversarial cost attacks (yes, they exist)

You don’t need to break data boundaries to hurt a system. You can just make it expensive.

Examples:

  • intentionally ambiguous prompts that trigger long reasoning
  • tasks that force repeated retrieval
  • prompts that induce loops (“try again with more detail”)
  • tool abuse through repeated API calls

Even in benign cases, the behavior looks similar: the system is being driven into high-cost modes.

Containment controls:

  • per-user/per-tenant budgets
  • per-request hard caps
  • anomaly detection on spend rate
  • hard stops on “non-converging” workflows

Why cost telemetry is not cost governance

Many teams believe they have “cost control” because they can see spend.

That’s the same category mistake as “logging is governance.”

  • Telemetry answers: what did we spend?
  • Governance answers: what are we allowed to spend?

If your system detects a runaway loop after 15 minutes, that’s not control. That’s a postmortem.


The one rule to internalize

Every agentic workflow needs a budget contract.

A budget contract is a hard set of constraints such as:

  • maximum tokens
  • maximum steps
  • maximum retries
  • maximum tool calls
  • maximum model tier
  • maximum external spend

And critically:

  • what happens when the budget is exceeded

The only acceptable answers are deterministic:

  • stop
  • degrade
  • require approval
  • fall back to a cheaper model
  • reduce tool scope
  • return partial results safely

“Keep trying” is not a budget contract.


Budgeting is least privilege for spend

Treat budgets like you treat permissions.

Principles that work:

1) Default-deny expensive behaviors

Premium models, external tools, and long-running workflows should be off by default.

2) Scope budgets to identity + context

Budgets should vary by:

  • environment (prod vs staging)
  • workflow (support vs finance)
  • user tier (internal vs external)
  • execution identity (agent type, service, tenant)

3) Enforce budgets at decision boundaries

Budget checks must happen:

  • before tool calls
  • before retries
  • before model escalation
  • before output egress (if it triggers external sends)

If you check budget only at the end, you’re too late.

4) Emit reason-coded budget decisions

If you stop an agent, you need to explain why:

  • BUDGET_EXCEEDED
  • MAX_TOOL_CALLS_REACHED
  • MODEL_TIER_DENIED
  • RETRY_LIMIT_REACHED

This turns budget enforcement into something operators can trust.


A simple one-screen diagram: cost as a boundary

Agent request
  → plan
  → choose tool/model
  → [BUDGET GATE]  (step count, retries, tool calls, model tier, tokens)
     ├─ allow  → execute
     ├─ degrade → cheaper model / fewer tools / smaller context
     ├─ approve → require explicit approval
     └─ stop   → return safe partial result + reason code

Audit: emit decision event with cost context + reason codes

What to do next (practical rollout)

If you want “cost governance” without breaking developer velocity:

  1. Start in log-only mode: emit cost estimates + budget events without stopping anything
  2. Add hard caps on the worst offenders: retries, steps, tool fan-out
  3. Introduce model tier policies by workflow
  4. Turn on degrade paths (fallback models, reduced retrieval)
  5. Move high-risk workflows to enforced mode

This approach prevents surprise outages and keeps teams productive.


  • Cost metadata: https://docs.tealtiger.ai/audit/cost-metadata
  • Logging behavior: https://docs.tealtiger.ai/audit/logging-behavior
  • Policy modes: https://docs.tealtiger.ai/concepts/policy-modes
  • Audit and redaction: https://docs.tealtiger.ai/concepts/audit-and-redaction
  • Security vs governance: https://docs.tealtiger.ai/concepts/security-vs-governance

Closing thought

A budget overrun is not “just a finance problem.” In agentic systems, it’s often the first visible symptom of:

  • non-converging agents,
  • uncontrolled tool access,
  • missing decision boundaries,
  • and absent enforcement.

Cost is not a metric.

Cost is a boundary.