Cost Management

Octo Router provides a transparent budgeting and usage tracking system to prevent unexpected costs and ensure predictable LLM spending.

Provider Budgets

Budgets are currently enforced at the provider level. When a provider's cumulative cost exceeds its configured budget, Octo Router's "Bouncer" layer (the pipeline) automatically skips that provider and moves to the next candidate in the fallback chain.
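The skip-and-fall-through behaviour can be pictured with a minimal sketch. This is not Octo Router's actual code: the function name pick_provider and the spent and budgets dictionaries are hypothetical, and two details are assumptions, namely that a provider without a configured budget is unlimited and that a provider is skipped as soon as its spend reaches the budget.

```python
from typing import Optional


def pick_provider(
    chain: list[str],
    spent: dict[str, float],
    budgets: dict[str, float],
) -> Optional[str]:
    """Return the first provider in the fallback chain that is still within budget."""
    for provider in chain:
        budget = budgets.get(provider)  # None: no budget configured (assumed unlimited)
        if budget is not None and spent.get(provider, 0.0) >= budget:
            continue  # budget used up: skip and try the next candidate
        return provider
    return None  # every candidate has used up its budget


# Hypothetical state: OpenAI has spent past its $10 budget.
budgets = {"openai": 10.00, "anthropic": 25.00, "gemini": 50.00}
spent = {"openai": 10.02, "anthropic": 3.10}
chosen = pick_provider(["openai", "anthropic", "gemini"], spent, budgets)
print(chosen)  # anthropic
```

Returning None when the whole chain is exhausted is a choice made for this sketch; how the router itself responds in that case is not specified here.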

Configuration

Define budgets in the limits.providers section of your config.yaml:

limits:
  providers:
    openai:
      budget: 10.00  # Stop OpenAI requests after $10 cumulative cost
    anthropic:
      budget: 25.00
    gemini:
      budget: 50.00

Note: Budgets are cumulative since the last reset. With the default in-memory backend, accumulated usage is cleared whenever the process restarts; with Redis it persists across restarts.

Storage Backends

Octo Router supports two ways to track and store usage data:

Backend	Scope	Persistence	Recommended for
In-Memory	Single Instance	Lost on restart	Local development, single-binary usage.
Redis	Multi-Instance	Persistent	Production, high-availability deployments.

To use Redis, configure the redis section:

redis:
  addr: "localhost:6379"
  db: 0

Tracking Costs

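Every successful API response includes a usage block and a total cost_usd field. The token counters in usage use the OpenAI field names (prompt_tokens, completion_tokens, total_tokens); cost_usd is the total cost of the request in US dollars.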

{
  "message": "Hello! I can help with that.",
  "role": "assistant",
  "provider": "openai",
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 350,
    "total_tokens": 500
  },
  "cost_usd": 0.00750
}
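A client can keep its own running total from these fields. The following is a minimal sketch, assuming the response body has exactly the shape shown above; the helper name record_cost and the totals dictionary are hypothetical and not part of Octo Router.

```python
import json
from collections import defaultdict

# Sample body copied from the response shown above.
SAMPLE_BODY = """
{
  "message": "Hello! I can help with that.",
  "role": "assistant",
  "provider": "openai",
  "usage": {"prompt_tokens": 150, "completion_tokens": 350, "total_tokens": 500},
  "cost_usd": 0.00750
}
"""


def record_cost(body: str, totals: dict) -> float:
    """Add the request's cost_usd to the per-provider total and return it."""
    data = json.loads(body)
    cost = float(data["cost_usd"])
    totals[data["provider"]] += cost
    return cost


totals = defaultdict(float)  # provider name -> cumulative cost in USD
record_cost(SAMPLE_BODY, totals)
record_cost(SAMPLE_BODY, totals)
print(f"{totals['openai']:.5f}")  # 0.01500
```

A client-side total like this is only a cross-check; the figure the router enforces is the one it tracks itself.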

Response Headers

Octo Router also exposes the cost of the request in the X-Request-Cost HTTP header for easy monitoring without parsing the JSON body.
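Reading the header might look like the sketch below. It assumes the header value is a plain decimal number of US dollars, which is not confirmed here, and the helper name request_cost is hypothetical.

```python
from typing import Mapping, Optional


def request_cost(headers: Mapping[str, str]) -> Optional[float]:
    """Return the X-Request-Cost value as a float, or None if absent or unparsable."""
    for name, value in headers.items():
        if name.lower() == "x-request-cost":  # HTTP header names are case-insensitive
            try:
                return float(value)  # assumed format: plain decimal, e.g. "0.00750"
            except ValueError:
                return None
    return None


cost = request_cost({"Content-Type": "application/json", "X-Request-Cost": "0.00750"})
print(cost)  # 0.0075
```

Any mapping of header names to values works as input, including the headers object returned by common HTTP clients.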

Resetting Budgets

If you are using the Admin API, you can reset usage for a specific provider to resume traffic:

curl -X POST "http://localhost:8080/admin/budgets/reset?provider=openai" \
     -H "Authorization: Bearer <your-admin-key>"
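The same call can be issued from a script. This sketch builds the equivalent request with Python's standard library; the helper name build_reset_request is hypothetical, and the endpoint, query parameter, and header are taken from the curl command above. The shape of the response is not documented here, so the sketch does not try to interpret it.

```python
import urllib.parse
import urllib.request


def build_reset_request(base_url: str, provider: str, admin_key: str) -> urllib.request.Request:
    """Build the POST request that resets usage for one provider."""
    query = urllib.parse.urlencode({"provider": provider})
    return urllib.request.Request(
        f"{base_url}/admin/budgets/reset?{query}",
        method="POST",
        headers={"Authorization": f"Bearer {admin_key}"},
    )


req = build_reset_request("http://localhost:8080", "openai", "<your-admin-key>")
print(req.get_method(), req.full_url)

# Sending it requires a running Octo Router instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```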