Core Concepts

Token costs

Token cost is driven by the number of input and output tokens in each request. Where a model supports it, cached input tokens are billed at a lower rate.

How request cost is computed

At a high level, total request cost = input cost + output cost. Some models also apply lower pricing to cached input tokens, in which case input cost splits into an uncached part and a cached part:

uncached_input_cost = uncached_input_tokens * input_rate
cached_input_cost = cached_input_tokens * cached_input_rate
output_cost = output_tokens * output_rate
total_cost = uncached_input_cost + cached_input_cost + output_cost
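
As a minimal sketch, the formula above can be written as a small Python function. The `Rates` class, the `request_cost` function, and the per-million-token prices are hypothetical, chosen only for illustration. They are not the pricing of any particular model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-token prices in USD. All values used below are hypothetical."""
    input_rate: float
    cached_input_rate: float
    output_rate: float


def request_cost(uncached_input_tokens: int, cached_input_tokens: int,
                 output_tokens: int, rates: Rates) -> float:
    """Apply the cost formula term by term and return the total in USD."""
    uncached_input_cost = uncached_input_tokens * rates.input_rate
    cached_input_cost = cached_input_tokens * rates.cached_input_rate
    output_cost = output_tokens * rates.output_rate
    return uncached_input_cost + cached_input_cost + output_cost


# Hypothetical prices: $3.00, $0.30, and $15.00 per million tokens.
EXAMPLE_RATES = Rates(
    input_rate=3.00 / 1_000_000,
    cached_input_rate=0.30 / 1_000_000,
    output_rate=15.00 / 1_000_000,
)

# 2,000 uncached input tokens, 8,000 cached input tokens, 500 output tokens.
cost = request_cost(2_000, 8_000, 500, EXAMPLE_RATES)
print(f"{cost:.4f}")  # prints 0.0159
```

With these assumed rates, the 8,000 cached tokens cost less than the 2,000 uncached ones, which shows how much cached input pricing can matter when it is available.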

Why high context usage increases spend

  • Later turns typically carry a larger input context, because the accumulated session history is sent again.
  • Larger input context means more input tokens billed each turn.
  • If output also grows, both the input and output terms of the cost formula rise.
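
The points above can be illustrated with a rough sketch. It assumes a simplified model in which every turn adds a fixed number of new tokens and resends all earlier context, and it ignores caching. The function name `session_input_tokens` and the figures are hypothetical.

```python
def session_input_tokens(turns: int, new_tokens_per_turn: int) -> list[int]:
    """Input tokens billed on each turn, assuming each turn resends all
    prior context plus a fixed amount of new content (a simplification)."""
    return [new_tokens_per_turn * turn for turn in range(1, turns + 1)]


per_turn = session_input_tokens(turns=5, new_tokens_per_turn=1_000)
print(per_turn)       # [1000, 2000, 3000, 4000, 5000]
print(sum(per_turn))  # 15000
```

Under this assumption, five turns that each add 1,000 tokens are billed for 15,000 input tokens in total rather than 5,000, because per-turn input grows linearly and cumulative input grows roughly quadratically with the number of turns.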

Spend control

Keep sessions compact, attach only what a task needs, use subagents for exploration, and start a fresh session after completing a milestone.