Core Concepts
Token costs
Token cost is driven by input and output token volume, with cached input tokens billed at a lower rate when available.
How request cost is computed
At a high level: input cost + output cost = total request cost. Some models also apply lower pricing to cached input tokens.
uncached_input_cost = uncached_input_tokens * input_rate
cached_input_cost = cached_input_tokens * cached_input_rate
output_cost = output_tokens * output_rate
total_cost = uncached_input_cost + cached_input_cost + output_cost
Why high context usage increases spend
- Later turns often include more input context than earlier ones.
- Larger input context means more input tokens billed each turn.
- If output also grows, both sides of the cost formula rise.
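The formula and the growth effect described above can be sketched in a few lines of Python. This is a minimal illustration, not a billing implementation: the `Rates` class, the function names, and the example prices are all hypothetical, and the session model assumes every turn resends the full prior conversation as uncached input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-token prices in dollars; not real rates for any model."""
    input_rate: float
    cached_input_rate: float
    output_rate: float


def request_cost(uncached_input_tokens: int, cached_input_tokens: int,
                 output_tokens: int, rates: Rates) -> float:
    """Apply the request cost formula to a single request."""
    uncached_input_cost = uncached_input_tokens * rates.input_rate
    cached_input_cost = cached_input_tokens * rates.cached_input_rate
    output_cost = output_tokens * rates.output_rate
    return uncached_input_cost + cached_input_cost + output_cost


def session_costs(turns, rates: Rates) -> list[float]:
    """Return the cost of each turn in a session.

    Assumption: each turn resends all earlier prompts and replies as
    uncached input, so the billed input grows from turn to turn.
    `turns` is a list of (new_input_tokens, output_tokens) pairs.
    """
    costs = []
    context_tokens = 0
    for new_input_tokens, output_tokens in turns:
        input_tokens = context_tokens + new_input_tokens
        costs.append(request_cost(input_tokens, 0, output_tokens, rates))
        # The next turn carries this turn's input and output as context.
        context_tokens = input_tokens + output_tokens
    return costs


# Illustrative numbers only.
EXAMPLE_RATES = Rates(input_rate=3e-6, cached_input_rate=0.3e-6, output_rate=15e-6)
example_turns = [(1_000, 500)] * 4

for turn, cost in enumerate(session_costs(example_turns, EXAMPLE_RATES), start=1):
    print(f"turn {turn}: ${cost:.4f}")
```

Even though every turn in the example adds the same amount of new text, the per-turn cost rises because the billed input includes everything that came before.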
Spend control
Use compact sessions, targeted attachments, subagents for exploration, and fresh sessions after milestone completion.
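To see why starting a fresh session after a milestone can help, the rough sketch below compares billed input tokens for one long session against the same work split into two sessions. It assumes every turn resends the full prior context; the function name, the turn sizes, and the 300-token carry-over summary are hypothetical values chosen for illustration.

```python
def total_input_tokens(turn_sizes, carry_over=0):
    """Sum the input tokens billed across a session.

    Assumption: every turn resends the full prior context.
    `turn_sizes` holds (new_input_tokens, output_tokens) per turn.
    `carry_over` is a hypothetical summary seeded into a fresh session.
    """
    context = carry_over
    billed = 0
    for new_input, output in turn_sizes:
        billed += context + new_input
        context += new_input + output
    return billed


turns = [(1_000, 500)] * 10

# All ten turns in a single session.
one_long_session = total_input_tokens(turns)

# Restart after turn 5, carrying a 300-token summary into the new session.
two_sessions = (total_input_tokens(turns[:5])
                + total_input_tokens(turns[5:], carry_over=300))

print(one_long_session, two_sessions)
```

Under these assumptions the split run bills noticeably fewer input tokens, because the second session no longer carries the first five turns in full. Actual savings depend on how much context must be carried forward and on whether cached input pricing applies.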