Core Concepts
Models and capabilities
Choose models based on task type, latency tolerance, context-window size, and modality requirements such as vision.
Model selection strategy
- Use stronger models for architectural changes, complex refactors, and ambiguous tasks.
- Use faster/cheaper models for short iterations, triage, and lightweight edits.
- For visual tasks, select models that support vision capabilities.
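The strategy above can be sketched as a simple routing function. This is an illustrative sketch, not a real API: the task attributes, thresholds, and model-identifier strings are assumptions layered on the guidance in this section.

```python
from dataclasses import dataclass

@dataclass
class Task:
    complex: bool            # architectural change, ambiguous scope, large refactor
    needs_vision: bool       # screenshots, mockups, UI review
    latency_sensitive: bool  # short iteration loops, triage

def pick_model(task: Task) -> str:
    """Illustrative routing; identifiers and rules are assumptions, not a fixed API."""
    if task.needs_vision:
        return "gemini-3-flash"     # vision-capable and fast
    if task.complex:
        return "claude-sonnet-4.6"  # stronger model for ambiguous/architectural work
    if task.latency_sensitive:
        return "grok-code-fast-1"   # low-cost fast iteration
    return "kimi-k2.5"              # cost-conscious default
```

The order of the checks encodes the priorities above: modality requirements are hard constraints, so they are tested first; everything else trades capability against speed and cost.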
Current curated model guidance
- Gemini 3 Flash: fast, and well suited to design/UI iteration loops.
- Kimi K2.5 and MiniMax M2.5: good value picks when throughput per dollar matters.
- Grok Code Fast 1: low-cost, fast iteration when precision matters less.
- Claude Sonnet 4.6 and GPT-5.4 remain available in the desktop app through managed local runners rather than the server runtime.
Defaults
Current server-runtime defaults are Kimi K2.5 for free accounts and Gemini 3 Flash for paid accounts.
Capabilities and costs (per 1M tokens)
- Gemini 3 Flash: input $0.50, output $3.00, cached input $0.05, context 1M, vision supported.
- Kimi K2.5: input $0.60, output $2.50, cached input $0.10, context 128k, vision supported.
- MiniMax M2.5: input $0.30, output $1.20, cached input $0.03, cache-creation input $0.38, context 205k.
- Managed local-runner models such as Claude Sonnet 4.6 and GPT-5.4 keep their own pricing metadata in the desktop app.
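As a worked example of how the per-1M-token prices combine, here is a minimal cost calculator. The formula (fresh input + cached input + output, each at its own rate) and the sample request sizes are illustrative assumptions; the price figures are the Gemini 3 Flash values from the list above.

```python
def request_cost(input_toks: int, cached_toks: int, output_toks: int,
                 in_price: float, cached_price: float, out_price: float) -> float:
    """Cost in dollars; prices are per 1M tokens. cached_toks is the portion
    of input_toks served from cache at the discounted rate."""
    fresh = input_toks - cached_toks
    return (fresh * in_price
            + cached_toks * cached_price
            + output_toks * out_price) / 1_000_000

# Gemini 3 Flash prices from the list above: $0.50 input, $0.05 cached input, $3.00 output.
# Hypothetical request: 200k input tokens (150k cache hits), 4k output tokens.
cost = request_cost(200_000, 150_000, 4_000, 0.50, 0.05, 3.00)
# fresh 50k at $0.50/M = $0.025; cached 150k at $0.05/M = $0.0075; output 4k at $3.00/M = $0.012
print(f"${cost:.4f}")  # $0.0445
```

Note how the cached-input discount dominates on long, mostly-repeated prompts: three quarters of the input here costs less than the 4k output tokens.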
Cost data changes over time
Treat these figures as practical guidance and confirm current values in-app before making strict budget decisions.