Paste your prompt, pick your knobs, see the exact cost across every major model. Accounts for cached-input pricing, batch-mode discounts, and context-window limits.
Cache hit: OpenAI (50% off) & Anthropic (90% off) charge reduced rates for cached input tokens (prefix reuse in multi-turn or RAG prompts). Batch: OpenAI & Anthropic both offer 50% off for non-realtime batch processing with a 24-hour turnaround.
| Model | Context | Input / Output per 1M | Cost / Call | Monthly | vs Baseline |
|---|---|---|---|---|---|
Pricing source: Official provider pages, verified April 2026. Rates are US dollars per 1,000,000 tokens. Cache hit % applies to input tokens only (OpenAI 50% off cached, Anthropic 90% off cache reads). Batch mode applies 50% off both input and output on OpenAI & Anthropic endpoints that support it. Self-hosted Llama rates are rough cloud-GPU inference estimates, not a direct API price. Rows highlighted in red exceed the model's context window and aren't usable for this prompt size. For production billing, always double-check with the provider's official calculator — volume discounts and enterprise rates are not modeled here.
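The discount math described above can be sketched in a few lines. The rates and token counts below are placeholders for illustration, not any provider's actual prices:

```python
def cost_per_call(input_tokens, output_tokens, input_rate, output_rate,
                  cache_hit=0.0, cache_discount=0.5, batch=False):
    """Estimate the USD cost of one API call. Rates are USD per 1M tokens.

    cache_hit: fraction of input tokens served from the prompt cache (0..1).
    cache_discount: fraction off the input rate for cached tokens
                    (0.5 for OpenAI cached input, 0.9 for Anthropic cache reads).
    batch: apply the 50% batch discount to both input and output.
    """
    cached = input_tokens * cache_hit
    fresh = input_tokens - cached
    cost = (fresh * input_rate
            + cached * input_rate * (1 - cache_discount)
            + output_tokens * output_rate) / 1_000_000
    if batch:
        cost *= 0.5
    return cost

# Placeholder rates ($3 in / $15 out per 1M), 80% cache hit, 90% cache discount:
print(cost_per_call(10_000, 1_000, 3.0, 15.0, cache_hit=0.8, cache_discount=0.9))
```

Multiply the result by expected calls per month to get the monthly column; batch and cache discounts compose, which is why the cheapest configuration in the table combines both.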
Estimate the cost of API calls to language models based on token count. Compare pricing across models like GPT-4, Claude, and Gemini to budget your AI application costs before you build.
A token is roughly 3/4 of a word in English. The word 'hamburger' splits into three tokens (ham, bur, ger); spaces and punctuation count too. Most providers price per 1,000 or per 1 million tokens.
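The rule of thumb above can be turned into a quick estimator. This is a heuristic sketch, not a real tokenizer; for exact counts use the provider's own tokenizer (e.g. tiktoken for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the ~4 chars / ~3/4 word rules of thumb.

    Averages the character-based and word-based heuristics; real
    tokenizers give exact, model-specific counts.
    """
    by_chars = len(text) / 4             # ~4 characters per token in English
    by_words = len(text.split()) / 0.75  # ~3/4 of a word per token
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Estimate the cost of API calls to language models."))
```

Accuracy degrades for code, non-English text, and heavy punctuation, all of which tokenize more densely than plain English prose.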
Pricing changes frequently. As a rule, smaller models (GPT-4o-mini, Claude Haiku, Gemini Flash) cost 10-50x less than flagship models and handle simpler tasks well.