LLM API Cost Calculator

Paste your prompt, pick your knobs, and see the estimated cost across every major model. Accounts for cached-input pricing, batch-mode discounts, and context-window limits.

Your Prompt

Input: 1,200 tokens (auto from text above)
Output: 500 tokens (expected response size)
Calls: × 1 (scale multiplier)
Baseline: used for the savings column
Total: 1,700 tokens × 1 call

Discounts & Mode

Cache hit: OpenAI (50% off) and Anthropic (90% off) charge reduced rates for cached input tokens, which applies when a prompt prefix is reused across multi-turn conversations or RAG calls.

Batch: OpenAI and Anthropic both offer 50% off for non-realtime batch processing with a 24h turnaround.

Model | Context | Input / Output per 1M tokens | Cost / Call | Monthly | vs Baseline

Pricing source: official provider pages, verified April 2026. Rates are in US dollars per 1,000,000 tokens.

Cache hit % applies to input tokens only (OpenAI: 50% off cached input; Anthropic: 90% off cache reads). Batch mode applies 50% off both input and output on the OpenAI and Anthropic endpoints that support it. Self-hosted Llama rates are rough cloud-GPU inference estimates, not a direct API price.

Rows highlighted in red exceed the model's context window and aren't usable for this prompt size. For production billing, always double-check with the provider's official calculator; volume discounts and enterprise rates are not modeled here.
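The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: `ModelRate`, `cost_per_call`, and every number in the example are hypothetical placeholders, not real prices. It also assumes that cache and batch discounts stack, and that input plus output tokens together must fit in the context window; check both against your provider's billing rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRate:
    """Hypothetical rate-card entry. Prices are USD per 1,000,000 tokens."""
    input_per_m: float
    output_per_m: float
    context_window: int    # maximum tokens the model accepts
    cache_discount: float  # 0.5 means 50% off cached input tokens
    batch_discount: float  # 0.5 means 50% off in batch mode


def cost_per_call(rate: ModelRate, input_tokens: int, output_tokens: int,
                  cache_hit: float = 0.0, batch: bool = False) -> Optional[float]:
    """Estimate the USD cost of one call.

    Returns None when the request does not fit the context window
    (assumption: input + output tokens are checked against the window).
    """
    if input_tokens + output_tokens > rate.context_window:
        return None

    # The cache discount applies to the cached share of input tokens only.
    cached = input_tokens * cache_hit
    uncached = input_tokens - cached
    billable_input = uncached + cached * (1 - rate.cache_discount)

    total = (billable_input * rate.input_per_m
             + output_tokens * rate.output_per_m) / 1_000_000

    # The batch discount applies to input and output alike
    # (assumption: it stacks with the cache discount).
    if batch:
        total *= 1 - rate.batch_discount
    return total


if __name__ == "__main__":
    # Placeholder rates for illustration only, not a real model's pricing.
    demo = ModelRate(input_per_m=1.0, output_per_m=2.0, context_window=128_000,
                     cache_discount=0.5, batch_discount=0.5)
    per_call = cost_per_call(demo, 1_200, 500, cache_hit=0.5)
    calls_per_month = 10_000  # hypothetical volume
    print(f"per call: ${per_call:.4f}, monthly: ${per_call * calls_per_month:.2f}")
```

Multiplying the per-call figure by an expected call volume gives a monthly estimate; a `None` result corresponds to a row that would be highlighted in red.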

About This Tool

Estimate the cost of API calls to language models based on token count. Compare pricing across models like GPT-4, Claude, and Gemini to budget your AI application costs before you build.

How to Use

  1. Enter or paste the text you plan to send as a prompt.
  2. Select the AI model to check pricing against.
  3. See the estimated cost per request and at different usage volumes.

Frequently Asked Questions

What is a token in AI language models?

A token is roughly 3/4 of an English word. A word like 'hamburger' is typically split into two or three tokens, depending on the model's tokenizer. Spaces and punctuation also count. Most providers quote prices per 1,000 or 1 million tokens.
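As a rough illustration of the 3/4-of-a-word rule of thumb, the sketch below estimates a token count from a whitespace word count. The function name `estimate_tokens` and its default ratio are assumptions made for this example; a real count requires the specific model's tokenizer and can differ noticeably, especially for code or non-English text.

```python
import math


def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate: one token is about 3/4 of a word.

    This is a heuristic only; it does not run any real tokenizer.
    """
    words = len(text.split())  # whitespace-separated words
    return math.ceil(words / words_per_token)


if __name__ == "__main__":
    prompt = "Summarize the following article in three bullet points."
    print(estimate_tokens(prompt))
```

An estimate like this is good enough for early budgeting; for billing-grade numbers, count tokens with the provider's own tokenizer.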

Which AI model is cheapest per token?

Pricing changes frequently. Generally, smaller models (GPT-4o-mini, Claude Haiku, Gemini Flash) cost 10-50x less per token than flagship models and are often sufficient for simpler tasks.