AI API cost calculator · Token-level

What you'll actually pay for inference.

Token volume, model choice, latency requirements, and wrapper margin (Harvey, Glean, Hebbia) all change the bill. Fourteen probing questions. Output: side-by-side $/mo across OpenAI, Claude, Gemini, and self-hosted, plus the wrapper margin if you're going through one. The 8-page memo includes a 12-month cost projection, model-mix recommendation, and the questions to negotiate with on your next renewal.

📋  14 questions · ~3 min 🔓  No signup to see result 📩  8-page memo for your email

How this is calculated

Per-token published pricing across the four major providers is the input table. We adjust for: reasoning depth (basic chat vs Claude Opus / GPT-5 / Gemini Pro tier), context window (long-context surcharges where they apply), multi-modal (vision / audio token weights), streaming overhead, tool-calling round-trips, and wrapper margin if you're going through Harvey / Glean / Hebbia / Cresta / Sierra.

Self-hosted estimates assume Llama-class open weights on AWS Inferentia 2 / Trainium with batched inference — typical effective $/MTok ranges from $0.30 (small models, high throughput) to $4 (70B+, low throughput).

What this does NOT estimate

For a procurement-ready model-mix recommendation, see the email memo or book a call.

FAQ

How accurate is this estimate?

±25% on the headline figure. Real bills vary based on prompt patterns, batch sizes, and provider rate-limit step functions. Treat as a planning anchor.

Is the wrapper margin really 30-60%?

For Harvey AI we have public reference data showing 50%+ margin on AmLaw 200 deployments. Glean and Hebbia run similar. The memo includes the source data + how to verify against your own contract.

When does self-host pay back?

Above ~$80K/yr OpenAI/Claude bills, Inferentia 2 + Llama 3.1 70B starts beating direct API on $/MTok. Below that, direct API wins on operational simplicity.

What's in the memo?

12-month cost projection (3 scenarios), model-mix recommendation, prompt-pattern optimisations that cut spend 20-40%, vendor-negotiation questions for renewal. About 8 pages.