Inference APIs Pricing Comparison
60 providers compared by pricing model, free tiers, hosting options, and headquarters. Last updated April 2026.
41 with free tiers · 9 open source · 9 self-hostable · 20 European
| Pricing Model | Starting Price | Free Tier | Hosting | Open Source | HQ |
|---|---|---|---|---|---|
| Subscription | 49 SEK/month | ✓ | Cloud | — | 🇸🇪 Sweden |
| Hourly | $1.24/hour | — | Cloud | — | 🇸🇪 Sweden |
| Usage-based | Free (€10 credits) | ✓ | — | — | 🇩🇪 Germany |
| Pay-per-use | Pay-per-token | ✓ | Cloud | — | 🇺🇸 United States |
| Pay-per-use | $1/1M tokens | ✓ | Cloud | — | 🇺🇸 United States |
| — | — | — | — | — | 🇺🇸 United States |
| Pay-per-use | ~$0.63/hr (T4 GPU) | ✓ | Cloud + Self-hosted | — | 🇺🇸 United States |
| Pay-per-use | $0.15/hr (T4 GPU) | ✓ | Cloud + Self-hosted | ✓ | 🇺🇸 United States |
| — | — | ✓ | — | ✓ | 🇺🇸 United States |
| Freemium | €25/mo | ✓ | Cloud | — | 🇸🇪 Sweden |
| Freemium | Free tier available | ✓ | Cloud | — | 🇺🇸 United States |
| Pay-per-use | ~$1.10/hr (A10 GPU) | ✓ | Cloud | — | 🇺🇸 United States |
| Freemium | $0.011/1K neurons | ✓ | Cloud | — | 🇺🇸 United States |
| — | — | ✓ | Cloud | — | — |
| Freemium | $0.04/1M tokens | ✓ | Cloud + Self-hosted | — | 🇨🇦 Canada |
| Pay-per-use | $6.50/hr (GH200 GPU) | — | Cloud | — | 🇺🇸 United States |
| Pay-per-use | Pay-per-use + 5% gateway fee | ✓ | Cloud | — | 🇦🇹 Austria |
| Pay-per-use | $0.02/1M tokens | ✓ | Cloud | — | 🇺🇸 United States |
| Pay-per-use | $0.028/1M tokens (cache hit) | ✓ | Cloud | ✓ | 🇨🇳 China |
| Subscription + Usage | Free (1K req/mo), €39/mo (Plus) | — | — | — | 🇳🇱 Netherlands |
| Contact Sales | — | — | — | — | 🇸🇪 Sweden |
| Pay-per-use | $0.02/megapixel | ✓ | Cloud | — | 🇺🇸 United States |
| Pay-per-use | $0.10/1M tokens | — | Cloud + Self-hosted | — | 🇺🇸 United States |
| Pay-per-use | $0.08/hr | ✓ | Cloud | — | 🇩🇪 Germany |
| Freemium | Free | ✓ | Cloud | — | 🇺🇸 United States |
| Freemium | $0.05/1M tokens | ✓ | Cloud | — | 🇺🇸 United States |
| Pay-per-use | $0.033/hr (CPU) | ✓ | Cloud | ✓ | 🇺🇸 United States |
| Pay-per-use | $0.15/hr | — | Cloud | — | 🇬🇧 United Kingdom |
| Pay-per-use | Free | ✓ | — | — | 🇱🇺 Luxembourg |
| Pay-per-use | $0.02/1M tokens | ✓ | Cloud | — | 🇺🇸 United States |
| Usage-based | Free (10M tokens) | ✓ | — | — | 🇩🇪 Germany |
| Pay-per-use | $0.58/GPU/hr (V100) | ✓ | Cloud | — | 🇺🇸 United States |
| Freemium | $3/300 credits | ✓ | Cloud | — | 🇺🇸 United States |
| Freemium | $0.10/1M tokens | — | Cloud + Self-hosted | ✓ | 🇫🇷 France |
| Pay-per-use | $30/mo free credits | ✓ | Cloud | — | 🇺🇸 United States |
| Freemium | Free | ✓ | — | — | 🇺🇸 United States |
| Pay-per-use | — | ✓ | Cloud | — | 🇺🇸 United States |
| Pay-per-use | $2.00/hr (H100) | ✓ | Cloud | — | 🇳🇱 Netherlands |
| Pay-per-use | $0.03/1M tokens | ✓ | Cloud | — | 🇺🇸 United States |
| Pay-per-use | $0.01/1M tokens | ✓ | Cloud | — | 🇬🇧 United Kingdom |
| — | — | ✓ | — | — | 🇺🇸 United States |
| Freemium | Free (open-source) | ✓ | Cloud + Self-hosted | ✓ | 🇺🇸 United States |
| Pay-per-use | $0.05/1M tokens | ✓ | Cloud | — | 🇺🇸 United States |
| Freemium | Free (25+ free models) | ✓ | Cloud | — | 🇺🇸 United States |
| Pay-per-use | $0.91/hr (L4 GPU) | ✓ | Cloud | — | 🇫🇷 France |
| Freemium | Free | ✓ | — | — | 🇨🇭 Switzerland |
| Pay-per-use | Per-second GPU billing | — | Cloud | — | 🇺🇸 United States |
| Pay-per-use | $0.06/hr | — | Cloud | — | 🇺🇸 United States |
| Freemium | $5 free credit | ✓ | Cloud + Self-hosted | — | 🇺🇸 United States |
| Pay-per-use | €0.20/1M tokens | ✓ | Cloud | — | 🇫🇷 France |
| Open Source | — | — | — | ✓ | — |
| Pay-per-use | $0.0015/image | — | Cloud | — | 🇺🇸 United States |
| Pay-per-use | ~€2.70/GPU/hr | — | Cloud | — | 🇩🇪 Germany |
| Usage-based | — | — | — | — | 🇮🇪 Ireland |
| — | — | — | — | ✓ | — |
| Pay-per-use | Pay-per-token | — | Cloud + Self-hosted | — | 🇺🇸 United States |
| Pay-per-use | ~$0.06/GPU/hr | — | Cloud | — | 🇺🇸 United States |
| Pay-per-use | $0.14/hr | — | Cloud | — | 🇫🇮 Finland |
| Free | Free (open-source) | ✓ | Self-hosted | ✓ | 🇺🇸 United States |
| — | — | — | — | — | — |
Providers with free tiers
These inference API providers offer free credits, free tiers, or open-source self-hosting options to get started without upfront costs.
Swedish GPU infrastructure and LLM hosting platform with API-first deployment...
From: 49 SEK/month
Managed API access to foundation models on AWS with built-in fine-tuning and ...
From: Pay-per-token
Claude API for building AI applications with Opus, Sonnet, and Haiku models
From: $1/1M tokens
AI inference platform for deploying and serving ML models with autoscaling an...
From: ~$0.63/hr (T4 GPU)
Open-source serverless GPU cloud with sub-second cold starts and auto-scaling
From: $0.15/hr (T4 GPU)
BentoML is the platform for software engineers to build AI products.
Ultra-fast inference on custom wafer-scale hardware with OpenAI-compatible API
From: Free tier available
Serverless GPU infrastructure for deploying AI models with sub-5 second cold ...
From: ~$1.10/hr (A10 GPU)
Run AI models at the edge on Cloudflare's global network with serverless infe...
From: $0.011/1K neurons
Unified AI API gateway providing access to 600+ models from OpenAI, Anthropic...
Cohere's world-class LLMs help enterprises build powerful, secure application...
From: $0.04/1M tokens
European AI inference gateway with smart routing across EU providers
From: Pay-per-use + 5% gateway fee
Run the top AI models using a simple API, pay per use. Low cost, scalable and...
From: $0.02/M tokens
Cost-effective inference API with OpenAI-compatible endpoints and open-weight...
From: $0.028/1M tokens (cache hit)
Build the next generation of creativity with fal. Lightning fast inference.
From: $0.02/megapixel
European GPU cloud for AI training and inference powered by 100% green energy
From: $0.08/hr
Google's API for Gemini models with text, image, video, and audio capabilities
From: Free
Groq is on a mission to set the standard for GenAI inference speed, helping r...
From: $0.05/1M tokens
The open-source AI platform with 500K+ models, inference endpoints, and fine-...
From: $0.033/hr (CPU)
European sovereign AI inference with OpenAI-compatible APIs hosted in EU data...
From: Free
High-throughput inference API with OpenAI-compatible access to open-source mo...
From: $0.02/M tokens
Search APIs for embeddings, reranking, and web-to-markdown conversion
From: Free (10M tokens)
GPU cloud for AI training and inference with on-demand and cluster options
From: $0.58/GPU/hr (V100)
Multi-LLM API orchestration platform for comparing and blending AI models
From: $3/300 credits
Run generative AI models, large-scale batch jobs, job queues, and much more.
From: $30/mo free credits
Access, finetune, deploy LLMs using our affordable and scalable APIs.
OctoAI delivers production-grade GenAI solutions running on the most efficien...
Unified API gateway for 300+ AI models across 60+ providers with automatic fa...
From: Free (25+ free models)
European cloud provider with AI inference, training, and deployment services
From: $0.91/hr (L4 GPU)
Custom AI chip inference platform with purpose-built hardware for high-throug...
From: $5 free credit
High-throughput LLM inference engine with PagedAttention for efficient GPU me...
From: Free (open-source)
Frequently asked questions
What is the cheapest AI inference API?
DeepInfra consistently ranks among the cheapest per-token providers on current open-source frontier models like gpt-oss-120B, Kimi K2, and Qwen3.5. On Artificial Analysis as of April 2026, gpt-oss-120B on DeepInfra is listed at roughly $0.08 per 1M blended tokens. Pricing shifts month to month, so the comparison table above is worth checking before committing to a provider for high-volume workloads.
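"Blended" figures like the $0.08/1M number above are a weighted average of input- and output-token prices; benchmark sites commonly assume roughly three input tokens per output token. A minimal sketch of that arithmetic, using illustrative prices rather than any provider's actual rate card:

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_ratio: float = 3.0) -> float:
    """Weighted average price per 1M tokens, assuming `input_ratio`
    input tokens per output token (3:1 is a common benchmark mix)."""
    total_weight = input_ratio + 1.0
    return (input_per_m * input_ratio + output_per_m) / total_weight

# Illustrative rates: $0.05/1M input, $0.20/1M output
print(blended_price(0.05, 0.20))  # -> 0.0875
```

Because the blend weights inputs heavily, a provider with cheap input but expensive output tokens can still post a low blended price; check both line items for output-heavy workloads.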
What is the fastest AI inference API?
It depends on whether sustained throughput or first-token latency matters more. Cerebras reports around 3000 tokens/sec on gpt-oss-120B using WSE hardware, the highest measured throughput in the category as of April 2026. Groq uses custom LPU hardware and runs the same model at ~476 tokens/sec on Artificial Analysis, with a consistently low time-to-first-token (0.6-0.9s) that matters for interactive chat. Both trade off a narrower model catalog than GPU-based providers.
Which AI inference APIs offer a free tier?
Cerebras and Groq both offer free usage with daily token limits, useful for prototyping. Most of the serverless providers (DeepInfra, Together, Fireworks, Novita) hand out free credits on signup rather than a permanent free tier. The "Free tier" filter above lists every provider with a free option.
Which inference providers are OpenAI-compatible?
DeepInfra, Together.ai, Fireworks, Novita, OpenRouter, and Groq all expose a drop-in OpenAI-compatible endpoint. Switching between them usually means changing the base URL and API key, nothing more. Replicate uses its own API format, and raw GPU providers like RunPod and Modal are not endpoints at all, they host whatever gets deployed to them.
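That base-URL swap can be sketched with the OpenAI Python SDK. The endpoint URLs below are illustrative — confirm the current base URL in each provider's documentation before relying on them:

```python
# Endpoint URLs are assumptions for illustration; verify each
# provider's docs for the real OpenAI-compatible base URL.
PROVIDER_BASE_URLS = {
    "deepinfra": "https://api.deepinfra.com/v1/openai",
    "together": "https://api.together.xyz/v1",
    "groq": "https://api.groq.com/openai/v1",
}

def client_kwargs(provider: str, api_key: str) -> dict:
    """Constructor arguments for openai.OpenAI — switching providers
    changes only the base_url and the key."""
    return {"base_url": PROVIDER_BASE_URLS[provider], "api_key": api_key}

# Usage (requires the `openai` package and a real key):
# import openai
# client = openai.OpenAI(**client_kwargs("groq", "YOUR_KEY"))
# client.chat.completions.create(model="...", messages=[...])
```

Model identifiers are not portable, though: each provider names the same open-weight model differently, so the `model` string usually has to change along with the base URL.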
Are there EU-hosted AI inference APIs?
Yes. Nebius, Scaleway, and Mistral La Plateforme run inference inside the EU, which matters for GDPR-sensitive workloads. Use the "European" filter above to see the full list, or visit the European providers page for hosting region details.
What is the best alternative to the OpenAI API?
For the highest sustained throughput on open-source models, Cerebras. For the lowest first-token latency, Groq. For the lowest cost per token, DeepInfra. For fine-tuning on the same platform as inference, Together.ai or Fireworks. For routing across providers from a single API, OpenRouter.
How to choose an inference API provider
The right provider depends on workload type, latency requirements, and budget. Most providers use pay-per-token pricing for LLMs and per-second GPU billing for custom models. Token-based pricing varies by model, so the cheapest provider for one model may not be cheapest for another.
Free tiers are useful for prototyping but often come with rate limits. For production, compare per-token costs for your specific model, cold start latency, rate limits, and whether the provider supports the models you need.
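The workload-dependence point above can be made concrete: with asymmetric input/output rates, which provider is cheapest flips with the input/output mix. The rates here are hypothetical, chosen only to illustrate the effect:

```python
# Hypothetical per-1M-token rates for the same model on two providers.
RATES = {
    "provider_a": {"in": 0.04, "out": 0.16},  # cheap input, pricey output
    "provider_b": {"in": 0.08, "out": 0.08},  # flat rate
}

def workload_cost(rate: dict, in_tokens_m: float, out_tokens_m: float) -> float:
    """Total cost for a workload measured in millions of tokens."""
    return rate["in"] * in_tokens_m + rate["out"] * out_tokens_m

# Chat-heavy (balanced output): provider_b wins.
# RAG-heavy (long prompts, short answers): provider_a wins.
for name, rate in RATES.items():
    print(name, workload_cost(rate, in_tokens_m=100, out_tokens_m=100),
          workload_cost(rate, in_tokens_m=300, out_tokens_m=50))
```

Running the numbers before committing — ideally against a week of real traffic logs — is cheap insurance against picking the "cheapest" provider for the wrong workload shape.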
Teams with data residency requirements should check hosting options and provider headquarters. European providers such as AiQu, Airon, and AKI.IO keep data within EU jurisdiction; see the full European AI Infrastructure directory. Self-hostable options like Baseten and Beam give full control over data location.
For a deeper analysis, read AI Inference API Providers Compared on the blog. Pricing changes frequently, so verify current rates on each provider's website. Submit a correction.
See how these tools fit into a full stack
Browse all Inference APIs tools or explore the full AI Infrastructure Landscape.
Is your product missing?