Inference APIs Pricing Comparison
49 providers compared by pricing model, free tiers, hosting options, and headquarters. Last updated March 2026.
37 with free tiers · 7 open source · 9 self-hostable · 14 European
| Provider | Pricing Model | Starting Price | Free Tier | Hosting | Open Source | HQ | Founded |
|---|---|---|---|---|---|---|---|
| — | Pay-per-use | Pay-per-token | ✓ | Cloud | — | United States | 2023 |
| — | Pay-per-use | $1/1M tokens | ✓ | Cloud | — | United States | 2021 |
| — | — | — | — | — | — | United States | — |
| — | Pay-per-use | ~$0.63/hr (T4 GPU) | ✓ | Cloud + Self-hosted | — | United States | 2019 |
| — | Pay-per-use | $0.15/hr (T4 GPU) | ✓ | Cloud + Self-hosted | ✓ | United States | 2021 |
| — | — | — | ✓ | — | ✓ | United States | — |
| — | Freemium | €25/mo | ✓ | Cloud | — | Sweden | 2024 |
| — | Freemium | Free tier available | ✓ | Cloud | — | United States | 2015 |
| — | Pay-per-use | ~$1.10/hr (A10 GPU) | ✓ | Cloud | — | United States | 2021 |
| — | Freemium | $0.011/1K neurons | ✓ | Cloud | — | United States | 2009 |
| — | Freemium | $0.04/1M tokens | ✓ | Cloud + Self-hosted | — | Canada | 2019 |
| — | Pay-per-use | $6.50/hr (GH200 GPU) | — | Cloud | — | United States | 2017 |
| — | Pay-per-use | Pay-per-use + 5% gateway fee | ✓ | Cloud | — | Austria | 2021 |
| — | Pay-per-use | $0.02/M tokens | ✓ | Cloud | — | United States | 2022 |
| — | Pay-per-use | $0.028/1M tokens (cache hit) | ✓ | Cloud | ✓ | China | 2023 |
| — | Pay-per-use | $0.02/megapixel | ✓ | Cloud | — | United States | 2021 |
| — | Pay-per-use | $0.10/1M tokens | — | Cloud + Self-hosted | — | United States | 2022 |
| — | Pay-per-use | $0.08/hr | ✓ | Cloud | — | Germany | 2018 |
| — | Freemium | Free | ✓ | Cloud | — | United States | 2023 |
| — | Freemium | $0.05/1M tokens | ✓ | Cloud | — | United States | 2016 |
| — | Pay-per-use | $0.033/hr (CPU) | ✓ | Cloud | ✓ | United States | 2016 |
| — | Pay-per-use | $0.15/hr | — | Cloud | — | United Kingdom | 2020 |
| — | Pay-per-use | Free | ✓ | — | — | Luxembourg | 2025 |
| — | Pay-per-use | $0.02/M tokens | ✓ | Cloud | — | United States | 2025 |
| — | Pay-per-use | $0.58/GPU/hr (V100) | ✓ | Cloud | — | United States | 2012 |
| — | Freemium | $3/300 credits | ✓ | Cloud | — | United States | — |
| — | Freemium | $0.10/1M tokens | — | Cloud + Self-hosted | ✓ | France | 2023 |
| — | Pay-per-use | $30/mo free credits | ✓ | Cloud | — | United States | 2021 |
| — | Freemium | Free | ✓ | — | — | United States | 2022 |
| — | Pay-per-use | — | ✓ | Cloud | — | United States | — |
| — | Pay-per-use | $2.00/hr (H100) | ✓ | Cloud | — | Netherlands | 2024 |
| — | Pay-per-use | $0.03/M tokens | ✓ | Cloud | — | United States | 2023 |
| — | Pay-per-use | $0.01/M tokens | ✓ | Cloud | — | United Kingdom | 2024 |
| — | — | — | ✓ | — | — | United States | — |
| — | Freemium | Free (open-source) | ✓ | Cloud + Self-hosted | ✓ | United States | 2023 |
| — | Pay-per-use | $0.05/1M tokens | ✓ | Cloud | — | United States | 2015 |
| — | Freemium | Free (25+ free models) | ✓ | Cloud | — | United States | 2023 |
| — | Pay-per-use | $0.91/hr (L4 GPU) | ✓ | Cloud | — | France | 1999 |
| — | Freemium | Free | ✓ | — | — | Switzerland | 2023 |
| — | Pay-per-use | Per-second GPU billing | — | Cloud | — | United States | 2019 |
| — | Pay-per-use | $0.06/hr | — | Cloud | — | United States | 2022 |
| — | Freemium | $5 free credit | ✓ | Cloud + Self-hosted | — | United States | 2017 |
| — | Pay-per-use | €0.20/M tokens | ✓ | Cloud | — | France | 1999 |
| — | Pay-per-use | $0.0015/image | — | Cloud | — | United States | — |
| — | Pay-per-use | ~€2.70/GPU-hr | — | Cloud | — | Germany | 2009 |
| — | Pay-per-use | Pay-per-token | — | Cloud + Self-hosted | — | United States | 2022 |
| — | Pay-per-use | ~$0.06/GPU/hr | — | Cloud | — | United States | 2018 |
| — | Pay-per-use | $0.14/hr | — | Cloud | — | Finland | 2020 |
| — | Free | Free (open-source) | ✓ | Self-hosted | ✓ | United States | 2023 |
Pricing units vary by provider type: per-token for LLM APIs, per-GPU-hour for compute platforms, per-request for media generation. Verify current rates on each provider's website.
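Because the units differ, the only way to compare providers directly is to normalize to the cost of your own workload. Below is a minimal sketch of that conversion; the prices, throughput, and monthly volume are illustrative assumptions, not rates from any provider in this table.

```python
# Normalize two pricing models to the monthly cost of one workload.
# All numbers here are illustrative assumptions, not real provider rates.

def per_token_cost(price_per_1m_tokens: float, tokens: int) -> float:
    """Cost of a workload billed per million tokens."""
    return price_per_1m_tokens * tokens / 1_000_000

def gpu_hour_cost(price_per_hour: float, tokens: int, tokens_per_second: float) -> float:
    """Cost of the same workload on hourly GPU billing, given a
    sustained decode throughput in tokens per second."""
    hours = tokens / tokens_per_second / 3600
    return price_per_hour * hours

monthly_tokens = 50_000_000  # assumed volume: 50M tokens/month

print(f"per-token API at $0.10/1M tokens: ${per_token_cost(0.10, monthly_tokens):.2f}")
print(f"GPU at $0.63/hr doing 400 tok/s:  ${gpu_hour_cost(0.63, monthly_tokens, 400.0):.2f}")
```

At low volume, per-token billing usually wins because you pay nothing for idle capacity; hourly GPU billing only pays off once sustained utilization is high.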
Providers with free tiers
These inference API providers offer free credits, free tiers, or open-source self-hosting options, so you can get started without upfront costs.
- Managed API access to foundation models on AWS with built-in fine-tuning and ... (From: Pay-per-token)
- Claude API for building AI applications with Opus, Sonnet, and Haiku models (From: $1/1M tokens)
- AI inference platform for deploying and serving ML models with autoscaling an... (From: ~$0.63/hr (T4 GPU))
- Open-source serverless GPU cloud with sub-second cold starts and auto-scaling (From: $0.15/hr (T4 GPU))
- BentoML is the platform for software engineers to build AI products.
- Ultra-fast inference on custom wafer-scale hardware with OpenAI-compatible API (From: Free tier available)
- Serverless GPU infrastructure for deploying AI models with sub-5 second cold ... (From: ~$1.10/hr (A10 GPU))
- Run AI models at the edge on Cloudflare's global network with serverless infe... (From: $0.011/1K neurons)
- Cohere's world-class LLMs help enterprises build powerful, secure application... (From: $0.04/1M tokens)
- European AI inference gateway with smart routing across EU providers (From: Pay-per-use + 5% gateway fee)
- Run the top AI models using a simple API, pay per use. Low cost, scalable and... (From: $0.02/M tokens)
- Cost-effective inference API with OpenAI-compatible endpoints and open-weight... (From: $0.028/1M tokens (cache hit))
- Build the next generation of creativity with fal. Lightning fast inference. (From: $0.02/megapixel)
- European GPU cloud for AI training and inference powered by 100% green energy (From: $0.08/hr)
- Google's API for Gemini models with text, image, video, and audio capabilities (From: Free)
- Groq is on a mission to set the standard for GenAI inference speed, helping r... (From: $0.05/1M tokens)
- The open-source AI platform with 500K+ models, inference endpoints, and fine-... (From: $0.033/hr (CPU))
- European sovereign AI inference with OpenAI-compatible APIs hosted in EU data... (From: Free)
- High-throughput inference API with OpenAI-compatible access to open-source mo... (From: $0.02/M tokens)
- GPU cloud for AI training and inference with on-demand and cluster options (From: $0.58/GPU/hr (V100))
- Multi-LLM API orchestration platform for comparing and blending AI models (From: $3/300 credits)
- Run generative AI models, large-scale batch jobs, job queues, and much more. (From: $30/mo free credits)
- Access, finetune, deploy LLMs using our affordable and scalable APIs.
- OctoAI delivers production-grade GenAI solutions running on the most efficien...
- Unified API gateway for 300+ AI models across 60+ providers with automatic fa... (From: Free (25+ free models))
- European cloud provider with AI inference, training, and deployment services (From: $0.91/hr (L4 GPU))
- Custom AI chip inference platform with purpose-built hardware for high-throug... (From: $5 free credit)
- High-throughput LLM inference engine with PagedAttention for efficient GPU me... (From: Free (open-source))
How to choose an inference API provider
The right provider depends on workload type, latency requirements, and budget. Most providers use pay-per-token pricing for LLMs and per-second GPU billing for custom models. Token-based pricing varies by model, so the cheapest provider for one model may not be cheapest for another.
Free tiers are useful for prototyping but often come with rate limits. For production, compare per-token costs for your specific model, cold start latency, rate limits, and whether the provider supports the models you need.
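Because the cheapest provider depends on your input/output mix, it is worth scripting the comparison for your actual traffic. A small sketch, using placeholder rates (real prices vary by model and provider):

```python
# Rank providers by blended monthly cost for a given traffic mix.
# The $/1M-token rates below are placeholders, not real quotes.

providers = {
    "provider_a": {"input": 0.15, "output": 0.60},
    "provider_b": {"input": 0.05, "output": 0.60},
    "provider_c": {"input": 0.20, "output": 0.20},
}

def monthly_cost(rates: dict, input_tokens: int, output_tokens: int) -> float:
    """Blended dollar cost for one month of traffic."""
    return (rates["input"] * input_tokens + rates["output"] * output_tokens) / 1_000_000

# A chat-heavy workload: far more input (context) tokens than output tokens.
INPUT, OUTPUT = 40_000_000, 10_000_000

ranked = sorted(providers, key=lambda p: monthly_cost(providers[p], INPUT, OUTPUT))
for name in ranked:
    print(f"{name}: ${monthly_cost(providers[name], INPUT, OUTPUT):.2f}")
```

Flip the mix to 10M input / 40M output and provider_c becomes cheapest in this toy example, which is exactly why a ranking computed for one workload does not carry over to another.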
Teams with data residency requirements should check hosting options and provider headquarters. European providers such as Berget AI and Cortecs AI keep data within EU jurisdiction. See the full European AI Infrastructure directory. Self-hostable options like Baseten and Beam give full control over data location.
For a deeper analysis, read AI Inference API Providers Compared on the blog. Pricing changes frequently, so verify current rates on each provider's website.