
Inference APIs Pricing Comparison

60 providers compared by pricing model, free tiers, hosting options, and headquarters. Last updated April 2026.

41 with free tiers · 9 open source · 9 self-hostable · 20 European

Provider Pricing Model Starting Price Free Tier Hosting Open Source HQ
Subscription 49 SEK/month Cloud Sweden
Hourly $1.24/hour Cloud Sweden
Usage Based Free (10 EUR credits) Germany
Pay-per-use Pay-per-token Cloud United States
Pay-per-use $1/1M tokens Cloud United States
United States
Pay-per-use ~$0.63/hr (T4 GPU) Cloud + Self-hosted United States
Pay-per-use $0.15/hr (T4 GPU) Cloud + Self-hosted United States
United States
Freemium €25/mo Cloud Sweden
Freemium Free tier available Cloud United States
Pay-per-use ~$1.10/hr (A10 GPU) Cloud United States
Freemium $0.011/1K neurons Cloud United States
Cloud
Freemium $0.04/1M tokens Cloud + Self-hosted Canada
Pay-per-use $6.50/hr (GH200 GPU) Cloud United States
Pay-per-use Pay-per-use + 5% gateway fee Cloud Austria
Pay-per-use $0.02/M tokens Cloud United States
Pay-per-use $0.028/1M tokens (cache hit) Cloud China
Subscription + Usage Free (1K req/mo), 39 EUR/mo (Plus) Netherlands
Contact Sales Sweden
fal
Pay-per-use $0.02/megapixel Cloud United States
Pay-per-use $0.10/1M tokens Cloud + Self-hosted United States
Pay-per-use $0.08/hr Cloud Germany
Freemium Free Cloud United States
Freemium $0.05/1M tokens Cloud United States
Pay-per-use $0.033/hr (CPU) Cloud United States
Pay-per-use $0.15/hr Cloud United Kingdom
Pay-per-use Free Luxembourg
Pay-per-use $0.02/M tokens Cloud United States
Usage Based Free (10M tokens) Germany
Pay-per-use $0.58/GPU/hr (V100) Cloud United States
Freemium $3/300 credits Cloud United States
Freemium $0.10/1M tokens Cloud + Self-hosted France
Pay-per-use $30/mo free credits Cloud United States
Freemium Free United States
Pay-per-use Cloud United States
Pay-per-use $2.00/hr (H100) Cloud Netherlands
Pay-per-use $0.03/M tokens Cloud United States
Pay-per-use $0.01/M tokens Cloud United Kingdom
United States
Freemium Free (open-source) Cloud + Self-hosted United States
Pay-per-use $0.05/1M tokens Cloud United States
Freemium Free (25+ free models) Cloud United States
Pay-per-use $0.91/hr (L4 GPU) Cloud France
Freemium Free Switzerland
Pay-per-use Per-second GPU billing Cloud United States
Pay-per-use $0.06/hr Cloud United States
Freemium $5 free credit Cloud + Self-hosted United States
Pay-per-use €0.20/M tokens Cloud France
Open Source
Pay-per-use $0.0015/image Cloud United States
Pay-per-use ~€2.70/GPU-hr Cloud Germany
Usage Based Ireland
Pay-per-use Pay-per-token Cloud + Self-hosted United States
Pay-per-use ~$0.06/GPU/hr Cloud United States
Pay-per-use $0.14/hr Cloud Finland
Free Free (open-source) Self-hosted United States
ℹ️ Pricing units vary by provider type: per-token for LLM APIs, per-GPU-hour for compute platforms, per-request for media generation. Verify current rates on each provider's website.
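Because pricing units differ across provider types, comparing a per-GPU-hour provider against a per-token provider requires an assumed throughput figure. A minimal conversion sketch (the throughput and utilization numbers are illustrative assumptions, not measured rates):

```python
def gpu_hour_to_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Convert a per-GPU-hour price into an effective $/1M-token rate,
    assuming the GPU is kept fully utilized at the given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Illustrative: a $1.10/hr GPU serving an assumed 1000 tok/s at full
# utilization works out to roughly $0.31 per 1M tokens.
rate = gpu_hour_to_per_million_tokens(1.10, 1000)
print(round(rate, 2))
```

Real utilization is rarely 100%, so effective per-token costs on rented GPUs are usually higher than this idealized figure.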

Providers with free tiers

These inference API providers offer free credits, free tiers, or open-source self-hosting options to get started without upfront costs.

Swedish GPU infrastructure and LLM hosting platform with API-first deployment...

From: 49 SEK/month

European AI API for open-source models on EU infrastructure

From: Free (10 EUR credits)

Managed API access to foundation models on AWS with built-in fine-tuning and ...

From: Pay-per-token

Claude API for building AI applications with Opus, Sonnet, and Haiku models

From: $1/1M tokens

AI inference platform for deploying and serving ML models with autoscaling an...

From: ~$0.63/hr (T4 GPU)

Open-source serverless GPU cloud with sub-second cold starts and auto-scaling

From: $0.15/hr (T4 GPU)


BentoML is the platform for software engineers to build AI products.

EU-sovereign AI inference platform with OpenAI-compatible API

From: €25/mo

Ultra-fast inference on custom wafer-scale hardware with OpenAI-compatible API

From: Free tier available

Serverless GPU infrastructure for deploying AI models with sub-5 second cold ...

From: ~$1.10/hr (A10 GPU)

Run AI models at the edge on Cloudflare's global network with serverless infe...

From: $0.011/1K neurons

Unified AI API gateway providing access to 600+ models from OpenAI, Anthropic...

Cohere's world-class LLMs help enterprises build powerful, secure application...

From: $0.04/1M tokens

European AI inference gateway with smart routing across EU providers

From: Pay-per-use + 5% gateway fee

Run the top AI models using a simple API, pay per use. Low cost, scalable and...

From: $0.02/M tokens

Cost-effective inference API with OpenAI-compatible endpoints and open-weight...

From: $0.028/1M tokens (cache hit)

fal

Build the next generation of creativity with fal. Lightning fast inference.

From: $0.02/megapixel

European GPU cloud for AI training and inference powered by 100% green energy

From: $0.08/hr

Google's API for Gemini models with text, image, video, and audio capabilities

From: Free

Groq is on a mission to set the standard for GenAI inference speed, helping r...

From: $0.05/1M tokens

The open-source AI platform with 500K+ models, inference endpoints, and fine-...

From: $0.033/hr (CPU)

European sovereign AI inference with OpenAI-compatible APIs hosted in EU data...

From: Free

High-throughput inference API with OpenAI-compatible access to open-source mo...

From: $0.02/M tokens

Search APIs for embeddings, reranking, and web-to-markdown conversion

From: Free (10M tokens)

GPU cloud for AI training and inference with on-demand and cluster options

From: $0.58/GPU/hr (V100)

Multi-LLM API orchestration platform for comparing and blending AI models

From: $3/300 credits

Run generative AI models, large-scale batch jobs, job queues, and much more.

From: $30/mo free credits

We rebuilt the modern AI software stack, from the ground up, to boost any AI ...

From: Free

Access, finetune, deploy LLMs using our affordable and scalable APIs.

Full-stack AI cloud with GPU infrastructure for training and inference

From: $2.00/hr (H100)

APIs, Serverless and GPU Instance In One AI Cloud

From: $0.03/M tokens

European AI hyperscaler with serverless inference and GPU cloud

From: $0.01/M tokens

OctoAI delivers production-grade GenAI solutions running on the most efficien...

Run large language models locally with a single command

From: Free (open-source)

API access to GPT, o-series reasoning, DALL-E, and Whisper models

From: $0.05/1M tokens

Unified API gateway for 300+ AI models across 60+ providers with automatic fa...

From: Free (25+ free models)

European cloud provider with AI inference, training, and deployment services

From: $0.91/hr (L4 GPU)

Fine-tune and deploy LLMs on your own infrastructure with full data sovereignty

From: Free

Custom AI chip inference platform with purpose-built hardware for high-throug...

From: $5 free credit

European serverless AI inference APIs, 100% hosted in Europe

From: €0.20/M tokens

High-throughput LLM inference engine with PagedAttention for efficient GPU me...

From: Free (open-source)

Frequently asked questions

What is the cheapest AI inference API?

DeepInfra consistently ranks among the cheapest per-token providers on current open-source frontier models like gpt-oss-120B, Kimi K2, and Qwen3.5. On Artificial Analysis as of April 2026, gpt-oss-120B on DeepInfra is listed at roughly $0.08 per 1M blended tokens. Pricing shifts month to month, so the comparison table above is worth checking before committing to a provider for high-volume workloads.

What is the fastest AI inference API?

It depends on whether sustained throughput or first-token latency matters more. Cerebras reports around 3000 tokens/sec on gpt-oss-120B using WSE hardware, the highest measured throughput in the category as of April 2026. Groq uses custom LPU hardware and runs the same model at ~476 tokens/sec on Artificial Analysis, with a consistently low time-to-first-token (0.6-0.9s) that matters for interactive chat. Both trade off a narrower model catalog than GPU-based providers.
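The throughput-versus-latency tradeoff can be made concrete with a simple model: total response time ≈ time-to-first-token + tokens ÷ throughput. A sketch using the figures cited above (the Groq TTFT mid-point and both throughput numbers come from this page; the Cerebras TTFT is an assumed placeholder):

```python
def response_time(ttft_s: float, tokens: int, tokens_per_sec: float) -> float:
    """Approximate wall-clock time for a completion: first-token latency
    plus steady-state generation time."""
    return ttft_s + tokens / tokens_per_sec

# Groq: ~476 tok/s, ~0.7s TTFT (mid-range of the 0.6-0.9s cited above).
# Cerebras: ~3000 tok/s; the 0.9s TTFT here is an assumption.
for n in (100, 1000, 4000):
    groq = response_time(0.7, n, 476)
    cerebras = response_time(0.9, n, 3000)
    print(n, round(groq, 2), round(cerebras, 2))
```

Under these assumptions the lower-TTFT provider wins on short completions while the higher-throughput provider pulls ahead as output length grows, which is why "fastest" depends on the workload.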

Which AI inference APIs offer a free tier?

Cerebras and Groq both offer free usage with daily token limits, useful for prototyping. Most of the serverless providers (DeepInfra, Together, Fireworks, Novita) hand out free credits on signup rather than a permanent free tier. The "Free tier" filter above lists every provider with a free option.

Which inference providers are OpenAI-compatible?

DeepInfra, Together.ai, Fireworks, Novita, OpenRouter, and Groq all expose a drop-in OpenAI-compatible endpoint. Switching between them usually means changing nothing more than the base URL and API key. Replicate uses its own API format, and raw GPU providers like RunPod and Modal are not endpoints at all; they host whatever gets deployed to them.
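In practice, switching between OpenAI-compatible providers reduces to a base-URL lookup. A minimal sketch: the base URLs below are assumptions to verify against each provider's documentation, and the config dict matches the two parameters (`base_url`, `api_key`) that the official OpenAI client accepts:

```python
import os

# Base URLs are assumptions; confirm against each provider's docs.
PROVIDER_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
    "deepinfra": "https://api.deepinfra.com/v1/openai",
    "openrouter": "https://openrouter.ai/api/v1",
}

def client_config(provider: str) -> dict:
    """Return the only two settings that change between
    OpenAI-compatible providers: base URL and API key."""
    if provider not in PROVIDER_BASE_URLS:
        raise ValueError(f"unknown provider: {provider}")
    return {
        "base_url": PROVIDER_BASE_URLS[provider],
        "api_key": os.environ.get(f"{provider.upper()}_API_KEY", ""),
    }

# With the official openai client: OpenAI(**client_config("groq"))
print(client_config("groq")["base_url"])
```

Model names still differ between providers, so a full switch also means mapping the model identifier, not just the endpoint.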

Are there EU-hosted AI inference APIs?

Yes. Nebius, Scaleway, and Mistral La Plateforme run inference inside the EU, which matters for GDPR-sensitive workloads. Use the "European" filter above to see the full list, or visit the European providers page for hosting region details.

What is the best alternative to the OpenAI API?

For the highest sustained throughput on open-source models, Cerebras. For the lowest first-token latency, Groq. For the lowest cost per token, DeepInfra. For fine-tuning on the same platform as inference, Together.ai or Fireworks. For routing across providers from a single API, OpenRouter.

How to choose an inference API provider

The right provider depends on workload type, latency requirements, and budget. Most providers use pay-per-token pricing for LLMs and per-second GPU billing for custom models. Token-based pricing varies by model, so the cheapest provider for one model may not be cheapest for another.

Free tiers are useful for prototyping but often come with rate limits. For production, compare per-token costs for your specific model, cold start latency, rate limits, and whether the provider supports the models you need.
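Comparing per-token costs for a concrete workload is simple arithmetic, but the answer depends on the input/output mix. A sketch with hypothetical rates (the provider names and prices below are placeholders, not quotes):

```python
def monthly_cost(input_m: float, output_m: float, in_rate: float, out_rate: float) -> float:
    """Monthly cost in dollars for a workload measured in millions of
    input/output tokens, given $/1M-token rates for each direction."""
    return input_m * in_rate + output_m * out_rate

# Hypothetical ($/1M input, $/1M output) rates for the same model:
providers = {
    "provider_a": (0.08, 0.30),
    "provider_b": (0.10, 0.25),
}

# 500M input + 100M output tokens/month: cheapest first
for name, (i, o) in sorted(
    providers.items(), key=lambda kv: monthly_cost(500, 100, *kv[1])
):
    print(name, monthly_cost(500, 100, i, o))
```

With these placeholder rates, provider_a wins the input-heavy workload above, but flipping to 100M input / 500M output makes provider_b cheaper, which is exactly why the cheapest provider for one workload is not the cheapest for another.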

Teams with data residency requirements should check hosting options and provider headquarters. European providers like AiQu, Airon, and AKI.IO keep data within EU jurisdiction; see the full European AI Infrastructure directory. Self-hostable options like Baseten and Beam give full control over data location.

For a deeper analysis, read AI Inference API Providers Compared on the blog. Pricing changes frequently, so verify current rates on each provider's website. Submit a correction.

Browse all Inference APIs tools or explore the full AI Infrastructure Landscape.
