
Inference APIs Pricing Comparison

60 providers compared by pricing model, free tiers, hosting options, and headquarters. Last updated April 2026.

41 with free tiers · 9 open source · 9 self-hostable · 20 European

Provider Pricing Model Starting Price Free Tier Hosting Open Source HQ
Subscription 49 SEK/month Cloud Sweden
Hourly $1.24/hour Cloud Sweden
Usage Based Free (10 EUR credits) Germany
Pay-per-use Pay-per-token Cloud United States
Pay-per-use $1/1M tokens Cloud United States
United States
Pay-per-use ~$0.63/hr (T4 GPU) Cloud + Self-hosted United States
Pay-per-use $0.15/hr (T4 GPU) Cloud + Self-hosted United States
United States
Freemium €25/mo Cloud Sweden
Freemium Free tier available Cloud United States
Pay-per-use ~$1.10/hr (A10 GPU) Cloud United States
Freemium $0.011/1K neurons Cloud United States
Cloud
Freemium $0.04/1M tokens Cloud + Self-hosted Canada
Pay-per-use $6.50/hr (GH200 GPU) Cloud United States
Pay-per-use Pay-per-use + 5% gateway fee Cloud Austria
Pay-per-use $0.02/M tokens Cloud United States
Pay-per-use $0.028/1M tokens (cache hit) Cloud China
Subscription + Usage Free (1K req/mo), 39 EUR/mo (Plus) Netherlands
Contact Sales Sweden
fal
Pay-per-use $0.02/megapixel Cloud United States
Pay-per-use $0.10/1M tokens Cloud + Self-hosted United States
Pay-per-use $0.08/hr Cloud Germany
Freemium Free Cloud United States
Freemium $0.05/1M tokens Cloud United States
Pay-per-use $0.033/hr (CPU) Cloud United States
Pay-per-use $0.15/hr Cloud United Kingdom
Pay-per-use Free Luxembourg
Pay-per-use $0.02/M tokens Cloud United States
Usage Based Free (10M tokens) Germany
Pay-per-use $0.58/GPU/hr (V100) Cloud United States
Freemium $3/300 credits Cloud United States
Freemium $0.10/1M tokens Cloud + Self-hosted France
Pay-per-use $30/mo free credits Cloud United States
Freemium Free United States
Pay-per-use Cloud United States
Pay-per-use $2.00/hr (H100) Cloud Netherlands
Pay-per-use $0.03/M tokens Cloud United States
Pay-per-use $0.01/M tokens Cloud United Kingdom
United States
Freemium Free (open-source) Cloud + Self-hosted United States
Pay-per-use $0.05/1M tokens Cloud United States
Freemium Free (25+ free models) Cloud United States
Pay-per-use $0.91/hr (L4 GPU) Cloud France
Freemium Free Switzerland
Pay-per-use Per-second GPU billing Cloud United States
Pay-per-use $0.06/hr Cloud United States
Freemium $5 free credit Cloud + Self-hosted United States
Pay-per-use €0.20/M tokens Cloud France
Open Source
Pay-per-use $0.0015/image Cloud United States
Pay-per-use ~€2.70/GPU-hr Cloud Germany
Usage Based Ireland
Pay-per-use Pay-per-token Cloud + Self-hosted United States
Pay-per-use ~$0.06/GPU/hr Cloud United States
Pay-per-use $0.14/hr Cloud Finland
Free Free (open-source) Self-hosted United States
ℹ️ Pricing units vary by provider type: per-token for LLM APIs, per-GPU-hour for compute platforms, per-request for media generation. Verify current rates on each provider's website.
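Because pricing units differ across provider types, comparing a per-GPU-hour provider against a per-token provider requires an assumed throughput figure. A minimal conversion sketch (the throughput and utilization numbers are illustrative assumptions, not measured rates):

```python
def gpu_hour_to_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Convert a per-GPU-hour price into an effective $/1M-token rate,
    assuming the GPU is kept fully utilized at the given throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Illustrative: a $1.10/hr GPU serving an assumed 1000 tok/s at full
# utilization works out to roughly $0.31 per 1M tokens.
rate = gpu_hour_to_per_million_tokens(1.10, 1000)
print(round(rate, 2))
```

Real utilization is rarely 100%, so effective per-token costs on rented GPUs are usually higher than this idealized figure.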

Providers with free tiers

These inference API providers offer free credits, free tiers, or open-source self-hosting options to get started without upfront costs.

Swedish GPU infrastructure and LLM hosting platform with API-first deployment...

From: 49 SEK/month

European AI API for open-source models on EU infrastructure

From: Free (10 EUR credits)

Managed API access to foundation models on AWS with built-in fine-tuning and ...

From: Pay-per-token

Claude API for building AI applications with Opus, Sonnet, and Haiku models

From: $1/1M tokens

AI inference platform for deploying and serving ML models with autoscaling an...

From: ~$0.63/hr (T4 GPU)

Open-source serverless GPU cloud with sub-second cold starts and auto-scaling

From: $0.15/hr (T4 GPU)


BentoML is the platform for software engineers to build AI products.

EU-sovereign AI inference platform with OpenAI-compatible API

From: €25/mo

Ultra-fast inference on custom wafer-scale hardware with OpenAI-compatible API

From: Free tier available

Serverless GPU infrastructure for deploying AI models with sub-5 second cold ...

From: ~$1.10/hr (A10 GPU)

Run AI models at the edge on Cloudflare's global network with serverless infe...

From: $0.011/1K neurons

Unified AI API gateway providing access to 600+ models from OpenAI, Anthropic...

Cohere's world-class LLMs help enterprises build powerful, secure application...

From: $0.04/1M tokens

European AI inference gateway with smart routing across EU providers

From: Pay-per-use + 5% gateway fee

Run the top AI models using a simple API, pay per use. Low cost, scalable and...

From: $0.02/M tokens

Cost-effective inference API with OpenAI-compatible endpoints and open-weight...

From: $0.028/1M tokens (cache hit)

fal

Build the next generation of creativity with fal. Lightning fast inference.

From: $0.02/megapixel

European GPU cloud for AI training and inference powered by 100% green energy

From: $0.08/hr

Google's API for Gemini models with text, image, video, and audio capabilities

From: Free

Groq is on a mission to set the standard for GenAI inference speed, helping r...

From: $0.05/1M tokens

The open-source AI platform with 500K+ models, inference endpoints, and fine-...

From: $0.033/hr (CPU)

European sovereign AI inference with OpenAI-compatible APIs hosted in EU data...

From: Free

High-throughput inference API with OpenAI-compatible access to open-source mo...

From: $0.02/M tokens

Search APIs for embeddings, reranking, and web-to-markdown conversion

From: Free (10M tokens)

GPU cloud for AI training and inference with on-demand and cluster options

From: $0.58/GPU/hr (V100)

Multi-LLM API orchestration platform for comparing and blending AI models

From: $3/300 credits

Run generative AI models, large-scale batch jobs, job queues, and much more.

From: $30/mo free credits

We rebuilt the modern AI software stack, from the ground up, to boost any AI ...

From: Free

Access, finetune, deploy LLMs using our affordable and scalable APIs.

Full-stack AI cloud with GPU infrastructure for training and inference

From: $2.00/hr (H100)

APIs, Serverless and GPU Instance In One AI Cloud

From: $0.03/M tokens

European AI hyperscaler with serverless inference and GPU cloud

From: $0.01/M tokens

OctoAI delivers production-grade GenAI solutions running on the most efficien...

Run large language models locally with a single command

From: Free (open-source)

API access to GPT, o-series reasoning, DALL-E, and Whisper models

From: $0.05/1M tokens

Unified API gateway for 300+ AI models across 60+ providers with automatic fa...

From: Free (25+ free models)

European cloud provider with AI inference, training, and deployment services

From: $0.91/hr (L4 GPU)

Fine-tune and deploy LLMs on your own infrastructure with full data sovereignty

From: Free

Custom AI chip inference platform with purpose-built hardware for high-throug...

From: $5 free credit

European serverless AI inference APIs, 100% hosted in Europe

From: €0.20/M tokens

High-throughput LLM inference engine with PagedAttention for efficient GPU me...

From: Free (open-source)

Frequently asked questions

What is the cheapest AI inference API?

DeepInfra consistently ranks among the cheapest per-token providers on current open-source frontier models like gpt-oss-120B, Kimi K2, and Qwen3.5. On Artificial Analysis as of April 2026, gpt-oss-120B on DeepInfra is listed at roughly $0.08 per 1M blended tokens. Pricing shifts month to month, so the comparison table above is worth checking before committing to a provider for high-volume workloads.

What is the fastest AI inference API?

It depends on whether sustained throughput or first-token latency matters more. Cerebras reports around 3000 tokens/sec on gpt-oss-120B using WSE hardware, the highest measured throughput in the category as of April 2026. Groq uses custom LPU hardware and runs the same model at ~476 tokens/sec on Artificial Analysis, with a consistently low time-to-first-token (0.6-0.9s) that matters for interactive chat. Both trade off a narrower model catalog than GPU-based providers.
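The throughput-versus-latency tradeoff can be made concrete with a simple model: total response time ≈ time-to-first-token + tokens ÷ throughput. A sketch using the figures cited above (the Groq TTFT mid-point and both throughput numbers come from this page; the Cerebras TTFT is an assumed placeholder):

```python
def response_time(ttft_s: float, tokens: int, tokens_per_sec: float) -> float:
    """Approximate wall-clock time for a completion: first-token latency
    plus steady-state generation time."""
    return ttft_s + tokens / tokens_per_sec

# Groq: ~476 tok/s, ~0.7s TTFT (mid-range of the 0.6-0.9s cited above).
# Cerebras: ~3000 tok/s; the 0.9s TTFT here is an assumption.
for n in (100, 1000, 4000):
    groq = response_time(0.7, n, 476)
    cerebras = response_time(0.9, n, 3000)
    print(n, round(groq, 2), round(cerebras, 2))
```

Under these assumptions the lower-TTFT provider wins on short completions while the higher-throughput provider pulls ahead as output length grows, which is why "fastest" depends on the workload.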

Which AI inference APIs offer a free tier?

Cerebras and Groq both offer free usage with daily token limits, useful for prototyping. Most of the serverless providers (DeepInfra, Together, Fireworks, Novita) hand out free credits on signup rather than a permanent free tier. The "Free tier" filter above lists every provider with a free option.

Which inference providers are OpenAI-compatible?

DeepInfra, Together.ai, Fireworks, Novita, OpenRouter, and Groq all expose a drop-in OpenAI-compatible endpoint. Switching between them usually means changing nothing more than the base URL and API key. Replicate uses its own API format, and raw GPU providers like RunPod and Modal are not endpoints at all; they host whatever gets deployed to them.
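In practice, switching between OpenAI-compatible providers reduces to a base-URL lookup. A minimal sketch: the base URLs below are assumptions to verify against each provider's documentation, and the config dict matches the two parameters (`base_url`, `api_key`) that the official OpenAI client accepts:

```python
import os

# Base URLs are assumptions; confirm against each provider's docs.
PROVIDER_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
    "deepinfra": "https://api.deepinfra.com/v1/openai",
    "openrouter": "https://openrouter.ai/api/v1",
}

def client_config(provider: str) -> dict:
    """Return the only two settings that change between
    OpenAI-compatible providers: base URL and API key."""
    if provider not in PROVIDER_BASE_URLS:
        raise ValueError(f"unknown provider: {provider}")
    return {
        "base_url": PROVIDER_BASE_URLS[provider],
        "api_key": os.environ.get(f"{provider.upper()}_API_KEY", ""),
    }

# With the official openai client: OpenAI(**client_config("groq"))
print(client_config("groq")["base_url"])
```

Model names still differ between providers, so a full switch also means mapping the model identifier, not just the endpoint.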

Are there EU-hosted AI inference APIs?

Yes. Nebius, Scaleway, and Mistral La Plateforme run inference inside the EU, which matters for GDPR-sensitive workloads. Use the "European" filter above to see the full list, or visit the European providers page for hosting region details.

What is the best alternative to the OpenAI API?

For the highest sustained throughput on open-source models, Cerebras. For the lowest first-token latency, Groq. For the lowest cost per token, DeepInfra. For fine-tuning on the same platform as inference, Together.ai or Fireworks. For routing across providers from a single API, OpenRouter.

How to choose an inference API provider

The right provider depends on workload type, latency requirements, and budget. Most providers use pay-per-token pricing for LLMs and per-second GPU billing for custom models. Token-based pricing varies by model, so the cheapest provider for one model may not be cheapest for another.

Free tiers are useful for prototyping but often come with rate limits. For production, compare per-token costs for your specific model, cold start latency, rate limits, and whether the provider supports the models you need.
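Comparing per-token costs for a concrete workload is simple arithmetic, but the answer depends on the input/output mix. A sketch with hypothetical rates (the provider names and prices below are placeholders, not quotes):

```python
def monthly_cost(input_m: float, output_m: float, in_rate: float, out_rate: float) -> float:
    """Monthly cost in dollars for a workload measured in millions of
    input/output tokens, given $/1M-token rates for each direction."""
    return input_m * in_rate + output_m * out_rate

# Hypothetical ($/1M input, $/1M output) rates for the same model:
providers = {
    "provider_a": (0.08, 0.30),
    "provider_b": (0.10, 0.25),
}

# 500M input + 100M output tokens/month: cheapest first
for name, (i, o) in sorted(
    providers.items(), key=lambda kv: monthly_cost(500, 100, *kv[1])
):
    print(name, monthly_cost(500, 100, i, o))
```

With these placeholder rates, provider_a wins the input-heavy workload above, but flipping to 100M input / 500M output makes provider_b cheaper, which is exactly why the cheapest provider for one workload is not the cheapest for another.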

Teams with data residency requirements should check hosting options and provider headquarters. European providers like AiQu, Airon, and AKI.IO keep data within EU jurisdiction; see the full European AI Infrastructure directory. Self-hostable options like Baseten and Beam give full control over data location.

For a deeper analysis, read AI Inference API Providers Compared on the blog. Pricing changes frequently, so verify current rates on each provider's website. Submit a correction.

Browse all Inference APIs tools or explore the full AI Infrastructure Landscape.
