
Inference APIs Pricing Comparison

49 providers compared by pricing model, free tiers, hosting options, and headquarters. Last updated March 2026.

37 with free tiers · 7 open source · 9 self-hostable · 14 European

Provider Pricing Model Starting Price Free Tier Hosting Open Source HQ Founded
Pay-per-use Pay-per-token Cloud 🇺🇸 United States 2023
Pay-per-use $1/1M tokens Cloud 🇺🇸 United States 2021
🇺🇸 United States
Pay-per-use ~$0.63/hr (T4 GPU) Cloud + Self-hosted 🇺🇸 United States 2019
Pay-per-use $0.15/hr (T4 GPU) Cloud + Self-hosted 🇺🇸 United States 2021
🇺🇸 United States
Freemium €25/mo Cloud 🇸🇪 Sweden 2024
Freemium Free tier available Cloud 🇺🇸 United States 2015
Pay-per-use ~$1.10/hr (A10 GPU) Cloud 🇺🇸 United States 2021
Freemium $0.011/1K neurons Cloud 🇺🇸 United States 2009
Freemium $0.04/1M tokens Cloud + Self-hosted 🇨🇦 Canada 2019
Pay-per-use $6.50/hr (GH200 GPU) Cloud 🇺🇸 United States 2017
Pay-per-use Pay-per-use + 5% gateway fee Cloud 🇦🇹 Austria 2021
Pay-per-use $0.02/M tokens Cloud 🇺🇸 United States 2022
Pay-per-use $0.028/1M tokens (cache hit) Cloud 🇨🇳 China 2023
fal
Pay-per-use $0.02/megapixel Cloud 🇺🇸 United States 2021
Pay-per-use $0.10/1M tokens Cloud + Self-hosted 🇺🇸 United States 2022
Pay-per-use $0.08/hr Cloud 🇩🇪 Germany 2018
Freemium Free Cloud 🇺🇸 United States 2023
Freemium $0.05/1M tokens Cloud 🇺🇸 United States 2016
Pay-per-use $0.033/hr (CPU) Cloud 🇺🇸 United States 2016
Pay-per-use $0.15/hr Cloud 🇬🇧 United Kingdom 2020
Pay-per-use Free 🇱🇺 Luxembourg 2025
Pay-per-use $0.02/M tokens Cloud 🇺🇸 United States 2025
Pay-per-use $0.58/GPU/hr (V100) Cloud 🇺🇸 United States 2012
Freemium $3/300 credits Cloud 🇺🇸 United States
Freemium $0.10/1M tokens Cloud + Self-hosted 🇫🇷 France 2023
Pay-per-use $30/mo free credits Cloud 🇺🇸 United States 2021
Freemium Free 🇺🇸 United States 2022
Pay-per-use Cloud 🇺🇸 United States
Pay-per-use $2.00/hr (H100) Cloud 🇳🇱 Netherlands 2024
Pay-per-use $0.03/M tokens Cloud 🇺🇸 United States 2023
Pay-per-use $0.01/M tokens Cloud 🇬🇧 United Kingdom 2024
🇺🇸 United States
Freemium Free (open-source) Cloud + Self-hosted 🇺🇸 United States 2023
Pay-per-use $0.05/1M tokens Cloud 🇺🇸 United States 2015
Freemium Free (25+ free models) Cloud 🇺🇸 United States 2023
Pay-per-use $0.91/hr (L4 GPU) Cloud 🇫🇷 France 1999
Freemium Free 🇨🇭 Switzerland 2023
Pay-per-use Per-second GPU billing Cloud 🇺🇸 United States 2019
Pay-per-use $0.06/hr Cloud 🇺🇸 United States 2022
Freemium $5 free credit Cloud + Self-hosted 🇺🇸 United States 2017
Pay-per-use €0.20/M tokens Cloud 🇫🇷 France 1999
Pay-per-use $0.0015/image Cloud 🇺🇸 United States
Pay-per-use ~€2.70/GPU-hr Cloud 🇩🇪 Germany 2009
Pay-per-use Pay-per-token Cloud + Self-hosted 🇺🇸 United States 2022
Pay-per-use ~$0.06/GPU/hr Cloud 🇺🇸 United States 2018
Pay-per-use $0.14/hr Cloud 🇫🇮 Finland 2020
Free Free (open-source) Self-hosted 🇺🇸 United States 2023

Pricing units vary by provider type: per-token for LLM APIs, per-GPU-hour for compute platforms, per-request for media generation. Verify current rates on each provider's website.
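
Because the table mixes notations such as "$0.02/M tokens" and "$1/1M tokens", it can help to normalize per-token prices before comparing them. The sketch below shows one way to do that in Python. The function name price_per_million_tokens and the regular expression are illustrative assumptions, not part of any provider's API, and the function deliberately returns None for GPU-hour, per-image, or other non-token prices rather than guessing a conversion.

```python
import re

# Matches the per-token notations used in the table, for example
# "$0.10/1M tokens", "$0.02/M tokens", or "$0.028/1M tokens (cache hit)".
_TOKEN_PRICE = re.compile(r"^~?([$€])(\d+(?:\.\d+)?)/(1K|1M|K|M) tokens")

# Number of tokens each unit suffix stands for.
_UNIT_TOKENS = {"K": 1_000, "1K": 1_000, "M": 1_000_000, "1M": 1_000_000}


def price_per_million_tokens(listing: str):
    """Return (currency, price per 1M tokens), or None if not a per-token price."""
    match = _TOKEN_PRICE.match(listing.strip())
    if match is None:
        return None
    currency, amount, unit = match.groups()
    return currency, float(amount) * 1_000_000 / _UNIT_TOKENS[unit]


if __name__ == "__main__":
    for listing in ["$1/1M tokens", "$0.02/M tokens", "~$0.63/hr (T4 GPU)"]:
        print(listing, "->", price_per_million_tokens(listing))
```

Prices in different currencies are returned with their currency symbol so that they are not compared directly without an exchange rate.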

Providers with free tiers

These inference API providers offer free credits, free tiers, or open-source self-hosting options, so you can get started without upfront costs.

Managed API access to foundation models on AWS with built-in fine-tuning and ...

From: Pay-per-token

Claude API for building AI applications with Opus, Sonnet, and Haiku models

From: $1/1M tokens

AI inference platform for deploying and serving ML models with autoscaling an...

From: ~$0.63/hr (T4 GPU)

Open-source serverless GPU cloud with sub-second cold starts and auto-scaling

From: $0.15/hr (T4 GPU)

BentoML is the platform for software engineers to build AI products.

EU-sovereign AI inference platform with OpenAI-compatible API

From: €25/mo

Ultra-fast inference on custom wafer-scale hardware with OpenAI-compatible API

From: Free tier available

Serverless GPU infrastructure for deploying AI models with sub-5 second cold ...

From: ~$1.10/hr (A10 GPU)

Run AI models at the edge on Cloudflare's global network with serverless infe...

From: $0.011/1K neurons

Cohere's world-class LLMs help enterprises build powerful, secure application...

From: $0.04/1M tokens

European AI inference gateway with smart routing across EU providers

From: Pay-per-use + 5% gateway fee

Run the top AI models using a simple API, pay per use. Low cost, scalable and...

From: $0.02/M tokens

Cost-effective inference API with OpenAI-compatible endpoints and open-weight...

From: $0.028/1M tokens (cache hit)

fal

Build the next generation of creativity with fal. Lightning fast inference.

From: $0.02/megapixel

European GPU cloud for AI training and inference powered by 100% green energy

From: $0.08/hr

Google's API for Gemini models with text, image, video, and audio capabilities

From: Free

Groq is on a mission to set the standard for GenAI inference speed, helping r...

From: $0.05/1M tokens

The open-source AI platform with 500K+ models, inference endpoints, and fine-...

From: $0.033/hr (CPU)

European sovereign AI inference with OpenAI-compatible APIs hosted in EU data...

From: Free

High-throughput inference API with OpenAI-compatible access to open-source mo...

From: $0.02/M tokens

GPU cloud for AI training and inference with on-demand and cluster options

From: $0.58/GPU/hr (V100)

Multi-LLM API orchestration platform for comparing and blending AI models

From: $3/300 credits

Run generative AI models, large-scale batch jobs, job queues, and much more.

From: $30/mo free credits

We rebuilt the modern AI software stack, from the ground up, to boost any AI ...

From: Free

Access, finetune, deploy LLMs using our affordable and scalable APIs.

Full-stack AI cloud with GPU infrastructure for training and inference

From: $2.00/hr (H100)

APIs, Serverless and GPU Instance In One AI Cloud

From: $0.03/M tokens

European AI hyperscaler with serverless inference and GPU cloud

From: $0.01/M tokens

OctoAI delivers production-grade GenAI solutions running on the most efficien...

Run large language models locally with a single command

From: Free (open-source)

API access to GPT, o-series reasoning, DALL-E, and Whisper models

From: $0.05/1M tokens

Unified API gateway for 300+ AI models across 60+ providers with automatic fa...

From: Free (25+ free models)

European cloud provider with AI inference, training, and deployment services

From: $0.91/hr (L4 GPU)

Fine-tune and deploy LLMs on your own infrastructure with full data sovereignty

From: Free

Custom AI chip inference platform with purpose-built hardware for high-throug...

From: $5 free credit

European serverless AI inference APIs, 100% hosted in Europe

From: €0.20/M tokens

High-throughput LLM inference engine with PagedAttention for efficient GPU me...

From: Free (open-source)

How to choose an inference API provider

The right provider depends on workload type, latency requirements, and budget. Most providers use pay-per-token pricing for LLMs and per-second GPU billing for custom models. Token-based pricing varies by model, so the cheapest provider for one model may not be cheapest for another.
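
To make the per-token versus GPU-hour trade-off concrete, here is a minimal sketch that computes the throughput at which a rented GPU becomes cheaper than a pay-per-token API. The prices passed in are placeholders chosen for illustration, and the model assumes the GPU is billed for every hour whether it is busy or idle. Real billing (per-second metering, cold starts, separate input and output token prices) is more nuanced.

```python
def token_api_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of processing `tokens` on a pay-per-token API."""
    return tokens / 1_000_000 * usd_per_million_tokens


def gpu_cost(hours: float, usd_per_gpu_hour: float) -> float:
    """Cost of renting one GPU for `hours`, busy or idle."""
    return hours * usd_per_gpu_hour


def break_even_tokens_per_hour(usd_per_gpu_hour: float,
                               usd_per_million_tokens: float) -> float:
    """Tokens per hour above which the GPU is cheaper than the token API."""
    return usd_per_gpu_hour / usd_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # Placeholder prices: $2.00 per GPU-hour versus $0.10 per 1M tokens.
    threshold = break_even_tokens_per_hour(2.00, 0.10)
    print(f"Break-even: {threshold:,.0f} tokens/hour")
```

Below the break-even throughput, per-token pricing is the cheaper option under these assumptions; above it, the GPU rental wins only if your deployment can actually sustain that throughput.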

Free tiers are useful for prototyping but often come with rate limits. For production, compare per-token costs for your specific model, cold start latency, rate limits, and whether the provider supports the models you need.
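
As a rough illustration of comparing per-token costs for a specific model, the sketch below ranks providers by estimated monthly cost and skips any that do not serve the model. The provider names, model names, and prices are all hypothetical, and a single blended price per model is an assumption made to keep the example short.

```python
# Hypothetical prices in USD per 1M tokens, keyed by provider and model.
PRICES = {
    "provider_a": {"small-model": 0.05, "large-model": 1.00},
    "provider_b": {"small-model": 0.08, "large-model": 0.60},
    "provider_c": {"small-model": 0.04},  # does not serve large-model
}


def rank_providers(model: str, monthly_tokens: int):
    """Return (provider, monthly cost) pairs for `model`, cheapest first."""
    costs = [
        (name, monthly_tokens / 1_000_000 * models[model])
        for name, models in PRICES.items()
        if model in models  # drop providers that lack the model
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for model in ("small-model", "large-model"):
        print(model, rank_providers(model, 100_000_000))
```

With these made-up numbers the cheapest provider differs between the two models, which is why a comparison should start from the model you plan to run rather than from a provider's headline price.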

Teams with data residency requirements should check both hosting options and provider headquarters. European providers like Berget AI and Cortecs AI keep data within EU jurisdiction. See the full European AI Infrastructure directory. Self-hostable options like Baseten and Beam give full control over data location.
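
A first-pass shortlist for residency requirements can be sketched as a simple filter over headquarters and hosting options. Everything in the catalog below is a hypothetical entry for illustration, and headquarters country is only a proxy: it does not by itself confirm where a provider processes or stores data, so treat the result as a list to verify rather than a compliance answer.

```python
from dataclasses import dataclass

# EU member states that appear in the HQ column of the table above.
# The United Kingdom and Switzerland are European but not in the EU.
EU_COUNTRIES = frozenset({
    "Austria", "Finland", "France", "Germany",
    "Luxembourg", "Netherlands", "Sweden",
})


@dataclass(frozen=True)
class Provider:
    name: str
    hq: str
    hosting: tuple  # for example ("cloud",) or ("cloud", "self-hosted")


def residency_candidates(providers, countries=EU_COUNTRIES):
    """Keep providers headquartered in `countries` or that can be self-hosted."""
    return [
        p for p in providers
        if p.hq in countries or "self-hosted" in p.hosting
    ]


# Hypothetical entries for illustration only.
CATALOG = [
    Provider("eu-cloud-api", "Sweden", ("cloud",)),
    Provider("us-cloud-api", "United States", ("cloud",)),
    Provider("us-self-hostable", "United States", ("cloud", "self-hosted")),
    Provider("uk-cloud-api", "United Kingdom", ("cloud",)),
]

if __name__ == "__main__":
    print([p.name for p in residency_candidates(CATALOG)])
```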

For a deeper analysis, read AI Inference API Providers Compared on the blog. Pricing changes frequently, so verify current rates on each provider's website.

Browse all Inference APIs tools or explore the full AI Infrastructure Landscape.
