Groq
LPU-powered inference API for LLMs, speech, and vision models with usage-based pricing
Groq runs inference on custom LPU (Language Processing Unit) chips designed from scratch for token generation. The hardware trades general-purpose flexibility for deterministic, low-latency performance on transformer workloads. GroqCloud exposes this through an OpenAI-compatible API supporting Llama 3.3 70B, Qwen3 32B, GPT-OSS 20B, Whisper, and several TTS models. Pricing is per-token with no subscriptions. Prompt caching and a batch API each offer 50% discounts. A free API key is available to get started. Enterprise customers can deploy on-premises via GroqRack. SOC 2, GDPR, and HIPAA compliant.
Pricing: Per token usage
Resources
Groq builds its own silicon, the LPU (Language Processing Unit), specifically for running inference on large language models. Unlike GPUs, which are general-purpose, the LPU is a fixed-function chip optimized for the sequential nature of autoregressive token generation. This gives Groq consistently high throughput, around 800-1,000 tokens per second on models like Llama 3.3 70B and GPT-OSS 20B.
The cloud API (GroqCloud) is OpenAI SDK-compatible. Supported model families include Llama 3.1/3.3, Qwen3, GPT-OSS, plus Whisper for speech-to-text and several TTS voices. Pricing is straightforward per-token with no subscriptions. Llama 3.1 8B runs at $0.05/M input tokens, while Llama 3.3 70B is $0.59/M input. Prompt caching and the batch API each cut costs by 50%.
Groq offers a free API key with rate limits for getting started. For enterprise, GroqRack provides on-premises deployment of the same LPU hardware. The platform is SOC 2, GDPR, and HIPAA compliant.
The main trade-off is model selection. Because Groq runs on custom hardware, only models that have been ported to the LPU are available. The catalog is smaller than GPU-based providers like Together AI or Fireworks, though it covers the most popular open-weight models. If you need a niche or fine-tuned model, Groq may not support it.
Groq Alternatives
Explore 67 products in the Inference APIs category. View all Groq alternatives.
Ollama
Run large language models locally with a single command
OpenRouter
Unified API for 400+ AI models across 60+ providers, OpenAI SDK-compatible, pay-as-you-go
Compare
Is your product missing?