Icon for Groq

Groq

Free Trial

LPU-powered inference API for LLMs, speech, and vision models with usage-based pricing

Groq runs inference on custom LPU (Language Processing Unit) chips designed from scratch for token generation. The hardware trades general-purpose flexibility for deterministic, low-latency performance on transformer workloads. GroqCloud exposes this through an OpenAI-compatible API supporting Llama 3.3 70B, Qwen3 32B, GPT-OSS 20B, Whisper, and several TTS models. Pricing is per-token with no subscriptions. Prompt caching and a batch API each offer 50% discounts. A free API key is available to get started. Enterprise customers can deploy on-premises via GroqRack. SOC 2, GDPR, and HIPAA compliant.

Pricing: Per token usage

Hosting Cloud
Pricing Freemium, from $0.05/1M tokens
HQ 🇺🇸 United States
Founded 2016
License PROPRIETARY
Compliance SOC 2 · HIPAA · GDPR · SSO
Screenshot of Groq webpage

Groq builds its own silicon, the LPU (Language Processing Unit), specifically for running inference on large language models. Unlike GPUs, which are general-purpose, the LPU is a fixed-function chip optimized for the sequential nature of autoregressive token generation. This gives Groq consistently high throughput, around 800-1,000 tokens per second on models like Llama 3.3 70B and GPT-OSS 20B.

The cloud API (GroqCloud) is OpenAI SDK-compatible. Supported model families include Llama 3.1/3.3, Qwen3, GPT-OSS, plus Whisper for speech-to-text and several TTS voices. Pricing is straightforward per-token with no subscriptions. Llama 3.1 8B runs at $0.05/M input tokens, while Llama 3.3 70B is $0.59/M input. Prompt caching and the batch API each cut costs by 50%.

Groq offers a free API key with rate limits for getting started. For enterprise, GroqRack provides on-premises deployment of the same LPU hardware. The platform is SOC 2, GDPR, and HIPAA compliant.

The main trade-off is model selection. Because Groq runs on custom hardware, only models that have been ported to the LPU are available. The catalog is smaller than GPU-based providers like Together AI or Fireworks, though it covers the most popular open-weight models. If you need a niche or fine-tuned model, Groq may not support it.

Is your product missing?

Add it here →