How much does CheapestInference cost?

CheapestInference pricing: Monthly subscriptions.

Does CheapestInference have a free tier?

No, CheapestInference does not currently offer a free tier.

Is CheapestInference open source?

No, CheapestInference is not open source.

Home / Inference APIs / CheapestInference

CheapestInference

Flat-rate unlimited inference on open-weight models, sold in daily 8-hour windows

CheapestInference sells flat-rate access to open-weight models instead of per-token billing. Models are grouped into two pools: Frontier (Kimi K2.7, Kimi K2.6, GLM 5.2, MiniMax M3, all with 1M-token context) and Core (DeepSeek V4 Flash, MiMo v2.5). Access is sold in daily 8-hour windows, and a subscription runs unlimited tokens within its window with one concurrent request.

Core starts at $6.99/mo per daily window and Frontier at $52/mo; reserving all three daily windows gives 24/7 access. The API is OpenAI and Anthropic SDK compatible, so it works with Claude Code, and an x402 endpoint lets autonomous agents subscribe and pay with USDC on Base without a human. The pool lineup evolves as new open-weight models ship.

Pricing: Monthly subscriptions

Pricing Subscription, from $6.99/mo (Core pool)

Visit website →

Pricing

What CheapestInference is

CheapestInference offers a flat monthly price for unlimited-token access to open-weight models, an alternative to the per-token pricing most inference APIs use. It currently serves Kimi K2.6, GLM 4.7, and MiniMax M2.5.

How the pricing works

Access is divided into three 8-hour daily windows by region (Asia 00-08 UTC, Europe 08-16 UTC, Americas 16-24 UTC). A subscription covers one window, and multiple windows can be combined for fuller coverage. Within a window there is no token counter and no rate limit. The listed price is $33.15/month per window (billed annually at $397.80, a 15% discount), verified June 2026.

The trade-off

Each subscription allows one concurrent request. Tokens are unlimited sequentially, so the model suits steady single-stream workloads more than highly parallel ones. Teams needing concurrency would subscribe to multiple plans or use a per-token provider.

Compatibility and payments

The API is OpenAI SDK compatible. Plans and keys are fully manageable via API, and an x402 endpoint lets agents self-subscribe and pay with USDC on Base. Inference is processed in memory and not stored, and the provider states it does not train on user data.