Icon for CheapestInference

CheapestInference

Flat-rate unlimited inference on open-weight models (Kimi K2.6, GLM 4.7, MiniMax M2.5) sold in 8-hour daily windows

CheapestInference sells flat-rate access to three open-weight models (Kimi K2.6, GLM 4.7, MiniMax M2.5) instead of per-token billing. Access is sold in 8-hour daily windows (Asia, Europe, or Americas time zones), and a subscription runs unlimited tokens within its window with no token counter or rate limit.

The key constraint is one concurrent request per subscription, so throughput is unlimited sequentially but not in parallel. The API is OpenAI SDK compatible. Prompts and completions are processed in memory and not stored, and the company states it does not train on user data.

An x402 endpoint lets autonomous agents discover pricing, pay with USDC on Base, and provision their own API key without a human. Plans and API keys can be created programmatically.

Pricing: Monthly subscriptions

Pricing Subscription, from $33.15/mo
Screenshot of CheapestInference webpage

What CheapestInference is

CheapestInference offers a flat monthly price for unlimited-token access to open-weight models, an alternative to the per-token pricing most inference APIs use. It currently serves Kimi K2.6, GLM 4.7, and MiniMax M2.5.

How the pricing works

Access is divided into three 8-hour daily windows by region (Asia 00-08 UTC, Europe 08-16 UTC, Americas 16-24 UTC). A subscription covers one window, and multiple windows can be combined for fuller coverage. Within a window there is no token counter and no rate limit. The listed price is $33.15/month per window (billed annually at $397.80, a 15% discount), verified June 2026.

The trade-off

Each subscription allows one concurrent request. Tokens are unlimited sequentially, so the model suits steady single-stream workloads more than highly parallel ones. Teams needing concurrency would subscribe to multiple plans or use a per-token provider.

Compatibility and payments

The API is OpenAI SDK compatible. Plans and keys are fully manageable via API, and an x402 endpoint lets agents self-subscribe and pay with USDC on Base. Inference is processed in memory and not stored, and the provider states it does not train on user data.

Is your product missing?

Add it here →