CheapestInference
Flat-rate unlimited inference on open-weight models (Kimi K2.6, GLM 4.7, MiniMax M2.5) sold in 8-hour daily windows
CheapestInference sells flat-rate access to three open-weight models (Kimi K2.6, GLM 4.7, MiniMax M2.5) instead of per-token billing. Access is sold in 8-hour daily windows (Asia, Europe, or Americas time zones), and a subscription runs unlimited tokens within its window with no token counter or rate limit.
The key constraint is one concurrent request per subscription, so throughput is unlimited sequentially but not in parallel. The API is OpenAI SDK compatible. Prompts and completions are processed in memory and not stored, and the company states it does not train on user data.
An x402 endpoint lets autonomous agents discover pricing, pay with USDC on Base, and provision their own API key without a human. Plans and API keys can be created programmatically.
Pricing: Monthly subscriptions
What CheapestInference is
CheapestInference offers a flat monthly price for unlimited-token access to open-weight models, an alternative to the per-token pricing most inference APIs use. It currently serves Kimi K2.6, GLM 4.7, and MiniMax M2.5.
How the pricing works
Access is divided into three 8-hour daily windows by region (Asia 00-08 UTC, Europe 08-16 UTC, Americas 16-24 UTC). A subscription covers one window, and multiple windows can be combined for fuller coverage. Within a window there is no token counter and no rate limit. The listed price is $33.15/month per window (billed annually at $397.80, a 15% discount), verified June 2026.
The trade-off
Each subscription allows one concurrent request. Tokens are unlimited sequentially, so the model suits steady single-stream workloads more than highly parallel ones. Teams needing concurrency would subscribe to multiple plans or use a per-token provider.
Compatibility and payments
The API is OpenAI SDK compatible. Plans and keys are fully manageable via API, and an x402 endpoint lets agents self-subscribe and pay with USDC on Base. Inference is processed in memory and not stored, and the provider states it does not train on user data.
CheapestInference Alternatives
Explore 68 products in the Inference APIs category. View all CheapestInference alternatives.
Genesis Cloud
European GPU cloud for AI training and inference powered by 100% green energy
Nebius
Full-stack AI cloud with GPU infrastructure for training and inference
Lambda
GPU cloud for AI training and inference with on-demand and cluster options
CoreWeave
GPU cloud infrastructure built for large-scale AI training and inference workloads
Is your product missing?