llama.cpp

Open Source Free Trial

LLM inference in C/C++ with broad hardware support and aggressive quantization

llama.cpp is a C/C++ inference engine for large language models, designed to run efficiently on CPUs, GPUs, and Apple Silicon. It introduced the GGUF model file format, which stores quantized weights, and helped establish the broader local-LLM tooling space. It supports most popular open-source models, including Llama, Mistral, Qwen, Gemma, and Phi, and serves as the underlying engine for Ollama, LM Studio, GPT4All, and many other local-LLM tools. It is maintained by the ggml-org community.

Pricing: Free

Hosting: Self-hosted
License: MIT
GitHub: 112,998 stars