llama.cpp
LLM inference in C/C++ with broad hardware support and aggressive quantization
llama.cpp is a C/C++ inference engine for large language models, designed to run efficiently on CPUs, GPUs, and Apple Silicon. It pioneered the GGUF file format for quantized models and helped shape the broader local-LLM tooling space. It supports most popular open-source models, including Llama, Mistral, Qwen, Gemma, and Phi, and serves as the underlying engine for Ollama, LM Studio, GPT4All, and many other local-LLM tools. It is maintained by the ggml-org community.
Pricing: Free
llama.cpp Alternatives
Explore 31 products in the Frameworks & Stacks category.
Mastra
TypeScript-first AI framework for building agents, RAG pipelines, and workflows
vLLM
High-throughput LLM inference engine with PagedAttention for efficient GPU memory usage
Ollama
Run large language models locally with a single command
Dify
Build and operate generative AI applications, including Assistants API-style and GPTs-style apps, on top of any LLM