llama.cpp

Open Source Free Trial

LLM inference in C/C++ with broad hardware support and aggressive quantization

llama.cpp is a C/C++ inference engine for large language models, designed to run efficiently on CPUs, GPUs, and Apple Silicon. It introduced the GGUF model file format, which stores quantized weights together with model metadata, and helped shape the broader local-LLM tooling ecosystem. It supports most popular open-weight model families, including Llama, Mistral, Qwen, Gemma, and Phi, and has been used as the underlying engine by Ollama, LM Studio, GPT4All, and many other local-LLM tools. It is maintained by the ggml-org community.

Pricing: Free

Hosting: Self-hosted

License: MIT

GitHub: 112,998 stars

llama.cpp Alternatives

Explore 32 products in the Frameworks & Stacks category.

Atomic Chat

Open-source local AI chat app for running open-weight models on desktop and mobile

Open Source Free Trial

LiteLLM

Unified OpenAI-compatible proxy for 100+ LLM providers with cost tracking and load balancing

Open Source Free Trial

Mastra

TypeScript-first AI framework for building agents, RAG pipelines, and workflows

Open Source Free Trial

vLLM

High-throughput LLM inference engine with PagedAttention for efficient GPU memory usage

Open Source Free Trial From Free (open-source)

LM Studio

Desktop app for discovering, downloading, and running local LLMs with a built-in API server

Free Trial

Ollama

Run large language models locally with a single command

Open Source From Free (open-source)