Icon for Ollama

Ollama

Open Source

Run large language models locally with a single command

Ollama makes it easy to run open-source LLMs locally on your machine. It handles model downloading, quantization, and serving with an OpenAI-compatible API. Supports Llama, Mistral, Gemma, Phi, and many other model families. Popular for local development, testing, and offline AI applications.

Pricing: Free

Hosting Cloud + Self-hosted
Pricing Freemium, from Free (open-source)
HQ 🇺🇸 United States
Founded 2023
License MIT
GitHub 172,291 stars
Screenshot of Ollama webpage

What is Ollama?

Ollama is a tool for running open-source large language models locally on macOS, Linux, and Windows. It handles model downloading, quantization, GPU acceleration, and serves models behind an OpenAI-compatible API on localhost:11434. A single ollama run llama3 command pulls the model and starts an interactive session, which is most of why it caught on.

How it works

Under the hood Ollama wraps llama.cpp as its inference engine and adds model packaging (the Modelfile format, similar in spirit to a Dockerfile), a CLI, and an HTTP API. Models live in a local cache, downloaded on first use. Quantized GGUF weights mean a 7B model fits comfortably in 8 GB of RAM, and larger models scale with available VRAM.

Key features

  • One-command install and one-command model pulls
  • Supports Llama, Mistral, Qwen, Gemma, Phi, DeepSeek, and many other model families from a curated library
  • OpenAI-compatible REST API, plus official Python and JavaScript clients
  • Custom Modelfile definitions for prompts, parameters, and system messages
  • Works on Apple Silicon, Nvidia GPUs, AMD GPUs, and CPU-only setups

Pricing

Free and open source under the MIT license. No paid tier.

Who should use it

Developers prototyping locally, teams that need offline inference, anyone who wants to test a model before committing to a hosted API, and those building applications where data residency or cost rules out third-party providers. For sustained production serving with high concurrency, vLLM or llama.cpp directly tend to be a better fit. For desktop GUI workflows, LM Studio or Jan are closer alternatives.

Is your product missing?

Add it here →