Choosing a Vector Database for RAG
By Arvid Andersson
Retrieval-augmented generation (RAG) is the standard pattern for grounding LLM responses in your own data. At the core of most RAG pipelines is a vector database that stores embeddings and handles similarity search. The options range from managed cloud services to lightweight libraries you embed in your application. This post covers the trade-offs.
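Stripped to its essentials, the similarity search at the core of every store in this roundup is "rank stored embeddings by similarity to the query embedding." A minimal brute-force sketch with toy 3-dimensional vectors (real embedding models produce hundreds to thousands of dimensions, and real databases use approximate-nearest-neighbor indexes instead of a linear scan):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Brute-force nearest-neighbor search over (doc_id, vector) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" -- in a RAG pipeline these come from an embedding model.
index = [
    ("doc-a", [0.9, 0.1, 0.0]),
    ("doc-b", [0.0, 1.0, 0.1]),
    ("doc-c", [0.7, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], index))  # doc-a first, then doc-c
```

Everything else a vector database adds (indexing, filtering, replication, hybrid scoring) exists to make this operation fast and correct at scale.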
Why a dedicated vector database?
You can add vector search to PostgreSQL with pgvector, or use built-in vector features in Elasticsearch. For many applications, that's enough. A dedicated vector database becomes valuable when you need to handle millions of high-dimensional vectors with consistent low-latency queries, or when you need features like filtering, multi-tenancy, and hybrid search (combining vector similarity with keyword matching) at scale.
Managed cloud services
Pinecone is the most established managed vector database. It handles indexing, scaling, and infrastructure: you interact through an API, and Pinecone manages the rest. It supports metadata filtering, namespaces for multi-tenancy, and scales to billions of vectors. Pricing is based on pod type and storage, which can get expensive at scale, but in exchange you carry zero operational overhead.
Zilliz offers a managed cloud version of Milvus (the open-source vector database). If you want the flexibility of Milvus without managing the infrastructure, Zilliz provides that. It supports the same feature set: multiple index types, hybrid search, and GPU-accelerated indexing.
Turbopuffer is a newer entrant focused on speed and cost efficiency. It uses a tiered storage architecture that keeps hot data in memory and cold data on disk, delivering low query latency at a fraction of the cost of keeping everything in RAM. It is worth evaluating if query performance and cost per vector are your primary constraints.
Open-source, self-hosted
Qdrant is written in Rust and designed for production workloads. It supports filtering during search (not just post-filtering), which matters for multi-tenant applications where you need to scope results by user or organization. Qdrant also offers a managed cloud service, so you can start self-hosted and move to managed later.
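The difference between filtering during search and post-filtering is easy to see in a toy example. The sketch below uses brute-force scoring and hypothetical tenant names; real engines apply the same idea inside an ANN index:

```python
import math

def cos(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Each record: (doc_id, tenant, vector). Tenants are hypothetical.
records = [
    ("a1", "acme", [1.0, 0.0]),
    ("a2", "acme", [0.9, 0.1]),
    ("b1", "beta", [1.0, 0.1]),
    ("b2", "beta", [0.8, 0.2]),
]

def post_filtered(query, tenant, k):
    """Search first, filter after: other tenants' hits can crowd out yours."""
    top = sorted(records, key=lambda r: cos(query, r[2]), reverse=True)[:k]
    return [doc_id for doc_id, t, _ in top if t == tenant]

def pre_filtered(query, tenant, k):
    """Filter during search: only the tenant's own records are scored."""
    scoped = [r for r in records if r[1] == tenant]
    top = sorted(scoped, key=lambda r: cos(query, r[2]), reverse=True)[:k]
    return [doc_id for doc_id, _, _ in top]

query = [1.0, 0.0]
print(post_filtered(query, "acme", 2))  # ['a1'] -- b1 crowded out a2
print(pre_filtered(query, "acme", 2))   # ['a1', 'a2']
```

Post-filtering can return fewer than k results (or none) for a tenant whose documents score below another tenant's; filtering during search guarantees the result set is drawn only from the allowed scope.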
Weaviate stands out for its built-in vectorization modules. Instead of generating embeddings externally and storing them, you can configure Weaviate to call an embedding model (OpenAI, Cohere, Hugging Face) as part of the ingestion pipeline. It also supports hybrid search, combining vector similarity with BM25 keyword scoring. Available as open source or managed cloud.
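Hybrid search has to combine two score scales that aren't directly comparable (cosine similarities vs. BM25 scores). One common recipe is min-max normalization followed by a weighted sum. This is a generic sketch with made-up scores, not Weaviate's exact fusion algorithm:

```python
def hybrid_scores(vec_scores, kw_scores, alpha=0.5):
    """Blend vector and keyword scores after min-max normalization.
    alpha=1.0 is pure vector search, alpha=0.0 is pure keyword search."""
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}
    v, k = norm(vec_scores), norm(kw_scores)
    return {doc: alpha * v[doc] + (1 - alpha) * k[doc] for doc in vec_scores}

# Hypothetical per-document scores from the two retrievers.
vec = {"doc1": 0.92, "doc2": 0.88, "doc3": 0.40}
bm25 = {"doc1": 0.10, "doc2": 2.50, "doc3": 1.80}
ranked = sorted(hybrid_scores(vec, bm25).items(), key=lambda p: p[1], reverse=True)
print(ranked)  # doc2 wins: strong on both signals
```

Note how doc2 ranks first despite leading neither list on its own; documents that do reasonably well on both signals often beat documents that excel at only one.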
Milvus is one of the oldest purpose-built vector databases. It supports multiple index types (IVF, HNSW, DiskANN), GPU-accelerated search, and scales to billions of vectors. The trade-off is operational complexity, as Milvus has several components (etcd, MinIO, message queue) that need to be managed in production.
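The IVF ("inverted file") index type mentioned above trades a little recall for a lot of speed: vectors are partitioned into lists by nearest centroid at build time, and a query scores only the vectors in the few closest lists. A toy sketch with hand-picked centroids (a real index learns them with k-means and probes many lists):

```python
import math

def cos(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hand-picked centroids for illustration; real IVF learns them via k-means.
centroids = [[1.0, 0.0], [0.0, 1.0]]
vectors = [("x", [0.9, 0.1]), ("y", [0.1, 0.9]), ("z", [0.8, 0.2])]

# Build the inverted lists: each vector is assigned to its nearest centroid.
lists = {i: [] for i in range(len(centroids))}
for doc_id, vec in vectors:
    nearest = max(range(len(centroids)), key=lambda i: cos(vec, centroids[i]))
    lists[nearest].append((doc_id, vec))

def ivf_search(query, n_probe=1):
    """Score only the vectors in the n_probe closest lists, not all vectors."""
    probed = sorted(range(len(centroids)),
                    key=lambda i: cos(query, centroids[i]), reverse=True)[:n_probe]
    candidates = [item for i in probed for item in lists[i]]
    return max(candidates, key=lambda item: cos(query, item[1]))[0]

print(ivf_search([1.0, 0.05]))  # searches only list 0, finds "x"
```

Raising `n_probe` searches more lists, improving recall at the cost of latency; HNSW and DiskANN make different versions of the same speed-vs-recall trade.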
Lightweight and embedded
Chroma is designed for simplicity. It runs in-process (no separate server needed), making it ideal for prototyping, local development, and small-scale RAG applications. You can get started with a few lines of Python. For production, Chroma offers a client-server mode and a cloud service.
LanceDB takes a serverless approach built on the Lance columnar format. It runs embedded in your application (Python, JavaScript, Rust) with no server to manage. Data is stored in object storage (S3, GCS), making it cost-effective for large datasets. LanceDB is a good fit for applications where you want vector search without running a separate database service.
PostgreSQL-based
pgvecto.rs is a PostgreSQL extension (written in Rust) that adds vector similarity search to your existing Postgres database. If your data is already in PostgreSQL, this avoids adding another database to your stack. It supports HNSW and IVF indexes with good performance up to a few million vectors. For many RAG applications, especially those with moderate scale, keeping vectors in the same database as the rest of your data is the simplest architecture.
Comparison
| Database | Type | Best for | Open source |
|---|---|---|---|
| Pinecone | Managed cloud | Zero-ops, large scale | No |
| Qdrant | Self-hosted / managed | Filtering, multi-tenancy | Yes (Apache 2.0) |
| Weaviate | Self-hosted / managed | Built-in vectorization, hybrid search | Yes (BSD-3) |
| Milvus | Self-hosted | GPU-accelerated, billion-scale | Yes (Apache 2.0) |
| Chroma | Embedded / managed | Prototyping, small-scale RAG | Yes (Apache 2.0) |
| LanceDB | Embedded / serverless | Serverless, object storage | Yes (Apache 2.0) |
| Turbopuffer | Managed cloud | Speed + cost efficiency | No |
| pgvecto.rs | PostgreSQL extension | Existing Postgres users | Yes (Apache 2.0) |
How to choose
If you're already on PostgreSQL and your dataset is under a few million vectors, start with pgvector or pgvecto.rs. You avoid adding infrastructure and keep your data in one place. For prototyping and local development, Chroma gets you started fastest.
For production RAG at scale, the choice depends on your operational preferences. Pinecone if you want fully managed with no infrastructure to think about. Qdrant or Weaviate if you want open-source with the option to self-host or use managed cloud. Milvus if you need GPU acceleration or have very large datasets.
The embedding model you choose matters as much as the database. Most vector databases are agnostic to the embedding model, so you can switch models without changing your database. Focus on getting the retrieval quality right first, then optimize the infrastructure.