Choosing a Vector Database for RAG
By Arvid Andersson
Retrieval-augmented generation (RAG) is the standard pattern for grounding LLM responses in your own data. At the core of most RAG pipelines is a vector database that stores embeddings and handles similarity search. The options range from managed cloud services to lightweight libraries you embed in your application. This post covers the trade-offs.
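Stripped to its essentials, the similarity search at the core of every store in this roundup is "rank stored embeddings by similarity to the query embedding." A minimal brute-force sketch with toy 3-dimensional vectors (real embedding models produce hundreds to thousands of dimensions, and real databases use approximate-nearest-neighbor indexes instead of a linear scan):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Brute-force nearest-neighbor search over (doc_id, vector) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" -- in a RAG pipeline these come from an embedding model.
index = [
    ("doc-a", [0.9, 0.1, 0.0]),
    ("doc-b", [0.0, 1.0, 0.1]),
    ("doc-c", [0.7, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], index))  # doc-a first, then doc-c
```

Everything else a vector database adds (indexing, filtering, replication, hybrid scoring) exists to make this operation fast and correct at scale.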
Why a dedicated vector database?
You can add vector search to PostgreSQL with pgvector, or use built-in vector features in Elasticsearch. For many applications, that's enough. A dedicated vector database becomes valuable when you need to handle millions of high-dimensional vectors with consistent low-latency queries, or when you need features like filtering, multi-tenancy, and hybrid search (combining vector similarity with keyword matching) at scale.
Managed cloud services
Pinecone is the most established managed vector database. It handles indexing, scaling, and infrastructure: you interact through an API, and Pinecone manages the rest. It supports metadata filtering, namespaces for multi-tenancy, and scales to billions of vectors. Pricing is based on pod type and storage, which can get expensive at scale, but in exchange you carry zero operational overhead.
Zilliz offers a managed cloud version of Milvus (the open-source vector database). If you want the flexibility of Milvus without managing the infrastructure, Zilliz provides that. It supports the same feature set: multiple index types, hybrid search, and GPU-accelerated indexing.
Turbopuffer is a newer entrant focused on speed and cost efficiency. It uses a tiered storage architecture that keeps hot data in memory and cold data on disk, delivering low query latency at a fraction of the cost of keeping everything in RAM. It is worth evaluating if query performance and cost per vector are your primary constraints.
Open-source, self-hosted
Qdrant is written in Rust and designed for production workloads. It supports filtering during search (not just post-filtering), which matters for multi-tenant applications where you need to scope results by user or organization. Qdrant also offers a managed cloud service, so you can start self-hosted and move to managed later.
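The difference between filtering during search and post-filtering is easy to see in a toy example. The sketch below uses brute-force scoring and hypothetical tenant names; real engines apply the same idea inside an ANN index:

```python
import math

def cos(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Each record: (doc_id, tenant, vector). Tenants are hypothetical.
records = [
    ("a1", "acme", [1.0, 0.0]),
    ("a2", "acme", [0.9, 0.1]),
    ("b1", "beta", [1.0, 0.1]),
    ("b2", "beta", [0.8, 0.2]),
]

def post_filtered(query, tenant, k):
    """Search first, filter after: other tenants' hits can crowd out yours."""
    top = sorted(records, key=lambda r: cos(query, r[2]), reverse=True)[:k]
    return [doc_id for doc_id, t, _ in top if t == tenant]

def pre_filtered(query, tenant, k):
    """Filter during search: only the tenant's own records are scored."""
    scoped = [r for r in records if r[1] == tenant]
    top = sorted(scoped, key=lambda r: cos(query, r[2]), reverse=True)[:k]
    return [doc_id for doc_id, _, _ in top]

query = [1.0, 0.0]
print(post_filtered(query, "acme", 2))  # ['a1'] -- b1 crowded out a2
print(pre_filtered(query, "acme", 2))   # ['a1', 'a2']
```

Post-filtering can return fewer than k results (or none) for a tenant whose documents score below another tenant's; filtering during search guarantees the result set is drawn only from the allowed scope.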
Weaviate stands out for its built-in vectorization modules. Instead of generating embeddings externally and storing them, you can configure Weaviate to call an embedding model (OpenAI, Cohere, Hugging Face) as part of the ingestion pipeline. It also supports hybrid search, combining vector similarity with BM25 keyword scoring. Available as open source or managed cloud.
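Hybrid search has to combine two score scales that aren't directly comparable (cosine similarities vs. BM25 scores). One common recipe is min-max normalization followed by a weighted sum. This is a generic sketch with made-up scores, not Weaviate's exact fusion algorithm:

```python
def hybrid_scores(vec_scores, kw_scores, alpha=0.5):
    """Blend vector and keyword scores after min-max normalization.
    alpha=1.0 is pure vector search, alpha=0.0 is pure keyword search."""
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}
    v, k = norm(vec_scores), norm(kw_scores)
    return {doc: alpha * v[doc] + (1 - alpha) * k[doc] for doc in vec_scores}

# Hypothetical per-document scores from the two retrievers.
vec = {"doc1": 0.92, "doc2": 0.88, "doc3": 0.40}
bm25 = {"doc1": 0.10, "doc2": 2.50, "doc3": 1.80}
ranked = sorted(hybrid_scores(vec, bm25).items(), key=lambda p: p[1], reverse=True)
print(ranked)  # doc2 wins: strong on both signals
```

Note how doc2 ranks first despite leading neither list on its own; documents that do reasonably well on both signals often beat documents that excel at only one.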
Milvus is one of the oldest purpose-built vector databases. It supports multiple index types (IVF, HNSW, DiskANN), GPU-accelerated search, and scales to billions of vectors. The trade-off is operational complexity, as Milvus has several components (etcd, MinIO, message queue) that need to be managed in production.
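The IVF ("inverted file") index type mentioned above trades a little recall for a lot of speed: vectors are partitioned into lists by nearest centroid at build time, and a query scores only the vectors in the few closest lists. A toy sketch with hand-picked centroids (a real index learns them with k-means and probes many lists):

```python
import math

def cos(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hand-picked centroids for illustration; real IVF learns them via k-means.
centroids = [[1.0, 0.0], [0.0, 1.0]]
vectors = [("x", [0.9, 0.1]), ("y", [0.1, 0.9]), ("z", [0.8, 0.2])]

# Build the inverted lists: each vector is assigned to its nearest centroid.
lists = {i: [] for i in range(len(centroids))}
for doc_id, vec in vectors:
    nearest = max(range(len(centroids)), key=lambda i: cos(vec, centroids[i]))
    lists[nearest].append((doc_id, vec))

def ivf_search(query, n_probe=1):
    """Score only the vectors in the n_probe closest lists, not all vectors."""
    probed = sorted(range(len(centroids)),
                    key=lambda i: cos(query, centroids[i]), reverse=True)[:n_probe]
    candidates = [item for i in probed for item in lists[i]]
    return max(candidates, key=lambda item: cos(query, item[1]))[0]

print(ivf_search([1.0, 0.05]))  # searches only list 0, finds "x"
```

Raising `n_probe` searches more lists, improving recall at the cost of latency; HNSW and DiskANN make different versions of the same speed-vs-recall trade.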
Lightweight and embedded
Chroma is designed for simplicity. It runs in-process (no separate server needed), making it ideal for prototyping, local development, and small-scale RAG applications. You can get started with a few lines of Python. For production, Chroma offers a client-server mode and a cloud service.
LanceDB takes a serverless approach built on the Lance columnar format. It runs embedded in your application (Python, JavaScript, Rust) with no server to manage. Data is stored in object storage (S3, GCS), making it cost-effective for large datasets. LanceDB is a good fit for applications where you want vector search without running a separate database service.
PostgreSQL-based
pgvecto.rs is a PostgreSQL extension (written in Rust) that adds vector similarity search to your existing Postgres database. If your data is already in PostgreSQL, this avoids adding another database to your stack. It supports HNSW and IVF indexes with good performance up to a few million vectors. For many RAG applications, especially those with moderate scale, keeping vectors in the same database as the rest of your data is the simplest architecture.
Comparison
| Database | Type | Best for | Open source |
|---|---|---|---|
| Pinecone | Managed cloud | Zero-ops, large scale | No |
| Qdrant | Self-hosted / managed | Filtering, multi-tenancy | Yes (Apache 2.0) |
| Weaviate | Self-hosted / managed | Built-in vectorization, hybrid search | Yes (BSD-3) |
| Milvus | Self-hosted | GPU-accelerated, billion-scale | Yes (Apache 2.0) |
| Chroma | Embedded / managed | Prototyping, small-scale RAG | Yes (Apache 2.0) |
| LanceDB | Embedded / serverless | Serverless, object storage | Yes (Apache 2.0) |
| Turbopuffer | Managed cloud | Speed + cost efficiency | No |
| pgvecto.rs | PostgreSQL extension | Existing Postgres users | Yes (Apache 2.0) |
How to choose
If you're already on PostgreSQL and your dataset is under a few million vectors, start with pgvector or pgvecto.rs. You avoid adding infrastructure and keep your data in one place. For prototyping and local development, Chroma gets you started fastest.
For production RAG at scale, the choice depends on your operational preferences. Pinecone if you want fully managed with no infrastructure to think about. Qdrant or Weaviate if you want open-source with the option to self-host or use managed cloud. Milvus if you need GPU acceleration or have very large datasets.
The embedding model you choose matters as much as the database. Most vector databases are agnostic to the embedding model, so you can switch models without changing your database. Focus on getting the retrieval quality right first, then optimize the infrastructure.