Best Vector Database 2026: Pinecone, Weaviate, Qdrant, Milvus

DeployBase · May 5, 2025 · AI Tools

Best Vector Database: Overview

Vector databases store embeddings (dense vectors representing text, images, documents) for semantic search. Essential for RAG, recommendations, image retrieval, anomaly detection.

Pinecone: managed, serverless, simple, pricey. Weaviate, Qdrant, Milvus: self-hosted, more control, more headaches. ChromaDB: lightweight prototyping. pgvector: Postgres extension, straightforward, limited at scale.

Pick by what matters: Pinecone handles 500M-1B vectors without thinking about infrastructure. Qdrant hits sub-10ms latency. Milvus scales to the trillions. pgvector suits small teams.


Quick Comparison Table

| Database | Hosting | Practical Scale (vectors) | Starting Cost | Best For |
| --- | --- | --- | --- | --- |
| Pinecone | SaaS | 500M-1B | Free tier / usage-based | Scale, ease of use, managed |
| Weaviate | Self-host | 50M-500M | $500-$5K/mo | GraphQL, multimodal, hybrid search |
| Qdrant | Both | 100M | $100/mo (cloud) | Latency, filtering, balanced |
| Milvus | Self-host | 1B+ | $3K-$7K/mo | Scale, cost-optimized, trillion-scale |
| ChromaDB | Embedded | 10M | Free | Development, prototyping |
| pgvector | Self-host | 50M | $25-$100/mo | Postgres users, transactions |

Data from vendor benchmarks, official documentation, and DeployBase testing (March 2026).


Pinecone

Managed SaaS. Zero ops. Serverless. REST API, Python SDK. Widely used in production.

Strengths:

  1. Ease: Create → insert → query. No infra. No DevOps. Managed backups, scaling, monitoring, failover.

  2. Scale: 500M-1B vectors, multiple indexes. Scales transparently. Pay-as-you-go pricing (with minimums).

  3. Hybrid Search (Pinecone 3.0). Supports sparse-dense hybrid search (BM25 keyword search + embedding similarity). Critical for RAG accuracy (keyword precision + semantic recall).

  4. Metadata Filtering. Query with arbitrary metadata filters: {user_id: "user_42", doc_type: "contract", created_after: "2026-01-01"}. No separate filtering pipeline.

  5. Availability SLA. 99.95% uptime guarantee. Production support available. Multi-region deployment for disaster recovery.

  6. Serverless Model. No capacity planning. Spike in traffic? Pinecone scales automatically. No request throttling on Starter+ tiers.

Weaknesses

  1. Cost. Serverless pricing starts free but scales quickly. At production volume (100M+ vectors, high QPS), monthly spend reaches $500-$3,000+. Adds up for billion-scale workloads.

  2. Latency. P99 latency typically 50-200ms. Acceptable for batch search, not <10ms real-time use cases.

  3. Vendor Lock-in. Proprietary API. Exporting vectors requires dump-to-file + custom pipeline. Switching to another database is expensive in engineering time.

  4. Limited Customization. Cannot modify indexing algorithms (HNSW, IVF variants), distance metrics, or hardware allocation. Black box.

  5. Pricing Opacity. Costs scale non-linearly. Different regions have different pricing. Metadata storage costs extra. Easy to hit unexpected bills.

Pricing Detail

Pinecone uses serverless pricing based on reads, writes, and storage. As of March 2026:

| Tier | Base Cost | Storage | Notes |
| --- | --- | --- | --- |
| Free | $0/mo | 2GB (~250K vectors) | Development and prototyping |
| Serverless (pay-as-you-go) | Usage-based | Unlimited | $0.04/1M reads, $2/1M writes, $0.33/GB/mo storage |
| Enterprise | Custom | Custom | Reserved capacity, SLA guarantees |

For 100M vectors (1536-dim, ~600GB):

  • Storage: ~$200/month
  • Reads (10M/month): ~$0.40
  • Total estimate: ~$200-$400/month depending on query volume

For 1B vectors at scale:

  • Storage: ~$2,000/month
  • High-QPS deployments require enterprise reserved capacity pricing

Metadata and hybrid search (BM25) incur additional charges. List prices are negotiable for large commitments.
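The estimates above follow directly from the serverless rates. A quick sanity check, assuming float32 vectors (4 bytes per dimension) — the function names are ours for illustration, not part of Pinecone's SDK:

```python
def pinecone_storage_cost(num_vectors: int, dims: int = 1536,
                          rate_per_gb: float = 0.33) -> float:
    """Monthly storage cost in dollars, assuming float32 (4 bytes/dim)."""
    gb = num_vectors * dims * 4 / 1e9
    return gb * rate_per_gb

def pinecone_read_cost(reads: int, rate_per_million: float = 0.04) -> float:
    """Monthly read cost in dollars at the listed serverless rate."""
    return reads / 1e6 * rate_per_million

# 100M vectors at 1536 dims -> ~614 GB -> roughly $200/month storage
print(round(pinecone_storage_cost(100_000_000), 2))
# 10M reads/month at $0.04/1M -> $0.40
print(pinecone_read_cost(10_000_000))
```

The same function at 1B vectors gives roughly $2K/month in storage, matching the estimate above before query and write charges.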

Use Cases

  • SaaS products embedding RAG. Need billion-scale without ops overhead. Absorb the cost.
  • Rapid prototyping. Create index in minutes, not days. Focus on product, not infrastructure.
  • Multi-tenant systems. Pinecone namespaces isolate tenant data elegantly. Security/compliance built-in.
  • Search applications. Semantic product search, document retrieval. Hybrid search improves relevance.

Weaviate

Overview

Open-source vector database with optional managed cloud. Supports vector and structured data. GraphQL API. Horizontal scaling via Kubernetes.

Strengths

  1. Hybrid Search. Combines vector similarity with traditional structured search. GraphQL queries are expressive: find documents with embedding similarity + metadata filters + text search in one query.

  2. Multimodal Support. Native support for image, audio, text embeddings in single database. Cross-modal search (find similar images to a text query).

  3. Flexible Deployment. Self-host on Kubernetes, or use Weaviate Cloud Services (managed SaaS). Choose at any time.

  4. Custom Models. Integrate custom embedding models or classification models. Not locked to specific embedding API.

  5. Active Community. Open-source, 10K+ GitHub stars. Frequent updates, rich ecosystem. Slack community support.

  6. GraphQL API. Expressive, composable queries. Familiar to frontend teams, less so to ops-focused ones.

Weaknesses

  1. Operational Complexity. Self-hosted deployment requires Kubernetes expertise. Backup strategy, scaling logic, monitoring all fall on team. Kubernetes is not simple.

  2. Latency. Self-hosted Weaviate on single node: 100-500ms for 10M vectors. Horizontal scaling helps but adds complexity. Not optimized for <10ms latency.

  3. Memory Overhead. Stores all vectors in memory for fast search. 100M vectors @ 1536 dims = ~600GB RAM (single node). Multi-node setups expensive.

  4. GraphQL Overhead. GraphQL queries are powerful but slower than direct API calls. ~10-20% latency overhead vs REST due to parsing and execution.

Pricing

Self-Hosted: Free (pay for infrastructure only).

Single node: $500-$2K/month (cloud VM, storage, network, operator time). Kubernetes cluster (3 nodes, HA): $3K-$10K/month (nodes, persistent volumes, networking, ops labor).

Weaviate Cloud Services (managed): $250/month starting tier (10M vectors). Scales to $2K-$5K/month for 100M vectors.

Use Cases

  • Complex queries mixing vectors and metadata. "Find similar academic papers tagged 'machine learning' published after 2025."
  • Multimodal search. Text + image embeddings in one system. Cross-modal queries.
  • Teams with strong DevOps. Self-hosting is acceptable operational burden.
  • GraphQL-first applications. Teams comfortable with GraphQL (SPA frontends, Node.js backends).
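The "similar academic papers" query above maps naturally onto Weaviate's GraphQL. An illustrative sketch — the class and property names (`Paper`, `publishedYear`) are invented for the example:

```graphql
{
  Get {
    Paper(
      nearText: { concepts: ["machine learning"] }
      where: {
        path: ["publishedYear"]
        operator: GreaterThan
        valueInt: 2025
      }
      limit: 10
    ) {
      title
      _additional { distance }
    }
  }
}
```

One query combines semantic similarity (`nearText`) with a structured filter (`where`) — the hybrid pattern described above.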

Qdrant

Overview

Lightweight, fast vector database written in Rust. Explicitly optimized for latency. Self-hosted or managed cloud. Apache 2.0 open-source license.

Strengths

  1. Speed. Latency optimized. P95 <10ms on 100M vectors with proper hardware. Best-in-class latency among all databases.

  2. Filtering. Complex metadata filters (nested, ranges, arrays, full-text on metadata). Not just simple key-value matching.

  3. Resource Efficiency. Lower CPU/RAM footprint than Weaviate or Milvus. SIMD-optimized search. Rust architecture (no GC pauses).

  4. gRPC API. Binary protocol is 2-3x faster than REST/GraphQL. Low-latency over network.

  5. Distributed Mode. Newer than Milvus's but production-ready. Raft consensus for HA.

Weaknesses

  1. Smaller Ecosystem. Fewer integrations vs Pinecone/Weaviate. Community is smaller (less Stack Overflow help).

  2. Single-Node Sweet Spot. Horizontal scaling exists but newer/less mature. Best performance on single, powerful node (not distributed). For multi-node, Milvus is more mature.

  3. Disk-Backed Index. Stores index on disk (not pure in-memory). Slightly slower than RAM-only at extreme scale, but much cheaper at scale.

Pricing

Self-Hosted: Free (infrastructure cost only).

Single node: $2K-$3K/month (compute) or $20K upfront to buy server. Multi-node cluster: $5K-$15K/month.

Qdrant Cloud: $100/month starting tier. Scales to $500-$1K/month for 100M vectors.

Use Cases

  • Real-time search with <50ms SLA. Product search, recommendation engines, chatbot retrieval.
  • Cost-optimized self-hosting. Lower resource footprint = lower cloud bills than Weaviate/Milvus.
  • Conversational AI. Fast retrieval enables responsive chatbot interactions.
  • High-QPS serving. Qdrant handles 10K-50K QPS on proper hardware.

Milvus

Overview

Open-source vector database optimized for massive scale. Horizontal scaling via Kubernetes. Used internally by Alibaba for trillion-scale search. Apache 2.0 license.

Strengths

  1. Massive Scale. Handles 1B+ vectors. Designed for trillion-scale workloads. Multiple indexes (IVF, HNSW, DiskANN). Fine-tune for use case.

  2. Cost Efficiency. Open-source + commodity hardware. Cost per vector negligible at extreme scale. Self-hosting is cheapest at 1B+ vectors.

  3. High Throughput. Serves 100K+ QPS on large clusters. Built for data center scale.

  4. Index Variety. IVF (coarse + fine quantization), HNSW (graph-based), DiskANN (disk-friendly). Choose index based on latency/recall trade-off.

Weaknesses

  1. Operational Complexity. Kubernetes required. Assumes strong DevOps team. Scaling, backup, monitoring are manual.

  2. Latency Variability. Not optimized for low-latency <10ms search. P50 100ms, P99 500ms+ on distributed clusters. Eventual consistency (not strong).

  3. Learning Curve. Complex distributed system. Steep learning curve for teams new to Kubernetes.

  4. Consistency Model. Eventual consistency. Not ACID. Suitable for search, not transactional systems.

Pricing

Self-Hosted: Free (infrastructure cost).

3-node cluster: $3K-$5K/month compute, $500-$2K storage, $200-$500 network = $3.7K-$7.5K/month.

At 1B vectors, cost per vector: $0.000004 (negligible). Breaks even vs Pinecone at 50M+ vectors.
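The per-vector figure is simple arithmetic, taking a ~$4K/month midpoint of the cluster estimate above:

```python
# Midpoint of the $3.7K-$7.5K/month cluster estimate (illustrative).
monthly_cluster_cost = 4_000
vectors = 1_000_000_000

cost_per_vector = monthly_cluster_cost / vectors
print(f"${cost_per_vector:.6f} per vector per month")  # ~$0.000004
```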

Use Cases

  • Trillion-scale applications. 1B+ documents (news archives, legal databases, research corpus, scientific papers).
  • Cost-optimized data centers. Teams with existing Kubernetes infrastructure.
  • Bulk ingestion pipelines. Insert millions of vectors per day. Milvus handles high throughput.

ChromaDB

Overview

Lightweight, embedded vector database. Designed for LLM applications. No server to manage. Python API.

Strengths

  1. Simplicity. Pip install, use in Python. No infrastructure. Works on laptop.

  2. Development Speed. Perfect for prototyping RAG systems. Up and running in minutes.

  3. Free. Open-source, no licensing cost.

  4. Default Embeddings. Bundles sentence-transformers; generates embeddings on the fly. No external embedding API needed.

Weaknesses

  1. Scale Ceiling. 10M vectors max. Beyond that, performance degrades. Single process/thread bottleneck.

  2. No Network API. Embedded only. Cannot be shared across services without containerization.

  3. Single-Node Only. No horizontal scaling.

  4. Limited Filtering. Basic metadata filtering. No complex nested filters or full-text search on metadata.

Pricing

Free. Install: pip install chromadb.

Use Cases

  • LLM app prototyping. Build RAG MVP in <1 hour.
  • Small teams. <1M vectors, academic/hobbyist projects.
  • Offline applications. No network; embedded in app.

PostgreSQL pgvector

Overview

PostgreSQL extension adding vector type and similarity search. Use existing Postgres infrastructure. Open-source.

Strengths

  1. Simplicity. If app already uses Postgres, add vectors without new database. Familiar SQL interface.

  2. ACID Transactions. Strong consistency. Transactions, constraints, triggers work with vectors.

  3. Ecosystem. Use Postgres tools: point-in-time recovery, replication, monitoring (DataGrip, pgAdmin).

  4. Cost. No new vendor. Postgres hosting: $10-$100/month on AWS RDS.

Weaknesses

  1. Performance. pgvector is not optimized for large-scale search. 10M vectors: 100-500ms queries. 100M vectors: timeouts.

  2. Performance Ceiling. IVFFlat provides approximate search but slower than purpose-built databases. HNSW support added in pgvector 0.5+ and is more performant, but still lags dedicated vector DBs at scale.

  3. Horizontal Scaling. Postgres sharding is manual. No built-in distributed vector search.

  4. Memory Overhead. Vector index stored in memory; 100M vectors = 600GB RAM.

Pricing

AWS RDS Postgres (managed): $25-$100/month small instances, up to $500+/month for large. Or on-prem: hardware cost $5K-$50K upfront.

At $100/month, can handle ~50M vectors comfortably.

Use Cases

  • Small-scale semantic search. <50M vectors, latency not critical.
  • Existing Postgres users. Avoid learning new database. Add vectors to existing Postgres.
  • Transactional consistency required. ACID guarantees matter (rare for search, common for inventory systems).

Vector Distance Metrics

  • Cosine Similarity: Angle between vectors. Scale-invariant (useful for embeddings). Default for semantic search.
  • Euclidean Distance: Straight-line distance. Sensitive to magnitude. Good for coordinates.
  • Dot Product: Inner product. Fast computation, scale-dependent. Useful for normalized embeddings.
  • Hamming Distance: Bit-level comparison. For binary vectors, extremely fast.

Most vector databases default to cosine. Pinecone, Qdrant, Weaviate all support multiple metrics.
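Each metric is a few lines of plain Python (databases compute these server-side; this is just to make the definitions concrete):

```python
import math

def cosine_similarity(a, b):
    """Angle-based: scale-invariant, the default for semantic search."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    """Straight-line distance: sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    """Inner product: fastest; equals cosine for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def hamming(a, b):
    """Bit-level: count of differing positions in binary vectors."""
    return sum(x != y for x, y in zip(a, b))

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))        # 2
```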

Index Algorithms

  • IVF (Inverted File): Coarse quantization partitions the dataset into clusters; search scans only the nearest partitions. Fast approximate search.
  • HNSW (Hierarchical Navigable Small World): Graph-based. Lower latency, more memory overhead. Good for <100M vectors.
  • DiskANN: Disk-resident graph index. Slower per query, but scales to billions on commodity SSDs.
  • LSH (Locality-Sensitive Hashing): Hash-based. Fast but less accurate. Rarely used now.

Choice depends on scale, latency, and available RAM. Qdrant and Milvus let developers choose; Pinecone decides for developers.
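All of these indexes approximate the same baseline: an exact brute-force scan, which is what pgvector does without an index. A minimal sketch of that baseline, for intuition about what ANN indexes are speeding up:

```python
import math

def exact_knn(query, vectors, k=3):
    """Brute-force k-NN by Euclidean distance: O(n * dims) per query.
    ANN indexes (IVF, HNSW, DiskANN) trade a little recall for
    sublinear query time over this exact scan."""
    def dist(v):
        return math.sqrt(sum((q - x) ** 2 for q, x in zip(query, v)))
    return sorted(range(len(vectors)), key=lambda i: dist(vectors[i]))[:k]

data = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]]
print(exact_knn([0.0, 0.1], data, k=2))  # indices of the two closest vectors
```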


Performance Benchmarks

Latency (P95, single query, 100M vectors)

| Database | Latency |
| --- | --- |
| Qdrant (HNSW) | 8ms |
| Milvus (IVF optimized) | 15ms |
| Weaviate (GraphQL) | 50ms |
| Pinecone | 100ms |
| pgvector (exact) | 200ms |
| ChromaDB | N/A (max 10M vectors) |

Qdrant is fastest. Pinecone is acceptable for most use cases. pgvector is slow at this scale.

Throughput (sustained QPS)

| Database | QPS |
| --- | --- |
| Milvus | 100K |
| Qdrant | 50K |
| Weaviate | 10K |
| Pinecone | 5K |
| pgvector | <1K |

Milvus handles massive throughput. pgvector is a single-threaded bottleneck.

Ingestion Speed (vectors per second, bulk insert)

| Database | Vectors/sec |
| --- | --- |
| Milvus | 100K |
| Qdrant | 50K |
| Weaviate | 10K |
| Pinecone | 5K (throttled) |
| pgvector | 1K |

Bulk ingestion: Milvus wins. Pinecone throttles batch uploads (rate limiting).


Pricing Analysis

Cost Per Million Vectors/Month

| Database | Cost | Notes |
| --- | --- | --- |
| Pinecone Standard | $0.12-$0.24 | SaaS, managed |
| Qdrant Cloud | $0.05-$0.10 | Managed, minimum $100/mo |
| Weaviate Cloud | $0.03-$0.08 | Managed, minimum $250/mo |
| Milvus (self-hosted) | $0.001-$0.01 | Infrastructure only |
| pgvector | $0.02-$0.05 | RDS cost |
| ChromaDB | $0 | Free |

Breakeven analysis at 100M vectors:

  • Pinecone Standard: $12K-$24K/year
  • Qdrant Cloud: $6K-$12K/year
  • Weaviate Cloud: $3.6K-$9.6K/year
  • Milvus Self: $1.2K-$12K/year (ops cost 10-20% of compute)
  • pgvector: $2.4K-$6K/year

Milvus is cheapest at scale but requires DevOps. Pinecone trades higher cost for ease.


Migration and Interoperability

Exporting from One Database to Another

Vectors + metadata are portable. Migration steps:

  1. Dump vectors + metadata from source (SQL query or export API)
  2. Transform to target schema (renaming fields, reformatting)
  3. Bulk insert into target database

Timeline: 1-2 days engineering for 100M vectors. No automated tool; custom scripts.
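Step 2 is usually the only custom code. A hedged sketch of a per-record transform — the field names (`pk`, `embedding`, `payload`) are invented for illustration; real export and import formats vary per database:

```python
def transform(record: dict) -> dict:
    """Map a hypothetical source export row to a hypothetical target schema."""
    return {
        "id": str(record["pk"]),          # target wants string IDs
        "vector": record["embedding"],     # vectors copy over unchanged
        "payload": {                       # metadata nested under 'payload'
            "doc_type": record.get("type", "unknown"),
            "created": record.get("created_at"),
        },
    }

row = {"pk": 42, "embedding": [0.1, 0.2],
       "type": "contract", "created_at": "2026-01-01"}
print(transform(row))
```

In a real migration this function runs inside the bulk-insert loop of step 3, batching a few thousand records per request.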

Vector Embedding Compatibility

Embeddings are deterministic for a given model (same text + same model → same vector). Changing embedding models requires re-embedding the entire dataset.

Example: 100M vectors embedded with OpenAI text-embedding-3-small (1536 dims). Switch to open-source nomic-embed-text-v1.5 (768 dims)? Must re-embed 100M vectors (cost: $50-$100 in API calls or 24 hours on GPU).


Selection Guide

For Maximum Scale, Lowest Cost

Use Milvus. Handles 1B+ vectors. Cost per vector approaches zero at scale. Requires Kubernetes + DevOps team.

For Ease of Use + Scale

Use Pinecone. Serverless scaling, no ops. Costs 10-100x Milvus but worth it for small teams. Managed backups, monitoring, HA.

For Real-Time <10ms Latency

Use Qdrant. P95 <10ms, HNSW index optimized. Balanced cost/performance. Self-host on single powerful node or use Qdrant Cloud.

For Existing Postgres Users

Use pgvector. Reuse existing infrastructure. Max ~50M vectors before latency issues.

For Hybrid Search (Keyword + Vector)

Use Weaviate. GraphQL queries combine BM25 + embeddings. Better relevance for RAG than vector-only search. Higher operational burden.

For Development/Prototyping

Use ChromaDB. Free, embedded, simple. Migrate to production database (Pinecone/Qdrant) later.

For Multimodal (Image + Text)

Use Weaviate or Qdrant. Both support image embeddings natively. Pinecone requires workarounds.


FAQ

Can I migrate between vector databases easily?

Yes, technically. Vectors are portable (same embedding = same vector). But process is manual: dump, transform, import. 1-2 days engineering per 100M vectors.

How do I choose embedding dimensions?

Larger = higher quality, slower search, more storage.

  • 384 dims: Fast, lightweight. Suitable for speed-critical apps. Lower semantic precision.
  • 768 dims: Balanced (default for most). Recommended starting point.
  • 1536 dims: High quality, slower. Use if quality is critical.

Start with 768; benchmark if needed.
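"More storage" is easy to quantify — assuming float32 (4 bytes per dimension), before any index overhead:

```python
def storage_gb(num_vectors: int, dims: int) -> float:
    """Raw vector storage in GB, float32, excluding index overhead."""
    return num_vectors * dims * 4 / 1e9

# Per 1M vectors: 384 dims ~1.5 GB, 768 ~3.1 GB, 1536 ~6.1 GB
for dims in (384, 768, 1536):
    print(dims, round(storage_gb(1_000_000, dims), 2), "GB per 1M vectors")
```

Doubling the dimension count doubles storage (and roughly doubles per-query compute), which is why 768 is the common middle ground.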

What about consistency guarantees?

Pinecone, Qdrant, Weaviate: eventual consistency; updates visible within seconds. Milvus: tunable consistency levels. pgvector: strong (ACID). For RAG, eventual consistency is fine.

Can I use multiple vector databases?

Yes. Pinecone for scale, Qdrant for low-latency retrieval. Replicate vectors to both; route queries based on SLA. Extra ops burden.

How do I filter vectors?

All except ChromaDB support metadata filtering.

  • Qdrant: JSON filters with complex logic (field: "status", match: "active")
  • Weaviate: GraphQL where clause
  • Pinecone: namespace-based isolation + metadata in each vector object
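The filter syntax differs per database, but the semantics are the same: keep vectors whose metadata satisfies a predicate before (or during) similarity ranking. A database-agnostic sketch — the `$gt` operator and overall shape are illustrative, not any vendor's actual DSL:

```python
def matches(metadata: dict, flt: dict) -> bool:
    """Tiny filter evaluator: exact match plus {'$gt': x} range conditions.
    Illustrative only -- each database ships its own filter DSL."""
    for key, cond in flt.items():
        value = metadata.get(key)
        if isinstance(cond, dict):
            if "$gt" in cond and not (value is not None and value > cond["$gt"]):
                return False
        elif value != cond:
            return False
    return True

docs = [
    {"user_id": "user_42", "doc_type": "contract", "year": 2026},
    {"user_id": "user_7", "doc_type": "invoice", "year": 2024},
]
flt = {"user_id": "user_42", "year": {"$gt": 2025}}
print([d for d in docs if matches(d, flt)])  # only the first doc survives
```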

Which database for RAG?

Qdrant (latency + filtering) or Pinecone (simplicity). Weaviate if hybrid search (keyword + embedding) is a requirement. All three are production-ready.

Is vector database overkill for small datasets?

For <5M vectors, PostgreSQL pgvector or ChromaDB suffices. Upgrade when search latency becomes noticeable (>200ms).

What about vector compression/quantization?

Most databases support quantization:

  • 8-bit (int8): ~75% reduction vs float32, minimal quality loss
  • 4-bit: ~87% reduction, 2-5% quality loss
  • Binary: ~97% reduction, acceptable for coarse similarity ranking

Quantize if VRAM or storage is bottleneck.


