Contents
- Best Vector Database: Overview
- Quick Comparison Table
- Pinecone
- Weaviate
- Qdrant
- Milvus
- ChromaDB
- PostgreSQL pgvector
- Distance Metrics and Search
- Performance Benchmarks
- Pricing Analysis
- Migration and Interoperability
- Selection Guide
- FAQ
- Related Resources
- Sources
Best Vector Database: Overview
Vector databases store embeddings (dense vectors representing text, images, documents) for semantic search. Essential for RAG, recommendations, image retrieval, anomaly detection.
Pinecone: Managed, serverless, simple, pricey. Weaviate, Qdrant, Milvus: Self-host, more control, more headaches. ChromaDB: Lightweight prototyping. pgvector: Postgres extension, straightforward, not at scale.
Pick by what matters: Pinecone handles 500M-1B vectors without thinking. Qdrant hits sub-10ms latency. Milvus scales to trillion-vector territory. pgvector works for small teams.
Quick Comparison Table
| Database | Hosting | Practical Scale (vectors) | Starting Cost | Best For |
|---|---|---|---|---|
| Pinecone | SaaS | 500M-1B | Free tier / usage-based | Scale, ease of use, managed |
| Weaviate | Self-host | 50M-500M | $500-$5K/mo | GraphQL, multimodal, hybrid search |
| Qdrant | Both | 100M | $100/mo (cloud) | Latency, filtering, balanced |
| Milvus | Self-host | 1B+ | $3K-$7K/mo | Scale, cost-optimized, trillion-scale |
| ChromaDB | Embedded | 10M | Free | Development, prototyping |
| pgvector | Self-host | 50M | $25-$100/mo | Postgres users, transactions |
Data from vendor benchmarks, official documentation, and DeployBase testing (March 2026).
Pinecone
Overview
Managed SaaS. Zero ops. Serverless. REST API, Python SDK. Widely used in production.
Strengths
- Ease. Create → insert → query. No infra. No DevOps. Managed backups, scaling, monitoring, failover.
- Scale. 500M-1B vectors, multiple indexes. Scales transparently. Pay-as-you-go pricing (with minimums).
- Hybrid Search (Pinecone 3.0). Supports sparse-dense hybrid search (BM25 keyword search + embedding similarity). Critical for RAG accuracy (keyword precision + semantic recall).
- Metadata Filtering. Query with arbitrary metadata filters, e.g. {user_id: "user_42", doc_type: "contract", created_after: "2026-01-01"}. No separate filtering pipeline.
- Availability SLA. 99.95% uptime guarantee. Production support available. Multi-region deployment for disaster recovery.
- Serverless Model. No capacity planning. Spike in traffic? Pinecone scales automatically. No request throttling on Starter+ tiers.
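Conceptually, a metadata filter like the one above is just a predicate evaluated against each candidate vector's metadata. A minimal sketch in plain Python, using MongoDB-style operators similar to (but not guaranteed identical to) Pinecone's actual filter language:

```python
# Illustrative sketch of metadata filter evaluation. Supports a small
# subset of Mongo-style operators ($eq, $gte, $in) plus shorthand equality.

def matches(metadata: dict, flt: dict) -> bool:
    """Return True if `metadata` satisfies every condition in `flt`."""
    for field, cond in flt.items():
        value = metadata.get(field)
        if isinstance(cond, dict):          # operator form, e.g. {"$gte": ...}
            for op, operand in cond.items():
                if op == "$eq" and value != operand:
                    return False
                if op == "$gte" and not (value is not None and value >= operand):
                    return False
                if op == "$in" and value not in operand:
                    return False
        elif value != cond:                 # shorthand equality
            return False
    return True

doc = {"user_id": "user_42", "doc_type": "contract", "created": "2026-02-10"}
flt = {"user_id": "user_42", "created": {"$gte": "2026-01-01"}}
print(matches(doc, flt))  # True: same user, created after the cutoff
```

The point of doing this inside the database (rather than post-filtering results) is that the index can prune candidates before scoring them.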
Weaknesses
- Cost. Serverless pricing starts free but scales quickly. At production volume (100M+ vectors, high QPS), monthly spend reaches $500-$3,000+. Adds up for billion-scale workloads.
- Latency. P99 latency typically 50-200ms. Acceptable for batch search, not <10ms real-time use cases.
- Vendor Lock-in. Proprietary API. Exporting vectors requires dump-to-file + custom pipeline. Switching to another database is expensive in engineering time.
- Limited Customization. Cannot modify indexing algorithms (HNSW, IVF variants), distance metrics, or hardware allocation. Black box.
- Pricing Opacity. Costs scale non-linearly. Different regions have different pricing. Metadata storage costs extra. Easy to hit unexpected bills.
Pricing Detail
Pinecone uses serverless pricing based on reads, writes, and storage. As of March 2026:
| Tier | Base Cost | Storage | Notes |
|---|---|---|---|
| Free | $0/mo | 2GB (~250K vectors) | Development and prototyping |
| Serverless (Pay-as-you-go) | Usage-based | Unlimited | $0.04/1M reads, $2/1M writes, $0.33/GB/mo storage |
| Enterprise | Custom | Custom | Reserved capacity, SLA guarantees |
For 100M vectors (1536-dim, ~600GB):
- Storage: ~$200/month
- Reads (10M/month): ~$0.40
- Total estimate: ~$200-$400/month depending on query volume
For 1B vectors at scale:
- Storage: ~$2,000/month
- High-QPS deployments require enterprise reserved capacity pricing
Metadata and hybrid search (BM25) incur additional charges. List prices are negotiable for large commitments.
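Sanity-checking the 100M-vector estimate above: raw float32 storage is dims × 4 bytes per vector. A quick back-of-envelope (list prices from the table above; real bills add index overhead and metadata):

```python
# Back-of-envelope storage cost for the 100M-vector example above.
n_vectors = 100_000_000
dims = 1536
bytes_per_float = 4  # float32

raw_bytes = n_vectors * dims * bytes_per_float
raw_gb = raw_bytes / 1e9
print(f"{raw_gb:.0f} GB")           # ~614 GB, matching the ~600GB figure

monthly_storage = raw_gb * 0.33      # $0.33/GB/mo from the table above
print(f"${monthly_storage:.0f}/mo")  # ~$203/month
```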
Use Cases
- SaaS products embedding RAG. Need billion-scale without ops overhead. Absorb the cost.
- Rapid prototyping. Create index in minutes, not days. Focus on product, not infrastructure.
- Multi-tenant systems. Pinecone namespaces isolate tenant data elegantly. Security/compliance built-in.
- Search applications. Semantic product search, document retrieval. Hybrid search improves relevance.
Weaviate
Overview
Open-source vector database with optional managed cloud. Supports vector and structured data. GraphQL API. Horizontal scaling via Kubernetes.
Strengths
- Hybrid Search. Combines vector similarity with traditional structured search. GraphQL queries are expressive: find documents with embedding similarity + metadata filters + text search in one query.
- Multimodal Support. Native support for image, audio, text embeddings in a single database. Cross-modal search (find similar images to a text query).
- Flexible Deployment. Self-host on Kubernetes, or use Weaviate Cloud Services (managed SaaS). Choose at any time.
- Custom Models. Integrate custom embedding models or classification models. Not locked to a specific embedding API.
- Active Community. Open-source, 10K+ GitHub stars. Frequent updates, rich ecosystem. Slack community support.
- GraphQL API. Expressive and familiar to frontend teams, though it adds cognitive overhead vs REST for DevOps-oriented teams.
Weaknesses
- Operational Complexity. Self-hosted deployment requires Kubernetes expertise. Backup strategy, scaling logic, monitoring all fall on the team. Kubernetes is not simple.
- Latency. Self-hosted Weaviate on a single node: 100-500ms for 10M vectors. Horizontal scaling helps but adds complexity. Not optimized for <10ms latency.
- Memory Overhead. Stores all vectors in memory for fast search. 100M vectors @ 1536 dims = ~600GB RAM (single node). Multi-node setups are expensive.
- GraphQL Overhead. GraphQL queries are powerful but slower than direct API calls. ~10-20% latency overhead vs REST due to parsing and execution.
Pricing
Self-Hosted: Free (pay for infrastructure only).
Single node: $500-$2K/month (cloud VM, storage, network, operator time). Kubernetes cluster (3 nodes, HA): $3K-$10K/month (nodes, persistent volumes, networking, ops labor).
Weaviate Cloud Services (managed): $250/month starting tier (10M vectors). Scales to $2K-$5K/month for 100M vectors.
Use Cases
- Complex queries mixing vectors and metadata. "Find similar academic papers tagged 'machine learning' published after 2025."
- Multimodal search. Text + image embeddings in one system. Cross-modal queries.
- Teams with strong DevOps. Self-hosting is acceptable operational burden.
- GraphQL-first applications. Teams comfortable with GraphQL (SPA frontends, Node.js backends).
Qdrant
Overview
Lightweight, fast vector database written in Rust. Explicitly optimized for latency. Self-hosted or managed cloud. Apache 2.0 open-source license.
Strengths
- Speed. Latency optimized. P95 <10ms on 100M vectors with proper hardware. Best-in-class latency among the databases here.
- Filtering. Complex metadata filters (nested, ranges, arrays, full-text on metadata). Not just simple key-value matching.
- Resource Efficiency. Lower CPU/RAM footprint than Weaviate or Milvus. SIMD-optimized search. Rust architecture (no GC pauses).
- gRPC API. Binary protocol is 2-3x faster than REST/GraphQL. Low latency over the network.
- Distributed Mode. Newer than Milvus's but production-ready. Raft consensus for HA.
Weaknesses
- Smaller Ecosystem. Fewer integrations vs Pinecone/Weaviate. Community is smaller (less Stack Overflow help).
- Single-Node Sweet Spot. Horizontal scaling exists but is newer and less mature. Best performance on a single, powerful node (not distributed). For multi-node, Milvus is more mature.
- Disk-Backed Index. Stores index on disk (not pure in-memory). Slightly slower than RAM-only at extreme scale, but much cheaper at scale.
Pricing
Self-Hosted: Free (infrastructure cost only).
Single node: $2K-$3K/month (compute) or $20K upfront to buy server. Multi-node cluster: $5K-$15K/month.
Qdrant Cloud: $100/month starting tier. Scales to $500-$1K/month for 100M vectors.
Use Cases
- Real-time search with <50ms SLA. Product search, recommendation engines, chatbot retrieval.
- Cost-optimized self-hosting. Lower resource footprint = lower cloud bills than Weaviate/Milvus.
- Conversational AI. Fast retrieval enables responsive chatbot interactions.
- High-QPS serving. Qdrant handles 10K-50K QPS on proper hardware.
Milvus
Overview
Open-source vector database optimized for massive scale. Horizontal scaling via Kubernetes. Used internally by Alibaba for trillion-scale search. Apache 2.0 license.
Strengths
- Massive Scale. Handles 1B+ vectors. Designed for trillion-scale workloads. Multiple indexes (IVF, HNSW, DiskANN). Fine-tune for your use case.
- Cost Efficiency. Open-source + commodity hardware. Cost per vector is negligible at extreme scale. Self-hosting is cheapest at 1B+ vectors.
- High Throughput. Serves 100K+ QPS on large clusters. Built for data center scale.
- Index Variety. IVF (coarse + fine quantization), HNSW (graph-based), DiskANN (disk-friendly). Choose an index based on the latency/recall trade-off.
Weaknesses
- Operational Complexity. Kubernetes required. Assumes a strong DevOps team. Scaling, backup, monitoring are manual.
- Latency Variability. Not optimized for low-latency <10ms search. P50 100ms, P99 500ms+ on distributed clusters.
- Learning Curve. Complex distributed system. Steep learning curve for teams new to Kubernetes.
- Consistency Model. Eventual consistency, not ACID. Suitable for search, not transactional systems.
Pricing
Self-Hosted: Free (infrastructure cost).
3-node cluster: $3K-$5K/month compute, $500-$2K storage, $200-$500 network = $3.7K-$7.5K/month.
At 1B vectors, cost per vector: $0.000004 (negligible). Breaks even vs Pinecone at 50M+ vectors.
Use Cases
- Trillion-scale applications. 1B+ documents (news archives, legal databases, research corpus, scientific papers).
- Cost-optimized data centers. Teams with existing Kubernetes infrastructure.
- Bulk ingestion pipelines. Insert millions of vectors per day. Milvus handles high throughput.
ChromaDB
Overview
Lightweight, embedded vector database. Designed for LLM applications. No server to manage. Python API.
Strengths
- Simplicity. Pip install, use in Python. No infrastructure. Works on a laptop.
- Development Speed. Perfect for prototyping RAG systems. Up and running in minutes.
- Free. Open-source, no licensing cost.
- Default Embeddings. Bundles sentence-transformers; generates embeddings on the fly. No external embedding API needed.
Weaknesses
- Scale Ceiling. 10M vectors max. Beyond that, performance degrades. Single process/thread bottleneck.
- No Network API. Embedded only. Cannot be shared across services without containerization.
- Single-Node Only. No horizontal scaling.
- Limited Filtering. Basic metadata filtering. No complex nested filters or full-text search on metadata.
Pricing
Free. Install: pip install chromadb.
Use Cases
- LLM app prototyping. Build RAG MVP in <1 hour.
- Small teams. <1M vectors, academic/hobbyist projects.
- Offline applications. No network; embedded in app.
PostgreSQL pgvector
Overview
PostgreSQL extension adding vector type and similarity search. Use existing Postgres infrastructure. Open-source.
Strengths
- Simplicity. If the app already uses Postgres, add vectors without a new database. Familiar SQL interface.
- ACID Transactions. Strong consistency. Transactions, constraints, triggers work with vectors.
- Ecosystem. Use Postgres tools: point-in-time recovery, replication, monitoring (DataGrip, pgAdmin).
- Cost. No new vendor. Postgres hosting: $10-$100/month on AWS RDS.
Weaknesses
- Performance. pgvector is not optimized for large-scale search. 10M vectors: 100-500ms queries. 100M vectors: timeouts.
- Index Maturity. IVFFlat provides approximate search but is slower than purpose-built databases. HNSW support (pgvector 0.5+) is more performant, but still lags dedicated vector DBs at scale.
- Horizontal Scaling. Postgres sharding is manual. No built-in distributed vector search.
- Memory Overhead. Vector index stored in memory; 100M vectors = ~600GB RAM.
Pricing
AWS RDS Postgres (managed): $25-$100/month small instances, up to $500+/month for large. Or on-prem: hardware cost $5K-$50K upfront.
At $100/month, can handle ~50M vectors comfortably.
Use Cases
- Small-scale semantic search. <50M vectors, latency not critical.
- Existing Postgres users. Avoid learning new database. Add vectors to existing Postgres.
- Transactional consistency required. ACID guarantees matter (rare for search, common for inventory systems).
Distance Metrics and Search
Vector Distance Metrics
- Cosine Similarity: Angle between vectors. Scale-invariant (useful for embeddings). Default for semantic search.
- Euclidean Distance: Straight-line distance. Sensitive to magnitude. Good for coordinates.
- Dot Product: Inner product. Fast computation, scale-dependent. Useful for normalized embeddings.
- Hamming Distance: Bit-level comparison. For binary vectors, extremely fast.
Most vector databases default to cosine. Pinecone, Qdrant, Weaviate all support multiple metrics.
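The metrics above in plain Python, no dependencies (a toy sketch; production engines use SIMD-optimized implementations):

```python
# Dot product, cosine similarity, Euclidean distance, and Hamming
# distance on plain Python values.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")  # differing bits between binary vectors

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(cosine_similarity(a, b))  # ≈1.0 — same direction, magnitude ignored
print(euclidean(a, b))          # ≈3.74 — magnitude difference still counted
print(hamming(0b1010, 0b1001))  # 2
```

The first two prints show why cosine is the default for embeddings: `b` is just `a` scaled by 2, so cosine calls them identical while Euclidean does not.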
Index Algorithms
- IVF (Inverted File): Coarse + fine quantization. Fast approximate search. The vector space is divided into partitions.
- HNSW (Hierarchical Navigable Small World): Graph-based. Lower latency, more memory overhead. Good for <100M vectors.
- DiskANN: Disk-friendly; large indexes live on disk. Slower but scales to billions.
- LSH (Locality-Sensitive Hashing): Hash-based. Fast but less accurate. Rarely used now.
Choice depends on scale, latency, and available RAM. Qdrant and Milvus let developers choose; Pinecone decides for you.
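The IVF idea in miniature: assign vectors to coarse partitions at index time, then probe only the partition nearest the query. A toy sketch with fixed centroids (real IVF learns centroids via k-means and probes several partitions, `nprobe`):

```python
# Toy IVF: partition vectors under fixed centroids, search one partition.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

centroids = [(0.0, 0.0), (10.0, 10.0)]                    # 2 coarse partitions
vectors = [(0.1, 0.2), (0.3, 0.1), (9.8, 10.1), (10.2, 9.7)]

# Index build: assign each vector to its nearest centroid.
partitions = {i: [] for i in range(len(centroids))}
for v in vectors:
    nearest = min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))
    partitions[nearest].append(v)

# Query: probe only the closest partition instead of scanning everything.
query = (10.0, 10.0)
probe = min(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
best = min(partitions[probe], key=lambda v: sq_dist(query, v))
print(best)  # (9.8, 10.1) — found while scanning half the dataset
```

The approximation risk is visible too: a true nearest neighbor sitting just across a partition boundary would be missed, which is why real systems probe multiple partitions.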
Performance Benchmarks
Latency (P95, single query, 100M vectors)
| Database | Latency |
|---|---|
| Qdrant (HNSW) | 8ms |
| Milvus (IVF optimized) | 15ms |
| Weaviate (GraphQL) | 50ms |
| Pinecone | 100ms |
| pgvector (exact) | 200ms |
| ChromaDB | N/A (max 10M vectors) |
Qdrant is fastest. Pinecone is acceptable for most use cases. pgvector is slow at large scale.
Throughput (queries per second, 100M vectors, batch search)
| Database | QPS |
|---|---|
| Milvus | 100K |
| Qdrant | 50K |
| Weaviate | 10K |
| Pinecone | 5K |
| pgvector | <1K |
Milvus handles massive throughput. pgvector hits a single-process bottleneck.
Ingestion Speed (vectors per second, bulk insert)
| Database | Vectors/sec |
|---|---|
| Milvus | 100K |
| Qdrant | 50K |
| Weaviate | 10K |
| Pinecone | 5K (throttled) |
| pgvector | 1K |
Bulk ingestion: Milvus wins. Pinecone throttles batch uploads (rate limiting).
Pricing Analysis
Cost Per Million Vectors/Month
| Database | Cost | Notes |
|---|---|---|
| Pinecone Standard | $0.12-$0.24 | SaaS, managed |
| Qdrant Cloud | $0.05-$0.10 | Managed, minimum $100/mo |
| Weaviate Cloud | $0.03-$0.08 | Managed, minimum $250/mo |
| Milvus Self | $0.001-$0.01 | Infrastructure only |
| pgvector | $0.02-$0.05 | RDS cost |
| ChromaDB | $0 | Free |
Breakeven analysis at 100M vectors:
- Pinecone Standard: $12K-$24K/year
- Qdrant Cloud: $6K-$12K/year
- Weaviate Cloud: $3.6K-$9.6K/year
- Milvus Self: $1.2K-$12K/year (ops cost 10-20% of compute)
- pgvector: $2.4K-$6K/year
Milvus is cheapest at scale but requires DevOps. Pinecone is the middle ground (cost + ease).
Migration and Interoperability
Exporting from One Database to Another
Vectors + metadata are portable. Migration steps:
- Dump vectors + metadata from source (SQL query or export API)
- Transform to target schema (renaming fields, reformatting)
- Bulk insert into target database
Timeline: 1-2 days engineering for 100M vectors. No automated tool; custom scripts.
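Step 2 (transform) is usually the only code written by hand. A minimal sketch; every field name here is hypothetical, since real schemas vary per database:

```python
# Sketch of the transform step in a migration: rename fields and reshape
# a dumped source record into the target database's insert payload.

def transform(record: dict) -> dict:
    """Map a source-database export record to a target insert payload."""
    return {
        "id": str(record["vector_id"]),       # targets often require string IDs
        "vector": record["embedding"],
        "payload": {                          # metadata grouped under one key
            "doc_type": record.get("type", "unknown"),
            "created": record.get("created_at"),
        },
    }

dumped = {"vector_id": 42, "embedding": [0.1, 0.2],
          "type": "contract", "created_at": "2026-01-01"}
print(transform(dumped)["id"])  # "42"
```

In a real migration this function runs inside a batched loop: read N records from the export, transform, bulk-insert, checkpoint, repeat.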
Vector Embedding Compatibility
Embeddings are deterministic for a given model (same text + same model → same vector). Changing the embedding model requires re-embedding the entire dataset.
Example: 100M vectors embedded with OpenAI text-embedding-3-small (1536 dims). Switch to open-source nomic-embed-text-v1.5 (768 dims)? Must re-embed 100M vectors (cost: $50-$100 in API calls or 24 hours on GPU).
Selection Guide
For Maximum Scale, Lowest Cost
Use Milvus. Handles 1B+ vectors. Cost per vector approaches zero at scale. Requires Kubernetes + DevOps team.
For Ease of Use + Scale
Use Pinecone. Serverless scaling, no ops. Costs 10-100x Milvus but worth it for small teams. Managed backups, monitoring, HA.
For Real-Time <10ms Latency
Use Qdrant. P95 <10ms, HNSW index optimized. Balanced cost/performance. Self-host on single powerful node or use Qdrant Cloud.
For Existing Postgres Users
Use pgvector. Reuse existing infrastructure. Max ~50M vectors before latency issues.
For Hybrid Search (Keyword + Vector)
Use Weaviate. GraphQL queries combine BM25 + embeddings. Better relevance for RAG than vector-only search. Higher operational burden.
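If you roll your own fusion instead, reciprocal rank fusion (RRF) is a common recipe: each document scores 1/(k + rank) per ranking, summed across rankings. A generic sketch (engines expose their own fusion settings; k=60 is a conventional default):

```python
# Reciprocal rank fusion: merge a BM25 ranking with a vector ranking.

def rrf(rankings, k=60):
    """Fuse ordered result lists; higher fused score = better."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_c", "doc_b"]      # keyword ranking
vector_hits = ["doc_b", "doc_a", "doc_d"]    # embedding ranking
print(rrf([bm25_hits, vector_hits]))  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Documents ranked well by both signals (doc_a, doc_b) float to the top, which is exactly the "keyword precision + semantic recall" effect hybrid search is after.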
For Development/Prototyping
Use ChromaDB. Free, embedded, simple. Migrate to production database (Pinecone/Qdrant) later.
For Multimodal (Image + Text)
Use Weaviate or Qdrant. Both support image embeddings natively. Pinecone requires workarounds.
FAQ
Can I migrate between vector databases easily?
Yes, technically. Vectors are portable (same embedding = same vector). But process is manual: dump, transform, import. 1-2 days engineering per 100M vectors.
How do I choose embedding dimensions?
Larger = higher quality, slower search, more storage.
- 384 dims: Fast, lightweight. Suitable for speed-critical apps. Lower semantic precision.
- 768 dims: Balanced (default for most). Recommended starting point.
- 1536 dims: High quality, slower. Use if quality is critical.
Start with 768; benchmark if needed.
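The storage side of that trade-off is simple arithmetic (float32, before index overhead):

```python
# Per-vector and per-million-vector storage for common dimension choices.
for dims in (384, 768, 1536):
    bytes_per_vector = dims * 4               # float32 = 4 bytes/dim
    gb_per_million = bytes_per_vector * 1_000_000 / 1e9
    print(f"{dims} dims: {bytes_per_vector} B/vector, "
          f"{gb_per_million:.2f} GB per 1M vectors")
```

Doubling dimensions doubles storage (and RAM for in-memory indexes), so the quality gain has to earn its keep.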
What about consistency guarantees?
Pinecone, Qdrant, Weaviate: eventual consistency; updates visible within seconds. Milvus: tunable consistency levels. pgvector: strong (ACID). For RAG, eventual consistency is fine.
Can I use multiple vector databases?
Yes. Pinecone for scale, Qdrant for low-latency retrieval. Replicate vectors to both; route queries based on SLA. Extra ops burden.
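The routing itself can be as simple as a latency-budget switch. A deliberately trivial sketch; the backend names are placeholders for your own client objects:

```python
# SLA-based routing across two vector stores (names are hypothetical).

def pick_backend(latency_budget_ms: float) -> str:
    # Tight budgets go to the low-latency store; everything else to the
    # large managed store.
    return "qdrant" if latency_budget_ms < 50 else "pinecone"

print(pick_backend(10))   # "qdrant" — real-time chatbot retrieval
print(pick_backend(500))  # "pinecone" — batch/offline search
```

The hard part is not the routing but keeping both stores in sync on writes, which is the "extra ops burden" above.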
How do I filter vectors?
All support metadata filtering, with varying power (ChromaDB's is the most basic).
- Qdrant: JSON filters with complex logic (e.g. field: "status", match: "active")
- Weaviate: GraphQL where clause
- Pinecone: namespace-based isolation + metadata in each vector object
Which database for RAG?
Qdrant (latency + filtering) or Pinecone (simplicity). Weaviate if hybrid search (keyword + embedding) is requirement. Both are production-ready.
Is vector database overkill for small datasets?
For <5M vectors, PostgreSQL pgvector or ChromaDB suffices. Upgrade when search latency becomes noticeable (>200ms).
What about vector compression/quantization?
Most databases support quantization:
- 8-bit: 50% reduction, minimal quality loss
- 4-bit: 75% reduction, 2-5% quality loss
- Binary: 99% reduction, acceptable for similarity ranking
Quantize if VRAM or storage is bottleneck.
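Why 8-bit loses so little: with symmetric quantization, the worst-case per-dimension error is half the scale factor. A toy sketch (real engines quantize per segment and often use asymmetric or product quantization):

```python
# Symmetric 8-bit quantization: one scale factor maps floats to int8.

def quantize(vec):
    scale = max(abs(x) for x in vec) / 127 or 1.0
    q = [max(-127, min(127, round(x / scale))) for x in vec]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

v = [0.12, -0.5, 0.33, 0.9]
q, s = quantize(v)
restored = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(v, restored))
print(q)                 # small ints in [-127, 127]
print(f"{max_err:.4f}")  # worst-case error is bounded by scale/2
```

Embedding similarity rankings are robust to errors this small, which is why 8-bit is usually a free 50%+ storage win.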