Best Vector Database 2026: Pinecone, Weaviate, Qdrant, Milvus

DeployBase · May 5, 2025 · AI Tools

Best Vector Database: Overview

Vector databases store embeddings (dense vectors representing text, images, documents) for semantic search. Essential for RAG, recommendations, image retrieval, anomaly detection.

Pinecone: managed, serverless, simple, pricey. Weaviate, Qdrant, Milvus: self-hosted, more control, more headaches. ChromaDB: lightweight prototyping. pgvector: Postgres extension, straightforward, limited at scale.

Pick by what matters: Pinecone handles 500M-1B vectors without thinking about infrastructure. Qdrant hits sub-10ms latency. Milvus scales to the trillions. pgvector suits small teams.


Quick Comparison Table

| Database | Hosting | Practical Scale (vectors) | Starting Cost | Best For |
| --- | --- | --- | --- | --- |
| Pinecone | SaaS | 500M-1B | Free tier / usage-based | Scale, ease of use, managed |
| Weaviate | Self-host | 50M-500M | $500-$5K/mo | GraphQL, multimodal, hybrid search |
| Qdrant | Both | 100M | $100/mo (cloud) | Latency, filtering, balanced |
| Milvus | Self-host | 1B+ | $3K-$7K/mo | Scale, cost-optimized, trillion-scale |
| ChromaDB | Embedded | 10M | Free | Development, prototyping |
| pgvector | Self-host | 50M | $25-$100/mo | Postgres users, transactions |

Data from vendor benchmarks, official documentation, and DeployBase testing (March 2026).


Pinecone

Managed SaaS. Zero ops. Serverless. REST API, Python SDK. Widely used in production.

Strengths:

  1. Ease: Create → insert → query. No infra. No DevOps. Managed backups, scaling, monitoring, failover.

  2. Scale: 500M-1B vectors, multiple indexes. Scales transparently. Pay-as-you-go pricing (with minimums).

  3. Hybrid Search (Pinecone 3.0). Supports sparse-dense hybrid search (BM25 keyword search + embedding similarity). Critical for RAG accuracy (keyword precision + semantic recall).

  4. Metadata Filtering. Query with arbitrary metadata filters: {user_id: "user_42", doc_type: "contract", created_after: "2026-01-01"}. No separate filtering pipeline.

  5. Availability SLA. 99.95% uptime guarantee. Production support available. Multi-region deployment for disaster recovery.

  6. Serverless Model. No capacity planning. Spike in traffic? Pinecone scales automatically. No request throttling on Starter+ tiers.

Weaknesses

  1. Cost. Serverless pricing starts free but scales quickly. At production volume (100M+ vectors, high QPS), monthly spend reaches $500-$3,000+. Adds up for billion-scale workloads.

  2. Latency. P99 latency typically 50-200ms. Acceptable for batch search, not <10ms real-time use cases.

  3. Vendor Lock-in. Proprietary API. Exporting vectors requires dump-to-file + custom pipeline. Switching to another database is expensive in engineering time.

  4. Limited Customization. Cannot modify indexing algorithms (HNSW, IVF variants), distance metrics, or hardware allocation. Black box.

  5. Pricing Opacity. Costs scale non-linearly. Different regions have different pricing. Metadata storage costs extra. Easy to hit unexpected bills.

Pricing Detail

Pinecone uses serverless pricing based on reads, writes, and storage. As of March 2026:

| Tier | Base Cost | Storage | Notes |
| --- | --- | --- | --- |
| Free | $0/mo | 2GB (~250K vectors) | Development and prototyping |
| Serverless (pay-as-you-go) | Usage-based | Unlimited | $0.04/1M reads, $2/1M writes, $0.33/GB/mo storage |
| Enterprise | Custom | Custom | Reserved capacity, SLA guarantees |

For 100M vectors (1536-dim, ~600GB):

  • Storage: ~$200/month
  • Reads (10M/month): ~$0.40
  • Total estimate: ~$200-$400/month depending on query volume

For 1B vectors at scale:

  • Storage: ~$2,000/month
  • High-QPS deployments require enterprise reserved capacity pricing

Metadata and hybrid search (BM25) incur additional charges. List prices are negotiable for large commitments.
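The estimates above follow directly from the serverless rates. A quick sanity check, assuming float32 vectors (4 bytes per dimension) — the function names are ours for illustration, not part of Pinecone's SDK:

```python
def pinecone_storage_cost(num_vectors: int, dims: int = 1536,
                          rate_per_gb: float = 0.33) -> float:
    """Monthly storage cost in dollars, assuming float32 (4 bytes/dim)."""
    gb = num_vectors * dims * 4 / 1e9
    return gb * rate_per_gb

def pinecone_read_cost(reads: int, rate_per_million: float = 0.04) -> float:
    """Monthly read cost in dollars at the listed serverless rate."""
    return reads / 1e6 * rate_per_million

# 100M vectors at 1536 dims -> ~614 GB -> roughly $200/month storage
print(round(pinecone_storage_cost(100_000_000), 2))
# 10M reads/month at $0.04/1M -> $0.40
print(pinecone_read_cost(10_000_000))
```

The same function at 1B vectors gives roughly $2K/month in storage, matching the estimate above before query and write charges.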

Use Cases

  • SaaS products embedding RAG. Need billion-scale without ops overhead. Absorb the cost.
  • Rapid prototyping. Create index in minutes, not days. Focus on product, not infrastructure.
  • Multi-tenant systems. Pinecone namespaces isolate tenant data elegantly. Security/compliance built-in.
  • Search applications. Semantic product search, document retrieval. Hybrid search improves relevance.

Weaviate

Overview

Open-source vector database with optional managed cloud. Supports vector and structured data. GraphQL API. Horizontal scaling via Kubernetes.

Strengths

  1. Hybrid Search. Combines vector similarity with traditional structured search. GraphQL queries are expressive: find documents with embedding similarity + metadata filters + text search in one query.

  2. Multimodal Support. Native support for image, audio, text embeddings in single database. Cross-modal search (find similar images to a text query).

  3. Flexible Deployment. Self-host on Kubernetes, or use Weaviate Cloud Services (managed SaaS). Choose at any time.

  4. Custom Models. Integrate custom embedding models or classification models. Not locked to specific embedding API.

  5. Active Community. Open-source, 10K+ GitHub stars. Frequent updates, rich ecosystem. Slack community support.

  6. GraphQL API. Expressive, composable queries. Familiar to frontend teams, less so to ops-focused ones.

Weaknesses

  1. Operational Complexity. Self-hosted deployment requires Kubernetes expertise. Backup strategy, scaling logic, monitoring all fall on team. Kubernetes is not simple.

  2. Latency. Self-hosted Weaviate on single node: 100-500ms for 10M vectors. Horizontal scaling helps but adds complexity. Not optimized for <10ms latency.

  3. Memory Overhead. Stores all vectors in memory for fast search. 100M vectors @ 1536 dims = ~600GB RAM (single node). Multi-node setups expensive.

  4. GraphQL Overhead. GraphQL queries are powerful but slower than direct API calls. ~10-20% latency overhead vs REST due to parsing and execution.

Pricing

Self-Hosted: Free (pay for infrastructure only).

Single node: $500-$2K/month (cloud VM, storage, network, operator time). Kubernetes cluster (3 nodes, HA): $3K-$10K/month (nodes, persistent volumes, networking, ops labor).

Weaviate Cloud Services (managed): $250/month starting tier (10M vectors). Scales to $2K-$5K/month for 100M vectors.

Use Cases

  • Complex queries mixing vectors and metadata. "Find similar academic papers tagged 'machine learning' published after 2025."
  • Multimodal search. Text + image embeddings in one system. Cross-modal queries.
  • Teams with strong DevOps. Self-hosting is acceptable operational burden.
  • GraphQL-first applications. Teams comfortable with GraphQL (SPA frontends, Node.js backends).
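The "similar academic papers" query above maps naturally onto Weaviate's GraphQL. An illustrative sketch — the class and property names (`Paper`, `publishedYear`) are invented for the example:

```graphql
{
  Get {
    Paper(
      nearText: { concepts: ["machine learning"] }
      where: {
        path: ["publishedYear"]
        operator: GreaterThan
        valueInt: 2025
      }
      limit: 10
    ) {
      title
      _additional { distance }
    }
  }
}
```

One query combines semantic similarity (`nearText`) with a structured filter (`where`) — the hybrid pattern described above.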

Qdrant

Overview

Lightweight, fast vector database written in Rust. Explicitly optimized for latency. Self-hosted or managed cloud. Apache 2.0 open-source license.

Strengths

  1. Speed. Latency optimized. P95 <10ms on 100M vectors with proper hardware. Best-in-class latency among all databases.

  2. Filtering. Complex metadata filters (nested, ranges, arrays, full-text on metadata). Not just simple key-value matching.

  3. Resource Efficiency. Lower CPU/RAM footprint than Weaviate or Milvus. SIMD-optimized search. Rust architecture (no GC pauses).

  4. gRPC API. Binary protocol is 2-3x faster than REST/GraphQL. Low-latency over network.

  5. Distributed Mode. Newer than Milvus's but production-ready. Raft consensus for HA.

Weaknesses

  1. Smaller Ecosystem. Fewer integrations vs Pinecone/Weaviate. Community is smaller (less Stack Overflow help).

  2. Single-Node Sweet Spot. Horizontal scaling exists but newer/less mature. Best performance on single, powerful node (not distributed). For multi-node, Milvus is more mature.

  3. Disk-Backed Index. Stores index on disk (not pure in-memory). Slightly slower than RAM-only at extreme scale, but much cheaper at scale.

Pricing

Self-Hosted: Free (infrastructure cost only).

Single node: $2K-$3K/month (compute) or $20K upfront to buy server. Multi-node cluster: $5K-$15K/month.

Qdrant Cloud: $100/month starting tier. Scales to $500-$1K/month for 100M vectors.

Use Cases

  • Real-time search with <50ms SLA. Product search, recommendation engines, chatbot retrieval.
  • Cost-optimized self-hosting. Lower resource footprint = lower cloud bills than Weaviate/Milvus.
  • Conversational AI. Fast retrieval enables responsive chatbot interactions.
  • High-QPS serving. Qdrant handles 10K-50K QPS on proper hardware.

Milvus

Overview

Open-source vector database optimized for massive scale. Horizontal scaling via Kubernetes. Used internally by Alibaba for trillion-scale search. Apache 2.0 license.

Strengths

  1. Massive Scale. Handles 1B+ vectors. Designed for trillion-scale workloads. Multiple indexes (IVF, HNSW, DiskANN). Fine-tune for use case.

  2. Cost Efficiency. Open-source + commodity hardware. Cost per vector negligible at extreme scale. Self-hosting is cheapest at 1B+ vectors.

  3. High Throughput. Serves 100K+ QPS on large clusters. Built for data center scale.

  4. Index Variety. IVF (coarse + fine quantization), HNSW (graph-based), DiskANN (disk-friendly). Choose index based on latency/recall trade-off.

Weaknesses

  1. Operational Complexity. Kubernetes required. Assumes strong DevOps team. Scaling, backup, monitoring are manual.

  2. Latency Variability. Not optimized for low-latency <10ms search. P50 100ms, P99 500ms+ on distributed clusters. Eventual consistency (not strong).

  3. Learning Curve. Complex distributed system. Steep learning curve for teams new to Kubernetes.

  4. Consistency Model. Eventual consistency. Not ACID. Suitable for search, not transactional systems.

Pricing

Self-Hosted: Free (infrastructure cost).

3-node cluster: $3K-$5K/month compute, $500-$2K storage, $200-$500 network = $3.7K-$7.5K/month.

At 1B vectors, cost per vector: $0.000004 (negligible). Breaks even vs Pinecone at 50M+ vectors.
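The per-vector figure is simple arithmetic, taking a ~$4K/month midpoint of the cluster estimate above:

```python
# Midpoint of the $3.7K-$7.5K/month cluster estimate (illustrative).
monthly_cluster_cost = 4_000
vectors = 1_000_000_000

cost_per_vector = monthly_cluster_cost / vectors
print(f"${cost_per_vector:.6f} per vector per month")  # ~$0.000004
```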

Use Cases

  • Trillion-scale applications. 1B+ documents (news archives, legal databases, research corpus, scientific papers).
  • Cost-optimized data centers. Teams with existing Kubernetes infrastructure.
  • Bulk ingestion pipelines. Insert millions of vectors per day. Milvus handles high throughput.

ChromaDB

Overview

Lightweight, embedded vector database. Designed for LLM applications. No server to manage. Python API.

Strengths

  1. Simplicity. Pip install, use in Python. No infrastructure. Works on laptop.

  2. Development Speed. Perfect for prototyping RAG systems. Up and running in minutes.

  3. Free. Open-source, no licensing cost.

  4. Default Embeddings. Bundles sentence-transformers; generates embeddings on the fly. No external embedding API needed.

Weaknesses

  1. Scale Ceiling. 10M vectors max. Beyond that, performance degrades. Single process/thread bottleneck.

  2. No Network API. Embedded only. Cannot be shared across services without containerization.

  3. Single-Node Only. No horizontal scaling.

  4. Limited Filtering. Basic metadata filtering. No complex nested filters or full-text search on metadata.

Pricing

Free. Install: pip install chromadb.

Use Cases

  • LLM app prototyping. Build RAG MVP in <1 hour.
  • Small teams. <1M vectors, academic/hobbyist projects.
  • Offline applications. No network; embedded in app.

PostgreSQL pgvector

Overview

PostgreSQL extension adding vector type and similarity search. Use existing Postgres infrastructure. Open-source.

Strengths

  1. Simplicity. If app already uses Postgres, add vectors without new database. Familiar SQL interface.

  2. ACID Transactions. Strong consistency. Transactions, constraints, triggers work with vectors.

  3. Ecosystem. Use Postgres tools: point-in-time recovery, replication, monitoring (DataGrip, pgAdmin).

  4. Cost. No new vendor. Postgres hosting: $10-$100/month on AWS RDS.

Weaknesses

  1. Performance. pgvector is not optimized for large-scale search. 10M vectors: 100-500ms queries. 100M vectors: timeouts.

  2. Performance Ceiling. IVFFlat provides approximate search but slower than purpose-built databases. HNSW support added in pgvector 0.5+ and is more performant, but still lags dedicated vector DBs at scale.

  3. Horizontal Scaling. Postgres sharding is manual. No built-in distributed vector search.

  4. Memory Overhead. Vector index stored in memory; 100M vectors = 600GB RAM.

Pricing

AWS RDS Postgres (managed): $25-$100/month small instances, up to $500+/month for large. Or on-prem: hardware cost $5K-$50K upfront.

At $100/month, can handle ~50M vectors comfortably.

Use Cases

  • Small-scale semantic search. <50M vectors, latency not critical.
  • Existing Postgres users. Avoid learning new database. Add vectors to existing Postgres.
  • Transactional consistency required. ACID guarantees matter (rare for search, common for inventory systems).

Vector Distance Metrics

  • Cosine Similarity: Angle between vectors. Scale-invariant (useful for embeddings). Default for semantic search.
  • Euclidean Distance: Straight-line distance. Sensitive to magnitude. Good for coordinates.
  • Dot Product: Inner product. Fast computation, scale-dependent. Useful for normalized embeddings.
  • Hamming Distance: Bit-level comparison. For binary vectors, extremely fast.

Most vector databases default to cosine. Pinecone, Qdrant, Weaviate all support multiple metrics.
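Each metric is a few lines of plain Python (databases compute these server-side; this is just to make the definitions concrete):

```python
import math

def cosine_similarity(a, b):
    """Angle-based: scale-invariant, the default for semantic search."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    """Straight-line distance: sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    """Inner product: fastest; equals cosine for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def hamming(a, b):
    """Bit-level: count of differing positions in binary vectors."""
    return sum(x != y for x, y in zip(a, b))

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))        # 2
```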

Index Algorithms

  • IVF (Inverted File): Coarse quantization partitions the dataset into clusters; search scans only the nearest partitions. Fast approximate search.
  • HNSW (Hierarchical Navigable Small World): Graph-based. Lower latency, more memory overhead. Good for <100M vectors.
  • DiskANN: Disk-resident graph index. Slower per query, but scales to billions on commodity SSDs.
  • LSH (Locality-Sensitive Hashing): Hash-based. Fast but less accurate. Rarely used now.

Choice depends on scale, latency, and available RAM. Qdrant and Milvus let developers choose; Pinecone decides for developers.
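All of these indexes approximate the same baseline: an exact brute-force scan, which is what pgvector does without an index. A minimal sketch of that baseline, for intuition about what ANN indexes are speeding up:

```python
import math

def exact_knn(query, vectors, k=3):
    """Brute-force k-NN by Euclidean distance: O(n * dims) per query.
    ANN indexes (IVF, HNSW, DiskANN) trade a little recall for
    sublinear query time over this exact scan."""
    def dist(v):
        return math.sqrt(sum((q - x) ** 2 for q, x in zip(query, v)))
    return sorted(range(len(vectors)), key=lambda i: dist(vectors[i]))[:k]

data = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]]
print(exact_knn([0.0, 0.1], data, k=2))  # indices of the two closest vectors
```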


Performance Benchmarks

Latency (P95, single query, 100M vectors)

| Database | Latency |
| --- | --- |
| Qdrant (HNSW) | 8ms |
| Milvus (IVF optimized) | 15ms |
| Weaviate (GraphQL) | 50ms |
| Pinecone | 100ms |
| pgvector (exact) | 200ms |
| ChromaDB | N/A (max 10M vectors) |

Qdrant is fastest. Pinecone is acceptable for most use cases. pgvector is slow at this scale.

Throughput (sustained QPS)

| Database | QPS |
| --- | --- |
| Milvus | 100K |
| Qdrant | 50K |
| Weaviate | 10K |
| Pinecone | 5K |
| pgvector | <1K |

Milvus handles massive throughput. pgvector is a single-threaded bottleneck.

Ingestion Speed (vectors per second, bulk insert)

| Database | Vectors/sec |
| --- | --- |
| Milvus | 100K |
| Qdrant | 50K |
| Weaviate | 10K |
| Pinecone | 5K (throttled) |
| pgvector | 1K |

Bulk ingestion: Milvus wins. Pinecone throttles batch uploads (rate limiting).


Pricing Analysis

Cost Per Million Vectors/Month

| Database | Cost | Notes |
| --- | --- | --- |
| Pinecone Standard | $0.12-$0.24 | SaaS, managed |
| Qdrant Cloud | $0.05-$0.10 | Managed, minimum $100/mo |
| Weaviate Cloud | $0.03-$0.08 | Managed, minimum $250/mo |
| Milvus (self-hosted) | $0.001-$0.01 | Infrastructure only |
| pgvector | $0.02-$0.05 | RDS cost |
| ChromaDB | $0 | Free |

Breakeven analysis at 100M vectors:

  • Pinecone Standard: $12K-$24K/year
  • Qdrant Cloud: $6K-$12K/year
  • Weaviate Cloud: $3.6K-$9.6K/year
  • Milvus Self: $1.2K-$12K/year (ops cost 10-20% of compute)
  • pgvector: $2.4K-$6K/year

Milvus is cheapest at scale but requires DevOps. Pinecone trades higher cost for ease.


Migration and Interoperability

Exporting from One Database to Another

Vectors + metadata are portable. Migration steps:

  1. Dump vectors + metadata from source (SQL query or export API)
  2. Transform to target schema (renaming fields, reformatting)
  3. Bulk insert into target database

Timeline: 1-2 days engineering for 100M vectors. No automated tool; custom scripts.
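Step 2 is usually the only custom code. A hedged sketch of a per-record transform — the field names (`pk`, `embedding`, `payload`) are invented for illustration; real export and import formats vary per database:

```python
def transform(record: dict) -> dict:
    """Map a hypothetical source export row to a hypothetical target schema."""
    return {
        "id": str(record["pk"]),          # target wants string IDs
        "vector": record["embedding"],     # vectors copy over unchanged
        "payload": {                       # metadata nested under 'payload'
            "doc_type": record.get("type", "unknown"),
            "created": record.get("created_at"),
        },
    }

row = {"pk": 42, "embedding": [0.1, 0.2],
       "type": "contract", "created_at": "2026-01-01"}
print(transform(row))
```

In a real migration this function runs inside the bulk-insert loop of step 3, batching a few thousand records per request.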

Vector Embedding Compatibility

Embeddings are deterministic for a given model (same text + same model → same vector). Changing embedding models requires re-embedding the entire dataset.

Example: 100M vectors embedded with OpenAI text-embedding-3-small (1536 dims). Switch to open-source nomic-embed-text-v1.5 (768 dims)? Must re-embed 100M vectors (cost: $50-$100 in API calls or 24 hours on GPU).


Selection Guide

For Maximum Scale, Lowest Cost

Use Milvus. Handles 1B+ vectors. Cost per vector approaches zero at scale. Requires Kubernetes + DevOps team.

For Ease of Use + Scale

Use Pinecone. Serverless scaling, no ops. Costs 10-100x Milvus but worth it for small teams. Managed backups, monitoring, HA.

For Real-Time <10ms Latency

Use Qdrant. P95 <10ms, HNSW index optimized. Balanced cost/performance. Self-host on single powerful node or use Qdrant Cloud.

For Existing Postgres Users

Use pgvector. Reuse existing infrastructure. Max ~50M vectors before latency issues.

For Hybrid Search (Keyword + Vector)

Use Weaviate. GraphQL queries combine BM25 + embeddings. Better relevance for RAG than vector-only search. Higher operational burden.

For Development/Prototyping

Use ChromaDB. Free, embedded, simple. Migrate to production database (Pinecone/Qdrant) later.

For Multimodal (Image + Text)

Use Weaviate or Qdrant. Both support image embeddings natively. Pinecone requires workarounds.


FAQ

Can I migrate between vector databases easily?

Yes, technically. Vectors are portable (same embedding = same vector). But process is manual: dump, transform, import. 1-2 days engineering per 100M vectors.

How do I choose embedding dimensions?

Larger = higher quality, slower search, more storage.

  • 384 dims: Fast, lightweight. Suitable for speed-critical apps. Lower semantic precision.
  • 768 dims: Balanced (default for most). Recommended starting point.
  • 1536 dims: High quality, slower. Use if quality is critical.

Start with 768; benchmark if needed.
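"More storage" is easy to quantify — assuming float32 (4 bytes per dimension), before any index overhead:

```python
def storage_gb(num_vectors: int, dims: int) -> float:
    """Raw vector storage in GB, float32, excluding index overhead."""
    return num_vectors * dims * 4 / 1e9

# Per 1M vectors: 384 dims ~1.5 GB, 768 ~3.1 GB, 1536 ~6.1 GB
for dims in (384, 768, 1536):
    print(dims, round(storage_gb(1_000_000, dims), 2), "GB per 1M vectors")
```

Doubling the dimension count doubles storage (and roughly doubles per-query compute), which is why 768 is the common middle ground.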

What about consistency guarantees?

Pinecone, Qdrant, Weaviate: eventual consistency; updates visible within seconds. Milvus: tunable consistency levels. pgvector: strong (ACID). For RAG, eventual consistency is fine.

Can I use multiple vector databases?

Yes. Pinecone for scale, Qdrant for low-latency retrieval. Replicate vectors to both; route queries based on SLA. Extra ops burden.

How do I filter vectors?

All except ChromaDB support metadata filtering.

  • Qdrant: JSON filters with complex logic (field: "status", match: "active")
  • Weaviate: GraphQL where clause
  • Pinecone: namespace-based isolation + metadata in each vector object
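The filter syntax differs per database, but the semantics are the same: keep vectors whose metadata satisfies a predicate before (or during) similarity ranking. A database-agnostic sketch — the `$gt` operator and overall shape are illustrative, not any vendor's actual DSL:

```python
def matches(metadata: dict, flt: dict) -> bool:
    """Tiny filter evaluator: exact match plus {'$gt': x} range conditions.
    Illustrative only -- each database ships its own filter DSL."""
    for key, cond in flt.items():
        value = metadata.get(key)
        if isinstance(cond, dict):
            if "$gt" in cond and not (value is not None and value > cond["$gt"]):
                return False
        elif value != cond:
            return False
    return True

docs = [
    {"user_id": "user_42", "doc_type": "contract", "year": 2026},
    {"user_id": "user_7", "doc_type": "invoice", "year": 2024},
]
flt = {"user_id": "user_42", "year": {"$gt": 2025}}
print([d for d in docs if matches(d, flt)])  # only the first doc survives
```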

Which database for RAG?

Qdrant (latency + filtering) or Pinecone (simplicity). Weaviate if hybrid search (keyword + embedding) is a requirement. All three are production-ready.

Is vector database overkill for small datasets?

For <5M vectors, PostgreSQL pgvector or ChromaDB suffices. Upgrade when search latency becomes noticeable (>200ms).

What about vector compression/quantization?

Most databases support quantization:

  • 8-bit (int8): ~75% reduction vs float32, minimal quality loss
  • 4-bit: ~87% reduction, 2-5% quality loss
  • Binary: ~97% reduction, acceptable for coarse similarity ranking

Quantize if VRAM or storage is bottleneck.


