In Lesson 5 you built a RAG pipeline using ChromaDB. That was the right choice for getting started fast. But production has different requirements: durability, scalability, high availability, monitoring, backups. The vector database you choose determines your operational burden for the next 2-3 years. This lesson gives you the information to make that decision well.
What Vector Databases Actually Do
A vector database stores high-dimensional vectors (embeddings) and retrieves the most similar ones to a query vector. That is it. Everything else — metadata filtering, persistence, replication, CRUD operations — is the packaging around that core operation.
The core operation is Approximate Nearest Neighbor (ANN) search, not exact K-Nearest Neighbor (KNN). Exact KNN compares the query vector against every stored vector. That is O(n) per query — fine for 10K vectors, unusable for 10M. ANN algorithms trade a small amount of accuracy for dramatically faster search by building index structures that narrow the search space.
ANN vs KNN
import numpy as np
import time
def exact_knn(query: np.ndarray, vectors: np.ndarray, k: int = 5) -> list[int]:
"""Exact KNN — compare against every vector. O(n)."""
distances = np.linalg.norm(vectors - query, axis=1)
return np.argsort(distances)[:k].tolist()
# With 1M vectors of dimension 1536:
# Exact KNN: ~500ms per query
# HNSW (ANN): ~2ms per query, 99%+ recall
The “approximate” part scares some people. In practice, modern ANN algorithms achieve 95-99%+ recall — meaning they find 95-99% of the true nearest neighbors. For RAG, this is more than enough. You are retrieving 5-20 chunks out of thousands or millions. Missing one marginally relevant chunk has negligible impact on answer quality.
Index Types: HNSW, IVF, Flat
Every vector database uses one or more of these index types. Understanding them helps you tune performance.
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layer graph. The top layer has few nodes spread far apart (for coarse navigation). Lower layers have more nodes packed closer together (for fine-grained search). Query traversal starts at the top and drills down.
Strengths: Fast queries (~1-5ms at 1M vectors), high recall (99%+), no training required. Weaknesses: High memory usage (stores the graph in RAM), slower inserts than IVF, memory grows linearly with data.
Layer 3: [A] -------------- [F] (few nodes, big jumps)
Layer 2: [A] --- [C] --- [F] --- [H] (more nodes, smaller jumps)
Layer 1: [A]-[B]-[C]-[D]-[E]-[F]-[G]-[H] (all nodes, fine-grained)
When to use: Default choice for most workloads. Best balance of speed and accuracy.
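The knobs you tune on any HNSW implementation are the same: M (links per node), ef_construction (build-time search width), and ef (query-time search width). Below is a minimal sketch using the hnswlib library — not otherwise used in this lesson, with illustrative sizes — just to show where each parameter lives:
import numpy as np
import hnswlib

dim, n = 1536, 10_000
vectors = np.random.rand(n, dim).astype(np.float32)

# Build the graph: M = links per node, ef_construction = build-time search width
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, M=16, ef_construction=100)
index.add_items(vectors, np.arange(n))

# ef = query-time search width: raise it for higher recall, lower it for speed
index.set_ef(50)
labels, distances = index.knn_query(vectors[:1], k=5)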
IVF (Inverted File Index)
IVF clusters vectors into buckets using k-means. At query time, it finds the nearest clusters and only searches vectors within those clusters.
Strengths: Lower memory than HNSW, faster inserts, tunable speed/accuracy tradeoff. Weaknesses: Requires training step (k-means on your data), lower recall than HNSW at the same speed.
| Parameter | Effect |
|---|---|
| nlist (number of clusters) | More clusters = faster queries but lower recall |
| nprobe (clusters to search) | More probes = higher recall but slower queries |
When to use: Very large datasets (10M+ vectors) where HNSW memory is a constraint.
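The two parameters in the table map directly onto code. A minimal FAISS sketch — FAISS is not otherwise used in this lesson and the sizes are illustrative; it is shown only because it exposes nlist and nprobe directly:
import numpy as np
import faiss

dim, nlist = 1536, 100
vectors = np.random.rand(10_000, dim).astype(np.float32)

# Training step: k-means assigns vectors to nlist clusters
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist)
index.train(vectors)
index.add(vectors)

# nprobe = clusters scanned per query: higher recall, slower queries
index.nprobe = 10
distances, ids = index.search(vectors[:1], 5)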
Flat (Brute Force)
No index at all. Compares every vector. Exact results, guaranteed.
Strengths: Perfect recall, simple, no index build time. Weaknesses: O(n) query time. Unusable above ~100K vectors.
When to use: Small datasets, or as a baseline to measure ANN recall.
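If you keep a flat index around as a baseline, measuring recall takes a few lines. A sketch that reuses the exact_knn function from earlier as ground truth; ann_search here is a placeholder for whatever ANN query function you are evaluating:
def recall_at_k(ann_search, vectors, queries, k: int = 5) -> float:
    """Fraction of the true top-k neighbors that the ANN index also returns."""
    hits, total = 0, 0
    for query in queries:
        truth = set(exact_knn(query, vectors, k))   # brute-force ground truth
        approx = set(ann_search(query, k))          # IDs from the ANN index under test
        hits += len(truth & approx)
        total += k
    return hits / total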
Index Comparison Summary
| Index | Query Speed (1M vectors) | Memory | Recall | Build Time |
|---|---|---|---|---|
| Flat | ~500ms | Low | 100% | None |
| IVF | ~5-20ms | Medium | 90-98% | Minutes |
| HNSW | ~1-5ms | High | 95-99%+ | Minutes-Hours |
ChromaDB Deep Dive
ChromaDB is an in-process vector database designed for simplicity. It runs inside your Python process with zero configuration.
Setup and Basic Operations
pip install chromadb
import chromadb
# In-memory (for testing)
client = chromadb.Client()
# Persistent (for production)
client = chromadb.PersistentClient(path="./chroma_data")
# Create a collection
collection = client.get_or_create_collection(
name="documents",
metadata={
"hnsw:space": "cosine", # Distance metric: cosine, l2, or ip
"hnsw:M": 16, # HNSW connections per node (default 16)
"hnsw:construction_ef": 100, # Build-time search width (default 100)
"hnsw:search_ef": 50, # Query-time search width (default 10)
},
)
CRUD Operations
# CREATE — add documents
collection.add(
ids=["doc1", "doc2", "doc3"],
documents=[
"How to reset your password",
"Billing FAQ and payment methods",
"API rate limits and quotas",
],
metadatas=[
{"category": "account", "priority": "high"},
{"category": "billing", "priority": "medium"},
{"category": "technical", "priority": "low"},
],
)
# READ — get by ID
results = collection.get(ids=["doc1", "doc2"])
print(results["documents"]) # ["How to reset your password", "Billing FAQ..."]
# UPDATE — upsert (update if exists, insert if not)
collection.upsert(
ids=["doc1"],
documents=["Updated: How to reset your password — new 2FA flow"],
metadatas=[{"category": "account", "priority": "high", "updated": True}],
)
# DELETE
collection.delete(ids=["doc3"])
# DELETE with filter
collection.delete(where={"category": "billing"})
Similarity Search with Metadata Filtering
# Basic similarity search
results = collection.query(
query_texts=["I forgot my login credentials"],
n_results=5,
)
# Search with metadata filter
results = collection.query(
query_texts=["payment issue"],
n_results=5,
where={"category": "billing"}, # Only search billing documents
)
# Complex filters
results = collection.query(
query_texts=["urgent problem"],
n_results=10,
where={
"$and": [
{"category": {"$in": ["account", "billing"]}},
{"priority": {"$eq": "high"}},
]
},
)
# Access results
for i in range(len(results["documents"][0])):
print(f"Score: {1 - results['distances'][0][i]:.3f}")
print(f" Doc: {results['documents'][0][i][:100]}")
print(f" Meta: {results['metadatas'][0][i]}")Custom Embedding Functions
from chromadb.utils import embedding_functions
# OpenAI embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="sk-...",
model_name="text-embedding-3-small",
)
# Sentence Transformers (local, free)
st_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
model_name="all-MiniLM-L6-v2",
)
# Use with collection
collection = client.get_or_create_collection(
name="documents",
embedding_function=openai_ef, # Applied automatically on add/query
)
ChromaDB Limitations
- Single-process. No built-in clustering or replication. One process, one machine.
- Memory-bound. HNSW index lives in RAM. At 1M vectors with 1536 dimensions, that is roughly 6GB RAM just for the index (see the back-of-envelope estimate after this list).
- No authentication. Anyone who can connect to the Chroma server can read/write everything.
- Limited filtering. Metadata filters work but are not as expressive as SQL WHERE clauses.
- No backup tooling. You back up the data directory yourself.
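A back-of-envelope check on the memory point above, assuming float32 vectors (graph overhead varies by implementation, so treat the result as an estimate):
num_vectors, dim, bytes_per_float = 1_000_000, 1536, 4
raw_gb = num_vectors * dim * bytes_per_float / 1e9   # ~6.1 GB for the vectors alone
# HNSW also stores neighbor links (roughly M per node per layer), which
# typically adds another 10-20% on top of the raw vectors.
print(f"Raw vectors: {raw_gb:.1f} GB")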
Verdict: Excellent for prototyping, local development, and small production workloads (under 500K vectors). Move to pgvector or Pinecone when you need durability, auth, or scale.
pgvector Deep Dive
pgvector is a PostgreSQL extension that adds vector similarity search to your existing Postgres database. If you already run Postgres, this is the path of least resistance.
Installation
-- Enable the extension (requires pgvector installed on the server)
CREATE EXTENSION IF NOT EXISTS vector;
For Docker:
docker run -d \
--name pgvector \
-e POSTGRES_PASSWORD=password \
-p 5432:5432 \
pgvector/pgvector:pg16
Schema Design
-- Create table with vector column
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding vector(1536), -- Match your embedding model's dimensions
metadata JSONB DEFAULT '{}',
source VARCHAR(500),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Metadata index for filtering
CREATE INDEX idx_documents_metadata ON documents USING GIN (metadata);
CREATE INDEX idx_documents_source ON documents (source);
Index Creation: IVFFlat vs HNSW
pgvector supports two index types. Choose based on your workload.
-- HNSW index (recommended for most workloads)
-- Slower to build, faster queries, higher recall
CREATE INDEX idx_documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- IVFFlat index (for very large datasets)
-- Faster to build, requires training data, lower recall
CREATE INDEX idx_documents_embedding_ivf
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100); -- Rule of thumb: sqrt(num_rows) for lists
| Index | Build Time (1M vectors) | Query Latency | Recall | Memory |
|---|---|---|---|---|
| None (flat) | N/A | ~800ms | 100% | Low |
| IVFFlat | ~5 min | ~10ms | 90-95% | Medium |
| HNSW | ~30 min | ~3ms | 98-99% | High |
Python Integration
import psycopg2
import json
from openai import OpenAI
oai_client = OpenAI()
def get_connection():
return psycopg2.connect(
host="localhost",
port=5432,
dbname="ragdb",
user="postgres",
password="password",
)
def insert_document(content: str, metadata: dict, source: str):
"""Embed and store a document."""
# Generate embedding
response = oai_client.embeddings.create(
model="text-embedding-3-small",
input=content,
)
embedding = response.data[0].embedding
conn = get_connection()
cur = conn.cursor()
cur.execute(
"""
INSERT INTO documents (content, embedding, metadata, source)
VALUES (%s, %s::vector, %s, %s)
RETURNING id
""",
(content, str(embedding), json.dumps(metadata), source),
)
doc_id = cur.fetchone()[0]
conn.commit()
cur.close()
conn.close()
return doc_id
def search_documents(
query: str,
n_results: int = 5,
source_filter: str = None,
metadata_filter: dict = None,
) -> list[dict]:
"""Search for similar documents using cosine similarity."""
# Embed the query
response = oai_client.embeddings.create(
model="text-embedding-3-small",
input=query,
)
query_embedding = response.data[0].embedding
conn = get_connection()
cur = conn.cursor()
# Build query with optional filters
sql = """
SELECT id, content, metadata, source,
1 - (embedding <=> %s::vector) as similarity
FROM documents
WHERE 1=1
"""
params = [str(query_embedding)]
if source_filter:
sql += " AND source = %s"
params.append(source_filter)
if metadata_filter:
for key, value in metadata_filter.items():
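# NOTE: the metadata key is interpolated into the SQL string below (only the value
# is parameterized), so pass only trusted, hard-coded keys in metadata_filter.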
sql += f" AND metadata->>'{key}' = %s"
params.append(str(value))
sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
params.extend([str(query_embedding), n_results])
cur.execute(sql, params)
results = []
for row in cur.fetchall():
results.append({
"id": row[0],
"content": row[1],
"metadata": row[2],
"source": row[3],
"similarity": float(row[4]),
})
cur.close()
conn.close()
return results
# Usage
doc_id = insert_document(
content="Our refund policy allows full refunds within 30 days of purchase.",
metadata={"category": "policy", "version": "2.1"},
source="policies/refund.md",
)
results = search_documents("Can I get my money back?", n_results=3)
for r in results:
print(f"[{r['similarity']:.3f}] {r['content'][:100]}")Query Performance Tuning
-- Increase HNSW search width for higher recall (slower queries)
SET hnsw.ef_search = 100; -- Default is 40
-- Increase IVF probes for higher recall (slower queries)
SET ivfflat.probes = 10; -- Default is 1
-- Check index usage
EXPLAIN ANALYZE
SELECT id, content, 1 - (embedding <=> '[0.1, 0.2, ...]'::vector) as similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
pgvector Strengths and Limitations
Strengths:
- No new infrastructure if you already run Postgres
- Full SQL capabilities — joins, transactions, complex queries
- Battle-tested Postgres ecosystem: backups, replication, monitoring
- Metadata filtering is just SQL WHERE clauses
- ACID transactions — consistent reads and writes
Limitations:
- Performance ceiling around 5-10M vectors per table (depends on hardware)
- HNSW index lives in shared memory — competes with other Postgres workloads
- Horizontal scaling requires Postgres sharding (complex)
- No built-in multi-tenancy or namespace isolation
Verdict: The right choice for most production teams. If you run Postgres, use pgvector. You avoid introducing new infrastructure, and Postgres operational patterns (backups, monitoring, replication) apply directly.
Pinecone Deep Dive
Pinecone is a fully managed vector database. You do not run servers, manage indexes, or worry about scaling. You get an API endpoint and it handles the rest.
Setup
pip install pinecone
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your-api-key")
# Create an index
pc.create_index(
name="knowledge-base",
dimension=1536, # Must match your embedding model
metric="cosine", # cosine, euclidean, or dotproduct
spec=ServerlessSpec(
cloud="aws",
region="us-east-1",
),
)
# Connect to the index
index = pc.Index("knowledge-base")
CRUD Operations
from openai import OpenAI
oai_client = OpenAI()
def embed(text: str) -> list[float]:
response = oai_client.embeddings.create(
model="text-embedding-3-small", input=text
)
return response.data[0].embedding
# UPSERT — insert or update vectors
index.upsert(
vectors=[
{
"id": "doc1",
"values": embed("How to reset your password"),
"metadata": {
"content": "How to reset your password",
"category": "account",
"source": "faq.md",
},
},
{
"id": "doc2",
"values": embed("Billing FAQ and payment methods"),
"metadata": {
"content": "Billing FAQ and payment methods",
"category": "billing",
"source": "faq.md",
},
},
],
namespace="production", # Namespaces isolate data within an index
)
# QUERY — similarity search
results = index.query(
vector=embed("forgot my login"),
top_k=5,
namespace="production",
include_metadata=True,
filter={
"category": {"$in": ["account", "technical"]},
},
)
for match in results.matches:
print(f"[{match.score:.3f}] {match.metadata['content'][:100]}")
# FETCH — get by ID
fetched = index.fetch(ids=["doc1"], namespace="production")
# DELETE
index.delete(ids=["doc2"], namespace="production")
# DELETE by filter
index.delete(
filter={"category": "billing"},
namespace="production",
)
# Index statistics
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Namespaces: {stats.namespaces}")Batch Upsert with Error Handling
def batch_upsert(
documents: list[dict],
namespace: str = "production",
batch_size: int = 100,
):
"""Upsert documents in batches with error handling."""
import time
total = len(documents)
for i in range(0, total, batch_size):
batch = documents[i:i + batch_size]
# Embed batch
texts = [doc["content"] for doc in batch]
response = oai_client.embeddings.create(
model="text-embedding-3-small",
input=texts,
)
# Build vectors
vectors = []
for j, doc in enumerate(batch):
vectors.append({
"id": doc["id"],
"values": response.data[j].embedding,
"metadata": {
"content": doc["content"][:1000], # Pinecone metadata limit: 40KB
**doc.get("metadata", {}),
},
})
# Upsert with retry
for attempt in range(3):
try:
index.upsert(vectors=vectors, namespace=namespace)
break
except Exception as e:
if attempt == 2:
raise
time.sleep(2 ** attempt)
print(f"Upserted {min(i + batch_size, total)}/{total}")Pinecone Pricing Model
Pinecone charges based on:
- Storage: per GB of stored vectors
- Read units: per query
- Write units: per upsert
Serverless pricing (as of 2026):
- Reads: ~$8 per million read units
- Writes: ~$2 per million write units
- Storage: ~$0.33 per GB/month
For 1M vectors at 1536 dimensions: roughly $2-5/month storage plus usage-based query costs. Affordable for most teams, but costs scale linearly with data and traffic.
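To see where the storage figure comes from, here is a small sketch using the rates quoted above; the monthly query volume and read-units-per-query are placeholders to swap for your own traffic, not Pinecone-published numbers:
num_vectors, dim, bytes_per_float = 1_000_000, 1536, 4
storage_gb = num_vectors * dim * bytes_per_float / 1e9   # ~6.1 GB of raw vectors
storage_cost = storage_gb * 0.33                          # ~$2/month at $0.33 per GB/month

monthly_queries = 500_000          # placeholder traffic
read_units_per_query = 1           # assumption -- actual consumption depends on the index
read_cost = monthly_queries * read_units_per_query / 1_000_000 * 8.0

print(f"Storage: ${storage_cost:.2f}/mo, reads: ${read_cost:.2f}/mo")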
Pinecone Strengths and Limitations
Strengths:
- Zero operations — no servers, no tuning, no backups
- Scales to billions of vectors
- Fast queries (~10-50ms) with consistent latency
- Namespaces for multi-tenancy
- Good metadata filtering
Limitations:
- Vendor lock-in — proprietary API, no self-hosted option
- Metadata size limit: 40KB per vector
- No full-text search (vectors only)
- Limited query expressiveness compared to SQL
- Cold start on serverless (first query after idle can be slow)
Verdict: Best choice for teams that want zero operational overhead and can accept vendor lock-in. Excellent at scale. Consider pgvector if you want to avoid lock-in.
Qdrant Deep Dive
Qdrant is a high-performance vector database built in Rust. It can be self-hosted or used as a managed cloud service. It offers the best self-hosted performance characteristics.
Self-Hosted Setup
# Docker
docker run -d \
--name qdrant \
-p 6333:6333 \
-p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage:z \
qdrant/qdrant
Python Client
pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance, VectorParams, PointStruct,
Filter, FieldCondition, MatchValue,
)
client = QdrantClient(host="localhost", port=6333)
# Create collection
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(
size=1536,
distance=Distance.COSINE,
),
)
# Insert vectors
from openai import OpenAI
oai = OpenAI()
def embed(text: str) -> list[float]:
response = oai.embeddings.create(model="text-embedding-3-small", input=text)
return response.data[0].embedding
client.upsert(
collection_name="documents",
points=[
PointStruct(
id=1,
vector=embed("Password reset instructions"),
payload={
"content": "Password reset instructions",
"category": "account",
"source": "faq.md",
},
),
PointStruct(
id=2,
vector=embed("Billing and payment FAQ"),
payload={
"content": "Billing and payment FAQ",
"category": "billing",
"source": "faq.md",
},
),
],
)
# Search with filter
results = client.query_points(
collection_name="documents",
query=embed("I can't log in"),
limit=5,
query_filter=Filter(
must=[
FieldCondition(
key="category",
match=MatchValue(value="account"),
),
]
),
)
for point in results.points:
print(f"[{point.score:.3f}] {point.payload['content']}")Qdrant Strengths and Limitations
Strengths:
- Built in Rust — excellent query performance and memory efficiency
- Rich filtering (nested conditions, geo, range, full-text)
- Payload (metadata) indexing for fast filtered queries
- Quantization support for reduced memory usage
- Self-hosted or managed cloud options
Limitations:
- Smaller ecosystem than Postgres or Pinecone
- Self-hosted means you manage infrastructure
- Less mature tooling for backups and monitoring
- Newer project — less battle-tested in production
Verdict: Best self-hosted option for teams that need high performance and are comfortable managing infrastructure. The Rust foundation gives it an edge on raw performance.
Head-to-Head Comparison
| Factor | ChromaDB | pgvector | Pinecone | Qdrant |
|---|---|---|---|---|
| Setup | pip install | Postgres extension | API key | Docker/binary |
| Operations | Zero | Postgres ops | Zero (managed) | Self-managed |
| Max scale | ~500K vectors | ~5-10M per table | Billions | ~100M self-hosted |
| Query latency (1M) | ~5ms | ~3-10ms | ~10-50ms | ~2-5ms |
| Insert speed | Fast | Medium | Medium | Fast |
| Metadata filtering | Basic | Full SQL | Good | Rich |
| Hybrid search | No | With pg_trgm | No | Yes |
| Authentication | None | Postgres auth | API key | API key |
| Backups | Manual | pg_dump | Automatic | Manual |
| Cost | Free | Postgres hosting | Pay-per-use | Free (self-hosted) |
| Vendor lock-in | Low | None | High | Low |
| Best for | Prototyping | Postgres shops | Zero-ops at scale | Self-hosted perf |
Performance Benchmarks
These benchmarks use OpenAI text-embedding-3-small (1536 dimensions), cosine similarity, and a single-machine setup. Results are from a 4-core, 16GB RAM machine.
Insert Speed (vectors per second)
| Database | 100K vectors | 1M vectors | 10M vectors |
|---|---|---|---|
| ChromaDB | 5,000/s | 3,500/s | N/A (OOM) |
| pgvector (no index) | 8,000/s | 6,000/s | 4,000/s |
| pgvector (HNSW) | 2,000/s | 1,200/s | 800/s |
| Pinecone (serverless) | 3,000/s | 2,500/s | 2,000/s |
| Qdrant | 10,000/s | 7,000/s | 5,000/s |
Note: pgvector with HNSW index is slow on inserts because the index updates on every write. Build the index after bulk loading for better performance.
Query Latency (p50, top-5 results)
| Database | 100K vectors | 1M vectors | 10M vectors |
|---|---|---|---|
| ChromaDB | 2ms | 5ms | N/A |
| pgvector (HNSW) | 2ms | 4ms | 12ms |
| pgvector (IVFFlat) | 5ms | 10ms | 25ms |
| Pinecone | 15ms | 20ms | 30ms |
| Qdrant | 1ms | 3ms | 8ms |
Pinecone latency includes the network round-trip to the managed service; the local databases have no network hop. In production, expect an additional 10-50ms on the Pinecone numbers depending on how far your application is from the index's region.
Query Latency with Metadata Filter (1M vectors)
| Database | No filter | 1 filter | 3 filters |
|---|---|---|---|
| ChromaDB | 5ms | 8ms | 15ms |
| pgvector | 4ms | 6ms | 10ms |
| Pinecone | 20ms | 25ms | 30ms |
| Qdrant | 3ms | 4ms | 6ms |
Qdrant and pgvector handle metadata filtering most efficiently. ChromaDB degrades more noticeably with complex filters.
Migration Patterns
Eventually you will outgrow ChromaDB and need to migrate. Here are tested patterns.
ChromaDB to pgvector
import chromadb
import psycopg2
import json
def migrate_chroma_to_pgvector(
chroma_path: str,
collection_name: str,
pg_connection_string: str,
):
"""Migrate all vectors from ChromaDB to pgvector."""
# Connect to ChromaDB
chroma = chromadb.PersistentClient(path=chroma_path)
collection = chroma.get_collection(collection_name)
# Get all data
total = collection.count()
print(f"Migrating {total} vectors from ChromaDB to pgvector...")
batch_size = 1000
conn = psycopg2.connect(pg_connection_string)
cur = conn.cursor()
for offset in range(0, total, batch_size):
# Fetch batch from ChromaDB
results = collection.get(
limit=batch_size,
offset=offset,
include=["documents", "metadatas", "embeddings"],
)
# Insert into pgvector
for i in range(len(results["ids"])):
cur.execute(
"""
INSERT INTO documents (content, embedding, metadata, source)
VALUES (%s, %s::vector, %s, %s)
ON CONFLICT DO NOTHING
""",
(
results["documents"][i],
str(results["embeddings"][i]),
json.dumps(results["metadatas"][i]),
results["metadatas"][i].get("source", "chromadb_migration"),
),
)
conn.commit()
print(f" Migrated {min(offset + batch_size, total)}/{total}")
cur.close()
conn.close()
print("Migration complete. Now build the HNSW index:")
print(" CREATE INDEX idx_embedding ON documents USING hnsw (embedding vector_cosine_ops);")ChromaDB to Pinecone
import chromadb
from pinecone import Pinecone
def migrate_chroma_to_pinecone(
chroma_path: str,
collection_name: str,
pinecone_api_key: str,
pinecone_index_name: str,
namespace: str = "default",
):
"""Migrate all vectors from ChromaDB to Pinecone."""
chroma = chromadb.PersistentClient(path=chroma_path)
collection = chroma.get_collection(collection_name)
pc = Pinecone(api_key=pinecone_api_key)
index = pc.Index(pinecone_index_name)
total = collection.count()
batch_size = 100 # Pinecone recommends smaller batches
for offset in range(0, total, batch_size):
results = collection.get(
limit=batch_size,
offset=offset,
include=["documents", "metadatas", "embeddings"],
)
vectors = []
for i in range(len(results["ids"])):
metadata = results["metadatas"][i] or {}
metadata["content"] = (results["documents"][i] or "")[:1000]
vectors.append({
"id": results["ids"][i],
"values": results["embeddings"][i],
"metadata": metadata,
})
index.upsert(vectors=vectors, namespace=namespace)
print(f" Migrated {min(offset + batch_size, total)}/{total}")
print("Migration complete.")Migration Checklist
Before migrating in production:
- Verify embedding compatibility. If you used a custom embedding function in ChromaDB, use the same model and parameters in the target database.
- Build indexes after bulk load. For pgvector, load all data first, then create the HNSW/IVF index. Building the index during inserts is 5-10x slower.
- Test query parity. Run your test suite against both databases and compare results (see the overlap-check sketch after this list). They will not be identical (different ANN implementations), but should be close.
- Run in parallel. Write to both databases during migration. Switch reads once you have confirmed parity.
- Monitor latency after migration. The new database may have different performance characteristics under your specific query patterns.
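A minimal sketch for the parity check in step 3. It assumes each database is wrapped in a search callable that returns dicts with an "id" key; adapt the accessors to your own client code:
def average_topk_overlap(search_old, search_new, test_queries, k: int = 5) -> float:
    """Average fraction of top-k result IDs the two databases agree on."""
    overlaps = []
    for query in test_queries:
        old_ids = {r["id"] for r in search_old(query, k)}
        new_ids = {r["id"] for r in search_new(query, k)}
        overlaps.append(len(old_ids & new_ids) / k)
    return sum(overlaps) / len(overlaps)

# Expect a high but imperfect overlap (e.g. 0.9+); different ANN implementations
# rarely agree on every neighbor.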
Production Considerations
Backups
| Database | Backup Strategy |
|---|---|
| ChromaDB | Copy the data directory (cp -r ./chroma_data ./backup) |
| pgvector | Standard Postgres: pg_dump, WAL archiving, pg_basebackup |
| Pinecone | Automatic (managed). Create collections from backup via API. |
| Qdrant | Snapshot API: POST /collections/{name}/snapshots |
Monitoring
Key metrics to track for any vector database:
METRICS = {
"query_latency_p50": "Should be < 50ms for good UX",
"query_latency_p99": "Should be < 200ms",
"insert_latency_p50": "Track for batch ingestion pipelines",
"index_size_bytes": "Monitor for capacity planning",
"total_vectors": "Track growth rate",
"recall_at_k": "Measure with a ground truth test set",
"filter_overhead_ms": "Time added by metadata filters",
"error_rate": "Failed queries / total queries",
}
For pgvector, add standard Postgres metrics: connection pool utilization, shared buffer hit ratio, WAL write rate, replication lag.
Scaling Patterns
Vertical scaling (bigger machine):
- Works up to ~5M vectors for pgvector, ~10M for Qdrant
- Simple but has a ceiling
Horizontal scaling:
- Pinecone: automatic (managed sharding)
- pgvector: Postgres sharding (Citus, partitioning by tenant)
- Qdrant: built-in sharding and replication
- ChromaDB: not supported (single process)
Multi-tenant isolation:
- Pinecone: namespaces (logical isolation within an index)
- pgvector: separate schemas or row-level security
- Qdrant: collections per tenant or payload-based filtering
- ChromaDB: separate collections
Decision Framework
Use this flowchart to pick the right vector database for your team:
Are you prototyping or in early development?
├── YES → ChromaDB (zero setup, iterate fast)
└── NO → Do you already run PostgreSQL?
├── YES → Will you exceed 5M vectors?
│ ├── YES → Consider Pinecone or Qdrant
│ └── NO → pgvector (no new infrastructure)
└── NO → Is operational simplicity your top priority?
├── YES → Pinecone (fully managed, zero ops)
└── NO → Do you need self-hosted for compliance/privacy?
├── YES → Qdrant (best self-hosted performance)
└── NO → Pinecone (default choice when not on Postgres)
The most common production path: start with ChromaDB during development, migrate to pgvector when you go to production (because you probably already run Postgres), and consider Pinecone or Qdrant only if you hit pgvector’s scale limits.
Code Examples Summary
Here is a unified interface that works with any of the four databases. This abstraction lets you swap implementations without changing your RAG pipeline.
from abc import ABC, abstractmethod
import json
from openai import OpenAI
class VectorStore(ABC):
"""Unified interface for vector databases."""
@abstractmethod
def add(self, ids: list[str], texts: list[str], metadatas: list[dict]) -> None:
"""Add documents to the store."""
pass
@abstractmethod
def search(self, query: str, n_results: int = 5, where: dict = None) -> list[dict]:
"""Search for similar documents."""
pass
@abstractmethod
def delete(self, ids: list[str]) -> None:
"""Delete documents by ID."""
pass
@abstractmethod
def count(self) -> int:
"""Return total document count."""
pass
class ChromaVectorStore(VectorStore):
def __init__(self, path: str, collection_name: str):
import chromadb
self.client = chromadb.PersistentClient(path=path)
self.collection = self.client.get_or_create_collection(collection_name)
def add(self, ids, texts, metadatas):
self.collection.add(ids=ids, documents=texts, metadatas=metadatas)
def search(self, query, n_results=5, where=None):
params = {"query_texts": [query], "n_results": n_results}
if where:
params["where"] = where
results = self.collection.query(**params)
return [
{"content": results["documents"][0][i], "metadata": results["metadatas"][0][i],
"score": 1 - results["distances"][0][i]}
for i in range(len(results["documents"][0]))
]
def delete(self, ids):
self.collection.delete(ids=ids)
def count(self):
return self.collection.count()
class PgVectorStore(VectorStore):
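    """pgvector-backed store. Assumes a documents table whose id column is TEXT
    (with a unique constraint, so ON CONFLICT (id) works) -- unlike the SERIAL id
    in the schema example earlier in this lesson."""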
def __init__(self, connection_string: str):
import psycopg2
self.conn_string = connection_string
self._embed_client = OpenAI()
def _embed(self, text: str) -> list[float]:
resp = self._embed_client.embeddings.create(
model="text-embedding-3-small", input=text
)
return resp.data[0].embedding
def _get_conn(self):
import psycopg2
return psycopg2.connect(self.conn_string)
def add(self, ids, texts, metadatas):
conn = self._get_conn()
cur = conn.cursor()
for text_id, text, meta in zip(ids, texts, metadatas):
embedding = self._embed(text)
cur.execute(
"INSERT INTO documents (id, content, embedding, metadata) "
"VALUES (%s, %s, %s::vector, %s) ON CONFLICT (id) DO UPDATE "
"SET content = EXCLUDED.content, embedding = EXCLUDED.embedding",
(text_id, text, str(embedding), json.dumps(meta)),
)
conn.commit()
cur.close()
conn.close()
def search(self, query, n_results=5, where=None):
embedding = self._embed(query)
conn = self._get_conn()
cur = conn.cursor()
sql = (
"SELECT content, metadata, 1 - (embedding <=> %s::vector) as score "
"FROM documents ORDER BY embedding <=> %s::vector LIMIT %s"
)
cur.execute(sql, (str(embedding), str(embedding), n_results))
results = [
{"content": row[0], "metadata": row[1], "score": float(row[2])}
for row in cur.fetchall()
]
cur.close()
conn.close()
return results
def delete(self, ids):
conn = self._get_conn()
cur = conn.cursor()
cur.execute("DELETE FROM documents WHERE id = ANY(%s)", (ids,))
conn.commit()
cur.close()
conn.close()
def count(self):
conn = self._get_conn()
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM documents")
result = cur.fetchone()[0]
cur.close()
conn.close()
return result
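# A Qdrant implementation of the same interface is a natural third backend. This is
# a sketch, not a drop-in: Qdrant requires integer or UUID point IDs, so string IDs
# are mapped through uuid5 here, and the `where` filter argument is not implemented.
class QdrantVectorStore(VectorStore):
    def __init__(self, host: str, collection_name: str, dim: int = 1536):
        from qdrant_client import QdrantClient
        from qdrant_client.models import Distance, VectorParams
        self.client = QdrantClient(host=host, port=6333)
        self.collection = collection_name
        if not self.client.collection_exists(collection_name):
            self.client.create_collection(
                collection_name=collection_name,
                vectors_config=VectorParams(size=dim, distance=Distance.COSINE),
            )
        self._embed_client = OpenAI()
    def _embed(self, text: str) -> list[float]:
        resp = self._embed_client.embeddings.create(
            model="text-embedding-3-small", input=text
        )
        return resp.data[0].embedding
    def _point_id(self, text_id: str) -> str:
        import uuid
        return str(uuid.uuid5(uuid.NAMESPACE_URL, text_id))  # stable UUID per string ID
    def add(self, ids, texts, metadatas):
        from qdrant_client.models import PointStruct
        points = [
            PointStruct(
                id=self._point_id(text_id),
                vector=self._embed(text),
                payload={"content": text, **meta},
            )
            for text_id, text, meta in zip(ids, texts, metadatas)
        ]
        self.client.upsert(collection_name=self.collection, points=points)
    def search(self, query, n_results=5, where=None):
        results = self.client.query_points(
            collection_name=self.collection,
            query=self._embed(query),
            limit=n_results,
        )
        return [
            {"content": p.payload.get("content"), "metadata": p.payload, "score": p.score}
            for p in results.points
        ]
    def delete(self, ids):
        from qdrant_client.models import PointIdsList
        self.client.delete(
            collection_name=self.collection,
            points_selector=PointIdsList(points=[self._point_id(i) for i in ids]),
        )
    def count(self):
        return self.client.count(collection_name=self.collection).count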
# Usage — swap implementations without changing RAG code
# store = ChromaVectorStore("./chroma_db", "knowledge_base")
# store = PgVectorStore("postgresql://user:pass@localhost/ragdb")
# Your RAG pipeline doesn't care which one it is
# results = store.search("How do I reset my password?", n_results=5)
Key Takeaways
- ChromaDB for prototyping, pgvector for production. Most teams already run Postgres. Adding pgvector is one CREATE EXTENSION command. No new infrastructure, no new operational burden.
- HNSW is the default index choice. It gives the best query speed and recall. Use IVFFlat only when memory is constrained and you have 10M+ vectors.
- ANN search is “approximate” in theory, near-perfect in practice. At 99% recall, you miss 1 in 100 nearest neighbors. For RAG, this has no measurable impact on answer quality.
- Pinecone eliminates operations but introduces lock-in. If your team is small and does not want to manage infrastructure, it is a good trade. If you value portability, stick with pgvector or Qdrant.
- Qdrant is the performance leader for self-hosted. The Rust foundation gives it an edge on query latency and insert speed. Choose it when you need maximum performance and can manage your own infrastructure.
- Build a unified abstraction layer. The VectorStore interface pattern lets you swap databases without rewriting your RAG pipeline. This is cheap insurance against changing requirements.
- Migrate by running in parallel. Write to both old and new databases, compare results, then switch reads. Never do a hard cutover on a production system.
- The vector database is not the bottleneck for most teams. Retrieval quality depends more on your chunking strategy and embedding model than on which database you use. Get those right first (Lessons 5 and 7), then optimize the database.