codingstairs
NotesEDULifeContact
⌕Search⌘K
koen

Navigation

  • Intro
  • Blog
  • Life

Get in touch

Send without signing in. Add your email if you'd like a reply.

  • Leave a message anonymously →
  • ✉ warragon112@gmail.com
  • KakaoTalk Open Chat ↗

© 2026 codingstairs

  • Notes
  • EDU
  • Search
  • Life
  • Contact
  • Legal
  • RSS
  • GitHub
Notes›data

pgvector and RAG

Published 2026-04-28· Updated 2026-05-18·0 views

pgvector and RAG

The trend of combining large language model responses with external knowledge has settled under the name RAG (Retrieval-Augmented Generation). At its core sits vector similarity search, and PostgreSQL's pgvector extension is the closest fit.

1. About pgvector

pgvector is a PostgreSQL extension that Andrew Kane published in 2021. It adds the vector type, distance operators, and vector indexes (IVFFlat and HNSW).

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
  id BIGSERIAL PRIMARY KEY,
  doc_id BIGINT NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536) NOT NULL
);

Distance operators.

Operator Distance
<-> L2 (Euclidean)
<=> Cosine distance
<#> Negative inner product — smaller is more similar

Use ORDER BY embedding <=> $1 LIMIT 10 to fetch the K nearest.

2. Vector similarity

The starting assumption is that mapping text or images into fixed-dimensional real vectors places semantically close items close in the vector space. Three distance measures are common.

  • Cosine similarity — looks only at direction. Ignores length difference.
  • L2 (Euclidean) — accounts for length too.
  • Inner product — for normalized vectors, effectively the same as cosine.

Embedding API responses from OpenAI and Cohere are usually normalized, so cosine is often used. Model documentation often specifies the recommended distance.

3. IVFFlat

Brute force (Seq Scan plus distance calculation) is fast enough on small data. As data grows, ANN (Approximate Nearest Neighbor) indexes come in.

IVFFlat pre-partitions the vector space into K clusters (lists) and looks at only the nearest few clusters (probes) at search time.

CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

SET ivfflat.probes = 10;
SELECT id FROM chunks ORDER BY embedding <=> $1 LIMIT 10;
  • Fast build, low memory.
  • Accuracy and speed are decided by the balance of lists and probes.
  • Build the index after loading enough data so clustering quality is good.

4. HNSW

Yu. A. Malkov and D. A. Yashunin proposed the algorithm in their 2018 paper. It builds a multilayer graph to find nearest neighbors quickly.

CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

SET hnsw.ef_search = 40;
  • Search accuracy and speed are generally excellent.
  • Index size and build time are larger than IVFFlat.
  • Strong with incremental additions.

HNSW landed in pgvector 0.5 and later versions improved performance and memory.

5. Embedding models

Model Provider Dimension Memo
OpenAI text-embedding-3-small OpenAI API 1536 (or reducible) Multilingual, relatively cheap
OpenAI text-embedding-3-large OpenAI API 3072 High quality
Cohere embed-multilingual-v3.0 Cohere API 1024 Strong multilingual
BGE (BAAI) Open weights 384/768/1024 Self-hostable
nomic-embed-text Nomic 768 Open weights
Sentence-Transformers Open Various Many small models

Higher dimensions generally raise accuracy but storage and search costs rise proportionally. Models like Matryoshka embeddings allow trimming dimensions after the fact.

6. Comparison with dedicated vector DBs

System Origin and year Memo
Pinecone 2019, managed SaaS Simple ops, usage-based pricing
Qdrant 2021, Rust Self-hostable, strong filtering
Weaviate 2019, Go Modular, hybrid search
Milvus 2019, open Large scale, GPU acceleration
Chroma 2022, Python Local-dev friendly
Vespa open-sourced 2017 (Yahoo) Search, ranking, vectors combined
Elasticsearch / OpenSearch 2010 / 2021 Full-text plus vector (kNN)

Strengths of dedicated vector DBs.

  • Very large scale (billions or more) and distributed sharding.
  • Pre-filtering and metadata combinations are often faster.
  • Vector-specific ops tools (statistics, tuning).

Strengths of pgvector.

  • Combine vectors, relational data, and transactions inside one DB.
  • One operational surface (backups, auth, monitoring all in PostgreSQL).
  • Sufficient performance at small to medium scale.

The choice balances data scale, operational headcount, and accuracy needs.

7. RAG pipeline

source → chunking → embedding → storage → retrieval → reranking → LLM context → response

Chunking — split documents into small pieces. Too large and search aim blurs; too small and meaning scatters.

  • Fixed token or character count (e.g. 512 tokens with 50-token overlap).
  • Markdown headings or sentence boundaries.
  • Semantic-based (estimate boundaries with another model).

Embedding and storage — compute embeddings per chunk via a model API or a local model and store in a vector column. Keep metadata (document ID, source URL, date, tags) as separate columns.

Retrieval — embed the query with the same model and fetch the K nearest with <=>. Combining metadata filters (date, document type) is where the RDBMS WHERE shines.

8. Reranking and hybrid search

Vector search is fast but has limits in semantic accuracy. Fetch 30 to 50 results first, then rerank with a cross-encoder or LLM and feed the top 5 to 10 into the LLM context. Models like Cohere Rerank, BGE Reranker, and cross-encoder/ms-marco come up.

Hybrid search combines keyword search (BM25, tsvector) with vector search. In documents heavy on abbreviations and proper nouns, keyword search complements. Simple combination functions like RRF (Reciprocal Rank Fusion) are common.

9. Common pitfalls

Re-embedding obligation after model change — when embedding models change, the vector space changes too. Old vectors and new queries live in different spaces and comparisons are meaningless. Plan migrations early.

Dimension mismatch — putting a different dimension into a vector(1536) column errors out. Separate per-model dimensions into separate columns or tables.

Index build timing — building IVFFlat when there is little data leaves clustering incomplete. Recommended to build the index after some load.

Distance operator must match index — when an index was built with vector_cosine_ops, only <=> uses it. Other operators fall back to Seq Scan.

Meaning broken at chunk boundaries — cuts mid-sentence blur retrieval. Use overlap or align to boundaries.

Context window overflow — feeding all retrieval results to the LLM exceeds the token limit. Combine reranking with summarization.

Closing thoughts

In vector search, chunking, reranking, and metadata design are harder than the implementation. Index type and dimension choices are heavy to revisit, so make them carefully at the start.

Next

  • supabase
  • fcm-push

References: pgvector GitHub, pgvector indexing guide, HNSW paper, OpenAI Embeddings, Pinecone Learn — RAG, Cohere Rerank.

More in data

All in this category →
  • Keep DB seed sources outside the code tree
  • Supabase Storage — File Upload and Permissions
  • Kafka in Practice — Topic Design and Message Flow
  • Orchestrating multiple PostgreSQL pools
  • Backup and Restore
  • Image Pipeline