Step 2

Embeddings — text to vectors

Embeddings map text to high-dimensional vectors so "price tag", "receipt", and "bill" sit close together in meaning space.

1. What embeddings answer

  • Word similarity (cat ↔ dog)
  • Sentence meaning ("I want a refund" ↔ "return request")
  • Cross-language (apple ↔ 사과, multilingual models only)

Traditional search (BM25 / TF-IDF) matches tokens and misses these.
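
To make the gap concrete, here is a minimal sketch of token matching using Jaccard overlap as a simplified stand-in for BM25 / TF-IDF scoring. The whitespace tokenizer and the helper names `tokens` and `jaccard` are assumptions for illustration, not part of any search library.

```python
from __future__ import annotations


def tokens(text: str) -> set[str]:
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens the two texts have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Same intent, no shared words: token matching sees nothing in common.
paraphrase = jaccard("I want a refund", "return request")
# Nearly identical wording: token matching scores it highly.
near_copy = jaccard("I want a refund", "I want a refund now")
print(paraphrase, near_copy)
```

The paraphrase pair shares no tokens, so any purely lexical score for it is zero, which is the case an embedding model is meant to cover.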

2. Dimensions and models

Model | Dims | Notes
OpenAI text-embedding-3-small | 1536 | Cheap + strong
OpenAI text-embedding-3-large | 3072 | High quality, roughly 6.5x the price of 3-small (list price at launch)
Gemini text-embedding-004 | 768 | Free quota
bge-m3 (local) | 1024 | Multilingual
multilingual-e5-large | 1024 | Open, local-friendly

For most projects, 768–1024 dimensions is a pragmatic balance between retrieval quality and storage and index cost.
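
The storage side of that trade-off is simple arithmetic. The sketch below assumes vectors are stored as 4-byte floats and counts raw vector data only (no row or index overhead); the one-million-vector corpus size is a made-up figure for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each dimension is stored as a 4-byte float


def raw_vector_bytes(dims: int, n_vectors: int) -> int:
    """Raw size of n_vectors embeddings, excluding row and index overhead."""
    return dims * BYTES_PER_FLOAT32 * n_vectors


# Hypothetical corpus of one million chunks.
for dims in (768, 1024, 1536, 3072):
    mib = raw_vector_bytes(dims, 1_000_000) / 2**20
    print(f"{dims} dims: {mib:,.0f} MiB of raw vector data")
```

Size grows linearly with dimension count, so a 3072-dim model needs four times the raw space of a 768-dim one for the same corpus.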

3. Gemini (free API)

import google.generativeai as genai
genai.configure(api_key="...")
resp = genai.embed_content(
    model="models/text-embedding-004",
    content="Audit log — logAdminAction pattern",
    task_type="retrieval_document",
)
vec = resp["embedding"]  # list[float] 768

task_type matters: embed documents for the index with retrieval_document and embed user queries with retrieval_query.
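
One way to keep the two task types from getting mixed up is to wrap them in separate helpers. This is a sketch, not the library's own API: `embed_documents`, `embed_query`, and the injected `embed_fn` parameter are hypothetical names, and `embed_fn` is assumed to behave like `genai.embed_content` (keyword arguments in, a dict with an "embedding" key out).

```python
from __future__ import annotations

from typing import Callable, Sequence

# Assumed shape of the embedding call: keyword arguments in, dict out.
EmbedFn = Callable[..., dict]

MODEL = "models/text-embedding-004"


def embed_documents(texts: Sequence[str], embed_fn: EmbedFn) -> list[list[float]]:
    """Embed texts that will be stored in the index."""
    return [
        embed_fn(model=MODEL, content=text, task_type="retrieval_document")["embedding"]
        for text in texts
    ]


def embed_query(text: str, embed_fn: EmbedFn) -> list[float]:
    """Embed a user question that will be compared against the index."""
    return embed_fn(model=MODEL, content=text, task_type="retrieval_query")["embedding"]
```

With the real client this would be called as `embed_query("how are admin actions logged?", genai.embed_content)`. Passing the function in also makes the helpers easy to test with a fake.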

4. Cosine similarity

import numpy as np
def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

With pgvector in PostgreSQL, the <=> operator returns cosine distance (1 − cosine similarity), so sort ASC to put the closest rows first.
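
The ascending sort can be checked outside the database. The sketch below is dependency-free Python that ranks a few made-up 2-D vectors by cosine distance; the document names and values are invented for illustration, and it mirrors what a query ordered by `embedding <=> query` would do, assuming a column named `embedding`.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors; real embeddings have hundreds of dimensions.
docs = {
    "refund policy": [0.9, 0.1],
    "shipping times": [0.1, 0.9],
    "return request": [0.8, 0.3],
}
query = [1.0, 0.0]

# Ascending distance puts the closest document first.
ranked = sorted(docs, key=lambda name: cosine_distance(docs[name], query))
print(ranked)
```

Sorting in descending order here would return the least related document first, which is the usual symptom of confusing distance with similarity.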

5. Quality sanity check

Prepare 10 pairs of texts with the same meaning and confirm that their average cosine similarity is ≥ 0.85. If it falls below that, switch models, for example to a multilingual one if your text is not English.
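
A sketch of that check, assuming the embedding call is passed in as a function. `toy_embed` is a placeholder made up so the example runs offline; it does not capture meaning and should be swapped for a real model call. Only two pairs are shown; a real check would use all ten.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_pair_similarity(pairs, embed):
    """Mean cosine similarity across (text, paraphrase) pairs."""
    sims = [cosine(embed(left), embed(right)) for left, right in pairs]
    return sum(sims) / len(sims)


def toy_embed(text):
    """Placeholder embedding: vowel counts plus one, so no vector is all zeros."""
    lowered = text.lower()
    return [lowered.count(ch) + 1 for ch in "aeiou"]


pairs = [
    ("I want a refund", "return request"),
    ("price tag", "bill"),
]
score = average_pair_similarity(pairs, toy_embed)
print(f"average similarity: {score:.3f}", "OK" if score >= 0.85 else "below threshold")
```

Keeping the pairs in a fixed list makes the check repeatable, so the same test can be rerun whenever the model is changed.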

6. Gotchas

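  • Mixing up task types (documents embedded as queries, or queries as documents)
  • Embedding one long text without chunking (input limits are often 512–2048 tokens, and longer input is typically truncated or rejected)
  • Not re-embedding existing documents when the model changes (vectors from different models are not comparable)
  • Skipping normalisation (cosine distance ignores vector length, but inner-product and L2 rankings match cosine only for unit-length vectors)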
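
Two of these gotchas have small mechanical fixes. The sketch below assumes word counts are an acceptable rough proxy for tokens; `chunk_words`, `l2_normalize`, and the default sizes (200 words, 20 words of overlap) are hypothetical choices, not values from any model's documentation.

```python
import math


def chunk_words(text, max_words=200, overlap=20):
    """Split text into windows of at most max_words words, sharing `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]
```

Each chunk is embedded separately, and normalising before storage means inner product and cosine similarity give the same ranking.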

Closing

As a rough rule of thumb, embedding quality decides 60–70% of retrieval accuracy. An hour spent on model choice beats a day of prompt tuning.

Next

  • 03-pgvector-hnsw
