Local LLM · pgvector · building a RAG chatbot

Sometimes a single ChatGPT call is not enough: the answer lives in internal docs, personal notes, or data you cannot send outside. RAG (Retrieval-Augmented Generation) lets an LLM answer from materials you hand-pick instead of relying on its training data alone.

Who it's for

Engineers running LLMs on local GPUs or on-prem without sending data out
Anyone who wants a chatbot that answers with citations from their own documents
People wanting a single track covering embeddings, vector search, and prompt design

What you can do afterwards

Run Gemma / Llama family models locally with LM Studio
Store embeddings in PostgreSQL + pgvector with HNSW indexes
Build a minimal FastAPI + LangChain pipeline (retrieve → prompt → generate)
Swap Gemini and local LLMs freely
Control system prompts, few-shot examples, and output schemas
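
The retrieve → prompt → generate shape named in the list above can be sketched without any framework. This is only an illustration under stated assumptions, not the course's FastAPI + LangChain code: `Chunk`, `build_prompt`, `answer`, `keyword_retrieve`, and `echo_generate` are hypothetical names, and the keyword-overlap retriever merely stands in for vector search.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    source: str  # where the text came from, used for citations
    text: str


def build_prompt(question: str, chunks: List[Chunk]) -> str:
    """Assemble a prompt that asks the model to answer from the given chunks."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using only the context below and cite sources as [n]. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(
    question: str,
    retrieve: Callable[[str], List[Chunk]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """retrieve -> prompt -> generate, with both backends passed in as callables."""
    chunks = retrieve(question)[:top_k]
    return generate(build_prompt(question, chunks))


# Toy corpus and stand-in backends (hypothetical, for illustration only).
DOCS = [
    Chunk("handbook.md", "Vacation requests go through the HR portal."),
    Chunk("network.md", "The VPN uses WireGuard on port 51820."),
]


def keyword_retrieve(question: str) -> List[Chunk]:
    """Rank chunks by shared lowercase words; a placeholder for vector search."""
    words = set(question.lower().split())
    scored = [(len(words & set(c.text.lower().split())), c) for c in DOCS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked if score > 0]


def echo_generate(prompt: str) -> str:
    """Stub model that returns the prompt, so the pipeline runs offline."""
    return prompt


if __name__ == "__main__":
    print(answer("Which port does the VPN use?", keyword_retrieve, echo_generate))
```

Because `generate` is just a callable, a local model and a cloud model can be swapped by passing a different function, which is one way to read "swap Gemini and local LLMs freely".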

Flow

[1] Local LLM ──▶ [2] Embeddings ──▶ [3] pgvector ──▶ [4] RAG pipeline
                                                            │
                                                            ▼
                            [7] vs SaaS RAG ◀── [6] Prompts ◀── [5] Cloud switch

The first half (1–4) covers the mechanics: turn meaning into numbers and search over them. The second half (5–7) covers operational judgment about models, prompts, and tools.
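
To make "turn meaning into numbers and search" concrete, here is a small sketch of cosine-similarity ranking over hand-written vectors. The three-dimensional vectors and the names `cosine_similarity`, `top_k`, and `TOY_INDEX` are assumptions for illustration; real embeddings come from a model and have many more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hand-made "embeddings": the first two point in a similar direction.
TOY_INDEX = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.0],
    "tax law": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for name, score in top_k([1.0, 0.2, 0.0], TOY_INDEX):
        print(f"{name}: {score:.3f}")
```

This brute-force scan compares the query against every vector. An index such as HNSW exists to avoid that full scan once the collection grows, at the cost of approximate results.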

Steps

Why local LLMs · getting started with LM Studio — OpenAI-compatible endpoint · swapping models · VRAM
Embeddings — text to vectors — the math behind semantic search · 768 dims
pgvector + HNSW setup — install · index choice · cosine vs dot product
RAG pipeline — chunking · retrieve · top-k · rerank · prompt injection
Gemini · OpenAI-compatible APIs — switching local ↔ cloud · cost · latency
Prompt design — system prompts · few-shot · output schemas · hallucination
NotebookLM vs your own RAG — SaaS RAG comparison; choosing the right tool per slot
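
As a rough preview of the pgvector + HNSW step, the sketch below holds the SQL as Python strings plus a helper that formats a vector for a query parameter. The table name `chunks`, its columns, the index name, and the `%(name)s` placeholder style (as used by psycopg) are assumptions for illustration; the `vector` type, the `hnsw` index method, `vector_cosine_ops`, and the `<=>` cosine-distance operator are pgvector's own. HNSW indexes need pgvector 0.5.0 or later.

```python
from typing import Sequence

EMBEDDING_DIMS = 768  # the dimension named in the steps list

# One-time setup: extension, table, and an HNSW index for cosine distance.
SETUP_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    source text NOT NULL,
    content text NOT NULL,
    embedding vector({EMBEDDING_DIMS})
);
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# Nearest-neighbour query: smaller cosine distance means more similar.
SEARCH_SQL = """
SELECT source, content, embedding <=> %(query)s::vector AS cosine_distance
FROM chunks
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats as pgvector's text form, e.g. '[1.0,0.25,-2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


if __name__ == "__main__":
    print(SETUP_SQL)
    print(SEARCH_SQL)
    print(to_vector_literal([1, 0.25, -2]))
```

The operator in the query has to match the operator class of the index: `vector_cosine_ops` pairs with `<=>`, while an inner-product index would use `vector_ip_ops` and `<#>`. For unit-normalized embeddings, cosine and inner product produce the same ranking.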

Prerequisites: python-data-pipeline, Python 3.13+, uv, PostgreSQL 15+, and LM Studio.
