Local LLM · pgvector · building a RAG chatbot
Build a chatbot that answers from your own documents with LM Studio + pgvector + Gemini. Seven steps covering embeddings, prompts, and a SaaS comparison.
- Difficulty: advanced
- Lessons: 7
Sometimes a single ChatGPT call is not enough: internal docs, personal notes, and data you cannot send outside are all beyond its reach. RAG (Retrieval-Augmented Generation) retrieves relevant passages from materials you hand-pick and has the LLM answer only from them.
Who it's for
- Engineers running LLMs on local GPUs or on-prem without sending data out
- Anyone who wants a chatbot that answers with citations from their own documents
- People wanting a single track covering embeddings, vector search, and prompt design
What you can do afterwards
- Run Gemma / Llama family models locally with LM Studio
- Store embeddings in PostgreSQL + pgvector with HNSW indexes
- Build a minimal FastAPI + LangChain pipeline (retrieve → prompt → generate)
- Swap Gemini and local LLMs freely
- Control system prompts, few-shot, and output schemas
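The retrieve → prompt → generate shape in the list above can be sketched in plain Python, without FastAPI or LangChain. The names here (`Chunk`, `build_prompt`, `answer`, and the stub retriever and generator) are hypothetical and chosen only for illustration; a real retriever would query pgvector, and a real generator would call LM Studio or Gemini. Because both are passed in as callables, swapping a local model for a cloud one changes an argument rather than the pipeline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    source: str  # where the passage came from, used for citations
    text: str


# Hypothetical interfaces: any function with these shapes can be plugged in.
Retriever = Callable[[str, int], list[Chunk]]   # (question, top_k) -> chunks
Generator = Callable[[str, str], str]           # (system, user) -> answer

SYSTEM_PROMPT = (
    "Answer only from the numbered context passages. "
    "Cite passages like [1]. If the context lacks the answer, say so."
)


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return f"Context:\n{context}\n\nQuestion: {question}"


def answer(question: str, retrieve: Retriever, generate: Generator,
           top_k: int = 3) -> str:
    """retrieve -> prompt -> generate, with both ends injected."""
    chunks = retrieve(question, top_k)
    return generate(SYSTEM_PROMPT, build_prompt(question, chunks))


# Stand-ins so the sketch runs without a database or a model server.
def fake_retrieve(question: str, k: int) -> list[Chunk]:
    corpus = [
        Chunk("handbook.md", "Vacation requests need manager approval."),
        Chunk("faq.md", "Expense reports are due monthly."),
    ]
    return corpus[:k]


def stub_generate(system_prompt: str, user_prompt: str) -> str:
    return f"(stub model) prompt had {len(user_prompt)} characters"


if __name__ == "__main__":
    print(build_prompt("Who approves vacation?", fake_retrieve("", 2)))
    print(answer("Who approves vacation?", fake_retrieve, stub_generate))
```

Keeping the system prompt and the prompt builder separate from the generator is one way to leave prompt design independent of which model serves the request.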
Flow
[1] Local LLM ──▶ [2] Embeddings ──▶ [3] pgvector ──▶ [4] RAG pipeline
                                                       │
                                                       ▼
[7] vs SaaS RAG ◀─────────── [6] Prompts ◀─────────── [5] Cloud switch
The first half (1–4) is the mechanical part: turn meaning into numbers and search over them. The second half (5–7) is operational judgment about models, prompts, and tools.
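As a rough illustration of "turn meaning into numbers and search," the sketch below ranks three documents against a query by cosine similarity. The 3-dimensional vectors and document names are made up for readability; real embeddings come from a model and have hundreds of dimensions (this course mentions 768).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings": invented numbers, not output from any real model.
DOCS = {
    "vacation policy": [0.9, 0.1, 0.0],
    "expense reports": [0.1, 0.9, 0.1],
    "server runbook": [0.0, 0.2, 0.9],
}


def top_k(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in DOCS.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    query = [0.8, 0.2, 0.1]  # pretend this encodes "how do I take time off?"
    print(top_k(query))
```

A vector index such as HNSW exists to avoid this brute-force scan over every document once the collection grows.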
Steps
- Why local LLMs · getting started with LM Studio: OpenAI-compatible endpoint · swapping models · VRAM
- Embeddings (text to vectors): the math behind semantic search · 768 dims
- pgvector + HNSW setup: install · index choice · cosine vs dot product
- RAG pipeline: chunking · retrieve · top-k · rerank · prompt injection
- Gemini · OpenAI-compatible APIs: switching local ↔ cloud · cost · latency
- Prompt design: system prompts · few-shot · output schemas · hallucination
- NotebookLM vs your own RAG: SaaS RAG comparison · choosing the right tool for each use case
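The pgvector step might look roughly like the following, written as SQL strings held in Python. The table name `chunks`, its columns, and the index name are hypothetical, and the `%(name)s` placeholders assume a psycopg-style driver. `vector(768)`, `USING hnsw`, `vector_cosine_ops`, and the `<=>` cosine-distance operator are pgvector features; HNSW indexes need pgvector 0.5.0 or later.

```python
EMBEDDING_DIMS = 768  # the dimension named in the Embeddings step

# Hypothetical schema: one row per document chunk.
SETUP_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    source    text NOT NULL,
    content   text NOT NULL,
    embedding vector({EMBEDDING_DIMS}) NOT NULL
);

-- HNSW index for cosine distance.
-- vector_ip_ops would index for inner (dot) product instead.
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is cosine distance: smaller means more similar.
TOP_K_SQL = """
SELECT source, content
FROM chunks
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_vector_literal(values: list[float]) -> str:
    """Format floats in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def query_params(query_vec: list[float], k: int = 5) -> dict:
    """Build the parameter dict for TOP_K_SQL, checking the dimension."""
    if len(query_vec) != EMBEDDING_DIMS:
        raise ValueError(
            f"expected {EMBEDDING_DIMS} dims, got {len(query_vec)}"
        )
    return {"query": to_vector_literal(query_vec), "k": k}


if __name__ == "__main__":
    print(SETUP_SQL)
    print(TOP_K_SQL)
    print(to_vector_literal([0.1, 0.2]))
```

The operator class on the index has to match the operator in the query for the index to be used, which is why the cosine vs dot product choice is made at setup time.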
Prerequisites: python-data-pipeline, Python 3.13+, uv, PostgreSQL 15+, and LM Studio.