codingstairs
NotesEDULifeContact
⌕Search⌘K
koen

Navigation

  • Intro
  • Blog
  • Life

Get in touch

Send without signing in. Add your email if you'd like a reply.

  • Leave a message anonymously →
  • ✉ warragon112@gmail.com
  • KakaoTalk Open Chat ↗

© 2026 codingstairs

  • Notes
  • EDU
  • Search
  • Life
  • Contact
  • Legal
  • RSS
  • GitHub
EDU›Local LLM · pgvector · building a RAG chatbot

Local LLM · pgvector · building a RAG chatbot

Build a chatbot that answers from your own documents with LM Studio + pgvector + Gemini. Seven steps — embeddings, prompts, and a SaaS comparison.

Start with Step 1 →
Difficulty
advanced
Lessons
7

Local LLM · pgvector · building a RAG chatbot

Sometimes a single ChatGPT call is not enough. Internal docs, personal notes, data you cannot send outside. RAG (Retrieval Augmented Generation) lets an LLM answer only from materials you hand-pick.

Who it's for

  • Engineers running LLMs on local GPUs or on-prem without sending data out
  • Anyone who wants a chatbot that answers with citations from their own documents
  • People wanting a single track covering embeddings, vector search, and prompt design

What you can do afterwards

  • Run Gemma / Llama family models locally with LM Studio
  • Store embeddings in PostgreSQL + pgvector with HNSW indexes
  • Build a minimal FastAPI + LangChain pipeline (retrieve → prompt → generate)
  • Swap Gemini and local LLMs freely
  • Control system prompts, few-shot, and output schemas

Flow

[1] Local LLM ──▶ [2] Embeddings ──▶ [3] pgvector ──▶ [4] RAG pipeline
                                                            │
                                                            ▼
                            [7] vs SaaS RAG ◀── [6] Prompts ◀── [5] Cloud switch

The first half (1–4) is the mechanical "turn meaning into numbers and search." The second half (5–7) is the operational judgment on models, prompts, tools.

Steps

  1. Why local LLMs · getting started with LM Studio — OpenAI-compatible endpoint · swapping models · VRAM
  2. Embeddings — text to vectors — the math behind semantic search · 768 dims
  3. pgvector + HNSW setup — install · index choice · cosine vs dot product
  4. RAG pipeline — chunking · retrieve · top-k · rerank · prompt injection
  5. Gemini · OpenAI-compatible APIs — switching local ↔ cloud · cost · latency
  6. Prompt design — system prompts · few-shot · output schemas · hallucination
  7. NotebookLM vs your own RAG — SaaS RAG comparison; choosing the right tool per slot

Prerequisites — python-data-pipeline + Python 3.13+ + uv + PostgreSQL 15+ + LM Studio.

Lessons

  1. 1

    Why local LLMs · getting started with LM Studio

    →
  2. 2

    Embeddings — text to vectors

    →
  3. 3

    pgvector + HNSW setup

    →
  4. 4

    RAG pipeline

    →
  5. 5

    Gemini · OpenAI-compatible APIs

    →
  6. 6

    Prompt design

    →
  7. 7

    Step 7 — NotebookLM vs your own RAG

    →

Other courses

All courses →
  • Getting Started with a Dev Environment
  • From HTML/CSS/JS to React, Next.js, Tailwind
  • Build Your First Fullstack App with Next.js 16
  • Backend with Spring Boot 4
  • Python · FastAPI · Data Pipelines
  • AI-native developer tooling — Claude Code · MCP · design tools
  • Docker · Caddy · Cloud — 10 deploy options
  • Central admin platform — many domains behind one hub
  • Tauri 2 — desktop · mobile in one codebase
  • Testing strategy and quality gates
  • Web security foundations — JWT · OAuth · OWASP
  • PostgreSQL in depth + Redis · Kafka
  • Building public-data crawlers
  • Monorepo · SSOT · layer separation thinking