codingstairs
NotesEDULifeContact
⌕Search⌘K
koen

Navigation

  • Intro
  • Blog
  • Life

Get in touch

Send without signing in. Add your email if you'd like a reply.

  • Leave a message anonymously →
  • ✉ warragon112@gmail.com
  • KakaoTalk Open Chat ↗

© 2026 codingstairs

  • Notes
  • EDU
  • Search
  • Life
  • Contact
  • Legal
  • RSS
  • GitHub
EDU›Local LLM · pgvector · building a RAG chatbot›Step 6

Step 6

Prompt design

0 views

Prompt design

One prompt tweak often outperforms swapping models. Four things reliably pay off.

1. System prompt — set the character

system = """You are an assistant answering from internal dev docs.

Rules:
- If the docs do not cover it, reply "Not found in documents"
- Answer in 3–5 sentences, most important first
- Use ```language``` code fences where code applies
- Cite sources as [doc-name]
"""

Put universal rules here; keep user prompts clean.

2. Few-shot

system = """Classify the user question. Categories: bug · feature · question · other.
Output one word only.

Examples:
Q: Login is broken
A: bug
Q: Can you add dark mode?
A: feature
Q: What is the refund policy?
A: question
"""

Two to five examples raise accuracy 10–30%.

3. Enforce output schema

system = """Answer ONLY as this JSON. No other text.
{ "category": "bug" | "feature" | "question" | "other",
  "priority": "high" | "medium" | "low",
  "summary": string (<= 100 chars) }
"""

Parse with Pydantic. Retry on JSON failure. Consider response_format={"type":"json_object"} when supported.

4. Anti-hallucination three

  1. "Only from these documents" — define the scope
  2. "Say you don't know" — escape path
  3. "Cite sources" — verifiable answers

5. Budget · tokens

def fit_context(chunks, max_tokens=4000):
    result, used = [], 0
    for c in chunks:
        t = len(c) // 2
        if used + t > max_tokens: break
        result.append(c); used += t
    return result

Leave 20% for the response. Use tiktoken for accurate counts.

6. A/B testing

PROMPTS = {"v1": "...", "v2": "..."}
selected = PROMPTS[os.environ.get("PROMPT_VERSION", "v2")]

Log responses per version, pick a winner for next release.

7. Gotchas

  • User input injected into the system prompt → prompt injection. Keep it in user role.
  • Skewed few-shot examples bias the model
  • JSON requested, prose returned → retry with "JSON only, no prose"
  • System prompts that are too long waste context and increase latency (300–800 chars is a good window)

Closing

Prompt design benefits from a second pair of eyes, just like code review. Ask "how would this prompt break under adversarial input?".

Next

  • security/02-oauth-state-pkce
  • ai/06-agents-overview

← Step 5

Gemini · OpenAI-compatible APIs

Step 7 →

Step 7 — NotebookLM vs your own RAG