Step 6

Observability · alerts

Crawlers break quietly: a site redesign, an IP ban, or a network blip rarely produces a loud failure. Without dashboards and alerts, weeks of data can go missing before anyone notices.

1. What to collect

  • Success rate
  • Latency (p50/p95/p99)
  • Rows ingested per day / source
  • Block signals (403/429/CAPTCHA)
  • Queue lag
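
The latency bullet asks for percentiles rather than a mean because a few very slow requests disappear into an average. A minimal sketch of the calculation, using only the standard library; the function name `latency_percentiles` and the sample values are illustrative assumptions, not part of the course code:

```python
import statistics

def latency_percentiles(samples_ms):
    """Return p50/p95/p99 for a list of latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Hypothetical samples: mostly fast, with two slow outliers.
samples = [120, 135, 150, 160, 180, 210, 250, 320, 900, 2400]
result = latency_percentiles(samples)
mean = statistics.mean(samples)
print(result, mean)
```

In this sample the median stays near the typical request while p95 and p99 expose the slow tail, which is the signal that usually precedes throttling or a ban.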

2. Structured logging

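import json, time, logging

# Without a handler at INFO level, the "fetch_ok" line below would be dropped.
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("crawler")

def log(level, event, **fields):
    # One JSON object per line keeps the log machine-parseable.
    logger.log(getattr(logging, level.upper()), json.dumps({
        "event": event, "ts": time.time(), **fields
    }))

url = "https://example.com/data"  # placeholder URL
log("info", "fetch_ok", url=url, status=200, latency_ms=320)
log("warning", "fetch_blocked", url=url, status=429)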

3. PostgreSQL events table

CREATE TABLE crawl_events (
  id BIGSERIAL PRIMARY KEY,
  source VARCHAR NOT NULL,
  status INT NOT NULL,
  latency_ms INT,
  rows_inserted INT DEFAULT 0,
  error_type VARCHAR,
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX ON crawl_events (source, created_at DESC);
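
To show how a fetch result might map onto this table, here is a small sketch that turns a response into one row and labels block signals. The `CrawlEvent` dataclass, the `classify` helper, and the `error_type` labels are assumptions made for illustration; the `$1` placeholders follow asyncpg's parameter style, which is also an assumption about the driver in use.

```python
from dataclasses import dataclass, astuple
from typing import Optional

BLOCK_STATUSES = {403, 429}  # block signals from the metrics list

@dataclass
class CrawlEvent:
    source: str
    status: int
    latency_ms: Optional[int] = None
    rows_inserted: int = 0
    error_type: Optional[str] = None

def classify(status: int, body: str = "") -> Optional[str]:
    """Map a response to an error_type label (labels are illustrative)."""
    if status in BLOCK_STATUSES:
        return "blocked"
    if "captcha" in body.lower():
        return "captcha"  # a CAPTCHA page can arrive with status 200
    if status >= 500:
        return "server_error"
    if status >= 400:
        return "client_error"
    return None

INSERT_SQL = """
INSERT INTO crawl_events (source, status, latency_ms, rows_inserted, error_type)
VALUES ($1, $2, $3, $4, $5)
"""

def make_event(source, status, latency_ms, rows_inserted=0, body=""):
    return CrawlEvent(source, status, latency_ms, rows_inserted,
                      classify(status, body))

event = make_event("nps", 429, 210)
params = astuple(event)  # e.g. await conn.execute(INSERT_SQL, *params)
print(params)
```

Storing a coarse `error_type` alongside the raw status makes it possible to count bans separately from ordinary failures later.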

4. Aggregation

SELECT source,
  count(*) FILTER (WHERE status = 200) * 100.0 / NULLIF(count(*), 0) AS success_rate,
  count(*) AS total,
  avg(latency_ms) AS avg_latency
FROM crawl_events
WHERE created_at > now() - interval '24 hours'
GROUP BY source;

5. Alerts

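import os
import aiohttp

async def send_slack(text):
    webhook = os.environ["SLACK_WEBHOOK_URL"]
    async with aiohttp.ClientSession() as s:
        await s.post(webhook, json={"text": text})

async def alert_if_unhealthy(source, success_rate):
    # success_rate is a percentage (0-100), matching the aggregation query in section 4.
    if success_rate < 80:
        await send_slack(f"⚠️ {source} success {success_rate:.1f}% (24h)")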

6. Alert hygiene

  • Don't alert on everything; alert only on what someone has to act on
  • Suppress repeats of the same alert
  • Separate INFO/WARN/CRITICAL
  • Page after hours for CRITICAL only
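
Two of these rules are easy to sketch in code: suppressing repeats with a per-key cooldown, and deciding by severity whether to page. The class `AlertSuppressor`, the function `should_page`, the one-hour default cooldown, and the rule that WARN only notifies during working hours are all assumptions for illustration.

```python
import time

class AlertSuppressor:
    """Allow at most one alert per key per cooldown window."""

    def __init__(self, cooldown_s=3600, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so the logic can be tested
        self._last_sent = {}

    def should_send(self, key):
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False            # same alert fired recently: stay quiet
        self._last_sent[key] = now
        return True

def should_page(severity, after_hours):
    """Assumed policy: CRITICAL always pages, WARN only in working hours."""
    if severity == "CRITICAL":
        return True
    if severity == "WARN":
        return not after_hours
    return False                    # INFO never pages

suppressor = AlertSuppressor(cooldown_s=3600)
if suppressor.should_send("nps:low_success") and should_page("WARN", after_hours=False):
    print("would send alert")
```

Keying the cooldown on something like source plus alert type lets a new problem on another source through while the first one is still muted.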

7. Daily summary

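# Assuming APScheduler 3.x; fetch_yesterday_stats and format_report are helpers not shown here.
from apscheduler.schedulers.asyncio import AsyncIOScheduler

scheduler = AsyncIOScheduler()

@scheduler.scheduled_job("cron", hour=9, minute=0, timezone="Asia/Seoul")
async def daily_summary():
    stats = await fetch_yesterday_stats()
    await send_slack(format_report(stats))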

A one-liner every morning catches regressions early.
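
The job above calls `format_report` without showing it. One possible shape, as a sketch only: it assumes `stats` is a list of dicts with the keys `source`, `success_rate` (0-100), `total`, and `rows`, which is a guess at what `fetch_yesterday_stats` returns rather than something the course defines.

```python
def format_report(stats):
    """Render per-source stats as a short Slack message, worst source first."""
    lines = []
    for s in sorted(stats, key=lambda s: s["success_rate"]):
        flag = "⚠️" if s["success_rate"] < 80 else "✅"
        lines.append(
            f"{flag} {s['source']}: {s['success_rate']:.1f}% ok, "
            f"{s['total']} requests, {s['rows']} rows"
        )
    return "Crawler summary (yesterday)\n" + "\n".join(lines)

# Hypothetical numbers for two sources.
example = [
    {"source": "nps", "success_rate": 99.2, "total": 1200, "rows": 5400},
    {"source": "dart", "success_rate": 61.5, "total": 800, "rows": 310},
]
report = format_report(example)
print(report)
```

Sorting the worst source to the top means the line that needs attention is visible without expanding the message.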

8. Prometheus + Grafana (optional)

from prometheus_client import Counter, Histogram

fetch_total = Counter("crawler_fetch_total", "requests", ["source", "status"])
fetch_latency = Histogram("crawler_fetch_latency_seconds", "latency", ["source"])

async def fetch(session, source, url):
    # Time the request, then count it by source and HTTP status.
    with fetch_latency.labels(source=source).time():
        resp = await session.get(url)
    fetch_total.labels(source=source, status=str(resp.status)).inc()
    return resp

Worth it only with many crawlers.

9. Sentry

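import os
import sentry_sdk

sentry_sdk.init(dsn=os.environ["SENTRY_DSN"])

async def run_job():
    try:
        await crawl_job()
    except Exception as e:
        # Report to Sentry, then re-raise so the caller still sees the failure.
        sentry_sdk.capture_exception(e)
        raise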

10. Healthcheck

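from datetime import datetime, timedelta, timezone
from fastapi import HTTPException

# `app` (a FastAPI instance) and `db` (e.g. an asyncpg pool) are assumed to exist elsewhere.
@app.get("/health/crawler")
async def health():
    last = await db.fetchval("SELECT MAX(created_at) FROM crawl_events WHERE status = 200")
    # `last` is None when no fetch has ever succeeded; treat that as stale too.
    if last is None or datetime.now(timezone.utc) - last > timedelta(hours=25):
        raise HTTPException(503, "crawler stale")
    return {"status": "ok", "last_success": last}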

External uptime monitors poll this endpoint.

11. Gotchas

  • Alert fatigue
  • No alerts → silent failures
  • Paging on INFO → disturbed sleep
  • Only watching errors → miss gradual degradation
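
The last gotcha is worth a concrete check: a crawler can return 200 on every request and still ingest far fewer rows than usual. A minimal sketch that compares today's row count with a trailing median; the function name `volume_dropped`, the 0.7 ratio, and the 7-day baseline are assumed values to tune per source.

```python
from statistics import median

def volume_dropped(daily_rows, drop_ratio=0.7, baseline_days=7):
    """True if the latest day's rows fell below drop_ratio times the trailing median.

    daily_rows: row counts ordered oldest to newest; the last entry is today.
    """
    if len(daily_rows) < baseline_days + 1:
        return False  # not enough history to judge
    *history, today = daily_rows
    baseline = median(history[-baseline_days:])
    return baseline > 0 and today < drop_ratio * baseline

# Hypothetical week of steady volume followed by a quiet drop.
rows = [1000, 1020, 990, 1010, 1005, 995, 1000, 650]
print(volume_dropped(rows))
```

A short baseline follows a slow decline downward, so a longer `baseline_days` is the safer choice when the worry is drift over weeks rather than a sudden drop.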

Closing

A one-line Slack summary is often the most valuable dashboard you'll ever build.

Next

  • security/06-headers-and-cors
  • quality/03-observability-minimal
