Published 2026-04-28 · Updated 2026-05-18

Where Kafka Fits, Where It Does Not

Kafka is often called a "queue," but more precisely it is a distributed commit log. Its strengths, such as retention, replay, and many independent readers, go beyond what a queue offers. For a simple work queue, however, it is overkill.

1. About Kafka

Kafka is a distributed messaging and log system that began at LinkedIn. It started as an internal project in 2010, was incubated by the Apache Software Foundation in 2011, and became a top-level project in 2012. The 1.0 release came in 2017.

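| Event | Year / version |
|---|---|
| Internal development begins (LinkedIn) | 2010 |
| Apache incubation | 2011 |
| Apache top-level project | 2012 |
| Kafka Streams introduced | 0.10 (2016) |
| Exactly-once semantics | 0.11 (2017) |
| 1.0 GA | 2017-11 |
| KRaft (ZooKeeper-free mode) production-ready | 3.3 (2022) |
| ZooKeeper mode deprecated, KRaft recommended | 3.5 (2023) |
| ZooKeeper removed, KRaft only | 4.0 (2025) |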

The design goals from the start were high throughput, retention that grows by adding disk, and the ability to reprocess data. Some describe Kafka not as a generalization of queues but as the discovery of the distributed log as a building block.

2. Topic, partition, consumer group

  • Topic — a logical channel for messages.
  • Partition — the unit of parallelism and distribution: a topic is split into partitions that are spread across brokers and consumed in parallel. Order is guaranteed only within a partition.
  • Message — key, value, headers, offset. For keyed messages, the default partitioner picks the partition from a hash of the key.
  • Offset — the position of a message within a partition. Consumers commit offsets to record their progress.
  • Consumer group — consumers in the same group split the partitions among themselves. Within a group, each partition is assigned to only one consumer at a time.
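Here is a minimal sketch of the key-to-partition and group-assignment ideas above. It assumes a topic with 3 partitions and a group of two hypothetical consumers. Kafka's default Java partitioner hashes the key bytes with murmur2. `zlib.crc32` is a stand-in here, so the concrete partition numbers will not match a real cluster. The round-robin `assign` is also a simplified stand-in for Kafka's pluggable assignors (range, round-robin, cooperative-sticky).

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from the key hash (crc32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin partitions over the consumers of one group.

    Each partition gets exactly one owner; a consumer may own several.
    """
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


NUM_PARTITIONS = 3
events = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"), ("user-1", "logout")]

# Append each event to the partition chosen by its key.
log: dict[int, list[tuple[str, str]]] = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    log[partition_for(key, NUM_PARTITIONS)].append((key, value))

# All user-1 events land in the same partition, in the order they were sent.
p1 = partition_for("user-1", NUM_PARTITIONS)
print([v for k, v in log[p1] if k == "user-1"])  # ['login', 'click', 'logout']

print(assign(list(range(NUM_PARTITIONS)), ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 2], 'consumer-b': [1]}
```

Per-key order holds because the hash is deterministic. Events for different keys may share a partition or not, and nothing orders them across partitions.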

Thanks to this model, different groups can read the same topic, each at its own pace. In a queue, a message is gone once consumed. Kafka instead keeps messages on disk until retention expires, whether or not they have been read.
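To make the contrast with a queue concrete, here is a toy in-memory log. It models a single partition with no retention, and two hypothetical consumer groups each keep their own committed offset. Reading removes nothing, so a fast group and a slow group see the same messages, and a group can rewind its offset to replay. The class and method names are illustrative, not a client API.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy single-partition log: messages are appended and never removed on read."""
    messages: list[str] = field(default_factory=list)
    committed: dict[str, int] = field(default_factory=dict)  # group -> next offset to read

    def append(self, value: str) -> int:
        self.messages.append(value)
        return len(self.messages) - 1  # offset of the new message

    def poll(self, group: str, max_records: int) -> list[str]:
        start = self.committed.get(group, 0)
        batch = self.messages[start:start + max_records]
        self.committed[group] = start + len(batch)  # commit right after reading (toy)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Move a group's position, e.g. back to an earlier offset to reprocess."""
        self.committed[group] = offset


log = PartitionLog()
for v in ["a", "b", "c", "d"]:
    log.append(v)

print(log.poll("analytics", 4))   # ['a', 'b', 'c', 'd']
print(log.poll("billing", 2))     # ['a', 'b']  (independent progress)
print(log.poll("billing", 2))     # ['c', 'd']
log.seek("analytics", 1)
print(log.poll("analytics", 10))  # ['b', 'c', 'd']  (replay)
```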

3. Retention policies

  • Time-based (retention.ms) — for example, 7 days.
  • Size-based (retention.bytes) — for example, 100 GB. The limit applies per partition, not per topic.
  • Compaction (cleanup.policy=compact) — keeps only the latest value per key; used for key-value snapshot topics.
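The time-based and compaction policies can be sketched over a list of (timestamp, key, value) records. This is a simplification with illustrative names. Real brokers delete or compact whole log segments in the background and never touch the active segment, so old records can linger past retention.ms. Compaction also keeps each surviving record at its original offset rather than rebuilding the log.

```python
from typing import Optional

Record = tuple[int, str, Optional[str]]  # (timestamp_ms, key, value); value None = tombstone


def apply_time_retention(records: list[Record], now_ms: int, retention_ms: int) -> list[Record]:
    """Keep only records younger than retention_ms (time-based deletion)."""
    return [r for r in records if now_ms - r[0] < retention_ms]


def compact(records: list[Record]) -> list[Record]:
    """Keep the latest record per key, in log order (cleanup.policy=compact).

    Tombstones (value None) are dropped immediately here. A real broker keeps
    them for delete.retention.ms first, so consumers can observe the delete.
    """
    latest_index = {key: i for i, (_, key, _) in enumerate(records)}
    return [r for i, r in enumerate(records) if latest_index[r[1]] == i and r[2] is not None]


DAY_MS = 24 * 60 * 60 * 1000
records: list[Record] = [
    (0 * DAY_MS, "user-1", "v1"),
    (1 * DAY_MS, "user-2", "v1"),
    (8 * DAY_MS, "user-1", "v2"),
    (9 * DAY_MS, "user-2", None),  # tombstone: delete user-2
]

print(apply_time_retention(records, now_ms=10 * DAY_MS, retention_ms=7 * DAY_MS))
# [(691200000, 'user-1', 'v2'), (777600000, 'user-2', None)]
print(compact(records))
# [(691200000, 'user-1', 'v2')]
```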

4. Delivery guarantees

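| Guarantee | Configuration |
|---|---|
| at-most-once | Producer does not wait for acknowledgment (acks=0); consumer commits offsets before processing (auto-commit can have this effect). Loss possible. |
| at-least-once | The usual default. Producer acks=all with retries; consumer commits manually after processing. Duplicates possible. |
| exactly-once | Idempotent, transactional producer (enable.idempotence=true, transactional.id) plus consumer isolation.level=read_committed. Holds only within Kafka topics. With external systems, idempotent consumers are still recommended. |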

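The core settings are acks and enable.idempotence on the producer and isolation.level on the consumer. Commit timing (enable.auto.commit or manual commits) decides between the first two rows.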
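The difference between the first two rows comes down to whether the consumer commits before or after processing. The toy simulation below is not a Kafka client. It replays a short partition, injects one crash between the commit step and the processing step (`crash_at` is a hypothetical fault index), and then restarts from the last committed offset.

```python
def run(commit_first: bool, messages: list[str], crash_at: int) -> list[str]:
    """Consume messages in order, crashing once between the two steps for offset crash_at.

    commit_first=True:  commit the offset, then process   (at-most-once)
    commit_first=False: process, then commit the offset   (at-least-once)
    After the crash the consumer restarts from its last committed offset.
    Returns what the downstream system saw.
    """
    committed = 0                  # next offset to read after a restart
    processed: list[str] = []
    crashed = False
    offset = 0
    while offset < len(messages):
        # Step 1
        if commit_first:
            committed = offset + 1
        else:
            processed.append(messages[offset])
        # Crash between step 1 and step 2, once.
        if offset == crash_at and not crashed:
            crashed = True
            offset = committed
            continue
        # Step 2
        if commit_first:
            processed.append(messages[offset])
        else:
            committed = offset + 1
        offset += 1
    return processed


msgs = ["m0", "m1", "m2"]
print(run(commit_first=True, messages=msgs, crash_at=1))   # ['m0', 'm2']  m1 lost
print(run(commit_first=False, messages=msgs, crash_at=1))  # ['m0', 'm1', 'm1', 'm2']  m1 duplicated
```

You can make the downstream write idempotent, for example by deduplicating on a message ID. The at-least-once duplicate then becomes a no-op, which is why at-least-once is usually paired with an idempotent consumer.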

5. Storage, replication, KRaft

Each partition is replicated as one leader plus followers, and replication.factor is usually 3. Replicas in the ISR (In-Sync Replicas) are those that have caught up with the leader. If the leader fails, a new leader is elected from the ISR.
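Below is a toy model of these replication rules, assuming replication.factor=3. It also uses min.insync.replicas, a real broker/topic setting not covered above: an acks=all write is rejected once the ISR shrinks below it. The broker default is 1, and 2 is a common choice alongside replication.factor=3. The broker IDs are illustrative. Picking the first surviving ISR member in assignment order is a simplification of what the controller does.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    replicas: list[int]           # broker IDs in assignment order; len == replication.factor
    isr: set[int]                 # in-sync replicas (always contains the leader)
    leader: int
    min_insync_replicas: int = 2  # common with replication.factor=3 (broker default is 1)

    def can_accept_acks_all(self) -> bool:
        """An acks=all write is rejected once the ISR shrinks below min.insync.replicas."""
        return len(self.isr) >= self.min_insync_replicas

    def broker_failed(self, broker: int) -> None:
        """Remove a failed broker from the ISR; if it was the leader, elect one from the ISR."""
        self.isr.discard(broker)
        if broker == self.leader:
            candidates = [b for b in self.replicas if b in self.isr]
            if not candidates:
                # With unclean leader election disabled (the default), the partition goes offline.
                raise RuntimeError("no in-sync replica left; partition offline")
            self.leader = candidates[0]


p = Partition(replicas=[1, 2, 3], isr={1, 2, 3}, leader=1)
p.broker_failed(1)
print(p.leader, sorted(p.isr), p.can_accept_acks_all())  # 2 [2, 3] True
p.broker_failed(3)
print(p.leader, sorted(p.isr), p.can_accept_acks_all())  # 2 [2] False
```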

Metadata management long relied on ZooKeeper. KRaft, based on the Raft consensus algorithm, lets Kafka manage its metadata with its own controller nodes and no ZooKeeper; it became production-ready in 3.3 (2022). Many report a smaller operational surface.

6. Where Kafka is strong

  • Event sourcing and CDC — preservation and replay of every change.
  • Places where multiple consumers read the same stream at different speeds — publish once, consume by many groups.
  • High-throughput log collection — hundreds of thousands of messages per second.
  • Entry to real-time analytics — Flink, Spark Streaming, Kafka Streams.
  • Backfill and reprocessing through message retention.

7. Where Kafka is overkill

  • Simple work queues (email sending, background processing) — RabbitMQ, Redis, SQS are simpler.
  • Short TTL, low throughput — Kafka's operational cost is not justified.
  • Workflows where humans want to look at each task — Airflow-family tools fit better.

8. Other candidates

| System | Origin and year | Model | Notes |
|---|---|---|---|
| RabbitMQ | 2007, AMQP 0-9-1 based | Queues, exchanges, routing | Flexible routing, round-robin delivery, DLQ. Message persistence and retention are not on Kafka's level. |
| NATS | 2010, Derek Collison | pub/sub, JetStream | Lightweight, low-latency. JetStream (2020) added persistence. |
| Redis Streams | 2018, Redis 5.0 | Log + consumer groups | Resembles a scaled-down Kafka. Fits small data volumes. |
| AWS SQS | 2006 | Simple queue | Managed. FIFO queue option. Single message ≤ 256 KB. |
| AWS Kinesis | 2013 | Stream | Managed, with a model similar to Kafka. Retention from 24 hours to 365 days. |
| Google Pub/Sub | 2015 | pub/sub | Managed. Auto-scaling. Ordering option. |
| Apache Pulsar | 2016, Yahoo (open-sourced) | Layered (brokers + BookKeeper bookies) | Emphasizes multi-tenancy and geo-replication. |

In practice, the decision usually comes down to one or two of the following.

  • Data retention duration (minutes or days).
  • Throughput (tens or hundreds of thousands of messages per second).
  • Availability of a managed offering.
  • Whether routing and filtering are complex (RabbitMQ excels here).
  • Whether multiple consumers read one topic at different speeds (Kafka-style models fit).

9. Topics, consumers, operations

Topic naming — the format <domain>.<entity>.<event> is common (for example, orders.created). Separate environments by prefix or by separate cluster. Manage schemas with a Schema Registry (Avro, Protobuf, JSON Schema).
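Here is a small check for this naming convention. The environment prefix and the exact pattern are assumptions for illustration. The entity part is treated as optional so that the two-part example orders.created also passes. The hard limits come from Kafka itself: topic names may contain only ASCII letters, digits, '.', '_' and '-', up to 249 characters, and "." and ".." are not allowed.

```python
import re

# Kafka's own limits: ASCII letters, digits, '.', '_' and '-', at most 249 characters.
KAFKA_LEGAL = re.compile(r"[A-Za-z0-9._-]{1,249}")

# House convention (an assumption for this sketch): optional environment prefix,
# then <domain>.<entity>.<event>, with the entity optional as in "orders.created".
PART = r"[a-z][a-z0-9-]*"
CONVENTION = re.compile(rf"(?:(?:dev|staging|prod)\.)?{PART}(?:\.{PART}){{1,2}}")


def check_topic(name: str) -> list[str]:
    """Return the problems found in a topic name; an empty list means it passes."""
    problems = []
    if name in (".", ".."):
        problems.append("'.' and '..' are reserved")
    if not KAFKA_LEGAL.fullmatch(name):
        problems.append("illegal characters or longer than 249 characters")
    if not CONVENTION.fullmatch(name):
        problems.append("does not follow [env.]<domain>.[<entity>.]<event>")
    return problems


print(check_topic("orders.created"))             # []
print(check_topic("prod.orders.order.created"))  # []
print(check_topic("Orders Created"))             # two problems: illegal chars, convention
```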

Consumer design — idempotent processing is the baseline. A DLQ (Dead Letter Queue) is a separate topic that receives messages which keep failing. For transient failures in external dependencies (e.g., API 5xx responses), combine retries with backoff, and fall back to the DLQ once retries are exhausted.
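Here is a minimal sketch of that consumer-side policy, with illustrative names throughout. `handle` is a hypothetical message handler, and `TransientError` marks retryable failures such as a 5xx from an upstream API. The DLQ is a plain list standing in for a separate topic. Backoff is exponential with full jitter. Sending non-retryable errors straight to the DLQ is one common design choice, not the only one.

```python
import random
import time
from typing import Callable


class TransientError(Exception):
    """Retryable failure, e.g. an upstream API answered 5xx (assumed classification)."""


def process_with_retry(
    message: str,
    handle: Callable[[str], None],
    dlq: list[str],
    max_attempts: int = 4,
    base_delay_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run handle(message) with exponential backoff; dead-letter it when retries run out.

    Returns True if handled, False if the message went to the DLQ. Errors other
    than TransientError are dead-lettered immediately, since retrying would only
    repeat them.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            handle(message)
            return True
        except TransientError:
            if attempt == max_attempts:
                break
            # Full jitter: wait a random time between 0 and base * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
        except Exception:
            break
    dlq.append(message)  # stands in for producing to a DLQ topic
    return False


calls = {"n": 0}


def flaky(msg: str) -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("503 from upstream")


def broken(msg: str) -> None:
    raise ValueError("malformed payload")


def no_wait(_: float) -> None:
    """Skip real sleeping in this demo."""


dlq: list[str] = []
print(process_with_retry("order-1", flaky, dlq, sleep=no_wait))   # True (succeeds on attempt 3)
print(process_with_retry("order-2", broken, dlq, sleep=no_wait))  # False
print(dlq)                                                        # ['order-2']
```

Long in-line retries block the whole partition. They can also exceed the consumer's max.poll.interval.ms, which makes the consumer leave the group and triggers a rebalance. For that reason, longer waits are often moved to a separate retry topic.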

Partition count caps both throughput and the number of active consumers in a group; consumers beyond the partition count sit idle. If the initial count is too small, it can be increased later, but the key-to-partition mapping changes and ordering assumptions can break.
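You can see the ordering risk by counting how many keys change partition when a topic grows. This reuses the crc32 stand-in for Kafka's murmur2 partitioner with 1,000 hypothetical keys, so the exact count is illustrative. The fraction can still be predicted. A key with hash h keeps its partition going from 6 to 8 partitions only if h mod 6 equals h mod 8. That holds exactly when h mod 24 < 6, so with a uniform hash about three quarters of keys move.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # crc32 stands in for the murmur2 hash used by Kafka's default Java partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"user-{i}" for i in range(1000)]  # hypothetical keys
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 8)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition after 6 -> 8")
```

Any key in `moved` can have older messages in its old partition and newer ones in its new partition. A consumer can then process that key's messages out of order around the change.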

Monitoring — consumer lag (how far a group's committed offset trails the end of the partition log), message rate, replication lag.
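Consumer lag is computed per partition: the log end offset minus the group's committed offset. Summing over partitions gives the group total. The numbers below are a made-up snapshot for illustration. In practice they come from the broker, for example via `kafka-consumer-groups.sh --describe --group <name>` (which shows CURRENT-OFFSET, LOG-END-OFFSET and LAG columns), or from a metrics exporter.

```python
def consumer_lag(log_end: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Lag per partition = log end offset - committed offset, never below 0.

    A partition with no committed offset is counted from offset 0 here, which is
    a simplification; real behavior then depends on auto.offset.reset.
    """
    return {p: max(end - committed.get(p, 0), 0) for p, end in log_end.items()}


# Hypothetical snapshot for one group on a 3-partition topic.
log_end = {0: 1_500, 1: 1_480, 2: 2_900}
committed = {0: 1_495, 1: 1_480, 2: 1_200}

lag = consumer_lag(log_end, committed)
print(lag)                # {0: 5, 1: 0, 2: 1700}
print(sum(lag.values()))  # 1705
```

A single hot partition (partition 2 here) often points to key skew or one slow message rather than too few consumers. That is why per-partition lag tells you more than the total.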

The practical patterns of topic design and Producer/Consumer implementation are covered in the kafka-topics note.

10. Common pitfalls

Order assumption — order is guaranteed within a partition, not across the topic. With multiple partitions there is no global order.

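Changing partition count — the count can be increased but not decreased, and increasing it changes the key → partition mapping. New messages for an existing key may land in a different partition than that key's earlier messages, so per-key ordering breaks across the change. This can lead to operational incidents.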

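Consumer group rebalancing — partitions are reassigned when a consumer joins or leaves the group. Under the classic eager protocol, every consumer gives up its partitions and processing pauses. Cooperative rebalancing revokes only the partitions that actually move, which shortens the pause.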
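The difference between the two protocols shows up in which partitions are revoked when a consumer joins. The sketch compares assignments before and after a third consumer joins a 6-partition topic. `sticky_assign` is a simplified stand-in written for this example, not Kafka's CooperativeStickyAssignor. The point is only that eager rebalancing revokes everything, while cooperative rebalancing revokes just the partitions whose owner changes.

```python
def sticky_assign(partitions: list[int], consumers: list[str],
                  previous: dict[str, list[int]]) -> dict[str, list[int]]:
    """Balance partitions over consumers, keeping previous owners where possible."""
    quota, extra = divmod(len(partitions), len(consumers))
    # The first `extra` consumers may hold one partition more than the rest.
    target = {c: quota + (1 if i < extra else 0) for i, c in enumerate(consumers)}
    owner = {p: c for c, ps in previous.items() for p in ps}
    result: dict[str, list[int]] = {c: [] for c in consumers}
    unassigned: list[int] = []
    for p in partitions:
        c = owner.get(p)
        if c in result and len(result[c]) < target[c]:
            result[c].append(p)  # the previous owner keeps it
        else:
            unassigned.append(p)
    for p in unassigned:
        # Give each remaining partition to the consumer furthest below its target.
        c = min(consumers, key=lambda name: len(result[name]) - target[name])
        result[c].append(p)
    return result


def revoked(previous: dict[str, list[int]], new: dict[str, list[int]],
            cooperative: bool) -> dict[str, list[int]]:
    """Partitions each existing consumer must stop processing during the rebalance."""
    if not cooperative:
        # Eager: every consumer gives up everything, even partitions it will get back.
        return {c: sorted(ps) for c, ps in previous.items()}
    # Cooperative: only partitions whose owner changes are revoked.
    return {c: sorted(set(ps) - set(new.get(c, []))) for c, ps in previous.items()}


before = {"a": [0, 1, 2], "b": [3, 4, 5]}
after = sticky_assign(list(range(6)), ["a", "b", "c"], before)
print(after)                                      # {'a': [0, 1], 'b': [3, 4], 'c': [2, 5]}
print(revoked(before, after, cooperative=False))  # {'a': [0, 1, 2], 'b': [3, 4, 5]}
print(revoked(before, after, cooperative=True))   # {'a': [2], 'b': [5]}
```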

Scope of exactly-once — only within Kafka. Consumers writing to an external DB still need idempotent design.
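One common way to make the external write idempotent is to store the consumer's position in the same database transaction as the result. Anything below that position is skipped on redelivery. Below is a sketch with the standard-library sqlite3 module. The table names and the (topic, partition, offset) arguments are illustrative, standing in for records from a real consumer. On startup, a real consumer would read the stored offsets and seek to them.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount INTEGER);
    CREATE TABLE consumer_offsets (
        topic TEXT, partition_id INTEGER, next_offset INTEGER,
        PRIMARY KEY (topic, partition_id)
    );
    """
)


def apply_record(topic: str, partition: int, offset: int, order_id: str, amount: int) -> bool:
    """Write the result and the consumer position in one transaction.

    Returns False when the record was already applied (an at-least-once redelivery).
    """
    with db:  # commits on success, rolls back if anything raises
        row = db.execute(
            "SELECT next_offset FROM consumer_offsets WHERE topic = ? AND partition_id = ?",
            (topic, partition),
        ).fetchone()
        if row is not None and offset < row[0]:
            return False
        db.execute("INSERT INTO orders (order_id, amount) VALUES (?, ?)", (order_id, amount))
        db.execute(
            "INSERT OR REPLACE INTO consumer_offsets (topic, partition_id, next_offset) "
            "VALUES (?, ?, ?)",
            (topic, partition, offset + 1),
        )
    return True


print(apply_record("orders.created", 0, 41, "o-1", 100))  # True
print(apply_record("orders.created", 0, 41, "o-1", 100))  # False: redelivery skipped
print(db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1
```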

Operational resources — running Kafka yourself, without a managed offering, is a heavy load for a small team. Consider managed offerings such as Confluent Cloud, Amazon MSK, or Aiven.

Closing thoughts

Kafka is not always the answer to "do we need a queue?" It shines where retention, reprocessing, and multiple independent consumers truly matter. For small teams, starting with Redis Streams or RabbitMQ and growing from there is operationally safer.

