Building a real-time geospatial data platform (QuackNet) for crowdsourced network intelligence, I faced a common solo-dev dilemma: how to handle high-volume event ingestion without ops complexity?
My first instinct was to skip Kafka entirely—Redis Streams or NATS would be simpler. But after stress-testing the architecture, I realized I needed 100+ concurrent writers, fault-tolerant persistence, and partitioned parallelism. That's Kafka's job. The problem: traditional Kafka with ZooKeeper is overkill for one developer.
Enter KRaft mode: Kafka's new consensus protocol that eliminates ZooKeeper. Single-process broker+controller, one configuration file, and it just works. Here's what I learned shipping it.
Why Kafka at All?
The QuackNet pipeline is simple in theory:
- Mobile app submits WiFi/cellular/BLE/GPS scan data
- Backend API writes scan events to Kafka
- Consumer group reads events and batches them into ClickHouse (OLAP database)
- Frontend queries ClickHouse for analytics dashboards
Without Kafka, I'd either poll PostgreSQL constantly (wasteful) or have the API block until ClickHouse writes complete (slow, defeats the purpose of async).
Kafka lets the API fire-and-forget: write to Kafka in milliseconds, let the consumer do batch inserts into ClickHouse in parallel. If the consumer crashes, Kafka's topic partitions have the data—no events lost.
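To make the "batch inserts" half of that concrete, here is a minimal sketch of a size-or-time batcher in Go. It is not QuackNet's actual consumer: `ScanEvent`, `BatchEvents`, and the `flush` callback are hypothetical names, the input channel stands in for messages already decoded from Kafka, and the flush function is where a ClickHouse bulk insert would presumably go.

```go
package main

import (
	"fmt"
	"time"
)

// ScanEvent is a hypothetical stand-in for a decoded Kafka message.
type ScanEvent struct {
	DeviceID string
	Payload  string
}

// BatchEvents reads from in and calls flush when maxSize events have
// accumulated or when the maxWait ticker fires, whichever comes first.
// It flushes any remainder when in is closed.
func BatchEvents(in <-chan ScanEvent, maxSize int, maxWait time.Duration, flush func([]ScanEvent)) {
	batch := make([]ScanEvent, 0, maxSize)
	ticker := time.NewTicker(maxWait)
	defer ticker.Stop()

	emit := func() {
		if len(batch) == 0 {
			return
		}
		flush(batch)
		// Start a fresh slice because flush may keep a reference to the old one.
		batch = make([]ScanEvent, 0, maxSize)
	}

	for {
		select {
		case ev, ok := <-in:
			if !ok {
				emit() // input closed: flush what is left and stop
				return
			}
			batch = append(batch, ev)
			if len(batch) >= maxSize {
				emit()
				ticker.Reset(maxWait)
			}
		case <-ticker.C:
			emit()
		}
	}
}

func main() {
	in := make(chan ScanEvent)
	done := make(chan struct{})
	go func() {
		defer close(done)
		BatchEvents(in, 3, time.Second, func(b []ScanEvent) {
			fmt.Printf("flushing %d events\n", len(b))
		})
	}()
	for i := 0; i < 7; i++ {
		in <- ScanEvent{DeviceID: fmt.Sprintf("dev-%d", i)}
	}
	close(in)
	<-done
}
```

The size limit keeps inserts efficient under load, and the timer keeps latency bounded when traffic is light. In a real consumer you would commit Kafka offsets only after a flush succeeds, so a crash replays the unflushed events instead of losing them.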
Also: multiple consumers. The same scan event feeds analytics, fraud detection, and push alerts. Each consumer group tracks its own offsets, so every group reads the full stream independently. No code duplication.
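Several consumers can read the same events because each Kafka consumer group keeps its own position in the topic. The toy model below illustrates just that idea. It is an in-memory sketch, not a Kafka client, and the `Log` type and group names are made up for illustration.

```go
package main

import "fmt"

// Log is a toy, in-memory stand-in for a single topic partition.
// It only models per-group offsets; it is NOT a Kafka client.
type Log struct {
	events  []string
	offsets map[string]int // consumer group name -> next offset to read
}

// NewLog returns an empty log with no registered groups.
func NewLog() *Log {
	return &Log{offsets: make(map[string]int)}
}

// Append adds an event to the end of the log.
func (l *Log) Append(event string) {
	l.events = append(l.events, event)
}

// Poll returns every event the given group has not read yet and advances
// only that group's offset. Other groups are unaffected.
func (l *Log) Poll(group string) []string {
	start := l.offsets[group]
	unread := append([]string(nil), l.events[start:]...)
	l.offsets[group] = len(l.events)
	return unread
}

func main() {
	scans := NewLog()
	scans.Append("scan-1")
	scans.Append("scan-2")

	fmt.Println("analytics:", scans.Poll("analytics"))
	fmt.Println("fraud:", scans.Poll("fraud-detection"))

	scans.Append("scan-3")
	fmt.Println("analytics:", scans.Poll("analytics"))
	fmt.Println("alerts:", scans.Poll("push-alerts"))
}
```

A group that joins late (here, the hypothetical push-alerts group) still sees every retained event, which is the same property that makes replay possible.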
The KRaft Trade-Off
Traditional Kafka setup:
- ZooKeeper cluster (3+ nodes for HA)
- Kafka broker nodes
- Monitoring, inter-node replication, metadata quorum
- If you mess it up, ZooKeeper becomes a black box
KRaft mode:
- Single broker+controller process (or multiple for HA)
- No external dependencies
- Raft consensus built into Kafka itself
- Simpler: one Docker container, one port
Trade-off: KRaft is newer (production-ready for new clusters since Kafka 3.3), fewer operators know it well, and edge cases may exist. For a solo project? Perfect. For Airbnb's infrastructure? Maybe stick with ZooKeeper.
Docker Compose Setup (What Actually Works)
I tried bitnami/kafka first. Broke immediately. The bitnami image uses confusing env var prefixes and old Kafka versions. After 2 hours of debugging, I switched to apache/kafka:3.8.1 (official). That worked.
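```yaml
version: '3.8'
services:
  kafka:
    image: apache/kafka:3.8.1
    ports:
      - "9094:9094" # external listener for clients on the host
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: "broker,controller"
      # Three listeners: in-network clients (9092), host clients (9094),
      # and the KRaft controller (9093).
      KAFKA_LISTENERS: "PLAINTEXT://:9092,EXTERNAL://:9094,CONTROLLER://:9093"
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://kafka:9092,EXTERNAL://localhost:9094"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,EXTERNAL:PLAINTEXT"
      KAFKA_CONTROLLER_LISTENER_NAMES: "CONTROLLER"
      KAFKA_INTER_BROKER_LISTENER_NAME: "PLAINTEXT"
      KAFKA_CONTROLLER_QUORUM_VOTERS: "1@kafka:9093"
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      KAFKA_LOG_RETENTION_HOURS: 168
      # Point the log directory at the mounted volume so data survives restarts.
      KAFKA_LOG_DIRS: "/var/lib/kafka/data"
    volumes:
      - kafka-data:/var/lib/kafka/data
volumes:
  kafka-data:
```
Key settings:
- KAFKA_NODE_ID: 1 is the node ID. It must be unique within the cluster.
- KAFKA_PROCESS_ROLES: "broker,controller" is the magic. A single process plays both roles.
- KAFKA_LISTENERS and KAFKA_ADVERTISED_LISTENERS give the controller its own port (9093) and tell clients which address to use: kafka:9092 from other containers, localhost:9094 from the host.
- KAFKA_CONTROLLER_QUORUM_VOTERS: "1@kafka:9093" is a quorum of one (it votes for itself). It must point at the controller listener, not the broker port.
- KAFKA_LOG_RETENTION_HOURS: 168 keeps 7 days of data. Adjust for your scale.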
Spin it up:
```shell
docker compose up -d kafka
docker compose exec kafka /opt/kafka/bin/kafka-topics.sh --create \
  --topic netintel.scan.events \
  --partitions 6 \
  --replication-factor 1 \
  --bootstrap-server localhost:9092
```
Done. Two commands, one container. No ZooKeeper cluster to manage.
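Those six partitions are what allow up to six consumers in a group to work in parallel, and the message key decides which partition an event lands in. The sketch below shows the general idea of key-based partitioning. It assumes scans are keyed by a device ID, which is a hypothetical choice, and it uses FNV-1a as a stand-in hash. Kafka's Java client hashes keys with murmur2, so the partition numbers here will not match what a real producer picks.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// PartitionFor maps a message key to a partition index in [0, numPartitions).
// The same key always maps to the same partition, which preserves per-key order.
func PartitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	const partitions = 6 // matches --partitions 6 on the topic
	for _, deviceID := range []string{"device-a", "device-b", "device-c", "device-a"} {
		fmt.Printf("%s -> partition %d\n", deviceID, PartitionFor(deviceID, partitions))
	}
}
```

Because the mapping depends on the partition count, adding partitions later changes where existing keys land. That is worth keeping in mind before picking a number like six.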
The context.Background() Bug
Here's where I got burned. In my Go API handler, I was writing scans to Kafka asynchronously:
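```go
func (h *Handler) SubmitScan(c *gin.Context) {
	// ... bind and validate the request, marshal it into scanJSON ...
	go func() {
		// Bug: the request context is cancelled once the handler returns,
		// and the resulting error is discarded.
		_ = h.kafkaProducer.Produce(
			c.Request.Context(), // ← WRONG!
			"netintel.scan.events",
			scanJSON,
		)
	}()
	c.JSON(200, gin.H{"status": "accepted"})
}
```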
The problem: c.Request.Context() is cancelled as soon as the handler returns and the response goes out. By the time the goroutine tries to write to Kafka, the context is usually already dead. Because nothing checked the returned error, it was a silent failure: no error log, the event just got dropped.
The fix:
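```go
func (h *Handler) SubmitScan(c *gin.Context) {
	// ... bind and validate the request, marshal it into scanJSON ...
	go func() {
		// Detached from the request via context.Background(), but bounded
		// so a stuck broker cannot hold goroutines forever. The 10s value
		// is an illustrative choice.
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		if err := h.kafkaProducer.Produce(ctx, "netintel.scan.events", scanJSON); err != nil {
			log.Printf("kafka produce failed: %v", err) // never drop this silently
		}
	}()
	c.JSON(200, gin.H{"status": "accepted"})
}
```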
context.Background() is never cancelled. It's the root context. Goroutines spawned with it will complete their writes to Kafka even after the HTTP response goes out.
Rule of thumb: for work that must outlive the request (fire-and-forget writes), use context.Background(). For handler-scoped work (database queries), use c.Request.Context().
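The cancellation behaviour is easy to reproduce with only the standard library. This is a small sketch, not QuackNet code: it uses plain net/http with an httptest server instead of gin, and the function name is made up. It captures the request context inside a handler and inspects it after the response has been received.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// RequestContextErrAfterResponse sends one request through a test server and
// returns the request context's Err() once the handler has returned.
// The second return value reports failures of the HTTP call itself.
func RequestContextErrAfterResponse() (ctxErr error, err error) {
	captured := make(chan context.Context, 1)

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		captured <- r.Context() // what a background goroutine would be holding
		w.WriteHeader(http.StatusAccepted)
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return nil, err
	}
	resp.Body.Close()

	reqCtx := <-captured
	// Give the server a moment to cancel the context after the handler returns.
	select {
	case <-reqCtx.Done():
	case <-time.After(2 * time.Second):
	}
	return reqCtx.Err(), nil
}

func main() {
	ctxErr, err := RequestContextErrAfterResponse()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("request context after response:", ctxErr)
	fmt.Println("background context:", context.Background().Err())
}
```

On Go 1.21 and later, context.WithoutCancel(c.Request.Context()) is another option: it drops cancellation but keeps request-scoped values such as trace IDs.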
When to Use Kafka vs Alternatives
Kafka is powerful but adds complexity. Here's my mental model:
| Criterion | Kafka | Redis/NATS |
|---|---|---|
| Durability (persist to disk?) | Yes, by default | Optional, costly |
| Replay old events? | Yes | Limited (Redis Streams, NATS JetStream) |
| Partitioning / scaling? | Native | Manual sharding |
| Ops complexity? | Higher (KRaft helps) | Much simpler |
| Best for | Multi-consumer analytics pipelines | Task queues, caching |
For QuackNet: we need durability (scan data is valuable), replay capability (re-process events if our ClickHouse pipeline breaks), and multiple consumers (analytics, fraud detection, push alerts). Kafka wins.
If I were just building a job queue? Redis Streams or Bull would be faster to ship.
The Scaling Trap
One gotcha: be careful running Kafka on your laptop if you're doing serious load testing. Kafka's memory footprint is modest, but disk is another story: if you generate millions of events, log segments pile up until retention deletes them. I once filled my 256GB SSD with Kafka logs because I didn't set KAFKA_LOG_RETENTION_HOURS correctly.
Solution:
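- Use named Docker volumes with size limits where your storage driver supports them (or just monitor them)
- Set retention aggressively in dev: KAFKA_LOG_RETENTION_HOURS: 1
- Use KAFKA_LOG_SEGMENT_BYTES: 10485760 (10 MB) so segments roll frequently. Retention only deletes closed segments, so with the default 1 GB segment size a short retention window takes a long time to free anything.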
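A quick back-of-the-envelope calculation shows why retention matters so much on a laptop. The function below is a rough sketch, and the load figures in main are hypothetical, not QuackNet measurements. It ignores compression, index files, and the fact that retention works on whole segments.

```go
package main

import "fmt"

// RetainedBytes estimates how much log data a topic holds at steady state:
// ingest rate x average event size x retention window x replication factor.
func RetainedBytes(eventsPerSec, avgEventBytes, retentionHours, replicationFactor int64) int64 {
	const secondsPerHour = 3600
	return eventsPerSec * avgEventBytes * retentionHours * secondsPerHour * replicationFactor
}

func main() {
	const gb = 1e9 // decimal gigabytes
	// Hypothetical load test: 2,000 events/s at 500 bytes each, replication factor 1.
	week := RetainedBytes(2000, 500, 168, 1)
	hour := RetainedBytes(2000, 500, 1, 1)
	fmt.Printf("168h retention: %.1f GB\n", float64(week)/gb)
	fmt.Printf("1h retention:   %.1f GB\n", float64(hour)/gb)
}
```

With those assumed numbers, seven days of retention needs about 605 GB, while one hour needs about 3.6 GB.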
Is KRaft Production-Ready?
Yes, but with caveats.
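The Apache Kafka project marked KRaft production-ready for new clusters in Kafka 3.3 (October 2022), and it has been running in production for a couple of years now. ZooKeeper mode was deprecated in 3.5 and removed in Kafka 4.0, so KRaft is where the project is headed. But the pool of operators with deep KRaft experience is still smaller than for ZooKeeper-based Kafka.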
For a solo founder with a real-time pipeline? Go for it. For a 100-person SaaS company running 50 Kafka clusters with SLA guarantees? Maybe hire someone who knows ZooKeeper inside-out.
I'm shipping QuackNet with KRaft. When we hit scaling issues (and we will), I'll have time to migrate. For now, it's one less thing to think about.
Conclusion
Kafka KRaft lets solo developers use a battle-tested, production-grade event streaming platform without ops overhead. No ZooKeeper cluster. One Docker image. A handful of environment variables.
The gotchas are real (context cancellation, image selection, config typos), but they're learnable. If you're building an analytics pipeline, fraud detection layer, or real-time dashboard, Kafka is worth the effort.
KRaft makes it accessible. That's a win for indie builders.