Mastering Event-Driven Architecture with Go and Apache Kafka #
In the landscape of modern backend development in 2025, the shift from monolithic, synchronous systems to decoupled, event-driven architectures (EDA) is not just a trend—it’s a necessity for scale. While HTTP REST and gRPC have their place, they introduce tight coupling and latency chains that can cripple high-throughput systems.
Enter the power couple: Go (Golang) and Apache Kafka.
Go’s concurrency model (goroutines and channels) is practically purpose-built for processing streams of data, while Kafka remains the undisputed king of distributed event streaming. In this guide, we are going to move beyond “Hello World.” We will build a robust producer and consumer system, handle graceful shutdowns, discuss library trade-offs, and implement best practices suitable for a production environment.
Why Event-Driven? #
Before writing code, it’s crucial to understand the why.
- Decoupling: Services don’t need to know who (or what) consumes their data.
- Scalability: You can scale consumers horizontally to handle backpressure without touching the producer.
- Resilience: If a consumer goes down, Kafka retains the messages. No data is lost.
Prerequisites & Environment Setup #
To follow this tutorial, ensure you have the following ready on your development machine:
- Go 1.22+: We are utilizing modern Go idioms.
- Docker & Docker Compose: To run a local Kafka cluster without installing Java manually.
- IDE: VS Code (with Go extension) or JetBrains GoLand.
1. Setting up the Kafka Cluster #
We will use a KRaft-mode Kafka setup (no ZooKeeper required) to keep things lightweight and modern. Create a file named docker-compose.yml in your project root:
```yaml
version: "3"
services:
  kafka:
    image: 'bitnami/kafka:latest'
    ports:
      - '9092:9092'
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=true
    volumes:
      - kafka_data:/bitnami/kafka
volumes:
  kafka_data:
    driver: local
```

Run the cluster:

```bash
docker-compose up -d
```

Choosing the Right Go Kafka Library #
One of the first hurdles in the Go ecosystem is choosing a client library. Unlike Java, there isn’t one single “official” client. Here is a breakdown of the top contenders as of late 2025:
| Feature | segmentio/kafka-go | confluent-kafka-go | IBM/sarama |
|---|---|---|---|
| Type | Pure Go | CGO Wrapper (librdkafka) | Pure Go |
| Performance | High | Very High (Native C) | Medium/High |
| API Usability | Idiomatic, standard library feel | Slight C-style overhead | Low-level, complex config |
| Maintenance | Active | Active (Official Confluent) | Active (Community) |
| Best Use Case | Standard Microservices | High-freq Trading / Max Throughput | Legacy systems |
Decision: For this tutorial, we will use segmentio/kafka-go. It is pure Go (no C compiler dependencies), provides a beautiful API that leverages context.Context, and is more than fast enough for 95% of use cases.
Initialize your project:
```bash
mkdir go-kafka-eda
cd go-kafka-eda
go mod init github.com/yourname/go-kafka-eda
go get github.com/segmentio/kafka-go
```

Architecture Overview #
We will simulate a simple E-Commerce scenario:
- Order Service (Producer): Publishes an event when a user places an order.
- Inventory Service (Consumer): Listens for orders to reserve stock.
Step 1: The Producer Implementation #
In production, you rarely want to create a new connection for every message. We need a persistent Writer that handles batching and retries automatically.
Create a file producer/main.go.
```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

// OrderEvent represents the payload we send to Kafka
type OrderEvent struct {
	OrderID   string  `json:"order_id"`
	Customer  string  `json:"customer"`
	Amount    float64 `json:"amount"`
	Timestamp int64   `json:"timestamp"`
}

func main() {
	// 1. Configure the Writer
	// The Writer handles connection pooling, batching, and retries internally.
	writer := &kafka.Writer{
		Addr:                   kafka.TCP("localhost:9092"),
		Topic:                  "orders-topic",
		Balancer:               &kafka.LeastBytes{}, // Distributes messages evenly
		AllowAutoTopicCreation: true,

		// Performance optimization: batch settings
		BatchSize:    100,                   // Flush when 100 messages are ready...
		BatchTimeout: 10 * time.Millisecond, // ...or every 10ms
		Async:        true,                  // Non-blocking writes (careful with error handling)
	}
	defer writer.Close()

	fmt.Println("🚀 Producer started. Generating orders...")

	// 2. Simulate generating orders
	for i := 1; i <= 10; i++ {
		event := OrderEvent{
			OrderID:   fmt.Sprintf("ORD-%d", i),
			Customer:  fmt.Sprintf("User-%d", i),
			Amount:    99.99,
			Timestamp: time.Now().Unix(),
		}

		payload, err := json.Marshal(event)
		if err != nil {
			log.Printf("Failed to marshal event: %v", err)
			continue
		}

		// 3. Write the message
		// We use a context with timeout to prevent hanging if Kafka is down.
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		err = writer.WriteMessages(ctx, kafka.Message{
			Key:   []byte(event.OrderID), // Key ensures ordering for specific orders
			Value: payload,
		})
		cancel() // Good practice to cancel the context

		if err != nil {
			log.Printf("Failed to write message: %v", err)
		} else {
			log.Printf("✅ Sent: %s", event.OrderID)
		}

		time.Sleep(500 * time.Millisecond)
	}
}
```

Key Takeaways for the Producer: #
- Async vs Sync: We enabled `Async: true`. In high-throughput systems, this is vital. However, it means errors might be reported asynchronously via a `Completion` callback (see the sketch after this list) or logged silently. For critical financial data, consider `Async: false`.
- Message Keys: We set `Key: []byte(event.OrderID)`. This ensures that all updates for the same order land in the same Kafka partition, guaranteeing order of processing.
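To make the asynchronous error path concrete, here is a minimal sketch of wiring up the Writer's Completion callback so delivery failures are at least logged instead of vanishing. It reuses the broker address and topic from the producer above; everything else (the hard-coded sample message in particular) is illustrative and not part of the tutorial's producer code.

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	writer := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"),
		Topic: "orders-topic",
		Async: true,
		// Completion is invoked by the Writer for each batch it attempts to
		// deliver; err is non-nil when delivery of that batch failed.
		Completion: func(messages []kafka.Message, err error) {
			if err != nil {
				log.Printf("async delivery failed for %d message(s): %v", len(messages), err)
				return
			}
			log.Printf("async delivery confirmed for %d message(s)", len(messages))
		},
	}
	defer writer.Close() // Close flushes anything still buffered

	// With Async: true this call returns almost immediately; the real
	// delivery outcome arrives later via the Completion callback.
	if err := writer.WriteMessages(context.Background(), kafka.Message{
		Key:   []byte("ORD-1"),
		Value: []byte(`{"order_id":"ORD-1","customer":"User-1"}`),
	}); err != nil {
		log.Printf("failed to enqueue message: %v", err)
	}
}
```

For truly critical writes you would typically pair this with `Async: false` instead, trading throughput for an immediate, synchronous error.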
Step 2: The Consumer Implementation #
Consuming is where the complexity lies. We need to handle context cancellation (for graceful shutdowns) and commit offsets effectively to ensure “at-least-once” delivery.
Create a file consumer/main.go.
```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/segmentio/kafka-go"
)

type OrderEvent struct {
	OrderID   string  `json:"order_id"`
	Customer  string  `json:"customer"`
	Amount    float64 `json:"amount"`
	Timestamp int64   `json:"timestamp"`
}

func main() {
	// 1. Configure the Reader
	// Using a GroupID ensures we are part of a Consumer Group.
	// Kafka handles load balancing if we spin up multiple instances of this app.
	reader := kafka.NewReader(kafka.ReaderConfig{
		Brokers:        []string{"localhost:9092"},
		Topic:          "orders-topic",
		GroupID:        "inventory-service-group",
		MinBytes:       10e3, // 10KB
		MaxBytes:       10e6, // 10MB
		CommitInterval: 0,    // We will commit explicitly
		StartOffset:    kafka.FirstOffset,
	})

	// 2. Set up graceful shutdown
	// We want to finish processing the current message before quitting.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

	go func() {
		<-sigChan
		fmt.Println("\n⚠️ Shutdown signal received. Stopping consumer...")
		cancel() // Cancel the context to stop the reader loop
	}()

	fmt.Println("🎧 Inventory Service listening for orders...")

	// 3. Consumption loop
	for {
		// FetchMessage blocks until a message is received or the context is cancelled.
		m, err := reader.FetchMessage(ctx)
		if err != nil {
			if ctx.Err() != nil {
				// Context cancelled, exit the loop gracefully
				break
			}
			log.Printf("Error fetching message: %v", err)
			continue
		}

		// 4. Process the message
		processOrder(ctx, m)

		// 5. Explicit commit
		// Only commit AFTER successful processing.
		// If the app crashes during processing, the message is re-delivered.
		if err := reader.CommitMessages(ctx, m); err != nil {
			log.Printf("Failed to commit message: %v", err)
		}
	}

	if err := reader.Close(); err != nil {
		log.Printf("failed to close reader: %v", err)
	}
	fmt.Println("👋 Consumer stopped gracefully.")
}

func processOrder(ctx context.Context, m kafka.Message) {
	var event OrderEvent
	if err := json.Unmarshal(m.Value, &event); err != nil {
		log.Printf("Error parsing JSON: %v", err)
		return
	}

	fmt.Printf("📦 Processing Order: %s | Customer: %s | Partition: %d\n",
		event.OrderID, event.Customer, m.Partition)

	// Simulate business logic latency
	// time.Sleep(100 * time.Millisecond)
}
```

Key Takeaways for the Consumer: #
- Graceful Shutdown: Notice the `signal.Notify`. In Kubernetes, pods are killed frequently. If you kill a consumer mid-process, you might leave data in an inconsistent state. The `context` propagation ensures we stop fetching new messages but (ideally) finish the current one.
- `FetchMessage` vs `ReadMessage`:
  - `ReadMessage`: Auto-commits offsets. Easier, but risky. If your app crashes after reading but before processing, the message is lost.
  - `FetchMessage` + `CommitMessages`: Manual control. This is the production standard. We process, then commit.
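For contrast, here is a rough sketch of what the loop looks like with the auto-committing `ReadMessage`. It is not used elsewhere in this tutorial and simply reuses the processOrder helper from consumer/main.go; the point is that the commit happens inside the call, before your own processing runs.

```go
// Sketch only: the auto-committing alternative. Because ReadMessage commits
// the offset as part of the call (when a GroupID is set), a crash between
// reading and processing means the message will never be re-delivered.
func consumeAutoCommit(ctx context.Context, reader *kafka.Reader) error {
	for {
		m, err := reader.ReadMessage(ctx) // fetch + commit in a single step
		if err != nil {
			return err // context cancelled or a fatal reader error
		}
		processOrder(ctx, m) // if this fails, the offset is already committed
	}
}
```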
Running the Application #
- Start Kafka: Ensure Docker is running.
- Start the Consumer: `go run consumer/main.go`
- Start the Producer (in a new terminal): `go run producer/main.go`
You should see the Producer firing off logs and the Consumer immediately picking them up and processing them.
Common Pitfalls and Solutions #
1. The “Poison Pill” Message #
What happens if a message contains malformed JSON? In our code above, processOrder logs the error and returns, and the loop commits the offset anyway, so the bad message is silently skipped. If we instead refused to commit whenever processing failed, a permanently broken message would be re-delivered forever and block the partition; yet blindly committing is just as wrong for transient failures (e.g., the database being down), where re-delivery is exactly what we want.
Solution:
- Permanent Errors (Invalid JSON): Log it, commit the offset (skip it), and perhaps send it to a “Dead Letter Queue” (a separate Kafka topic for bad messages).
- Transient Errors (DB Down): Retry locally with exponential backoff. Do not commit until successful.
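A hedged sketch of how those two strategies could sit together in the consumer package is shown below. The orders-dlq topic name and the process/isPermanent helpers are illustrative placeholders, not something defined earlier in this tutorial; the snippet also assumes the consumer's existing imports plus time.

```go
// dlqWriter publishes messages we have given up on to a separate topic.
// "orders-dlq" is an illustrative topic name, not used elsewhere in this guide.
var dlqWriter = &kafka.Writer{
	Addr:                   kafka.TCP("localhost:9092"),
	Topic:                  "orders-dlq",
	AllowAutoTopicCreation: true,
}

// Placeholder stubs: substitute your own business logic and error classification.
func process(ctx context.Context, m kafka.Message) error { return nil } // your real handler
func isPermanent(err error) bool { return false }                       // e.g. treat JSON syntax errors as permanent

// handleWithRetry processes a message, retrying transient failures with
// exponential backoff and parking permanent failures in the DLQ.
func handleWithRetry(ctx context.Context, reader *kafka.Reader, m kafka.Message) error {
	backoff := 100 * time.Millisecond
	for attempt := 0; attempt < 5; attempt++ {
		err := process(ctx, m)
		if err == nil {
			return reader.CommitMessages(ctx, m) // success: now it is safe to commit
		}
		if isPermanent(err) {
			// Poison pill: copy it to the DLQ, then commit so the partition is unblocked.
			if dlqErr := dlqWriter.WriteMessages(ctx, kafka.Message{Key: m.Key, Value: m.Value}); dlqErr != nil {
				return dlqErr // DLQ write failed: do not commit, let Kafka re-deliver
			}
			return reader.CommitMessages(ctx, m)
		}
		// Transient failure: back off and try again without committing.
		time.Sleep(backoff)
		backoff *= 2
	}
	return fmt.Errorf("giving up on offset %d after repeated transient failures", m.Offset)
}
```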
2. Rebalancing Storms #
If your processing logic takes too long, Kafka might think the consumer is dead and trigger a rebalance (assigning partitions to other consumers). This stops the world for your group.
Solution:
- Tune `HeartbeatInterval` and `SessionTimeout` in the `ReaderConfig` (a sketch follows this list).
- Ensure your processing loop is faster than the session timeout, or process messages asynchronously (though this complicates offset management).
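As a rough illustration, the ReaderConfig in consumer/main.go could additionally set these two liveness-related fields. The values shown are assumptions to adapt to your own processing latency, not recommendations from the library; only the new fields are shown for brevity.

```go
// Sketch: explicit liveness tuning on the consumer. Longer session timeouts
// tolerate slower processing; shorter heartbeats detect dead consumers faster.
// Requires "time" in the import list.
reader := kafka.NewReader(kafka.ReaderConfig{
	Brokers:           []string{"localhost:9092"},
	Topic:             "orders-topic",
	GroupID:           "inventory-service-group",
	HeartbeatInterval: 3 * time.Second,  // how often we tell the broker we are alive
	SessionTimeout:    30 * time.Second, // how long the broker waits before declaring us dead
	// ...plus the MinBytes/MaxBytes/CommitInterval/StartOffset settings shown earlier
})
```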
Performance Tuning Checklist #
| Setting | Default | Recommendation for High Throughput |
|---|---|---|
| BatchSize (Producer) | 1 | 100 - 1000 (depends on message size) |
| BatchTimeout (Producer) | 1s | 10ms - 50ms (reduce latency) |
| MinBytes (Consumer) | 1 | 10KB - 100KB (reduces network syscalls) |
| Compression | None | Snappy or Zstd (saves bandwidth significantly) |
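Translated into kafka-go fields, the checklist corresponds roughly to the constructor sketches below. The concrete numbers and the choice of Snappy are assumptions picked from the recommended ranges, and the helper names (newTunedWriter, newTunedReader) are illustrative; the snippet assumes the same imports as the producer and consumer files.

```go
// newTunedWriter applies the producer-side settings from the checklist.
func newTunedWriter() *kafka.Writer {
	return &kafka.Writer{
		Addr:         kafka.TCP("localhost:9092"),
		Topic:        "orders-topic",
		Balancer:     &kafka.LeastBytes{},
		BatchSize:    500,                   // within the 100-1000 range
		BatchTimeout: 20 * time.Millisecond, // within the 10ms-50ms range
		Compression:  kafka.Snappy,          // kafka.Zstd is the other common choice
	}
}

// newTunedReader applies the consumer-side settings from the checklist.
func newTunedReader() *kafka.Reader {
	return kafka.NewReader(kafka.ReaderConfig{
		Brokers:  []string{"localhost:9092"},
		Topic:    "orders-topic",
		GroupID:  "inventory-service-group",
		MinBytes: 50e3, // 50KB: fewer, larger fetches mean fewer network syscalls
		MaxBytes: 10e6, // 10MB upper bound per fetch
	})
}
```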
Conclusion #
Building event-driven systems with Go and Kafka allows you to create architectures that are loosely coupled yet highly cohesive. By utilizing segmentio/kafka-go, we gain access to a clean, idiomatic Go API that handles the heavy lifting of connection management.
Remember, the code provided here is a solid foundation. As you move to production, focus heavily on your observability (metrics/logging) and your error handling strategies (Dead Letter Queues).
Next Steps:
- Implement a Dead Letter Queue for failed messages.
- Add Prometheus metrics to monitor lag (the delay between production and consumption); a minimal sketch follows below.
- Explore Schema Registry to enforce data contracts between services.
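Picking up the second item, here is a hedged sketch of exposing consumer lag to Prometheus. It assumes you add the github.com/prometheus/client_golang dependency, drop this into a hypothetical consumer/metrics.go in the same package, and call exposeLagMetric(reader) before the consumption loop; port 2112 is an arbitrary choice.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/segmentio/kafka-go"
)

// exposeLagMetric publishes the reader's current lag as a Prometheus gauge
// and serves it on /metrics.
func exposeLagMetric(reader *kafka.Reader) {
	lagGauge := prometheus.NewGaugeFunc(prometheus.GaugeOpts{
		Name: "kafka_consumer_lag",
		Help: "Messages produced to the topic but not yet committed by this consumer group.",
	}, func() float64 {
		// Stats() resets counter-style fields on each call; we only read the Lag gauge here.
		return float64(reader.Stats().Lag)
	})
	prometheus.MustRegister(lagGauge)

	go func() {
		http.Handle("/metrics", promhttp.Handler())
		_ = http.ListenAndServe(":2112", nil)
	}()
}
```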
Happy coding!
Disclaimer: Code examples are intended for educational purposes. Always review security configurations (SSL/SASL) before deploying to production environments.