Deploying Kafka with Docker Compose: A Single-Node and Cluster Guide

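1. Prerequisites and Environment Requirements

1.1 Base Environment

Deploying Kafka with Docker Compose requires the following:

  • Docker 20.10.0 or later (the latest stable release is recommended)
  • Docker Compose 1.29.0 or later
  • Server resources:
    • Single node: 2 CPU cores / 4 GB RAM / 20 GB disk
    • Cluster (3 nodes): 4 CPU cores / 8 GB RAM / 50 GB disk
  • Operating system: Linux (Ubuntu 20.04+ or CentOS 7+)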
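
The version requirements above can be checked from a script before anything is deployed. The following is a minimal sketch rather than part of the original setup: `version_ge` is a hypothetical helper name, and it assumes plain dotted numeric versions such as 20.10.21.

```shell
# Succeed when dotted version $1 is greater than or equal to $2.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            # Missing fields count as 0; "+ 0" forces a numeric comparison.
            xi = x[i] + 0
            yi = y[i] + 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example (assumes the docker CLI is installed and the daemon is running):
#   version_ge "$(docker version --format '{{.Server.Version}}')" 20.10.0 || echo "Docker is too old"
#   version_ge "$(docker-compose version --short)" 1.29.0 || echo "Docker Compose is too old"
```

Comparing field by field avoids the trap of a plain string comparison, where "1.9" would sort after "1.29".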

1.2 Network and Storage Planning

Create a dedicated network:

    docker network create kafka-net --driver bridge --subnet 172.20.0.0/16

Storage options:

  • Development: Docker volumes (recommended)
  • Production: bind-mounted host directories or NFS

2. Single-Node Deployment

2.1 Base Configuration File

Create docker-compose-single.yml:

    version: '3.8'
    services:
      zookeeper:
        image: confluentinc/cp-zookeeper:7.3.0
        container_name: zookeeper
        environment:
          ZOOKEEPER_CLIENT_PORT: 2181
          ZOOKEEPER_TICK_TIME: 2000
        ports:
          - "2181:2181"
        networks:
          - kafka-net
      kafka:
        image: confluentinc/cp-kafka:7.3.0
        container_name: kafka
        depends_on:
          - zookeeper
        ports:
          - "9092:9092"
        environment:
          KAFKA_BROKER_ID: 1
          KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
          KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
          KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
        networks:
          - kafka-net
    networks:
      kafka-net:
        external: true

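2.2 Key Settings Explained

  1. ZooKeeper settings

    • ZOOKEEPER_TICK_TIME: the basic time unit in milliseconds; heartbeats and session timeouts are expressed as multiples of it
    • Suggested heap size: -Xmx1g -Xms1g (set through the KAFKA_HEAP_OPTS environment variable, which the Confluent images read)
  2. Kafka settings

    • KAFKA_BROKER_ID: must be unique for each broker
    • KAFKA_ADVERTISED_LISTENERS: the address that clients are told to connect to
    • KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: must be 1 on a single node, because the default of 3 cannot be satisfied by one broker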

2.3 Startup and Verification

    docker-compose -f docker-compose-single.yml up -d

Verification commands:

    # Create a test topic
    docker exec -it kafka kafka-topics --create --topic test --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
    # Produce a message
    docker exec -it kafka bash -c "echo 'test message' | kafka-console-producer --topic test --bootstrap-server localhost:9092"
    # Consume messages (press Ctrl+C to stop)
    docker exec -it kafka kafka-console-consumer --topic test --from-beginning --bootstrap-server localhost:9092
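
The broker can take several seconds to accept connections after `up -d`, so the verification commands may fail if they are run immediately. One way to handle this is to poll until a cheap command succeeds. The sketch below uses a hypothetical `retry` helper (nothing Docker or Kafka provides); the attempt count and delay in the usage comment are arbitrary.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example: wait up to about 60 seconds for the single-node broker to answer.
#   retry 30 2 docker exec kafka kafka-topics --list --bootstrap-server localhost:9092
```

Note that the usage example drops `-it`, since scripts usually run without a terminal attached.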

3. Cluster Deployment

3.1 Three-Broker Cluster Configuration

Create docker-compose-cluster.yml:

    version: '3.8'
    services:
      zookeeper:
        image: confluentinc/cp-zookeeper:7.3.0
        container_name: zookeeper
        environment:
          ZOOKEEPER_SERVER_ID: 1
          ZOOKEEPER_CLIENT_PORT: 2181
          ZOOKEEPER_TICK_TIME: 2000
          ZOOKEEPER_INIT_LIMIT: 5
          ZOOKEEPER_SYNC_LIMIT: 2
          ZOOKEEPER_SERVERS: zookeeper:2888:3888
        ports:
          - "2181:2181"
        networks:
          - kafka-net
      kafka1:
        image: confluentinc/cp-kafka:7.3.0
        container_name: kafka1
        depends_on:
          - zookeeper
        ports:
          - "9092:9092"
        environment:
          KAFKA_BROKER_ID: 1
          KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
          # HOST_IP must be set in the shell or in a .env file next to this file
          KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka1:19092,EXTERNAL://${HOST_IP}:9092
          KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
          KAFKA_MIN_INSYNC_REPLICAS: 2
        networks:
          - kafka-net
      kafka2:
        image: confluentinc/cp-kafka:7.3.0
        container_name: kafka2
        depends_on:
          - zookeeper
        ports:
          - "9093:9093"
        environment:
          KAFKA_BROKER_ID: 2
          KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
          KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka2:19093,EXTERNAL://${HOST_IP}:9093
          KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
          # Same broker-level defaults as kafka1 so all brokers behave alike
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
          KAFKA_MIN_INSYNC_REPLICAS: 2
        networks:
          - kafka-net
      kafka3:
        image: confluentinc/cp-kafka:7.3.0
        container_name: kafka3
        depends_on:
          - zookeeper
        ports:
          - "9094:9094"
        environment:
          KAFKA_BROKER_ID: 3
          KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
          KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka3:19094,EXTERNAL://${HOST_IP}:9094
          KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
          # Same broker-level defaults as kafka1 so all brokers behave alike
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
          KAFKA_MIN_INSYNC_REPLICAS: 2
        networks:
          - kafka-net
    # The services reference kafka-net, so it has to be declared here as well
    networks:
      kafka-net:
        external: true

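3.2 Cluster Configuration Notes

  1. ZooKeeper ensemble mode

    • The ZOOKEEPER_SERVERS environment variable must be set; it lists the ensemble members as host:peer-port:election-port (the example file runs a single ZooKeeper node)
    • Every node needs a unique ZOOKEEPER_SERVER_ID
  2. Kafka multi-broker settings

    • Every broker must have a unique KAFKA_BROKER_ID
    • Two listeners are recommended, one for traffic inside the Docker network and one for external clients:
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
          KAFKA_ADVERTISED_LISTENERS: INTERNAL://kafka1:19092,EXTERNAL://192.168.1.100:9092
    • Key parameters:
      • KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: should match the broker count (3 here) and can never exceed it
      • KAFKA_MIN_INSYNC_REPLICAS: at least 2 on a three-broker cluster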
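
The per-broker listener values in the cluster file follow a simple pattern: broker N advertises internal port 19091+N and external port 9091+N. The sketch below generates the string from a broker ID and a host IP. The helper name `advertised_listeners` is hypothetical, and the port arithmetic is an assumption taken from the numbering used in section 3.1.

```shell
# Print the KAFKA_ADVERTISED_LISTENERS value for one broker.
#   $1 = broker id (1, 2, 3, ...)
#   $2 = host IP that external clients use
advertised_listeners() {
    broker_id=$1
    host_ip=$2
    internal_port=$((19091 + broker_id))   # 19092 for broker 1
    external_port=$((9091 + broker_id))    # 9092 for broker 1
    printf 'INTERNAL://kafka%s:%s,EXTERNAL://%s:%s\n' \
        "$broker_id" "$internal_port" "$host_ip" "$external_port"
}

advertised_listeners 1 192.168.1.100
# prints INTERNAL://kafka1:19092,EXTERNAL://192.168.1.100:9092
```

Generating the values this way keeps the published port in each `ports:` entry and the advertised external port from drifting apart when brokers are added.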

3.3 Cluster Verification

    # Create a topic with 3 replicas
    docker exec -it kafka1 kafka-topics --create --topic cluster-test --bootstrap-server kafka1:19092 --partitions 3 --replication-factor 3
    # Show topic details
    docker exec -it kafka1 kafka-topics --describe --topic cluster-test --bootstrap-server kafka1:19092
    # Test high availability:
    # stop one broker, then verify that messages can still be produced and consumed
    docker stop kafka2
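
Stopping one broker is expected to be harmless here because of how the replication factor and min.insync.replicas interact: a producer using acks=all can keep writing as long as at least min.insync.replicas replicas of a partition remain in sync. A sketch of that arithmetic follows; the function name is hypothetical.

```shell
# Number of replicas of a partition that may be offline
# while acks=all writes still succeed.
#   $1 = replication factor
#   $2 = min.insync.replicas
tolerable_failures() {
    replication_factor=$1
    min_insync_replicas=$2
    echo $((replication_factor - min_insync_replicas))
}

tolerable_failures 3 2   # values from section 3.1; prints 1
```

With the values from section 3.1 one broker can be down. Stopping a second one would leave a single in-sync replica, and acks=all producers would start receiving NotEnoughReplicas errors.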

4. Production Tuning Recommendations

4.1 Performance Tuning Parameters

    environment:
      KAFKA_NUM_PARTITIONS: 6               # default partition count for new topics
      KAFKA_LOG_RETENTION_HOURS: 168        # message retention (7 days)
      KAFKA_LOG_SEGMENT_BYTES: 1073741824   # 1 GiB log segment size
      KAFKA_MESSAGE_MAX_BYTES: 1000012      # maximum message size in bytes
      KAFKA_NUM_NETWORK_THREADS: 3          # network threads
      KAFKA_NUM_IO_THREADS: 8               # I/O threads
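
Retention directly drives disk usage, so it helps to estimate the footprint before settling on KAFKA_LOG_RETENTION_HOURS. The sketch below multiplies an assumed average ingest rate by the retention window and the replication factor. The function name and the 1 MiB/s figure are illustrative assumptions, and the result ignores compression, index files, and the segment that is still being written.

```shell
# Rough cluster-wide disk estimate in GiB (integer, rounded down).
#   $1 = average ingest rate in MiB/s (assumed; measure your own workload)
#   $2 = retention in hours (KAFKA_LOG_RETENTION_HOURS)
#   $3 = replication factor
retention_disk_gib() {
    mib_per_sec=$1
    retention_hours=$2
    replication_factor=$3
    echo $((mib_per_sec * 3600 * retention_hours * replication_factor / 1024))
}

retention_disk_gib 1 168 3   # prints 1771
```

At a sustained 1 MiB/s, seven days of retention with three replicas comes to roughly 1.7 TiB across the cluster, so disks should be sized, or retention shortened, to match the real ingest rate.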

4.2 Monitoring Integration

Recommended setup:

  1. JMX export
         environment:
           KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.rmi.port=9999 -Djava.rmi.server.hostname=localhost"
  2. Prometheus + Grafana monitoring
    • Use the bitnami/jmx-exporter container to collect metrics
    • Set up a Grafana dashboard (ID: 7589)

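4.3 Backup and Recovery

  1. Regular backups

        # Record ZooKeeper server status (a status snapshot, not a copy of the metadata itself);
        # srvr is used because it is the four-letter command ZooKeeper allows by default
        docker exec -it zookeeper bash -c "echo srvr | nc localhost 2181 > /tmp/zookeeper_stat.log"
        # Export topic configuration overrides (this does not copy message data)
        docker exec -it kafka1 bash -c "kafka-configs --bootstrap-server kafka1:19092 --entity-type topics --describe > /tmp/topics_config.log"
  2. Disaster recovery
    • Use kafka-mirror-maker to migrate data between clusters
    • Test the recovery procedure: kafka-topics --create --topic restored-topic --bootstrap-server new-cluster:9092 --replication-factor 3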

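5. Troubleshooting Common Problems

5.1 Connection Problems

  1. Port unreachable

    • Check the firewall rules: iptables -L -n
    • Verify that the port is listening: netstat -tulnp | grep 9092
  2. Wrong advertised address

    • Make sure KAFKA_ADVERTISED_LISTENERS holds an address that clients can actually resolve and reach
    • Test external access: telnet <host-ip> 9092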

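5.2 Cluster Synchronization Problems

  1. UnderReplicatedPartitions warnings

    • Check the output of kafka-topics --describe
    • Verify that the ISR list contains every replica
  2. ZooKeeper session expiry

    • Tune zookeeper.session.timeout.ms (default 18000 ms)
    • Check network latency: ping zookeeper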
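
For the UnderReplicatedPartitions check, comparing the Replicas and Isr columns by eye gets tedious on topics with many partitions. The sketch below filters `kafka-topics --describe` output down to partitions whose ISR is smaller than the replica list. The helper name `under_replicated` is hypothetical, and the parsing assumes the "Topic: ... Partition: ... Replicas: ... Isr: ..." line layout printed by the Kafka version in these images.

```shell
# Read `kafka-topics --describe` output on stdin and print
# "<topic> <partition> isr=<...> replicas=<...>" for each partition
# whose ISR has fewer members than its replica list.
under_replicated() {
    awk '
    /Partition:/ {
        topic = ""; part = ""; replicas = ""; isr = ""
        for (i = 1; i < NF; i++) {
            if ($i == "Topic:")     topic = $(i + 1)
            if ($i == "Partition:") part = $(i + 1)
            if ($i == "Replicas:")  replicas = $(i + 1)
            if ($i == "Isr:")       isr = $(i + 1)
        }
        if (split(isr, a, ",") < split(replicas, b, ","))
            print topic, part, "isr=" isr, "replicas=" replicas
    }'
}

# Example:
#   docker exec kafka1 kafka-topics --describe --bootstrap-server kafka1:19092 | under_replicated
```

kafka-topics also accepts --under-replicated-partitions together with --describe, which prints only the affected partitions; the filter above is mainly useful when the full output has already been captured to a file.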

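5.3 Performance Bottleneck Analysis

  1. Producer latency

    • Monitor the record-queue-time-avg metric
    • Tune the batch.size and linger.ms parameters
  2. Consumer lag

    • Monitor the records-lag-max metric
    • Add consumer instances or adjust fetch.min.bytes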
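
records-lag-max is a client-side metric; lag can also be read from the command line with `kafka-consumer-groups --describe`. The sketch below sums the LAG column of that output. The helper name `total_lag` and the group name in the usage comment are hypothetical, and the column is located by its header so the exact column order does not matter.

```shell
# Read `kafka-consumer-groups --describe` output on stdin and print the
# total lag across all partitions. Rows without a numeric lag ("-") are skipped.
total_lag() {
    awk '
    lag_col == 0 {
        # Still looking for the header row that names the LAG column.
        for (i = 1; i <= NF; i++)
            if ($i == "LAG") lag_col = i
        next
    }
    NF >= lag_col && $lag_col ~ /^[0-9]+$/ { sum += $lag_col }
    END { print sum + 0 }'
}

# Example (my-group is a placeholder group name):
#   docker exec kafka1 kafka-consumer-groups --describe --group my-group --bootstrap-server kafka1:19092 | total_lag
```

A total that keeps growing between runs suggests the consumers cannot keep up, which is the case where adding instances (up to the partition count) helps.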

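6. Advanced Deployment Options

6.1 Using Kafka Connect

Example service definition (add it under services: in docker-compose-cluster.yml):

    kafka-connect:
      image: confluentinc/cp-kafka-connect:7.3.0
      container_name: kafka-connect
      depends_on:
        - kafka1
      ports:
        - "8083:8083"
      environment:
        CONNECT_BOOTSTRAP_SERVERS: kafka1:19092
        # Must be a name that clients and other workers can resolve; the service name is used here
        CONNECT_REST_ADVERTISED_HOST_NAME: kafka-connect
        CONNECT_GROUP_ID: compose-connect-group
        CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
        CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
        CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
        # The image requires converters to be set; JSON is used here as an example
        CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
        CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      networks:
        - kafka-net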

6.2 Integrating Schema Registry

    schema-registry:
      image: confluentinc/cp-schema-registry:7.3.0
      container_name: schema-registry
      depends_on:
        - kafka1
      ports:
        - "8081:8081"
      environment:
        SCHEMA_REGISTRY_HOST_NAME: schema-registry
        SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka1:19092
      networks:
        - kafka-net

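7. Summary and Best Practices

  1. Deployment principles

    • A single node is suitable for development and testing
    • Use a cluster of at least 3 nodes in production
    • Set the replication factor to the broker count (3 on a three-node cluster)
  2. Monitoring and alerting

    • Key metrics: UnderReplicatedPartitions, RequestLatency, DiskUsage
    • Alert thresholds: ISR shrink > 10%, disk usage > 80%
  3. Upgrade strategy

    • Rolling upgrades: upgrade one broker at a time
    • Version compatibility: make sure the ZooKeeper and Kafka versions are compatible
  4. Security recommendations

    • Enable SASL_SSL authentication
    • Configure ACL-based authorization
    • Rotate keys regularly

With the Docker Compose configurations and procedures in this guide, developers can quickly stand up Kafka environments for different scenarios. For real deployments, validate the configuration in a test environment first, then migrate to production step by step.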