On Linux, it’s quite easy to get a single-node Kafka cluster up and running with Docker. Confluent gives a great primer in their documentation: https://docs.confluent.io/current/installation/docker/docs/quickstart.html#getting-started-with-docker-compose.
On macOS, things get a little more complicated because containers are not directly supported by the OS. To use Docker on macOS, one must use docker-machine (the older method) or the newer Docker for Mac. This post presents a method that works with both. The result will be a single-node cluster. You will be able to:
- produce/consume from another container
- produce/consume from the host
Note: with a few changes to the Kafka Docker image and the docker-compose.yml file, it's possible to run a multi-node cluster.
The problem on Mac is that Docker runs inside a VM. This is true for the Docker Machine, but it's also true for Docker for Mac. The implementations differ, but the result is essentially the same: we need an address:port that is resolvable both from the host and from other containers.
On Linux, there is no problem: you slap the --network host argument on the containers when you run them, and everything can use localhost:port to communicate. On macOS, --network host is useless.
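For reference, here is what the Linux shortcut looks like; this is a hypothetical sketch reusing the ZooKeeper image and port from the setup below:

```shell
# Linux only: --network host shares the host's network stack with the
# container, so localhost:2181 means the same thing everywhere.
docker run -d --network host \
  -e "ZOOKEEPER_CLIENT_PORT=2181" \
  confluentinc/cp-zookeeper:4.0.0-3
```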
With the Docker Machine, you get an IP, which is the IP of the virtual machine where Docker runs. You can use this IP from your host, but it will not work from another container.
With Docker for Mac, the VM implementation is different: from the host's perspective, ports are mapped on localhost. But from within a container, that's not true; localhost resolves to the container itself, not to the host.
The solution I found uses the famous pause container.
To resolve the issue, I did what Kubernetes does to let multiple containers in one pod talk to each other over localhost. We will use the pause container to do port forwarding from the host to the containers. The pause container will expose ports on the host and map them to its localhost interface. The other containers will then use the pause container's network interface as their own.
Kafka works with ZooKeeper, and each needs at least one port exposed: 9092 and 2181 respectively. So we can start a pause container to expose those two ports:
docker run -d --name pause \
  -p 9092:9092 \
  -p 2181:2181 \
  gcr.io/google_containers/pause-amd64:3.0
Now we can start ZooKeeper:
docker run -d \
  --net=container:pause \
  --ipc=container:pause \
  --pid=container:pause \
  -e "ZOOKEEPER_CLIENT_PORT=2181" \
  confluentinc/cp-zookeeper:4.0.0-3
And finally, you can start Kafka:
docker run -d \
  --net=container:pause \
  --ipc=container:pause \
  --pid=container:pause \
  -e "KAFKA_ZOOKEEPER_CONNECT=localhost:2181" \
  -e "KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092" \
  -e "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1" \
  confluentinc/cp-kafka:4.0.0-3
At this point, you should have a running cluster with one Kafka node:
CONTAINER ID   IMAGE                                      PORTS
14a04b6682ca   confluentinc/cp-kafka:4.0.0-3
30dfc793b973   confluentinc/cp-zookeeper:4.0.0-3
540fe4395d1f   gcr.io/google_containers/pause-amd64:3.0   0.0.0.0:2181->2181/tcp, 0.0.0.0:9092->9092/tcp
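Before testing with Kafka's own tools, a quick sanity check from the host (assuming nc is installed, as it is on stock macOS): both ports published by the pause container should accept TCP connections.

```shell
# Probe the two published ports on localhost; prints one status line each.
nc -z localhost 2181 && echo "zookeeper reachable" || echo "zookeeper NOT reachable"
nc -z localhost 9092 && echo "kafka reachable" || echo "kafka NOT reachable"
```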
Test the cluster
To make sure everything is in order, here are a few commands to test whether the cluster can be used.
Set up a temporary directory with the executables to consume/produce from your host:
mkdir /tmp/kafka-tests
cd /tmp/kafka-tests
wget http://apache.parentingamerica.com/kafka/0.11.0.2/kafka_2.12-0.11.0.2.tgz
tar -xvf kafka_2.12-0.11.0.2.tgz
cd kafka_2.12-0.11.0.2/bin/
When you have the executables, you can do the following.
Create the topic:
/tmp/kafka-tests/kafka_2.12-0.11.0.2/bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test --partitions 1 --replication-factor 1
Produce some messages to the topic:
seq 1 45 | /tmp/kafka-tests/kafka_2.12-0.11.0.2/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Consume all messages of a topic:
/tmp/kafka-tests/kafka_2.12-0.11.0.2/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
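Since the producer sent exactly 45 messages (seq 1 45), you can also verify the count non-interactively. The --max-messages and --timeout-ms flags are standard kafka-console-consumer options, but treat their behavior on your Kafka version as an assumption:

```shell
# Consume up to 45 messages, then count the lines received; if every
# produced message arrived, this prints 45.
/tmp/kafka-tests/kafka_2.12-0.11.0.2/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 --topic test \
  --from-beginning --max-messages 45 --timeout-ms 10000 | wc -l
```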
Consume from another container:
docker run -it \
  --net=container:pause \
  --ipc=container:pause \
  --pid=container:pause \
  confluentinc/cp-kafka:4.0.0-3 \
  /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning
Produce from another container:
docker run -it \
  --net=container:pause \
  --ipc=container:pause \
  --pid=container:pause \
  confluentinc/cp-kafka:4.0.0-3 \
  /usr/bin/kafka-console-producer --broker-list localhost:9092 --topic test
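When you are done, you can tear everything down. Filtering by image is an assumption here, since the Kafka and ZooKeeper containers were started without explicit names:

```shell
# Remove the unnamed Kafka and ZooKeeper containers by image filter,
# then the named pause container, then the temporary test directory.
# "|| true" keeps the cleanup going even if a container is already gone.
docker ps -aq --filter "ancestor=confluentinc/cp-kafka:4.0.0-3" | xargs docker rm -f || true
docker ps -aq --filter "ancestor=confluentinc/cp-zookeeper:4.0.0-3" | xargs docker rm -f || true
docker rm -f pause || true
rm -rf /tmp/kafka-tests
```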