Kafka on Docker for Mac

2018-02-18

On Linux, it’s quite easy to get a single-node Kafka cluster up and running with Docker. Confluent gives a great primer in their documentation: https://docs.confluent.io/current/installation/docker/docs/quickstart.html#getting-started-with-docker-compose.

On macOS, things get a little more complicated because containers are not directly supported by the OS. To use Docker on macOS, one must use either Docker Machine (the older method) or the newer Docker for Mac. This post presents a method that works with both. The result will be a single-node cluster that you will be able to produce to and consume from, both from the host and from other containers.

Note: with a few changes to the Kafka Docker image and the docker-compose.yml file, it's possible to have a multi-node cluster.

The problem

The problem on Mac is that Docker runs inside a VM. That’s true of Docker Machine, and it’s also true of Docker for Mac. The implementation is different but the result is essentially the same: we need an address:port that is resolvable both from the host and from other containers.

On Linux, there is no problem: you pass --network host when you run the containers. This way, the host and every container can use localhost:port to communicate.

On Mac, --network host is useless: it attaches the container to the network of the VM, not to the network of macOS itself.

With Docker Machine, you get an IP, which is the IP of the virtual machine where Docker runs. You can use this IP from your host, but it will not work from another container.

With Docker for Mac, the VM implementation is different and, from the host’s perspective, published ports are mapped on localhost. But from within a container that’s not true: localhost resolves to the container itself and not to the host.

The solution I found uses the famous pause container.

The solution

To resolve the issue, I did what Kubernetes does to allow multiple containers in one pod to talk to each other using localhost. We will use the pause container to do port forwarding from the host to the containers. The pause container will expose ports on the host and map them onto its own localhost interface. The other containers will use the pause container’s network interface as their network interface.

Kafka works with ZooKeeper, and each of them needs at least one port exposed: 9092 and 2181 respectively. So we can start a pause container to expose the two ports:

docker run -d --name pause \
    -p 9092:9092 \
    -p 2181:2181 \
    gcr.io/google_containers/pause-amd64:3.0

Now we can start zookeeper:

docker run -d \
    --net=container:pause \
    --ipc=container:pause \
    --pid=container:pause \
    -e "ZOOKEEPER_CLIENT_PORT=2181" \
    confluentinc/cp-zookeeper:4.0.0-3

And finally, you can start Kafka:

docker run -d \
    --net=container:pause \
    --ipc=container:pause \
    --pid=container:pause \
    -e "KAFKA_ZOOKEEPER_CONNECT=localhost:2181" \
    -e "KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092" \
    -e "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1" \
    confluentinc/cp-kafka:4.0.0-3
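
The ZooKeeper and Kafka commands above repeat the same three namespace flags. As a minimal sketch, they could be factored into a small shell helper. The name pause_ns_flags is hypothetical (it is not part of Docker); it only prints the flags already used in this post.

```shell
# pause_ns_flags NAME
# Hypothetical helper: prints the three flags that make a container join the
# network, IPC and PID namespaces of the container called NAME.
pause_ns_flags() {
    printf '%s ' "--net=container:$1" "--ipc=container:$1" "--pid=container:$1"
}

# Usage (left commented out so that sourcing this file starts nothing).
# The substitution is deliberately unquoted so the shell splits it into
# three separate arguments:
# docker run -d $(pause_ns_flags pause) \
#     -e "ZOOKEEPER_CLIENT_PORT=2181" \
#     confluentinc/cp-zookeeper:4.0.0-3
```

Only --net=container:pause is needed for the localhost trick itself; the helper keeps the IPC and PID flags so that it matches the commands above.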

At this point, you should have a running cluster with one Kafka node:

CONTAINER ID  IMAGE                                     PORTS
14a04b6682ca  confluentinc/cp-kafka:4.0.0-3
30dfc793b973  confluentinc/cp-zookeeper:4.0.0-3
540fe4395d1f  gcr.io/google_containers/pause-amd64:3.0  0.0.0.0:2181->2181/tcp, 0.0.0.0:9092->9092/tcp
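
Before going further, it can help to check from the host that the two published ports accept connections. The following is a rough sketch that assumes bash (it relies on bash's /dev/tcp pseudo-device); wait_for_port is a hypothetical helper name. A successful connection only shows that the port mapping is in place: Docker's port proxy may accept a connection before Kafka itself is ready, so this is not a readiness check.

```shell
# Requires bash: /dev/tcp/HOST/PORT is a bash feature, not a real file.
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: returns 0 as soon as a TCP connection succeeds,
# 1 if the timeout (default 30 seconds) expires first.
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-30}"
    local deadline=$(( $(date +%s) + timeout ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # Open the connection in a subshell so it is closed right away.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Usage: check the two ports published by the pause container.
# wait_for_port localhost 2181 && wait_for_port localhost 9092 && echo "ports reachable"
```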

Test the cluster

To make sure everything is in order, here are a few commands to test whether the cluster can be used.

Set up a temporary directory with the executables needed to consume and produce from your host:

mkdir /tmp/kafka-tests
cd /tmp/kafka-tests

wget http://apache.parentingamerica.com/kafka/0.11.0.2/kafka_2.12-0.11.0.2.tgz
tar -xvf kafka_2.12-0.11.0.2.tgz
cd kafka_2.12-0.11.0.2/bin/

When you have the executables, you can do the following.

Create the topic:

/tmp/kafka-tests/kafka_2.12-0.11.0.2/bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic test --partitions 1 --replication-factor 1

Produce messages:

seq 1 45 | /tmp/kafka-tests/kafka_2.12-0.11.0.2/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Consume all messages of a topic:

/tmp/kafka-tests/kafka_2.12-0.11.0.2/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
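
The produce and consume steps above can also be combined into a single round-trip check. This is only a sketch: produce_and_count and the KAFKA_BIN variable are names I made up for illustration, and the 10 second --timeout-ms value is an arbitrary choice that makes the consumer exit once no more messages arrive.

```shell
# KAFKA_BIN: directory holding the Kafka CLI scripts (default taken from the
# download step above; override it if you unpacked Kafka elsewhere).
KAFKA_BIN="${KAFKA_BIN:-/tmp/kafka-tests/kafka_2.12-0.11.0.2/bin}"

# produce_and_count TOPIC N
# Hypothetical helper: sends the integers 1..N to TOPIC, reads the topic back
# from the beginning, and prints how many lines the consumer returned.
# Because it reads from the beginning, messages already in the topic are
# counted too, so the result can be larger than N.
produce_and_count() {
    seq 1 "$2" | "$KAFKA_BIN/kafka-console-producer.sh" \
        --broker-list localhost:9092 --topic "$1" >/dev/null
    "$KAFKA_BIN/kafka-console-consumer.sh" \
        --bootstrap-server localhost:9092 --topic "$1" \
        --from-beginning --timeout-ms 10000 2>/dev/null | wc -l | tr -d ' '
}

# Usage:
# produce_and_count test 45
```

On a freshly created topic, the printed count should equal the number of messages sent.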

Consume from another container:

docker run -it \
    --net=container:pause \
    --ipc=container:pause \
    --pid=container:pause \
    confluentinc/cp-kafka:4.0.0-3 \
    /usr/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning

Produce from another container:

docker run -it \
    --net=container:pause \
    --ipc=container:pause \
    --pid=container:pause \
    confluentinc/cp-kafka:4.0.0-3 \
    /usr/bin/kafka-console-producer --broker-list localhost:9092 --topic test