Apache Kafka : Install Zookeeper ensemble on AWS EC2

  • Jul 20, 2024

In this article we will see how to set up a three-node ZooKeeper ensemble on AWS EC2 instances.

Let's assume we already have three EC2 instances up and running with the following public IPs:

15.232.8.517
15.232.46.302
15.0.185.130

Make sure ports 2181, 2888 and 3888 are open on each machine: ZooKeeper uses 2181 for client connections, 2888 for communication between the servers, and 3888 for leader election.
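On EC2 these ports are usually opened through the security group attached to the instances. The sketch below only prints AWS CLI commands for review; it does not run them. The security group ID and the source CIDR are placeholders (assumptions, not values from this setup), and it assumes the AWS CLI is how the group is managed.

```shell
#!/usr/bin/env bash
# Print (not run) one AWS CLI ingress rule per ZooKeeper port.
# Assumption: SG_ID and SOURCE_CIDR are placeholders; replace them with your own values.
SG_ID="${SG_ID:-sg-EXAMPLE}"
SOURCE_CIDR="${SOURCE_CIDR:-10.0.0.0/16}"

# 2181 = client connections, 2888 = peer traffic, 3888 = leader election
zk_ingress_commands() {
  local port
  for port in 2181 2888 3888; do
    echo "aws ec2 authorize-security-group-ingress --group-id ${SG_ID} --protocol tcp --port ${port} --cidr ${SOURCE_CIDR}"
  done
}

zk_ingress_commands
```

Printing the commands first makes it easy to check the group and CIDR before changing any firewall rule.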

Install OpenJDK 17

The first step is to install OpenJDK 17 on each server as shown below:

sudo apt update && sudo apt upgrade -y
apt-cache search openjdk
sudo apt-get install openjdk-17-jdk
java --version

openjdk 17.0.4 2022-07-19
OpenJDK Runtime Environment (build 17.0.4+8-Ubuntu-122.04)
OpenJDK 64-Bit Server VM (build 17.0.4+8-Ubuntu-122.04, mixed mode, sharing)

Install ZooKeeper

Download ZooKeeper from the release page, extract the archive, and move it to the "/usr/local/" directory as shown below.

wget https://dlcdn.apache.org/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz
tar -zxf apache-zookeeper-3.8.0-bin.tar.gz
sudo mv apache-zookeeper-3.8.0-bin /usr/local/zookeeper

Also create a "zookeeper" folder under "/var/lib/". It will act as the data directory for ZooKeeper.

sudo mkdir -p /var/lib/zookeeper

Each node must have a common configuration that lists all servers, and each server needs a "myid" file in the data directory that specifies the ID number of the server.

sudo touch /var/lib/zookeeper/myid
sudo vi /var/lib/zookeeper/myid

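This file (myid) must contain only the ID number of the server, and that number must match the X of the node's "server.X" entry in the configuration file. The node listed as server.1, for instance, has a myid file containing just 1.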
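The file can also be written without an editor. Here is a minimal sketch; the helper name write_myid is made up for this illustration, and the 1 to 255 range check reflects the ID range the ZooKeeper documentation recommends.

```shell
#!/usr/bin/env bash
# write_myid ID [DATA_DIR]: store ID as the only content of DATA_DIR/myid.
# Hypothetical helper; DATA_DIR defaults to the data directory used in this article.
write_myid() {
  local id="$1"
  local data_dir="${2:-/var/lib/zookeeper}"
  # Reject anything that is not a plain integer.
  case "$id" in
    ''|*[!0-9]*) echo "server ID must be an integer between 1 and 255" >&2; return 1 ;;
  esac
  if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
    echo "server ID must be an integer between 1 and 255" >&2
    return 1
  fi
  printf '%s\n' "$id" > "${data_dir}/myid"
}

# Example (needs write access to the data directory):
#   write_myid 1
```

Because the data directory here is owned by root, the equivalent one-liner on the servers would be `echo 1 | sudo tee /var/lib/zookeeper/myid`, with 2 and 3 on the other two nodes.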

If the IPs of the servers in the ensemble are 15.232.8.517, 15.232.46.302, and 15.0.185.130, the configuration file "zoo.cfg" under "/usr/local/zookeeper/conf/" should be created on each node with the contents shown below:

touch /usr/local/zookeeper/conf/zoo.cfg
vi /usr/local/zookeeper/conf/zoo.cfg

1) Node 15.232.8.517

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=20
syncLimit=5
server.1=0.0.0.0:2888:3888
server.2=15.232.46.302:2888:3888
server.3=15.0.185.130:2888:3888

2) Node 15.232.46.302

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=20
syncLimit=5
server.1=15.232.8.517:2888:3888
server.2=0.0.0.0:2888:3888
server.3=15.0.185.130:2888:3888

3) Node 15.0.185.130

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=20
syncLimit=5
server.1=15.232.8.517:2888:3888
server.2=15.232.46.302:2888:3888
server.3=0.0.0.0:2888:3888

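Note: In each node's own configuration file, the entry for that node must use 0.0.0.0 instead of its public IP. An EC2 public IP is not assigned to a network interface on the instance itself, so ZooKeeper cannot bind to it.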
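The three files differ only in which server entry is replaced by 0.0.0.0, so they can be generated from a single server list. The sketch below is one way to do that; render_zoo_cfg is a hypothetical helper, and it assumes server IDs are assigned in argument order starting at 1.

```shell
#!/usr/bin/env bash
# render_zoo_cfg MY_ID IP1 IP2 ...: print a zoo.cfg for the node whose ID is MY_ID.
# Hypothetical helper; server IDs follow the order of the IP arguments, starting at 1.
render_zoo_cfg() {
  local my_id="$1"
  shift
  cat <<'EOF'
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=20
syncLimit=5
EOF
  local id=1
  local ip
  for ip in "$@"; do
    # The current node listens on all interfaces instead of its public IP.
    if [ "$id" -eq "$my_id" ]; then
      ip="0.0.0.0"
    fi
    echo "server.${id}=${ip}:2888:3888"
    id=$((id + 1))
  done
}

# Configuration for node 2 of the ensemble used in this article
render_zoo_cfg 2 15.232.8.517 15.232.46.302 15.0.185.130
```

Redirecting the output to /usr/local/zookeeper/conf/zoo.cfg on each node, with that node's own ID as the first argument, would reproduce the files listed above.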

Once these steps are complete, start each server with the command below; the nodes should then find one another and form an ensemble.

sudo ZK_SERVER_HEAP=128 /usr/local/zookeeper/bin/zkServer.sh start

/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... 
STARTED

To test whether the ensemble is running correctly, connect to the client port and send the "srvr" command, which returns details of the local ZooKeeper node.

# telnet localhost 2181
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
srvr

Zookeeper version: 3.8.0-5a02a05eddb59aee6ac762f7ea82e92a68eb9c0f, built on 2022-02-25 08:49 UTC
Latency min/avg/max: 0/0.0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x100000000
Mode: follower
Node count: 5
Connection closed by foreign host.
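The same check can be scripted instead of typed into a telnet session. The sketch below assumes netcat (nc) is installed on the node; zk_mode is a hypothetical helper that extracts the value of the "Mode:" line from srvr output. The sample input is taken from the output above.

```shell
#!/usr/bin/env bash
# zk_mode: read "srvr" output on stdin and print the value of its "Mode:" line.
# Hypothetical helper for illustration.
zk_mode() {
  awk -F': ' '$1 == "Mode" { print $2 }'
}

# Demo with lines copied from the srvr output shown in this article
printf 'Zxid: 0x100000000\nMode: follower\nNode count: 5\n' | zk_mode

# Against a live node (assumes nc is installed):
#   echo srvr | nc localhost 2181 | zk_mode
```

In a healthy three-node ensemble, one node should report leader and the other two follower.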

That completes the ZooKeeper cluster setup.

Recommended number of ZooKeeper nodes in an ensemble?

A majority of ensemble members (a quorum) must be working in order for ZooKeeper to respond to requests.

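It is generally recommended to have an odd number of ZooKeeper servers in your ensemble, because adding one server to make the count even raises the size of the required majority without increasing the number of failures the ensemble can tolerate.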

This means that in a three-node ensemble, you can run with one node missing.

With a five-node ensemble, you can run with two nodes missing.

In general, five nodes is a good ensemble size when enough servers are available.

More servers mean lower write performance, since every write must be acknowledged by a majority of the ensemble, but slightly better read performance.

Five is good because it allows you to take one server down for an upgrade while the remaining four can still lose another node and keep a quorum.
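The quorum arithmetic behind these recommendations is small enough to check directly. In this sketch the function names are made up for illustration; a quorum is taken to be a strict majority, floor(N/2) + 1.

```shell
#!/usr/bin/env bash
# quorum_size N: smallest strict majority of an N-node ensemble.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: how many nodes can be down while a quorum remains.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5 6; do
  echo "nodes=${n} quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

The loop shows that four nodes tolerate one failure, the same as three, and six tolerate two, the same as five.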

Configuration

Below is a quick overview of useful Zookeeper configurations and settings.

Config | Meaning
--- | ---
initLimit | The amount of time to allow followers to connect with a leader.
syncLimit | Determines how long followers can be out of sync with the leader.
tickTime | The basic time unit, in milliseconds. Both "initLimit" and "syncLimit" are expressed as a number of tickTime units, so with tickTime=2000 an initLimit of 20 is 20 × 2,000 ms, or 40 seconds.
clientPort | Clients only need to be able to connect to the ensemble over the clientPort.
server.X=hostname:peerPort:leaderPort | "X" is the integer ID number of the server, "hostname" is the hostname or IP address of the server, "peerPort" is the TCP port over which servers communicate with one another, and "leaderPort" is the TCP port over which leader election is performed.

The nodes of the ensemble must be able to communicate with one another over all three ports (clientPort, peerPort and leaderPort).
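Since initLimit and syncLimit are counted in tickTime units, the effective timeouts can be computed from the configuration itself. This is a minimal sketch; limit_ms is a hypothetical helper that reads "key=value" lines in zoo.cfg style from standard input.

```shell
#!/usr/bin/env bash
# limit_ms KEY: read zoo.cfg-style "key=value" lines on stdin and print
# KEY (initLimit or syncLimit) converted to milliseconds, i.e. KEY * tickTime.
# Hypothetical helper for illustration.
limit_ms() {
  awk -F'=' -v key="$1" '
    $1 == "tickTime" { tick = $2 }
    $1 == key        { limit = $2 }
    END              { print tick * limit }
  '
}

# Values used in this article
cfg='tickTime=2000
initLimit=20
syncLimit=5'

echo "initLimit: $(printf '%s\n' "$cfg" | limit_ms initLimit) ms"
echo "syncLimit: $(printf '%s\n' "$cfg" | limit_ms syncLimit) ms"

# Against the real file:
#   limit_ms initLimit < /usr/local/zookeeper/conf/zoo.cfg
```

With the settings above, followers get 40 seconds to connect to the leader and may lag behind it by up to 10 seconds.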
