Apache Kafka: Install ZooKeeper ensemble on AWS EC2
- Jul 20, 2024
In this article we will see how to set up a three-node ZooKeeper ensemble on AWS EC2 instances.
Let's assume we already have three EC2 instances up and running with the following public IPs:
15.232.8.517 15.232.46.302 15.0.185.130
Please make sure ports 2181, 2888 and 3888 are open on each machine (for example, in the instances' security group), because ZooKeeper uses them to communicate with clients and between the servers themselves.
Install OpenJDK 17
The first step is to install OpenJDK 17 on each server as shown below:
sudo apt update && sudo apt upgrade -y
apt-cache search openjdk
sudo apt-get install openjdk-17-jdk
java --version
openjdk 17.0.4 2022-07-19
OpenJDK Runtime Environment (build 17.0.4+8-Ubuntu-122.04)
OpenJDK 64-Bit Server VM (build 17.0.4+8-Ubuntu-122.04, mixed mode, sharing)
Install ZooKeeper
Download ZooKeeper 3.8.0 from the Apache release archive, extract it and move it to the "/usr/local/" directory as shown below.
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.8.0/apache-zookeeper-3.8.0-bin.tar.gz
tar -zxf apache-zookeeper-3.8.0-bin.tar.gz
sudo mv apache-zookeeper-3.8.0-bin /usr/local/zookeeper
Also create a "zookeeper" folder under "/var/lib/"; it will serve as ZooKeeper's data directory.
sudo mkdir -p /var/lib/zookeeper
Each node must have a common configuration that lists all servers, and each server needs a "myid" file in the data directory that specifies the ID number of the server.
sudo touch /var/lib/zookeeper/myid
sudo vi /var/lib/zookeeper/myid
The myid file contains only the server's ID number, and that number must match the server's server.X entry in the configuration file: 1 on 15.232.8.517, 2 on 15.232.46.302 and 3 on 15.0.185.130.
If the IP addresses of the servers in the ensemble are 15.232.8.517, 15.232.46.302 and 15.0.185.130, the configuration file "zoo.cfg" under "/usr/local/zookeeper/conf/" on each node should look like this:
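If you prefer to script this step, here is a minimal bash sketch. The function name write_myid is hypothetical (it is not part of ZooKeeper); it assumes the data directory /var/lib/zookeeper used above and accepts only IDs from 1 to 255, the range the ZooKeeper Administrator's Guide recommends.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write ZooKeeper's myid file for this server.
# myid holds a single line with the server ID, which must match this
# host's server.X entry in zoo.cfg.
write_myid() {
  local id="$1"
  local data_dir="${2:-/var/lib/zookeeper}"  # assumed default: dataDir from this article

  # Accept only 1..255 without leading zeros.
  if ! [[ "$id" =~ ^[1-9][0-9]{0,2}$ ]] || (( id > 255 )); then
    echo "invalid server id: $id" >&2
    return 1
  fi

  mkdir -p "$data_dir"
  printf '%s\n' "$id" > "$data_dir/myid"
}
```

Run as root, this would be called with write_myid 1 on 15.232.8.517, write_myid 2 on 15.232.46.302 and write_myid 3 on 15.0.185.130. The one-liner echo 1 | sudo tee /var/lib/zookeeper/myid does the same thing without the validation.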
touch /usr/local/zookeeper/conf/zoo.cfg
vi /usr/local/zookeeper/conf/zoo.cfg
1) Node 15.232.8.517
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=20
syncLimit=5
server.1=0.0.0.0:2888:3888
server.2=15.232.46.302:2888:3888
server.3=15.0.185.130:2888:3888
2) Node 15.232.46.302
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=20
syncLimit=5
server.1=15.232.8.517:2888:3888
server.2=0.0.0.0:2888:3888
server.3=15.0.185.130:2888:3888
3) Node 15.0.185.130
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=20
syncLimit=5
server.1=15.232.8.517:2888:3888
server.2=15.232.46.302:2888:3888
server.3=0.0.0.0:2888:3888
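Note: You must specify 0.0.0.0 for the current node. On EC2 the public IP is not assigned to the instance's network interface, so ZooKeeper cannot bind its ports to it; 0.0.0.0 makes the node listen on all of its local interfaces.
Once these steps are complete on every node, start ZooKeeper on each server with the command below; the nodes should then find one another and form an ensemble.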
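Because the three files differ only in which entry uses 0.0.0.0, they can be generated from a single list of IPs. Below is a minimal bash sketch; gen_zoo_cfg is a hypothetical helper, and it assumes that server IDs follow the order of the IPs passed in, so the same list must be used on every node and each node's myid must equal its position in that list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print zoo.cfg for one member of the ensemble.
# Usage: gen_zoo_cfg <my_id> <ip_of_server_1> <ip_of_server_2> ...
# IDs are assigned by argument position (1, 2, 3, ...).
gen_zoo_cfg() {
  local my_id="$1"; shift
  if ! [[ "$my_id" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: gen_zoo_cfg <my_id> <ip>..." >&2
    return 1
  fi

  # Settings shared by all nodes (values from this article).
  cat <<EOF
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=20
syncLimit=5
EOF

  local id=1 ip
  for ip in "$@"; do
    if (( id == my_id )); then
      ip="0.0.0.0"  # the current node listens on all local interfaces
    fi
    echo "server.${id}=${ip}:2888:3888"
    id=$((id + 1))
  done
}
```

On 15.232.46.302, for example, gen_zoo_cfg 2 15.232.8.517 15.232.46.302 15.0.185.130 > /usr/local/zookeeper/conf/zoo.cfg writes the second file shown above.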
sudo ZK_SERVER_HEAP=128 /usr/local/zookeeper/bin/zkServer.sh start
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
To check that the ensemble is running correctly, connect to the client port and send the "srvr" command.
It returns details about the local ZooKeeper node, including its Mode (leader or follower).
# telnet localhost 2181
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
srvr
Zookeeper version: 3.8.0-5a02a05eddb59aee6ac762f7ea82e92a68eb9c0f, built on 2022-02-25 08:49 UTC
Latency min/avg/max: 0/0.0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x100000000
Mode: follower
Node count: 5
Connection closed by foreign host.
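Running telnet by hand on every node gets tedious. The minimal bash sketch below sends srvr with nc and extracts the Mode line; zk_mode and check_ensemble are hypothetical helper names, and it assumes nc (netcat) is installed and port 2181 is reachable from the machine you run it on. In a healthy ensemble exactly one node should report leader and the others follower.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read srvr output on stdin and print the Mode value
# (leader, follower or standalone). Returns non-zero if no Mode line is found.
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2; found = 1 } END { exit !found }'
}

# Hypothetical helper: query each host on client port 2181 and print its mode.
# Succeeds only if exactly one host reports "leader".
check_ensemble() {
  local host mode leaders=0
  for host in "$@"; do
    mode="$(echo srvr | nc -w 2 "$host" 2181 | zk_mode)" || mode="unreachable"
    echo "$host: $mode"
    if [ "$mode" = "leader" ]; then
      leaders=$((leaders + 1))
    fi
  done
  [ "$leaders" -eq 1 ]
}
```

Usage: check_ensemble 15.232.8.517 15.232.46.302 15.0.185.130 prints one line per node and exits with a non-zero status if there is no leader or more than one.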
That completes the ZooKeeper ensemble setup.
Recommended number of ZooKeeper nodes in an ensemble?
A majority of ensemble members (a quorum) must be working in order for ZooKeeper to respond to requests.
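It is generally recommended to have an odd number of ZooKeeper servers in your ensemble, because adding a server to make the count even does not let the ensemble survive any more failures: a four-node ensemble still needs three nodes for a majority, so it can lose only one node, just like a three-node ensemble.
This means that a three-node ensemble can keep running with one node missing.
A five-node ensemble can keep running with two nodes missing.
In general, five is a good number if you have enough servers available.
More servers mean lower write performance (each write must be acknowledged by a larger quorum) but slightly better read performance (each server answers reads from its own copy of the data).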
Five is good because it allows you to remove a server for upgrading while still having a healthy cluster.
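The arithmetic behind these numbers is a strict majority: a quorum of an n-server ensemble is floor(n/2) + 1 servers, so the ensemble tolerates n - (floor(n/2) + 1) failures. The small bash sketch below (the function names are hypothetical) prints this for a few ensemble sizes and shows why an even count adds no fault tolerance.

```shell
#!/usr/bin/env bash
# Quorum = strict majority of the voting servers: floor(n / 2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a quorum is still available.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 3 4 5 6; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
# servers=3 quorum=2 tolerated_failures=1
# servers=4 quorum=3 tolerated_failures=1
# servers=5 quorum=3 tolerated_failures=2
# servers=6 quorum=4 tolerated_failures=2
```

Going from three to four servers, or from five to six, raises the quorum size without raising the number of failures the ensemble can absorb.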
Configuration
Below is a quick overview of useful ZooKeeper configuration settings.
Config | Meaning |
---|---|
initLimit | The amount of time, in ticks, to allow followers to connect to and sync with the leader. |
syncLimit | How long, in ticks, a follower can be out of sync with the leader before it is dropped. |
tickTime | The basic time unit, in milliseconds. Both initLimit and syncLimit are a number of tickTime units, which makes the initLimit above 20 × 2,000 ms, or 40 seconds. |
clientPort | The port clients connect to. Clients only need to be able to reach the ensemble over the clientPort. |
server.X=hostname:peerPort:leaderPort | Where "X" is the integer ID number of the server (matching its myid file), "hostname" is the hostname or IP address of the server, "peerPort" is the TCP port over which servers communicate with one another, and "leaderPort" is the TCP port over which leader election is performed. |
The nodes of the ensemble must be able to communicate with one another over all three ports (clientPort, peerPort and leaderPort).
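Since initLimit and syncLimit are counted in ticks, it can help to see their values in real time. Below is a minimal bash sketch, assuming a simple key=value zoo.cfg like the ones above; zk_timeouts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert initLimit and syncLimit from ticks to
# milliseconds using the tickTime set in a zoo.cfg file.
zk_timeouts() {
  local cfg="${1:-/usr/local/zookeeper/conf/zoo.cfg}"  # path used in this article
  awk -F= '
    { gsub(/[ \t\r]/, "") }   # tolerate spaces around "=" and CRLF line endings
    $1 == "tickTime"  { tick = $2 }
    $1 == "initLimit" { initlim = $2 }
    $1 == "syncLimit" { synclim = $2 }
    END {
      printf "initLimit: %d ticks = %d ms\n", initlim, initlim * tick
      printf "syncLimit: %d ticks = %d ms\n", synclim, synclim * tick
    }' "$cfg"
}
```

For the configuration used in this article (tickTime=2000, initLimit=20, syncLimit=5) it prints "initLimit: 20 ticks = 40000 ms" and "syncLimit: 5 ticks = 10000 ms".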