Performance considerations, Using microservices for ZooKeeper & Kafka in Spark Streaming

Mich Talebzadeh


I wanted to share some experience of using microservices, in the form of Docker containers, for a ZooKeeper and Kafka cluster.

I created a streaming ensemble with microservices as shown in the following diagram.

There is one ZooKeeper container and three Kafka containers, all residing on the same RHEL 7.5 server. For the sake of simplicity I confined my test to these microservices and fed the streaming prices in batches of 100 into both Spark Streaming and Flink.

For the sake of this brief I will confine myself to Spark Streaming. The test parameters are as follows:

Batch Interval = 2 seconds, sending 100 historical test market data prices per micro-batch, i.e. data is collected first and processed afterwards

The Sliding Interval = Batch Interval = 2 seconds

Window Length = 2 x Batch Interval = 4 seconds
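
To make the relationship between these three settings concrete, here is a minimal sketch in plain Python (not Spark code). It relies on the Spark Streaming rule that the window length and the sliding interval must both be multiples of the batch interval. The helper name `window_plan` and the figure of 100 prices per micro-batch are assumptions made for illustration only.

```python
def window_plan(batch_interval_s, slide_interval_s, window_length_s, prices_per_batch):
    """Describe a sliding window in terms of micro-batches.

    Hypothetical helper, not part of Spark: it only mirrors the constraint
    that window length and sliding interval are multiples of the batch interval.
    """
    for name, value in (("sliding interval", slide_interval_s),
                        ("window length", window_length_s)):
        if value % batch_interval_s != 0:
            raise ValueError(f"{name} must be a multiple of the batch interval")

    batches_per_window = window_length_s // batch_interval_s
    batches_per_slide = slide_interval_s // batch_interval_s
    return {
        "batches_per_window": batches_per_window,
        "batches_per_slide": batches_per_slide,
        # Micro-batches shared by two consecutive windows
        "overlap_batches": max(batches_per_window - batches_per_slide, 0),
        # Assumes every micro-batch carries the same number of prices
        "prices_per_window": batches_per_window * prices_per_batch,
    }


# Values taken from the test parameters above; 100 prices per batch is assumed.
plan = window_plan(batch_interval_s=2, slide_interval_s=2,
                   window_length_s=4, prices_per_batch=100)
print(plan)
```

With these settings each window spans two micro-batches, moves forward one micro-batch at a time, and therefore shares one micro-batch (about 100 prices under the stated assumption) with the window before it.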

Unfortunately I don't have comparable figures from a classic ZooKeeper and Kafka set-up. However, the performance I observed looks impressive.

My artefact versions are:

ZooKeeper: 3.4.11
Kafka: 2.12-
Spark: 2.3.0
Hadoop: 3.0
HBase: 1.2.6

Spark Streaming is part of the Speed layer in the Lambda Architecture and flushes high value prices into an HBase table. It runs in standalone mode with 24G driver-memory, 8G executor-memory and 4 executors.

My observations are based on the Spark GUI as shown below.

Note that for timely operation one expects the processing time to be much less than the Batch Interval. Looking at the graph above I see no delay and an average processing time of 15 ms. I assume the total delay, which also averaged 15 ms, is equal to the average processing time.
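
The timeliness check can be written down as a small calculation. The sketch below is plain Python. It assumes the usual Spark Streaming definition in which total delay is scheduling delay plus processing time, and it treats "no delay" as a scheduling delay of zero. The function name `batch_health` is hypothetical.

```python
def batch_health(batch_interval_ms, processing_time_ms, scheduling_delay_ms=0.0):
    """Summarise whether a streaming job keeps up with its batch interval.

    Hypothetical helper: total delay = scheduling delay + processing time.
    """
    return {
        "total_delay_ms": scheduling_delay_ms + processing_time_ms,
        # Fraction of each batch interval spent processing
        "utilisation": processing_time_ms / batch_interval_ms,
        # A batch must finish before the next one is due
        "keeps_up": processing_time_ms < batch_interval_ms,
    }


# Figures from this test: 2 second batch interval, 15 ms average processing time.
health = batch_health(batch_interval_ms=2000, processing_time_ms=15)
print(health)
```

On these figures the job uses under 1% of each 2 second batch interval, and with a scheduling delay of zero the total delay equals the 15 ms processing time.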

I am not sure how much of this gain is due to the deployment of microservices. However, I assume that these services, being lightweight, efficient and loosely coupled, do make a difference.


Dr Mich Talebzadeh



