Performance considerations, Using microservices for ZooKeeper & Kafka in Spark Streaming


Mich Talebzadeh


I wanted to share some experience of running ZooKeeper and a Kafka cluster as microservices in Docker containers.

I created a streaming ensemble with microservices as shown in the following diagram.

There is one ZooKeeper container and three Kafka containers, all residing on an RHEL 7.5 server. For the sake of simplicity I confined my test to these microservices and fed the streaming prices in batches of 100 into both Spark Streaming and Flink.

For the sake of this brief I will confine myself to Spark Streaming. The test parameters are as follows:

Batch Interval = 2 seconds, sending 100 test market data (historical) prices in micro-batches, i.e. data is collected first and processed afterwards

The Sliding Interval = Batch Interval = 2 seconds

Window Length = 2 x Batch Interval = 4 seconds
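
As a rough illustration of how these three settings relate, here is a minimal Python sketch. Spark Streaming requires the window length and the sliding interval to be whole multiples of the batch interval. The helper name `window_layout` is hypothetical and not part of Spark; it only does the arithmetic, with all values in whole seconds.

```python
def window_layout(batch_s, slide_s, window_s):
    """Return (batches per window, batches per slide) for a windowed stream.

    Hypothetical helper, not a Spark API. It checks that the sliding
    interval and the window length are positive whole multiples of the
    batch interval, which is what Spark Streaming expects.
    """
    if batch_s <= 0:
        raise ValueError("batch interval must be positive")
    for name, value in (("sliding interval", slide_s), ("window length", window_s)):
        if value <= 0 or value % batch_s != 0:
            raise ValueError(f"{name} must be a positive multiple of the batch interval")
    return window_s // batch_s, slide_s // batch_s


# Values from this test: batch = 2 s, slide = 2 s, window = 4 s
batches_per_window, batches_per_slide = window_layout(2, 2, 4)
print(batches_per_window, batches_per_slide)
```

With these settings each window spans two batches and advances by one batch, so consecutive windows overlap by one batch. If every batch carries 100 prices, a window would cover up to 200 prices.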

Unfortunately I don't have comparable figures from a classic (non-containerised) ZooKeeper and Kafka set-up. However, the performance I see with the containers looks impressive.

My artefact versions are:

ZooKeeper: 3.4.11
Kafka: 2.12-
Spark: 2.3.0
Hadoop: 3.0
HBase: 1.2.6

Spark Streaming is part of the Speed layer in the Lambda Architecture and flushes high-value prices into an HBase table. It runs in standalone mode with 24G driver-memory, 8G executor-memory and 4 executors.
My observations are based on the Spark GUI, as shown below.

Note that for timely operation one expects the processing time to be much less than the Batch Interval. Looking at the graph above I see no delay and an average processing time of 15 ms. I assume the total delay, which also averaged 15 ms, is equal to the average processing time.
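
The check described above can be sketched in a few lines of Python. In the Spark Streaming UI the total delay of a batch is its scheduling delay plus its processing time, so with no scheduling delay the two figures coincide. The function name `batch_health` and the dictionary keys are hypothetical, chosen only for illustration; the numbers are the ones quoted in this post.

```python
def batch_health(batch_interval_ms, scheduling_delay_ms, processing_time_ms):
    """Summarise whether a streaming job keeps up with its batch interval.

    Hypothetical helper, not a Spark API. Total delay is taken as
    scheduling delay plus processing time, as reported in the Spark UI.
    """
    total_delay_ms = scheduling_delay_ms + processing_time_ms
    return {
        "total_delay_ms": total_delay_ms,
        # The job keeps up only if each batch finishes before the next one is due.
        "keeps_up": processing_time_ms < batch_interval_ms,
        # Fraction of the batch interval spent processing.
        "utilisation": processing_time_ms / batch_interval_ms,
    }


# Figures from this test: 2 s batch interval, no scheduling delay, 15 ms processing
report = batch_health(2000, 0, 15)
print(report)
```

Here the processing time is under 1% of the 2 second batch interval, so batches should not queue up as long as the input rate stays at this level.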

I am not sure how much of this gain is due to the deployment of microservices. However, I assume that these services, being lightweight, efficient and loosely coupled, do make a difference.


Dr Mich Talebzadeh



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.