
In-order processing using Spark Streaming



scorpio
This post has NOT been accepted by the mailing list yet.
I am reading messages from Kafka and processing them using Spark Streaming. The incoming messages belong to various sessions, and I am using a partitioned topic, keyed by session, to ensure that all messages belonging to the same session end up in the same partition of the Kafka topic.
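To make the setup above concrete, here is a minimal sketch in plain Python (not a Kafka client) of how keying by session ID keeps a session's messages together and in send order. The names `partition_for`, `produce`, and `NUM_PARTITIONS` are hypothetical, and the CRC32 hash is only a stand-in; Kafka's default partitioner uses a different hash, but the hash-of-key modulo partition-count idea is the same.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, for illustration only


def partition_for(session_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a session ID to a partition using a stable hash.

    CRC32 is an assumption standing in for Kafka's own partitioner hash.
    """
    return zlib.crc32(session_id.encode("utf-8")) % num_partitions


def produce(messages, num_partitions: int = NUM_PARTITIONS):
    """Append (session_id, seq) messages to per-partition logs in arrival order."""
    logs = defaultdict(list)
    for session_id, seq in messages:
        logs[partition_for(session_id, num_partitions)].append((session_id, seq))
    return dict(logs)


# Interleaved messages from three sessions, each with a sequence number.
incoming = [("s1", 0), ("s2", 0), ("s1", 1), ("s3", 0), ("s2", 1), ("s1", 2)]
logs = produce(incoming)

for partition, log in sorted(logs.items()):
    print(partition, log)
```

Each session's messages land in exactly one partition log, and within that log their sequence numbers stay in the order they were sent, even though messages from other sessions may be interleaved with them.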
The Spark Streaming job reads the messages from the Kafka topic using a direct consumer. There may be any number of executors reading in parallel. Since the messages belonging to a session are ordered in Kafka, they should also be read in order by the Kafka consumer run by the Spark engine. I believe Spark dedicates one core per executor to reading from Kafka and uses the remaining cores to process the messages that have been read. So, if the messages are processed by more than one core in parallel, won't that break the in-order processing I want? If so, I could limit the number of cores per executor, but wouldn't that mean underutilizing my Spark cluster resources?
Is there some way to ensure that the messages of a session are processed by Spark Streaming in the order in which they were published to Kafka?
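One way to picture the concern is a model in which each Kafka partition is handled by a single task that walks its records sequentially, so parallelism comes from running many partitions at once rather than from splitting one partition across cores. The sketch below simulates that model in plain Python with a thread pool; it is not Spark code, and `process_partition`, `process_batch`, and the sample data are hypothetical. It also assumes that a session never spans partitions and that nothing reshuffles the records (for example a repartition or a key-based aggregation) before the order-sensitive step.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(records):
    """Handle one partition's records sequentially, in log order."""
    seen = {}
    for session_id, seq in records:
        # Order-sensitive work would go here; we just record what we saw.
        seen.setdefault(session_id, []).append(seq)
    return seen


def process_batch(partition_logs, max_workers=4):
    """Run one task per partition in parallel and merge per-session results.

    Merging with update() is safe only under the assumption that a session
    lives in exactly one partition.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for result in pool.map(process_partition, partition_logs.values()):
            merged.update(result)
    return merged


# Hypothetical batch: two partitions, sessions interleaved within a partition.
partition_logs = {
    0: [("s1", 0), ("s3", 0), ("s1", 1), ("s1", 2)],
    1: [("s2", 0), ("s2", 1)],
}
processed = process_batch(partition_logs)
print(processed)
```

In this model the partitions run concurrently, yet every session still sees its messages in sequence because only one worker ever touches a given partition. Under that assumption, throughput scales with the number of partitions rather than with the number of cores assigned to any single partition.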