I am reading messages from Kafka and processing them with Spark Streaming. The incoming messages belong to various sessions, and I am using a partitioned topic to ensure that messages belonging to the same session end up in the same partition of the Kafka topic.
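For context, this relies on Kafka's key-based partitioning: a keyed message is assigned to a partition by hashing its key, so a fixed session ID always maps to the same partition and its messages stay in send order there. A minimal Python sketch of that idea (illustrative only; Python's built-in `hash()` stands in for Kafka's real murmur2 partitioner, and the session IDs are made up):

```python
def partition_for(session_id: str, num_partitions: int) -> int:
    # Stand-in for Kafka's key-based partitioner; the real default
    # partitioner uses murmur2, not Python's hash().
    return hash(session_id) % num_partitions

NUM_PARTITIONS = 4

# (session_id, sequence_number) pairs in the order they were sent.
messages = [("session-A", 1), ("session-A", 2), ("session-B", 1), ("session-A", 3)]

partitions = {}
for session_id, seq in messages:
    p = partition_for(session_id, NUM_PARTITIONS)
    partitions.setdefault(p, []).append((session_id, seq))

# All "session-A" messages land in one partition, still in send order.
p_a = partition_for("session-A", NUM_PARTITIONS)
assert [s for sid, s in partitions[p_a] if sid == "session-A"] == [1, 2, 3]
```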
The Spark Streaming job reads the messages from the Kafka topic using the direct consumer, and any number of executors may be reading in parallel. Since the messages belonging to a session were ordered in Kafka, they will be read in that same order by the Kafka consumer run by the Spark engine. I believe Spark dedicates one core per executor to reading from Kafka, and the remaining cores are used to process the messages that were read. So if the read messages are processed by more than one core in parallel, won't that break the in-order processing I need? If so, I could limit the number of cores per executor, but wouldn't that mean under-utilizing my Spark cluster resources?
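To illustrate the ordering I want to preserve: within one Kafka partition (and hence, with the direct stream, one RDD partition), records of a single session must be handled strictly sequentially, while different sessions may safely run in parallel. A plain-Python sketch of that pattern (not actual Spark API; in Spark I imagine this logic sitting inside something like mapPartitions, and the `upper()` call is just a placeholder for real per-message processing):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def process_batch(records, workers=4):
    """Process one micro-batch while preserving per-session order.

    records: list of (session_id, payload) tuples in Kafka offset order.
    Sessions are processed in parallel; within a session, sequentially.
    """
    by_session = defaultdict(list)
    for session_id, payload in records:   # offset order is preserved per session
        by_session[session_id].append(payload)

    results = {}

    def run_session(session_id, payloads):
        out = []
        for p in payloads:                # strictly sequential within the session
            out.append(p.upper())         # placeholder "processing" step
        results[session_id] = out

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for sid, payloads in by_session.items():
            pool.submit(run_session, sid, payloads)
    # The context manager waits for all sessions to finish.
    return results
```

Parallelism here is across sessions rather than within one, so no core ever reorders a single session's messages.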
Is there some way to ensure that the session messages are processed by Spark Streaming in the order in which they arrived at Kafka?