Spark Streaming - Routing rdd to Executor based on Key


forece85
We are doing batch processing using Spark Streaming with Kinesis, with a batch size of 5 minutes. We want to send all events with the same eventId to the same executor for a batch, so that we can do several grouping operations on those events by eventId. No previous or future batch data is involved; only keyed operations on the current batch are needed.

Please advise on how to achieve this.

Thanks.




Re: Spark Streaming - Routing rdd to Executor based on Key

AliGouta
I don't know Kinesis, but it looks like it works like Kafka. Your producer should implement a partitioner that sends records with the same key to the same partition. Then each task in your Spark Streaming app will load data from the same partition on the same executor. I think this is the simplest way to achieve what you want to do.
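
For example, the producer side could look roughly like this (just a sketch, assuming the AWS Java SDK v1 Kinesis client; the stream name and the sendEvent helper are made-up placeholders, not your actual code). Kinesis hashes the partition key to pick a shard, so using the eventId as the partition key keeps all records with the same eventId on the same shard:

import java.nio.ByteBuffer
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder
import com.amazonaws.services.kinesis.model.PutRecordRequest

// One Kinesis client per producer process.
val kinesis = AmazonKinesisClientBuilder.defaultClient()

// Hypothetical helper: publish one event, keyed by its eventId.
def sendEvent(streamName: String, eventId: String, payload: Array[Byte]): Unit = {
  val request = new PutRecordRequest()
    .withStreamName(streamName)          // placeholder stream name
    .withPartitionKey(eventId)           // same eventId -> same shard
    .withData(ByteBuffer.wrap(payload))
  kinesis.putRecord(request)
}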

Best regards,
Ali Gouta.


Re: Spark Streaming - Routing rdd to Executor based on Key

srowen
You can also group by the key in a transformation on each batch. But yes, it's faster/easier if the data is already partitioned that way.
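
Something along these lines, for example (only a sketch; the Event case class stands in for however you deserialize the Kinesis records): keying the stream by eventId and calling groupByKey shuffles each batch so that all events for a given eventId land in one partition, hence on one executor, for that batch only:

import org.apache.spark.streaming.dstream.DStream

// Hypothetical event type standing in for your deserialized Kinesis records.
case class Event(eventId: String, payload: String)

def processByEventId(events: DStream[Event]): Unit = {
  events
    .map(e => (e.eventId, e))   // key every record by its eventId
    .groupByKey()               // shuffle: all events for an eventId end up in one partition
    .foreachRDD { rdd =>
      rdd.foreach { case (eventId, eventsForId) =>
        // per-eventId grouping logic for the current 5-minute batch goes here;
        // eventsForId is an Iterable[Event] with that id's events, and nothing
        // is carried across batches
      }
    }
}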


Re: Spark Streaming - Routing rdd to Executor based on Key

forece85
In reply to this post by AliGouta
Not sure if Kinesis has that flexibility. What other possibilities are there at the transformation level?




Re: Spark Streaming - Routing rdd to Executor based on Key

forece85
In reply to this post by srowen
Is there any example of this, please?


