Spark Structured Streaming Continuous Trigger Mode React to an External Trigger

Spark Structured Streaming Continuous Trigger Mode React to an External Trigger

shahrajesh2006
I am using Spark (Java) Structured Streaming in continuous trigger mode,
connecting to a Kafka broker. The use case is very simple: apply some custom
filter/transformation logic, written as a plain Java method, and ingest the
data into an external system. The Kafka topic has 6 partitions, so the
application runs 6 executors. I have a requirement to change the behavior of
the filter/transformation logic each executor runs based on an external
event (for example, a property change in an S3 file). This is towards the
goal of building a highly resilient architecture where the Spark application
runs in two cloud regions and reacts to an external event.

What is the best way to send a signal to a running Spark application and
propagate it to each executor?

The approach I have in mind is to create a new component that periodically
re-reads the S3 file to look for the trigger event. This component would be
integrated with the logic running on each Spark executor JVM, as sketched below.
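To make it concrete, a rough sketch of that component in Java (FilterConfig,
loadPropertiesFromS3(), and the 60-second interval are placeholders, not
working S3 code):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical per-JVM holder for the externally controlled properties.
    // Loading the class on each executor starts one daemon refresher thread there.
    public final class FilterConfig {
        private static volatile Map<String, String> current = new ConcurrentHashMap<>();
        private static final ScheduledExecutorService refresher =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "s3-config-refresher");
                    t.setDaemon(true); // do not keep the executor JVM alive
                    return t;
                });

        static {
            // Re-read the S3 file periodically; the interval is an arbitrary choice.
            refresher.scheduleAtFixedRate(
                    () -> current = loadPropertiesFromS3(), 0, 60, TimeUnit.SECONDS);
        }

        public static String get(String key) {
            return current.get(key);
        }

        public static void refreshNow() {
            current = loadPropertiesFromS3();
        }

        private static Map<String, String> loadPropertiesFromS3() {
            // Placeholder: fetch and parse the S3 property file here.
            return new ConcurrentHashMap<>();
        }
    }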

Please advise.

Thanks and Regards

Re: Spark Structured Streaming Continuous Trigger Mode React to an External Trigger

Mich Talebzadeh
Hi,

It would help if you could include a high-level architecture diagram.

So, as I understand it, you are running a single broker with 6 partitions (or
6 brokers with one partition each, the default).

You said you are using the continuous trigger mode, meaning, as an example:


                   .foreach(new ExternalSystemWriter())
                   .trigger(Trigger.Continuous("1 second"))

with a 1-second checkpoint interval.


So a ForeachWriter subclass such as that ExternalSystemWriter will hold the transformation logic, and that logic will run on all executors.
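For example, a skeletal version of such a writer (the class name, the "mode"
property, and both filter variants are placeholders; FilterConfig is the
hypothetical holder sketched in the original post):

    import org.apache.spark.sql.ForeachWriter;
    import org.apache.spark.sql.Row;

    public class ExternalSystemWriter extends ForeachWriter<Row> {
        @Override
        public boolean open(long partitionId, long epochId) {
            // Open a connection to the external system for this partition/epoch.
            return true;
        }

        @Override
        public void process(Row row) {
            // Branch on the externally controlled property for every record.
            if ("A".equals(FilterConfig.get("mode"))) {
                // ... filter/transform variant A and write the row ...
            } else {
                // ... filter/transform variant B and write the row ...
            }
        }

        @Override
        public void close(Throwable errorOrNull) {
            // Release the connection.
        }
    }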


For detecting changes to S3 files, you need an orchestrator integrated with Spark, so that the whole flow is event driven; something like Airflow with a file sensor. I am not sure what granularity you need, but you may be able to use micro-batching as well:


                     .foreachBatch(SendToBigQuery) \
                     .trigger(processingTime='2 seconds') \


Whichever route you take, that class within Spark should be able to poll for changes and switch to the correct logic where necessary.
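In Java, a rough equivalent of that micro-batch route, re-checking the
property once per batch (df is assumed to be the streaming Dataset read from
Kafka, and both filters are placeholders):

    import org.apache.spark.api.java.function.VoidFunction2;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.streaming.Trigger;

    df.writeStream()
      .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batch, batchId) -> {
          // Runs once per micro-batch: pick the logic from the current property.
          String mode = FilterConfig.get("mode");      // hypothetical holder again
          Dataset<Row> out = "A".equals(mode)
                  ? batch.filter("colA IS NOT NULL")   // placeholder filter, variant A
                  : batch.filter("colB IS NOT NULL");  // placeholder filter, variant B
          // ... write `out` to the external system here ...
      })
      .trigger(Trigger.ProcessingTime("2 seconds"))
      .start(); // may declare a checked TimeoutException depending on Spark version

The cast to VoidFunction2 is there because foreachBatch has both Java and
Scala overloads, so a bare lambda can be ambiguous from Java.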

HTH


Re: Spark Structured Streaming Continuous Trigger Mode React to an External Trigger

shahrajesh2006
Sorry, I don't have a diagram to share. Your understanding of how I am using
the Spark application is right.

It is a Kafka topic with 6 partitions, so Spark is able to create 6 parallel
consumers/executors.

The idea of using Airflow is interesting. I will explore that option more.

The other idea, using a ProcessingTime trigger (every 60 seconds) to run a
second query that reloads the data from the S3 file and feeds its results to
the ContinuousTrigger query, is one I will also try; something like the
sketch below.
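Roughly what I have in mind, using the built-in rate source as a 60-second
tick and the hypothetical FilterConfig.refreshNow() hook from the earlier
sketch; I realize the batch function here runs on the driver, so a value
refreshed this way is not automatically visible inside executor-side lambdas
of the continuous query:

    import org.apache.spark.api.java.function.VoidFunction2;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.streaming.Trigger;

    // A second, slow query whose only job is to reload the S3 properties.
    Dataset<Row> ticks = spark.readStream()
            .format("rate")              // built-in source emitting rows on a schedule
            .option("rowsPerSecond", 1)
            .load();

    ticks.writeStream()
         .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batch, batchId) -> {
             // The batch function body itself runs on the driver once per trigger.
             FilterConfig.refreshNow(); // hypothetical refresh hook
         })
         .trigger(Trigger.ProcessingTime("60 seconds"))
         .start();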

Thanks again!

Re: Spark Structured Streaming Continuous Trigger Mode React to an External Trigger

shahrajesh2006
I tried to create a Dataset by loading a file and to pass it as an argument
to a Java method, as below:

Dataset<Row> propertiesFile;  // Dataset created by loading a JSON property file
Dataset<Row> streamingQuery;  // Dataset for the streaming query

streamingQuery.map(
        row -> myfunction(row, propertiesFile), Encoders.STRING());

This approach throws a NullPointerException when reading from the
propertiesFile Dataset inside the lambda. It looks like I cannot pass a
Dataset into executor-side code this way in Spark; a Dataset is a driver-side
handle, so I have to join it with the streamingQuery Dataset instead.
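For reference, a sketch of the two workarounds I am considering (the
key/value column positions, the join column, and the Map-taking variant of
myfunction are all assumptions):

    import static org.apache.spark.sql.functions.broadcast;

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;

    // Option 1: collect the small property file into plain driver-side data first;
    // a Map serializes into the lambda's closure, unlike a Dataset reference.
    Map<String, String> props = new HashMap<>();
    for (Row r : propertiesFile.collectAsList()) {
        props.put(r.getString(0), r.getString(1)); // assumes key/value columns
    }
    Dataset<String> mapped = streamingQuery.map(
            (MapFunction<Row, String>) row -> myfunction(row, props),
            Encoders.STRING());

    // Option 2: a stream-static join (micro-batch mode only; the continuous
    // trigger supports just map-like operations, so no joins there).
    Dataset<Row> joined = streamingQuery.join(broadcast(propertiesFile), "key");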





