Joining streaming data with static table data.


satyajit vegesna
Hi All,

I am working on a real-time reporting project and have a question about a Structured Streaming job that will stream the records of a particular table and must join them with an existing table.

Stream ----> query/join to another DF/DS ---> update the Stream data record.

My problem is how to approach the middle layer (query/join to another DF/DS): should I create a DF from spark.read.format("JDBC"), or "stream and maintain the data in a memory sink", or is there a better way to do it?

I would like to know whether anyone has faced a similar scenario and has any suggestions on how to proceed.

Regards,
Satyajit.

Re: Joining streaming data with static table data.

Rishi Mishra
You can join a streaming dataset with a static dataset. I would prefer your first approach, but its weakness is performance: unless you cache the static dataset, every join query will fetch the latest records from the table again.
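
A minimal PySpark-style sketch of that first approach with caching might look like the following. It assumes an existing SparkSession, a table reachable over JDBC, and a shared join column; the names `load_reference`, `enrich`, `jdbc_url`, `table`, `properties` and `key` are illustrative and do not come from the thread.

```python
def load_reference(spark, jdbc_url, table, properties):
    """Read the static table over JDBC and mark it for caching.

    All arguments are placeholders: `spark` is an existing SparkSession and
    `properties` holds JDBC options such as user, password and driver.
    """
    reader = spark.read.format("jdbc").option("url", jdbc_url).option("dbtable", table)
    for name, value in properties.items():
        reader = reader.option(name, value)
    # cache() is lazy: the table is fetched on first use and the cached copy
    # is intended to be reused by later micro-batches.
    return reader.load().cache()


def enrich(stream_df, reference_df, key):
    """Stream-static join: keep every streaming row, add reference columns."""
    return stream_df.join(reference_df, on=key, how="left")
```

The trade-off is the one described above: a cached reference table avoids repeated reads, but it will not reflect rows that change in the source table after it was loaded.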





Re: Joining streaming data with static table data.

Vikash Pareek
In reply to this post by satyajit vegesna
Hi Satyajit,

For the query/join part there are a couple of approaches:
1. Create a DataFrame from each incoming streaming batch (which is actually an
RDD) and join it with your reference data (coming from the existing table).
2. Use Structured Streaming, where every batch carries a schema (you can
think of it as a stream of DataFrames).
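
As a rough illustration of approach 1 with the DStream API, each batch RDD can be converted to a DataFrame and joined inside `foreachRDD`. This is only a sketch: `join_batch`, `attach`, `schema`, `key` and the `handle` callback are hypothetical names, and a left join is an assumption about what the report needs.

```python
def join_batch(spark, rdd, schema, reference_df, key):
    """Turn one micro-batch RDD into a DataFrame and join it with reference data.

    Returns None for an empty batch so the caller can skip it.
    """
    if rdd.isEmpty():
        return None
    batch_df = spark.createDataFrame(rdd, schema)
    return batch_df.join(reference_df, on=key, how="left")


def attach(dstream, spark, schema, reference_df, key, handle):
    """Register the join on a DStream.

    `handle` is a hypothetical callback that writes or otherwise consumes
    each joined DataFrame.
    """
    def process(rdd):
        joined = join_batch(spark, rdd, schema, reference_df, key)
        if joined is not None:
            handle(joined)

    dstream.foreachRDD(process)
```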

When joining with reference data: if the data is static, load it once and
persist it; if it is dynamic, refresh it at a regular interval.
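
One possible way to sketch the "load once or refresh at an interval" idea is a small holder that reloads the reference DataFrame once it is older than a chosen age. Everything here is an assumption for illustration: `RefreshingReference`, `make_batch_handler`, `loader`, `ttl_seconds` and `sink` are made-up names, and the per-batch handler is meant for `foreachBatch`, which only exists in Spark 2.4 and later.

```python
import time


class RefreshingReference:
    """Holds a reference DataFrame and reloads it after `ttl_seconds`.

    `loader` is any zero-argument function returning a DataFrame (for
    example a JDBC read); `clock` is injectable so the class can be tested.
    """

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._df = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if stale:
            if self._df is not None:
                self._df.unpersist()  # release the previous cached copy
            self._df = self._loader().persist()
            self._loaded_at = now
        return self._df


def make_batch_handler(reference, key, sink):
    """Build a (batch_df, batch_id) function that joins each micro-batch
    with the current reference data and passes the result to `sink`."""
    def handle(batch_df, batch_id):
        sink(batch_df.join(reference.get(), on=key, how="left"), batch_id)

    return handle
```

For static data, `ttl_seconds=float("inf")` gives the load-once behaviour. With Spark 2.4 or later the handler could be registered as `stream_df.writeStream.foreachBatch(make_batch_handler(ref, "id", sink)).start()`, where `sink` is whatever function writes the joined batch out.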


Best Regards,
Vikash Pareek



