Any advice how to do this use case in Spark SQL?


Any advice how to do this use case in Spark SQL?

Shyam P
Hi,
Any advice on how to do this in Spark SQL?

I have a scenario as below:

dataframe1 = loaded from an HDFS Parquet file.

dataframe2 = read from a Kafka stream.

If the column1 value of dataframe1 exists in columnX of dataframe2, then I need to replace the column1 value of dataframe1.

Otherwise, I need to add the column1 value of dataframe1 to dataframe2 as a new record.


In essence, I need to implement a lookup dataframe that is refreshable.
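
A minimal sketch of the two inputs (Scala; the HDFS path, Kafka brokers, and topic name are placeholders, and the Kafka read is shown as a batch read for simplicity):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("refreshable-lookup")
  .getOrCreate()

// dataframe1: the existing lookup data, loaded from a Parquet file on HDFS
val dataframe1 = spark.read.parquet("hdfs:///data/lookup")          // placeholder path

// dataframe2: records read from Kafka (use readStream instead for a streaming job)
val dataframe2 = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")                 // placeholder brokers
  .option("subscribe", "my_topic")                                  // placeholder topic
  .load()
  .selectExpr("CAST(value AS STRING) AS json")
  // the JSON payload still needs to be parsed (e.g. with from_json) to get columnX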

For more information please check


Let me know if you need more info.

Thanks

Re: Any advice how to do this use case in Spark SQL?

Jörn Franke
Have you tried joining both datasets, filtering accordingly, and then writing the full dataset to your filesystem?
Alternatively, work with a NoSQL database that you update by key (e.g. it sounds like a key/value store could be useful for you).

However, it could also be that you need to do more, depending on your use case.
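
For example, something along these lines (a rough sketch only; it reuses the dataframe1/dataframe2 names from your mail and assumes payload columns value1/valueX, which are placeholders):

import org.apache.spark.sql.functions.coalesce

// A full outer join keeps keys that appear on either side; coalesce prefers the
// fresh value from the Kafka side when the key matched, otherwise keeps the old one.
val merged = dataframe1
  .join(dataframe2, dataframe1("column1") === dataframe2("columnX"), "full_outer")
  .select(
    coalesce(dataframe1("column1"), dataframe2("columnX")).as("column1"),
    coalesce(dataframe2("valueX"), dataframe1("value1")).as("value1")
  )

// Write the refreshed lookup back to the filesystem (to a new location,
// since overwriting the path you are still reading from can fail).
merged.write.mode("overwrite").parquet("hdfs:///data/lookup_refreshed")  // placeholder path

A key/value store would let you do the same thing as point updates by key instead of rewriting the whole dataset each time.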

On 14.08.2019 at 05:08, Shyam P <[hidden email]> wrote:
