DataFrame joins with Spark-Java

sushma spark
Dear Friends,

I am new to Spark DataFrames. I have dataframe1, which contains today's records, and dataframe2, which contains yesterday's records. I need to compare today's records with yesterday's and find the new records, that is, the ones that do not exist in yesterday's records, based on the primary key. The problem is that the primary key sometimes consists of multiple columns.

I receive the primary key columns in a List.

example:

List<String> primaryKeyList = listOfPrimarykeys; // single or multiple primary key columns

DataFrame currentDataRecords = queryexecutor.getCurrentRecords(); // this contains today's records
DataFrame yesterdayRecords = queryexecutor.getYesterdayRecords();// this contains yesterday's records

Can anyone help me join these two DataFrames and apply the WHERE conditions on the key columns dynamically in Spark Java code?

Thanks
Sushma

Re: DataFrame joins with Spark-Java

Rishi Mishra
Hi Sushma,
You can try a left anti join, as below. In my example, name and id together make up the key.

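    // Requires: import static org.apache.spark.sql.functions.col;
    // A left anti join keeps only the rows of df1 that have no match in df2.
    df1.alias("a").join(df2.alias("b"),
        col("a.name").equalTo(col("b.name"))
            .and(col("a.id").equalTo(col("b.id"))),
        "left_anti").selectExpr("name", "id").show(10, false);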

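Since the key columns arrive in a List, the condition above could also be built dynamically instead of being hard-coded. The following is only a minimal sketch in plain Java: the class JoinConditions and its method onKeys are hypothetical names made up for illustration (they are not part of Spark), and it assumes the key column names are simple identifiers that need no quoting.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of Spark): builds a join condition string from key column names. */
final class JoinConditions {
    private JoinConditions() {}

    /**
     * Builds one equality test per key column, qualified by the two table aliases,
     * and joins the tests with AND.
     * Assumption: column names are plain identifiers that need no quoting.
     */
    static String onKeys(List<String> keys, String leftAlias, String rightAlias) {
        if (keys == null || keys.isEmpty()) {
            throw new IllegalArgumentException("at least one key column is required");
        }
        return keys.stream()
                .map(k -> leftAlias + "." + k + " = " + rightAlias + "." + k)
                .collect(Collectors.joining(" AND "));
    }
}
```

With primaryKeyList holding name and id, JoinConditions.onKeys(primaryKeyList, "a", "b") yields the same condition as the hard-coded example. The string could then be turned into a join condition with org.apache.spark.sql.functions.expr and passed to join together with "left_anti", using the same aliases "a" and "b" as above.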

On Thu, Nov 30, 2017 at 7:38 AM, sushma spark <[hidden email]> wrote:
[...]