How can I use pyspark to upsert one row without replacing entire table

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

How can I use pyspark to upsert one row without replacing entire table

Siavash Namvar
Hi,

I have a use case, and read data from a db table and need to update few rows based on primary key without replacing the entire table.

for instance if I have 3 following rows

-------------------
id | fname         
-------------------
 1 | john          
-------------------
 2 | Steve         
-------------------
 3 | Jack         
-------------------

And I would like to update the row with id=2 from Steve to Michael without replacing the entire table and the outpur looks like

-------------------
id | fname         
-------------------
 1 | john          
-------------------
 2 | Michael         
-------------------
 3 | Jack         
-------------------

Keep in mind the actual db table is so huge and database is old and cannot read and replace entire table

Thanks

Reply | Threaded
Open this post in threaded view
|

Re: How can I use pyspark to upsert one row without replacing entire table

srowen
It's not so much Spark but the data format, whether it supports
upserts. Parquet, CSV, JSON, etc would not.
That is what Delta, Hudi et al are for, and yes you can upsert them in Spark.

On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <[hidden email]> wrote:

>
> Hi,
>
> I have a use case, and read data from a db table and need to update few rows based on primary key without replacing the entire table.
>
> for instance if I have 3 following rows
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Steve
> -------------------
>  3 | Jack
> -------------------
>
> And I would like to update the row with id=2 from Steve to Michael without replacing the entire table and the outpur looks like
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Michael
> -------------------
>  3 | Jack
> -------------------
>
> Keep in mind the actual db table is so huge and database is old and cannot read and replace entire table
>
> Thanks

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: How can I use pyspark to upsert one row without replacing entire table

Siavash Namvar
Thanks Sean,

Do you have any URL or reference to help me how to upsert in Spark? I need to update Sybase db

On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <[hidden email]> wrote:
It's not so much Spark but the data format, whether it supports
upserts. Parquet, CSV, JSON, etc would not.
That is what Delta, Hudi et al are for, and yes you can upsert them in Spark.

On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <[hidden email]> wrote:
>
> Hi,
>
> I have a use case, and read data from a db table and need to update few rows based on primary key without replacing the entire table.
>
> for instance if I have 3 following rows
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Steve
> -------------------
>  3 | Jack
> -------------------
>
> And I would like to update the row with id=2 from Steve to Michael without replacing the entire table and the outpur looks like
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Michael
> -------------------
>  3 | Jack
> -------------------
>
> Keep in mind the actual db table is so huge and database is old and cannot read and replace entire table
>
> Thanks
Reply | Threaded
Open this post in threaded view
|

Re: How can I use pyspark to upsert one row without replacing entire table

Nicholas Gustafson
The delta docs have examples of upserting:


On Aug 12, 2020, at 08:31, Siavash Namvar <[hidden email]> wrote:


Thanks Sean,

Do you have any URL or reference to help me how to upsert in Spark? I need to update Sybase db

On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <[hidden email]> wrote:
It's not so much Spark but the data format, whether it supports
upserts. Parquet, CSV, JSON, etc would not.
That is what Delta, Hudi et al are for, and yes you can upsert them in Spark.

On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <[hidden email]> wrote:
>
> Hi,
>
> I have a use case, and read data from a db table and need to update few rows based on primary key without replacing the entire table.
>
> for instance if I have 3 following rows
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Steve
> -------------------
>  3 | Jack
> -------------------
>
> And I would like to update the row with id=2 from Steve to Michael without replacing the entire table and the outpur looks like
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Michael
> -------------------
>  3 | Jack
> -------------------
>
> Keep in mind the actual db table is so huge and database is old and cannot read and replace entire table
>
> Thanks
ed
Reply | Threaded
Open this post in threaded view
|

Re: How can I use pyspark to upsert one row without replacing entire table

ed
In reply to this post by Siavash Namvar
You’ll need to do an insert and use a trigger on the table to change it into an upsert, also make sure your mode is append rather than overwrite.

Ed


From: Siavash Namvar <[hidden email]>
Sent: Wednesday, August 12, 2020 4:09:07 PM
To: Sean Owen <[hidden email]>
Cc: User <[hidden email]>
Subject: Re: How can I use pyspark to upsert one row without replacing entire table
 
Thanks Sean,

Do you have any URL or reference to help me how to upsert in Spark? I need to update Sybase db

On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <[hidden email]> wrote:
It's not so much Spark but the data format, whether it supports
upserts. Parquet, CSV, JSON, etc would not.
That is what Delta, Hudi et al are for, and yes you can upsert them in Spark.

On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <[hidden email]> wrote:
>
> Hi,
>
> I have a use case, and read data from a db table and need to update few rows based on primary key without replacing the entire table.
>
> for instance if I have 3 following rows
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Steve
> -------------------
>  3 | Jack
> -------------------
>
> And I would like to update the row with id=2 from Steve to Michael without replacing the entire table and the outpur looks like
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Michael
> -------------------
>  3 | Jack
> -------------------
>
> Keep in mind the actual db table is so huge and database is old and cannot read and replace entire table
>
> Thanks
Reply | Threaded
Open this post in threaded view
|

Re: How can I use pyspark to upsert one row without replacing entire table

Siavash Namvar
That's kind of solution Ed, can you elaborate how can I do this on Spark side? Or do I need to update table configuration in the DB

Siavash

On Wed, Aug 12, 2020 at 5:55 PM ed elliott <[hidden email]> wrote:
You’ll need to do an insert and use a trigger on the table to change it into an upsert, also make sure your mode is append rather than overwrite.

Ed


From: Siavash Namvar <[hidden email]>
Sent: Wednesday, August 12, 2020 4:09:07 PM
To: Sean Owen <[hidden email]>
Cc: User <[hidden email]>
Subject: Re: How can I use pyspark to upsert one row without replacing entire table
 
Thanks Sean,

Do you have any URL or reference to help me how to upsert in Spark? I need to update Sybase db

On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <[hidden email]> wrote:
It's not so much Spark but the data format, whether it supports
upserts. Parquet, CSV, JSON, etc would not.
That is what Delta, Hudi et al are for, and yes you can upsert them in Spark.

On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <[hidden email]> wrote:
>
> Hi,
>
> I have a use case, and read data from a db table and need to update few rows based on primary key without replacing the entire table.
>
> for instance if I have 3 following rows
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Steve
> -------------------
>  3 | Jack
> -------------------
>
> And I would like to update the row with id=2 from Steve to Michael without replacing the entire table and the outpur looks like
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Michael
> -------------------
>  3 | Jack
> -------------------
>
> Keep in mind the actual db table is so huge and database is old and cannot read and replace entire table
>
> Thanks