Assign one identifier to all rows that have a similar value in an RDD

Donni Khan
Hi Spark Users,

I want to add one identifier to all rows that have similar values in a specific column.
Specifically, I have a Spark RDD containing "ID_Nr" and "Col", and I want to assign one identifier to all rows that have a similar value in "Col".

Does anyone have an idea (or code) for how to do that?

Thank you,


Daniel Zhang

Look into Spark window functions and the first_value function.


Yong
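A minimal sketch of the window-function idea. It assumes "similar" means exactly equal values in "Col", and that the smallest "ID_Nr" in each group is an acceptable group identifier. In Spark SQL this would correspond roughly to `first_value(ID_Nr) OVER (PARTITION BY Col ORDER BY ID_Nr)` after converting the RDD to a DataFrame. The plain-Python version below only emulates that logic on a list of (ID_Nr, Col) tuples, and the function name `first_value_ids` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def first_value_ids(rows):
    """Emulate first_value(ID_Nr) OVER (PARTITION BY Col ORDER BY ID_Nr).

    rows: list of (id_nr, col) tuples (hypothetical input shape).
    Returns (id_nr, col, group_id) tuples, where group_id is the first
    (smallest) ID_Nr among the rows that share the same Col value.
    """
    # Sort so each Col value forms one contiguous partition ordered by ID_Nr.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    result = []
    for col, partition in groupby(ordered, key=itemgetter(1)):
        partition = list(partition)
        first_id = partition[0][0]  # first_value within this partition
        result.extend((id_nr, col, first_id) for id_nr, _ in partition)
    return result


rows = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")]
for row in first_value_ids(rows):
    print(row)
```

Every row with the same Col value gets the same group_id, namely the first ID_Nr in that partition.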




From: Donni Khan <[hidden email]>
Sent: Friday, April 20, 2018 7:19 AM
To: [hidden email]
Subject: assign one identifier for all rows that have similar value in RDD
 


Vadim Semenov
In reply to this post by Donni Khan
Create another RDD with a one-to-one Col -> Id mapping, and then join on it?
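A sketch of the mapping-and-join approach, again assuming exact equality in "Col". Plain Python lists and a dict stand in for RDDs here. In PySpark the corresponding steps would be roughly `rdd.map(lambda r: r[1]).distinct().zipWithIndex()` to build the Col -> Id mapping, then `join()` against the rows re-keyed by Col. The helper name `assign_ids_by_join` is hypothetical.

```python
def assign_ids_by_join(rows):
    """Give every row a group_id shared by all rows with the same Col value.

    rows: list of (id_nr, col) tuples (hypothetical input shape).
    Returns a list of (id_nr, col, group_id) tuples in the input order.
    """
    # Step 1: build a one-to-one Col -> Id mapping from the distinct Col values
    # (in Spark: rdd.map(lambda r: r[1]).distinct().zipWithIndex()).
    distinct_cols = sorted({col for _, col in rows})
    col_to_id = dict(zip(distinct_cols, range(len(distinct_cols))))

    # Step 2: join every row back to the mapping on Col; a dict lookup
    # plays the role of the join here.
    return [(id_nr, col, col_to_id[col]) for id_nr, col in rows]


rows = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")]
for row in assign_ids_by_join(rows):
    print(row)
```

In Spark the numbers produced by zipWithIndex() after distinct() depend on partition order, so only the fact that each distinct value gets a unique Id should be relied on. The sketch sorts the values just to make its output deterministic.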

--
Sent from my iPhone

Bowden, Chris

Just hash the column value


-Chris
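A sketch of the hashing approach, assuming exact equality in "Col" and string values. It uses SHA-256 truncated to 64 bits rather than Python's built-in hash(), because the built-in hash of a str is salted per process by default, so separate worker processes could disagree. Hashing needs no shuffle or join, but two different values can in principle collide, so the join-based approach is safer if identifiers must be strictly unique. The helper names are hypothetical. In the DataFrame API, `pyspark.sql.functions.hash` (a 32-bit Murmur3 hash) could play a similar role.

```python
import hashlib


def stable_hash_id(value):
    """Map a Col value (a str) to a 64-bit integer identifier.

    SHA-256 is deterministic across processes, unlike the built-in hash(),
    whose output for str is salted per process by default.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    # Keep the first 8 bytes as a 64-bit id; collisions are unlikely but possible.
    return int.from_bytes(digest[:8], "big")


def assign_ids_by_hash(rows):
    """rows: list of (id_nr, col) tuples -> list of (id_nr, col, group_id)."""
    return [(id_nr, col, stable_hash_id(col)) for id_nr, col in rows]


rows = [(1, "a"), (2, "b"), (3, "a")]
for row in assign_ids_by_hash(rows):
    print(row)
```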


From: Vadim Semenov <[hidden email]>
Sent: Friday, April 20, 2018 7:09:51 AM
To: Donni Khan
Cc: user
Subject: Re: assign one identifier for all rows that have similar value in RDD
 