Regarding column partitioning IDs and names as per hierarchical level SparkSQL

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Regarding column partitioning IDs and names as per hierarchical level SparkSQL

Aakash Basu-2
Hi all,

I have to generate a table with Spark-SQL with the following columns -


Level One Id: VARCHAR(20) NULL
Level One Name: VARCHAR( 50) NOT NULL
Level Two Id: VARCHAR( 20) NULL
Level Two Name: VARCHAR(50) NULL
Level Thr ee Id: VARCHAR(20) NULL
Level Thr ee Name: VARCHAR(50) NULL
Level Four Id: VARCHAR(20) NULL
Level Four Name: VARCHAR( 50) NULL
Level Five Id: VARCHAR(20) NULL
Level Five Name: VARCHAR(50) NULL
Level Six Id: VARCHAR(20) NULL
Level Six Name: VARCHAR(50) NULL
Level Seven Id: VARCHAR( 20) NULL
Level Seven Name: VARCHAR(50) NULL
Level Eight Id: VARCHAR( 20) NULL
Level Eight Name: VARCHAR(50) NULL
Level Nine Id: VARCHAR(20) NULL
Level Nine Name: VARCHAR( 50) NULL
Level Ten Id: VARCHAR(20) NULL
Level Ten Name: VARCHAR(50) NULL

My input source has these columns -


ID Description ParentID
10 Great-Grandfather  
1010 Grandfather 10
101010 1. Father A 1010
101011 2. Father B 1010
101012 4. Father C 1010
101013 5. Father D 1010
101015 3. Father E 1010
101018 Father F 1010
101019 6. Father G 1010
101020 Father H 1010
101021 Father I 1010
101022 2A. Father J 1010
10101010 2. Father K 101010

Like the above, I have ID till 20 digits, which means, I have 10 levels.

I want to populate the ID and name itself along with all the parents till the root for any particular level, which I am unable to create a concrete logic for.

Am using this way to fetch respecting levels and populate them in the respective columns but not their parents -

Present Logic ->

FinalJoin_DF = spark.sql("select "
                          + "case when length(a.id)/2 = '1' then a.id else ' ' end as level_one_id, "
  + "case when length(a.id)/2 = '1' then a.desc else ' ' end as level_one_name, "
  + "case when length(a.id)/2 = '2' then a.id else ' ' end as level_two_id, "
  + "case when length(a.id)/2 = '2' then a.desc else ' ' end as level_two_name, "
                          + "case when length(a.id)/2 = '3' then a.id else ' ' end as level_three_id, "
                          + "case when length(a.id)/2 = '3' then a.desc else ' ' end as level_three_name, "
                          + "case when length(a.id)/2 = '4' then a.id else ' ' end as level_four_id, "
                          + "case when length(a.id)/2 = '4' then a.desc else ' ' end as level_four_name, "
                          + "case when length(a.id)/2 = '5' then a.id else ' ' end as level_five_id, "
                          + "case when length(a.id)/2 = '5' then a.desc else ' ' end as level_five_name, "
                          + "case when length(a.id)/2 = '6' then a.id else ' ' end as level_six_id, "
  + "case when length(a.id)/2 = '6' then a.desc else ' ' end as level_six_name, "
  + "case when length(a.id)/2 = '7' then a.id else ' ' end as level_seven_id, "
                          + "case when length(a.id)/2 = '7' then a.desc else ' ' end as level_seven_name, "
                          + "case when length(a.id)/2 = '8' then a.id else ' ' end as level_eight_id, "
  + "case when length(a.id)/2 = '8' then a.desc else ' ' end as level_eight_name, "
                          + "case when length(a.id)/2 = '9' then a.id else ' ' end as level_nine_id, "
  + "case when length(a.id)/2 = '9' then a.desc else ' ' end as level_nine_name, "
  + "case when length(a.id)/2 = '10' then a.id else ' ' end as level_ten_id, "
                          + "case when length(a.id)/2 = '10' then a.desc else ' ' end as level_ten_name "
  + "from CategoryTempTable a")


Can someone help me in also populating all the parents levels in the respective level ID and level name, please?


Thanks,
Aakash.
Reply | Threaded
Open this post in threaded view
|

Fwd: Regarding column partitioning IDs and names as per hierarchical level SparkSQL

Aakash Basu-2
Hey all,

Any help in the below please?

Thanks,
Aakash.


---------- Forwarded message ----------
From: Aakash Basu <[hidden email]>
Date: Tue, Oct 31, 2017 at 9:17 PM
Subject: Regarding column partitioning IDs and names as per hierarchical level SparkSQL
To: user <[hidden email]>


Hi all,

I have to generate a table with Spark-SQL with the following columns -


Level One Id: VARCHAR(20) NULL
Level One Name: VARCHAR( 50) NOT NULL
Level Two Id: VARCHAR( 20) NULL
Level Two Name: VARCHAR(50) NULL
Level Thr ee Id: VARCHAR(20) NULL
Level Thr ee Name: VARCHAR(50) NULL
Level Four Id: VARCHAR(20) NULL
Level Four Name: VARCHAR( 50) NULL
Level Five Id: VARCHAR(20) NULL
Level Five Name: VARCHAR(50) NULL
Level Six Id: VARCHAR(20) NULL
Level Six Name: VARCHAR(50) NULL
Level Seven Id: VARCHAR( 20) NULL
Level Seven Name: VARCHAR(50) NULL
Level Eight Id: VARCHAR( 20) NULL
Level Eight Name: VARCHAR(50) NULL
Level Nine Id: VARCHAR(20) NULL
Level Nine Name: VARCHAR( 50) NULL
Level Ten Id: VARCHAR(20) NULL
Level Ten Name: VARCHAR(50) NULL

My input source has these columns -


ID Description ParentID
10 Great-Grandfather  
1010 Grandfather 10
101010 1. Father A 1010
101011 2. Father B 1010
101012 4. Father C 1010
101013 5. Father D 1010
101015 3. Father E 1010
101018 Father F 1010
101019 6. Father G 1010
101020 Father H 1010
101021 Father I 1010
101022 2A. Father J 1010
10101010 2. Father K 101010

Like the above, I have ID till 20 digits, which means, I have 10 levels.

I want to populate the ID and name itself along with all the parents till the root for any particular level, which I am unable to create a concrete logic for.

Am using this way to fetch respecting levels and populate them in the respective columns but not their parents -

Present Logic ->

FinalJoin_DF = spark.sql("select "
                          + "case when length(a.id)/2 = '1' then a.id else ' ' end as level_one_id, "
  + "case when length(a.id)/2 = '1' then a.desc else ' ' end as level_one_name, "
  + "case when length(a.id)/2 = '2' then a.id else ' ' end as level_two_id, "
  + "case when length(a.id)/2 = '2' then a.desc else ' ' end as level_two_name, "
                          + "case when length(a.id)/2 = '3' then a.id else ' ' end as level_three_id, "
                          + "case when length(a.id)/2 = '3' then a.desc else ' ' end as level_three_name, "
                          + "case when length(a.id)/2 = '4' then a.id else ' ' end as level_four_id, "
                          + "case when length(a.id)/2 = '4' then a.desc else ' ' end as level_four_name, "
                          + "case when length(a.id)/2 = '5' then a.id else ' ' end as level_five_id, "
                          + "case when length(a.id)/2 = '5' then a.desc else ' ' end as level_five_name, "
                          + "case when length(a.id)/2 = '6' then a.id else ' ' end as level_six_id, "
  + "case when length(a.id)/2 = '6' then a.desc else ' ' end as level_six_name, "
  + "case when length(a.id)/2 = '7' then a.id else ' ' end as level_seven_id, "
                          + "case when length(a.id)/2 = '7' then a.desc else ' ' end as level_seven_name, "
                          + "case when length(a.id)/2 = '8' then a.id else ' ' end as level_eight_id, "
  + "case when length(a.id)/2 = '8' then a.desc else ' ' end as level_eight_name, "
                          + "case when length(a.id)/2 = '9' then a.id else ' ' end as level_nine_id, "
  + "case when length(a.id)/2 = '9' then a.desc else ' ' end as level_nine_name, "
  + "case when length(a.id)/2 = '10' then a.id else ' ' end as level_ten_id, "
                          + "case when length(a.id)/2 = '10' then a.desc else ' ' end as level_ten_name "
  + "from CategoryTempTable a")


Can someone help me in also populating all the parents levels in the respective level ID and level name, please?


Thanks,
Aakash.

Reply | Threaded
Open this post in threaded view
|

Re: Regarding column partitioning IDs and names as per hierarchical level SparkSQL

Jean Georges Perrin
Write a UDF?

On Oct 31, 2017, at 11:48, Aakash Basu <[hidden email]> wrote:

Hey all,

Any help in the below please?

Thanks,
Aakash.


---------- Forwarded message ----------
From: Aakash Basu <[hidden email]>
Date: Tue, Oct 31, 2017 at 9:17 PM
Subject: Regarding column partitioning IDs and names as per hierarchical level SparkSQL
To: user <[hidden email]>


Hi all,

I have to generate a table with Spark-SQL with the following columns -


Level One Id: VARCHAR(20) NULL
Level One Name: VARCHAR( 50) NOT NULL
Level Two Id: VARCHAR( 20) NULL
Level Two Name: VARCHAR(50) NULL
Level Thr ee Id: VARCHAR(20) NULL
Level Thr ee Name: VARCHAR(50) NULL
Level Four Id: VARCHAR(20) NULL
Level Four Name: VARCHAR( 50) NULL
Level Five Id: VARCHAR(20) NULL
Level Five Name: VARCHAR(50) NULL
Level Six Id: VARCHAR(20) NULL
Level Six Name: VARCHAR(50) NULL
Level Seven Id: VARCHAR( 20) NULL
Level Seven Name: VARCHAR(50) NULL
Level Eight Id: VARCHAR( 20) NULL
Level Eight Name: VARCHAR(50) NULL
Level Nine Id: VARCHAR(20) NULL
Level Nine Name: VARCHAR( 50) NULL
Level Ten Id: VARCHAR(20) NULL
Level Ten Name: VARCHAR(50) NULL

My input source has these columns -


ID Description ParentID
10 Great-Grandfather  
1010 Grandfather 10
101010 1. Father A 1010
101011 2. Father B 1010
101012 4. Father C 1010
101013 5. Father D 1010
101015 3. Father E 1010
101018 Father F 1010
101019 6. Father G 1010
101020 Father H 1010
101021 Father I 1010
101022 2A. Father J 1010
10101010 2. Father K 101010

Like the above, I have ID till 20 digits, which means, I have 10 levels.

I want to populate the ID and name itself along with all the parents till the root for any particular level, which I am unable to create a concrete logic for.

Am using this way to fetch respecting levels and populate them in the respective columns but not their parents -

Present Logic ->

FinalJoin_DF = spark.sql("select "
                          + "case when length(a.id)/2 = '1' then a.id else ' ' end as level_one_id, "
  + "case when length(a.id)/2 = '1' then a.desc else ' ' end as level_one_name, "
  + "case when length(a.id)/2 = '2' then a.id else ' ' end as level_two_id, "
  + "case when length(a.id)/2 = '2' then a.desc else ' ' end as level_two_name, "
                          + "case when length(a.id)/2 = '3' then a.id else ' ' end as level_three_id, "
                          + "case when length(a.id)/2 = '3' then a.desc else ' ' end as level_three_name, "
                          + "case when length(a.id)/2 = '4' then a.id else ' ' end as level_four_id, "
                          + "case when length(a.id)/2 = '4' then a.desc else ' ' end as level_four_name, "
                          + "case when length(a.id)/2 = '5' then a.id else ' ' end as level_five_id, "
                          + "case when length(a.id)/2 = '5' then a.desc else ' ' end as level_five_name, "
                          + "case when length(a.id)/2 = '6' then a.id else ' ' end as level_six_id, "
  + "case when length(a.id)/2 = '6' then a.desc else ' ' end as level_six_name, "
  + "case when length(a.id)/2 = '7' then a.id else ' ' end as level_seven_id, "
                          + "case when length(a.id)/2 = '7' then a.desc else ' ' end as level_seven_name, "
                          + "case when length(a.id)/2 = '8' then a.id else ' ' end as level_eight_id, "
  + "case when length(a.id)/2 = '8' then a.desc else ' ' end as level_eight_name, "
                          + "case when length(a.id)/2 = '9' then a.id else ' ' end as level_nine_id, "
  + "case when length(a.id)/2 = '9' then a.desc else ' ' end as level_nine_name, "
  + "case when length(a.id)/2 = '10' then a.id else ' ' end as level_ten_id, "
                          + "case when length(a.id)/2 = '10' then a.desc else ' ' end as level_ten_name "
  + "from CategoryTempTable a")


Can someone help me in also populating all the parents levels in the respective level ID and level name, please?


Thanks,
Aakash.


Reply | Threaded
Open this post in threaded view
|

Re: Regarding column partitioning IDs and names as per hierarchical level SparkSQL

ayan guha
you can use 10 passes over the same dataset and build the data


On Fri, Nov 3, 2017 at 9:48 PM, Jean Georges Perrin <[hidden email]> wrote:
Write a UDF?

On Oct 31, 2017, at 11:48, Aakash Basu <[hidden email]> wrote:

Hey all,

Any help in the below please?

Thanks,
Aakash.


---------- Forwarded message ----------
From: Aakash Basu <[hidden email]>
Date: Tue, Oct 31, 2017 at 9:17 PM
Subject: Regarding column partitioning IDs and names as per hierarchical level SparkSQL
To: user <[hidden email]>


Hi all,

I have to generate a table with Spark-SQL with the following columns -


Level One Id: VARCHAR(20) NULL
Level One Name: VARCHAR( 50) NOT NULL
Level Two Id: VARCHAR( 20) NULL
Level Two Name: VARCHAR(50) NULL
Level Thr ee Id: VARCHAR(20) NULL
Level Thr ee Name: VARCHAR(50) NULL
Level Four Id: VARCHAR(20) NULL
Level Four Name: VARCHAR( 50) NULL
Level Five Id: VARCHAR(20) NULL
Level Five Name: VARCHAR(50) NULL
Level Six Id: VARCHAR(20) NULL
Level Six Name: VARCHAR(50) NULL
Level Seven Id: VARCHAR( 20) NULL
Level Seven Name: VARCHAR(50) NULL
Level Eight Id: VARCHAR( 20) NULL
Level Eight Name: VARCHAR(50) NULL
Level Nine Id: VARCHAR(20) NULL
Level Nine Name: VARCHAR( 50) NULL
Level Ten Id: VARCHAR(20) NULL
Level Ten Name: VARCHAR(50) NULL

My input source has these columns -


ID Description ParentID
10 Great-Grandfather  
1010 Grandfather 10
101010 1. Father A 1010
101011 2. Father B 1010
101012 4. Father C 1010
101013 5. Father D 1010
101015 3. Father E 1010
101018 Father F 1010
101019 6. Father G 1010
101020 Father H 1010
101021 Father I 1010
101022 2A. Father J 1010
10101010 2. Father K 101010

Like the above, I have ID till 20 digits, which means, I have 10 levels.

I want to populate the ID and name itself along with all the parents till the root for any particular level, which I am unable to create a concrete logic for.

Am using this way to fetch respecting levels and populate them in the respective columns but not their parents -

Present Logic ->

FinalJoin_DF = spark.sql("select "
                          + "case when length(a.id)/2 = '1' then a.id else ' ' end as level_one_id, "
  + "case when length(a.id)/2 = '1' then a.desc else ' ' end as level_one_name, "
  + "case when length(a.id)/2 = '2' then a.id else ' ' end as level_two_id, "
  + "case when length(a.id)/2 = '2' then a.desc else ' ' end as level_two_name, "
                          + "case when length(a.id)/2 = '3' then a.id else ' ' end as level_three_id, "
                          + "case when length(a.id)/2 = '3' then a.desc else ' ' end as level_three_name, "
                          + "case when length(a.id)/2 = '4' then a.id else ' ' end as level_four_id, "
                          + "case when length(a.id)/2 = '4' then a.desc else ' ' end as level_four_name, "
                          + "case when length(a.id)/2 = '5' then a.id else ' ' end as level_five_id, "
                          + "case when length(a.id)/2 = '5' then a.desc else ' ' end as level_five_name, "
                          + "case when length(a.id)/2 = '6' then a.id else ' ' end as level_six_id, "
  + "case when length(a.id)/2 = '6' then a.desc else ' ' end as level_six_name, "
  + "case when length(a.id)/2 = '7' then a.id else ' ' end as level_seven_id, "
                          + "case when length(a.id)/2 = '7' then a.desc else ' ' end as level_seven_name, "
                          + "case when length(a.id)/2 = '8' then a.id else ' ' end as level_eight_id, "
  + "case when length(a.id)/2 = '8' then a.desc else ' ' end as level_eight_name, "
                          + "case when length(a.id)/2 = '9' then a.id else ' ' end as level_nine_id, "
  + "case when length(a.id)/2 = '9' then a.desc else ' ' end as level_nine_name, "
  + "case when length(a.id)/2 = '10' then a.id else ' ' end as level_ten_id, "
                          + "case when length(a.id)/2 = '10' then a.desc else ' ' end as level_ten_name "
  + "from CategoryTempTable a")


Can someone help me in also populating all the parents levels in the respective level ID and level name, please?


Thanks,
Aakash.





--
Best Regards,
Ayan Guha