OrderBy Year and Month is not displaying correctly

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

OrderBy Year and Month is not displaying correctly

Mich Talebzadeh
Hi,

I am working out monthly outgoing etc from an account and I am using the following code

import org.apache.spark.sql.expressions.Window
val wSpec = Window.partitionBy(year(col("transactiondate")),month(col("transactiondate")))

joint_accounts.
      select(year(col("transactiondate")).as("Year")
    , month(col("transactiondate")).as("Month")
    , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per Month")
    , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per Month")).
    orderBy(year(col("transactiondate")),month(col("transactiondate"))).
    distinct.
    show(1000,false)

This shows as follows:


|Year|Month|incoming Per Month|outgoing Per Month|
+----+-----+------------------+------------------+
|2019|9    |13958.58          |17920.31          |
|2019|11   |4032.30           |4225.30           |
|2020|1    |1530.00           |1426.91           |
|2019|10   |10029.00          |10067.52          |
|2019|12   |742.00            |814.49            |
+----+-----+------------------+------------------+

 however the orderby is not correct as I expect to see 2010 record and 2019 records in the order of year and month. 

Any suggestions?

Thanks,

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

 

Reply | Threaded
Open this post in threaded view
|

Re: OrderBy Year and Month is not displaying correctly

Enrico Minack
The distinct transformation does not preserve order, you need to distinct first, then orderby.

Enrico


Am 06.01.20 um 00:39 schrieb Mich Talebzadeh:
Hi,

I am working out monthly outgoing etc from an account and I am using the following code

import org.apache.spark.sql.expressions.Window
val wSpec = Window.partitionBy(year(col("transactiondate")),month(col("transactiondate")))

joint_accounts.
      select(year(col("transactiondate")).as("Year")
    , month(col("transactiondate")).as("Month")
    , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per Month")
    , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per Month")).
    orderBy(year(col("transactiondate")),month(col("transactiondate"))).
    distinct.
    show(1000,false)

This shows as follows:


|Year|Month|incoming Per Month|outgoing Per Month|
+----+-----+------------------+------------------+
|2019|9    |13958.58          |17920.31          |
|2019|11   |4032.30           |4225.30           |
|2020|1    |1530.00           |1426.91           |
|2019|10   |10029.00          |10067.52          |
|2019|12   |742.00            |814.49            |
+----+-----+------------------+------------------+

 however the orderby is not correct as I expect to see 2010 record and 2019 records in the order of year and month. 

Any suggestions?

Thanks,

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

 


Reply | Threaded
Open this post in threaded view
|

Re: OrderBy Year and Month is not displaying correctly

Gourav Sengupta
or just use SQL, which is less verbose, easily readable, and takes care of all such scenarios. But for some weird reason I have found that people using data frame API's have a perception that using SQL is less intelligent. But I think that using less effort to get better output can me a measure of intelligence.

Regards,
Gourav Sengupta

On Mon, Jan 6, 2020 at 9:23 AM Enrico Minack <[hidden email]> wrote:
The distinct transformation does not preserve order, you need to distinct first, then orderby.

Enrico


Am 06.01.20 um 00:39 schrieb Mich Talebzadeh:
Hi,

I am working out monthly outgoing etc from an account and I am using the following code

import org.apache.spark.sql.expressions.Window
val wSpec = Window.partitionBy(year(col("transactiondate")),month(col("transactiondate")))

joint_accounts.
      select(year(col("transactiondate")).as("Year")
    , month(col("transactiondate")).as("Month")
    , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per Month")
    , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per Month")).
    orderBy(year(col("transactiondate")),month(col("transactiondate"))).
    distinct.
    show(1000,false)

This shows as follows:


|Year|Month|incoming Per Month|outgoing Per Month|
+----+-----+------------------+------------------+
|2019|9    |13958.58          |17920.31          |
|2019|11   |4032.30           |4225.30           |
|2020|1    |1530.00           |1426.91           |
|2019|10   |10029.00          |10067.52          |
|2019|12   |742.00            |814.49            |
+----+-----+------------------+------------------+

 however the orderby is not correct as I expect to see 2010 record and 2019 records in the order of year and month. 

Any suggestions?

Thanks,

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

 


Reply | Threaded
Open this post in threaded view
|

Re: OrderBy Year and Month is not displaying correctly

Mich Talebzadeh

The distinct transformation does not preserve order, you need to distinct first, then orderby.

Thanks Enrico. You are correct. Worked fine!

joint_accounts.
      select(year(col("transactiondate")).as("Year")
    , month(col("transactiondate")).as("Month")
    , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per Month")
    , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per Month")).
      withColumn(("incoming Per Month"),format_number(col("incoming Per Month"),2)).
      withColumn(("outgoing Per Month"),format_number(col("outgoing Per Month"),2)).
      distinct.
      orderBy(col("Year"),col("Month")).
 
     show(1000,false)

+----+-----+------------------+------------------+
|Year|Month|incoming Per Month|outgoing Per Month|
+----+-----+------------------+------------------+
|2019|9    |13,958.58         |17,920.31         |
|2019|10   |10,029.00         |10,067.52         |

|2019|11   |4,032.30          |4,225.30          |
|2019|12   |742.00            |814.49            |
|2020|1    |1,570.00          |1,540.86          |
+----+-----+------------------+------------------+

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

 



On Mon, 6 Jan 2020 at 09:35, Gourav Sengupta <[hidden email]> wrote:
or just use SQL, which is less verbose, easily readable, and takes care of all such scenarios. But for some weird reason I have found that people using data frame API's have a perception that using SQL is less intelligent. But I think that using less effort to get better output can me a measure of intelligence.

Regards,
Gourav Sengupta

On Mon, Jan 6, 2020 at 9:23 AM Enrico Minack <[hidden email]> wrote:
The distinct transformation does not preserve order, you need to distinct first, then orderby.

Enrico


Am 06.01.20 um 00:39 schrieb Mich Talebzadeh:
Hi,

I am working out monthly outgoing etc from an account and I am using the following code

import org.apache.spark.sql.expressions.Window
val wSpec = Window.partitionBy(year(col("transactiondate")),month(col("transactiondate")))

joint_accounts.
      select(year(col("transactiondate")).as("Year")
    , month(col("transactiondate")).as("Month")
    , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per Month")
    , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per Month")).
    orderBy(year(col("transactiondate")),month(col("transactiondate"))).
    distinct.
    show(1000,false)

This shows as follows:


|Year|Month|incoming Per Month|outgoing Per Month|
+----+-----+------------------+------------------+
|2019|9    |13958.58          |17920.31          |
|2019|11   |4032.30           |4225.30           |
|2020|1    |1530.00           |1426.91           |
|2019|10   |10029.00          |10067.52          |
|2019|12   |742.00            |814.49            |
+----+-----+------------------+------------------+

 however the orderby is not correct as I expect to see 2010 record and 2019 records in the order of year and month. 

Any suggestions?

Thanks,

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.