Multiple column aggregations

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Multiple column aggregations

Sonu Jyotshna

Hello,

I have a requirement where I need to group by multiple columns and aggregate them not at same time .. I mean I have a structure which contains accountid, some cols, order id . I need to calculate some scenarios like account having multiple orders so group by account and aggregate will work here but I need to find orderid associated to multiple accounts so may be group by orderid will work here but for better performance on the dataset level can we do in single step? Where both will work or any better approach I can follow . Can you help 


Regards,
Sonu
Reply | Threaded
Open this post in threaded view
|

RE: Multiple column aggregations

Shiva Prashanth Vallabhaneni

Hi Sonu,

 

You could use a query that is similar to the below one. You could further optimize the below query by adding a WHERE clause. I would suggest that you benchmark the performance of both approaches (multiple group-by queries vs single query with multiple window functions), before choosing one of these options. Before running the benchmark, I would ensure that the underlying data is stored in a columnar storage format with compression enabled. For instance, you could use parquet file format with block-level compression using Snappy.

 

SELECT  SUM(CASE WHEN accountRank =2 THEN 1 ELSE 0 END) AS accountsWithMoreThanOneOrder,

SUM(CASE WHEN orderRank =2 THEN 1 ELSE 0 END) AS ordersWithMoreThanOneAccount,

FROM   (

                  SELECT  accountNo,

                               orderNo, 

  rank() OVER (PARTITION BY orderNo ORDER BY accountNo) AS orderRank,

 rank() OVER (PARTITION BY accountNo ORDER BY orderNo) AS accountRank

                  FROM   accountOrders

                )

 

P.S – You will need to check the above query for any syntax errors.

 

– Shiva

 

From: Sonu Jyotshna <[hidden email]>
Sent: Saturday, February 9, 2019 10:17 AM
To: [hidden email]
Subject: Multiple column aggregations

 

 

Hello,

 

I have a requirement where I need to group by multiple columns and aggregate them not at same time .. I mean I have a structure which contains accountid, some cols, order id . I need to calculate some scenarios like account having multiple orders so group by account and aggregate will work here but I need to find orderid associated to multiple accounts so may be group by orderid will work here but for better performance on the dataset level can we do in single step? Where both will work or any better approach I can follow . Can you help 

 

 

Regards,

Sonu


Any comments or statements made in this email are not necessarily those of Tavant Technologies. The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. If you have received this in error, please contact the sender and delete the material from any computer. All emails sent from or to Tavant Technologies may be subject to our monitoring procedures.