shuffle mathematical formula

asma zgolli
Dear Spark contributors,

I'm looking for a way to model Spark shuffle cost, and I wonder whether there are mathematical formulas for computing the "Shuffle Read" and "Shuffle Write" sizes shown in the Stages view of the Spark UI.
If there aren't, are there any references that would give me a head start on this?

Thank you for the help and the directions.
Yours sincerely,
Asma ZGOLLI

Ph.D. student in data engineering - computer science

Re: shuffle mathematical formula

Alonso
I would have to check, but in principle this could be done from the streaming logs: once you detect when a shuffle operation starts and when it ends, you know its total duration.
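As a rough illustration of that idea, here is a minimal Python sketch that reads Spark event-log lines (one JSON object per line) and reports, per stage, the elapsed time together with the summed shuffle read and write bytes. It is only a sketch under assumptions: the JSON field names ("Stage Info", "Submission Time", "Completion Time", "Shuffle Read Metrics", "Shuffle Bytes Written", and so on) are what I would expect in an event log and should be checked against your Spark version, the function name summarize_stages is made up for this example, and the sample lines are synthetic, not real output.

```python
import json


def _empty():
    # One accumulator per stage.
    return {"duration_ms": None, "shuffle_read_bytes": 0, "shuffle_write_bytes": 0}


def summarize_stages(lines):
    """Aggregate per-stage duration and shuffle bytes from event-log lines.

    Field names below are assumptions about the event-log JSON layout.
    """
    stages = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("Event")
        if kind == "SparkListenerStageCompleted":
            info = event["Stage Info"]
            entry = stages.setdefault(info["Stage ID"], _empty())
            # Stage wall-clock time: completion minus submission (milliseconds).
            entry["duration_ms"] = info["Completion Time"] - info["Submission Time"]
        elif kind == "SparkListenerTaskEnd":
            metrics = event.get("Task Metrics") or {}
            read = metrics.get("Shuffle Read Metrics", {})
            write = metrics.get("Shuffle Write Metrics", {})
            entry = stages.setdefault(event["Stage ID"], _empty())
            # Sum the task-level counters to get stage-level totals.
            entry["shuffle_read_bytes"] += (
                read.get("Remote Bytes Read", 0) + read.get("Local Bytes Read", 0)
            )
            entry["shuffle_write_bytes"] += write.get("Shuffle Bytes Written", 0)
    return stages


# Synthetic sample lines, for illustration only.
SAMPLE_LINES = [
    json.dumps({
        "Event": "SparkListenerTaskEnd", "Stage ID": 1,
        "Task Metrics": {
            "Shuffle Read Metrics": {"Remote Bytes Read": 300, "Local Bytes Read": 100},
            "Shuffle Write Metrics": {"Shuffle Bytes Written": 250},
        },
    }),
    json.dumps({
        "Event": "SparkListenerTaskEnd", "Stage ID": 1,
        "Task Metrics": {
            "Shuffle Read Metrics": {"Remote Bytes Read": 50, "Local Bytes Read": 50},
            "Shuffle Write Metrics": {"Shuffle Bytes Written": 150},
        },
    }),
    json.dumps({
        "Event": "SparkListenerStageCompleted",
        "Stage Info": {"Stage ID": 1, "Submission Time": 1000, "Completion Time": 4500},
    }),
]

if __name__ == "__main__":
    for stage_id, summary in sorted(summarize_stages(SAMPLE_LINES).items()):
        print(stage_id, summary)
```

Note that this measures stage time rather than shuffle time alone, and it reports observed sizes rather than predicting them, so it would serve as input data for a cost model, not as the formula itself.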



On Tue, 4 Feb 2020 at 12:58, asma zgolli (<[hidden email]>) wrote:
Dear Spark contributors,

I'm looking for a way to model Spark shuffle cost, and I wonder whether there are mathematical formulas for computing the "Shuffle Read" and "Shuffle Write" sizes shown in the Stages view of the Spark UI.
If there aren't, are there any references that would give me a head start on this?

Thank you for the help and the directions.
Yours sincerely,
Asma ZGOLLI

Ph.D. student in data engineering - computer science

