Hard to say without a lot more info, but 76.5K tasks is very large. How big are the tasks, and how long do they take? If they're very short, you should repartition down.
Do you end up with 800 executors? If so, why run 2 per machine? That's generally a net loss at this scale of worker. I'm also confused that you have 4,000 tasks running, which would be just 10 per executor as well.
What is the input data format? Counting Parquet is far faster because it's just a metadata read.
Is anything else happening besides count() after the data is read?