Aggregating large objects and reducing memory pressure
I am writing here because I need help/advice on how to perform aggregations on large objects without running out of memory.
In my current setup I have an Accumulator object that is used as the zeroValue
for the foldByKey function. This Accumulator object can get very large, since
the accumulations also include lists and maps. This in turn forces us to use
machines with more and more memory, or else risk jobs failing with
out-of-memory errors.
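To make the setup concrete, here is a rough, hypothetical sketch in plain Python (not Spark, and not my actual code) so that it runs standalone. The Accumulator fields and the helpers `single`, `merge` and `fold_by_key` are all made up for illustration. `fold_by_key` only mimics how foldByKey behaves: within each partition, every key starts from a fresh copy of the zero value, and the per-partition results are then merged with the same function. Because `merge` keeps every list element and map entry, each key's accumulator grows with the number of records for that key.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Accumulator:
    # Hypothetical fields: the accumulations include lists and maps.
    count: int = 0
    items: list = field(default_factory=list)
    tag_counts: dict = field(default_factory=dict)


def single(item, tag):
    """Wrap one record as an Accumulator (foldByKey values share the zero value's type)."""
    return Accumulator(count=1, items=[item], tag_counts={tag: 1})


def merge(a, b):
    """Fold b into a, mutating a. Nothing is ever dropped, so a only grows."""
    a.count += b.count
    a.items.extend(b.items)
    for tag, n in b.tag_counts.items():
        a.tag_counts[tag] = a.tag_counts.get(tag, 0) + n
    return a


def fold_by_key(partitions, zero, func):
    """Local stand-in for foldByKey: fold each partition per key starting from a
    deep copy of zero, then combine the per-partition results with func."""
    per_partition = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key not in combiners:
                combiners[key] = copy.deepcopy(zero)  # zero itself is never mutated
            combiners[key] = func(combiners[key], value)
        per_partition.append(combiners)
    result = {}
    for combiners in per_partition:
        for key, acc in combiners.items():
            result[key] = func(result[key], acc) if key in result else acc
    return result


zero = Accumulator()
partitions = [
    [("user1", single("a", "x")), ("user2", single("b", "y"))],
    [("user1", single("c", "x")), ("user1", single("d", "z"))],
]
totals = fold_by_key(partitions, zero, merge)
print(totals["user1"].count, totals["user1"].items, totals["user1"].tag_counts)
# prints: 3 ['a', 'c', 'd'] {'x': 2, 'z': 1}
```

Every record for a key ends up held in that key's single accumulator object, which is where the memory pressure comes from.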
I have been thinking about how to tackle this problem and have tried to come up
with solutions, but I would like to hear what the best practices are, or how
others have dealt with this kind of problem.