Whatever you are trying to do, if you really
have to do it that way, don't use Spark. As for your
question: Spark automatically "interleaves" stages that can be
interleaved.
That said, I do not believe you really want to do that. You
should probably just use a filter + map, or a flatMap. Please explain
what you are trying to achieve so we can recommend a
better way.
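
To make the filter + map versus flatMap suggestion concrete, here is a
minimal sketch in plain Python with no Spark dependency. The input lines
and the helpers is_valid, parse and parse_if_valid are hypothetical,
since the original code was not shared. The two approaches are shown on
an ordinary list; on an RDD the same functions could presumably be passed
as rdd.filter(is_valid).map(parse) or rdd.flatMap(parse_if_valid).

```python
from itertools import chain


def is_valid(line):
    # Hypothetical predicate: keep non-empty lines that are not comments.
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def parse(line):
    # Hypothetical transformation: split "key=value" into a (key, value) tuple.
    key, _, value = line.partition("=")
    return (key.strip(), value.strip())


def parse_if_valid(line):
    # flatMap-style function: returns zero or one output per input,
    # so filtering and mapping happen in a single pass.
    return [parse(line)] if is_valid(line) else []


# Hypothetical input standing in for the records of an RDD.
lines = ["a=1", "", "# comment", "b = 2"]

# Approach 1: filter, then map.
filter_map_result = [parse(line) for line in lines if is_valid(line)]

# Approach 2: flatMap, i.e. map each record to a list and flatten.
flat_map_result = list(chain.from_iterable(parse_if_valid(line) for line in lines))
```

Both approaches give the same records. In Spark either one stays a
transformation on the cluster, so nothing has to be brought back to the
driver until a final action is called.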
Guillaume
With so little information about what your code is
actually doing, what you have shared looks likely to be an
anti-pattern to me. Doing many collect actions is something to
be avoided if at all possible, since this forces a lot of
network communication to materialize the results back within the
driver process, and network communication severely constrains
performance.
--
Guillaume PITEL, Président
+33(0)6 25 48 86 80
eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05