What is the difference between an action and a transformation?


Mina
This post has NOT been accepted by the mailing list yet.


Creating RDDs with map, parallelize, and so on works fine, but they seem to do nothing until an action is performed. I deliberately passed a wrong input path to sc.textFile() and it raised no error; I only got an error when I ran an action to collect the RDD. Could anyone explain this?

Ex:
ParallelCollectionRDD[38] at parallelize at PythonRDD.scala:204

Thank you.
Joe
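
The behaviour described above is Spark's lazy evaluation: calls that create or transform an RDD (sc.textFile, map, and so on) only record what should be done, and the work, including reading the input path, happens when an action such as collect() or count() runs. Below is a minimal sketch of that idea in plain Python. It is a toy model, not Spark code: the LazyDataset class, its method names, and the path /no/such/input.txt are all hypothetical and exist only for illustration.

```python
# Toy model of lazy evaluation (NOT Spark itself): transformations only
# record work, and nothing runs until an action is called.
# LazyDataset and the path below are hypothetical, for illustration only.

class LazyDataset:
    """Hypothetical stand-in for an RDD."""

    def __init__(self, load, steps=()):
        self._load = load            # zero-argument callable that reads the data
        self._steps = tuple(steps)   # functions recorded by transformations

    @classmethod
    def text_file(cls, path):
        # Remember the path; the file is not opened here.
        def load():
            with open(path) as handle:
                return [line.rstrip("\n") for line in handle]
        return cls(load)

    @classmethod
    def parallelize(cls, items):
        items = list(items)
        return cls(lambda: list(items))

    def map(self, func):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self._load, self._steps + (func,))

    def collect(self):
        # Action: only now is the data loaded and every recorded step applied.
        rows = self._load()
        for func in self._steps:
            rows = [func(row) for row in rows]
        return rows

    def count(self):
        # Action: counts the rows produced by running the whole pipeline.
        return len(self.collect())


# Building the pipeline succeeds even though the path does not exist.
missing = LazyDataset.text_file("/no/such/input.txt").map(str.upper)

try:
    missing.collect()                # the bad path is only noticed here
    error_seen = False
except FileNotFoundError:
    error_seen = True

doubled = LazyDataset.parallelize([1, 2, 3]).map(lambda x: x * 2)
print(error_seen, doubled.collect())
```

In this sketch, as in the question, the wrong path causes no error while the pipeline is being built and only fails once an action asks for results.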
Re: what is the difference between action and transformation?

Mina
This post has NOT been accepted by the mailing list yet.
Could someone please help? I have been stuck on this issue for weeks. When I run an action, for example count(), it takes a very long time to finish. I am using Hadoop clusters, but something seems to be wrong, because the data I am using is only about 1 GB, which is quite small. Please help me, thank you.