What is the equivalent of foreachRDD in DataFrames?

Noorul Islam K M
Hi all,

I have a DataFrame with 1000 records. I want to split them into batches
of 100 and post each batch to a REST API.

If it were an RDD, I could use something like this:

    myRDD.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // post the records in this partition to the REST API
      }
    }

This ensures that the code is executed on the executors and not on the driver.

Is there a similar approach that we can take for DataFrames? The
examples I see on Stack Overflow use collect(), which brings the whole
dataset to the driver.

Thanks and Regards
Noorul

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: What is the equivalent of foreachRDD in DataFrames?

jgp
Just some hints: repartition into 10 partitions? Get the RDD from the DataFrame?

What about a foreach over the rows, sending every 100? (I actually just did that.)
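
A minimal sketch of how these hints might fit together, assuming Spark 2.x or later run in local mode. `postBatch` is a hypothetical placeholder for the real REST call, and `spark.range(1000)` only stands in for the 1000-record DataFrame from the question. Note that `foreachRDD` itself is a DStream method from Spark Streaming; for a DataFrame, `foreachPartition` on the underlying RDD is the closest counterpart.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object BatchPostSketch {
  // Hypothetical stand-in for the REST call; swap in a real HTTP client.
  def postBatch(batch: Seq[Row]): Unit =
    println(s"would POST ${batch.size} records")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-post-sketch")
      .master("local[*]") // assumption: local mode, just for trying this out
      .getOrCreate()

    // Stand-in for the 1000-record DataFrame from the question.
    val df = spark.range(1000).toDF("id")

    df.repartition(10) // the "repartition into 10" hint
      .rdd             // the "get the RDD from the DataFrame" hint
      .foreachPartition { rows =>
        // This closure runs on the executors; nothing is collected to the driver.
        // grouped(100) splits the partition's iterator into batches of at most 100.
        rows.grouped(100).foreach(batch => postBatch(batch))
      }

    spark.stop()
  }
}
```

`repartition(10)` does not guarantee exactly 100 rows per partition, so some batches may hold fewer than 100 records. `grouped(100)` only ensures that no batch exceeds 100.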

jg


> On Oct 26, 2017, at 13:37, Noorul Islam Kamal Malmiyoda <[hidden email]> wrote:
>
> Hi all,
>
> I have a DataFrame with 1000 records. I want to split them into batches
> of 100 and post each batch to a REST API.

Re: What is the equivalent of foreachRDD in DataFrames?

Deepak Sharma
df.rdd.foreach
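
Spelled out, that one-liner might look like the sketch below, again assuming Spark 2.x or later in local mode. `postRecord` is a hypothetical placeholder for a single-record REST call, and the `id` column exists only in this illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object RddForeachSketch {
  // Hypothetical stand-in for a REST call that sends one record.
  def postRecord(row: Row): Unit =
    println(s"would POST record ${row.getLong(0)}")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-foreach-sketch")
      .master("local[*]") // assumption: local mode, just for trying this out
      .getOrCreate()

    // Stand-in for the 1000-record DataFrame from the question.
    val df = spark.range(1000).toDF("id")

    // foreach is an action: the function is invoked once per row on the
    // executors, so the rows are never collected to the driver.
    df.rdd.foreach(row => postRecord(row))

    spark.stop()
  }
}
```

`foreach` makes one call per row, so on its own it does not produce batches of 100. `foreachPartition` on the same RDD hands over each partition as an iterator, which `grouped(100)` can split into batches.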

Thanks
Deepak
