Run Python User Defined Functions / code in Spark with Scala Codebase

9 messages

Run Python User Defined Functions / code in Spark with Scala Codebase

Chetan Khatri
Hello Dear Spark User / Dev,

I would like to pass a Python user-defined function to a Spark job developed in Scala, and have the return value of that function fed back through the DataFrame / Dataset API.

Could someone please guide me on the best approach for this? The Python function would mostly be a transformation. I would also like to pass a Java function as a string to the Spark / Scala job, have it applied to an RDD / DataFrame, and get an RDD / DataFrame back.

Thank you.




Re: Run Python User Defined Functions / code in Spark with Scala Codebase

Chetan Khatri
Can someone please suggest an approach? Thanks.


Re: Run Python User Defined Functions / code in Spark with Scala Codebase

Prem Sure
Try .pipe() on the RDD, passing it your .py script.

Thanks,
Prem
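For context on what .pipe() does: Spark writes each partition's elements to the external process's stdin, one per line, and reads lines back from its stdout as the result elements. That round trip can be simulated locally with a plain subprocess. The two-column CSV records and the upper-casing transform below are made-up examples, and the inline script stands in for a hypothetical transform.py you would ship to the executors:

```python
import subprocess
import sys
import textwrap

# Stand-in for a worker script of the kind you would pass to
# rdd.pipe("python transform.py"): it reads one CSV record per line
# from stdin and writes one transformed record per line to stdout.
script = textwrap.dedent("""\
    import sys
    for line in sys.stdin:
        cols = line.rstrip("\\n").split(",")
        cols[1] = cols[1].upper()   # example transformation
        print(",".join(cols))
""")

# Simulate what Spark does for one partition: feed the elements to the
# script's stdin, collect its stdout lines as the result elements.
records = ["1,alice", "2,bob"]
proc = subprocess.run(
    [sys.executable, "-c", script],
    input="\n".join(records) + "\n",
    capture_output=True,
    text=True,
    check=True,
)
result = proc.stdout.splitlines()
print(result)  # -> ['1,ALICE', '2,BOB']
```

In the actual Scala job this whole round trip collapses to rdd.pipe("python transform.py"), with transform.py made available on every executor (for example via --files).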


Re: Run Python User Defined Functions / code in Spark with Scala Codebase

Chetan Khatri
Prem Sure, thanks for the suggestion.


Re: Run Python User Defined Functions / code in Spark with Scala Codebase

jayantshekhar
Hello Chetan,

We currently do it with .pipe() on the RDD, as Prem suggested.

That passes the RDD to the Python script as CSV strings. The Python script can either process the input line by line and return the results, or load it into something like a pandas DataFrame for processing and finally write the results back.

In the Spark/Scala/Java code, you get back an RDD of strings, which we convert into a DataFrame.

Feel free to ping me directly in case of questions.

Thanks,
Jayant
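As a sketch of the pandas option described above: the worker script (the name transform.py, the two-column id/name schema, and the upper-casing are all made-up for illustration) can parse everything on stdin into a pandas DataFrame, transform it, and write CSV lines back to stdout for Spark to collect:

```python
import sys
import pandas as pd

def transform(infile, outfile):
    # Parse the piped CSV records into a pandas DataFrame.
    df = pd.read_csv(infile, header=None, names=["id", "name"])
    # Example transformation; replace with real logic.
    df["name"] = df["name"].str.upper()
    # Write the results back as CSV lines for Spark to read.
    df.to_csv(outfile, header=False, index=False)

if __name__ == "__main__":
    transform(sys.stdin, sys.stdout)
```

On the Scala side, the strings coming back from rdd.pipe(...) can then be split on commas and mapped to Rows for spark.createDataFrame, or written out and re-read with Spark's CSV reader.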



Re: Run Python User Defined Functions / code in Spark with Scala Codebase

Chetan Khatri
Hello Jayant,

Thank you so much for the suggestion. My idea was to use a Python function as a transformation that takes a couple of column names and returns an object, which is what you explained. Would it be possible to point me to a similar codebase example?

Thanks.


Re: Run Python User Defined Functions / code in Spark with Scala Codebase

jayantshekhar
Hello Chetan,

Sorry, I missed replying earlier. You can find some sample code here:


We will continue adding more there.

Feel free to ping me directly in case of questions.

Thanks,
Jayant



Re: Run Python User Defined Functions / code in Spark with Scala Codebase

Chetan Khatri
Hello Jayant,

Thanks for the great OSS contribution :)


Re: Run Python User Defined Functions / code in Spark with Scala Codebase

Gourav Sengupta
Hi,

I am not sure whether Spark DataFrames apply to your use case. If they do, please try creating a UDF in Python and check whether you can call it from Scala using select and expr.

Regards,
Gourav Sengupta
