[pyspark 2.3+] Add scala library to pyspark app and use to derive columns


rishishah.star
Hi All,

I have a use case where I need Java/Scala for regex matching, because variable-width lookbehinds are not supported by Python's re module. However, our entire codebase is Python, so I was wondering whether there's a recommended way to build a Scala/Java library and use it from PySpark.
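To make the limitation concrete, here's a minimal, self-contained illustration of what fails in pure Python: re accepts only fixed-width lookbehind patterns, while java.util.regex (what a Scala UDF would use) accepts bounded variable-width ones.

```python
import re

# Fixed-width lookbehind: fine in Python's re.
fixed = re.compile(r"(?<=ab)c")

# Variable-width lookbehind: rejected by re with
# "look-behind requires fixed-width pattern".
try:
    re.compile(r"(?<=ab+)c")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(variable_width_ok)  # → False
```

The same pattern (or a bounded form like `(?<=ab{1,10})c`) compiles fine under java.util.regex, which is what makes a JVM-side library attractive here.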

I came across this: https://diogoalexandrefranco.github.io/scala-code-in-pyspark/ - I'll try it out, but a colleague previously ran into serialization issues while trying to use a Java library with PySpark.

The typical use case is calling library functions to derive new columns.
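One way to do this in Spark 2.3+ that sidesteps Python-side serialization entirely is `spark.udf.registerJavaFunction`, which registers a JVM class as a SQL UDF so column derivation runs wholly on the JVM. A minimal sketch - the jar path, class name `com.example.ExtractToken`, and column names are all hypothetical placeholders, not a tested implementation:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = (
    SparkSession.builder
    .appName("scala-udf-sketch")
    .config("spark.jars", "/path/to/regex-udfs.jar")  # hypothetical jar with the Scala UDF
    .getOrCreate()
)

# The Scala side (compiled into the jar above) would implement
# org.apache.spark.sql.api.java.UDF1, roughly:
#
#   class ExtractToken extends org.apache.spark.sql.api.java.UDF1[String, String] {
#     override def call(s: String): String =
#       // bounded variable-width lookbehind: OK in java.util.regex,
#       // rejected by Python's re
#       "(?<=foo\\s{1,10})bar".r.findFirstIn(s).orNull
#   }
#
spark.udf.registerJavaFunction("extract_token", "com.example.ExtractToken", StringType())

# Derive the column entirely on the JVM; row data never crosses into Python.
df = spark.createDataFrame([("foo   bar baz",)], ["raw"])
df = df.withColumn("token", F.expr("extract_token(raw)"))
```

Because the UDF is invoked through the SQL engine rather than a Python UDF, rows aren't pickled back and forth between the JVM and Python workers, which may avoid the serialization problems your colleague hit.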

Any input helps, appreciate it!

--
Regards,

Rishi Shah