Java Spark to Python spark integration


Manohar Rao
I would like to know if it's possible to invoke Python Spark (PySpark) code from Java.

I have a Java-based framework in which a SparkSession is created and some DataFrames are passed as arguments to an API:
interface Transformation {
     Dataset transform(Set<Dataset> inputDatasets, SparkSession spark);
}

A user of this framework can then implement a transformation, and the framework can use this custom transformation
along with the rest of the standard transformations. This then integrates into a larger data pipeline.

Some users would like to write their business logic in Python (PySpark).

Is it possible to pass this Java Dataset (or RDD) from the framework
to Python code and then retrieve the resulting Python RDD/Dataset back as output in the Java framework?
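
To make the question concrete, one possible hand-off (an assumption for illustration, not a confirmed solution) is to exchange data through files: the framework writes the input Dataset to a location with dataset.write().parquet(inputPath), launches the user's PySpark script as a separate process, and reads the result back with spark.read().parquet(outputPath). The sketch below covers only the process launch. The class PySparkHandoff, its method names, and the convention that the script takes an input path and an output path as arguments are all hypothetical, and it assumes spark-submit is on the PATH. Note that the script would run as its own Spark application, so it would not share the framework's SparkSession.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper; not part of Spark or of the framework described above.
final class PySparkHandoff {

    // Builds the command line that launches the user's PySpark script.
    // Assumption: the script reads its input from args[1] and writes its
    // result to args[2], for example as Parquet files.
    static List<String> buildCommand(String script, String inputPath, String outputPath) {
        return List.of("spark-submit", script, inputPath, outputPath);
    }

    // Starts the script, forwards its console output, and blocks until it exits.
    // Returns the process exit code; 0 conventionally means success.
    static int run(String script, String inputPath, String outputPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(script, inputPath, outputPath))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

A Transformation implementation could wrap this by writing its input, calling PySparkHandoff.run("job.py", inputPath, outputPath), and reading the output path back into a Dataset. What I am really hoping for is a way to avoid the round trip through storage.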

Any references to code snippets for this would be helpful.