Visual PySpark Programming

srungarapu vamsi
Hi,

I have the following use case and have not found a suitable tool that serves my purpose.

Use case:
Steps 1, 2 and 3 are UI-driven.
Step 1) A user should be able to choose a data source (for example, HDFS) and configure it so that it points to a file.
Step 2) A user should be able to apply filters, transformations and actions to the dataframe loaded in the previous step.
Step 3) A user should be able to repeat Step 2 any number of times as a chain.
Step 4) A user should be able to click a Save button that converts the data flow diagram into a PySpark job.
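
To make Step 4 concrete, here is a minimal sketch of what the Save step might do, assuming the UI exports the flow as a plain dictionary. The dictionary layout, the function name generate_pyspark_job, the paths and the column names are all hypothetical and only for illustration; only the PySpark calls that appear in the generated text (SparkSession, spark.read, filter, select, df.write) are real API.

```python
# Hypothetical flow description, as a workflow UI editor might export it.
flow = {
    "source": {
        "format": "csv",
        "path": "hdfs:///data/input.csv",   # made-up path
        "options": {"header": "true"},
    },
    "steps": [                              # Step 2, chained as in Step 3
        {"op": "filter", "condition": "age > 30"},
        {"op": "select", "columns": ["name", "age"]},
    ],
    "sink": {"format": "parquet", "path": "hdfs:///data/output"},
}


def generate_pyspark_job(flow):
    """Render the flow description as PySpark source code (returned as a string)."""
    src = flow["source"]
    reader = f"spark.read.format({src['format']!r})"
    for key, value in src.get("options", {}).items():
        reader += f".option({key!r}, {value!r})"

    lines = [
        "from pyspark.sql import SparkSession",
        "",
        'spark = SparkSession.builder.appName("generated_job").getOrCreate()',
        f"df = {reader}.load({src['path']!r})",
    ]

    # Each UI step becomes one reassignment of df, so steps chain in order.
    for step in flow["steps"]:
        if step["op"] == "filter":
            lines.append(f"df = df.filter({step['condition']!r})")
        elif step["op"] == "select":
            cols = ", ".join(repr(c) for c in step["columns"])
            lines.append(f"df = df.select({cols})")
        else:
            raise ValueError(f"unsupported op: {step['op']}")

    sink = flow["sink"]
    lines.append(f"df.write.format({sink['format']!r}).save({sink['path']!r})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the generated job; it could equally be written to a .py file.
    print(generate_pyspark_job(flow))
```

The generator only builds text, so it runs without Spark installed; the resulting script would then be submitted with spark-submit like any other PySpark job.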

I found tools like https://seahorse.deepsense.ai/ and https://www.streamanalytix.com/product/streamanalytix/ which can do this. However, they generate a Scala/Java Spark job instead of a PySpark job. Moreover, these are paid products.

a) Are there any open-source solutions that can serve my need?

If not, I would like to build one. To do so, I would need a workflow UI editor that I can tweak to serve my purpose.
However, I did not find any free workflow UI editor that I can tweak.

b) Are there any open-source workflow UI editors that could help me solve my use case?

c) Are there any other interesting approaches to solving my use case?


Thanks,
Vamsi