Dataset experimental interfaces

Andrew Old
We are running Spark 2.2.0 in a Hadoop cluster, and I worked on a proof of concept that reads event-based data into Spark Datasets and operates over those sets to calculate differences between events.

More specifically, the input is ordered position data with odometer values, and we want to calculate the number of miles each vehicle traveled within certain jurisdictions.
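
To make the calculation concrete, here is a minimal plain-Java sketch of the per-vehicle step, with no Spark involved. The class, field, and method names are hypothetical, and crediting each odometer delta to the jurisdiction of the earlier reading is an assumption made only for illustration, not necessarily how the prototype attributes miles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MilesByJurisdiction {

    // Hypothetical event shape: one odometer reading for a single vehicle.
    public static final class PositionEvent {
        final long timestamp;
        final String jurisdiction;
        final double odometerMiles;

        public PositionEvent(long timestamp, String jurisdiction, double odometerMiles) {
            this.timestamp = timestamp;
            this.jurisdiction = jurisdiction;
            this.odometerMiles = odometerMiles;
        }
    }

    // Sums odometer deltas between consecutive readings of one vehicle.
    // Assumption: each delta is credited to the jurisdiction of the earlier reading.
    public static Map<String, Double> milesByJurisdiction(List<PositionEvent> events) {
        // Copy before sorting so the caller's list is left untouched.
        List<PositionEvent> ordered = new ArrayList<>(events);
        ordered.sort(Comparator.comparingLong((PositionEvent e) -> e.timestamp));

        Map<String, Double> totals = new LinkedHashMap<>();
        for (int i = 1; i < ordered.size(); i++) {
            PositionEvent prev = ordered.get(i - 1);
            PositionEvent next = ordered.get(i);
            totals.merge(prev.jurisdiction, next.odometerMiles - prev.odometerMiles, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Readings arrive out of order on purpose; the method sorts by timestamp.
        List<PositionEvent> events = Arrays.asList(
                new PositionEvent(3L, "OK", 115.0),
                new PositionEvent(1L, "TX", 100.0),
                new PositionEvent(4L, "OK", 130.0),
                new PositionEvent(2L, "TX", 110.0));

        System.out.println(milesByJurisdiction(events)); // prints {TX=15.0, OK=15.0}
    }
}
```

In a Dataset pipeline, logic like this would run once per vehicle group, for example inside a function applied after groupByKey.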

My prototype uses some Dataset interfaces (such as map with Encoders, and groupByKey) that are marked experimental, even in the 2.4.0 release. I understand that experimental means changes may occur in future releases, but I would like to know whether others would avoid using experimental interfaces in production code at all costs. We control when we upgrade to newer versions of Spark, so we can test for compatibility when new releases come out, but I'm still a bit hesitant to count on these interfaces going forward.

Since our prototype is showing success, we are considering using it for a new application, and I would like feedback on whether I should try to rework it using non-experimental interfaces while I have some time. So far I have found Datasets great for processing data and would like to keep using them.