Best practices to keep multiple versions of a schema in Spark

unk1102
Hi, I have a couple of datasets whose schemas keep changing, and I store them
as Parquet files. I currently use the mergeSchema option when loading these
differently-schemaed Parquet files into a DataFrame, and that works fine. Now
I have a requirement to track the differences between schemas over time,
essentially maintaining a list of which columns are the latest. Please advise
if anybody has done similar work, or share general best practices for
tracking column changes over time. Thanks in advance.
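
For context, here is a minimal sketch of the mergeSchema load described
above. The paths /data/events/v1 and /data/events/v2 are hypothetical
stand-ins for the differently-versioned Parquet datasets:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("schema-merge-example")
      .getOrCreate()

    // mergeSchema asks the Parquet reader to reconcile the differing
    // file schemas into a single DataFrame schema; rows from files
    // that lack a column come back with nulls there.
    val df = spark.read
      .option("mergeSchema", "true")
      .parquet("/data/events/v1", "/data/events/v2")

    df.printSchema()

And one possible starting point for the second part of the question: diffing
the column sets of two snapshots to record what was added or removed (again,
the paths are hypothetical):

    // Compare two snapshots' column sets to track schema changes.
    val oldCols = spark.read.parquet("/data/events/v1").schema.fieldNames.toSet
    val newCols = spark.read.parquet("/data/events/v2").schema.fieldNames.toSet

    val added   = newCols -- oldCols   // columns introduced in the new version
    val removed = oldCols -- newCols   // columns dropped from the old version

    println(s"added: $added, removed: $removed")

Persisting these diffs per load (e.g. with a timestamp) would give a running
history of which columns are current.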
