Spark SQL Macros provide a capability to register custom functions into a Spark Session that is similar to UDF Registration. The difference being that the SQL Macros registration mechanism attempts to translate the function body to an equivalent Spark catalyst Expression with holes(MarcroArg catalyst expressions). Under the covers SQLMacro is a set of scala blackbox macros that attempt to rewrite the scala function body AST into an equivalent catalyst Expression.
The generated expression is encapsulated in a FunctionBuilder that is registered in Spark's FunctionRegistry. Then any function invocation is replaced by the equivalent catalyst Expression with the holes replaced by the calling site arguments.
There are 2 potential performance benefits for replacing function calls with native catalyst expressions: - evaluation performance. Since we avoid the SerDe cost at the function boundary. - More importantly, since the plan has native catalyst expressions more optimizations are possible. - For example see the taxRate example where discount calculation is eliminated. - Pushdown of operations to Datsources has a huge impact. For example see the
Oracle SQL generated and pushed when a macro is defined instead of a UDF.
So this has potential benefits for developers of custom functions and DataSources. I am looking for feedback from the community on potential use cases and features to develop.
To use this functionality: - build the jar by issuing sbt sql/assembly or download from the releases page. - the jar can be added into any Spark env. - We have developed with spark-3.1.0 dependency.