Spark SQL Macros

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Spark SQL Macros


I have been working on Spark SQL Macros

Spark SQL Macros provide a capability to register custom functions into a Spark Session that is similar to UDF Registration. The difference being that the SQL Macros registration mechanism attempts to translate the function body to an equivalent Spark catalyst Expression with holes(MarcroArg catalyst expressions). Under the covers SQLMacro is a set of scala blackbox macros that attempt to rewrite the scala function body AST into an equivalent catalyst Expression.

The generated expression is encapsulated in a FunctionBuilder that is registered in Spark's FunctionRegistry. Then any function invocation is replaced by the equivalent catalyst Expression with the holes replaced by the calling site arguments.

There are 2 potential performance benefits for replacing function calls with native catalyst expressions:
- evaluation performance. Since we avoid the SerDe cost at the function boundary.
- More importantly, since the plan has native catalyst expressions more optimizations are possible.
  - For example see the taxRate example where discount calculation is eliminated.
  - Pushdown of operations to Datsources has a huge impact. For example see the
    Oracle SQL generated and  pushed when a macro is defined instead of a UDF.

So this has potential benefits for developers of custom functions and DataSources. I am looking for feedback from the community on potential use cases and features to develop.

To use this functionality:
- build the jar by issuing sbt sql/assembly or download from the releases page.
- the jar can be added into any Spark env.
- We have developed with spark-3.1.0 dependency.

The README provides more details, and examples.
This page (
provides even more examples.

Harish Butani.