I have a question. I am trying to serialize a PySpark ML model to MLeap.
However, the model uses SQLTransformer for some column-based
transformations, e.g. adding log-scaled versions of some columns. MLeap
doesn't support SQLTransformer (see
https://github.com/combust/mleap/issues/126), so I've implemented the former
of these two suggestions:
- For non-row operations, move the SQL out of the ML Pipeline that you plan
  to serialize.
- For row-based operations, use the available ML transformers or write a
  custom transformer (this is where the custom transformer documentation
  will help).

I've externalized the SQL transformation on the training data used to build
the model, and I apply the same transformation to the input data when I run
the model for evaluation.
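
To make the setup concrete, here is a minimal sketch of what externalizing
the transformation can look like. It is not my exact code: the column names,
the `_log` suffix and the use of `log1p` are assumptions for illustration
(the real SQL may use `log` or `log10`). The point is that one helper builds
the expressions, and the same helper is applied to both the training and the
evaluation DataFrame.

```python
from typing import List

# Hypothetical column names, used only for illustration.
LOG_COLUMNS = ["price", "volume"]


def log_scaled_exprs(columns: List[str], suffix: str = "_log") -> List[str]:
    """Build Spark SQL expressions: keep all columns, add a log-scaled copy of each listed one.

    Assumption: log1p is the scaling function; swap in log/log10 if that is
    what the original SQLTransformer statement used.
    """
    return ["*"] + [f"log1p(`{c}`) AS `{c}{suffix}`" for c in columns]


def add_log_columns(df, columns: List[str]):
    """Apply the shared expressions to any Spark DataFrame via DataFrame.selectExpr."""
    return df.selectExpr(*log_scaled_exprs(columns))


# Intended usage (train_df and eval_df are assumed to be existing Spark DataFrames):
# train_prepared = add_log_columns(train_df, LOG_COLUMNS)  # then pipeline.fit(train_prepared)
# eval_prepared = add_log_columns(eval_df, LOG_COLUMNS)    # then model.transform(eval_prepared)
```

Routing both datasets through the same helper rules out one possible source
of mismatch, namely the training and evaluation inputs being transformed by
slightly different SQL.
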
The problem I'm having is that I can't get the same results from the two
models.