Reverse MinMaxScaler in SparkML


Tomasz Dudek
Hello,

Since a similar question on StackOverflow remains unanswered (https://stackoverflow.com/questions/46092114/is-there-no-inverse-transform-method-for-a-scaler-like-minmaxscaler-in-spark) and perhaps there is a solution I am not aware of, I'll ask here:

After training a MinMaxScaler (or a similar scaler), is there any built-in way to revert the process, i.e. to transform the scaled data back to its original form? scikit-learn has a dedicated inverse_transform method that does exactly that.

I can, of course, get the originalMin/originalMax vectors from the MinMaxScalerModel and map the values back myself, but it would be nice to have this built in.

Yours,
Tomasz
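For reference, the manual mapping described above amounts to solving the MinMaxScaler formula for the original value: x = (r - min) / (max - min) * (E_max - E_min) + E_min, where E_min/E_max are the per-feature originalMin/originalMax and min/max are the scaler's target range. The plain-Python sketch below applies this to a single vector; the function name `inverse_min_max` is hypothetical and not part of Spark. In PySpark the statistics could presumably be taken from `model.originalMin.toArray()` and `model.originalMax.toArray()`, with min/max being whatever was set on the MinMaxScaler (0.0 and 1.0 by default). Applying it to a DataFrame column would still need a UDF or similar, which is left out here.

```python
def inverse_min_max(scaled, original_min, original_max, min_=0.0, max_=1.0):
    """Undo a min-max rescaling for one feature vector (hypothetical helper).

    Assumes the forward formula documented for Spark's MinMaxScaler:
        scaled = (x - E_min) / (E_max - E_min) * (max - min) + min
    and scaled = 0.5 * (max + min) for a constant feature (E_max == E_min).
    """
    if not len(scaled) == len(original_min) == len(original_max):
        raise ValueError("scaled, original_min and original_max must have the same length")
    if not min_ < max_:
        raise ValueError("min_ must be strictly less than max_")
    span = max_ - min_
    restored = []
    for r, e_min, e_max in zip(scaled, original_min, original_max):
        if e_max == e_min:
            # Constant feature: the scaled value carries no information,
            # but every original value must have equalled E_min.
            restored.append(e_min)
        else:
            # Solve the forward formula for x.
            restored.append((r - min_) / span * (e_max - e_min) + e_min)
    return restored


# Example with made-up statistics: three features, default [0, 1] target range.
print(inverse_min_max([0.0, 0.5, 1.0], [10.0, -2.0, 3.0], [20.0, 2.0, 7.0]))
# [10.0, 0.0, 7.0]
```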


Re: Reverse MinMaxScaler in SparkML

MLnick
This would be interesting, and a good addition, I think.

The API needs some thought, though. One approach is to add an "inverseTransform" method, similar to sklearn's.

The other approach is to "formalize" a pattern like StringIndexerModel -> IndexToString, where the inverse is a standalone transformer. It could be returned from a "getInverseTransformer" method, for example.

The former approach is simpler, but cannot be used in pipelines (which work on "fit" / "transform"). The latter approach is more cumbersome, but fits better into pipelines. 

So it depends on the use cases, i.e. how common it is to use the inverse transform within a pipeline (for StringIndexer <-> IndexToString it is quite common to want the original labels back, while for other transformers it may or may not be).
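To make the two API shapes concrete, here is a toy, Spark-free Python sketch. Every name in it (`MinMaxScalerModelSketch`, `InverseMinMaxTransformer`, `fit_min_max`, `run_stages`) is hypothetical and only mimics the fit/transform contract; none of it is Spark ML API. `inverse_transform` corresponds to the sklearn-style option, while `get_inverse_transformer` returns a separate object exposing only `transform`, which is all a pipeline stage needs.

```python
class InverseMinMaxTransformer:
    """Hypothetical standalone inverse stage. Like IndexToString, it only
    needs transform(), so it could be placed in a pipeline."""

    def __init__(self, original_min, original_max, min_=0.0, max_=1.0):
        self.original_min = list(original_min)
        self.original_max = list(original_max)
        self.min_, self.max_ = min_, max_

    def transform(self, rows):
        span = self.max_ - self.min_
        return [
            [
                # A constant feature maps back to E_min; otherwise invert the formula.
                e_min if e_max == e_min
                else (r - self.min_) / span * (e_max - e_min) + e_min
                for r, e_min, e_max in zip(row, self.original_min, self.original_max)
            ]
            for row in rows
        ]


class MinMaxScalerModelSketch:
    """Hypothetical fitted scaler offering both API shapes."""

    def __init__(self, original_min, original_max, min_=0.0, max_=1.0):
        self.original_min = list(original_min)
        self.original_max = list(original_max)
        self.min_, self.max_ = min_, max_

    def transform(self, rows):
        span = self.max_ - self.min_
        return [
            [
                # Same forward rule as Spark's MinMaxScaler docs describe.
                0.5 * (self.max_ + self.min_) if e_max == e_min
                else (x - e_min) / (e_max - e_min) * span + self.min_
                for x, e_min, e_max in zip(row, self.original_min, self.original_max)
            ]
            for row in rows
        ]

    # Option 1: sklearn-style method on the model (simple, but not a pipeline stage).
    def inverse_transform(self, rows):
        return self.get_inverse_transformer().transform(rows)

    # Option 2: return a standalone transformer usable wherever transform() is expected.
    def get_inverse_transformer(self):
        return InverseMinMaxTransformer(
            self.original_min, self.original_max, self.min_, self.max_
        )


def fit_min_max(rows, min_=0.0, max_=1.0):
    """Hypothetical 'fit': record the per-column min and max."""
    columns = list(zip(*rows))
    return MinMaxScalerModelSketch(
        [min(c) for c in columns], [max(c) for c in columns], min_, max_
    )


def run_stages(stages, rows):
    """Toy stand-in for a pipeline: apply each stage's transform() in order."""
    for stage in stages:
        rows = stage.transform(rows)
    return rows


data = [[10.0, -2.0], [20.0, 2.0], [15.0, 0.0]]
model = fit_min_max(data)
print(model.transform(data))  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
print(run_stages([model, model.get_inverse_transformer()], data) == data)  # True
```

The trade-off shows up in `run_stages`: it only ever calls `transform()`, so the standalone inverse object slots in as just another stage, whereas a method like `inverse_transform` has to be called outside the pipeline.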

In reply to Tomasz Dudek's message of Mon, 8 Jan 2018 at 11:10.