[ML] LogisticRegression and dataset's standardization before training

Filipp Zhinkin
Hi,

LogisticAggregator [1] scales every sample on every iteration. Without
that scaling, binaryUpdateInPlace could be rewritten using BLAS.dot,
which would significantly improve performance.
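To make the comparison concrete, here is a minimal sketch of what I
mean, written over plain arrays for a dense vector rather than the
actual Spark internals (both method names are made up):

def marginWithInlineScaling(
    coefficients: Array[Double],
    features: Array[Double],
    featuresStd: Array[Double]): Double = {
  // Roughly what the aggregator does today: divide each feature by
  // its std dev inside the hot loop, so no single BLAS call applies.
  var sum = 0.0
  var i = 0
  while (i < features.length) {
    if (featuresStd(i) != 0.0 && features(i) != 0.0) {
      sum += coefficients(i) * features(i) / featuresStd(i)
    }
    i += 1
  }
  sum
}

def marginOnStandardizedData(
    coefficients: Array[Double],
    standardizedFeatures: Array[Double]): Double = {
  // If the dataset were standardized once before training, the same
  // margin reduces to a plain dot product, i.e. one BLAS.dot call.
  var sum = 0.0
  var i = 0
  while (i < coefficients.length) {
    sum += coefficients(i) * standardizedFeatures(i)
    i += 1
  }
  sum
}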
However, there is a comment [2] saying that standardizing and caching
the dataset before training would "create a lot of overhead".

What kind of overhead does the comment refer to, and what is the
rationale for not scaling the dataset prior to training?

Thanks,
Filipp.

[1] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregator.scala#L229
[2] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregator.scala#L40
