This happens when there are empty columns, since their standard deviation
is zero. Adding a very small smoothing factor should help. By the way, I
noticed that the variance computation there is not numerically stable; it
should use the stable method implemented in RDD[Double].

-Xiangrui
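
To illustrate the stability point, here is a minimal sketch of a one-pass variance using Welford's update. It is plain Python for illustration only, not the MLlib code: the name `stable_mean_variance` is made up here, the population variance (dividing by n) is an assumption, and whether this matches the method behind RDD[Double] exactly is also an assumption.

```python
def stable_mean_variance(values):
    """Return (mean, population variance) in one pass using Welford's update.

    Hypothetical helper for illustration; not part of MLlib.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean           # deviation from the old mean
        mean += delta / n          # update the running mean
        m2 += delta * (x - mean)   # old deviation times new deviation
    if n == 0:
        return 0.0, 0.0            # assumption: an empty input yields zeros
    return mean, m2 / n


if __name__ == "__main__":
    data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
    print(stable_mean_variance(data))
```

For the sample data the variance comes out as 22.5. The textbook formula E[x^2] - E[x]^2 can lose digits on the same input, because the squares are around 1e18 and the two terms nearly cancel. A constant column gives a variance of exactly 0.0, which is the case that triggers the NaN in the thread.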

On Tue, Jan 28, 2014 at 5:22 AM, yinxusen <[hidden email]> wrote:

> Hi all,
>
> I have been testing Lasso and ridge regression in MLlib and ran into
> Double.NaN results, while the other classification and regression
> methods work fine.
>
> I eventually found that Lasso and RidgeRegression call the computeStats()
> function to compute the mean and SD (standard deviation) used to normalize
> the input data. However, some of the returned SDs are zero, so
> normalization hits 0.0 / 0.0 and produces NaN.
>
> How about setting the result directly to zero if both the divisor and the
> dividend are zero, and adding a smoothing factor (e.g. 1.0e-10) if the
> divisor alone is zero? Or does anyone have a better idea?
>
> Thanks!
>
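
The proposal quoted above could be sketched roughly as follows. This is a hedged illustration in plain Python, not the computeStats() code: `normalize_column` and its parameters are hypothetical names, and it assumes the smoothing factor is meant for the case where only the divisor (the SD) is zero. On the JVM 0.0 / 0.0 evaluates to NaN; plain Python raises ZeroDivisionError instead, so the guard is needed either way.

```python
def normalize_column(values, mean, sd, eps=1.0e-10):
    """Scale each value to (x - mean) / sd without dividing by zero.

    Hypothetical helper for illustration; eps is the smoothing factor
    suggested in the thread (1.0e-10).
    """
    scaled = []
    for x in values:
        centered = x - mean
        if sd == 0.0 and centered == 0.0:
            scaled.append(0.0)             # 0/0 case: define the result as zero
        elif sd == 0.0:
            scaled.append(centered / eps)  # zero divisor only: use the smoothing factor
        else:
            scaled.append(centered / sd)   # regular case
    return scaled


if __name__ == "__main__":
    # An empty (all-zero) column has mean 0.0 and SD 0.0.
    print(normalize_column([0.0, 0.0, 0.0], mean=0.0, sd=0.0))
    print(normalize_column([1.0, 3.0], mean=2.0, sd=1.0))
```

With the mean and SD taken from the same column, the smoothing branch should rarely fire, since an SD of zero means every value equals the mean. It only matters if the statistics come from different data than the values being scaled.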

> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/computeStats-in-MLUtils-will-cause-Nan-not-a-number-error-tp980.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.