ML Linear and Logistic Regression - Poor Performance

Zois Theodoros
Hello,

I am running an experiment to test logistic and linear regression on Spark using MLlib.

My dataset is only 128 MB, yet something odd happens. Linear regression takes about 127 seconds whether I run 1 or 500 iterations. Logistic regression, on the other hand, usually fails to finish even with 1 iteration; most of the time I get a heap memory error.

In both cases I use the default cores and memory for the driver, and I spawn 1 executor with 1 core and 2 GB of memory.
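
For reference, the executor layout described above could be written down roughly as follows. This is only a sketch: the helper names `executor_conf` and `build_session` are made up for illustration, and whether `spark.executor.instances` is honored depends on the cluster manager in use, which the message does not state.

```python
def executor_conf(instances=1, cores=1, memory="2g"):
    """Return Spark settings for the executor layout described in the message."""
    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": memory,
    }


def build_session(conf):
    """Create a SparkSession with the given settings (requires pyspark)."""
    # Imported lazily so that executor_conf() can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = executor_conf()
print(conf)
```

Raising `memory` in this dictionary is one simple way to test whether the heap error is just a matter of executor memory.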

Apart from that, I get a warning about NativeBLAS. I searched the Internet and found that I have to install libgfortran, but the warning remains even after I installed it.

Any ideas about the above?

Thank you,
- Thodoris

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: ML Linear and Logistic Regression - Poor Performance

Irving Duran
Are you formatting the data correctly for logistic regression (meaning 0s and 1s) before modeling? Which OS and Spark version are you using?
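
A quick way to check the label column before modeling might look like the sketch below. It assumes, as I understand Spark ML classifiers, that labels should be non-negative whole numbers, normally 0 to k-1 for k classes (0 and 1 in the binary case). The function name `check_class_labels` is hypothetical; in Spark itself, `StringIndexer` from `pyspark.ml.feature` is the usual way to produce such indices.

```python
def check_class_labels(labels):
    """Return a list of problems found in a collection of class labels."""
    values = sorted(set(float(v) for v in labels))
    problems = []
    if any(not v.is_integer() for v in values):
        problems.append("non-integer label values")
    if any(v < 0 for v in values):
        problems.append("negative label values")
    # Only check for gaps when the values are valid class indices at all.
    if not problems and values != [float(i) for i in range(len(values))]:
        problems.append("labels are not the contiguous range 0..k-1")
    return problems


binary_report = check_class_labels([0, 1, 1, 0])   # plain binary labels
shifted_report = check_class_labels([1, 2, 3])     # labels that do not start at 0
print(binary_report)
print(shifted_report)
```

An empty list means the labels already look like class indices; anything else is worth fixing before calling `fit`.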

Thank You,

Irving Duran


On Fri, Apr 27, 2018 at 2:34 PM Thodoris Zois <[hidden email]> wrote:
[...]

Re: ML Linear and Logistic Regression - Poor Performance

Zois Theodoros
I am on CentOS 7 and I use Spark 2.3.0. My code is posted below. Logistic regression took 85 minutes and linear regression 127 seconds.

My dataset, as I said, is 128 MB and contains 1000 features and ~100 classes.
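
To put those numbers in perspective, here is a rough count of the parameters each model has to fit. It is a back-of-the-envelope sketch that treats "~100 classes" as exactly 100 and assumes a multinomial model with one coefficient row and one intercept per class, which is what the use of `coefficientMatrix` in the posted code suggests.

```python
num_features = 1000
num_classes = 100  # "~100" in the message, treated as exactly 100 here

# Linear regression: one coefficient per feature plus an intercept.
linear_params = num_features + 1

# Multinomial logistic regression: one coefficient row and intercept per class.
logistic_params = num_classes * (num_features + 1)

bytes_per_double = 8
ratio = logistic_params / linear_params
dense_copy_mb = logistic_params * bytes_per_double / 1e6

print(linear_params, logistic_params)
print(ratio)
print(dense_copy_mb)
```

Each gradient evaluation for the logistic model therefore touches about 100 times as many parameters as the linear one. That does not by itself account for the reported timings, but it shows the two runs are not doing comparable work.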


import time

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# SparkSession
ss = SparkSession.builder.getOrCreate()

start = time.time()

# Read data (`file` holds the path to the CSV dataset)
trainData = ss.read.format("csv").option("inferSchema", "true").load(file)

# Assemble every column except the first (the label) into one feature vector
assembler = VectorAssembler(inputCols=trainData.columns[1:], outputCol="features")
trainData = assembler.transform(trainData)

# Drop the raw feature columns, keeping only the label column and the feature vector
dropColumns = [e for e in trainData.columns if e not in ('_c0', 'features')]
trainData = trainData.drop(*dropColumns)

# Rename column from _c0 to label
trainData = trainData.withColumnRenamed("_c0", "label")

# Logistic regression
lr = LogisticRegression(maxIter=500, regParam=0.3, elasticNetParam=0.8)
lrModel = lr.fit(trainData)

# Output coefficients
print("Coefficients: " + str(lrModel.coefficientMatrix))
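
One thing that might be worth trying with the code above: `inferSchema` makes Spark read the file an extra time to work out column types. If every column is numeric, an explicit schema avoids that pass. The sketch below builds a DDL schema string for one label column plus 1000 feature columns, using the default `_c0`, `_c1`, ... names from the posted code. The helper name `csv_schema_ddl` is made up, the all-`DOUBLE` typing is an assumption about the data, and to my knowledge the reader accepts DDL strings from Spark 2.3 on.

```python
def csv_schema_ddl(num_features, label_col="_c0"):
    """Build a DDL schema string: the label column followed by the feature columns."""
    columns = [label_col] + ["_c%d" % i for i in range(1, num_features + 1)]
    return ", ".join("%s DOUBLE" % name for name in columns)


schema = csv_schema_ddl(1000)
print(schema[:40])

# With Spark available, the string could replace inferSchema, for example:
#   trainData = ss.read.format("csv").schema(schema).load(file)
```

This only removes the schema-inference pass; it is not expected to change the cost of the fit itself.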



- Thodoris


On 27 Apr 2018, at 22:50, Irving Duran <[hidden email]> wrote:

[...]

Re: ML Linear and Logistic Regression - Poor Performance

Irving Duran
You may want to think about reducing the number of iterations. Right now you have it set to 500.
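
To see how much the iteration count actually matters, the fit could be timed at a few settings. The sketch below is a plain-Python harness; `time_fits` and `fake_fit` are hypothetical names, and `fake_fit` is only a stand-in so the example runs without Spark.

```python
import time


def time_fits(fit_fn, max_iters):
    """Call fit_fn(max_iter) for each setting and return {max_iter: seconds}."""
    timings = {}
    for max_iter in max_iters:
        start = time.perf_counter()
        fit_fn(max_iter)
        timings[max_iter] = time.perf_counter() - start
    return timings


def fake_fit(max_iter):
    """Stand-in workload whose cost grows with max_iter."""
    return sum(range(max_iter * 1000))


# With Spark, fit_fn could instead be something like:
#   lambda n: LogisticRegression(maxIter=n, regParam=0.3, elasticNetParam=0.8).fit(trainData)
timings = time_fits(fake_fit, [1, 10, 100])
for max_iter, seconds in timings.items():
    print(max_iter, round(seconds, 4))
```

If the time barely changes between 1 and 100 iterations, the bottleneck is probably somewhere other than the optimizer loop, for example in reading and assembling the data.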

Thank You,

Irving Duran


On Fri, Apr 27, 2018 at 7:15 PM Thodoris Zois <[hidden email]> wrote:
[...]