XGBoost not distributing on a cluster with more than one worker


XGBoost not distributing on a cluster with more than one worker

Aakash Basu-2
Hi,

We're trying to use the XGBoost package from DMLC. It runs successfully on a standalone machine, but it gets stuck whenever there are two or more workers.

PFA:
Code Filename: test.py
Data: trainvorg.csv

Spark submit command:

spark-submit \
  --master spark://192.168.80.10:7077 \
  --jars "$SPARK_HOME/jars/*.jar" \
  --num-executors 2 \
  --executor-cores 5 \
  --executor-memory 10G \
  --driver-cores 5 \
  --driver-memory 25G \
  --conf spark.sql.shuffle.partitions=100 \
  --conf spark.driver.maxResultSize=2G \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  --conf spark.default.parallelism=8 \
  --conf spark.scheduler.listenerbus.eventqueue.capacity=20000 \
  /appdata/test.py
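test.py itself is attached rather than pasted inline. As a rough sketch of the kind of script involved (the estimator, column names, and data path below are assumptions, not the attached code; xgboost's later built-in PySpark estimator stands in here for whichever DMLC wrapper test.py actually uses):

# Hypothetical sketch only: the real test.py is attached to this thread.
# The estimator, column names, and the data path are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from xgboost.spark import SparkXGBClassifier  # available in xgboost >= 1.7

spark = SparkSession.builder.appName("xgboost-distributed-test").getOrCreate()

# Load the attached training data (path assumed).
df = spark.read.csv("/appdata/trainvorg.csv", header=True, inferSchema=True)

# Assemble all non-label columns into a single feature vector.
feature_cols = [c for c in df.columns if c != "label"]  # assumes a "label" column
assembled = VectorAssembler(inputCols=feature_cols, outputCol="features").transform(df)

# num_workers is the number of parallel XGBoost worker tasks; Spark must be
# able to schedule all of them at the same time for training to proceed.
classifier = SparkXGBClassifier(features_col="features", label_col="label", num_workers=2)
model = classifier.fit(assembled)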

Issue being faced:

[Screenshot attached: Screen Shot 2018-09-04 at 5.34.31 PM.png]
Any help?

Thanks,
Aakash.



Attachments: test.py (2K), trainvorg.csv (81K)

Re: XGBoost not distributing on a cluster with more than one worker

Aakash Basu-2
Hi all,

This is the error behind the retries and failures. Can anyone help me understand why it happens and what the probable fix might be?

[Screenshot attached: Screen Shot 2018-09-06 at 4.40.31 PM.png]
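One common cause of this pattern (an assumption here, not a confirmed diagnosis): distributed XGBoost needs all of its worker tasks running simultaneously, so if the cluster cannot schedule num_workers concurrent tasks, training stalls and Spark keeps retrying. A quick sanity check against the spark-submit settings above:

# Hypothetical sanity check; the numbers mirror the spark-submit command
# in this thread and an assumed num_workers value, not verified settings.
num_executors = 2        # --num-executors
executor_cores = 5       # --executor-cores
max_concurrent_tasks = num_executors * executor_cores  # 10 task slots

num_workers = 2          # value assumed to be passed to the XGBoost estimator
if num_workers > max_concurrent_tasks:
    raise RuntimeError("Not all XGBoost workers can be scheduled at once; "
                       "training will hang rather than fail outright")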

Thanks,
Aakash.
