Error in java_gateway.py


ClockSlave
Following the code snippets on this thread, I got a working version of XGBoost on PySpark, but one issue I am still facing is the following:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/dummy_package/xgboost/xgboost.py", line 92, in __init__
    self._java_obj = self._new_java_obj("ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator", self.uid)
  File "/Users/ultrauser/Downloads/spark/python/pyspark/ml/wrapper.py", line 61, in _new_java_obj
    java_obj = getattr(java_obj, name)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/py4j/java_gateway.py", line 1598, in __getattr__
    raise Py4JError("{0} does not exist in the JVM".format(new_fqn))
py4j.protocol.Py4JError: ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator does not exist in the JVM
Exception ignored in: 
Traceback (most recent call last):
  File "/Users/ultrauser/Downloads/spark/python/pyspark/ml/wrapper.py", line 105, in __del__
    SparkContext._active_spark_context._gateway.detach(self._java_obj)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/py4j/java_gateway.py", line 2000, in detach
    java_object._detach()
AttributeError: 'NoneType' object has no attribute '_detach'
From what I read on StackOverflow and elsewhere, this looks like an issue of jar locations. I have two jar files that are needed for this code to work
  • xgboost4j-0.72.jar
  • xgboost4j-spark-0.72.jar
But I am not sure how to proceed. This is what I have tried so far:
  1. place the xgboost jar files in
    /Library/Java/Extensions
  2. set the environment variables (note the --jars list is comma-separated with no spaces)
    import os
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /Users/ultrauser/Downloads/xgboost4j-0.72.jar,/Users/ultrauser/Downloads/xgboost4j-spark-0.72.jar pyspark-shell'
    
  3. Place the jar files in $SPARK_HOME/jars
But the error still persists. Is there something I am missing here?
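For reference, this is how I understand option 2 should look (paths as above; my understanding is that PYSPARK_SUBMIT_ARGS is read when pyspark launches the JVM, so it must be set before pyspark is imported or any SparkContext exists, and the --jars list must be comma-separated with no spaces):

```python
import os

# Jar locations from my machine -- adjust to wherever the jars live.
jars = ",".join([
    "/Users/ultrauser/Downloads/xgboost4j-0.72.jar",
    "/Users/ultrauser/Downloads/xgboost4j-spark-0.72.jar",
])

# Comma-separated, no spaces, and set BEFORE importing pyspark;
# once a SparkContext is up, this variable is ignored.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars {} pyspark-shell".format(jars)

# Only now import pyspark and build the session:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```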

Sent from the Apache Spark User List mailing list archive at Nabble.com.