XGBoost Spark One Model Per Worker Integration

grp
Hi There Spark Users,

I have been trying to follow along with this posted XGBoost Spark Databricks notebook (https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1526931011080774/3624187670661048/6320440561800420/latest.html), but I keep getting ValueError: bad input shape ().

I have tried a few things to fix it; the complete SO post with details is here: https://stackoverflow.com/questions/58595442/xgboost-spark-one-model-per-worker-integration

##################################

import numpy as np
import xgboost as xgb

features = inputTrainingDF.select("features").collect()
labels = inputTrainingDF.select("label").collect()

X = np.asarray(map(lambda v: v[0].toArray(), features))
Y = np.asarray(map(lambda v: v[0], labels))

xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic')

model = xgbClassifier.fit(X, Y)
ValueError: bad input shape () 
##################################
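
One possible cause, assuming this runs under Python 3 (which I have not confirmed for this cluster): map() returns a lazy iterator there, so np.asarray(map(...)) wraps the iterator itself in a 0-d object array whose shape is (), which matches the shape in the error message. The sketch below reproduces that with plain NumPy; the rows list is made-up stand-in data, not the real DataFrame.

```python
import numpy as np

# Hypothetical stand-in for the rows returned by .collect() on the label column.
rows = [(0.0,), (1.0,), (1.0,)]

# Under Python 3, map() is lazy: NumPy sees a single opaque object,
# not a sequence, and builds a 0-d object array around it.
lazy = np.asarray(map(lambda v: v[0], rows))
lazy_shape = lazy.shape

# Materializing the values first (list comprehension) gives a 1-d array.
eager = np.asarray([v[0] for v in rows])
eager_shape = eager.shape

print(lazy_shape, eager_shape)
```

If that is what is happening, wrapping each map(...) in list(...) or switching to a list comprehension should give X and Y the shapes that fit() expects.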

##################################

def trainXGbModel(partitionKey, labelAndFeatures):
  X = np.asarray(map(lambda v: v[1].toArray(), labelAndFeatures))
  Y = np.asarray(map(lambda v: v[0], labelAndFeatures))
  xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic')
  model = xgbClassifier.fit(X, Y)
  return [partitionKey, model]

xgbModels = inputTrainingDF\
.select("education", "label", "features")\
.rdd\
.map(lambda row: [row[0], [row[1], row[2]]])\
.groupByKey()\
.map(lambda v: trainXGbModel(v[0], list(v[1])))

xgbModels.take(1)
ValueError: bad input shape ()
##################################
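
The per-group version builds X and Y with map() in the same way, so it may be failing for the same reason if this is Python 3. Below is a minimal sketch of how the arrays could be built from the grouped (label, features) pairs instead. FakeVector and to_arrays are hypothetical names used only for illustration; FakeVector just mimics the toArray() method of a Spark ML vector so the snippet runs without Spark or XGBoost.

```python
import numpy as np


class FakeVector:
    """Hypothetical stand-in for a Spark ML vector; only toArray() is mimicked."""

    def __init__(self, values):
        self._values = list(values)

    def toArray(self):
        return np.asarray(self._values, dtype=float)


def to_arrays(label_and_features):
    """Build (X, Y) from (label, vector) pairs without relying on lazy map()."""
    # Materialize once, since the grouped values may be a one-pass iterable.
    pairs = list(label_and_features)
    X = np.asarray([features.toArray() for _, features in pairs])
    Y = np.asarray([label for label, _ in pairs])
    return X, Y


# Made-up group of three rows with two features each.
group = [
    (0, FakeVector([1.0, 2.0])),
    (1, FakeVector([3.0, 4.0])),
    (1, FakeVector([5.0, 6.0])),
]
X, Y = to_arrays(group)
print(X.shape, Y.shape)
```

Inside trainXGbModel, the two np.asarray(map(...)) lines would be replaced by a call like this before fit(X, Y) is invoked.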

Could someone please take a look at this?

Thank you for your time and research!