Spark Mllib logistic regression setWeightCol illegal argument exception


Patrick-2
Hi Spark Users,

I am trying to solve a class imbalance problem. I found that Spark supports setting instance weights in its API, but I get an IllegalArgumentException saying the weight column does not exist, even though it does exist in the dataset. Any recommendation on how to approach this? I am using the Pipeline API with a LogisticRegression model and TrainValidationSplit.

LogisticRegression lr = new LogisticRegression();
lr.setWeightCol("weight");
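For context, the values in the weight column are typically derived from the class frequencies, so that the minority class counts more heavily during training. A minimal sketch of that computation (plain Java, no Spark dependency; the counts and the `ClassWeights` helper are hypothetical, for illustration only):

```java
// Hypothetical helper: balancing weight for a class, as commonly used
// to populate the column passed to setWeightCol("weight").
// weight(class) = totalCount / (numClasses * count(class))
public class ClassWeights {

    public static double weightFor(long classCount, long totalCount, int numClasses) {
        // Rarer classes get proportionally larger weights.
        return (double) totalCount / ((double) numClasses * (double) classCount);
    }

    public static void main(String[] args) {
        // Example: 100 positives vs. 900 negatives (assumed counts).
        long positives = 100, negatives = 900, total = 1000;
        double wPos = weightFor(positives, total, 2); // 5.0
        double wNeg = weightFor(negatives, total, 2); // ~0.556
        System.out.println("wPos=" + wPos + " wNeg=" + wNeg);
    }
}
```

Each row would then carry the weight for its label in a numeric column named "weight", which must still be present in the DataFrame that reaches `fit`.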


Caused by: java.lang.IllegalArgumentException: Field "weight" does not exist.
	at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:267)
	at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:267)
	at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
	at scala.collection.AbstractMap.getOrElse(Map.scala:59)
	at org.apache.spark.sql.types.StructType.apply(StructType.scala:266)
	at org.apache.spark.ml.util.SchemaUtils$.checkNumericType(SchemaUtils.scala:71)
	at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:58)
	at org.apache.spark.ml.classification.Classifier.org$apache$spark$ml$classification$ClassifierParams$$super$validateAndTransformSchema(Classifier.scala:58)
	at org.apache.spark.ml.classification.ClassifierParams$class.validateAndTransformSchema(Classifier.scala:42)
	at org.apache.spark.ml.classification.ProbabilisticClassifier.org$apache$spark$ml$classification$ProbabilisticClassifierParams$$super$validateAndTransformSchema(ProbabilisticClassifier.scala:53)
	at org.apache.spark.ml.classification.ProbabilisticClassifierParams$class.validateAndTransformSchema(ProbabilisticClassifier.scala:37)
	at org.apache.spark.ml.classification.LogisticRegression.org$apache$spark$ml$classification$LogisticRegressionParams$$super$validateAndTransformSchema(LogisticRegression.scala:278)
	at org.apache.spark.ml.classification.LogisticRegressionParams$class.validateAndTransformSchema(LogisticRegression.scala:265)
	at org.apache.spark.ml.classification.LogisticRegression.validateAndTransformSchema(LogisticRegression.scala:278)
	at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:144)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184)
	at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:184)
	at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
	at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
	at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
	at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:184)
	at org.apache.spark.ml.tuning.ValidatorParams$class.transformSchemaImpl(ValidatorParams.scala:77)
	at org.apache.spark.ml.tuning.TrainValidationSplit.transformSchemaImpl(TrainValidationSplit.scala:67)
	at org.apache.spark.ml.tuning.TrainValidationSplit.transformSchema(TrainValidationSplit.scala:180)
	at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
	at org.apache.spark.ml.tuning.TrainValidationSplit.fit(TrainValidationSplit.scala:121)


Thanks in advance,