Testing another Dataset after ML training

mckunkel
This post has NOT been accepted by the mailing list yet.
Greetings,

Following the Naive Bayes example using Dataset<Row> on the Apache Spark documentation page:
https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes

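I want to predict the outcome for a separate set of data. Instead of splitting one dataset into training and testing portions, I have one training set and one testing set, i.e.:
                import org.apache.spark.ml.classification.NaiveBayes;
                import org.apache.spark.ml.classification.NaiveBayesModel;
                import org.apache.spark.sql.Dataset;
                import org.apache.spark.sql.Row;

                // Build the two Datasets separately, both with the same schema
                Dataset<Row> training = spark.createDataFrame(dataTraining, schemaForFrame);
                Dataset<Row> testing = spark.createDataFrame(dataTesting, schemaForFrame);

                NaiveBayes nb = new NaiveBayes();
                NaiveBayesModel model = nb.fit(training);            // fit on the training Dataset
                Dataset<Row> predictions = model.transform(testing); // score the separate testing Dataset
                predictions.show();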

But I get the following error:

17/07/11 13:40:38 INFO DAGScheduler: Job 2 finished: collect at NaiveBayes.scala:171, took 3.942413 s
Exception in thread "main" org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (vector) => vector)
        at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1075)
        at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:144)
        at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:48)
        at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:30)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)

...
...
...


How do I perform predictions on other datasets that were not created from a split?
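The trace above is cut off before its "Caused by" line, so the root cause is not visible. Since the log shows the NaiveBayes fit job finishing, the failure appears to come from model.transform(testing). Two plausible causes, which are assumptions and not confirmed by the trace: test feature vectors whose size differs from the training vectors, or negative feature values, which Spark's documentation says multinomial Naive Bayes (the default modelType) does not support. Below is a minimal plain-Java sketch of a pre-flight check, assuming the raw features are available as double[][] before they are turned into Rows. FeatureCheck and problems are hypothetical names, not Spark API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Spark): checks test feature arrays
 * against the training data before the Datasets are built.
 */
final class FeatureCheck {

    private FeatureCheck() {}

    /**
     * Returns one message per test row whose length differs from the
     * training dimension or that contains a negative value.
     */
    static List<String> problems(double[][] training, double[][] testing) {
        List<String> out = new ArrayList<>();
        if (training.length == 0) {
            out.add("training set is empty");
            return out;
        }
        // Assumes all training rows share the first row's dimension
        int dim = training[0].length;
        for (int i = 0; i < testing.length; i++) {
            double[] row = testing[i];
            if (row.length != dim) {
                out.add("test row " + i + ": size " + row.length + ", expected " + dim);
                continue; // no point checking values of a mis-sized row
            }
            for (int j = 0; j < row.length; j++) {
                if (row[j] < 0) {
                    out.add("test row " + i + ": negative value at index " + j);
                    break; // report only the first negative value per row
                }
            }
        }
        return out;
    }
}
```

If problems returns an empty list, the rows can be wrapped as usual, for example with RowFactory.create(label, Vectors.dense(features)) using org.apache.spark.sql.RowFactory and org.apache.spark.ml.linalg.Vectors. This runs locally, so it only suits small, in-memory data.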

Re: Testing another Dataset after ML training

mckunkel
This post has NOT been accepted by the mailing list yet.
I'm not sure why I cannot subscribe so that everyone can view the conversation.
Help?