Hi,
Is there a way to get the confidence value of a prediction with MLlib's implementation of Naive Bayes classification? I wish to eliminate the samples that were classified with low confidence. Thanks, Jatin
Not directly. If you could access brzPi and brzTheta in the NaiveBayesModel, you could repeat the computation it performs in predict() and exponentiate the result to get back class probabilities, since the inputs and internal values are in log space. Hm, I wonder how people feel about exposing those fields, or adding a method that exposes class probabilities? It seems useful, since the information is conceptually directly available.
On Nov 10, 2014 5:46 AM, "jatinpreet" <[hidden email]> wrote:
Hi, 
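For readers following along in Java, the computation described above can be sketched with plain arrays. This is only an illustration, not MLlib code: it assumes the log priors and log conditional probabilities have already been obtained as `double[] pi` and `double[][] theta`, and the class and method names are made up for this example.

```java
public class NaiveBayesLogScores {
    /**
     * Returns one unnormalized log score per class:
     * pi[c] + sum over j of theta[c][j] * x[j].
     * pi and theta are assumed to already be in log space.
     */
    public static double[] logScores(double[] pi, double[][] theta, double[] x) {
        double[] scores = new double[pi.length];
        for (int c = 0; c < pi.length; c++) {
            double s = pi[c];
            for (int j = 0; j < x.length; j++) {
                s += theta[c][j] * x[j];
            }
            scores[c] = s;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical two-class, two-feature model (values invented for illustration).
        double[] pi = {Math.log(0.5), Math.log(0.5)};
        double[][] theta = {
            {Math.log(0.8), Math.log(0.2)},
            {Math.log(0.3), Math.log(0.7)}
        };
        double[] x = {2.0, 1.0}; // feature counts for one sample
        for (double s : logScores(pi, theta, x)) {
            System.out.println(s);
        }
    }
}
```

The class with the largest score is the predicted class. The scores themselves are log values that still need to be normalized before they can be read as probabilities.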
Thanks for the answer. The variables brzPi and brzTheta are declared private. I am writing my code in Java; otherwise I could have replicated the Scala class and performed the desired computation, which, as far as I can see, is a multiplication of brzTheta with the test vector followed by the addition of brzPi.
Any suggestions for a way out other than replicating the whole functionality of the Naive Bayes model in Java? That would be a time-consuming process.
It's hacky, but you could access these fields via reflection. It'd be better to propose opening them up in a PR.

On Mon, Nov 10, 2014 at 9:25 AM, jatinpreet <[hidden email]> wrote:
> Thanks for the answer. The variables brzPi and brzTheta are declared private.
> I am writing my code with Java otherwise I could have replicated the scala
> class and performed desired computation, which is as I observed a
> multiplication of brzTheta with test vector and adding this value to brzPi.
>
> Any suggestions of a way out other than replicating the whole functionality
> of Naive Baye's model in Java? That would be a time consuming process.
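A minimal sketch of the reflection approach in Java follows. `StandInModel` is a hypothetical placeholder with a private field, not the MLlib class, and the helper name is invented for this example. With a Scala-compiled class, the field name seen by the JVM may differ from the name in the Scala source, so it is worth listing `getDeclaredFields()` first to check.

```java
import java.lang.reflect.Field;

public class ReflectionSketch {
    /** Hypothetical stand-in for a model with a private field; NOT the MLlib class. */
    public static class StandInModel {
        private final double[] brzPi = {-0.7, -0.7};
    }

    /** Reads a private field declared on the target object's own class. */
    public static Object readPrivateField(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true); // bypass the 'private' access check
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        double[] pi = (double[]) readPrivateField(new StandInModel(), "brzPi");
        System.out.println(pi.length);
    }
}
```

Code like this depends on private implementation details and can break on any library upgrade, which is why proposing a public accessor is the better long-term route.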
Thanks, I will try it out and raise a request to make the variables accessible.
An unrelated question: do you think the probability value calculated this way will be a good measure of confidence in the prediction? I have been reading mixed opinions on this. Jatin
In reply to this post by sowen
Are we sure that exponentiating will give us the probabilities? I did some tests by cloning the MLlib class and adding the required code, but the calculated probabilities do not add up to 1.
I tried something like:

    def predictProbs(testData: Vector): (BDV[Double], BDV[Double]) = {
      val logProbs = brzPi + brzTheta * new BDV[Double](testData.toArray)
      val probs = logProbs.map(x => math.exp(x))
      (logProbs, probs)
    }

This was because I need the actual probs to process downstream from this...
Probabilities won't sum to 1 since this expression doesn't incorporate the probability of the evidence, I imagine? It's constant across classes, so it is usually excluded. It would appear as a "- log(P(evidence))" term.

On Tue, Dec 2, 2014 at 10:44 AM, MariusFS <[hidden email]> wrote:
> Are we sure that exponentiating will give us the probabilities? I did some
> tests by cloning the MLLIb class and adding the required code but the
> calculated probabilities do not add up to 1.
>
> I tried something like :
>
> def predictProbs(testData: Vector): (BDV[Double], BDV[Double]) = {
>   val logProbs = brzPi + brzTheta * new BDV[Double](testData.toArray)
>   val probs = logProbs.map(x => math.exp(x))
>   (logProbs, probs)
> }
>
> This was because I need the actual probs to process downstream from this...
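One way to restore the missing evidence term is to normalize the exponentiated scores so they sum to 1. Subtracting the maximum score before exponentiating (the log-sum-exp trick) avoids underflow when the log scores are very negative. The sketch below is plain Java with invented class and method names; it assumes the per-class log scores have already been computed.

```java
public class NormalizeLogScores {
    /**
     * Converts unnormalized per-class log scores into probabilities that sum to 1.
     * log(P(evidence)) is computed as log(sum_c exp(score_c)) using log-sum-exp.
     */
    public static double[] toProbabilities(double[] logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (double s : logScores) {
            sum += Math.exp(s - max); // each term is at most 1, so no overflow
        }
        double logEvidence = max + Math.log(sum);
        double[] probs = new double[logScores.length];
        for (int c = 0; c < logScores.length; c++) {
            probs[c] = Math.exp(logScores[c] - logEvidence);
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical log scores for two classes.
        for (double p : toProbabilities(new double[]{-1000.0, -1001.0})) {
            System.out.println(p);
        }
    }
}
```

To drop low-confidence samples, as in the original question, one could keep only the predictions whose largest normalized probability exceeds a chosen threshold.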
That was it, Thanks. (Posting here so people know it's the right answer in case they have the same need :) ).

In reply to this post by sowen
Hi,
From what I've inferred, you can multiply the theta matrix with the (evidence) feature values, add the result to the class priors (pi), exponentiate these values, and you're supposed to get the conditional probability of the class given the evidence, for each class value. My question is: why does this work? Specifically, why are the multiplication, addition and exponentiation steps supposed to give you the probability? What is happening under the hood? What are the theta values exactly? PS: I know how Naive Bayes works theoretically. I also read Spark's short description of theta as "log class conditional probabilities", but what does that mean?
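A short derivation may help here. It is a sketch under the assumption of a multinomial Naive Bayes model, where $x_j$ is the count of feature $j$, $\pi_c = \log P(c)$ is the log class prior, and $\theta_{cj} = \log P(\text{feature } j \mid c)$ is the log class conditional probability. The symbol $M(x)$ (the multinomial coefficient) is introduced here for completeness and does not depend on the class.

```latex
\begin{align*}
P(c \mid x) &= \frac{P(c)\, P(x \mid c)}{P(x)},
\qquad P(x \mid c) = M(x) \prod_j P(\text{feature } j \mid c)^{x_j} \\
\log P(c \mid x) &= \log P(c) + \sum_j x_j \log P(\text{feature } j \mid c)
                    + \log M(x) - \log P(x) \\
                 &= \pi_c + \sum_j \theta_{cj}\, x_j + \log M(x) - \log P(x)
\end{align*}
```

The first two terms are exactly "pi plus theta times the feature vector". The last two terms are the same for every class, so they do not change which class scores highest, but they are the reason the exponentiated scores do not sum to 1 until they are normalized.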