MLlib Naive Bayes classifier confidence

jatinpreet
Hi,

Is there a way to get the confidence value of a prediction with MLlib's implementation of Naive Bayes classification? I wish to eliminate the samples that were classified with low confidence.

Thanks,
Jatin

Re: MLlib Naive Bayes classifier confidence

sowen

Not directly. If you could access brzPi and brzTheta in the NaiveBayesModel, you could repeat the computation from predict() and exponentiate the result to get back class probabilities, since the input and internal values are in log space.

Hm, I wonder how people feel about exposing those fields, or adding a method that exposes class probabilities? It seems useful, since the information is conceptually directly available.
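
As a rough illustration of the computation described above, here is a minimal sketch in plain Scala, without Spark or Breeze. The names `NaiveBayesScores`, `logPi`, `logTheta`, and `logScores` are hypothetical stand-ins for the model's log priors and log conditional probabilities; the sketch follows the description in this thread and is not MLlib's actual source.

```scala
object NaiveBayesScores {
  /** Unnormalized log score per class: logPi(c) + sum_j logTheta(c)(j) * x(j). */
  def logScores(logPi: Array[Double],
                logTheta: Array[Array[Double]],
                x: Array[Double]): Array[Double] =
    logPi.indices.map { c =>
      logPi(c) + logTheta(c).zip(x).map { case (t, v) => t * v }.sum
    }.toArray

  /** Index of the highest-scoring class, i.e. the predicted class. */
  def argmax(scores: Array[Double]): Int = scores.indexOf(scores.max)
}
```

The scores stay in log space here; picking the largest one is enough for a prediction, while turning them into probabilities needs an extra step.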

Re: MLlib Naive Bayes classifier confidence

jatinpreet
Thanks for the answer. The variables brzPi and brzTheta are declared private. I am writing my code in Java; otherwise I could have replicated the Scala class and performed the desired computation, which, as I observed, is a multiplication of brzTheta with the test vector followed by the addition of brzPi.

Any suggestions for a way out other than replicating the whole functionality of the Naive Bayes model in Java? That would be a time-consuming process.

Re: MLlib Naive Bayes classifier confidence

sowen
It's hacky, but you could access these fields via reflection. It'd be
better to propose opening them up in a PR.
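
The reflection route could look roughly like the sketch below. `HiddenModel` is a hypothetical stand-in class, not MLlib's NaiveBayesModel, and `readPrivate` is a helper invented for this example. Whether the compiled field names on the real class match the Scala source names is an assumption that would need checking. The same `java.lang.reflect` calls are available from Java.

```scala
// Hypothetical stand-in for a model class that hides its parameters;
// it is NOT MLlib's NaiveBayesModel.
class HiddenModel(pi: Array[Double]) {
  private val brzPi: Array[Double] = pi.clone()
  def size: Int = brzPi.length
}

object ReflectionAccess {
  /** Read a private field by name using Java reflection. */
  def readPrivate[T](target: AnyRef, fieldName: String): T = {
    val field = target.getClass.getDeclaredField(fieldName)
    field.setAccessible(true) // bypass the private modifier
    field.get(target).asInstanceOf[T]
  }
}
```

A call such as `ReflectionAccess.readPrivate[Array[Double]](model, "brzPi")` returns the hidden array. This is brittle: it breaks silently if the field is renamed in a later release.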

Re: MLlib Naive Bayes classifier confidence

jatinpreet
Thanks, I will try it out and raise a request for making the variables accessible.

An unrelated question: do you think the probability value calculated this way will be a good measure of confidence in the prediction? I have read mixed opinions on this.

Jatin

Re: MLlib Naive Bayes classifier confidence

MariusFS
In reply to this post by sowen
Are we sure that exponentiating will give us the probabilities? I did some tests by cloning the MLlib class and adding the required code, but the calculated probabilities do not add up to 1.

I tried something like:

  // Added inside a clone of the model class; BDV is presumably Breeze's DenseVector.
  def predictProbs(testData: Vector): (BDV[Double], BDV[Double]) = {
    // Per-class log scores: log prior plus theta times the feature vector
    val logProbs = brzPi + brzTheta * new BDV[Double](testData.toArray)
    // Exponentiate each score; the results are not normalized
    val probs = logProbs.map(x => math.exp(x))
    (logProbs, probs)
  }

I tried this because I need the actual probabilities for downstream processing.

Re: MLlib Naive Bayes classifier confidence

sowen
Probabilities won't sum to 1 because this expression doesn't incorporate
the probability of the evidence, I imagine. It's constant across
classes, so it is usually excluded. It would appear as a
"- log(P(evidence))" term.
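
One way to recover probabilities that do sum to 1 is to divide by the sum over classes, which plays the role of the missing evidence term. The sketch below does that in plain Scala with the usual log-sum-exp shift for numerical stability. The names `NaiveBayesPosterior`, `posteriors`, and `confident` are hypothetical, and the input is assumed to be an array of unnormalized per-class log scores such as the `logProbs` discussed in this thread.

```scala
object NaiveBayesPosterior {
  /** Turn unnormalized log scores into probabilities that sum to 1. */
  def posteriors(logScores: Array[Double]): Array[Double] = {
    val maxLog = logScores.max                        // shift to avoid underflow in exp
    val shifted = logScores.map(s => math.exp(s - maxLog))
    val total = shifted.sum                           // P(evidence) up to the factor exp(maxLog)
    shifted.map(_ / total)
  }

  /** Keep a prediction only if its top posterior reaches a caller-chosen threshold. */
  def confident(logScores: Array[Double], threshold: Double): Option[Int] = {
    val p = posteriors(logScores)
    val best = p.indexOf(p.max)
    if (p(best) >= threshold) Some(best) else None
  }
}
```

`confident` addresses the original goal of dropping low-confidence samples: it returns `None` when no class reaches the threshold. The threshold value itself is a choice left to the caller.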

Re: MLlib Naive Bayes classifier confidence

MariusFS
That was it, thanks. (Posting here so people know it's the right answer in case they have the same need :) ).


Re: MLlib Naive Bayes classifier confidence

singhkanak986
In reply to this post by sowen
Hi,

From what I've inferred, you can multiply the theta matrix by the (evidence) feature values, add the class priors (pi), exponentiate the resulting values, and you're supposed to get the conditional probability of each class given the evidence.

My question is: why does this work? Specifically, why are the multiplication, addition, and exponentiation steps supposed to give you the probability? What is happening under the hood? What exactly are the theta values?

PS: I know how Naive Bayes works theoretically. I also read Spark's short description of theta as "log class conditional probabilities", but what does that mean?
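
A possible way to see it, assuming a multinomial model in which $x_j$ is the count of feature $j$ (the symbols $\theta_{cj}$, $\pi_c$, and $\Theta$ below are notation chosen for this sketch, with $\pi_c$ and $\Theta_{cj}$ standing for the stored log values):

```latex
\begin{align*}
P(c \mid x) &= \frac{P(c)\,P(x \mid c)}{P(x)}
  \;\propto\; P(c)\prod_{j} \theta_{cj}^{\,x_j},
  \qquad \theta_{cj} = P(\text{feature } j \mid \text{class } c) \\
\log\bigl(P(c)\,P(x \mid c)\bigr) &= \underbrace{\log P(c)}_{\pi_c}
  + \sum_{j} x_j \underbrace{\log \theta_{cj}}_{\Theta_{cj}} + \text{const}
  = \pi_c + (\Theta x)_c + \text{const} \\
P(c \mid x) &= \frac{\exp\bigl(\pi_c + (\Theta x)_c\bigr)}
  {\sum_{c'} \exp\bigl(\pi_{c'} + (\Theta x)_{c'}\bigr)}
\end{align*}
```

Under that reading, the matrix multiplication is the sum of log conditional probabilities weighted by the feature counts, the addition brings in the log prior, and exponentiating undoes the logarithm. The constant does not depend on the class, so it cancels in the last line, and the denominator there is the evidence term mentioned earlier in the thread.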