Cosine Similarity between documents - Rows

Cosine Similarity between documents - Rows

Donni Khan

I have a Spark job that computes the similarity between text documents:

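// Assumes each row of vectorsRDD is one document's feature vector (e.g. TF-IDF).
RowMatrix rowMatrix = new RowMatrix(vectorsRDD.rdd());
// columnSimilarities compares COLUMNS (features), not rows; the 0.5 threshold enables approximate DIMSUM sampling.
CoordinateMatrix rowsimilarity = rowMatrix.columnSimilarities(0.5);
JavaRDD<MatrixEntry> entries = rowsimilarity.entries().toJavaRDD();

// collect() pulls every entry to the driver, which is only reasonable for small results.
List<MatrixEntry> list = entries.collect();

for (MatrixEntry s : list) System.out.println(s);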
Each MatrixEntry(i, j, value) represents the similarity between columns i and j (i.e. between features of the documents).
But how can I get the similarity between rows?
Suppose I have five documents, Doc1, ..., Doc5. We would like to show the similarity between every pair of those documents.
How do I get that? Any help is appreciated.

Thank you
Donni
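One way to get row (document) similarities out of this API is to transpose the matrix so that documents become columns, then call columnSimilarities on the result; the i and j of each resulting MatrixEntry are then document indices. In Spark that would roughly mean wrapping the rows in an IndexedRowMatrix, then calling toCoordinateMatrix(), transpose(), toRowMatrix() and columnSimilarities() (the no-argument version is exact; the threshold version samples). Treat that chain as a suggestion to verify against your Spark version, and note that the exact computation compares every pair, so its cost grows quadratically with the number of documents. The plain-Java sketch below does not use Spark at all; it only illustrates on small in-memory arrays that column similarities of the transposed matrix are the row similarities of the original. The class, record and method names (RowSimilaritySketch, Entry, transpose, columnSimilarities) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RowSimilaritySketch {

    /** Hypothetical stand-in for Spark's MatrixEntry: cosine similarity of columns i and j. */
    public record Entry(int i, int j, double value) {}

    /** Returns the transpose of a dense matrix, so rows become columns. */
    public static double[][] transpose(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[][] t = new double[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                t[c][r] = m[r][c];
            }
        }
        return t;
    }

    /** Exact cosine similarity for every column pair i < j (upper triangle only). */
    public static List<Entry> columnSimilarities(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[] norms = new double[cols];
        for (int c = 0; c < cols; c++) {
            double sum = 0;
            for (int r = 0; r < rows; r++) sum += m[r][c] * m[r][c];
            norms[c] = Math.sqrt(sum);
        }
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < cols; i++) {
            for (int j = i + 1; j < cols; j++) {
                if (norms[i] == 0 || norms[j] == 0) continue; // cosine is undefined for all-zero columns
                double dot = 0;
                for (int r = 0; r < rows; r++) dot += m[r][i] * m[r][j];
                out.add(new Entry(i, j, dot / (norms[i] * norms[j])));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical term counts: one row per document (Doc1..Doc5), one column per term.
        double[][] docs = {
            {1, 1, 0, 0}, // Doc1
            {1, 1, 0, 0}, // Doc2 (same terms as Doc1)
            {0, 0, 1, 1}, // Doc3
            {0, 1, 1, 0}, // Doc4
            {1, 0, 0, 1}, // Doc5
        };
        // After transposing, each document is a column, so column similarities are document similarities.
        for (Entry e : columnSimilarities(transpose(docs))) {
            System.out.printf(Locale.ROOT, "Doc%d vs Doc%d: %.4f%n", e.i() + 1, e.j() + 1, e.value());
        }
    }
}
```

With these toy vectors, Doc1 and Doc2 come out at about 1.0 (identical terms), while pairs that share no terms, such as Doc1 and Doc3, score 0.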

Re: Cosine Similarity between documents - Rows

Yao
You are essentially doing document clustering. K-means will do it, but you do have to specify the number of clusters up front.
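The k-means suggestion can be sketched in plain Java for a small, in-memory set of document vectors. This is Lloyd's algorithm with two illustrative choices that are assumptions, not Spark's behavior: each vector is first scaled to unit length, so squared Euclidean distance equals 2 - 2 * cosine similarity and the clusters follow cosine similarity, and the first k documents serve as starting centroids so the result is deterministic. The class and method names (KMeansSketch, normalize, cluster) are hypothetical. On an RDD of vectors, spark.mllib provides KMeans.train(data, k, maxIterations) for the same job. As the reply says, k must be fixed up front, and the output is a cluster label per document rather than a similarity score for each pair.

```java
import java.util.Arrays;

public class KMeansSketch {

    /** Scales a vector to unit length, so squared Euclidean distance equals 2 - 2 * cosine. */
    public static double[] normalize(double[] v) {
        double norm = 0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = norm == 0 ? 0 : v[i] / norm;
        return out;
    }

    static double squaredDistance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return d;
    }

    /** Lloyd's k-means; returns the cluster index assigned to each point. */
    public static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k <= 0 || k > points.length) throw new IllegalArgumentException("need 1 <= k <= number of points");
        int dim = points[0].length;
        // Naive deterministic initialization: the first k points are the starting centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c]) < squaredDistance(points[p], centroids[best])) best = c;
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point switched clusters
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) sums[assignment[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep an empty cluster's previous centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical term counts for Doc1..Doc5, one row per document.
        double[][] docs = {
            {3, 1, 0, 0},
            {0, 0, 2, 3},
            {2, 2, 0, 0},
            {0, 1, 3, 2},
            {4, 0, 1, 0},
        };
        double[][] unit = new double[docs.length][];
        for (int i = 0; i < docs.length; i++) unit[i] = normalize(docs[i]);
        int[] labels = cluster(unit, 2, 20); // k = 2 chosen up front
        for (int i = 0; i < labels.length; i++) System.out.println("Doc" + (i + 1) + " -> cluster " + labels[i]);
    }
}
```

With the toy vectors in main, Doc1, Doc3 and Doc5 (dominated by the first term) land in one cluster and Doc2 and Doc4 in the other.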

(In reply to Donni Khan's message of Monday, November 27, 2017 at 7:27:33 AM.)
