Is it possible to customize Spark TF-IDF implementation


Soheil Pourbafrani
Hi, I want to know whether it is possible to customize the TF-IDF logic in Apache Spark.
In standard TF-IDF, TF is computed per word per document: for example, the TF of word "A" can differ between documents D1 and D2. Instead, I want TF to be the term's frequency across the whole corpus (like a word count). I implemented this with Spark RDDs, but I was wondering whether it can be brought into Spark's TF-IDF pipeline so I can combine it with other Spark ML tools such as the normalizer and hashing.
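To make the intended change concrete, here is a minimal pure-Python sketch (not Spark API code; the toy corpus `docs` is an assumption for illustration) of the modified weighting: TF is a single corpus-wide count per term, while IDF keeps the smoothed formula Spark MLlib uses, idf(t) = log((N + 1) / (df(t) + 1)).

```python
from collections import Counter
import math

# Toy corpus: each document is a list of tokens (illustrative only).
docs = [
    ["a", "b", "a"],
    ["a", "c"],
]

# Corpus-wide term frequency ("TF as word count over all documents"),
# instead of the usual per-document TF.
global_tf = Counter(t for doc in docs for t in doc)

# Document frequency and smoothed IDF, matching Spark MLlib's formula:
# idf(t) = log((numDocs + 1) / (docFreq + 1))
n_docs = len(docs)
df = Counter(t for doc in docs for t in set(doc))
idf = {t: math.log((n_docs + 1) / (df[t] + 1)) for t in df}

# Modified TF-IDF: every occurrence of a term gets the same weight,
# no matter which document it appears in.
tfidf = {t: global_tf[t] * idf[t] for t in global_tf}
```

Under this scheme a term's weight is a property of the corpus rather than of a document, so the resulting vectors can still be fed into downstream stages like a normalizer, but the per-document discrimination that standard TF provides is lost.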

Thanks.