Fwd: [pyspark][MLlib] Getting WARN FPGrowth: Input data is not cached for cached data

Anu B Nair
Hi,

Following is my PySpark code (the input file sample_fpgrowth.txt and the Python script are attached to this mail). Even though I call cache() on the input RDD, I still get the warning "WARN FPGrowth: Input data is not cached."


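from pyspark.context import SparkContext
from pyspark.mllib.fpm import FPGrowth

sc = SparkContext('local')

# Each line of the input file is one transaction of space-separated items.
data = sc.textFile("sample_fpgrowth.txt")
# split() with no argument also drops the empty tokens produced by repeated
# spaces, which would otherwise appear as duplicate '' items in a transaction.
transactions = data.map(lambda line: line.strip().split()).cache()

# Mine itemsets that occur in at least 20% of the transactions.
model = FPGrowth.train(transactions, minSupport=0.2, numPartitions=10)

result = model.freqItemsets().collect()

print(result)

sc.stop()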
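For comparison, here is a small plain-Python sketch of what minSupport=0.2 asks for. It is not FP-growth and not part of the attached rdd_example.py: the helper frequent_itemsets and the transactions are hypothetical. It brute-forces every itemset and keeps those appearing in at least ceil(minSupport * number of transactions) transactions, which, as far as I can tell from the Scala FPGrowth source, is how MLlib turns minSupport into a minimum count.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def frequent_itemsets(transactions, min_support):
    """Hypothetical brute-force frequent-itemset miner (NOT FP-growth).

    Counts every non-empty subset of every transaction and keeps the
    itemsets whose count reaches ceil(min_support * len(transactions)).
    Exponential in transaction length, so only usable on tiny inputs.
    """
    min_count = ceil(min_support * len(transactions))
    counts = Counter()
    for items in transactions:
        # Reject repeated items, as MLlib's FPGrowth does (to my knowledge).
        if len(set(items)) != len(items):
            raise ValueError(f"Items in a transaction must be unique but got {items}")
        unique = sorted(items)  # canonical order so equal itemsets share one key
        for size in range(1, len(unique) + 1):
            for subset in combinations(unique, size):
                counts[subset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical transactions, NOT the contents of the attached sample_fpgrowth.txt.
lines = ["a b c", "a b", "a c", "b c", "a", "d e", "a b", "c", "e", "b"]
transactions = [line.split() for line in lines]

# 10 transactions * 0.2 -> an itemset must appear in at least 2 of them.
result = frequent_itemsets(transactions, min_support=0.2)
for itemset, freq in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, freq)
```

If MLlib does derive its minimum count this way, FPGrowth.train on the same ten transactions with minSupport=0.2 should report the same seven itemsets and frequencies as FreqItemset records, possibly with the items inside each itemset in a different order.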



I understand that it is only a warning, but I would like to know in detail why it is raised for data that has already been cached.

--

Anu





Attachments: sample_fpgrowth.txt (94 bytes), rdd_example.py (564 bytes)