I have googled and found a similar question without a good answer: http://stackoverflow.com/questions/24520225/writing-to-hadoop-distributed-file-system-multiple-times-with-spark
In short, I would like to separate the raw data by some key, for example the creation date, and put the records in directories named by date, so that I can easily access a portion of the data later.
For now I have to extract all the keys, then filter by each key and save to a file repeatedly. Is there a better way to do this? Or maybe I shouldn't do such a thing at all?
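What I do now looks roughly like this (simplified sketch; extractDate stands in for whatever date parsing applies to our records, and sc is the usual SparkContext):

import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// key every raw line by its creation date; extractDate is a placeholder
val raw: RDD[String] = sc.textFile("hdfs:///data/raw")
val pairs: RDD[(String, String)] = raw.map(line => (extractDate(line), line))

// collect the distinct keys, then filter and save once per key --
// one full pass over the data for every key
val keys = pairs.keys.distinct().collect()
for (key <- keys) {
  pairs.filter { case (k, _) => k == key }
       .values
       .saveAsTextFile("hdfs:///data/by-date/" + key)
}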
1. Be careful: HDFS is better suited to large files than to bunches of small files.
2. If that's really what you want, roll your own (see the sketch below).
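For example (untested sketch, not a drop-in solution): with the old mapred API you can subclass MultipleTextOutputFormat so that each key becomes a subdirectory, and write everything in one pass instead of one job per key. This assumes pairs is an RDD[(String, String)] keyed by date as above; DateOutputFormat is just a name I picked:

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.SparkContext._

// route each (date, line) pair into a subdirectory named after its key
class DateOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  // e.g. files end up under hdfs:///data/by-date/2014-08-12/part-00000
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.toString + "/" + name
  // return NullWritable so only the value (the raw line) is written
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()
}

pairs.saveAsHadoopFile("hdfs:///data/by-date",
  classOf[String], classOf[String], classOf[DateOutputFormat])

Note this doesn't remove the small-files concern from point 1: each partition opens one file per key it contains, so partitioning by key first (e.g. with org.apache.spark.HashPartitioner) helps keep the file count down.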
2014-08-12 21:34 GMT+08:00 Fengyun RAO <[hidden email]>:
Understood, thank you.
Small files are a problem; I am considering processing the data before putting it into HDFS.
On Tue, Aug 12, 2014 at 9:37 PM, Fengyun RAO <[hidden email]> wrote: