
reducing by only count within a list

bluefrog
This post has NOT been accepted by the mailing list yet.
Hi
I am attempting to aggregate a large file. It looks something like this:
[['a', [1, 2]], ['b', [3, 0]], ['a', [3, 2]]]

I have to aggregate the file in several different ways. Below are the aggregations I have completed so far, but I cannot figure out the per-key one commented out at the end. Can anybody suggest a way?
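from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # reuse or start a local SparkContext
fc_rdd = sc.parallelize([['a', [1, 2]], ['b', [3, 0]], ['a', [3, 2]]])
print(fc_rdd.collect())

# Element-wise sum of every value list, across all keys -> [7, 4]
value_result = fc_rdd.map(lambda x: x[1]).reduce(lambda x, y: [x[0] + y[0], x[1] + y[1]])
# Distinct keys -> ['a', 'b'] (order may vary)
unique_key_result = fc_rdd.map(lambda x: x[0]).distinct()
print(unique_key_result.collect())
# aggregated_key_value_result = fc_rdd.map(lambda x, y: x[0][y[0]])  # unfinished attempt at per-key sums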
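For reference, the two completed aggregations can be reproduced on a plain Python list with `functools.reduce`. This is a local sketch (no Spark involved) showing that the element-wise sum runs over every record regardless of key, which is why it yields a single pair rather than one pair per key.

```python
from functools import reduce

records = [['a', [1, 2]], ['b', [3, 0]], ['a', [3, 2]]]

# Same combiner as the RDD reduce: element-wise sum of two [x, y] lists,
# applied to all values at once, so keys 'a' and 'b' get mixed together.
value_result = reduce(lambda x, y: [x[0] + y[0], x[1] + y[1]], [r[1] for r in records])

# Distinct keys; sorted here only to make the output order predictable.
unique_keys = sorted({r[0] for r in records})

print(value_result)  # [7, 4]
print(unique_keys)   # ['a', 'b']
```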

How can I end up with an RDD result like this, where the value lists are summed separately for each key?
[['a', [4, 4]], ['b', [3, 0]]]
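One common approach, sketched here as an assumption rather than taken from any reply in this thread, is to treat each record as a key/value pair and combine the values per key, which is what Spark's `reduceByKey` does on a pair RDD. The names `add_pairwise` and `reduce_by_key` are hypothetical. The block below simulates that per-key reduction in plain Python so it runs without a cluster.

```python
def add_pairwise(x, y):
    """Element-wise sum of two equal-length value lists (hypothetical helper)."""
    return [a + b for a, b in zip(x, y)]


def reduce_by_key(records, func):
    """Local stand-in for RDD.reduceByKey: fold each key's values with func."""
    acc = {}
    for key, value in records:
        # First value for a key is kept as-is; later ones are merged with func.
        acc[key] = func(acc[key], value) if key in acc else value
    # Return [key, value] lists in key order to match the desired output shape.
    return [[k, acc[k]] for k in sorted(acc)]


records = [['a', [1, 2]], ['b', [3, 0]], ['a', [3, 2]]]
result = reduce_by_key(records, add_pairwise)
print(result)  # [['a', [4, 4]], ['b', [3, 0]]]
```

On the RDD itself, something like `sorted(fc_rdd.map(lambda x: (x[0], x[1])).reduceByKey(add_pairwise).collect())` should give `[('a', [4, 4]), ('b', [3, 0])]`. Note that the pairs come back as tuples, and an unsorted `collect()` does not guarantee key order.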

Thanks
