groupBy RDD does not have grouping column?

Manoj Samel
Hi,

If I run groupBy('a)(Sum('b) as 'foo, Sum('c) as 'bar), I would expect the resulting RDD to have the columns 'a, 'foo, and 'bar.

However, the result RDD only shows 'foo and 'bar; 'a is missing.

Thoughts?

Thanks,

Manoj

Re: groupBy RDD does not have grouping column?

Michael Armbrust
This is similar to how SQL works: items in the GROUP BY clause are not included in the output by default. If you want 'a in the output, you need to include it in the second parameter list as well (that list plays the role of the SELECT clause), for example groupBy('a)('a, Sum('b) as 'foo, Sum('c) as 'bar).
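
To make the SQL analogy concrete, here is a minimal plain-Scala model of a grouped query whose second parameter list acts as the SELECT list. It is not Spark code: Rec, Column, groupByAgg, sumOf, and keyOf are hypothetical names made up for illustration. The only point it shows is that a column appears in the output if and only if it is listed in the select list, and that includes the grouping key. It is written as a Scala script (for example, runnable with scala-cli).

```scala
// Plain-Scala model of SQL GROUP BY semantics, NOT the Spark SQL API.
// Rec, Column, groupByAgg, sumOf and keyOf are hypothetical names for illustration.
case class Rec(a: String, b: Int, c: Int)

// An output column: a name plus an expression evaluated once per group.
type Column = (String, Seq[Rec] => Any)

// First parameter list: the grouping key. Second: the SELECT-like output list.
// Only the columns listed in `select` appear in the result.
def groupByAgg(rows: Seq[Rec])(key: Rec => Any)(select: Column*): Seq[Map[String, Any]] =
  rows.groupBy(key).values.toSeq.map { group =>
    select.map { case (name, expr) => name -> expr(group) }.toMap
  }

// Aggregate expression: sum one field over the rows of a group.
def sumOf(f: Rec => Int): Seq[Rec] => Any = group => group.map(f).sum

// Grouping-column expression: every row in a group shares the key, so take the first.
def keyOf(f: Rec => Any): Seq[Rec] => Any = group => f(group.head)

val rows = Seq(Rec("x", 1, 10), Rec("x", 2, 20), Rec("y", 5, 50))

// Analogous to groupBy('a)(Sum('b) as 'foo, Sum('c) as 'bar): no 'a in the output.
val withoutKey = groupByAgg(rows)(_.a)("foo" -> sumOf(_.b), "bar" -> sumOf(_.c))

// Analogous to groupBy('a)('a, Sum('b) as 'foo, Sum('c) as 'bar): 'a is listed explicitly.
val withKey =
  groupByAgg(rows)(_.a)("a" -> keyOf(_.a), "foo" -> sumOf(_.b), "bar" -> sumOf(_.c))

withKey.foreach(println)
```

Group order is not guaranteed because groupBy returns a Map, which is also true of SQL results without an ORDER BY.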


In reply to Manoj Samel, Sun, Mar 30, 2014 at 9:52 PM.

Re: groupBy RDD does not have grouping column?

Manoj Samel
Thanks, that works.

It wasn't clear to me whether the second parameter list accepts only aggregate expressions or arbitrary expressions.
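
On the question of aggregates versus arbitrary expressions: assuming the second parameter list follows the usual SQL rules, as the SELECT-clause analogy suggests, each item is evaluated once per group. It can therefore be a grouping column, an aggregate, or any expression built from those. A bare non-grouped column such as 'b has no single value per group, which is why standard SQL rejects it unless it is aggregated. The sketch below uses plain Scala collections, not Spark; Rec, the sample rows, and the name total are hypothetical.

```scala
// Plain-Scala sketch, NOT Spark code; Rec, the rows, and `total` are hypothetical.
case class Rec(a: String, b: Int, c: Int)

val rows = Seq(Rec("x", 1, 10), Rec("x", 2, 20), Rec("y", 5, 50))

// Roughly: SELECT upper(a), SUM(b) AS foo, SUM(b) + SUM(c) AS total FROM rows GROUP BY a
val result: Seq[(String, Int, Int)] =
  rows.groupBy(_.a).toSeq.map { case (a, group) =>
    val foo = group.map(_.b).sum // aggregate
    val bar = group.map(_.c).sum // aggregate
    (a.toUpperCase, foo, foo + bar) // expressions over the grouping key and aggregates
  }.sortBy(_._1)

result.foreach(println)
// (X,3,33)
// (Y,5,55)
```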


In reply to Michael Armbrust, Mon, Mar 31, 2014 at 9:03 AM.