providing a list parameter for sum function

providing a list parameter for sum function

bigel_p
Hi,
I'm using the Spark DataFrame API.
I'm trying to pass sum() a list parameter containing column names as
strings.
When I put the column names directly into the function call, the script works.
When I pass them to the function as a parameter of type list, I
get this error:
"
py4j.protocol.Py4JJavaError: An error occurred while calling o155.sum.
: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
"
Using the same kind of list parameter for groupBy() works.
This is my script:

groupBy_cols = ['date_expense_int', 'customer_id']
agged_cols_list = ['total_customer_exp_last_m','total_customer_exp_last_3m']

df = df.groupBy(groupBy_cols).sum(agged_cols_list)


When I write it like this, it works:
df = df.groupBy(groupBy_cols).sum('total_customer_exp_last_m', 'total_customer_exp_last_3m')

I also tried giving sum() a list of Column objects, built like this:

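from pyspark.sql.functions import col

# Build a list of Column objects from the column-name strings
agged_cols_list2 = []
for i in agged_cols_list:
    agged_cols_list2.append(col(i))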

That also didn't work.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: providing a list parameter for sum function

nick.gustafson
sum() takes the column names as separate string arguments, not as a single list, so you’ll need to “unpack” the list using an asterisk in Python, like so:

df = df.groupBy(groupBy_cols).sum(*agged_cols_list)
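
To see what the asterisk does without needing a Spark session, here is a rough sketch in plain Python. Note that sum_cols is a made-up stand-in with a varargs signature, not Spark's actual code; it only mimics a function that expects each column name as its own string argument.

```python
# Hypothetical stand-in for a varargs method like sum(*cols).
# This is NOT Spark's implementation, just an illustration.
def sum_cols(*cols):
    # Each positional argument is expected to be a column name (a string).
    for c in cols:
        if not isinstance(c, str):
            raise TypeError(f"expected a column name, got {type(c).__name__}")
    return [f"sum({c})" for c in cols]

agged_cols_list = ['total_customer_exp_last_m', 'total_customer_exp_last_3m']

# Passing the list itself: the function receives ONE argument, which is a list.
try:
    sum_cols(agged_cols_list)
except TypeError as err:
    print(err)

# Unpacking with *: the function receives two separate string arguments.
unpacked = sum_cols(*agged_cols_list)
print(unpacked)
```

Without the asterisk the whole list arrives as a single argument, which is roughly why the JVM side ends up trying to cast an ArrayList to a String.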


