Casting nested columns and updated nested struct fields.

Colin Williams-2
Hello,

I'm trying to update the schema of a DataFrame with nested columns. I
would like either to update the schema itself or to cast a single
column without having to explicitly select every column just to cast
one.

To update the schema, it looks like I would need to write a fairly
complex map over the schema that finds the StructFields I want to
change and updates them. I haven't found any examples of this, and it
seems like there should be a simpler way to do it.
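
The "map over the schema" described above might look roughly like the following. This is only a sketch under assumptions: the object and method names (`SchemaUpdate`, `updateType`) are hypothetical, the path is assumed to pass through plain structs only (no arrays or maps), and field names are assumed not to contain dots.

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical helper, not part of Spark.
object SchemaUpdate {
  // Returns a copy of `schema` in which the field reached by `path`
  // has its data type replaced by `to`. All other fields are unchanged.
  def updateType(schema: StructType, path: List[String], to: DataType): StructType =
    StructType(schema.fields.map { f =>
      (path, f.dataType) match {
        // Last path segment: swap the data type of this field.
        case (head :: Nil, _) if head == f.name =>
          f.copy(dataType = to)
        // Intermediate segment: recurse into the nested struct.
        case (head :: tail, st: StructType) if head == f.name =>
          f.copy(dataType = updateType(st, tail, to))
        // Anything else is left as it was.
        case _ => f
      }
    })
}
```

For the example in this message, the call would be something like `SchemaUpdate.updateType(df.schema, "existing.top.level.FIELD_NAME".split('.').toList, LongType)`. Note that this only produces a new StructType; it does not convert data already held in a DataFrame. It may be useful when the source is read again with an explicit schema, e.g. via `spark.read.schema(...)`.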

To change the column on the DataFrame itself, I tried, e.g.:

val newDF = df.withColumn("existing.top.level.FIELD_NAME", df.col("existing.top.level.FIELD_NAME").cast(LongType))

I end up with a new column named "existing.top.level.FIELD_NAME" at
the root level instead of the nested column being updated to the new
type. Has anybody worked out how to update a nested column's data
type, and how to update the column type in the nested schema
StructType? Is there an easy way to do this, or is there a reason it
is not trivial?
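
One possible workaround on Spark 2.x is to rebuild the enclosing structs with `struct(...)` and cast only the target field, driven by the DataFrame's own schema so the other columns do not have to be listed by hand. The sketch below makes several assumptions: `NestedCast` and `castNested` are hypothetical names, the path passes through structs only (no arrays or maps), field names contain no dots, and a struct that was null in the input comes back as a non-null struct with null fields.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical helper, not part of Spark.
object NestedCast {
  // Re-selects every field of `schema`, casting only the one reached by `path`.
  private def rebuild(schema: StructType, parent: Option[Column],
                      path: List[String], to: DataType): Seq[Column] =
    schema.fields.toSeq.map { f =>
      // Column expression for this field, relative to its parent struct.
      val c = parent.map(_.getField(f.name)).getOrElse(col(f.name))
      (path, f.dataType) match {
        // Target field: cast it and keep its original name.
        case (head :: Nil, _) if head == f.name =>
          c.cast(to).as(f.name)
        // Struct on the way to the target: rebuild it field by field.
        case (head :: tail, st: StructType) if head == f.name =>
          struct(rebuild(st, Some(c), tail, to): _*).as(f.name)
        // Unrelated field: pass through unchanged.
        case _ => c.as(f.name)
      }
    }

  def castNested(df: DataFrame, path: String, to: DataType): DataFrame =
    df.select(rebuild(df.schema, None, path.split('.').toList, to): _*)
}
```

With this, the attempt above would become something like `NestedCast.castNested(df, "existing.top.level.FIELD_NAME", LongType)`, which keeps the field nested instead of adding a root-level column.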

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Re: Casting nested columns and updated nested struct fields.

Colin Williams-2
This seems worth filing as a bug against withColumn.

On Wed, Nov 21, 2018, 6:25 PM Colin Williams <[hidden email]> wrote:
> [...] I end up with a new column named "existing.top.level.FIELD_NAME" at
> the root level vs updating the nested column to the new type. [...]

Re: Casting nested columns and updated nested struct fields.

Colin Williams-2
It looks like this has already been reported. It's too bad that it
has been a year, but a fix should be released in Spark 3:
https://issues.apache.org/jira/browse/SPARK-22231
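
For readers on Spark 3.1 or later, `Column.withField` can replace a field inside a struct directly. The sketch below assumes that version and reuses the column path from the original question; `WithFieldCast` and `castFieldName` are hypothetical names, and it is not confirmed here that this method is exactly what the ticket above delivered.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

// Hypothetical wrapper; assumes Spark 3.1+, where Column.withField exists.
object WithFieldCast {
  def castFieldName(df: DataFrame): DataFrame =
    df.withColumn(
      "existing",
      // Replace the nested field inside the top-level struct "existing";
      // the dotted name is resolved relative to that struct.
      col("existing").withField(
        "top.level.FIELD_NAME",
        col("existing.top.level.FIELD_NAME").cast(LongType)))
}
```

Because `withColumn` is given the existing top-level name `existing`, the struct is replaced in place rather than a new root-level column being added.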

On Fri, Nov 23, 2018 at 8:42 AM Colin Williams <[hidden email]> wrote:
> Seems like it's worthy of filing a bug against withColumn
