Dataset - withColumn and withColumnRenamed that accept Column type


Dataset - withColumn and withColumnRenamed that accept Column type

Nirav Patel

Is there a version of withColumn or withColumnRenamed that accepts a Column instead of a String? That way I can specify the fully qualified name (FQN) when there are duplicate column names.

I can drop a column based on a Column-type argument, so why can't I rename one based on the same type of argument?

The use case is: I have a DataFrame with duplicate columns at the end of a join. Most of the time I drop the duplicate, but now I need to rename one of those columns. I cannot do it because there is no API that accepts a Column. I could rename it before the join, but that is not preferred.
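For illustration, here is a minimal sketch of the workaround via dataset aliases and select, using hypothetical DataFrames, column names, and data (none of these are from the original post):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("rename-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical inputs that share an "id" column.
val orders    = Seq((1, "book"), (2, "pen")).toDF("id", "item")
val customers = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

// After the join both sides contribute "id", so
// joined.withColumnRenamed("id", "order_id") would be ambiguous.
val joined = orders.as("o").join(customers.as("c"), col("o.id") === col("c.id"))

// Selecting by fully qualified name picks one side and aliases it.
val renamed = joined.select(col("o.id").as("order_id"), col("item"), col("name"))
```

This is a workaround rather than the requested API, but it shows the ambiguity that a Column-accepting rename would resolve directly.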


def withColumn(colName: String, col: Column): DataFrame
Returns a new Dataset by adding a column or replacing the existing column that has the same name.

def withColumnRenamed(existingName: String, newName: String): DataFrame
Returns a new Dataset with a column renamed.



I think there should also be this one:

def withColumnRenamed(existingName: Column, newName: Column): DataFrame
Returns a new Dataset with a column renamed.
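In the meantime, something close to this overload could be sketched as an extension method. This is hypothetical and not part of Spark's API; it takes the new name as a String (a bare Column does not carry a name) and assumes a Spark version where Dataset.drop accepts a Column:

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical helper: rename a possibly ambiguous column by copying it
// under the new name and then dropping the original by its Column reference.
implicit class RenameByColumn(df: DataFrame) {
  def withColumnRenamed(existing: Column, newName: String): DataFrame =
    df.withColumn(newName, existing).drop(existing)
}
```

Note the renamed column moves to the end of the schema, so this is only a rough sketch, not an exact equivalent of a real rename.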






Spark streaming connecting to two kafka clusters

Sathi Chowdhury
Hi,
My question is about the ability to integrate Spark Streaming with multiple Kafka clusters. Is this a supported use case? For example, two topics are owned by different groups, each with its own Kafka infrastructure.
Can I have two DataFrames as a result of spark.readStream listening to different Kafka clusters in the same Spark Streaming job?
Has anyone solved this use case before?


Thanks.
Sathi

Re: Dataset - withColumn and withColumnRenamed that accept Column type

sathich
In reply to this post by Nirav Patel
This may work:

val df_post = listCustomCols
  .foldLeft(df_pre) { (tempDF, listValue) =>
    tempDF.withColumn(
      listValue.name,
      new Column(listValue.name.toString + funcUDF(listValue.name))
    )
  }

and outsource the renaming to a UDF.

Or you can rename the column of one of the datasets before the join itself.
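The rename-before-join alternative can be sketched like this, with hypothetical DataFrames and column names (assuming a local SparkSession):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("rename-before-join").master("local[*]").getOrCreate()
import spark.implicits._

val orders    = Seq((1, "book"), (2, "pen")).toDF("id", "item")
val customers = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

// Renaming one side first means the joined result has no duplicate "id",
// so no ambiguity ever arises.
val customersRenamed = customers.withColumnRenamed("id", "customer_id")
val joined = orders.join(customersRenamed, orders("id") === customersRenamed("customer_id"))
```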






Re: Dataset - withColumn and withColumnRenamed that accept Column type

Tathagata Das
In reply to this post by Sathi Chowdhury
Yes. Yes you can.
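A minimal sketch (cluster addresses and topic names are hypothetical): each readStream carries its own kafka.bootstrap.servers option, so a single job can subscribe to two independent clusters:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("two-kafka-clusters").getOrCreate()

// Each stream points at its own cluster via kafka.bootstrap.servers.
val streamA = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "clusterA-host:9092")
  .option("subscribe", "topicA")
  .load()

val streamB = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "clusterB-host:9092")
  .option("subscribe", "topicB")
  .load()
```

The two resulting DataFrames can then be processed or joined within the same query as usual.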

On Tue, Jul 17, 2018 at 11:42 AM, Sathi Chowdhury <[hidden email]> wrote:
Hi,
My question is about the ability to integrate Spark Streaming with multiple Kafka clusters. Is this a supported use case? For example, two topics are owned by different groups, each with its own Kafka infrastructure.
Can I have two DataFrames as a result of spark.readStream listening to different Kafka clusters in the same Spark Streaming job?
Has anyone solved this use case before?


Thanks.
Sathi