Difference between Typed and untyped transformation in dataset API


Difference between Typed and untyped transformation in dataset API

Akhilanand
What is the key difference between typed and untyped transformations in the Dataset API?
How do I determine whether a transformation is typed or untyped?
Are there any gotchas about when to use which, apart from the fact that one of them does the job for me?
 


RE: Difference between Typed and untyped transformation in dataset API

Yeikel

From what I understand, an untyped transformation returns a DataFrame, whereas a typed one returns a Dataset. In the source code you will see that the return type of an untyped method is DataFrame instead of Dataset, and such methods should also be annotated with @group untypedrel. So you can check the signature of a method to determine whether it is untyped.
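The signature check described above can be sketched in Scala as follows. This is only an illustration under a few assumptions: the Person case class, the object name, and the local SparkSession settings are hypothetical, and filter and select stand in for a typed and an untyped method respectively.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object TypedVsUntyped {
  // Typed: filter takes a function over Person and the result is
  // still a Dataset[Person].
  def adults(people: Dataset[Person]): Dataset[Person] =
    people.filter(_.age >= 18)

  // Untyped: select with Column arguments returns a DataFrame,
  // which is an alias for Dataset[Row].
  def names(people: Dataset[Person]): DataFrame =
    people.select(people("name"))

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can be run.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-vs-untyped")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ann", 31), Person("Bob", 17)).toDS()
    adults(people).show()
    names(people).printSchema()
    spark.stop()
  }
}
```

The declared return types are the point here: if the compiler accepts DataFrame as the result type, the method is on the untyped side.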

 

In general, anything that changes the type of a column or adds a new column to a Dataset will be untyped. The idea of a Dataset is that its schema stays constant. The moment you try to modify the schema, you need to fall back to a DataFrame.

 

For example, withColumn is untyped because it transforms the typed Dataset into an untyped structure (a DataFrame).
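A minimal sketch of the withColumn case, assuming hypothetical User and UserWithFlag case classes: adding the column yields a DataFrame, and one way to get a typed Dataset back is to call as[T] with a case class that matches the new schema. The explicit Encoders.product call is just one way to supply the encoder; importing spark.implicits._ is the more common route.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record types, used only for this illustration.
case class User(name: String, age: Int)
case class UserWithFlag(name: String, age: Int, isAdult: Boolean)

object WithColumnRoundTrip {
  // withColumn changes the schema, so the result is a DataFrame,
  // not a Dataset[User].
  def flag(users: Dataset[User]): DataFrame =
    users.withColumn("isAdult", col("age") >= 18)

  // Re-attach a type by naming a case class that matches the new schema.
  def retype(flagged: DataFrame): Dataset[UserWithFlag] =
    flagged.as[UserWithFlag](Encoders.product[UserWithFlag])

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can be run.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("with-column-round-trip")
      .getOrCreate()
    import spark.implicits._

    val users = Seq(User("Ann", 31), User("Bob", 17)).toDS()
    retype(flag(users)).show()
    spark.stop()
  }
}
```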

 

From: Akhilanand <[hidden email]>
Sent: Thursday, February 21, 2019 7:35 PM
To: user <[hidden email]>
Subject: Difference between Typed and untyped transformation in dataset API

 
