array_sort function behaviour

array_sort function behaviour

neerajbhadani
Hi All,
   I need to sort an array<struct> column by a particular element of the struct. I am trying to use the "array_sort" function and can see that by default it sorts the array, but based on the first numerical element. Is this the expected behaviour? Please find below the sample code and output.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
//// SAMPLE CODE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
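import org.apache.spark.sql.functions.array_sort
import spark.implicits._  // needed for toDS and the $"..." column syntax

val jsonData = """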
{
"topping":
[
{ "id": "5001", "id1": "5001", "type": "None" },
{ "id": "5002", "id1": "5008", "type": "Glazed" },
{ "id": "5005", "id1": "5007", "type": "Sugar" },
{ "id": "5007", "id1": "5002", "type": "Powdered Sugar" },
{ "id": "5006", "id1": "5005", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "id1": "5004", "type": "Chocolate" },
{ "id": "5004", "id1": "5003", "type": "Maple" }
]
}
"""
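// Parse the JSON string into a DataFrame with a single array<struct> column, "topping"
val json_df = spark.read.json(Seq(jsonData).toDS)
// array_sort sorts the array elements (here, structs) in ascending order
val sort_df = json_df.select(array_sort($"topping").as("sort_col"))
// display() is a notebook helper; outside a notebook, sort_df.show(false) prints the result
display(sort_df)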
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
//// OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[Attached image: Screenshot 2020-05-19 12.06.30.png]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, the output above is sorted on the "id" element, which is the first numerical element in the struct.

Is there any way to specify which element the sort should be based on?

Regards,
Neeraj
Re: array_sort function behaviour

Liu Genie
I would extract the element I want to sort by, then combine it with the old struct into a new struct whose first element is that sort key.
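
A minimal sketch of that idea, assuming Spark 2.4 or later (where the SQL higher-order function transform is available through expr) and a spark-shell or notebook session. The names sort_key, value, wrapped and id1Order are made up for this illustration, and sorting on "id1" is just an example choice. It relies on array_sort comparing structs field by field in order, so the first field of the wrapper struct decides the ordering.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Reuses the active session in spark-shell or a notebook; creates a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("array-sort-by-field").getOrCreate()
import spark.implicits._

val jsonData = """
{
"topping":
[
{ "id": "5001", "id1": "5001", "type": "None" },
{ "id": "5002", "id1": "5008", "type": "Glazed" },
{ "id": "5005", "id1": "5007", "type": "Sugar" },
{ "id": "5007", "id1": "5002", "type": "Powdered Sugar" },
{ "id": "5006", "id1": "5005", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "id1": "5004", "type": "Chocolate" },
{ "id": "5004", "id1": "5003", "type": "Maple" }
]
}
"""
val jsonDf = spark.read.json(Seq(jsonData).toDS)

// Step 1: wrap every struct in a new struct whose FIRST field is the sort key (id1 here).
val wrapped = "transform(topping, t -> named_struct('sort_key', t.id1, 'value', t))"

// Step 2: array_sort orders the wrappers by sort_key first.
// Step 3: unwrap, keeping only the original struct.
val sortedDf = jsonDf.select(
  expr(s"transform(array_sort($wrapped), w -> w.value)").as("sort_col")
)

sortedDf.show(false)

// Pull out the id1 values in their sorted order to check the result.
val id1Order = sortedDf
  .selectExpr("transform(sort_col, t -> t.id1) AS ids")
  .collect()
  .head
  .getSeq[String](0)
println(id1Order.mkString(", "))
```

Note that every field in this sample is a JSON string, so the comparison is a string comparison; it matches numeric order here only because all the keys have the same number of digits. Casting the key inside named_struct (for example CAST(t.id1 AS INT)) would be one way to get a true numeric sort.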

From: neeraj bhadani <[hidden email]>
Sent: May 19, 2020 19:09
To: user <[hidden email]>
Subject: array_sort function behaviour
 