How to use schema from one of the columns of a dataset to parse another column and create a flattened dataset using Spark Streaming 2.2.0?

kant kodali
Hi All,

How can I use the value (a schema) of one column of a dataset to parse another column and create a flattened dataset using Spark Streaming 2.2.0?

I have the following source data frame, which I create by reading messages from Kafka:

col1: string
col2: json string


col1       | col2
---------------------------------------------------------------------------------------
schemaUri1 | "{"name": "foo", "zipcode": 11111}"
schemaUri2 | "{"name": "bar", "zipcode": 11112, "id": 1234}"
schemaUri1 | "{"name": "foobar", "zipcode": 11113}"
schemaUri2 | "{"name": "barfoo", "zipcode": 11114, "id": 1235, "interest": "reading"}"

My target data frame:

name   | zipcode | id   | interest
-----------------------------------
foo    | 11111   | null | null
bar    | 11112   | 1234 | null
foobar | 11113   | null | null
barfoo | 11114   | 1235 | reading

Assume you have the following function:

// Returns a StructType that represents the schema for a given schemaUri.
public StructType getSchema(String schemaUri);
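
To make the intended result concrete, here is a minimal plain-Java sketch of the flattening logic, independent of Spark. It is an illustration under stated assumptions, not a Spark solution. The class FlattenSketch, the SCHEMAS map (a stand-in for getSchema that lists only field names), and the field lists themselves are hypothetical, read off the sample rows above. The JSON in col2 is assumed to be parsed into a Map already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, Spark-free model of the flattening step.
class FlattenSketch {

    // Stand-in for getSchema(schemaUri): each URI maps to its field names.
    // These field lists are assumptions inferred from the sample rows.
    static final Map<String, List<String>> SCHEMAS = new LinkedHashMap<>();
    static {
        SCHEMAS.put("schemaUri1", Arrays.asList("name", "zipcode"));
        SCHEMAS.put("schemaUri2", Arrays.asList("name", "zipcode", "id", "interest"));
    }

    // Union of all field names in first-seen order: the target columns.
    static List<String> unionColumns(Map<String, List<String>> schemas) {
        Set<String> columns = new LinkedHashSet<>();
        for (List<String> fields : schemas.values()) {
            columns.addAll(fields);
        }
        return new ArrayList<>(columns);
    }

    // Projects one already-parsed record onto the target columns.
    // Columns not declared by the row's schema, or absent from the record, become null.
    static List<Object> flatten(String schemaUri, Map<String, Object> record, List<String> columns) {
        List<String> declared = SCHEMAS.get(schemaUri);
        if (declared == null) {
            throw new IllegalArgumentException("Unknown schema URI: " + schemaUri);
        }
        List<Object> row = new ArrayList<>();
        for (String column : columns) {
            row.add(declared.contains(column) ? record.get(column) : null);
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> columns = unionColumns(SCHEMAS);

        Map<String, Object> bar = new LinkedHashMap<>();
        bar.put("name", "bar");
        bar.put("zipcode", 11112);
        bar.put("id", 1234);

        System.out.println(columns);                             // [name, zipcode, id, interest]
        System.out.println(flatten("schemaUri2", bar, columns)); // [bar, 11112, 1234, null]
    }
}
```

In Spark itself, the analogous step might be to merge the StructTypes returned by getSchema into one superset schema with nullable fields and pass that to from_json on col2. That is an assumption about one possible approach, not something confirmed for this setup.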