Single spark streaming job to read incoming events with dynamic schema

act_coder
I am trying to create a Spark Structured Streaming job that reads from a Kafka topic whose events can have different schemas (there is no standard schema for the incoming events).

Sample incoming events:

event1: {"timestamp": "2018-09-28T15:50:57.2420418+00:00", "value": 11}
event2: {"timestamp": "2018-09-28T15:50:57.2420418+00:00", "value": 11, "location": "abc"}
event3: {"order_id": 1, "ordervalue": 11}

How can I create a Spark Structured Streaming job that reads the above events without having to stop the job whenever the schema changes?
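
One direction I am considering (a rough sketch only, assuming Spark 2.4+ for foreachBatch; the broker address, topic name, and output path are placeholders, not my real configuration) is to not declare a schema at read time at all: keep the Kafka value as a raw JSON string and infer a schema per micro-batch inside foreachBatch:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder()
      .appName("dynamic-schema-stream")
      .getOrCreate()

    // Keep the Kafka value as an untyped JSON string so no fixed schema
    // has to be declared when the stream is defined.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
      .option("subscribe", "events")                     // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING) AS json")

    val query = raw.writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        import batch.sparkSession.implicits._
        // Infer a schema for this micro-batch only; a later batch that
        // carries new fields gets its own inferred schema without
        // restarting the query.
        val parsed = batch.sparkSession.read.json(batch.as[String])
        parsed.write.mode("append").json(s"/tmp/events/batch=$batchId") // placeholder sink
      }
      .start()

    query.awaitTermination()

The downside I can see is that the inferred schema may differ from one batch to the next, which a downstream sink with a fixed table definition may not tolerate.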

Also, we may need to provide a schema when using spark.readStream(). I thought of reading a small subset of the incoming data and deriving the schema from it, but that might not work here because the incoming data is disparate and each event may have a different schema.
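
For context, this is roughly what I meant by deriving the schema from a small subset (again only a sketch; the broker address, topic name, and sample size are placeholders): run a bounded batch read over the topic and let Spark infer a schema from it. The catch is that spark.read.json infers a single merged schema, with nulls wherever a record lacks a field, so unrelated shapes like the order event and the timestamped readings above end up mixed into one sparse schema:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("schema-from-sample")
      .getOrCreate()
    import spark.implicits._

    // Batch (not streaming) read of a bounded slice of the same topic.
    val sample = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()
      .selectExpr("CAST(value AS STRING)")
      .as[String]
      .limit(1000)

    // Spark infers one merged schema across all sampled records: the union
    // of every field seen, nullable wherever a record does not carry it.
    val inferred = spark.read.json(sample)
    inferred.printSchema()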


