re: spark streaming / AnalysisException on collect()
I have a quick question about how to share data (a small data
collection) between a Kafka producer and consumer using Spark streaming.
(A) The data published by a Kafka producer is received in order on the Kafka consumer side (see (a) copied below).
(B) However, collect() or cache() on a streaming DataFrame does not seem to be supported (see the links in (b) below); I got this:
Exception in thread "DataProducer" org.apache.spark.sql.AnalysisException:
Queries with streaming sources must be executed with writeStream.start()
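For reference, this is roughly the shape of the call that fails for me (a minimal sketch with placeholder server/topic names, not the actual code, which is copied in (c)):

import org.apache.spark.sql.SparkSession

object CollectOnStreamRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("collect-on-stream-repro")
      .master("local[*]")
      .getOrCreate()

    // Streaming source: every DataFrame derived from readStream is a streaming DataFrame.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")
      .load()

    // This is the kind of call that throws the AnalysisException quoted above;
    // cache() on df fails for me in the same way.
    df.collect()
  }
}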
My questions would be:
How can I use the collection data (in a streaming DataFrame) that arrives on
the consumer side, e.g. convert it to an array of objects? (A rough sketch of what I have in mind is below.)
Or is there perhaps another quick way to use Kafka for sharing static data
(instead of streaming) between two Spark application services (without
any common SparkContext, SparkSession, etc.)? (A sketch of a plain batch read is also below.)
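For the first question, the direction I have in mind (but am not sure about) is foreachBatch, which hands over each micro-batch as an ordinary, non-streaming DataFrame, so collect() would be allowed inside it. A minimal sketch with placeholder names:

import org.apache.spark.sql.{DataFrame, SparkSession}

object ConsumerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-consumer-foreachbatch")
      .master("local[*]")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")
      .load()

    // Each micro-batch arrives here as an ordinary DataFrame, so collect() works.
    val query = df.selectExpr("CAST(value AS STRING) AS value")
      .writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        val values: Array[String] = batch.collect().map(_.getString(0))
        // ... turn 'values' into the array of objects the application needs ...
        println(s"batch $batchId: ${values.length} records")
      }
      .start()

    query.awaitTermination()
  }
}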
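For the second question, what I am wondering is whether a plain batch read from Kafka (spark.read instead of spark.readStream) would be an acceptable way to share a small static collection between the two services. Again just a sketch with placeholder config:

import org.apache.spark.sql.SparkSession

object BatchReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-batch-read")
      .master("local[*]")
      .getOrCreate()

    // Batch read of whatever is currently in the topic; no streaming query involved.
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "my-topic")
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()

    // collect() is fine here because this is an ordinary (non-streaming) DataFrame.
    val values: Array[String] = df.selectExpr("CAST(value AS STRING) AS value")
      .collect()
      .map(_.getString(0))

    println(s"read ${values.length} records from the topic")
    spark.stop()
  }
}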
I have copied a code snippet in (c).
Sharing a global collection between a Spark producer and a consumer seems
like a very simple use case.
But I have spent an entire day trying various options and going through
online resources such as the links in (b), without success.
Any help would be very much appreciated!
(a) streaming data (df) received on the consumer side (console sink):