I wonder if someone could help me find the cause of a rather vague exception that we are getting. I am attaching the STDOUT and STDERR files from our spark-submit run. The exception message is as follows:
“org.apache.spark.util.TaskCompletionListenerException: org.codehaus.jackson.JsonGenerationException: Incomplete surrogate pair: first char 0xdf46, second 0x5b”
This usually happens, and according to the stack trace it comes from the following code (excerpt):
    collect on component")
    val distinctComps = ss.sql("SELECT CAST(componentID AS VARCHAR) componentID FROM components_DF GROUP BY componentID")
      // .repartition(repartition_size)
      .collect()
What makes it interesting is that re-running spark-submit on the same dataset will then complete successfully.
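
For reference, the message mentions 0xdf46, which falls in the UTF-16 low-surrogate range (0xDC00 to 0xDFFF), followed by 0x5b, which is '['. Below is a minimal sketch of a check for strings containing such an unpaired surrogate. The object name SurrogateCheck and the helper hasUnpairedSurrogate are hypothetical and not part of our job; whether the bad character really comes from the componentID values is only an assumption.

```scala
object SurrogateCheck {
  // Hypothetical helper: returns true if the string contains a UTF-16
  // surrogate code unit that is not part of a valid high+low pair.
  def hasUnpairedSurrogate(s: String): Boolean = {
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (Character.isHighSurrogate(c)) {
        // A high surrogate must be immediately followed by a low surrogate.
        if (i + 1 >= s.length || !Character.isLowSurrogate(s.charAt(i + 1))) return true
        i += 2
      } else if (Character.isLowSurrogate(c)) {
        // A low surrogate with no preceding high surrogate is unpaired.
        return true
      } else {
        i += 1
      }
    }
    false
  }

  def main(args: Array[String]): Unit = {
    // Same code units as in the exception message: 0xdf46 followed by '['.
    val suspicious = 0xdf46.toChar.toString + "["
    println(hasUnpairedSurrogate(suspicious))
    println(hasUnpairedSurrogate("plain ascii"))
  }
}
```

If this turns out to be relevant, the same predicate could presumably be applied to the collected values (or wrapped in a UDF) to see whether any row carries such a character.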