I wonder if someone could help me find the cause of a rather vague exception we are getting. I am attaching the STDOUT and STDERR files from our spark-submit run. The exception message is shown in the excerpt below:

“org.apache.spark.util.TaskCompletionListenerException: org.codehaus.jackson.JsonGenerationException: Incomplete surrogate pair: first char 0xdf46, second 0x5b”
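For reference, the two values in the message are UTF-16 code units. A valid surrogate pair is a high (leading) surrogate in U+D800–U+DBFF followed by a low (trailing) surrogate in U+DC00–U+DFFF. The small sketch below uses only `java.lang.Character` to classify the two reported values (the object name `SurrogateCheck` is just illustrative). U+DF46 turns out to be a low surrogate and U+005B is an ordinary '[' character, so the text Jackson was writing appears to contain a surrogate that is not part of a valid pair.

```scala
// Classify the UTF-16 code units reported in the Jackson error message.
object SurrogateCheck {
  def describe(c: Char): String =
    if (Character.isHighSurrogate(c)) f"U+${c.toInt}%04X high (leading) surrogate"
    else if (Character.isLowSurrogate(c)) f"U+${c.toInt}%04X low (trailing) surrogate"
    else f"U+${c.toInt}%04X ordinary char '$c'"

  def main(args: Array[String]): Unit = {
    println(describe(0xdf46.toChar)) // U+DF46 low (trailing) surrogate
    println(describe(0x5b.toChar))   // U+005B ordinary char '['
  }
}
```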

According to the stack trace, the exception usually originates in the following code (excerpt):

GraphToTableLogger.warn("running collect on component")
val distinctComps = ss.sql("SELECT CAST(componentID AS VARCHAR) componentID FROM components_DF GROUP BY componentID")
// .repartition(repartition_size)
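If the offending string comes from the input data (an assumption; the stack trace alone does not confirm it), one way to check is to scan values for surrogates that lack a valid partner. Below is a minimal plain-Scala sketch; `LoneSurrogates.find` is a hypothetical helper, not part of Spark or Jackson. It returns the indices of unpaired surrogate code units in a string.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: locate surrogate code units that are not part of a valid pair.
object LoneSurrogates {
  def find(s: String): Seq[Int] = {
    val bad = ArrayBuffer.empty[Int]
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (Character.isHighSurrogate(c)) {
        // A high surrogate is valid only if a low surrogate immediately follows it.
        if (i + 1 < s.length && Character.isLowSurrogate(s.charAt(i + 1))) i += 1 // skip the partner
        else bad += i
      } else if (Character.isLowSurrogate(c)) {
        // A low surrogate reached here has no preceding high surrogate.
        bad += i
      }
      i += 1
    }
    bad.toList
  }
}
```

For example, `LoneSurrogates.find(Array(0xdf46.toChar, '[').mkString)` returns `List(0)`, matching the pair from the error message, while a correctly paired emoji yields an empty list. Wiring it into a Spark UDF over `componentID` is left out here and would need testing against the actual job.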

What makes it interesting is that when we re-run spark-submit on the same dataset, the job completes successfully.
