Spark Streaming logging on Yarn : issue with rolling in yarn-client mode for driver log

chandan prakash
Hi All,
I am running my Spark Streaming application in yarn-client mode.
I want to enable log rolling and aggregation in the NodeManager container directory.
As suggested in the Spark docs, I am pointing my log4j.properties file appender to ${spark.yarn.app.container.log.dir}/spark.log.
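For reference, this is roughly what my log4j.properties looks like (the file size, backup count and pattern below are illustrative, not my exact values):

log4j.rootCategory=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.rolling.MaxFileSize=50MB
log4j.appender.rolling.MaxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n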

Also, for aggregation on YARN, I have enabled these properties:
spark.yarn.rolledLog.includePattern=spark*
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds=3600
on the Spark and YARN sides respectively.
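To be precise, this is roughly where each of those properties is set (the includePattern goes through Spark conf, the roll interval on the NodeManagers):

# spark-defaults.conf (or passed via --conf to spark-submit)
spark.yarn.rolledLog.includePattern  spark*

<!-- yarn-site.xml on the NodeManagers -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>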

On the executors, my logs are getting rolled and aggregated every hour as expected.
But the issue is:
For the driver, in yarn-client mode, the ${spark.yarn.app.container.log.dir} value is not available when the driver starts, so I am not able to see the driver logs in the YARN app container directory.
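For reference, this is roughly how I submit the job (class name and jar are illustrative). The same log4j.properties is shipped to the executors with --files, but the driver runs on the client machine, where ${spark.yarn.app.container.log.dir} is never set:

spark-submit \
  --master yarn \
  --deploy-mode client \
  --files log4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar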
My restrictions are:
1. I want to use yarn-client mode only.
2. I want the logs written inside the YARN container directory only, so that they are aggregated and backed up by YARN every hour to HDFS/S3.

How can I work around this to enable rolling and aggregation for the driver logs as well?

Any pointers will be helpful.
Thanks in advance.

--
Chandan Prakash