Spark History Server log files questions

Hung Vu
Hi,

I have a couple of questions about the Spark History Server:

1. Is there a way for a cluster to clean up old files selectively? For example, if we want to keep some logs from 3 days ago but delete some logs from 2 days ago, is there a filter or config option for that?
2. We have over 1000 log files each day. If we want to keep those jobs for a week (7000 jobs in total), the load time would likely get longer. Do you have any suggestions for handling this?
3. We plan to use 2 paths: one for a long-term history server and one for a short-term history server. We could move log files from the short-term server to the long-term one when we need to investigate them. Would this be a good idea? Do you have any input on this?

Thank you in advance!

Re: Spark History Server log files questions

German Schiavon Matteo
Hey!

I don't think you can do selective removals. I've never heard of such an option, but who knows.
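
If the selection has to happen outside Spark, one option might be a small cleanup script run from cron. The following is only a rough sketch under several assumptions: the event logs sit in a local directory (not HDFS or S3), each log is a single file whose name starts with the application ID, and the names `select_logs_to_delete` and `clean_logs` are made up for illustration. Rolling event logs, which are stored as directories, are skipped.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def select_logs_to_delete(log_dir, max_age_days, keep_app_ids, now=None):
    """Return event log files older than max_age_days, skipping kept apps."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    doomed = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue  # directories (e.g. rolling event logs) are not handled
        # Assumption: the name is "<app id>", optionally followed by
        # "_<attempt>" and/or a suffix such as ".inprogress" or ".lz4".
        stem = path.name.split(".", 1)[0]
        if any(stem == k or stem.startswith(k + "_") for k in keep_app_ids):
            continue  # explicitly kept, regardless of age
        if path.stat().st_mtime < cutoff:
            doomed.append(path)
    return doomed


def clean_logs(log_dir, max_age_days, keep_app_ids, dry_run=True):
    """Delete the selected logs; with dry_run=True only report them."""
    doomed = select_logs_to_delete(log_dir, max_age_days, keep_app_ids)
    if not dry_run:
        for path in doomed:
            path.unlink()
    return doomed
```

Running it with `dry_run=True` first and checking the returned list is probably wise before letting it delete anything.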

You can see all the available options here: https://spark.apache.org/docs/latest/monitoring.html

In my experience, 4 days' worth of logs is enough. Usually, if something fails, you check it right away unless it is the weekend, but depending on the use case you could keep more days.
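
For a purely age-based retention like that, the built-in cleaner settings listed on the monitoring page should be enough. As a sketch, the helper below (`render_cleaner_conf` is a hypothetical name, not part of Spark) renders the lines one would put in `spark-defaults.conf` on the history server. The three property names come from the Spark monitoring documentation; the values are only examples.

```python
def render_cleaner_conf(max_age_days, check_interval="1d"):
    """Render spark-defaults.conf lines for the built-in event log cleaner."""
    if max_age_days < 1:
        raise ValueError("max_age_days must be at least 1")
    settings = {
        # Turn on periodic cleanup of event logs.
        "spark.history.fs.cleaner.enabled": "true",
        # How often the cleaner checks for files to delete.
        "spark.history.fs.cleaner.interval": check_interval,
        # Logs older than this are deleted when the cleaner runs.
        "spark.history.fs.cleaner.maxAge": f"{max_age_days}d",
    }
    return "\n".join(f"{key} {value}" for key, value in settings.items())


# Example: keep 4 days of logs.
print(render_cleaner_conf(4))
```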



On Mon, 22 Mar 2021 at 23:52, Hung Vu <[hidden email]> wrote: