Application Timeout

Application Timeout

Brett Spark
Hello!
When using Spark Standalone with Spark 2.4.4 / 3.0.0, we are seeing our standalone Spark applications time out and show as "Finished" after roughly an hour.

Here is a screenshot from the Spark master before it's marked as finished.
Here is a screenshot from the Spark master after it's marked as finished (after over an hour of idle time).
Here are the logs from the Spark Master / Worker:

spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master 2021-01-19 21:55:47,282 INFO master.Master: 172.32.3.66:34570 got disassociated, removing it.
spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master 2021-01-19 21:55:52,095 INFO master.Master: 172.32.115.115:36556 got disassociated, removing it.
spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master 2021-01-19 21:55:52,095 INFO master.Master: 172.32.115.115:37305 got disassociated, removing it.
spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master 2021-01-19 21:55:52,096 INFO master.Master: Removing app app-20210119204911-0000
spark-worker-2d733568b2a7e82de7b2b09b6daa17e9-7bbb75f9b6-8mv2b worker 2021-01-19 21:55:52,112 INFO shuffle.ExternalShuffleBlockResolver: Application app-20210119204911-0000 removed, cleanupLocalDirs = true

Is there a setting that causes an application to time out after the Spark application or Spark worker has been idle for an hour?

I would like to keep our Spark applications alive as long as possible.

I haven't been able to find a setting in the Spark configuration documentation that corresponds to this, so I'm wondering if it's something that's hard-coded.
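For reference, here's roughly how the timeout-related settings I did find in the configuration guide would be set (illustrative values only; the master URL and app name are placeholders). None of them defaults to an hour, which is why I suspect something else:

// Minimal, illustrative sketch (not a known fix): the timeout-related settings
// documented in the Spark configuration guide, raised explicitly. None of them
// defaults to one hour.
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setMaster("spark://spark-master:7077")           // placeholder standalone master URL
  .setAppName("long-lived-app")                     // placeholder app name
  .set("spark.network.timeout", "600s")             // default 120s: default for all network interactions
  .set("spark.executor.heartbeatInterval", "60s")   // default 10s: must stay well below spark.network.timeout
// spark.worker.timeout (default 60s) is the standalone master's worker-heartbeat
// timeout and is normally set on the master itself, not in the application conf.

val spark = SparkSession.builder().config(conf).getOrCreate()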

Please let me know,
Thank you!

Re: Application Timeout

Jacek Laskowski
Hi Brett,

No idea why it happens, but I got curious about the "Cores" column showing 0. Is that always the case?

Re: Application Timeout

Brett Spark
Jacek,
It turns out this was the driver's RPC connection to the master (port 7077) being closed. Istio was closing it because of a one-hour idle timeout in its configuration.

I was able to recreate this by running lsof on the driver for port 7077 and then killing that process. After that, the application would show as "Finished".

The fix was to exclude port 7077 from the Istio sidecar. It only took me over six months to figure this out, so I wanted to share. :)
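For anyone hitting the same thing, this is roughly what the exclusion looks like as a pod annotation on the driver (illustrative manifest; the pod name and image are placeholders, and exact annotation support depends on your Istio version):

apiVersion: v1
kind: Pod
metadata:
  name: spark-driver                                   # placeholder pod name
  annotations:
    # Keep the driver -> master RPC connection out of the Envoy sidecar so
    # Istio's idle timeout no longer applies to port 7077.
    traffic.sidecar.istio.io/excludeOutboundPorts: "7077"
spec:
  containers:
    - name: driver
      image: my-spark-image:latest                     # placeholder image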
