@Guillaume, thanks for the tip! It was useful, though this isn't what's happening in our case.
It turns out that other non-Spark tasks run on these nodes intermittently, which we had failed to notice. Because those tasks consume a lot of memory, Spark executors failed to launch on some nodes. Master.scala has a hard, non-configurable limit on the number of executor failures allowed before the application is removed. This shows up in the master's logs (since it was the executors that failed, not the tasks, nothing appeared in the driver logs).
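For anyone else hitting this, here is a self-contained paraphrase of the relevant master-side logic (simplified names; if I'm reading the source right, the limit is ApplicationState.MAX_NUM_RETRY = 10, a plain constant with no corresponding spark.* property, which is why it can't be raised via configuration):

```scala
// Simplified paraphrase of the standalone master's executor-failure handling.
// The real code is in Master.scala / ApplicationState.scala in the Spark source.
object ExecutorFailureLimitSketch {
  val MaxNumRetry = 10 // hard-coded in the Spark source, not configurable

  final class AppInfo(val id: String) {
    private var retryCount = 0
    def incrementRetryCount(): Int = { retryCount += 1; retryCount }
  }

  // Invoked each time an executor for `app` exits without having run cleanly.
  def onExecutorFailed(app: AppInfo): Unit = {
    if (app.incrementRetryCount() >= MaxNumRetry) {
      // The master logs this, marks the app FAILED, and removes it -- but it
      // never tells the driver process to shut down, hence the zombie driver
      // described below.
      println(s"Application ${app.id} failed $MaxNumRetry times; removing it")
    }
  }

  def main(args: Array[String]): Unit = {
    val app = new AppInfo("app-XXXX") // placeholder app id
    (1 to MaxNumRetry).foreach(_ => onExecutorFailed(app))
  }
}
```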
The thing is, even though the master has failed the app (the master UI says so as well), the app's driver continues to run. Its web UI is still available, even though it's not printing anything.
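As a stopgap until the driver handles this itself, something like the following watchdog thread inside the driver might work: it polls the standalone master's JSON status page and kills the JVM once the app id disappears from it. This is just a sketch under assumptions: it assumes the master web UI's /json endpoint is reachable at `masterUiUrl`, and that you know your app id (e.g. from the master UI); both are placeholders here, and the substring check is deliberately crude.

```scala
import scala.io.Source

// Watchdog sketch: exit the driver once the master no longer lists our app.
object DriverWatchdog {
  def start(masterUiUrl: String, appId: String, intervalMs: Long = 10000): Thread = {
    val t = new Thread(new Runnable {
      def run(): Unit = {
        while (true) {
          Thread.sleep(intervalMs)
          try {
            val json = Source.fromURL(s"$masterUiUrl/json").mkString
            // If the master's status page no longer mentions our app id,
            // the master has removed us; stop the zombie driver.
            if (!json.contains(appId)) {
              System.err.println(s"Master no longer knows about $appId; exiting")
              System.exit(1)
            }
          } catch {
            case _: Exception => // master temporarily unreachable; keep polling
          }
        }
      }
    })
    t.setDaemon(true)
    t.start()
    t
  }
}
```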
This seems to happen on spark-0.9.0 as well: the master has removed the application, but the driver process is still running and its web UI is still available. On spark-0.8.1 we also lost slaves when this happened; that has not happened so far on spark-0.9.0.