Pyspark application hangs (no error messages) on Python RDD .map


Daniel Stojanov

This code hangs indefinitely at the last line (the .map()). Interestingly, if I run the same code at the beginning of my application (removing the .write step), it executes as expected; it is only when the code appears further along in my application that it hangs. The debugging message "I saw a row" never appears in the executor's standard output.

Note: this error occurs when running on a YARN cluster, but not on a standalone cluster or in local mode. I have tried running with num-cores=1 and a single executor.

I have been working on this for a long time; any clues would be appreciated.


def map_to_keys(row):
    print("I saw a row", row["id"])
    return (hash(row["id"]), row)

df ="orc").load("/tmp/df_full")
rdd =