Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available

maqy1995@outlook.com

    Today I ran into the same problem using rdd.collect(); the RDD's element type is Tuple2[Int, Int]. The problem appears once the data reaches about 100 GB.

    I suspect something goes wrong during deserialization. Has anyone else encountered this problem?
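The deserialization guess above can be put in rough perspective with a small, self-contained Scala sketch that measures how much larger a Java-serialized array of Int pairs is than the 8 bytes of raw Int payload per pair. This is only an illustration under assumptions: it uses plain `java.io.ObjectOutputStream`, not Spark's own result-transfer path, and the names `TupleSerializationSize` and `javaSerializedSize` are hypothetical. It does not reproduce or diagnose the EOFException.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object TupleSerializationSize {
  // Serialize a value with plain Java serialization and return its size in bytes.
  // Hypothetical helper for illustration only; Spark's result path is not modelled here.
  def javaSerializedSize(value: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    // Hypothetical test data: n pairs shaped like the Tuple2[Int, Int] elements in the post.
    val pairs: Array[(Int, Int)] = Array.tabulate(n)(i => (i, i * 2))
    val size = javaSerializedSize(pairs)
    val rawPayload = n.toLong * 8 // two 4-byte Ints per pair
    println(s"serialized: $size bytes, raw Int payload: $rawPayload bytes")
    println(f"bytes per pair: ${size.toDouble / n}%.1f")
  }
}
```

Spark's default `spark.serializer` is `JavaSerializer`, with `KryoSerializer` available as an alternative; whether either one is involved in the failure reported here is not established by this sketch.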

 

Best regards,

maqy

 

From: [hidden email]
Sent: 2020-04-20 10:33
To: [hidden email]
Subject: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available

 

Hi all,

I create a Dataset[Row] with the following code:

 
val df: Dataset[Row] = spark.read.format("csv").schema(schema).load("hdfs://master:9000/mydata")
 

After that I want to collect it to the driver:

 
val df_rows: Array[Row] = df.collect()
 

The Spark web UI shows that all tasks completed successfully, but the application does not finish. After more than ten minutes, the following error appears in the shell:

 

java.io.EOFException: Premature EOF: no length prefix available

 

Environment:
    Spark 2.4.3
    Hadoop 2.7.7
    Data: about 800,000,000 rows, about 12 GB in total
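As a rough sense of scale for collecting this dataset, the figures above can be turned into a back-of-envelope estimate. This is a minimal sketch under stated assumptions: "12GB" is read as decimal gigabytes, and only the driver-side array of references is counted (4-byte references with compressed oops, 8-byte without), so the real footprint of an `Array[Row]` would be larger. The names `CollectSizeEstimate`, `referenceArrayBytes`, and `fitsInOneArray` are hypothetical, and the estimate says nothing definite about the cause of the EOFException.

```scala
object CollectSizeEstimate {
  // Figures from the post: about 800,000,000 rows and about 12 GB of CSV on HDFS.
  val rows: Long = 800000000L
  // Assumption: "12GB" is read as decimal gigabytes (10^9 bytes).
  val onDiskBytes: Long = 12L * 1000 * 1000 * 1000

  // Average on-disk (CSV) bytes per row.
  def avgBytesPerRow: Double = onDiskBytes.toDouble / rows

  // Lower bound for the driver-side Array[Row]: one object reference per row,
  // before counting any Row object or its field values.
  // Assumption: refBytes is 4 with compressed oops (heaps under 32 GB), otherwise 8.
  def referenceArrayBytes(refBytes: Int): Long = rows * refBytes

  // A JVM array is indexed by Int, so its length cannot exceed Int.MaxValue.
  def fitsInOneArray: Boolean = rows <= Int.MaxValue

  def main(args: Array[String]): Unit = {
    println(f"average on-disk bytes per row: $avgBytesPerRow%.1f")
    println(s"reference array alone, 4-byte refs: ${referenceArrayBytes(4)} bytes")
    println(s"reference array alone, 8-byte refs: ${referenceArrayBytes(8)} bytes")
    println(s"row count fits in one JVM array: $fitsInOneArray")
  }
}
```

With these figures the CSV averages 15 bytes per row on disk, while the reference array alone would take 3.2 GB or 6.4 GB of driver heap before any `Row` object or field value is counted.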

   

    More details are available here:

https://stackoverflow.com/questions/61202566/spark-sql-datasetrow-collect-to-driver-throw-java-io-eofexception-premature-e

    Does anyone know the reason?

 

Best regards,

maqy