Error in show()


dimitris plakas
Hello everyone, I am new to PySpark and I am facing an issue. Let me explain exactly what the problem is.

I have a DataFrame and I apply a map() function to it, then build a new DataFrame from the result:

dataframe2 = dataframe1.rdd.map(custom_function)
dataframe = sqlContext.createDataFrame(dataframe2)

When I call

dataframe.show(30, True)

it shows the result, but when I call

dataframe.show(60, True)

I get an error. The error is in the attachment Pyspark_Error.txt.

Could you please explain what this error is and how to get past it?



---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]

Attachment: Pyspark_Error.txt (24K)

Re: Error in show()

Apostolos N. Papadopoulos

Can you isolate the row that is causing the problem? I mean, try show(31), then show(32), and so on up to show(60).

Perhaps this will help you understand the problem.
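
The stepping described above could be automated along these lines. This is only a sketch: `first_failing_count` and `fake_show` are hypothetical names, and `fake_show` merely simulates a call that starts failing at some row count so the example runs without Spark. It assumes that once a given count fails, every larger count fails too.

```python
def first_failing_count(action, start, stop):
    """Return the smallest n in [start, stop] for which action(n) raises, else None."""
    for n in range(start, stop + 1):
        try:
            action(n)
        except Exception:
            # The first count that raises points at the row that triggers the error.
            return n
    return None


def fake_show(n):
    # Hypothetical stand-in for dataframe.show(n, True): fails once row 47 is included.
    if n >= 47:
        raise ValueError("bad row")


first_bad = first_failing_count(fake_show, 31, 60)
print(first_bad)
```

With a real DataFrame, the callable might be `lambda n: dataframe.show(n, True)`, using the same range of 31 to 60.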

regards,

Apostolos




-- 
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: [hidden email]
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol

Re: Error in show()

Sonal Goyal
It says it is a serialization error. Could there be a column value that is not being parsed as an int in one of the rows 31-60? The relevant Python code in serializers.py that throws the error is:

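import struct

def read_int(stream):
    # Read a 4-byte, big-endian signed integer from the stream.
    length = stream.read(4)
    if not length:
        # Nothing was read: the stream has ended.
        raise EOFError
    return struct.unpack("!i", length)[0]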
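
For illustration, the quoted function can be exercised on its own with an in-memory stream. This is a sketch, not a reproduction of the reported failure: `io.BytesIO` is a stand-in for the real stream and the sample value 42 is made up. It shows that read_int decodes four bytes into an integer and raises EOFError when the stream has nothing left to read.

```python
import io
import struct


def read_int(stream):
    length = stream.read(4)
    if not length:
        raise EOFError
    return struct.unpack("!i", length)[0]


# Hypothetical stand-in stream holding a single 4-byte big-endian integer.
stream = io.BytesIO(struct.pack("!i", 42))
first_value = read_int(stream)  # decodes the four bytes into an int

# The stream is now exhausted, so the next call reads b"" and raises EOFError.
try:
    read_int(stream)
    hit_eof = False
except EOFError:
    hit_eof = True
```

If the traceback ends in this function with an EOFError, one possibility worth checking is that the stream ended earlier than expected. That is an assumption to verify against the full traceback in the attachment.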


Thanks,
Sonal
Nube Technologies 






Re: Error in show()

Prakash Joshi
In reply to this post by dimitris plakas
Please check the specific error lines of the text file.
Chances are that a few columns are not properly delimited in specific rows.
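
One way to look for such rows is sketched below. It assumes the input is a comma-delimited text file with a known number of columns. The helper name `find_malformed_rows`, the delimiter, and the sample lines are all hypothetical and only serve the illustration.

```python
import csv
import io


def find_malformed_rows(lines, expected_fields, delimiter=","):
    """Return (line_number, row) pairs whose field count differs from expected_fields."""
    bad = []
    reader = csv.reader(lines, delimiter=delimiter)
    for line_number, row in enumerate(reader, start=1):
        if len(row) != expected_fields:
            bad.append((line_number, row))
    return bad


# Made-up sample: the second line is missing its third column.
sample = io.StringIO("1,alice,30\n2,bob\n3,carol,41\n")
bad_rows = find_malformed_rows(sample, expected_fields=3)
print(bad_rows)
```

For a real file, an open file object can be passed in place of the `io.StringIO` sample.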

Regards
Prakash
