Unable to read multiple JSON.Gz File.

Unable to read multiple JSON.Gz File.

Mahender Sarangam

   

I’m trying to read multiple .json.gz files from a Blob storage path using the Scala code below, but I’m unable to read the data from the files or print the schema. If the files are not gzip-compressed, we can read all of them into the DataFrame.

I’ve even tried a *.gz glob pattern, but no luck.

 val df = spark.read.json([hidden email])
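
One way to narrow this down is to confirm that a single downloaded file really is valid gzip and holds one JSON object per line, which is what Spark's JSON reader expects by default. The sketch below uses only the JDK, not Spark; the object name `GzJsonCheck`, the helper `firstLine`, and the local file path passed as `args(0)` are hypothetical and not part of the original post.

```scala
import java.io.{BufferedReader, File, FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream

// Hypothetical helper, not part of the original post.
object GzJsonCheck {
  // Returns the first decompressed line, or None if the gzip stream has no lines.
  // Throws java.io.IOException if the file is not valid gzip data.
  def firstLine(file: File): Option[String] = {
    val in = new FileInputStream(file)
    try {
      val reader = new BufferedReader(
        new InputStreamReader(new GZIPInputStream(in), StandardCharsets.UTF_8))
      Option(reader.readLine())
    } finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // args(0): assumed local path to one sample file copied from Blob storage
    firstLine(new File(args(0))) match {
      case Some(line) => println(s"First record: $line")
      case None       => println("File decompressed but contains no lines")
    }
  }
}
```

If the first line printed is a complete JSON object, the file content itself is probably fine and the problem is more likely in the path or the cluster configuration.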


RE: Unable to read multiple JSON.Gz File.

Jyoti Ranjan Mahapatra

Hi Mahender,

Which versions of Spark and Hadoop are you using?

I tried it on Spark 2.3.1 with Hadoop 2.7.3, and it works for a folder containing multiple .gz files.
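
For comparing setups, here is a minimal sketch of the kind of check described above: it prints the Spark and Hadoop versions and then reads a whole folder of .gz files. The object name `ReadGzFolder` and the folder path passed as `args(0)` are hypothetical placeholders, and the job is assumed to be launched through spark-submit so that the master is already configured.

```scala
import org.apache.hadoop.util.VersionInfo
import org.apache.spark.sql.SparkSession

// Hypothetical driver, shown only as an illustration.
object ReadGzFolder {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadGzFolder").getOrCreate()

    // Report the versions actually on the classpath.
    println(s"Spark:  ${spark.version}")
    println(s"Hadoop: ${VersionInfo.getVersion}")

    // args(0): assumed folder containing the .json.gz files.
    // Passing the folder lets Spark list every file inside it.
    val df = spark.read.json(args(0))
    df.printSchema()
    println(s"Rows: ${df.count()}")

    spark.stop()
  }
}
```

In spark-shell, the same two version expressions can be evaluated directly against the existing `spark` session.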

 

 

From: Mahender Sarangam <[hidden email]>
Sent: Monday, October 1, 2018 2:00 AM
To: [hidden email]
Subject: Unable to read multiple JSON.Gz File.

 

   



Re: Unable to read multiple JSON.Gz File.

Mahender Sarangam

Hi Jyoti,

We are using HDInsight Spark 2.2. Are there any settings that differ in the latest cluster version?


/mahender

 

On 10/2/2018 1:48 PM, Jyoti Ranjan Mahapatra wrote:
