Reading a csv.gz file from SageMaker using PySpark kernel mode

cloudytech43
I am trying to read a gzip-compressed CSV file in PySpark, but I am unable to
read it in PySpark kernel mode in SageMaker.

I can read the same file using pandas when the kernel is conda-python3 (in
SageMaker).

What I tried:

# `spark` is the SparkSession that the PySpark kernel provides
file1 = 's3://testdata/output1.csv.gz'
file1_df = spark.read.csv(file1, sep='\t')

Error message:

An error was encountered:
An error occurred while calling 104.csv.
: java.io.IOException:
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
Access Denied (Service: Amazon S3; Status Code: 403; Error Code:
AccessDenied; Request ID: 7FF77313; S3 Extended Request ID:

Please let me know if I am missing anything.
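A quick way to rule out the file itself is to read a local copy with Python's standard library, independent of S3 and Spark. Spark (through Hadoop) picks the gzip codec from the `.gz` extension, so `spark.read.csv` should not need an extra compression option; the `AmazonS3Exception ... 403` in the trace points to S3 access rather than decompression. The sketch below assumes the file is tab-separated (as `sep='\t'` suggests) and builds a small hypothetical stand-in for `output1.csv.gz` so it runs anywhere; `read_gzipped_tsv` is an illustrative helper name, not an existing API.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_tsv(path: str) -> list:
    """Return all rows of a gzip-compressed, tab-separated text file."""
    # "rt" decompresses and decodes on the fly; newline="" is what the csv module expects
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh, delimiter="\t"))


# Hypothetical sample data standing in for a local copy of output1.csv.gz
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "output1.csv.gz")
    with gzip.open(sample, "wt", newline="", encoding="utf-8") as fh:
        csv.writer(fh, delimiter="\t").writerows(
            [["id", "value"], ["1", "a"], ["2", "b"]]
        )
    rows = read_gzipped_tsv(sample)

print(rows)  # [['id', 'value'], ['1', 'a'], ['2', 'b']]
```

If a local copy reads cleanly this way, the remaining problem is most likely how the Spark session reaches S3, not the file format.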



______________________
Trainer for  Spark Training in Hyderabad
<https://intellipaat.com/apache-spark-scala-training-hyderabad/>  .



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Reading a csv.gz file from SageMaker using PySpark kernel mode

Daniel Jankovic
Hi, 

I don't work much with either technology, but it seems that you didn't provide all the information needed to connect to and read from your S3 storage. You need the full S3 path (I doubt your bucket is really s3://testdata) as well as access information, i.e. credentials that are allowed to read that bucket. You are getting "Access Denied" because that required information is missing.
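To see what Spark will treat as the bucket and the object key, it can help to split the URI explicitly. With `s3://testdata/output1.csv.gz`, the bucket is `testdata`. S3 bucket names are global, so if a bucket called `testdata` exists but belongs to another account, requests to it would typically be rejected with 403 rather than reported as missing. The helper below is a hypothetical sketch using only the standard library; it handles the `s3`, `s3a` and `s3n` schemes and nothing else.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// (or s3a://, s3n://) URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket = parsed.netloc          # the part right after "s3://"
    key = parsed.path.lstrip("/")   # everything after the bucket name
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


bucket, key = split_s3_uri("s3://testdata/output1.csv.gz")
print(bucket, key)  # testdata output1.csv.gz
```

Assuming boto3 is available wherever the Spark driver runs, calling `boto3.client("s3").head_object(Bucket=bucket, Key=key)` there would raise a `botocore.exceptions.ClientError` if those credentials cannot read the object, which separates a permissions problem from a Spark problem.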

BR,
Daniel

In reply to cloudytech43's message of Wed, Oct 7, 2020 at 3:44 PM.