Issue with accessing S3 from EKS spark pod


Issue with accessing S3 from EKS spark pod

Rishabh Jain
Hi,

We are trying to access S3 from a Spark job running on an EKS cluster pod. I have a service account with an IAM role attached that has full S3 permissions. We are using DefaultCredentialsProviderChain, but we are still getting 403 Forbidden from S3.


Is there anything wrong with our approach? 

Thanks,

Rishabh Jain
Application Developer
Email: [hidden email]
Telephone: +91 6264277897
ThoughtWorks



Re: Issue with accessing S3 from EKS spark pod

Vladimir Prus


On 9 Feb 2021, at 19:46, Rishabh Jain <[hidden email]> wrote:

Hi,

We are trying to access S3 from spark job running on EKS cluster pod. I have a service account that has an IAM role attached with full S3 permission. We are using DefaultCredentialsProviderChain.  But still we are getting 403 Forbidden from S3.

It’s hard to say without more information, but here are some things you might want to double-check:

- Make sure the Spark job is using a sufficiently new AWS SDK, so that IAM roles for service accounts are supported.
- Modify your job to print the effective role, e.g.:

    import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder
    import com.amazonaws.services.securitytoken.model.GetCallerIdentityRequest

    val stsClient = AWSSecurityTokenServiceClientBuilder.standard().build()
    val request = new GetCallerIdentityRequest()
    val identity = stsClient.getCallerIdentity(request)
    println(identity.getArn())

- If the above does not print the expected role, verify that the pods actually have the right service account, that the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE variables are set on the pod, and that the assume-role policy for the role allows EKS to assume it.
- If the above prints the expected role, then a 403 error means you have not set up the IAM policies on your role/bucket.
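The environment check above can also be done from inside the job itself, by looking for the two variables the EKS pod identity webhook injects. A minimal sketch (the helper name is ours, not part of any SDK):

```python
import os

# IRSA works by having the EKS pod identity webhook inject these two
# variables (plus a projected token file) into pods that use an
# annotated service account. If either is missing, the credentials
# chain silently falls back to other providers, and you can end up
# calling S3 as the node's instance role instead of the pod's role.
REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")

def missing_irsa_vars(env=None):
    """Return the IRSA variables that are absent from the environment."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if name not in env]

# Example: a pod where only the role ARN was set (account id is made up).
print(missing_irsa_vars({"AWS_ROLE_ARN": "arn:aws:iam::123456789012:role/spark-s3"}))
# -> ['AWS_WEB_IDENTITY_TOKEN_FILE']
```

If this reports missing variables on the driver or executor pods, the problem is on the Kubernetes side (service account annotation or webhook), not in the Spark job.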

Is there anything wrong with our approach? 

Generally speaking, IAM for service accounts in EKS + Spark works; it's just that there are a lot of things that can go wrong the first time you do it.


HTH,

Re: Issue with accessing S3 from EKS spark pod

Rishabh Jain
Hi,

I tried what Vladimir suggested, but no luck there either. My guess is that it has something to do with securityContext.fsGroup. I am passing a YAML file path along with the spark-submit command. My YAML file content is:
```
apiVersion: v1
kind: Pod
spec:
  securityContext:
    fsGroup: 65534
  serviceAccount: <service account>
  serviceAccountName: <service account name>
```


Is there anything wrong with this YAML file?



Thanks,

Rishabh Jain
Application Developer
Email: [hidden email]
Telephone: +91 6264277897
ThoughtWorks




On Tue, Feb 9, 2021 at 10:44 PM Vladimir Prus <[hidden email]> wrote:


[...]

Re: Issue with accessing S3 from EKS spark pod

Vladimir Prus
Hi,

The fsGroup setting should match the ID that Spark is running as. When building from source, that ID is 185; you can use "docker inspect <image-name>" to double-check.
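In other words, the check is just "does the numeric user the image runs as match the pod's fsGroup". An illustrative helper (the function name is ours), fed with the User field that `docker inspect` reports:

```python
def fsgroup_matches(image_user: str, fs_group: int) -> bool:
    """Compare the image's User (as reported by `docker inspect`,
    e.g. "185" or "185:185") against the pod's fsGroup.

    Spark images built with Spark's own docker-image-tool.sh run as
    uid 185; if the pod's fsGroup does not line up with the id the
    process runs under, the projected service-account token file can
    be unreadable and IRSA silently fails.
    """
    # docker inspect may report "uid", "uid:gid", or a user name;
    # only numeric values are compared here.
    uid = image_user.split(":")[0]
    return uid.isdigit() and int(uid) == fs_group

# The pod template above used fsGroup 65534, but the Spark image runs as 185:
print(fsgroup_matches("185", 65534))  # -> False
print(fsgroup_matches("185", 185))    # -> True
```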

On Wed, Feb 10, 2021 at 11:43 AM Rishabh Jain <[hidden email]> wrote:
[...]

Re: Issue with accessing S3 from EKS spark pod

Rishabh Jain
It seems I was not able to connect to sts.amazonaws.com. I fixed that error. Now the Spark write is able to create the folder structure on S3, but the final file write fails with the big error below:

org.apache.spark.SparkException: Job aborted.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:226)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:178)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
	at org.apache.spark.sql.DataFrameWriter.text(DataFrameWriter.scala:897)
Exception occurred while running transaction extracts job: Job aborted.
	at com.gpn.batch.writer.S3Writer.write(S3Writer.java:9)
	at com.gpn.batch.PostedTransactionsJob.main(PostedTransactionsJob.java:47)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 6.0 failed 4 times, most recent failure: Lost task 1.3 in stage 6.0 (TID 17, 10.37.2.40, executor 1): java.nio.file.AccessDeniedException: s3a://gpn-corebatch-posting-extracts/totals-extract-1612978376492/_temporary/0/_temporary/attempt_20210210173339_0006_m_000001_17/part-00001-43be031c-5f3d-4b4f-bd2d-dc19ed99c7b4-c000.txt: getFileStatus on s3a://gpn-corebatch-posting-extracts/totals-extract-1612978376492/_temporary/0/_temporary/attempt_20210210173339_0006_m_000001_17/part-00001-43be031c-5f3d-4b4f-bd2d-dc19ed99c7b4-c000.txt: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 86B9CEF5EDA607F8; S3 Extended Request ID: 1XOprWwxqw0OV9mhb4wFkB3cOhwcI/kaFHctXEgGaovT8VTRWjnW6DwaMyO0laeCNUmn1nTbQYY=; Proxy: null), S3 Extended Request ID: 1XOprWwxqw0OV9mhb4wFkB3cOhwcI/kaFHctXEgGaovT8VTRWjnW6DwaMyO0laeCNUmn1nTbQYY=:403 Forbidden
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:230)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:151)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2198)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2163)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2102)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:752)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
	at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStream(CodecStreams.scala:81)
	at org.apache.spark.sql.execution.datasources.text.TextOutputWriter.<init>(TextOutputWriter.scala:33)
	at org.apache.spark.sql.execution.datasources.text.TextFileFormat$$anon$1.newInstance(TextFileFormat.scala:84)
	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126)
	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:111)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:264)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:205)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 86B9CEF5EDA607F8; S3 Extended Request ID: 1XOprWwxqw0OV9mhb4wFkB3cOhwcI/kaFHctXEgGaovT8VTRWjnW6DwaMyO0laeCNUmn1nTbQYY=; Proxy: null), S3 Extended Request ID: 1XOprWwxqw0OV9mhb4wFkB3cOhwcI/kaFHctXEgGaovT8VTRWjnW6DwaMyO0laeCNUmn1nTbQYY=
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5259)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5206)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1360)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1249)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:285)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1246)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2183)
	... 21 more



Can someone help me with this issue? If it is an IAM permission issue, what permission might be missing? If not, what is the root cause?
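Not an authoritative answer, but one common cause of exactly this symptom: getFileStatus issues a HEAD request, and without s3:ListBucket on the bucket itself, S3 returns 403 (rather than 404) when the object does not yet exist. A sketch of the minimal set of actions the s3a connector needs for writing, with the bucket name taken from the trace above (treat this as a starting point, not a vetted policy):

```python
import json

BUCKET = "gpn-corebatch-posting-extracts"

# Sketch of a minimal policy for s3a writes: ListBucket must be granted
# on the bucket ARN itself, while the object-level actions go on
# "<bucket>/*" (which also covers the _temporary/ paths used by the
# output committer).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::" + BUCKET],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": ["arn:aws:s3:::" + BUCKET + "/*"],
        },
    ],
}

print(json.dumps(policy, indent=2))
```

A frequent mistake is attaching all actions to "<bucket>/*" only; s3:ListBucket silently does nothing there, and HEAD requests come back as 403.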


Thanks,

Rishabh Jain
Application Developer
Email: [hidden email]
Telephone: +91 6264277897
ThoughtWorks




On Wed, Feb 10, 2021 at 2:26 PM Vladimir Prus <[hidden email]> wrote:
[...]