[Spark for kubernetes] Azure Blob Storage credentials issue

[Spark for kubernetes] Azure Blob Storage credentials issue

Oscar Bonilla

Hello,

I'm having the following issue while trying to run Spark on Kubernetes:

2018-10-18 08:48:54 INFO  DAGScheduler:54 - Job 0 failed: reduce at SparkPi.scala:38, took 1.743177 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 10.244.1.11, executor 2): org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1086)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:538)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1366)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3242)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3291)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3259)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:470)
    at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1897)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:694)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:476)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:755)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:747)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:747)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:863)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1081)
    ... 24 more

The command I use to launch the job is:

/opt/spark/bin/spark-submit \
    --master k8s://<my-k8s-master> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<my-image-built-with-wasb> \
    --conf spark.kubernetes.namespace=<my-namespace> \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.driver.secrets.spark=/opt/spark/conf \
    --conf spark.kubernetes.executor.secrets.spark=/opt/spark/conf \
    wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar 10000

I have a k8s secret named spark with the following content:

apiVersion: v1
kind: Secret
metadata:
  name: spark
  labels:
    app: spark
    stack: service
type: Opaque
data:
  core-site.xml: |-
    {% filter b64encode %}
    <configuration>
        <property>
            <name>fs.azure.account.key.<my-account-name>.blob.core.windows.net</name>
            <value><my-account-key></value>
        </property>
        <property>
            <name>fs.AbstractFileSystem.wasb.impl</name>
            <value>org.apache.hadoop.fs.azure.Wasb</value>
        </property>
    </configuration>
    {% endfilter %}
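
(A roughly equivalent secret, minus the labels, can also be created directly with kubectl, which handles the base64 encoding itself; a sketch, assuming the XML above is saved locally as core-site.xml:)

# Hypothetical alternative to the templated manifest above.
kubectl create secret generic spark -n <my-namespace> --from-file=core-site.xml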

The driver pod manages to download the jar dependencies stored in a container in Azure Blob Storage, as can be seen in this log snippet:

2018-10-18 08:48:16 INFO  Utils:54 - Fetching wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar to /var/spark-data/spark-jars/fetchFileTemp8575879929413871510.tmp
2018-10-18 08:48:16 INFO  SparkPodInitContainer:54 - Finished downloading application dependencies.

How can I get the executor pods to pick up the credentials stored in the core-site.xml file that's mounted from the k8s secret? What am I missing?

Thank you,

Oscar Bonilla.

Re: [Spark for kubernetes] Azure Blob Storage credentials issue

Matt Cheah

Hi there,

Can you check whether HADOOP_CONF_DIR is being set to /opt/spark/conf on the executors? You need to set an executor environment variable for that.
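
For example, something along these lines could be added to the spark-submit invocation (a rough, untested sketch; it assumes the secret really is mounted at /opt/spark/conf on both the driver and the executors, as in your command):

    # Hypothetical additions: point the driver and the executors at the
    # mounted Hadoop configuration directory.
    --conf spark.kubernetes.driverEnv.HADOOP_CONF_DIR=/opt/spark/conf \
    --conf spark.executorEnv.HADOOP_CONF_DIR=/opt/spark/conf \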

A kubectl describe pod output for the executors would be helpful here.
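
For example (assuming the executors run in the same namespace as in your spark-submit command):

kubectl get pods -n <my-namespace>                          # find the executor pod names
kubectl describe pod <executor-pod-name> -n <my-namespace>  # inspect env vars and volume mounts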

-Matt Cheah
