Spark on Kubernetes : unable to write files to HDFS

Spark on Kubernetes : unable to write files to HDFS

Loic DESCOTTE
Hello,

I am using Spark on Kubernetes, and I get the following error when I try to write data to HDFS: "no filesystem for scheme hdfs"

More details:

I am submitting my application with spark-submit like this:

spark-submit --master k8s://https://myK8SMaster:6443 \
--deploy-mode cluster \
--name hello-spark \
--class Hello \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=gradiant/spark:2.4.4 hdfs://hdfs-namenode/user/loic/jars/helloSpark.jar

The driver and the 2 executors are then created in K8S.

But it fails. When I look at the driver logs, I see this:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:424)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:524)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at Hello$.main(hello.scala:24)
at Hello.main(hello.scala)


As you can see, my application jar helloSpark.jar is correctly loaded from HDFS by spark-submit, but writing to HDFS fails.

I have also tried adding the hadoop-client and hadoop-hdfs dependencies to the spark-submit command:

--packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \

But the error is still there.


Here is the Scala code of my application:


import java.util.Calendar

import org.apache.spark.sql.SparkSession

case class Data(singleField: String)

object Hello
{
    def main(args: Array[String])
    {

        val spark = SparkSession
          .builder()
          .appName("Hello Spark")
          .getOrCreate()

        import spark.implicits._

        val now = Calendar.getInstance().getTime().toString
        val data = List(Data(now)).toDF()
        data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
    }
}

Thanks for your help,
Loïc

RE: Spark on Kubernetes : unable to write files to HDFS

Loic DESCOTTE
So I've tried several other things, including building a fat jar with the HDFS dependency inside my app jar, and adding this to the Spark configuration in the code:

val spark = SparkSession
          .builder()
          .appName("Hello Spark 7")
          .config("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
          .getOrCreate()


But still the same error...
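
Worth noting: a bare "fs.hdfs.impl" key set through the SparkSession builder lands in the Spark conf, not in the Hadoop Configuration; Spark only copies keys prefixed with "spark.hadoop." over to the Hadoop side. A minimal sketch of both ways to register the implementation, in case a missing scheme mapping really had been the problem:

import org.apache.spark.sql.SparkSession
import org.apache.hadoop.hdfs.DistributedFileSystem

// Option 1: prefix the key so Spark forwards it to the Hadoop Configuration.
val spark = SparkSession.builder()
  .appName("Hello Spark")
  .config("spark.hadoop.fs.hdfs.impl", classOf[DistributedFileSystem].getName)
  .getOrCreate()

// Option 2: set it on the Hadoop Configuration directly once the session exists.
spark.sparkContext.hadoopConfiguration
  .set("fs.hdfs.impl", classOf[DistributedFileSystem].getName)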


From: Sean Owen <[hidden email]>
Sent: Wednesday, December 16, 2020 14:27
To: Loic DESCOTTE <[hidden email]>
Subject: Re: Spark on Kubernetes : unable to write files to HDFS
I think it'll have to be part of the Spark distro, but I'm not 100% sure. I also think these get registered via manifest files in the JARs; if some process is stripping those when creating a bundled-up JAR, that could be it. It could also be failing to initialize for some reason.
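
If the service registrations really are being stripped during shading, one common fix is to concatenate the META-INF/services entries instead of letting the assembly plugin discard them. A sketch for sbt-assembly (an assumption; the thread doesn't say how the fat jar is built):

// build.sbt -- hypothetical fragment, assuming sbt-assembly builds the fat jar
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", _*) => MergeStrategy.concat   // keep ServiceLoader registrations
  case PathList("META-INF", _*)             => MergeStrategy.discard  // drop signatures and manifests
  case _                                    => MergeStrategy.first
}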

On Wed, Dec 16, 2020 at 7:24 AM Loic DESCOTTE <[hidden email]> wrote:
I've tried with this spark-submit option:

--packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \

But it didn't solve the issue.
Should I add more jars?

Thanks
Loïc

From: Sean Owen <[hidden email]>
Sent: Wednesday, December 16, 2020 14:20
To: Loic DESCOTTE <[hidden email]>
Subject: Re: Spark on Kubernetes : unable to write files to HDFS
Seems like your Spark cluster somehow doesn't have the Hadoop JARs?


Re: Spark on Kubernetes : unable to write files to HDFS

German Schiavon Matteo
Hi,

Seems that you have a typo, no?

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds

  data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
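
With the scheme corrected, the save call becomes:

  data.write.mode("overwrite").format("text").save("hdfs://hdfs-namenode/user/loic/result.txt")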



RE: Spark on Kubernetes : unable to write files to HDFS

Loic DESCOTTE
Oh thank you, you're right!! I feel shameful 😄



Re: Spark on Kubernetes : unable to write files to HDFS

German Schiavon Matteo
We've all been there! No reason to be ashamed :)


RE: Spark on Kubernetes : unable to write files to HDFS

Loic DESCOTTE
Everything is working fine now 🙂
Thanks again

Loïc
