Spark standalone - reading kerberos hdfs


Spark standalone - reading kerberos hdfs

sbpothineni
I spun up a Spark standalone cluster (spark.authenticate=false) and submitted a job that reads from a remote kerberized HDFS:

val spark = SparkSession.builder()
                  .master("spark://spark-standalone:7077")
                  .getOrCreate()

UserGroupInformation.loginUserFromKeytab(principal, keytab)
val df = spark.read.parquet("hdfs://namenode:8020/test/parquet/")

Ran into the following exception:

Caused by:
java.io.IOException: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "..."; destination host is: "...":10346; 


Any suggestions?

Thanks
Sudhir

Re: Spark standalone - reading kerberos hdfs

Gabor Somogyi
A TGT is not enough; you need an HDFS delegation token, which can be obtained by Spark. Please check the logs...
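"Check the logs" usually means turning on Kerberos debug output first. One way to surface it is the standard JVM/Hadoop debug flag, passed through Spark's extra-Java-options settings (shown here as an illustration; the flag is a stock JDK option, not a Spark one):

```
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true" \
  --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true" \
  ...
```

With that in place the driver and executor logs show each Kerberos/token negotiation step, which makes it much easier to see whether a delegation token was ever obtained.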

On Fri, 8 Jan 2021, 18:51 Sudhir Babu Pothineni, <[hidden email]> wrote:

Re: Spark standalone - reading kerberos hdfs

sbpothineni
In the case of Spark on YARN, the Application Master shares the token.

I think in the case of Spark standalone the token is not shared with the executors. Is there an example of how to get the HDFS token for the executors?

On Fri, Jan 8, 2021 at 12:13 PM Gabor Somogyi <[hidden email]> wrote:
TGT is not enough, you need HDFS token which can be obtained by Spark. Please check the logs...

On Fri, 8 Jan 2021, 18:51 Sudhir Babu Pothineni, <[hidden email]> wrote:

Re: Spark standalone - reading kerberos hdfs

sbpothineni
Any other insights into this issue? I tried multiple ways to supply the keytab to the executors.

Does Spark standalone not support Kerberos?

On Jan 8, 2021, at 1:53 PM, Sudhir Babu Pothineni <[hidden email]> wrote:



Re: Spark standalone - reading kerberos hdfs

Gábor Rőczei
Hi Sudhir,

> On 21 Jan 2021, at 16:24, Sudhir Babu Pothineni <[hidden email]> wrote:
>
> Any other insights into this issue? I tried multiple ways to supply the keytab to the executors.
>
> Does Spark standalone not support Kerberos?

Spark standalone mode does not support Kerberos authentication. Related source code:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L346


    // Kerberos is not supported in standalone mode, and keytab support is not yet available
    // in Mesos cluster mode.
    if (clusterManager != STANDALONE
        && !isMesosCluster
        && args.principal != null
        && args.keytab != null) {
      // If client mode, make sure the keytab is just a local path.
      if (deployMode == CLIENT && Utils.isLocalUri(args.keytab)) {
        args.keytab = new URI(args.keytab).getPath()
      }

If you want to test your application with Kerberos, I recommend local mode:

https://spark.apache.org/docs/latest/submitting-applications.html#master-urls

For example:

spark-shell --master local

and if you want to access an HDFS filesystem, you also need to set the parameter spark.kerberos.access.hadoopFileSystems (in older Spark versions this was spark.yarn.access.hadoopFileSystems):

spark-shell --master local --conf spark.kerberos.access.hadoopFileSystems=hdfs://namenode.example.com:8020
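Putting the pieces together, a complete local-mode test might look like this (the keytab path, principal, and namenode address are placeholders for your own values):

```
kinit -kt /etc/security/keytabs/user.keytab user@EXAMPLE.COM

spark-shell --master local \
  --conf spark.kerberos.access.hadoopFileSystems=hdfs://namenode.example.com:8020

scala> spark.read.parquet("hdfs://namenode.example.com:8020/test/parquet/").count()
```

Here kinit obtains the TGT for the shell session, and Spark uses it to fetch the delegation token for the listed filesystem at startup.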

This will create the necessary HDFS delegation token for you. There is very good documentation about delegation token handling in Spark here:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/README.md
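For completeness, the token acquisition that Spark performs internally can be approximated with the public Hadoop APIs. This is only an illustrative sketch (the classes are from hadoop-common/hadoop-hdfs; the principal, keytab, and namenode address are placeholders), not something Spark standalone will do for you:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Log in from the keytab; this yields a TGT in the current JVM only.
UserGroupInformation.loginUserFromKeytab(
  "user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab")

// Ask the NameNode for a delegation token. This token is what executors
// need, and what YARN's Application Master distributes on your behalf.
val fs = FileSystem.get(
  new URI("hdfs://namenode.example.com:8020"), new Configuration())
val creds = new Credentials()
fs.addDelegationTokens("user@EXAMPLE.COM", creds)

// Inspect what was obtained (e.g. HDFS_DELEGATION_TOKEN).
creds.getAllTokens.forEach(t => println(t.getKind))
```

In YARN mode Spark serializes such Credentials and ships them to the executors; standalone mode has no component that does this, which is why the keytab login on the driver alone does not help the executors.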

Best regards,

      Gabor


Re: Spark standalone - reading kerberos hdfs

jelmer
In reply to this post by sbpothineni

On Fri, 8 Jan 2021 at 18:49, Sudhir Babu Pothineni <[hidden email]> wrote: