Query on Spark Hive with Kerberos Enabled on Kubernetes

Query on Spark Hive with Kerberos Enabled on Kubernetes

Garlapati, Suryanarayana (Nokia - IN/Bangalore)

Hi All,

I am trying to use the Spark 2.2.0 Kubernetes fork (https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0) to run Hive queries on a Kerberos-enabled cluster. spark-submit fails for the Hive queries but passes when I access HDFS. Is this a known limitation, or am I doing something wrong? If this is supposed to work, could you please share an example of running Hive queries?

 

Thanks.

 

Regards

Surya


RE: Query on Spark Hive with Kerberos Enabled on Kubernetes

Garlapati, Suryanarayana (Nokia - IN/Bangalore)

Hi Sandeep,

Thanks for the response.

I am using the following commands (hive-site.xml, core-site.xml, and hdfs-site.xml are made available by exporting HADOOP_CONF_DIR).
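
For reference, here is roughly how that export looks before running spark-submit (the path is an assumption, not my actual layout):

# Directory containing hive-site.xml, core-site.xml, and hdfs-site.xml
# (placeholder path)
export HADOOP_CONF_DIR=/etc/hadoop/conf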

 

For HDFS access (this succeeds):

./spark-submit \
  --deploy-mode cluster \
  --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 \
  --kubernetes-namespace default \
  --conf spark.kubernetes.kerberos.enabled=true \
  --conf spark.kubernetes.kerberos.principal=<principal> \
  --conf spark.kubernetes.kerberos.keytab=<keytab> \
  --conf spark.kubernetes.driver.docker.image=<driver_img> \
  --conf spark.kubernetes.executor.docker.image=<executor_img> \
  --conf spark.kubernetes.initcontainer.docker.image=<init_img> \
  --conf spark.kubernetes.resourceStagingServer.uri=http://<RSS_IP>:10000 \
  ../examples/src/main/python/wordcount.py hdfs://<HDFS_IP>:8020/tmp/wordcount.txt

For Hive access (this fails):

./spark-submit \
  --deploy-mode cluster \
  --master k8s://https://k8s-apiserver.bcmt.cluster.local:8443 \
  --kubernetes-namespace default \
  --conf spark.kubernetes.kerberos.enabled=true \
  --files /etc/krb5.conf,../examples/src/main/resources/kv1.txt \
  --conf spark.kubernetes.kerberos.principal=<principal> \
  --conf spark.kubernetes.kerberos.keytab=<keytab> \
  --conf spark.kubernetes.driver.docker.image=<driver_img> \
  --conf spark.kubernetes.executor.docker.image=<executor_img> \
  --conf spark.kubernetes.initcontainer.docker.image=<init_img> \
  --conf spark.kubernetes.resourceStagingServer.uri=http://<RSS_IP>:10000 \
  ../examples/src/main/python/sql/hive.py

 

Following is the error:

2018-07-19 04:15:55 INFO  HiveUtils:54 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
2018-07-19 04:15:56 INFO  metastore:376 - Trying to connect to metastore with URI thrift://vm-10-75-145-54:9083
2018-07-19 04:15:56 ERROR TSaslTransport:315 - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
        at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
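
The "Failed to find any Kerberos tgt" message suggests the driver never gets a usable ticket cache. A sketch of how this could be confirmed from outside the pod (the pod name is a placeholder, and this assumes the Kerberos client tools are installed in the driver image):

# With no TGT, klist typically reports something like
#   "klist: No credentials cache found (filename: /tmp/krb5cc_0)"
kubectl exec <driver-pod> -- klist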

 

If I don’t provide krb5.conf in the above spark-submit, I get an error saying it is unable to find the default realm.
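
For context, the default realm comes from krb5.conf; a minimal sketch of the section involved (the realm and KDC host are placeholders, not my actual values):

# /etc/krb5.conf (sketch)
[libdefaults]
    default_realm = EXAMPLE.REALM

[realms]
    EXAMPLE.REALM = {
        kdc = kdc.example.realm
    }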

 

One workaround I have found: if I generate a TGT with kinit and copy it into the driver pod at /tmp/krb5cc_0, it works fine. I assume this is not the intended approach; the TGT should be generated automatically so that the Hive metastore can be accessed. Please let me know if I am doing something wrong.
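
Roughly, the workaround looks like this (the keytab path, principal, and pod name are placeholders):

# Obtain a TGT locally; for uid 0 the cache typically defaults to /tmp/krb5cc_0
kinit -kt <keytab> <principal>

# Copy the cache to where the driver looks for it, per the observation above
kubectl cp /tmp/krb5cc_0 default/<driver-pod>:/tmp/krb5cc_0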

 

Regards

Surya

 

From: Sandeep Katta [mailto:[hidden email]]
Sent: Friday, July 20, 2018 9:59 PM
To: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <[hidden email]>
Cc: [hidden email]; [hidden email]
Subject: Re: Query on Spark Hive with Kerberos Enabled on Kubernetes

 

Can you please tell us what exception you got? Are there any logs for the same?

 



RE: Query on Spark Hive with Kerberos Enabled on Kubernetes

Garlapati, Suryanarayana (Nokia - IN/Bangalore)

Hi Sandeep,

Any inputs on this?

 

Regards

Surya

 
