Change the owner of hdfs file being saved


Sunita
Hello Experts,

I am required to use a specific user ID to save files on a remote HDFS cluster. By remote I mean that the Spark jobs run on EMR and write to a CDH cluster, so I cannot change hdfs-site.xml etc. to point to the destination cluster. As a result, I am using webhdfs to save the files to it.

I have a few challenges with this approach:
1. I cannot use the nameservice of the namenode and have to specify the IP address of the active namenode, which is risky if there is a failover.

2. I cannot change the owner/group of the files written by Spark. I see no option to provide an owner for files being written (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala).

3. Using JDBC so that I can specify the user name and password would mean I end up creating managed tables only. This is not acceptable for our use case.

Is there a way to change the owner of files written by Spark?
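
For reference, the WebHDFS REST API documents a SETOWNER operation (an HTTP PUT on the file path), so one possible direction is to change ownership in a separate step after the write. The snippet below is only a minimal sketch of how such a request URL could be built. The function name, host, port, path, and user names are placeholders, not values from my setup, and it assumes simple (non-Kerberos) authentication via the `user.name` query parameter. As far as I understand, HDFS only lets a superuser change a file's owner, so the calling user would need that privilege.

```python
from urllib.parse import quote, urlencode


def build_setowner_url(host, port, path, owner=None, group=None, user_name=None):
    """Build a WebHDFS SETOWNER URL; it must be sent as an HTTP PUT.

    All arguments are illustrative placeholders. `user_name` maps to the
    `user.name` query parameter used with simple authentication.
    """
    params = [("op", "SETOWNER")]
    if owner:
        params.append(("owner", owner))
    if group:
        params.append(("group", group))
    if user_name:
        params.append(("user.name", user_name))
    # quote() keeps "/" intact and escapes characters that are unsafe in a path
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical namenode address and file path, for illustration only
    print(build_setowner_url("10.0.0.1", 50070, "/data/out/part-00000",
                             owner="etl_user", group="etl", user_name="hdfs"))
```

This only builds the URL. Actually sending the PUT, and coping with a namenode failover, would still have to be handled separately.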

Regards,
Sunita