pyspark execution


pyspark execution

anudeep
Hi All,

I have a Python file which I am executing directly with the spark-submit command.

Inside the Python file, I have SQL written using a HiveContext. I created a generic variable for the database name inside the SQL.

The problem is: how can I pass the value for this variable dynamically, just as we do in Hive with the --hivevar parameter?

Thanks!
Anudeep

Re: pyspark execution

hemant singh
If it contains only SQL then you can use a function as below -

import subprocess

def run_sql(sql_file_path, your_db_name, location):
    # Each --hivevar takes a single key=value argument; -S is silent mode, -f runs the SQL file
    subprocess.call(["spark-sql", "-S",
                     "--hivevar", "DBName=" + your_db_name,
                     "--hivevar", "LOCATION=" + location,
                     "-f", sql_file_path])
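
The SQL file itself would then reference the variables with Hive's substitution syntax, along these lines (query.sql and some_table are just illustrative names):

-- query.sql, passed in as sql_file_path
USE ${hivevar:DBName};
SELECT * FROM some_table LIMIT 10;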

If you have other pieces like Spark code, and not only SQL, in that file -

Write a parse function which parses your SQL, replaces the placeholders like the DB name, and then executes the newly formed SQL, as in the sketch below.
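
A minimal sketch of that idea, assuming the file uses ${hivevar:NAME}-style placeholders and you already have a HiveContext (here called sql_context; all names are illustrative):

import re

def run_sql_file(sql_file_path, variables, sql_context):
    # Read the SQL file and substitute every ${hivevar:NAME} placeholder
    # with its value from the variables dict
    with open(sql_file_path) as f:
        sql_text = re.sub(r"\$\{hivevar:(\w+)\}",
                          lambda m: variables[m.group(1)],
                          f.read())
    # Naively split on ";" and run each non-empty statement,
    # collecting the resulting DataFrames
    return [sql_context.sql(stmt) for stmt in sql_text.split(";") if stmt.strip()]

For example: run_sql_file("query.sql", {"DBName": "mydb", "LOCATION": "/tmp/data"}, sqlContext)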

Maintaining your SQL in a separate file, though, de-couples the code from the SQL and makes it easier from a maintenance perspective.
