[Spark SQL] does pyspark udf support spark.sql inside def


[Spark SQL] does pyspark udf support spark.sql inside def

Lakshmi Nivedita
Here is a Spark UDF structure as an example:

def sample_fn(x):
    # Attempt to run a query per input row from inside the UDF
    return spark.sql(f"select count(Id) from sample where Id = {x}").collect()[0][0]

spark.udf.register("sample_fn", sample_fn)

spark.sql("select id, sample_fn(Id) from example")

Thanks in advance for the help.
--
k.Lakshmi Nivedita


Re: [Spark SQL] does pyspark udf support spark.sql inside def

srowen
No, you can't use the SparkSession from within a function executed by Spark tasks.
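spark.sql needs the SparkSession, which exists only on the driver; a UDF runs in tasks on the executors, where there is no session to call. Express the per-row lookup as a join instead. A rough, untested sketch using the table names from your example (assuming example also has an Id column):

from pyspark.sql import functions as F

# Aggregate the counts once, then join, instead of querying per row
counts = spark.table("sample").groupBy("Id").agg(F.count("Id").alias("cnt"))
result = spark.table("example").join(counts, on="Id", how="left")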



Re: [Spark SQL] does pyspark udf support spark.sql inside def

Lakshmi Nivedita
Thank you for the clarification. How can I proceed with this kind of scenario in PySpark?

I have a scenario where I need to subtract the number of holidays from the total number of days between two dates, using PySpark DataFrames.

One table (A) has the two dates, date1 and date2, plus a country column; another table (B) has the holiday dates. Roughly, what I want is:

df1 = select date1, date2, ctry,
             datediff(date2, date1) - df2.holidays as totalnumberofdays
      from A

df2 = select count(holidate) as holidays
      from B
      where holidate >= A.date1
        and holidate <= A.date2
        and country = A.ctry

Apart from country, no column is a unique key.
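
In DataFrame form, I think this is a conditional join plus an aggregation; here is a rough sketch of what I mean (assuming A and B are registered as tables and the date columns are of comparable types):

from pyspark.sql import functions as F

a = spark.table("A")
b = spark.table("B")

# A left join keeps rows of A with no holidays in range; count(holidate) is then 0
in_range = ((b["holidate"] >= a["date1"]) &
            (b["holidate"] <= a["date2"]) &
            (b["country"] == a["ctry"]))

result = (a.join(b, in_range, "left")
           .groupBy(a["date1"], a["date2"], a["ctry"])
           .agg(F.count(b["holidate"]).alias("holidays"))
           .withColumn("totalnumberofdays",
                       F.datediff("date2", "date1") - F.col("holidays")))

Would something like this be the right direction?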










Re: [Spark SQL] does pyspark udf support spark.sql inside def

Amit Joshi
Can you please post the schema of both tables?







Re: [Spark SQL] does pyspark udf support spark.sql inside def

Lakshmi Nivedita
Sure, will do that. I am using Impala with PySpark to retrieve the data.

Table A schema:
date1  bigint
date2  bigint
ctry   string

Sample data for table A:
date1        date2        ctry
22-12-2012   06-01-2013   IN

Table B schema:
holidate  bigint
holiday   string   (0 = holiday, 1 = working day)
country   string

Sample data for table B:
holidate     holiday   country
25-12-2012   0         IN
01-01-2013   0         IN
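
For that sample row I would expect datediff(06-01-2013, 22-12-2012) = 15 days, and both holidays (25-12-2012 and 01-01-2013) fall inside the range, so the result should be 15 - 2 = 13. Since the values arrive formatted as dd-MM-yyyy rather than as real dates, I assume they need parsing before any comparison; roughly:

from pyspark.sql import functions as F

# Assumed parsing step: convert the dd-MM-yyyy values to proper dates
a = (spark.table("A")
     .withColumn("d1", F.to_date("date1", "dd-MM-yyyy"))
     .withColumn("d2", F.to_date("date2", "dd-MM-yyyy")))
b = (spark.table("B")
     .withColumn("hd", F.to_date("holidate", "dd-MM-yyyy"))
     .filter(F.col("holiday") == "0"))  # holiday = 0 marks a holiday

After that, the join condition can compare d1, d2, and hd directly.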

Thanks
Nivedita



