How to get all input tables of a SPARK SQL 'select' statement

How to get all input tables of a SPARK SQL 'select' statement

luby
Hi, All,

We need to get all input tables of several Spark SQL 'select' statements.

For Hive SQL statements we can get this information with 'explain dependency select ...', but I can't find an equivalent command for Spark SQL.

Does anyone know how to get this information for a Spark SQL 'select' statement?

Thanks

Boying
 




   


     
This email message may contain confidential and/or privileged information. If you are not the intended recipient, please do not read, save, forward, disclose or copy the contents of this email or open any file attached to this email. We will be grateful if you could advise the sender immediately by replying this email, and delete this email and any attachment or links to this email completely and immediately from your computer system.



Re: How to get all input tables of a SPARK SQL 'select' statement

Tomas Bartalos

This might help:

show tables;



Re: How to get all input tables of a SPARK SQL 'select' statement

ramannanda9@gmail.com
EXPLAIN EXTENDED or EXPLAIN will list the plan along with the tables. I'm not aware of any statement that explicitly lists dependencies or tables directly.

Regards,
Ramandeep Singh
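
A minimal sketch of this in spark-shell, assuming a table named foo already exists in the session catalog (both the `spark` session variable and the table name are only illustrative):

```scala
// Print the parsed, analyzed, optimized, and physical plans for the query.
// The analyzed plan shows the resolved input relations.
spark.sql("EXPLAIN EXTENDED SELECT * FROM foo").show(truncate = false)
```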




Re: How to get all input tables of a SPARK SQL 'select' statement

Shahab Yunus
Could be a tangential idea but it might help: why not use the queryExecution and logicalPlan objects that are available when you execute a query through a SparkSession and get a DataFrame back? Their JSON representation contains almost all the information you need, and you don't need to go to Hive to get it.
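
A rough sketch of that in spark-shell (Spark 2.x; the table name foo is only illustrative):

```scala
// Build a DataFrame without collecting any data, then inspect its plans.
val df = spark.sql("SELECT * FROM foo")
println(df.queryExecution.logical.prettyJson)   // parsed (unresolved) plan as JSON
println(df.queryExecution.analyzed.prettyJson)  // analyzed plan with resolved relations
```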





Re: How to get all input tables of a SPARK SQL 'select' statement

luby
Thanks all for your help.

I'll try your suggestions.

Thanks again :)
 






Re: Re: How to get all input tables of a SPARK SQL 'select' statement

luby
In reply to this post by Shahab Yunus
Hi, All,

I tried the suggested approach and it works, but it requires running the SQL statement first.

I just want to parse the SQL statement without running it, so I can do this on my laptop without connecting to our production environment.

I wrote a tool that uses the SqlBase.g4 grammar bundled with Spark SQL to extract the names of the input tables, and it works as expected.

But I have a question:

The parser generated from SqlBase.g4 only accepts 'select' statements with all keywords such as 'SELECT' and 'FROM' and all table names capitalized,
e.g. it accepts 'SELECT * FROM FOO' but it doesn't accept 'select * from foo'.

Yet I can run spark.sql("select * from foo") in the spark2-shell without any problem.

Is there another 'layer' in Spark SQL that upper-cases those tokens before invoking the parser?

If so, why not just modify SqlBase.g4 to accept lowercase keywords?

Thanks

Boying







Re: Re: How to get all input tables of a SPARK SQL 'select' statement

ramannanda9@gmail.com
Hi, 

You don't have to run the SQL statement. You can just parse it; that gives you the unresolved logical plan.

// `ss` is a SparkSession and `query` holds the SQL text; nothing is executed
val logicalPlan = ss.sessionState.sqlParser.parsePlan(sqlText = query)
println(logicalPlan.prettyJson)
[ {
  "class" : "org.apache.spark.sql.catalyst.plans.logical.Project",
  "num-children" : 1,
  "projectList" : [ [ {
    "class" : "org.apache.spark.sql.catalyst.analysis.UnresolvedStar",
    "num-children" : 0
  } ] ],
  "child" : 0
}, {
  "class" : "org.apache.spark.sql.catalyst.analysis.UnresolvedRelation",
  "num-children" : 0,
  "tableIdentifier" : {
    "product-class" : "org.apache.spark.sql.catalyst.TableIdentifier",
    "table" : "abc"
  }
} ]
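
To get just the input table names out of that parsed plan, one possible follow-up (Spark 2.x APIs; `logicalPlan` is the value returned by parsePlan above) is to collect the UnresolvedRelation nodes:

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

// Walk the plan tree and keep the identifier of every unresolved relation,
// i.e. every table the query reads from.
val inputTables = logicalPlan.collect {
  case r: UnresolvedRelation => r.tableIdentifier.unquotedString
}.distinct
println(inputTables)
```

Note this only sees tables referenced directly in the query text; views are not expanded until the plan is analyzed against a catalog.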



--
Regards,
Ramandeep Singh
http://orastack.com
+13474792296
[hidden email]

Re: Re: Re: How to get all input tables of a SPARK SQL 'select' statement

luby
Thank you so much. I tried your suggestion and it really works!



