Spark SQL query


Spark SQL query

Arpan Bhandari
Hi,

Is there a way to trace a Spark SQL query after it has already been run? That is, a query has already been submitted by someone, and I have to trace back what query was actually submitted.


Appreciate any help on this.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Spark SQL query

Sachit Murarka
Hi Arpan,

Was it executed using spark-shell?
If yes, type :history

Do you have the history server enabled?
If yes, open the application in the History Server UI and go to the SQL tab.
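To pick the right entry in the History Server, it helps to know the application ID of the spark-shell session. A minimal sketch, assuming Spark 2.1 or later (where `SparkContext.uiWebUrl` exists); inside spark-shell, `getOrCreate()` simply returns the session that is already running:

```scala
import org.apache.spark.sql.SparkSession

// Inside spark-shell this returns the existing session (builder settings such as the master may not take effect)
val spark = SparkSession.builder().master("local[*]").appName("find-app-id").getOrCreate()
val sc = spark.sparkContext

// On YARN this is the same ID shown in the ResourceManager UI and in the History Server's application list
val appId: String = sc.applicationId
println(s"Application ID: $appId")

// uiWebUrl is None when the live UI is disabled (spark.ui.enabled=false)
println(s"Live UI: ${sc.uiWebUrl.getOrElse("(UI disabled)")}")
```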

Thanks
Sachit


Re: Spark SQL query

Arpan Bhandari
Hi Sachit,

Yes, it was executed using spark-shell, and the history server is already enabled. I already checked the SQL tab, but it does not show the query. My Spark version is 2.4.5.

Thanks,
Arpan Bhandari




Re: Spark SQL query

Sachit Murarka
Hi Arpan,

Launch spark-shell and type ":history" in the shell; you will see the queries that were executed.

In the Spark UI, under the SQL tab, you can see the query plan when you click the details button (though it won't show you the complete query). By looking at the plan, you can work out your query.
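If the queries are run from code you control, the plan-only problem can be sidestepped by labelling each execution with its own SQL text. A minimal sketch with a hypothetical helper `runTagged`, assuming that in Spark 2.4 the SQL tab's Description column comes from the call site's short form, while Spark 3.x prefers the `spark.job.description` local property when it is set; the sketch sets both:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("sql-tagging-sketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical helper: label the SQL execution with its own text, run the action, then clear the labels.
// The labels are thread-local properties; they must still be set when the action (collect, show, write)
// runs, because that is when the SQL execution actually starts.
def runTagged[T](sqlText: String)(action: DataFrame => T): T = {
  sc.setCallSite(sqlText)       // assumed source of the SQL tab Description in Spark 2.4
  sc.setJobDescription(sqlText) // Jobs tab description; assumed SQL tab Description in Spark 3.x
  try action(spark.sql(sqlText))
  finally {
    sc.clearCallSite()
    sc.setJobDescription(null)  // setting null removes the local property
  }
}
```

Usage: `runTagged("SELECT count(*) FROM range(1000)")(_.show())`. Wrapping the action, rather than only the `spark.sql` call, keeps the label in place for the moment Spark actually records the execution.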

Hope this helps!


Kind Regards,
Sachit Murarka



Re: Spark SQL query

Mich Talebzadeh
In reply to this post by Arpan Bhandari
Hi Arpan,

I presume you are interested in what the client was doing.

If you have access to the edge node (where the Spark code is submitted), look for the following file:

${HOME}/.spark_history

For example:

-rw-r--r--. 1 hduser hadoop 111997 Jun  2  2018 .spark_history

Just use shell tools (cat, grep, etc.) to have a look.

Or put it in HDFS somewhere:

hdfs dfs -put .spark_history /misc/spark_history ## Spark cannot read a hidden file, so copy it under a non-hidden name

# and read it as a text file through an RDD in spark-shell

scala> val historyRDD = spark.sparkContext.textFile("/misc/spark_history")
historyRDD: org.apache.spark.rdd.RDD[String] = /misc/spark_history MapPartitionsRDD[11] at textFile at <console>:23

# print it out

scala> historyRDD.collect().foreach(println)


HTH





LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

 




Re: Spark SQL query

Arpan Bhandari
In reply to this post by Sachit Murarka
Hey Sachit,

It shows the query plan, but it is difficult to work out the actual query from it.


Thanks,
Arpan Bhandari




Re: Spark SQL query

Arpan Bhandari
In reply to this post by Mich Talebzadeh
Hey Mich,

Thanks for the suggestions, but I don't see any such file created on the edge node.


Thanks,
Arpan Bhandari




Re: Spark SQL query

Sachit Murarka
In reply to this post by Arpan Bhandari
Hi Arpan,

When you type :history in spark-shell, is it still not showing?

Thanks
Sachit


Re: Spark SQL query

Arpan Bhandari
Sachit,

That shows all the queries that were executed, but how would they get mapped to the specific application ID each one was associated with?

Thanks,
Arpan Bhandari




Re: Spark SQL query

Sachit Murarka
Application-wise it won't show as such.
You can try to correlate it with the explain plan output using some filters or attributes.

Or else, if you do not have too many queries in the history, just take the queries, find the plans of those queries, and match them with the ones shown in the UI.
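A sketch of that plan-matching step, assuming the candidate statements have been copied out of the shell history (the `candidates` below are placeholders) and that the tables they reference still exist. Use only read-only SELECT statements here: `spark.sql` executes commands such as INSERT, CREATE, or DROP immediately, whereas for a SELECT it normally just builds the plan:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("plan-matching-sketch").getOrCreate()

// Placeholder candidates: replace with the SELECT statements found in the shell history
val candidates = Seq(
  "SELECT id FROM range(10) WHERE id > 5",
  "SELECT id % 3 AS bucket, count(*) AS n FROM range(100) GROUP BY id % 3"
)

// Build (but do not run) each query and keep its physical plan as text
val plans: Seq[(String, String)] =
  candidates.map(q => q -> spark.sql(q).queryExecution.executedPlan.treeString)

// Print each plan so it can be compared with the plan in the SQL tab's details
plans.foreach { case (q, plan) =>
  println(s"=== $q")
  println(plan)
}
```

Expression IDs such as `#12L` differ between sessions, so it is more reliable to compare operators, table names, and filter conditions than the exact plan text.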

I know that's a tedious task, but I don't think there is any other way.

Thanks
Sachit


Re: Spark SQL query

Mich Talebzadeh
In reply to this post by Arpan Bhandari
Hi Arpan,

Log in as any user that has execution rights for Spark, start spark-shell, run some simple commands, then exit. Go to the home directory of that user and look for this hidden file:


${HOME}/.spark_history

It will be there.

HTH,




Re: Spark SQL query

Arpan Bhandari
In reply to this post by Sachit Murarka
Sachit,

It seems I have to do some analysis of the plan to reconstruct the query.
I appreciate all your help and suggestions on this.


Thanks,
Arpan Bhandari




Re: Spark SQL query

Arpan Bhandari
In reply to this post by Mich Talebzadeh
Hi Mich,

I repeated the steps as suggested, but there is still no such file created in the home directory. Do we need to enable some property so that it gets created?


Thanks,
Arpan Bhandari




Re: Spark SQL query

Mich Talebzadeh
Hi Arpan,

I believe all interactive shells, including spark-shell and the Scala REPL, create a hidden history file.

Go to your home directory:

cd

# list all hidden files

ls -a | egrep '^\.'

If you are using Scala, do you see a .scala_history file?

.scala_history
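The same check can be done from Scala without copying the file to HDFS. A minimal sketch, assuming the history file holds one entry per line and that SQL was submitted through calls such as `spark.sql(...)`; the helper name `sqlLinesFromHistory` is hypothetical:

```scala
import java.io.File
import scala.io.Source

// Hypothetical helper: return the history entries that look like SQL submissions
def sqlLinesFromHistory(path: String): List[String] = {
  val src = Source.fromFile(path, "UTF-8")
  try src.getLines().filter(_.contains(".sql(")).toList
  finally src.close()
}

// The hidden history file discussed in this thread, in the current user's home directory
val historyFile = new File(sys.props("user.home"), ".scala_history")
if (historyFile.exists) sqlLinesFromHistory(historyFile.getPath).foreach(println)
else println(s"No history file at ${historyFile.getPath}")
```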

HTH




Re: Spark SQL query

Arpan Bhandari
Hi Mich,

I do see the .scala_history file, but it contains all the queries that have been executed until now. If I have to map a specific query to an application ID in YARN, this file does not provide that correlation, so this method alone won't suffice.

Thanks,
Arpan Bhandari
 




Re: Spark SQL query

Mich Talebzadeh
Ok

On the host running the job, port 8088 (the YARN ResourceManager UI): do you have access to all applications, as shown in the attached file? If you look at the history, can you see the jobs?

Also, check what you see when you follow the History link next to "Tracking URL:".

HTH


Attachment: Capture.PNG (260K)

Re: Spark SQL query

Arpan Bhandari
Yes, I can see the jobs on 8088 and also on the Spark History Server URL. The Spark History Server shows the plan details on the SQL tab, but not the query.


Thanks,
Arpan Bhandari




Re: Spark SQL query

Mich Talebzadeh
Create a directory in HDFS:

hdfs dfs -mkdir /spark_event_logs

Modify the file $SPARK_HOME/conf/spark-defaults.conf and add these two lines:

spark.eventLog.enabled=true
# do not use quotes below
spark.eventLog.dir=hdfs://rhes75:9000/spark_event_logs

Then run a job and check the directory:

hdfs dfs -ls /spark_event_logs

-rw-rw----   3 hduser supergroup   33795834 2021-02-02 19:48 /spark_event_logs/yarn-1612295234284

That should have all the info you need.

Make sure the directory hdfs://<NAME_NODE>:9000/spark_event_logs is writable by Spark.
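Each event log is a text file with one JSON event per line, so it can be inspected from spark-shell. A sketch, assuming an uncompressed log (spark.eventLog.compress is false by default) and that SQL executions are recorded as `org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart` events with `executionId`, `description`, and `physicalPlanDescription` fields; the helper names are hypothetical. Unless the query was labelled when it ran, `description` holds only the call site, so this recovers the plan and the application ID rather than the SQL text:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("event-log-sketch").getOrCreate()

// Hypothetical helper: the application ID(s) recorded in one event log file
def appIdsIn(logPath: String): Seq[String] =
  spark.read.json(logPath)
    .filter(col("Event") === "SparkListenerApplicationStart")
    .select(col("App ID"))
    .collect()
    .map(_.getString(0))
    .toSeq

// Hypothetical helper: one row per SQL execution, with its description and physical plan text
def sqlExecutionsIn(logPath: String): DataFrame =
  spark.read.json(logPath)
    .filter(col("Event") === "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart")
    .select("executionId", "description", "physicalPlanDescription")
```

Usage, with the log file from the listing above: `sqlExecutionsIn("/spark_event_logs/yarn-1612295234284").show(truncate = false)`.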


HTH





Re: Spark SQL query

Arpan Bhandari
Mich,

The directory is already there and event logs are being generated. I have checked them: they contain the query plan, but not the actual query.


Thanks,
Arpan Bhandari




Re: Spark SQL query

Mich Talebzadeh
I gather that what you are after is a code sniffer for Spark: some form of GUI that shows the code applications run against Spark.

I don't think Spark has this type of plug-in, although it would be potentially useful. Some RDBMSs provide this, usually storing the statements in some form of persistent storage or database. I have not come across it in Spark.
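Lacking such a plug-in, one workaround is to route SQL through a small wrapper that keeps its own audit trail keyed by application ID, which also covers the earlier question of mapping a query to its YARN application. A minimal sketch; the helper `auditedSql` and the audit file location are hypothetical, and only statements that actually go through the wrapper are recorded:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardOpenOption}
import java.time.Instant
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("sql-audit-sketch").getOrCreate()

// Hypothetical audit file location; pick shared, durable storage if the trail has to be kept
val auditPath = Paths.get(System.getProperty("java.io.tmpdir"), "spark_sql_audit.tsv")

// Hypothetical helper: append "applicationId <TAB> timestamp <TAB> sql" to the audit file, then build the DataFrame
def auditedSql(sqlText: String): DataFrame = {
  val appId = spark.sparkContext.applicationId
  val oneLine = sqlText.trim.replaceAll("\\s+", " ") // keep each statement on a single line
  val record = s"$appId\t${Instant.now()}\t$oneLine\n"
  Files.write(auditPath, record.getBytes(StandardCharsets.UTF_8),
    StandardOpenOption.CREATE, StandardOpenOption.APPEND)
  spark.sql(sqlText)
}
```

Because the application ID is written next to every statement, the file can later be filtered by the ID shown in YARN or in the History Server.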

HTH



