Spark hangs while reading from jdbc - does nothing

18 messages

Spark hangs while reading from jdbc - does nothing

Ruijing Li
Hi all,

I am on Spark 2.4.4 with Scala 2.11.12, running in cluster mode on Mesos. I am ingesting from an Oracle database using spark.read.jdbc. I am seeing a strange issue where Spark just hangs and does nothing, not starting any new tasks. Normally this job finishes in 30 stages, but sometimes it stops at 29 completed stages and doesn't start the last stage. The Spark job is idle and there are no pending or active tasks. What could be the problem? Thanks.
--
Cheers,
Ruijing Li

Re: Spark hangs while reading from jdbc - does nothing

Mich Talebzadeh
Hi,

Have you checked your JDBC connections from Spark to Oracle? What is Oracle saying? Is it doing anything, or hanging? The SQL*Plus script below lists the client sessions on the Oracle side, with their logical/physical I/O and session memory, which should show whether your Spark session is still active there:

set pagesize 9999
set linesize 140
set heading off
select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
set heading on
column spid heading "OS PID" format a6
column process format a13 heading "Client ProcID"
column username  format a15
column sid       format 999
column serial#   format 99999
column STATUS    format a3 HEADING 'ACT'
column last      format 9,999.99
column TotGets   format 999,999,999,999 HEADING 'Logical I/O'
column phyRds    format 999,999,999 HEADING 'Physical I/O'
column total_memory format 999,999,999 HEADING 'MEM/KB'
--
SELECT
          substr(a.username,1,15) "LOGIN"
        , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
        , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
        , substr(a.machine,1,10) HOST
        , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
        , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
        , substr(a.program,1,15) PROGRAM
        --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
        , (
                select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
                where ss.sid = a.sid and
                        sn.statistic# = ss.statistic# and
                        -- sn.name in ('session pga memory')
                        sn.name in ('session pga memory','session uga memory')
          ) AS total_memory
        , (b.block_gets + b.consistent_gets) TotGets
        , b.physical_reads phyRds
        , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
        , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1) THEN '<-- YOU' ELSE ' ' END "INFO"
FROM
         v$process p
        ,v$session a
        ,v$sess_io b
WHERE
a.paddr = p.addr
AND p.background IS NULL
--AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
AND a.sid = b.sid
AND a.username is not null
--AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
--AND CURRENT_DATE - logon_time > 0

--AND a.sid NOT IN ( select sid from v$mystat where rownum=1)  -- exclude me
--AND (b.block_gets + b.consistent_gets) > 0
ORDER BY a.username;
exit


HTH

Dr Mich Talebzadeh

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.



On Fri, 10 Apr 2020 at 17:37, Ruijing Li <[hidden email]> wrote:

Re: Spark hangs while reading from jdbc - does nothing

jane thorpe
In reply to this post by Ruijing Li

You seem to be implying the error is intermittent.

You also seem to be implying that data is being ingested via JDBC, so the connection has proven itself to be working, unless no data is arriving from the JDBC channel at all. If no data is arriving, then one could say the JDBC connection itself is the problem.

If the error is intermittent, then it is likely that some resource involved in the processing is filling to capacity.

Try reducing the data ingestion volume and see if the job completes, then increase the ingested volume incrementally.

I assume you have already run the job on a small amount of data, so you have completed your prototype stage successfully.
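The incremental approach above can be sketched as a binary search over ingestion volume. This is only a sketch: `run_job` is a hypothetical stand-in for submitting the Spark job with a row cap (for example a `WHERE ROWNUM <= :n` predicate pushed into the JDBC query), and it assumes the failure reproduces at a fixed threshold, which a truly intermittent hang may not.

```python
def max_passing_volume(run_job, lo, hi):
    """Binary-search the largest row count n in [lo, hi] for which
    run_job(n) succeeds, assuming success is monotone in n."""
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if run_job(mid):
            best = mid
            lo = mid + 1   # job completed: try a larger volume
        else:
            hi = mid - 1   # job hung or failed: back off
    return best

# Stub predicate standing in for a real spark-submit: pretend the job
# completes up to 3,500,000 rows and hangs above that.
stub = lambda n: n <= 3_500_000
print(max_passing_volume(stub, 1, 10_000_000))  # → 3500000
```

Each probe only needs a coarse pass/fail signal (job finished vs. had to be killed), so a handful of runs narrows the threshold quickly.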




On Saturday, 11 April 2020 Mich Talebzadeh <[hidden email]> wrote:


Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

jane thorpe


This tool may be useful to help you troubleshoot your problem:

https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html

"APM tools typically use a waterfall-type view to show the blocking time of different components cascading through the control flow within an application.
These types of visualizations are useful, and AppOptics has them, but they can be difficult to understand for those of us without a PhD."

Especially helpful if you want to understand through visualisation and you do not have a PhD.


Jane thorpe
[hidden email]


-----Original Message-----
From: jane thorpe <[hidden email]>
To: mich.talebzadeh <[hidden email]>; liruijing09 <[hidden email]>; user <[hidden email]>
CC: user <[hidden email]>
Sent: Sun, 12 Apr 2020 4:35
Subject: Re: Spark hangs while reading from jdbc - does nothing


Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

jane thorpe
Here is another tool I use: a logic analyser.

You could also take some suggestions for improving the performance of your queries:
https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1


Jane thorpe
[hidden email]


-----Original Message-----
From: jane thorpe <[hidden email]>
To: janethorpe1 <[hidden email]>; mich.talebzadeh <[hidden email]>; liruijing09 <[hidden email]>; user <[hidden email]>
Sent: Mon, 13 Apr 2020 8:32
Subject: Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting




Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Gabor Somogyi
The simplest way is to take a thread dump, which doesn't require any fancy tool (it's available in the Spark UI).
Without a thread dump it's hard to say anything...
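For a first pass over a dump (from the UI, or from `jstack <driver-pid>` on the driver host), counting the thread states gives a quick picture of whether anything is actually runnable. A minimal sketch; the dump fragment below is fabricated in the format jstack produces:

```python
import re
from collections import Counter

def thread_states(dump_text):
    """Count java.lang.Thread.State occurrences in a jstack-style dump."""
    return Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", dump_text))

# Fabricated two-thread fragment mimicking jstack output.
sample = '''
"dispatcher-event-loop-0" #25 daemon prio=5 tid=0x1 nid=0x2 waiting on condition
   java.lang.Thread.State: WAITING (parking)

"Executor task launch worker-0" #31 daemon prio=5 tid=0x3 nid=0x4 runnable
   java.lang.Thread.State: RUNNABLE
'''
print(thread_states(sample))
```

If everything is WAITING or TIMED_WAITING and nothing is RUNNABLE, the interesting question becomes what the waiting threads are parked on, which is in the stack frames below each state line.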


On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <[hidden email]> wrote:

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Ruijing Li
Once I do a thread dump, what should I be looking for to tell where it is hanging? I'm seeing a lot of TIMED_WAITING and WAITING threads on the driver. The driver is also being blocked by the Spark UI. If there are no tasks, is there any point in taking a thread dump of the executors?

On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <[hidden email]> wrote:
--
Cheers,
Ruijing Li

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Jungtaek Lim-2
Take thread dumps continuously, at some fixed interval (say every second), and watch how the stack and locks of each thread change between dumps. (This is not easy to do in the UI, so doing it manually may be the only option; I'm not sure whether the Spark UI provides it, as I haven't used it for this.)

It will tell you which thread is being blocked (even if it's shown as running) and which point to look at.
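Comparing consecutive dumps can be mechanised: a thread whose top stack frame never changes across snapshots is the likely hang point. A sketch under the assumption that each dump has already been reduced to a thread-name to top-frame map; the thread names and frames below are fabricated for illustration:

```python
def stuck_threads(dumps):
    """Given a list of {thread_name: top_stack_frame} snapshots taken at
    intervals, return the threads whose top frame never changed."""
    first, *rest = dumps
    return {name for name, frame in first.items()
            if all(d.get(name) == frame for d in rest)}

# Fabricated snapshots: "jdbc-fetch" never progresses, "heartbeat" does.
snapshots = [
    {"jdbc-fetch": "oracle.net.ns.Packet.receive", "heartbeat": "sleep"},
    {"jdbc-fetch": "oracle.net.ns.Packet.receive", "heartbeat": "poll"},
    {"jdbc-fetch": "oracle.net.ns.Packet.receive", "heartbeat": "sleep"},
]
print(stuck_threads(snapshots))  # → {'jdbc-fetch'}
```

The same idea works on raw jstack output with a little parsing; the point is that a blocked thread looks identical dump after dump, while healthy threads keep moving.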

On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <[hidden email]> wrote:
Once I do. thread dump, what should I be looking for to tell where it is hanging? Seeing a lot of timed_waiting and waiting on driver. Driver is also being blocked by spark UI. If there are no tasks, is there a point to do thread dump of executors?

On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <[hidden email]> wrote:
The simplest way is to do thread dump which doesn't require any fancy tool (it's available on Spark UI).
Without thread dump it's hard to say anything...


On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <[hidden email]> wrote:
Here a is another tool I use Logic Analyser  7:55

you could take some suggestions for improving performance  queries.


Jane thorpe
[hidden email]


-----Original Message-----
From: jane thorpe <[hidden email]>
To: janethorpe1 <[hidden email]>; mich.talebzadeh <[hidden email]>; liruijing09 <[hidden email]>; user <[hidden email]>
Sent: Mon, 13 Apr 2020 8:32
Subject: Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting



This tool may be useful for troubleshooting your problems: https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html

"APM tools typically use a waterfall-type view to show the blocking time of different components cascading through the control flow within an application.
These types of visualizations are useful, and AppOptics has them, but they can be difficult to understand for those of us without a PhD."

Especially helpful if you want to understand through visualisation and you do not have a PhD.


Jane thorpe
[hidden email]


-----Original Message-----
From: jane thorpe <[hidden email]>
To: mich.talebzadeh <[hidden email]>; liruijing09 <[hidden email]>; user <[hidden email]>
CC: user <[hidden email]>
Sent: Sun, 12 Apr 2020 4:35
Subject: Re: Spark hangs while reading from jdbc - does nothing

You seem to be implying the error is intermittent.
You seem to be implying data is being ingested via JDBC, so the connection has proven itself to work unless no data is arriving from the JDBC channel at all. If no data is arriving, then one could blame JDBC.
If the error is intermittent, then it is likely that a resource involved in processing is filling to capacity.
Try reducing the data ingestion volume and see if the job completes, then increase the ingested data incrementally.
I assume you have run the job on a small amount of data, so you have completed your prototype stage successfully.


On Saturday, 11 April 2020 Mich Talebzadeh <[hidden email]> wrote:
Hi,

Have you checked your JDBC connections from Spark to Oracle. What is Oracle saying? Is it doing anything or hanging?

set pagesize 9999
set linesize 140
set heading off
select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
set heading on
column spid heading "OS PID" format a6
column process format a13 heading "Client ProcID"
column username  format a15
column sid       format 999
column serial#   format 99999
column STATUS    format a3 HEADING 'ACT'
column last      format 9,999.99
column TotGets   format 999,999,999,999 HEADING 'Logical I/O'
column phyRds    format 999,999,999 HEADING 'Physical I/O'
column total_memory format 999,999,999 HEADING 'MEM/KB'
--
SELECT
          substr(a.username,1,15) "LOGIN"
        , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
        , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
        , substr(a.machine,1,10) HOST
        , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
        , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
        , substr(a.program,1,15) PROGRAM
        --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
        , (
                select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
                where ss.sid = a.sid and
                        sn.statistic# = ss.statistic# and
                        -- sn.name in ('session pga memory')
                        sn.name in ('session pga memory','session uga memory')
          ) AS total_memory
        , (b.block_gets + b.consistent_gets) TotGets
        , b.physical_reads phyRds
        , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
        , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1) THEN '<-- YOU' ELSE ' ' END "INFO"
FROM
         v$process p
        ,v$session a
        ,v$sess_io b
WHERE
a.paddr = p.addr
AND p.background IS NULL
--AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
AND a.sid = b.sid
AND a.username is not null
--AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
--AND CURRENT_DATE - logon_time > 0

--AND a.sid NOT IN ( select sid from v$mystat where rownum=1)  -- exclude me
--AND (b.block_gets + b.consistent_gets) > 0
ORDER BY a.username;
exit


HTH

Dr Mich Talebzadeh

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Fri, 10 Apr 2020 at 17:37, Ruijing Li <[hidden email]> wrote:
Hi all,

I am on spark 2.4.4 and using scala 2.11.12, and running cluster mode on mesos. I am ingesting from an oracle database using spark.read.jdbc. I am seeing a strange issue where spark just hangs and does nothing, not starting any new tasks. Normally this job finishes in 30 stages but sometimes it stops at 29 completed stages and doesn’t start the last stage. The spark job is idling and there is no pending or active task. What could be the problem? Thanks.
--
Cheers,
Ruijing Li
--
Cheers,
Ruijing Li
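On the Spark side, it can also help to make the JDBC read itself bounded and parallel, so one slow Oracle session can't stall the whole stage silently. A hedged sketch only — the URL, schema, table, and column names below are placeholders, not taken from this thread:

```scala
// Placeholder connection details -- substitute your own. `partitionColumn`
// with bounds splits the read into parallel tasks, and `queryTimeout` makes
// a hung query fail loudly instead of idling forever (this option is
// supported by the JDBC source from Spark 2.4 onward).
val jdbcOptions = Map(
  "url"             -> "jdbc:oracle:thin:@//dbhost:1521/SERVICE", // placeholder
  "dbtable"         -> "MYSCHEMA.MYTABLE",                        // placeholder
  "user"            -> "scott",                                   // placeholder
  "fetchsize"       -> "10000",   // rows per round trip; the Oracle default is small
  "numPartitions"   -> "8",       // parallel JDBC connections
  "partitionColumn" -> "ID",      // numeric column to split on (placeholder)
  "lowerBound"      -> "1",
  "upperBound"      -> "1000000",
  "queryTimeout"    -> "3600"     // seconds before the driver cancels the statement
)
// val df = spark.read.format("jdbc").options(jdbcOptions).load()
```

With partitioning enabled, a single hung partition shows up as one stuck task rather than a stage that never starts.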

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

ZHANG Wei
The thread dump table in the Spark UI can provide some clues for finding thread lock issues, such as:

  Thread ID | Thread Name                  | Thread State | Thread Locks
  13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951})
  48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951})

Each thread row shows its call stack when clicked, so you can check the root cause of the held lock, like this (Thread 48 above):

  org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
  org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
  org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
  org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
  jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
  <snip...>

Hope it can help you.

--
Cheers,
-z


Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Ruijing Li
Strangely enough, I found an old issue that is exactly the same as mine.

However, I'm using Spark 2.4.4, so the issue should have been solved by now.

Like the user in the JIRA issue I am using Mesos, but I am reading from Oracle instead of writing to Cassandra and S3.


--
Cheers,
Ruijing Li

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Ruijing Li
In the thread dump, I do see this:
- SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
- SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
- SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock

Could the fact that 160 holds the monitor but is not running be causing a deadlock that prevents the job from finishing?

I do see my Finalizer and main threads are WAITING. I don't see any other threads from third-party libraries or my code in the dump. The Spark context cleaner is in TIMED_WAITING.

Thanks
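Whether BLOCKED threads like the acceptors above form a true deadlock can be checked directly: the JVM's own detector only reports cycles of threads each waiting on a lock another member of the cycle holds, while ordinary rotating contention (such as Jetty acceptors sharing an accept monitor) is not reported. A hedged sketch — the helper name is my own:

```scala
import java.lang.management.ManagementFactory

// Ask the JVM whether any threads are deadlocked on each other.
// Returns an empty Seq when there is no deadlock cycle.
def findDeadlocks(): Seq[String] = {
  val mx = ManagementFactory.getThreadMXBean
  Option(mx.findDeadlockedThreads()) match {   // null means "no deadlocks"
    case None      => Seq.empty
    case Some(ids) =>
      mx.getThreadInfo(ids).iterator
        .filter(_ != null)
        .map(i => s"${i.getThreadName} waits on ${i.getLockName} " +
                  s"held by ${i.getLockOwnerName}")
        .toSeq
  }
}
```

If this returns empty while the job is hung, the acceptor lock rotation is likely normal contention, and the cause of the hang is elsewhere (e.g. a thread in WAITING that never gets signalled).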


--
Cheers,
Ruijing Li
Reply | Threaded
Open this post in threaded view
|

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Ruijing Li
After refreshing a couple of times, I notice the lock is being passed among these three threads. The other two are blocked by whichever one holds it, in a cycle: 160 has the lock -> 161 -> 159 -> 160.
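For what it's worth, a small script can distinguish plain contention (several threads waiting on one holder) from a true lock-wait cycle in such a "blocked by" table. This is only a sketch; the thread IDs below reuse the SparkUI acceptor IDs from this thread as a hypothetical example:

```python
# Walk a {waiter: holder} "blocked by" mapping (as shown in the Spark UI
# thread dump table) and return the first lock-wait cycle found, or None.
def find_cycle(blocked_by):
    for start in blocked_by:
        seen = []
        t = start
        while t in blocked_by and t not in seen:
            seen.append(t)
            t = blocked_by[t]
        if t in seen:  # walked back onto our own path: a genuine cycle
            return seen[seen.index(t):] + [t]
    return None

# Both acceptors waiting on 160 is contention, not a deadlock:
print(find_cycle({161: 160, 159: 160}))            # None
# A genuine deadlock would look like this instead:
print(find_cycle({160: 161, 161: 159, 159: 160}))  # [160, 161, 159, 160]
```

If the mapping never closes on itself at any single instant, the UI threads are merely contending for one monitor rather than deadlocked.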

On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <[hidden email]> wrote:
In the thread dump, I do see this:
- SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
- SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
- SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock

Could the fact that 160 holds the monitor but is not running be causing a deadlock that prevents the job from finishing?

I do see that my Finalizer and main threads are waiting. I don’t see any other threads from third-party libraries or my code in the dump. I do see the Spark context cleaner in timed waiting.

Thanks


On Tue, Apr 21, 2020 at 9:58 AM Ruijing Li <[hidden email]> wrote:
Strangely enough, I found an old issue that is exactly the same as mine.

However, I’m using Spark 2.4.4, so that issue should have been solved by now.

Like the user in the jira issue I am using mesos, but I am reading from oracle instead of writing to Cassandra and S3.


On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <[hidden email]> wrote:
The thread dump table in the Spark UI can provide some clues for finding thread lock issues, such as:

  Thread ID | Thread Name                  | Thread State | Thread Locks
  13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951})
  48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951})

Each thread row shows its call stack when clicked, so you can check the root cause of the held lock, like this (thread 48 above):

  org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
  org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
  org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
  org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
  jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
  <snip...>

Hope it can help you.
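When eyeballing that table gets tedious, the "Blocked by Thread Some(N)" column can also be scraped programmatically. A minimal sketch in Python, where the row format is an assumption based only on the sample table above:

```python
import re

# Extract {waiter_id: holder_id} from thread dump table rows of the form
# "ID | Name | STATE | Blocked by Thread Some(N) Lock(...)".
def blocked_map(rows):
    pattern = re.compile(r"^\s*(\d+)\s*\|.*(?:BLOCKED|WAITING).*Some\((\d+)\)")
    result = {}
    for row in rows:
        m = pattern.search(row)
        if m:
            result[int(m.group(1))] = int(m.group(2))
    return result

rows = [
    "13 | NonBlockingInputStreamThread | WAITING  | Blocked by Thread Some(48) Lock(jline...)",
    "48 | Thread-16                    | RUNNABLE | Monitor(jline...)",
]
print(blocked_map(rows))  # {13: 48}
```

RUNNABLE rows without a "Some(N)" annotation are simply skipped, so the result maps only waiters to the threads holding them up.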

--
Cheers,
-z

On Thu, 16 Apr 2020 16:36:42 +0900
Jungtaek Lim <[hidden email]> wrote:

> Take thread dumps continuously, at a fixed interval (like 1s), and watch the
> change of stack / lock for each thread. (This is not easy to do in the UI,
> so doing it manually may be the only option. I'm not sure whether the Spark
> UI provides the same; I haven't used it at all.)
>
> It will tell you which thread is being blocked (even if it's shown as running) and
> at which point to look.
>
> On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <[hidden email]> wrote:
>
> > Once I do a thread dump, what should I be looking for to tell where it is
> > hanging? I'm seeing a lot of timed_waiting and waiting on the driver. The driver is
> > also being blocked by the Spark UI. If there are no tasks, is there a point in
> > taking a thread dump of the executors?
> >
> > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <[hidden email]>
> > wrote:
> >
> >> The simplest way is to take a thread dump, which doesn't require any fancy
> >> tool (it's available in the Spark UI).
> >> Without a thread dump it's hard to say anything...
> >>
> >>
> >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <[hidden email]>
> >> wrote:
> >>
> >>> Here is another tool I use, Logic Analyser (7:55):
> >>> https://youtu.be/LnzuMJLZRdU
> >>>
> >>> You could also take some suggestions for improving query performance:
> >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
> >>>
> >>>
> >>> Jane thorpe
> >>> [hidden email]
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: jane thorpe <[hidden email]>
> >>> To: janethorpe1 <[hidden email]>; mich.talebzadeh <
> >>> [hidden email]>; liruijing09 <[hidden email]>; user <
> >>> [hidden email]>
> >>> Sent: Mon, 13 Apr 2020 8:32
> >>> Subject: Re: Spark hangs while reading from jdbc - does nothing Removing
> >>> Guess work from trouble shooting
> >>>
> >>>
> >>>
> >>> This tool may be useful for you to trouble shoot your problems away.
> >>>
> >>>
> >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
> >>>
> >>>
> >>> "APM tools typically use a waterfall-type view to show the blocking
> >>> time of different components cascading through the control flow within an
> >>> application.
> >>> These types of visualizations are useful, and AppOptics has them, but
> >>> they can be difficult to understand for those of us without a PhD."
> >>>
> >>> Especially helpful if you want to understand through visualisation and
> >>> you do not have a PhD.
> >>>
> >>>
> >>> Jane thorpe
> >>> [hidden email]
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: jane thorpe <[hidden email]>
> >>> To: mich.talebzadeh <[hidden email]>; liruijing09 <
> >>> [hidden email]>; user <[hidden email]>
> >>> CC: user <[hidden email]>
> >>> Sent: Sun, 12 Apr 2020 4:35
> >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
> >>>
> >>> You seem to be implying the error is intermittent.
> >>> You seem to be implying data is being ingested via JDBC, so the
> >>> connection has proven itself to be working unless no data is arriving from
> >>> the JDBC channel at all. If no data is arriving, then one could blame
> >>> the JDBC side.
> >>> If the error is intermittent, then it is likely that a resource involved in
> >>> processing is filling to capacity.
> >>> Try reducing the data ingestion volume and see if the job completes, then
> >>> increase the data ingested incrementally.
> >>> I assume you have run the job on a small amount of data, so you have
> >>> completed your prototype stage successfully.
> >>>
> >>> ------------------------------
> >>> On Saturday, 11 April 2020 Mich Talebzadeh <[hidden email]> wrote:
> >>> <snip: query and original message quoted above>
> > Cheers,
> > Ruijing Li
> >
--
Cheers,
Ruijing Li
Reply | Threaded
Open this post in threaded view
|

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Jungtaek Lim-2
If there are no third-party libraries in the dump, then why not share the thread dump (I mean, the output of jstack)?

A stack trace would be more helpful for finding which thread acquired the lock and which other threads are waiting to acquire it, if we suspect a deadlock.
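Taking two jstack snapshots a few seconds apart and diffing the top frames is one low-tech way to act on this advice. A sketch follows; the dump parsing is a rough assumption about jstack's text format, not a robust parser:

```python
# Compare two successive jstack snapshots (as text) and report thread names
# whose top stack frame did not change between dumps -- hang candidates.
def stuck_threads(dump_a, dump_b):
    def tops(dump):
        result, name = {}, None
        for line in dump.splitlines():
            if line.startswith('"'):          # thread header: "name" #id ...
                name = line.split('"')[1]
            elif name and line.strip().startswith("at ") and name not in result:
                result[name] = line.strip()   # record the topmost frame only
        return result
    a, b = tops(dump_a), tops(dump_b)
    return sorted(n for n in a if b.get(n) == a[n])

# Hypothetical snapshots: "main" is stuck, "worker" made progress.
dump_a = '''"main" #1 prio=5
   java.lang.Thread.State: WAITING
\tat java.lang.Object.wait(Native Method)
\tat Example.run(Example.java:10)
"worker" #2 prio=5
\tat Worker.poll(Worker.java:5)
'''
dump_b = dump_a.replace("Worker.poll(Worker.java:5)", "Worker.handle(Worker.java:9)")
print(stuck_threads(dump_a, dump_b))  # ['main']
```

A thread parked on the same frame across many samples is worth inspecting even if jstack reports it as RUNNABLE.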

On Wed, Apr 22, 2020 at 2:38 AM Ruijing Li <[hidden email]> wrote:
After refreshing a couple of times, I notice the lock is being swapped between these 3. The other 2 will be blocked by whoever gets this lock, in a cycle of 160 has lock -> 161 -> 159 -> 160

<snip: rest of earlier thread quoted in full>
Reply | Threaded
Open this post in threaded view
|

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Ruijing Li
I apologize, but I cannot share it, even though it is just typical Spark libraries. I definitely understand that limits debugging help, but I wanted to know whether anyone has encountered a similar issue.

On Tue, Apr 21, 2020 at 7:12 PM Jungtaek Lim <[hidden email]> wrote:
If there's no third party libraries in the dump then why not share the thread dump? (I mean, the output of jstack)

stack trace would be more helpful to find which thing acquired lock and which other things are waiting for acquiring lock, if we suspect deadlock.

<snip: rest of earlier thread quoted in full>

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Jungtaek Lim-2
No, that's not a thing to apologize for. It's just your call - less context would bring less reaction and interest.

On Wed, Apr 22, 2020 at 11:50 AM Ruijing Li <[hidden email]> wrote:
I apologize, but I cannot share it, even if it is just typical spark libraries. I definitely understand that limits debugging help, but wanted to understand if anyone has encountered a similar issue.

On Tue, Apr 21, 2020 at 7:12 PM Jungtaek Lim <[hidden email]> wrote:
If there are no third-party libraries in the dump, then why not share the thread dump? (I mean, the output of jstack.)

A stack trace would be more helpful for finding which thread acquired the lock and which other threads are waiting to acquire it, if we suspect a deadlock.

On Wed, Apr 22, 2020 at 2:38 AM Ruijing Li <[hidden email]> wrote:
After refreshing a couple of times, I notice the lock is being swapped between these 3. The other 2 will be blocked by whichever thread gets this lock, in a cycle: 160 has the lock -> 161 -> 159 -> 160

On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <[hidden email]> wrote:
In the thread dump, I do see this:
- SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
- SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
- SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock

Could the fact that 160 has the monitor but is not running be causing a deadlock preventing the job from finishing?

I do see that my Finalizer and main threads are waiting. I don’t see any other threads from third-party libraries or my code in the dump. I do see the Spark context cleaner is in timed waiting.

Thanks
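One way to tell ordinary lock contention apart from a real deadlock is to build a wait-for graph from the "Blocked by Thread" entries and look for a cycle. A minimal Python sketch, assuming the simplified line format shown above (the regex and thread naming here are illustrative assumptions, not actual Spark UI output):

```python
import re

def find_deadlock(dump_lines):
    """Build a wait-for graph from 'Blocked by Thread(Some(N))' entries
    and report any cycle (a true deadlock) among the blocked threads."""
    waits_for = {}  # blocked thread id -> id of the thread holding the lock
    pat = re.compile(r"SparkUI-(\d+).*Blocked by Thread\(Some\((\d+)\)\)")
    for line in dump_lines:
        m = pat.search(line)
        if m:
            waits_for[int(m.group(1))] = int(m.group(2))
    # Follow the chain from each blocked thread; revisiting a node is a cycle.
    for start in waits_for:
        seen, cur = set(), start
        while cur in waits_for:
            if cur in seen:
                return sorted(seen)  # the cycle participants
            seen.add(cur)
            cur = waits_for[cur]
    return None  # no cycle: plain contention, not a deadlock

dump = [
    "SparkUI-160-acceptor | RUNNABLE | Monitor",
    "SparkUI-161-acceptor | BLOCKED | Blocked by Thread(Some(160)) Lock",
    "SparkUI-159-acceptor | BLOCKED | Blocked by Thread(Some(160)) Lock",
]
print(find_deadlock(dump))  # None: both blocked threads wait on runnable 160
```

With the three acceptor threads above, both BLOCKED threads wait on the RUNNABLE thread 160 and no cycle exists, so this pattern alone is contention rather than a deadlock.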


On Tue, Apr 21, 2020 at 9:58 AM Ruijing Li <[hidden email]> wrote:
Strangely enough I found an old issue that is the exact same issue as mine:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343

However I’m using spark 2.4.4 so the issue should have been solved by now.

Like the user in the jira issue I am using mesos, but I am reading from oracle instead of writing to Cassandra and S3.


On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <[hidden email]> wrote:
The Thread dump table in the Spark UI can provide some clues for finding thread lock issues, such as:

  Thread ID | Thread Name                  | Thread State | Thread Locks
  13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951})
  48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951})

And each thread row, when clicked, shows its call stack, so you can check the root cause of the held lock, like this (Thread 48 above):

  org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
  org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
  org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
  org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
  jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
  <snip...>

Hope it can help you.

--
Cheers,
-z

On Thu, 16 Apr 2020 16:36:42 +0900
Jungtaek Lim <[hidden email]> wrote:

> Do thread dump continuously, per specific period (like 1s) and see the
> change of stack / lock for each thread. (This is not easy to be done in UI
> so maybe doing manually would be the only option. Not sure Spark UI will
> provide the same, haven't used at all.)
>
> It will tell which thread is being blocked (even it's shown as running) and
> which point to look at.
>
> On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <[hidden email]> wrote:
>
> > Once I do. thread dump, what should I be looking for to tell where it is
> > hanging? Seeing a lot of timed_waiting and waiting on driver. Driver is
> > also being blocked by spark UI. If there are no tasks, is there a point to
> > do thread dump of executors?
> >
> > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <[hidden email]>
> > wrote:
> >
> >> The simplest way is to do thread dump which doesn't require any fancy
> >> tool (it's available on Spark UI).
> >> Without thread dump it's hard to say anything...
> >>
> >>
> >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <[hidden email]>
> >> wrote:
> >>
> >>> Here a is another tool I use Logic Analyser  7:55
> >>> https://youtu.be/LnzuMJLZRdU
> >>>
> >>> you could take some suggestions for improving performance  queries.
> >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
> >>>
> >>>
> >>> Jane thorpe
> >>> [hidden email]
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: jane thorpe <[hidden email]>
> >>> To: janethorpe1 <[hidden email]>; mich.talebzadeh <
> >>> [hidden email]>; liruijing09 <[hidden email]>; user <
> >>> [hidden email]>
> >>> Sent: Mon, 13 Apr 2020 8:32
> >>> Subject: Re: Spark hangs while reading from jdbc - does nothing Removing
> >>> Guess work from trouble shooting
> >>>
> >>>
> >>>
> >>> This tool may be useful for you to trouble shoot your problems away.
> >>>
> >>>
> >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
> >>>
> >>>
> >>> "APM tools typically use a waterfall-type view to show the blocking
> >>> time of different components cascading through the control flow within an
> >>> application.
> >>> These types of visualizations are useful, and AppOptics has them, but
> >>> they can be difficult to understand for those of us without a PhD."
> >>>
> >>> Especially  helpful if you want to understand through visualisation and
> >>> you do not have a phD.
> >>>
> >>>
> >>> Jane thorpe
> >>> [hidden email]
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: jane thorpe <[hidden email]>
> >>> To: mich.talebzadeh <[hidden email]>; liruijing09 <
> >>> [hidden email]>; user <[hidden email]>
> >>> CC: user <[hidden email]>
> >>> Sent: Sun, 12 Apr 2020 4:35
> >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
> >>>
> >>> You seem to be implying the error is intermittent.
> >>> You seem to be implying data is being ingested  via JDBC. So the
> >>> connection has proven itself to be working unless no data is arriving from
> >>> the  JDBC channel at all.  If no data is arriving then one could say it
> >>> could be  the JDBC.
> >>> If the error is intermittent  then it is likely a resource involved in
> >>> processing is filling to capacity.
> >>> Try reducing the data ingestion volume and see if that completes, then
> >>> increase the data ingested  incrementally.
> >>> I assume you have  run the job on small amount of data so you have
> >>> completed your prototype stage successfully.
> >>>
> >>> ------------------------------
> >>> On Saturday, 11 April 2020 Mich Talebzadeh <[hidden email]>
> >>> wrote:
> >>> Hi,
> >>>
> >>> Have you checked your JDBC connections from Spark to Oracle. What is
> >>> Oracle saying? Is it doing anything or hanging?
> >>>
> >>> set pagesize 9999
> >>> set linesize 140
> >>> set heading off
> >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE,
> >>> 'MON DD YYYY HH:MI AM') from v$database;
> >>> set heading on
> >>> column spid heading "OS PID" format a6
> >>> column process format a13 heading "Client ProcID"
> >>> column username  format a15
> >>> column sid       format 999
> >>> column serial#   format 99999
> >>> column STATUS    format a3 HEADING 'ACT'
> >>> column last      format 9,999.99
> >>> column TotGets   format 999,999,999,999 HEADING 'Logical I/O'
> >>> column phyRds    format 999,999,999 HEADING 'Physical I/O'
> >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
> >>> --
> >>> SELECT
> >>>           substr(a.username,1,15) "LOGIN"
> >>>         , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS
> >>> "SID/serial#"
> >>>         , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
> >>>         , substr(a.machine,1,10) HOST
> >>>         , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
> >>>         , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
> >>>         , substr(a.program,1,15) PROGRAM
> >>>         --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
> >>>         , (
> >>>                 select round(sum(ss.value)/1024) from v$sesstat ss,
> >>> v$statname sn
> >>>                 where ss.sid = a.sid and
> >>>                         sn.statistic# = ss.statistic# and
> >>>                         -- sn.name in ('session pga memory')
> >>>                         sn.name in ('session pga memory','session uga
> >>> memory')
> >>>           ) AS total_memory
> >>>         , (b.block_gets + b.consistent_gets) TotGets
> >>>         , b.physical_reads phyRds
> >>>         , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
> >>>         , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1)
> >>> THEN '<-- YOU' ELSE ' ' END "INFO"
> >>> FROM
> >>>          v$process p
> >>>         ,v$session a
> >>>         ,v$sess_io b
> >>> WHERE
> >>> a.paddr = p.addr
> >>> AND p.background IS NULL
> >>> --AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
> >>> AND a.sid = b.sid
> >>> AND a.username is not null
> >>> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
> >>> --AND CURRENT_DATE - logon_time > 0
> >>> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1)  --
> >>> exclude me
> >>> --AND (b.block_gets + b.consistent_gets) > 0
> >>> ORDER BY a.username;
> >>> exit
> >>>
> >>> HTH
> >>>
> >>> Dr Mich Talebzadeh
> >>>
> >>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
> >>>
> >>> http://talebzadehmich.wordpress.com
> >>>
> >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> >>> any loss, damage or destruction of data or any other property which may
> >>> arise from relying on this email's technical content is explicitly
> >>> disclaimed. The author will in no case be liable for any monetary damages
> >>> arising from such loss, damage or destruction.
> >>>
> >>>
> >>>
> >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <[hidden email]> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I am on spark 2.4.4 and using scala 2.11.12, and running cluster mode on
> >>> mesos. I am ingesting from an oracle database using spark.read.jdbc. I am
> >>> seeing a strange issue where spark just hangs and does nothing, not
> >>> starting any new tasks. Normally this job finishes in 30 stages but
> >>> sometimes it stops at 29 completed stages and doesn’t start the last stage.
> >>> The spark job is idling and there is no pending or active task. What could
> >>> be the problem? Thanks.
> >>> --
> >>> Cheers,
> >>> Ruijing Li
> >>>
> >>> --
> > Cheers,
> > Ruijing Li
> >
--
Cheers,
Ruijing Li

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

ZHANG Wei
That's not a deadlock. The three threads are simply contending for the same monitor lock: one has acquired it, and the others are waiting for it to be released. It's a common scenario. You have to check the monitor lock object against the call stack's source code; there should be some operation performed after the lock is held, and some cause for waiting, which could be a hint.
> After refreshing a couple of times, I notice the lock is being swapped between these 3. The other 2 will be blocked by whoever gets this lock, in a cycle of 160 has lock -> 161 -> 159 -> 160

But back to your case, those are normal SparkUI HTTP service behaviors:
> - SparkUI-160- acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
> - SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
> -  SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock

Since I don't have your thread dump, I'll use a dump from my local environment as a demo, with comments inline:

  29 SparkUI-29-acceptor-0@105d657c-ServerConnector@1e9a8907{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} RUNNABLE Monitor(java.lang.Object@1195548344})
  sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
                                           ^^^^^^^^^^^^^^ The native accept0() can be understood as the Linux accept() [1]; this call can block.
  sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
  sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250) => holding Monitor(java.lang.Object@1195548344})
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I can find `synchronized(this.lock)` in this source location context
  org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:397)
  org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:601)
  org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
  org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
  java.lang.Thread.run(Thread.java:748)

So you might have to look for other clues: let those "SparkUI-" threads go, and check the logs to see whether anything is still making progress.
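Following Jungtaek's earlier suggestion of taking dumps periodically, one way to separate legitimately parked threads (like these acceptors) from genuinely stuck ones is to diff consecutive dumps and flag threads whose top stack frame never changes. A rough sketch; the snapshot format here (a name-to-frames mapping) is a simplification assumed for illustration, not jstack's actual output:

```python
def stuck_threads(dumps):
    """Given a list of snapshots {thread_name: [stack frames, top first]},
    return names whose top frame is identical in every snapshot."""
    if len(dumps) < 2:
        return []
    candidates = set(dumps[0])
    for snap in dumps[1:]:
        candidates = {
            name for name in candidates
            if name in snap and snap[name][:1] == dumps[0][name][:1]
        }
    return sorted(candidates)

snap1 = {"main": ["Object.wait", "Foo.run"], "worker": ["Bar.step1"]}
snap2 = {"main": ["Object.wait", "Foo.run"], "worker": ["Bar.step2"]}
print(stuck_threads([snap1, snap2]))  # ['main']
```

Note that healthy parked threads (such as the acceptors blocked in accept()) also keep a constant top frame, so treat the result as a shortlist to inspect, not a verdict.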

Just my 2 cents.

--
Cheers,
-z

[1] -- https://linux.die.net/man/2/accept

________________________________________
From: Jungtaek Lim <[hidden email]>
Sent: Wednesday, April 22, 2020 11:21
To: Ruijing Li
Cc: Gabor Somogyi; Mich Talebzadeh; ZHANG Wei; user
Subject: Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting


---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Waleed Fateem
In reply to this post by Ruijing Li
Are you running this in local mode? If not, are you even sure that the hanging is occurring on the driver's side?

Did you check the Spark UI to see if there is a straggler task? If you do have a straggler/hanging task, and the application is not running in local mode, then you need to get the Java thread dump of the executor's JVM process. Once you do, you'll want to review the "Executor task launch worker for task XYZ" thread, where XYZ is an integer representing the ID of the task launched on that executor. If you are running in local mode, that thread will be located in the same Java thread dump you have already collected.
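To locate those threads quickly in a raw jstack output, a small filter can pull out just the executor task-worker stacks. A sketch assuming the common jstack text layout in which each thread block starts with a quoted thread name (the sample dump below is fabricated for illustration):

```python
def executor_task_stacks(jstack_text, marker="Executor task launch worker"):
    """Split raw jstack output into per-thread blocks and keep only
    the blocks whose thread-name line contains the given marker."""
    blocks, current = [], []
    for line in jstack_text.splitlines():
        if line.startswith('"'):           # a new thread block starts here
            if current:
                blocks.append("\n".join(current))
            current = [line]
        elif current:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return [b for b in blocks if marker in b.splitlines()[0]]

dump = '''"main" #1 prio=5
   java.lang.Thread.State: WAITING
"Executor task launch worker for task 42" #99 daemon
   java.lang.Thread.State: RUNNABLE
\tat java.net.SocketInputStream.read'''
print(len(executor_task_stacks(dump)))  # 1
```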


On Tue, Apr 21, 2020 at 9:51 PM Ruijing Li <[hidden email]> wrote:
I apologize, but I cannot share it, even if it is just typical spark libraries. I definitely understand that limits debugging help, but wanted to understand if anyone has encountered a similar issue.

On Tue, Apr 21, 2020 at 7:12 PM Jungtaek Lim <[hidden email]> wrote:
If there's no third party libraries in the dump then why not share the thread dump? (I mean, the output of jstack)

stack trace would be more helpful to find which thing acquired lock and which other things are waiting for acquiring lock, if we suspect deadlock.

On Wed, Apr 22, 2020 at 2:38 AM Ruijing Li <[hidden email]> wrote:
After refreshing a couple of times, I notice the lock is being swapped between these 3. The other 2 will be blocked by whoever gets this lock, in a cycle of 160 has lock -> 161 -> 159 -> 160

On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <[hidden email]> wrote:
In thread dump, I do see this 
- SparkUI-160- acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor 
- SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
-  SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock

Could the fact that 160 has the monitor but is not running be causing a deadlock preventing the job from finishing?

I do see my Finalizer and main method are waiting. I don’t see any other threads from 3rd party libraries or my code in the dump. I do see spark context cleaner has timed waiting.

Thanks


On Tue, Apr 21, 2020 at 9:58 AM Ruijing Li <[hidden email]> wrote:
Strangely enough I found an old issue that is the exact same issue as mine 

However I’m using spark 2.4.4 so the issue should have been solved by now.

Like the user in the jira issue I am using mesos, but I am reading from oracle instead of writing to Cassandra and S3.


On Thu, Apr 16, 2020 at 1:54 AM ZHANG Wei <[hidden email]> wrote:
The Thread dump table in the Spark UI can provide some clues for finding thread lock issues, such as:

  Thread ID | Thread Name                  | Thread State | Thread Locks
  13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951})
  48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951})

And each thread row, when clicked, shows its call stack, so you can check the root cause of the held lock, like this (Thread 48 above):

  org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
  org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
  org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
  org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
  jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
  <snip...>

Hope it can help you.

--
Cheers,
-z

On Thu, 16 Apr 2020 16:36:42 +0900
Jungtaek Lim <[hidden email]> wrote:

> Take thread dumps continuously, at a fixed interval (like 1s), and watch the
> change of stack / lock for each thread. (This is not easy to do in the UI,
> so doing it manually may be the only option. I'm not sure whether the Spark
> UI provides the same; I haven't used it.)
>
> It will tell you which thread is being blocked (even if it's shown as
> running) and where to look.
>
> On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <[hidden email]> wrote:
>
> > Once I do a thread dump, what should I be looking for to tell where it is
> > hanging? I'm seeing a lot of TIMED_WAITING and WAITING on the driver. The
> > driver is also being blocked by the Spark UI. If there are no tasks, is
> > there a point in taking thread dumps of the executors?
> >
> > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <[hidden email]>
> > wrote:
> >
> >> The simplest way is to take a thread dump, which doesn't require any
> >> fancy tool (it's available in the Spark UI).
> >> Without a thread dump it's hard to say anything...
> >>
> >>
> >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <[hidden email]>
> >> wrote:
> >>
> >>> Here is another tool I use: Logic Analyser (7:55)
> >>> https://youtu.be/LnzuMJLZRdU
> >>>
> >>> You could also take some suggestions for improving query performance:
> >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
> >>>
> >>>
> >>> Jane thorpe
> >>> [hidden email]
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: jane thorpe <[hidden email]>
> >>> To: janethorpe1 <[hidden email]>; mich.talebzadeh <
> >>> [hidden email]>; liruijing09 <[hidden email]>; user <
> >>> [hidden email]>
> >>> Sent: Mon, 13 Apr 2020 8:32
> >>> Subject: Re: Spark hangs while reading from jdbc - does nothing Removing
> >>> Guess work from trouble shooting
> >>>
> >>>
> >>>
> >>> This tool may be useful for you to trouble shoot your problems away.
> >>>
> >>>
> >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
> >>>
> >>>
> >>> "APM tools typically use a waterfall-type view to show the blocking
> >>> time of different components cascading through the control flow within an
> >>> application.
> >>> These types of visualizations are useful, and AppOptics has them, but
> >>> they can be difficult to understand for those of us without a PhD."
> >>>
> >>> Especially helpful if you want to understand through visualisation and
> >>> you do not have a PhD.
> >>>
> >>>
> >>> Jane thorpe
> >>> [hidden email]
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: jane thorpe <[hidden email]>
> >>> To: mich.talebzadeh <[hidden email]>; liruijing09 <
> >>> [hidden email]>; user <[hidden email]>
> >>> CC: user <[hidden email]>
> >>> Sent: Sun, 12 Apr 2020 4:35
> >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
> >>>
> >>> You seem to be implying that the error is intermittent and that data is
> >>> being ingested via JDBC. If so, the connection has proven itself to be
> >>> working, unless no data is arriving from the JDBC channel at all; only
> >>> in that case could one blame JDBC.
> >>> If the error is intermittent, it is likely that a resource involved in
> >>> processing is filling to capacity.
> >>> Try reducing the data ingestion volume and see if the job completes,
> >>> then increase the ingested data incrementally.
> >>> I assume you have run the job on a small amount of data and completed
> >>> your prototype stage successfully.
> >>>
> >>> ------------------------------
> >>> On Saturday, 11 April 2020 Mich Talebzadeh <[hidden email]>
> >>> wrote:
> >>> Hi,
> >>>
> >>> Have you checked your JDBC connections from Spark to Oracle. What is
> >>> Oracle saying? Is it doing anything or hanging?
> >>>
> >>> set pagesize 9999
> >>> set linesize 140
> >>> set heading off
> >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE,
> >>> 'MON DD YYYY HH:MI AM') from v$database;
> >>> set heading on
> >>> column spid heading "OS PID" format a6
> >>> column process format a13 heading "Client ProcID"
> >>> column username  format a15
> >>> column sid       format 999
> >>> column serial#   format 99999
> >>> column STATUS    format a3 HEADING 'ACT'
> >>> column last      format 9,999.99
> >>> column TotGets   format 999,999,999,999 HEADING 'Logical I/O'
> >>> column phyRds    format 999,999,999 HEADING 'Physical I/O'
> >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
> >>> --
> >>> SELECT
> >>>           substr(a.username,1,15) "LOGIN"
> >>>         , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS
> >>> "SID/serial#"
> >>>         , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
> >>>         , substr(a.machine,1,10) HOST
> >>>         , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
> >>>         , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
> >>>         , substr(a.program,1,15) PROGRAM
> >>>         --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
> >>>         , (
> >>>                 select round(sum(ss.value)/1024) from v$sesstat ss,
> >>> v$statname sn
> >>>                 where ss.sid = a.sid and
> >>>                         sn.statistic# = ss.statistic# and
> >>>                         -- sn.name in ('session pga memory')
> >>>                         sn.name in ('session pga memory','session uga
> >>> memory')
> >>>           ) AS total_memory
> >>>         , (b.block_gets + b.consistent_gets) TotGets
> >>>         , b.physical_reads phyRds
> >>>         , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
> >>>         , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1)
> >>> THEN '<-- YOU' ELSE ' ' END "INFO"
> >>> FROM
> >>>          v$process p
> >>>         ,v$session a
> >>>         ,v$sess_io b
> >>> WHERE
> >>> a.paddr = p.addr
> >>> AND p.background IS NULL
> >>> --AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
> >>> AND a.sid = b.sid
> >>> AND a.username is not null
> >>> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
> >>> --AND CURRENT_DATE - logon_time > 0
> >>> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1)  --
> >>> exclude me
> >>> --AND (b.block_gets + b.consistent_gets) > 0
> >>> ORDER BY a.username;
> >>> exit
> >>>
> >>> HTH
> >>>
> >>> Dr Mich Talebzadeh
> >>>
> >>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
> >>>
> >>> http://talebzadehmich.wordpress.com
> >>>
> >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> >>> any loss, damage or destruction of data or any other property which may
> >>> arise from relying on this email's technical content is explicitly
> >>> disclaimed. The author will in no case be liable for any monetary damages
> >>> arising from such loss, damage or destruction.
> >>>
> >>>
> >>>
> >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <[hidden email]> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I am on spark 2.4.4 and using scala 2.11.12, and running cluster mode on
> >>> mesos. I am ingesting from an oracle database using spark.read.jdbc. I am
> >>> seeing a strange issue where spark just hangs and does nothing, not
> >>> starting any new tasks. Normally this job finishes in 30 stages but
> >>> sometimes it stops at 29 completed stages and doesn’t start the last stage.
> >>> The spark job is idling and there is no pending or active task. What could
> >>> be the problem? Thanks.
> >>> --
> >>> Cheers,
> >>> Ruijing Li
> >>>
> >>> --
> > Cheers,
> > Ruijing Li
> >
--
Cheers,
Ruijing Li
Reply | Threaded
Open this post in threaded view
|

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

Ruijing Li
I wanted to update everyone on this; thanks for all the responses. I was able to solve this issue after taking a jstack dump. I found out this was the cause.

Lesson learned: I’ll switch to a safer JSON parser such as json4s, which should hopefully be thread-safe.
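json4s specifics aside, the general defensive pattern for any parser that keeps internal mutable state is to give each thread its own instance. A minimal sketch of that pattern, using `SimpleDateFormat` as a stand-in for a non-thread-safe parser (the `PerThreadParser` name is hypothetical):

```scala
import java.text.SimpleDateFormat

object PerThreadParser {
  // SimpleDateFormat keeps internal mutable state and is not thread-safe,
  // so each thread gets its own instance via ThreadLocal instead of sharing one.
  private val fmt = new ThreadLocal[SimpleDateFormat] {
    override def initialValue(): SimpleDateFormat =
      new SimpleDateFormat("yyyy-MM-dd")
  }

  def parseDate(s: String): java.util.Date = fmt.get.parse(s)

  def main(args: Array[String]): Unit = {
    // Hammer the parser from several threads; with per-thread instances
    // there is no shared state to corrupt.
    val threads = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit =
          for (_ <- 1 to 1000) assert(parseDate("2020-04-24") != null)
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println("all concurrent parses succeeded")
  }
}
```

The same one-instance-per-thread (or one-instance-per-call) approach applies to any JSON parser whose thread-safety is in doubt.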

On Fri, Apr 24, 2020 at 4:34 AM Waleed Fateem <[hidden email]> wrote:
Are you running this in local mode? If not, are you even sure that the hanging is occurring on the driver's side?

Did you check the Spark UI to see if there is a straggler task or not? If you do have a straggler/hanging task, and this is not an application running in local mode, then you need to get the Java thread dump of the executor's JVM process. Once you do, you'll want to review the "Executor task launch worker for task XYZ" thread, where XYZ is some integer value representing the ID of the task launched on that executor. If you are running in local mode, that thread would be located in the same Java thread dump that you have already collected.
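If you can run code inside the JVM being inspected, the same filtering can be done programmatically rather than by eyeballing a full jstack dump. A sketch (the `WorkerStacks` name and structure are assumptions for illustration; only the thread-name prefix comes from the thread above):

```scala
import java.lang.management.ManagementFactory

object WorkerStacks {
  // Names of threads matching Spark's task-worker naming pattern in THIS JVM.
  def workerThreadNames(): Seq[String] =
    ManagementFactory.getThreadMXBean
      .dumpAllThreads(false, false).toSeq
      .map(_.getThreadName)
      .filter(_.startsWith("Executor task launch worker"))

  def main(args: Array[String]): Unit = {
    val names = workerThreadNames()
    if (names.isEmpty) println("no task-worker threads in this JVM")
    else names.foreach(println)
  }
}
```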


On Tue, Apr 21, 2020 at 9:51 PM Ruijing Li <[hidden email]> wrote:
I apologize, but I cannot share it, even if it is just typical spark libraries. I definitely understand that limits debugging help, but wanted to understand if anyone has encountered a similar issue.

On Tue, Apr 21, 2020 at 7:12 PM Jungtaek Lim <[hidden email]> wrote:
If there are no third-party libraries in the dump, then why not share the thread dump? (I mean, the output of jstack.)

A stack trace would be more helpful for finding which thread acquired the lock and which others are waiting to acquire it, if we suspect a deadlock.

On Wed, Apr 22, 2020 at 2:38 AM Ruijing Li <[hidden email]> wrote:
After refreshing a couple of times, I notice the lock is being passed between these three threads. The other two are blocked by whichever thread holds the lock, cycling 160 has lock -> 161 -> 159 -> 160.

On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li <[hidden email]> wrote:
In the thread dump, I do see this:
- SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
- SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
- SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock

Could the fact that thread 160 holds the monitor but is not running be causing a deadlock that prevents the job from finishing?

My Finalizer and main threads are waiting. I don’t see any threads from third-party libraries or my own code in the dump. The Spark context cleaner is in TIMED_WAITING.

Thanks


--
Cheers,
Ruijing Li