Debugging a local spark executor in pycharm


Debugging a local spark executor in pycharm

Vitaliy Pisarev

I want to step through the work of a Spark executor running locally on my machine, from PyCharm.

I am running explicit functionality, in the form of dataset.foreachPartition(f), and I want to see what is going on inside f.

Is there a straightforward way to do it, or do I need to resort to remote debugging?

p.s. Posted this on SO as well.


Re: [EXT] Debugging a local spark executor in pycharm

Michael Mansour

Vitaliy,

 

From what I understand, this is not possible to do.  However, let me share my workaround with you.

 

Assuming you have your debugger up and running in PyCharm, set a breakpoint at the relevant line, then take/collect/sample your data (you could also do a glom first if it's critical that the data remain partitioned, then the take/collect), and pass it into the function directly (plain Python, no Spark). Use the debugger to step through on that small sample.
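A minimal sketch of that workaround. Here `f` is a hypothetical stand-in for your own partition function, and `dataset` for your own Dataset; the Spark calls are shown only as comments, and only the plain-Python call at the bottom actually runs:

```python
# Workaround sketch: pull a small sample to the driver and call the
# partition function as plain Python, so PyCharm breakpoints inside it work.

def f(partition):
    # Stand-in for the per-partition logic you want to step through.
    total = 0
    for row in partition:  # set a breakpoint here and step normally
        total += row
    return total

# With a live SparkSession (sketch only, not executed here):
#   sample = dataset.limit(100).collect()      # or .take(100) / .sample(...)
#   f(iter(sample))                            # plain Python -> debugger works
# If it matters that the data stay partitioned, glom first:
#   for part in dataset.rdd.glom().collect():  # one list per partition
#       f(iter(part))

# Simulated here with plain data in place of a collected sample:
print(f(iter([1, 2, 3])))  # prints 6
```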

 

Alternatively, you can open the PyCharm execution module (the interactive console). In it, do the same as above with the RDD and pass the data into the function. This avoids the need to write debugging code; I find this approach useful and a bit faster, but it does not offer step-through capability.

 

Best of luck!

M

-- 

Michael Mansour

Data Scientist

Symantec CASB

From: Vitaliy Pisarev <[hidden email]>
Date: Sunday, March 11, 2018 at 8:46 AM
To: "[hidden email]" <[hidden email]>
Subject: [EXT] Debugging a local spark executor in pycharm

 



Re: [EXT] Debugging a local spark executor in pycharm

Vitaliy Pisarev
Actually, I stumbled on this SO page. While it is not straightforward, it is a fairly simple solution. 

In short:

  • I made sure there is only one executing task at a time by calling repartition(1) - this made it easy to locate the one and only Spark daemon
  • I set a breakpoint wherever I needed to
  • In order to "catch" the breakpoint, I put a print-out and a time.sleep(15) right before it. The print-out notifies me that the daemon is up and running,
    and the sleep gives me time to push a few buttons so I can attach to the process
It worked fairly well, and I was able to debug the executor. I did notice two strange things: sometimes I got a strange error and the debugger didn't actually attach; it was not deterministic.

Other times I noticed a big gap between the point where I got the notification and attached to the process and the moment execution resumed so I could actually step through (by big gap I mean one considerably bigger than the sleep period, usually about a minute).

Not perfect but worked most of the time.
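The steps above can be sketched as follows. `dataset` and `f` are hypothetical stand-ins for your own objects, the Spark call appears only as a comment, and the gate interval is the 15 seconds mentioned above:

```python
import time

def debug_gate(seconds=15):
    # Print a notice so you know the worker daemon is up, then sleep to
    # give yourself time to attach the PyCharm debugger to the process.
    print("daemon is up -- attach the debugger now", flush=True)
    time.sleep(seconds)

def f(partition):
    debug_gate(15)         # placed right before the breakpoint
    for row in partition:  # the breakpoint goes in here
        pass               # per-partition work

# With a live SparkSession (sketch only, not executed here):
#   dataset.repartition(1).foreachPartition(f)  # one task -> a single daemon to find
```

The repartition(1) is what makes the attach step tractable: with a single task there is exactly one Python worker process to locate and attach to.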

 

On Wed, Mar 14, 2018 at 12:07 AM, Michael Mansour <[hidden email]> wrote:
