Equivalent to Hadoop's standard counters

Equivalent to Hadoop's standard counters

dsiegmann
When I run Hadoop jobs, the job tracker shows a bunch of standard counters (in addition to any others I create). For example, in Hadoop I can get the count of "Map input records" without needing to write any code explicitly.

Is there any equivalent to this with Spark's accumulators?

--
Daniel Siegmann, Software Developer
Velos


440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: [hidden email] W: www.velos.io

Re: Equivalent to Hadoop's standard counters

Matei Zaharia
Hi Daniel,

There isn’t any built-in equivalent of this right now, though some metrics, such as shuffle bytes, are tracked on the application UI (http://driver:4040). In this particular case I guess you could add a map that updates an accumulator, but it’s kind of awkward. Feel free to open a JIRA about this and maybe someone will implement it, since it sounds like a useful metric.

Matei
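
For concreteness, here is a minimal sketch of the map-plus-accumulator workaround Matei describes, written against the Scala API of the time (roughly Spark 0.9). The input path, object name, and tab-splitting are illustrative, not from the thread:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object InputRecordCount {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "InputRecordCount")

        // Hand-rolled stand-in for Hadoop's "Map input records" counter.
        val inputRecords = sc.accumulator(0L)

        val lines = sc.textFile("hdfs:///path/to/input")  // illustrative path
        val fields = lines.map { line =>
          inputRecords += 1L  // bump the counter as each record passes through the map
          line.split("\t")
        }

        fields.count()  // accumulators are only updated once an action runs

        println("Map input records: " + inputRecords.value)
        sc.stop()
      }
    }

The awkwardness Matei mentions is real: because the update happens inside a transformation, a task that is re-executed after a failure (or a recomputed stage) can apply the increment more than once, so the count may overstate, unlike Hadoop's built-in counters.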


Re: Equivalent to Hadoop's standard counters

dsiegmann
Thanks for the response, Matei. Guess I'll have to give it some thought.

~Daniel
