Custom Metric Sink on Executor Always ClassNotFound

Custom Metric Sink on Executor Always ClassNotFound

prosp4300
Hi, Spark Users

I'm playing with Spark metrics monitoring and want to add a custom sink, an HttpSink that sends the metrics through a RESTful API.
A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and packaged within the application jar.
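For reference, the sink itself looks roughly like this (simplified
sketch; on my Spark 2.x build the sink is created reflectively with a
(Properties, MetricRegistry, SecurityManager) constructor, and the
"endpoint" option name is my own):

package org.apache.spark.metrics.sink

import java.util.Properties

import com.codahale.metrics.MetricRegistry

import org.apache.spark.SecurityManager

class HttpSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {

  // custom option from metrics.properties, e.g. executor.sink.http.endpoint
  private val endpoint = property.getProperty("endpoint")

  override def start(): Unit = {
    // start a Dropwizard ScheduledReporter that periodically POSTs the
    // serialized contents of `registry` to `endpoint`
  }

  override def stop(): Unit = {
    // stop the reporter
  }

  override def report(): Unit = {
    // one-off flush of the current metric values
  }
}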

It works for the driver instance, but once enabled for the executor instances, the ClassNotFoundException below is thrown. This seems to be because the MetricsSystem is started very early on the executor, before the application jar is loaded.
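For reference, it is enabled through metrics.properties along these
lines ("http" is just the sink instance name I picked, and "endpoint"
is my own option):

# this works:
driver.sink.http.class=org.apache.spark.metrics.sink.HttpSink
# enabling it for executors is what fails:
executor.sink.http.class=org.apache.spark.metrics.sink.HttpSink
executor.sink.http.endpoint=http://example.com/metrics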

I wonder, is there any way or best practice to add a custom sink for the executor instances?

18/12/21 04:58:32 ERROR MetricsSystem: Sink class org.apache.spark.metrics.sink.HttpSink cannot be instantiated
18/12/21 04:58:32 WARN UserGroupInformation: PriviledgedActionException as:yarn (auth:SIMPLE) cause:java.lang.ClassNotFoundException: org.apache.spark.metrics.sink.HttpSink
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1933)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:284)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.metrics.sink.HttpSink
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
	at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:198)
	at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:194)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
	at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:194)
	at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366)
	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:201)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:223)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	... 4 more
(from container_e81_1541584460930_3814_01_000005, spark.log)
18/12/21 04:58:00 ERROR org.apache.spark.metrics.MetricsSystem.logError:70 - Sink class org.apache.spark.metrics.sink.HttpSink cannot be instantiated


 


Re: Custom Metric Sink on Executor Always ClassNotFound

Marcelo Vanzin-2
First, it's really weird to use "org.apache.spark" for a class that is
not in Spark.

For executors, the jar file of the sink needs to be in the system
classpath; the application jar is not in the system classpath, so that
does not work. There are different ways for you to get it there, most
of them manual (YARN is, I think, the only RM supported in Spark where
the application itself can do it).
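
On YARN, for example, something along these lines should work; --jars
localizes the jar into each container's working directory, and
extraClassPath can then pick it up by its bare name (the sink jar name
below is made up):

spark-submit \
  --master yarn \
  --jars /local/path/http-sink.jar \
  --conf spark.executor.extraClassPath=http-sink.jar \
  --files /local/path/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  --class com.example.MyApp my-app.jar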

On Thu, Dec 20, 2018 at 1:48 PM prosp4300 <[hidden email]> wrote:

> [snip]



--
Marcelo



Re: Re: Custom Metric Sink on Executor Always ClassNotFound

prosp4300


Thanks a lot for the explanation.
Spark declares the Sink trait as package-private, which is why the package name looks weird; the metrics system does not seem intended to be extended:

package org.apache.spark.metrics.sink
private[spark] trait Sink

Making the custom sink class available on every executor's system classpath is what an application developer wants to avoid, because the sink is only required for a specific application, and it can be difficult to maintain.
If it were possible to get the MetricsSystem at the executor level and register the custom sink there, the problem could be resolved in a better way, but I'm not sure how to achieve that.
Thanks a lot







At 2018-12-21 05:53:31, "Marcelo Vanzin" <[hidden email]> wrote:
> [snip]


 


Re: Custom Metric Sink on Executor Always ClassNotFound

Thakrar, Jayesh
In reply to this post by Marcelo Vanzin-2
Just curious - is this HttpSink your own custom sink or a Dropwizard configuration?

If it's your own custom code, I would suggest looking at or trying out Dropwizard.
See
http://spark.apache.org/docs/latest/monitoring.html#metrics
https://metrics.dropwizard.io/4.0.0/

Also, from what I know, the metrics from the tasks/executors are sent as accumulator values to the driver, and the driver makes them available to the desired sink.

Furthermore, even without a custom HttpSink, there's already a built-in REST API available that provides metrics.
See http://spark.apache.org/docs/latest/monitoring.html#rest-api
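For example, for a running application you can query the driver's UI port (4040 by default) with something like:

curl http://<driver-host>:4040/api/v1/applications
curl http://<driver-host>:4040/api/v1/applications/<app-id>/executors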

While you can surely create your own custom sink (code), I would say try out custom configuration first, as it will make Spark upgrades easier.

On 12/20/18, 3:53 PM, "Marcelo Vanzin" <[hidden email]> wrote:

    [snip]

