using Log4j to log INFO level messages on workers


using Log4j to log INFO level messages on workers

Shivani Rao
Hello Spark fans,

I am trying to log messages from my Spark application. Logging with log.info() from the main() function works great, but when I call the same method from the code that runs on the workers, I initially got a serialization error. To work around it, I created a new logger inside the code that operates on the data. That fixed the serialization issue, but now there is no output in the console or in the worker node logs, and I don't see any application-level log messages in the Spark logs either. When I use println() instead, I do see console output.

I tried the following:

a) passing a log4j.properties file with -Dlog4j.properties on the java command line that launches the Spark application
b) setting the properties within the worker by calling log.addAppender(new ConsoleAppender)

Neither works.

What am I missing?
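(For what it's worth, with log4j 1.x the system property that selects a custom config file is log4j.configuration, taking a file: URL; the property name and paths below are just a sketch of a typical setup, not my actual config:)

```properties
# log4j.properties — minimal root logger writing INFO to the console
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %p %c - %m%n
```

passed on the command line as, e.g., java -Dlog4j.configuration=file:/path/to/log4j.properties ... — and on a cluster the file presumably also has to be present on each worker for executor-side logging to pick it up.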


Thanks,
Shivani
-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA

Re: using Log4j to log INFO level messages on workers

Alex Gaudio
Hi,


I had the same problem with pyspark.  Here's how I resolved it:

What I've found in Python (I'm not sure about Scala) is that if the function being serialized is written in the same Python module as the main function, then logging fails; if the serialized function lives in a separate module, logging works. I just created this gist to demo the situation and the (Python) solution. Is there a similar way to do this in Scala?
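In rough outline, the Python workaround can be sketched like this (module and function names here are made up for illustration; the real demo is in the gist):

```python
# worker_logging.py — hypothetical separate module, so worker functions
# reference the logger via a module-level lookup instead of capturing a
# logger object in a serialized closure
import logging


def get_logger(name="worker"):
    """Create (once per process) and return a configured logger."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def process(record):
    # Runs on the worker: the logger is looked up locally, never serialized.
    get_logger().info("processing %r", record)
    return record * 2
```

The driver would then do something like rdd.map(process), shipping only a reference to the function; since process lives in its own module, nothing unpicklable travels with it.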



Alex




Re: using Log4j to log INFO level messages on workers

Shivani Rao
Hello Alex

Thanks for the link. Yes, creating a singleton object for logging outside the code that gets executed on the workers definitely works. The problem I am facing, though, is the configuration of the logger: I don't see any log messages in the worker logs of the application.

a) when I use println, I see the messages from the workers being logged in the main driver log of the application
b) when I use the logger, I see log messages from main() but not from the workers.

Maybe I should upload an MWE (minimal working example) to demonstrate my point.
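Such an MWE would look roughly like the following plain-Python sketch (map() stands in for rdd.map(...).collect(), and all names are illustrative, since the shape of the problem is the same either way):

```python
import logging
import sys

# Driver-side logger, configured the usual way
logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                    format="%(name)s %(levelname)s %(message)s")
log = logging.getLogger("driver")


def process(x):
    # On a real cluster this body executes in an executor process, so the
    # logger must be obtained there; its output lands in the worker's log,
    # not the driver console.
    logging.getLogger("worker").info("processing %s", x)
    return x + 1


def main():
    log.info("driver starting")              # shows up in the driver log
    results = list(map(process, [1, 2, 3]))  # stand-in for rdd.map(process).collect()
    log.info("results: %s", results)
    return results
```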

Thanks
Shivani

