I'm forwarding this email, which contains a question from a Spark user, Adrien (CC'd), who hasn't been able to get any emails through to the Apache mailing lists.
Please reply-all when responding to include Adrien. See below for his question.
---------- Forwarded message ----------
From: "Adrien Legrand" <[hidden email]>
Date: May 22, 2014 1:06 AM
Subject: Re: Post validation
To: "Andy Konwinski" <[hidden email]>
Cc:
Thanks, that would be nice! Here is the question:
Spark Streaming: Flume stream not found
Message: Hi everyone,
I am currently trying to process a Flume (Avro) stream with Spark Streaming on a YARN cluster. Everything works fine when I launch my code locally. To do so, I use the following arguments:
master = "local"
host, port = the machine and port I'm sending the stream to with Flume (I triple-checked that the Flume and Spark settings match).
val ssc = new StreamingContext(master, "FlumeEventCount", batchInterval, System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass))
val stream = FlumeUtils.createStream(ssc, host, port.toInt)
But when I run the same job on the cluster (replacing "local" with "yarn-standalone"), the jar is launched (I can see some of the print statements I use to debug the code), but it shows the expected output (from the data processing) only about 1 time out of 5 or 10. Here is the complete command line:
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.yarn.Client --jar /home/www/loganalysis-1.0-SNAPSHOT-jar-with-dependencies.jar --class com.loganalysis.Computation --args yarn-standalone --args receiver.priv.fr --args 9999 --num-workers 6 --master-memory 4g --worker-memory 2g --worker-cores 1
For no apparent reason, the processing sometimes works. My first guess was that, since I use one big jar with all the dependencies in it, the other machines don't have those dependencies and thus can't do the processing. That's why I tried adding my executed jar with -addJars, but the result was the same.
One other idea occurred to me. You can try subscribing to the Nabble mailing lists and sending your emails to those. They will relay emails to the Apache lists. Not sure whether it'll help.
I'm really sorry to hear you're having problems with the lists.
If you forward me your questions, I can relay them to the mailing list for you.
On Wed, May 21, 2014 at 1:56 AM, Adrien Legrand <[hidden email]> wrote:
Hello Andy,
Thank you for your answer. I used the mailing list's form to submit both topics. Is it still possible that the problem you mentioned is the cause? I also check my mail directly in Gmail; I don't use a mail client like Outlook.
The only other possible cause I can think of is my company's proxy, but that seems really unlikely...
I posted 2 different topics (the important one is "Spark Streaming: Flume stream not found") on the Spark user mailing list, but neither of them was accepted.
I registered before posting each topic, and I don't think I made any mistakes.
After posting the last (and important) topic, I received the following email:
Hi. This is the qmail-send program at apache.org.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.