spark 1.0.0 on yarn

spark 1.0.0 on yarn

Xu (Simon) Chen
Hi all,

I tried a couple of ways, but couldn't get it to work.

The following seems to be what the online document (http://spark.apache.org/docs/latest/running-on-yarn.html) is suggesting:
SPARK_JAR=hdfs://test/user/spark/share/lib/spark-assembly-1.0.0-hadoop2.2.0.jar YARN_CONF_DIR=/opt/hadoop/conf ./spark-shell --master yarn-client

The spark-shell help output seems to suggest "--master yarn --deploy-mode cluster".

But either way, I am seeing the following messages:
14/06/01 00:33:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/06/01 00:33:21 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/06/01 00:33:22 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

My guess is that spark-shell is trying to talk to the resource manager to set up Spark master/worker nodes, but I am not sure where 0.0.0.0:8032 came from. I am running CDH5 with two resource managers in HA mode; their IP/port should be in /opt/hadoop/conf/yarn-site.xml. I tried both HADOOP_CONF_DIR and YARN_CONF_DIR, but that info isn't picked up.
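One hypothetical way to sanity-check the config directory Spark is pointed at is to grep it for a ResourceManager address. For illustration the sketch below builds a throwaway conf dir mirroring the HA setup described here; on a real cluster CONF_DIR would be /opt/hadoop/conf:

```shell
# Hypothetical diagnostic: does the conf dir Spark is pointed at actually
# define a ResourceManager address? For illustration this builds a
# throwaway conf dir; point CONF_DIR at /opt/hadoop/conf on a real cluster.
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>controller-1.mycomp.com:23140</value>
  </property>
</configuration>
EOF
# A YARN client falling back to 0.0.0.0:8032 usually means the key it is
# looking for would not be found here. Counts matching lines (prints 1).
grep -c 'yarn.resourcemanager.address' "$CONF_DIR/yarn-site.xml"
```

If the grep comes up empty for the key the client expects, the fallback default (0.0.0.0:8032) would explain the log lines above.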

Any ideas? Thanks.
-Simon

Re: spark 1.0.0 on yarn

Patrick Wendell
I would agree with your guess; it looks like the YARN library isn't
correctly finding your yarn-site.xml file. If you look in
yarn-site.xml, do you definitely see the resource manager
address(es)?

Also, you can try running this command with
SPARK_PRINT_LAUNCH_COMMAND=1 to make sure the classpath is being
set up correctly.

- Patrick


Re: spark 1.0.0 on yarn

Xu (Simon) Chen
Note that everything works fine in Spark 0.9, which is packaged in CDH5: I can launch a spark-shell and interact with workers spawned on my YARN cluster.

So in my /opt/hadoop/conf/yarn-site.xml, I have:
    ...
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>controller-1.mycomp.com:23140</value>
    </property>
    ...
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>controller-2.mycomp.com:23140</value>
    </property>
    ...

And the other usual stuff.

So spark 1.0 is launched like this:
Spark Command: java -cp ::/home/chenxu/spark-1.0.0-bin-hadoop2/conf:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar:/opt/hadoop/conf -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --master yarn-client --class org.apache.spark.repl.Main

I do see "/opt/hadoop/conf" included, but I'm not sure it's in the right place.
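As a sketch of what to look for, the -cp string can be split on ':' to confirm the conf directory appears as a standalone classpath entry (the classpath below is abbreviated from the launch command above):

```shell
# Sketch: split a -cp string (abbreviated from the launch command above)
# on ':' and confirm the Hadoop conf directory appears as its own entry,
# which is what lets the YARN client locate yarn-site.xml.
CP="::/home/chenxu/spark-1.0.0-bin-hadoop2/conf:/home/chenxu/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:/opt/hadoop/conf"
echo "$CP" | tr ':' '\n' | grep -x '/opt/hadoop/conf'
```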

Thanks..
-Simon





Re: spark 1.0.0 on yarn

Patrick Wendell
As a debugging step, does it work if you use a single resource manager
with the key "yarn.resourcemanager.address" instead of using two named
resource managers? I wonder if somehow the YARN client can't detect
this multi-master set-up.
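For concreteness, the single-RM debugging configuration described here might look something like this in yarn-site.xml (the host and port are placeholders borrowed from the rm1 entry quoted earlier in the thread, not a verified fix):

```xml
<!-- Hypothetical single-RM config for debugging; the host/port are
     placeholders borrowed from the rm1 entry shown earlier. -->
<property>
    <name>yarn.resourcemanager.address</name>
    <value>controller-1.mycomp.com:23140</value>
</property>
```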


Re: spark 1.0.0 on yarn

Xu (Simon) Chen
That helped a bit... Now I have a different failure: the startup process is stuck in a loop, repeatedly printing the following message:

14/06/02 01:34:56 INFO cluster.YarnClientSchedulerBackend: Application report from ASM: 
appMasterRpcPort: -1
appStartTime: 1401672868277
yarnAppState: ACCEPTED

I am using the prebuilt Hadoop 2 package; it probably doesn't have a recent enough YARN client.

-Simon






Re: spark 1.0.0 on yarn

Xu (Simon) Chen
OK, rebuilding the assembly jar against CDH5 works now.
Thanks.

-Simon




Re: spark 1.0.0 on yarn

Patrick Wendell
Okay, I'm guessing that our upstream "Hadoop 2" package isn't new
enough to work with CDH5. We should probably clarify this on our
downloads page. Thanks for reporting this. What was the exact string you
used when building? Also, which CDH5 version are you building against?


Re: spark 1.0.0 on yarn

Xu (Simon) Chen
I built my new package like this:
"mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.0.1 -DskipTests clean package"

Spark-shell is working now, but PySpark is still broken. I reported the problem in a different thread; please take a look if you can. Desperately need ideas.

Thanks.
-Simon




Re: spark 1.0.0 on yarn

martinxu
I also encountered this problem.

What was your final solution?