Quick start example (README.md count) doesn't work

Quick start example (README.md count) doesn't work

mohitvora
I installed Spark 0.9.0 and tried running the Quick Start example at https://spark.incubator.apache.org/docs/latest/quick-start.html. I got the following stack trace:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://ec2-54-211-74-252.compute-1.amazonaws.com:9000/user/root/README.md
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
        ...
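
For reference, the failing call is the first step of the quick start, run from spark-shell. "README.md" is a relative path, so it resolves against the default filesystem (HDFS on this cluster) as /user/root/README.md:

scala> val textFile = sc.textFile("README.md")
scala> textFile.count()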

I tried accessing hadoop from the command line on the master instance:
[root@ip-10-245-26-121 ~]$ /root/persistent-hdfs/bin/hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.

14/02/17 22:58:41 INFO ipc.Client: Retrying connect to server: ec2-54-211-74-252.compute-1.amazonaws.com/10.116.229.108:9010. Already tried 0 time(s).
14/02/17 22:58:42 INFO ipc.Client: Retrying connect to server: ec2-54-211-74-252.compute-1.amazonaws.com/10.116.229.108:9010. Already tried 1 time(s).
...

I thought maybe the security groups set up by spark_ec2.py were bad, so I opened up ports 9000-9010 for both the master and the slaves, but I still get the same errors.
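
A quick way to sanity-check whether anything is actually listening on those ports (assuming nc is installed on the master):
[root@ip-10-245-26-121 ~]$ nc -zv ec2-54-211-74-252.compute-1.amazonaws.com 9000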

Next, I tried starting the Hadoop daemons myself:
[root@ip-10-245-26-121 ~]$ /root/persistent-hdfs/bin/start-all.sh

That seemed to run fine, but I still get the same errors.

Please help.

Thanks,
Mohit

 

Re: Quick start example (README.md count) doesn't work

Mayur Rustagi
Your namenode is down. Did you format your namenode?
Check with jps whether the namenode is up on the master; if formatting doesn't solve it, look at the namenode logs.
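
A minimal sketch of those checks, assuming the standard spark-ec2 layout under /root (paths may differ on other setups):

[root@ip-10-245-26-121 ~]$ jps
# the output should list a NameNode process
[root@ip-10-245-26-121 ~]$ /root/ephemeral-hdfs/bin/hadoop namenode -format
# WARNING: formatting erases all HDFS data
[root@ip-10-245-26-121 ~]$ /root/ephemeral-hdfs/bin/start-all.sh
[root@ip-10-245-26-121 ~]$ tail -50 /root/ephemeral-hdfs/logs/hadoop-root-namenode-*.log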


Re: Quick start example (README.md count) doesn't work

mohitvora
Mayur Rustagi wrote
Your namenode is down. Did you format your namenode?
OK, so the ephemeral HDFS was up, and that's what the example tries to access. I dropped a file in there using hadoop fs, and I was able to access it from within spark-shell.
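
For the record, spark-ec2 sets up two HDFS installations: /root/ephemeral-hdfs (namenode on port 9000, which is what the spark-shell stack trace points at) and /root/persistent-hdfs (port 9010, which is what the earlier hadoop fs checks were retrying). A minimal sketch of the workaround described above, assuming the Spark checkout lives at /root/spark:

[root@ip-10-245-26-121 ~]$ /root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/README.md README.md

and then, in spark-shell:

scala> sc.textFile("README.md").count()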