SparkStreaming not reading Hadoop configuration from its SparkContext in standalone mode?

SparkStreaming not reading Hadoop configuration from its SparkContext in standalone mode?

robin_up
Hi

I am trying to run a small piece of code on Spark Streaming. It sets the S3 keys on a SparkContext object, which is then passed into a StreamingContext object. However, I get the error below -- it seems the StreamingContext does not use the Hadoop config on the worker threads. The same code works fine in Spark core (batch mode) without streaming.

java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
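
The error itself names an alternative mechanism: embedding the keys in the s3n URL. A sketch of that form, untested here, using the same env vars as in the code below -- note the secret key would need URL-encoding first if it contains a "/":

        // untested sketch: credentials embedded directly in the s3n URL
        val encSecret = java.net.URLEncoder.encode(System.getenv("ds_awsSecretAccessKey"), "UTF-8")
        val inlineUrl = "s3n://" + System.getenv("ds_awsAccessKeyId") + ":" +
          encSecret + "@my-bucket/syslog-ng/2014-01-24/"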


//my code:

import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

        // must be set before the SparkContext is created
        System.setProperty("spark.cleaner.ttl", "3600")

        val spark_master = "spark://" + System.getenv("SPARK_MASTER_IP") + ":" + System.getenv("SPARK_MASTER_PORT")
        val external_jars = Seq(
          "target/scala-2.9.3/test_2.9.3-1.0.jar",
          "/opt/json4s-core_2.9.3-3.2.2.jar",
          "/opt/json4s-native_2.9.3-3.2.2.jar",
          "/opt/json4s-ast_2.9.3-3.2.2.jar")

        val sc = new SparkContext(spark_master, "test", System.getenv("SPARK_HOME"), external_jars)

        // S3 credentials set on the driver-side Hadoop configuration
        sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", System.getenv("ds_awsAccessKeyId"))
        sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", System.getenv("ds_awsSecretAccessKey"))

        val ssc = new StreamingContext(sc, Seconds(5))

        // the exception above is thrown when this stream is processed
        val file = ssc.textFileStream("s3n://my-bucket/syslog-ng/2014-01-24/")
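
One way to test the hypothesis above -- that a Hadoop Configuration built fresh on a worker does not carry the driver-side settings -- is to check from inside a task. An untested sketch, assuming hadoop-core is on the classpath (it is when running under Spark):

        // untested sketch: does a fresh worker-side Configuration see the key?
        import org.apache.hadoop.conf.Configuration
        val workerSeesKey = sc.parallelize(1 to 1).map { _ =>
          new Configuration().get("fs.s3n.awsAccessKeyId") != null
        }.collect()(0)
        println("worker-side fresh Configuration has the key: " + workerSeesKey)
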
-- Robin Li
Re: SparkStreaming not reading Hadoop configuration from its SparkContext in standalone mode?

Tathagata Das
Which version of Spark are you trying this with?

