Spark FileStreaming not Working with foreachRDD

I'm new to Spark and am building a small sample application that uses Spark file streaming. All I want is to read each whole file in one go instead of reading it line by line (which, I guess, is what textFileStream does).

The code is below:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object SampleXML {

    def main(args: Array[String]) {

        val logFile = "/home/akhld/mobi/spark-streaming/logs/sample.xml"

        val ssc = new StreamingContext("spark://localhost:7077", "XML Streaming Job", Seconds(5), "/home/akhld/mobi/spark-streaming/spark-0.8.0-incubating", List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))

        // Watch the directory for new files; each record is a (byte offset, line) pair
        val lines = ssc.fileStream[LongWritable, Text, TextInputFormat]("/home/akhld/mobi/spark-streaming/logs/")

        // This is the call the compiler rejects
        lines.foreachRDD(rdd => {
          println(rdd.count())  // print the record count of each batch
        })

        ssc.start()
    }
}

This code fails to compile with the following error:

[error] /home/akhld/mobi/spark-streaming/samples/samplexml/src/main/scala/SampleXML.scala:31: value foreachRDD is not a member of org.apache.spark.streaming.DStream[(,]
[error]         ssc.fileStream[LongWritable, Text, TextInputFormat]("/home/akhld/mobi/spark-streaming/logs/").foreachRDD(rdd => {
[error]                                                                                                       ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 3 s, completed Feb 3, 2014 7:32:57 PM

If this is not the right way to display the contents of the files in the stream, please help me with an example. I searched a lot but couldn't find a proper example of how to use fileStream.
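
For reference, here is a minimal sketch of what might compile against Spark 0.8.0. It rests on two assumptions that should be checked against the installed version: that the DStream output operation is named foreach in 0.8.0 (with foreachRDD only appearing from 0.9.0 onward), and that a local[2] master is acceptable for a quick test. The object name SampleXMLSketch is made up for illustration. Note that fileStream with TextInputFormat still yields one record per line, so this prints the file contents line by line; reading a whole file as a single record would need a different, custom InputFormat, which is not shown here.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical object name, used only for this sketch
object SampleXMLSketch {

  def main(args: Array[String]) {
    // Assumption: a local master with two threads, for a quick test
    val ssc = new StreamingContext("local[2]", "XML Streaming Sketch", Seconds(5))

    // Each record is a (byte offset, line) pair from files newly added to the directory
    val records = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "/home/akhld/mobi/spark-streaming/logs/")

    // Hadoop reuses Writable objects, so copy each value into a String right away
    val lines = records.map(_._2.toString)

    // Assumption: Spark 0.8.0 calls this operation `foreach`;
    // on 0.9.0 and later the same call would be `foreachRDD`
    lines.foreach(rdd => {
      println("records in this batch: " + rdd.count())
      // collect() pulls the batch to the driver; fine for small sample files only
      rdd.collect().foreach(println)
    })

    ssc.start()
  }
}
```

Only files that appear in the directory after the job starts are picked up, so the usual way to try this is to start the job and then copy or move sample.xml into the logs directory.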