Logging in RDD mapToPair of Java Spark application

johnzengspark
Hi, All,

Although logging has been discussed many times in this group, I could not find an answer to my specific question, so I am posting it here and hope it is not a duplicate.

Here is my simplified Java Spark test app:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class SparkJobEntry {
    public static void main(String[] args) {
        // The following line appears in stdout in the JobTracker UI
        System.out.println("argc=" + args.length);

        SparkConf conf = new SparkConf().setAppName("TestSparkApp");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> fileRDD = sc.textFile(args[0]);

        fileRDD.mapToPair(new PairFunction<String, String, String>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, String> call(String input) throws Exception {
                // The following line does not appear in stdout in the JobTracker UI
                System.out.println("This line should be printed in stdout");
                // Other code removed from here to make things simple
                return new Tuple2<String, String>("1", "Testing data");
            }
        }).saveAsTextFile(args[0] + ".results");
    }
}

What I expected to see in the JobTracker UI is both stdout lines: first "argc=2", then "This line should be printed in stdout". But I only see the first line, which is printed outside 'mapToPair'. I have verified that the function passed to 'mapToPair' is called and that the statements after the second print statement are executed. My only question is why the second line does not appear in the JobTracker UI.
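
A likely explanation, offered as an assumption rather than a confirmed diagnosis of this setup: when the job runs on a cluster (not in local mode), main() runs in the driver process, while the function passed to 'mapToPair' is serialized and run inside executor processes. Each process has its own stdout, so the second line would end up in the executors' stdout logs instead of the driver's. The sketch below is a plain-Java simulation of that split and uses no Spark at all; StdoutLocationSketch, FakeProcess, runOn and simulate are hypothetical names invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Function;

// Plain-Java simulation (no Spark): models why a println inside a task
// does not show up in the driver's stdout. All names here are hypothetical.
public class StdoutLocationSketch {

    // Stand-in for a JVM process that owns its own stdout stream.
    static final class FakeProcess {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        final PrintStream stdout = new PrintStream(buffer, true);

        String captured() {
            return buffer.toString();
        }
    }

    // Runs the task with System.out pointing at the given process's stdout,
    // then restores the previous stream.
    static <T, R> R runOn(FakeProcess process, Function<T, R> task, T input) {
        PrintStream previous = System.out;
        System.setOut(process.stdout);
        try {
            return task.apply(input);
        } finally {
            System.setOut(previous);
        }
    }

    // Returns {driver stdout, executor stdout} after one simulated job.
    static String[] simulate() {
        FakeProcess driver = new FakeProcess();
        FakeProcess executor = new FakeProcess();

        // Code in main() runs in the driver process.
        runOn(driver, (String ignored) -> {
            System.out.println("argc=2");
            return null;
        }, "");

        // The function passed to a transformation runs in an executor process.
        Function<String, String> mapper = input -> {
            System.out.println("This line should be printed in stdout");
            return "Testing data";
        };
        runOn(executor, mapper, "some input line");

        return new String[] { driver.captured(), executor.captured() };
    }

    public static void main(String[] args) {
        String[] logs = simulate();
        System.out.print("driver stdout:   " + logs[0]);
        System.out.print("executor stdout: " + logs[1]);
    }
}
```

If this assumption holds for the real job, the missing line should be in the stdout log of whichever executor ran the task, not in the log of the process that ran main().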

Appreciate your help.

Thanks

John