code changes to SimpleApp

Zahid Rahman
Hi,

I have made a number of code changes to get the SimpleApp example working.
Your error messages are quite good.

package co.uk.backbutton.sparksimpleapp;

/**
 * @author zahid
 */
/* SimpleApp.java */

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SimpleApp {
  public static void main(String[] args) {
    String filename = "/home/zahid/spark/spark-3.0.0-preview2-bin-hadoop2.7/README.md"; // Should be some file on your system

    // Replaced the quick-start's SparkSession entry point:
    // SparkSession spark = SparkSession.builder().appName("Simple Application").getOrCreate();
    SparkConf sparkConf = new SparkConf()
        .setAppName("Simple Application")
        .setMaster("local")
        .set("spark.executor.memory", "2g");

    // Start a Spark context
    JavaSparkContext sc = new JavaSparkContext(sparkConf);

    JavaRDD<String> inputFile = sc.textFile(filename);

    // Replaced the quick-start's Dataset version (note the read() call belongs
    // to the SparkSession, not the SparkConf):
    // Dataset<String> logData = spark.read().textFile(logFile).cache();
    // long numAs = logData.filter(s -> s.contains("a")).count();
    // long numBs = logData.filter(s -> s.contains("b")).count();
    long numAs = inputFile.filter(s -> s.contains("a")).count();
    long numBs = inputFile.filter(s -> s.contains("b")).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

    // Stop the context explicitly; the log below shows the shutdown hook
    // doing this when stop() is never called.
    sc.stop();
  }
}
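
For comparison, here is the commented-out SparkSession route written out as a complete program, following the Spark quick-start guide. This is an untested sketch: it assumes spark-sql_2.11 (which provides SparkSession) is on the classpath alongside the spark-core_2.11 2.4.4 jar visible in the log below, and the class name SimpleAppDataset is just illustrative. The FilterFunction cast is how the quick-start disambiguates Dataset.filter's overloads for Java lambdas.

package co.uk.backbutton.sparksimpleapp;

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class SimpleAppDataset { // hypothetical class name
  public static void main(String[] args) {
    String logFile = "/home/zahid/spark/spark-3.0.0-preview2-bin-hadoop2.7/README.md";

    // SparkSession is the Dataset-API entry point; the master can be set
    // here instead of on a SparkConf.
    SparkSession spark = SparkSession.builder()
        .appName("Simple Application")
        .master("local")
        .getOrCreate();

    // Read the file as a Dataset<String> and cache it so both counts reuse
    // the same in-memory data instead of re-reading the file.
    Dataset<String> logData = spark.read().textFile(logFile).cache();

    // The cast picks the FilterFunction overload of Dataset.filter.
    long numAs = logData.filter((FilterFunction<String>) s -> s.contains("a")).count();
    long numBs = logData.filter((FilterFunction<String>) s -> s.contains("b")).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

    spark.stop();
  }
}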


cd /home/zahid/NetBeansProjects/SparkSimpleApp; JAVA_HOME=/home/zahid/jdk13 /home/zahid/netbeans/java/maven/bin/mvn "-Dexec.args=-classpath %classpath co.uk.backbutton.sparksimpleapp.SimpleApp" -Dexec.executable=/home/zahid/jdk13/bin/java process-classes org.codehaus.mojo:exec-maven-plugin:1.5.0:exec
Scanning for projects...
                                                                       
------------------------------------------------------------------------
Building Simple Project 1.0-SNAPSHOT
------------------------------------------------------------------------

--- maven-resources-plugin:2.6:resources (default-resources) @ SparkSimpleApp ---
Using 'UTF-8' encoding to copy filtered resources.
skip non existing resourceDirectory /home/zahid/NetBeansProjects/SparkSimpleApp/src/main/resources

--- maven-compiler-plugin:3.1:compile (default-compile) @ SparkSimpleApp ---
Changes detected - recompiling the module!
Compiling 1 source file to /home/zahid/NetBeansProjects/SparkSimpleApp/target/classes

--- exec-maven-plugin:1.5.0:exec (default-cli) @ SparkSimpleApp ---
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/zahid/.m2/repository/org/apache/spark/spark-unsafe_2.11/2.4.4/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/03/01 00:15:52 INFO SparkContext: Running Spark version 2.4.4
20/03/01 00:15:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/01 00:15:52 INFO SparkContext: Submitted application: Simple Application
20/03/01 00:15:52 INFO SecurityManager: Changing view acls to: zahid
20/03/01 00:15:52 INFO SecurityManager: Changing modify acls to: zahid
20/03/01 00:15:52 INFO SecurityManager: Changing view acls groups to:
20/03/01 00:15:52 INFO SecurityManager: Changing modify acls groups to:
20/03/01 00:15:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zahid); groups with view permissions: Set(); users  with modify permissions: Set(zahid); groups with modify permissions: Set()
20/03/01 00:15:52 INFO Utils: Successfully started service 'sparkDriver' on port 34347.
20/03/01 00:15:52 INFO SparkEnv: Registering MapOutputTracker
20/03/01 00:15:52 INFO SparkEnv: Registering BlockManagerMaster
20/03/01 00:15:52 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/03/01 00:15:52 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/03/01 00:15:52 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-4e60ffbd-a384-434f-9033-63506d90acf7
20/03/01 00:15:52 INFO MemoryStore: MemoryStore started with capacity 987.6 MB
20/03/01 00:15:52 INFO SparkEnv: Registering OutputCommitCoordinator
20/03/01 00:15:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/03/01 00:15:53 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.42:4040
20/03/01 00:15:53 INFO Executor: Starting executor ID driver on host localhost
20/03/01 00:15:53 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40923.
20/03/01 00:15:53 INFO NettyBlockTransferService: Server created on 192.168.0.42:40923
20/03/01 00:15:53 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/03/01 00:15:53 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.42, 40923, None)
20/03/01 00:15:53 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.42:40923 with 987.6 MB RAM, BlockManagerId(driver, 192.168.0.42, 40923, None)
20/03/01 00:15:53 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.42, 40923, None)
20/03/01 00:15:53 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.42, 40923, None)
20/03/01 00:15:53 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.2 KB, free 987.5 MB)
20/03/01 00:15:53 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 987.5 MB)
20/03/01 00:15:53 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.42:40923 (size: 20.4 KB, free: 987.6 MB)
20/03/01 00:15:53 INFO SparkContext: Created broadcast 0 from textFile at SimpleApp.java:31
20/03/01 00:15:53 INFO FileInputFormat: Total input paths to process : 1
20/03/01 00:15:53 INFO SparkContext: Starting job: count at SimpleApp.java:38
20/03/01 00:15:53 INFO DAGScheduler: Got job 0 (count at SimpleApp.java:38) with 1 output partitions
20/03/01 00:15:53 INFO DAGScheduler: Final stage: ResultStage 0 (count at SimpleApp.java:38)
20/03/01 00:15:53 INFO DAGScheduler: Parents of final stage: List()
20/03/01 00:15:53 INFO DAGScheduler: Missing parents: List()
20/03/01 00:15:53 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at filter at SimpleApp.java:38), which has no missing parents
20/03/01 00:15:53 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.2 KB, free 987.5 MB)
20/03/01 00:15:53 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 987.5 MB)
20/03/01 00:15:53 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.42:40923 (size: 2.5 KB, free: 987.6 MB)
20/03/01 00:15:53 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
20/03/01 00:15:53 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at filter at SimpleApp.java:38) (first 15 tasks are for partitions Vector(0))
20/03/01 00:15:53 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/03/01 00:15:53 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7927 bytes)
20/03/01 00:15:53 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/03/01 00:15:53 INFO HadoopRDD: Input split: file:/home/zahid/spark/spark-3.0.0-preview2-bin-hadoop2.7/README.md:0+4666
20/03/01 00:15:53 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 832 bytes result sent to driver
20/03/01 00:15:53 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 86 ms on localhost (executor driver) (1/1)
20/03/01 00:15:53 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/03/01 00:15:53 INFO DAGScheduler: ResultStage 0 (count at SimpleApp.java:38) finished in 0.162 s
20/03/01 00:15:53 INFO DAGScheduler: Job 0 finished: count at SimpleApp.java:38, took 0.194265 s
20/03/01 00:15:53 INFO SparkContext: Starting job: count at SimpleApp.java:39
20/03/01 00:15:53 INFO DAGScheduler: Got job 1 (count at SimpleApp.java:39) with 1 output partitions
20/03/01 00:15:53 INFO DAGScheduler: Final stage: ResultStage 1 (count at SimpleApp.java:39)
20/03/01 00:15:53 INFO DAGScheduler: Parents of final stage: List()
20/03/01 00:15:53 INFO DAGScheduler: Missing parents: List()
20/03/01 00:15:53 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at filter at SimpleApp.java:39), which has no missing parents
20/03/01 00:15:53 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.2 KB, free 987.5 MB)
20/03/01 00:15:53 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.5 KB, free 987.5 MB)
20/03/01 00:15:53 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.0.42:40923 (size: 2.5 KB, free: 987.6 MB)
20/03/01 00:15:53 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1161
20/03/01 00:15:53 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at filter at SimpleApp.java:39) (first 15 tasks are for partitions Vector(0))
20/03/01 00:15:53 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
20/03/01 00:15:53 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 0, PROCESS_LOCAL, 7927 bytes)
20/03/01 00:15:53 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
20/03/01 00:15:53 INFO HadoopRDD: Input split: file:/home/zahid/spark/spark-3.0.0-preview2-bin-hadoop2.7/README.md:0+4666
20/03/01 00:15:53 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 875 bytes result sent to driver
20/03/01 00:15:53 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 19 ms on localhost (executor driver) (1/1)
20/03/01 00:15:53 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
20/03/01 00:15:53 INFO DAGScheduler: ResultStage 1 (count at SimpleApp.java:39) finished in 0.032 s
20/03/01 00:15:53 INFO DAGScheduler: Job 1 finished: count at SimpleApp.java:39, took 0.035838 s
Lines with a: 65, lines with b: 33
20/03/01 00:15:53 INFO SparkContext: Invoking stop() from shutdown hook
20/03/01 00:15:53 INFO SparkUI: Stopped Spark web UI at http://192.168.0.42:4040
20/03/01 00:15:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/03/01 00:15:53 INFO MemoryStore: MemoryStore cleared
20/03/01 00:15:53 INFO BlockManager: BlockManager stopped
20/03/01 00:15:53 INFO BlockManagerMaster: BlockManagerMaster stopped
20/03/01 00:15:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/03/01 00:15:53 INFO SparkContext: Successfully stopped SparkContext
20/03/01 00:15:53 INFO ShutdownHookManager: Shutdown hook called
20/03/01 00:15:53 INFO ShutdownHookManager: Deleting directory /tmp/spark-f955ef19-35a5-43b3-9e05-4fe0cc25590d
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 3.547 s
Finished at: 2020-03-01T00:15:53+00:00
Final Memory: 39M/132M
------------------------------------------------------------------------


¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}