Problem of how to retrieve file from HDFS

Ashish Mittal
Hi,
I am trying to store a CSV file in HDFS and retrieve it again. I have successfully stored the CSV file in HDFS using a LinearRegressionModel in Spark (Java), but I cannot retrieve it. How do I retrieve a CSV file from HDFS?
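Code:

import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.ml.regression.LinearRegression;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession sparkSession = SparkSession.builder()
        .appName("JavaSparkModelWithHadoopHDFSExample")
        .master("local[2]")
        .getOrCreate();

// Combine the six monthly columns into a single "features" vector column.
VectorAssembler assembler = new VectorAssembler()
        .setInputCols(new String[] { "MONTH_1", "MONTH_2", "MONTH_3", "MONTH_4", "MONTH_5", "MONTH_6" })
        .setOutputCol("features");

// Read the CSV from HDFS, using the first row as the header and inferring column types.
Dataset<Row> rowDataSet = sparkSession.read().format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv");
rowDataSet.show();
rowDataSet.printSchema();

Dataset<Row> vectorDataSet = assembler.transform(rowDataSet).drop("CUST_ID");
vectorDataSet.show();

// Train a linear regression model that predicts CLV from the feature vector.
LinearRegression lr = new LinearRegression()
        .setMaxIter(10)
        .setRegParam(0.3)
        .setElasticNetParam(0.8)
        .setFeaturesCol("features")
        .setLabelCol("CLV")
        .setPredictionCol("prediction");

LinearRegressionModel lrModel = lr.fit(vectorDataSet);
// Note: save() writes a model directory (JSON metadata plus Parquet data), not a CSV file.
// This path is the same as the input CSV, so overwrite() replaces history.csv with the model.
lrModel.write().overwrite().save("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv");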

This code runs successfully and stores the file in HDFS, but I don't know how to retrieve the CSV file from HDFS. Please help.
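A minimal sketch of the retrieval side, assuming Spark 2.x or later with HDFS reachable at the same URI. As far as the Spark ML persistence API goes, `lrModel.write().save(path)` does not produce a CSV file: it writes a directory holding the model's JSON metadata and Parquet data, which is read back with `LinearRegressionModel.load(path)`. A genuine CSV is read with `spark.read().csv(...)` and written with `df.write().csv(...)`. Because the original code saves the model with `overwrite()` to the same path as the input, that path now holds the model, not the original CSV. The class name `RetrieveFromHdfs`, the helpers `loadModel`, `readCsv`, `writeCsv`, and the path `other.csv` are hypothetical, for illustration only.

```java
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical helper class: the class, method names and the "other.csv" path are assumptions.
public class RetrieveFromHdfs {

    // Reads back a model written with lrModel.write().save(modelPath).
    // modelPath is the model directory (metadata/ and data/), not a CSV file.
    public static LinearRegressionModel loadModel(String modelPath) {
        return LinearRegressionModel.load(modelPath);
    }

    // Reads a CSV file, or a directory of part-*.csv files, into a DataFrame.
    public static Dataset<Row> readCsv(SparkSession spark, String csvPath) {
        return spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv(csvPath);
    }

    // Writes a DataFrame as CSV; Spark creates a directory of part-*.csv files at csvDirPath.
    // The CSV writer cannot store vector columns, so drop columns such as "features" first.
    public static void writeCsv(Dataset<Row> df, String csvDirPath) {
        df.write()
                .mode("overwrite")
                .option("header", "true")
                .csv(csvDirPath);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("RetrieveFromHdfsSketch")
                .master("local[2]")
                .getOrCreate();

        // The path the original code saved the model to.
        LinearRegressionModel lrModel =
                loadModel("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv");
        System.out.println("Coefficients: " + lrModel.coefficients()
                + " Intercept: " + lrModel.intercept());

        // Hypothetical location of an actual CSV file.
        Dataset<Row> csv = readCsv(spark, "hdfs://localhost:9000/user/hadoop/inpit/data/other.csv");
        csv.show();

        spark.stop();
    }
}
```

To keep the input data and also get a CSV out of the model, one option is to save the model to a separate directory and write its predictions as CSV, e.g. `writeCsv(lrModel.transform(vectorDataSet).drop("features"), ...)` with a path of your choosing.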

Thanks & Regards,
Ashish Mittal