Convert a line of String into column

hamishberridge
I want to convert each line of a text file into a row of a table. For instance, I want to convert the following line

<column1> <column2> <column3> ... <column6> # this is a line in a text file; the fields are separated by a single space

into a table like this:

+----+----+----+...+----+
|col1|col2|col3|...|col6|
+----+----+----+...+----+
|val1|val2|val3|...|val6|
+----+----+----+...+----+
...

My code is below:

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder
      .master("local")
      .appName("MyApp")
      .getOrCreate()

    import spark.implicits._

    val lines = spark.readStream.textFile("/tmp/data/")

    val words = lines.as[String].flatMap(_.split(" "))
    words.printSchema()

    val query = words.
      writeStream.
      outputMode("append").
      format("console").
      start
    query.awaitTermination()

But in fact this code only splits each line into words and puts every word in its own row of a single column:

+-------+
| value |
+-------+
|col1...|
|col2...|
|col3...|
|  ...  |
| col6  |
+-------+

How can I achieve the result I want?

Thanks.

Re: Convert a line of String into column

Dhaval Modi
Hi,

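First, convert "lines" to a DataFrame. You will get a single column holding the original string, one row per line.

Next, use the string split function on this column to convert it to an array of strings.

After this, select each element of the array by index (for example with getItem) so that each element becomes its own column. Note that the explode function would instead turn each element into a separate row, not a column.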
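
The steps above can be sketched roughly as follows. This is only a minimal sketch under some assumptions that are not in the original post: every line has exactly six fields separated by a single space, the output columns are named col1 to col6, and `LinesToColumns`, `toColumns` and `numColumns` are hypothetical names made up for illustration. It uses a small batch DataFrame as a stand-in for the streaming source so it can be run on its own. Since `explode` yields one row per array element, the sketch picks elements out by index with `getItem` to get columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, split}

object LinesToColumns {
  // Assumption: every line has exactly this many space-separated fields.
  val numColumns = 6

  // Split the single "value" column into an array of strings, then select
  // each array element by index as its own named column (col1 .. colN).
  def toColumns(lines: DataFrame, n: Int = numColumns): DataFrame = {
    val parts = split(col("value"), " ")
    val columns = (0 until n).map(i => parts.getItem(i).as(s"col${i + 1}"))
    lines.select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("MyApp")
      .getOrCreate()
    import spark.implicits._

    // Batch stand-in for the text source; the column is named "value",
    // the same name a text source gives its single column.
    val lines = Seq("a b c d e f", "1 2 3 4 5 6").toDF("value")

    toColumns(lines).printSchema()
    toColumns(lines).show()

    spark.stop()
  }
}
```

With the streaming Dataset from the question, something like `toColumns(lines.toDF())` in place of the `flatMap` step should fit in before `writeStream`, though I have not run it against a streaming source. If the number of fields is not fixed, this index-based approach needs a known upper bound, because the schema has to be decided before the query starts.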

(In reply to the original post of Wed, 2 Oct 2019, 03:18.)

Re: Convert a line of String into column

ayan guha
Do you know how many columns there are?

(In reply to Dhaval Modi's message of Sat, Oct 5, 2019, 6:39 PM.)

--
Best Regards,
Ayan Guha