Structured Stream in Spark

Structured Stream in Spark

khajaasmath786
Hi,

Could anyone provide suggestions on how to parse JSON data from Kafka and load it back into Hive?

I have read about Structured Streaming but didn't find any examples. Is there a best practice for reading and parsing the data with Structured Streaming for this use case?

Thanks,
Asmath

Re: Structured Stream in Spark

Subhash Sriram
Hi Asmath,

Here is an example of using structured streaming to read from Kafka:
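
A minimal sketch of what such a Kafka read looks like in PySpark; the broker address and topic name here are assumptions for illustration, not values from the thread:

```python
# Minimal sketch: reading a Kafka topic as a streaming DataFrame.
# Broker address ("broker1:9092") and topic name ("events") are assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("KafkaToHive").getOrCreate()

# Each row exposes Kafka's key/value (as binary) plus topic, partition,
# offset, and timestamp columns.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "latest")
      .load())
```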


In terms of parsing the JSON, there is a from_json function that you can use. The following might help:
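
A minimal sketch of that from_json usage, assuming a streaming DataFrame `df` whose `value` column holds the JSON bytes (e.g. from the Kafka source); the schema fields are assumptions to be replaced with the real event schema:

```python
# Minimal sketch: parsing the Kafka "value" column as JSON with from_json.
# The schema below is an assumption; substitute your actual event schema.
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
    StructField("creationTime", TimestampType()),
])

# Cast the binary value to string, parse it, then flatten the struct.
parsed = (df.select(from_json(col("value").cast("string"), schema).alias("event"))
          .select("event.*"))
```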


I hope this helps.

Thanks,
Subhash

On Wed, Oct 25, 2017 at 2:59 PM, KhajaAsmath Mohammed <[hidden email]> wrote:

Re: Structured Stream in Spark

khajaasmath786
Hi Sriram,

Thanks. This is what I was looking for.

One question: where do we specify the checkpoint directory in the case of Structured Streaming?

Thanks,
Asmath
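
For reference, the checkpoint directory in Structured Streaming is supplied per query through the `checkpointLocation` option on the DataStreamWriter, not as a global setting. A minimal sketch, assuming a parsed streaming DataFrame `parsed` and illustrative paths:

```python
# Minimal sketch: the checkpoint directory is set per query via the
# checkpointLocation option on the writer. Paths here are assumptions.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "/data/etl/out")                 # assumed output path
         .option("checkpointLocation", "/data/etl/ckpt")  # offsets and state land here
         .partitionBy("creationTime")
         .start())
```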

On Wed, Oct 25, 2017 at 2:52 PM, Subhash Sriram <[hidden email]> wrote:

Re: Structured Stream in Spark

Subhash Sriram

On Wed, Oct 25, 2017 at 4:08 PM, KhajaAsmath Mohammed <[hidden email]> wrote:

Re: Structured Stream in Spark

khajaasmath786
Thanks Subhash.

Have you ever used the zero-data-loss concept with streaming? I am a bit worried about using streaming when it comes to data loss.


On Wed, Oct 25, 2017 at 3:10 PM, Subhash Sriram <[hidden email]> wrote:


Re: Structured Stream in Spark

Tathagata Das
Please do not confuse the old Spark Streaming (DStreams) with Structured Streaming. Structured Streaming's offset and checkpoint management is far more robust than that of DStreams.

On Wed, Oct 25, 2017 at 9:29 PM, KhajaAsmath Mohammed <[hidden email]> wrote:

Re: Structured Stream in Spark

khajaasmath786
Thanks TD. 

On Wed, Oct 25, 2017 at 6:42 PM, Tathagata Das <[hidden email]> wrote:

Re: Structured Stream in Spark

khajaasmath786
Hi TathagataDas,

I was trying to use Event Hubs with Spark streaming. It looks like I was able to make the connection successfully, but I cannot see any data on the console. I am not sure whether Event Hubs is supported or not.

Below is the code snippet I used to connect to Event Hubs:

Thanks,
Asmath



On Thu, Oct 26, 2017 at 9:39 AM, KhajaAsmath Mohammed <[hidden email]> wrote:

Re: Structured Stream in Spark

Shixiong(Ryan) Zhu
The code in the link writes the data into files. Did you check the output location?

By the way, if you want to see the data on the console, you can use the console sink by changing this line format("parquet").option("path", outputPath + "/ETL").partitionBy("creationTime").start() to format("console").start().
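
That swap can be sketched as follows, assuming a streaming DataFrame `df` and an `outputPath` variable as in the snippet quoted above:

```python
# Sketch of the suggested change: replace the file sink with the console
# sink while debugging. `df` and `outputPath` are assumptions from context.
#
# File sink (original form):
#   df.writeStream.format("parquet").option("path", outputPath + "/ETL") \
#     .partitionBy("creationTime").start()

# Console sink: prints each micro-batch to stdout for inspection.
query = (df.writeStream
         .format("console")
         .option("truncate", "false")  # show full column values
         .start())
```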

On Fri, Oct 27, 2017 at 8:41 AM, KhajaAsmath Mohammed <[hidden email]> wrote:

Re: Structured Stream in Spark

khajaasmath786
Yes, I checked both the output location and the console. Neither has any data.

The link also has the code and the question that I raised with Azure HDInsight.



On Fri, Oct 27, 2017 at 3:22 PM, Shixiong(Ryan) Zhu <[hidden email]> wrote: