Spark Structured Streaming XML content

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Spark Structured Streaming XML content

Nick Dawes
I'm trying to analyze data using Kinesis source in PySpark Structured Streaming on Databricks. 

Ceeated a Dataframe as shown below. 

kinDF = spark.readStream.format("kinesis").("streamName", "test-stream-1").load()

Converted the data from base64 encoding as below. 

df =  kinDF.withColumn("xml_data", expr("CAST(data as string)"))

Now, I need to extract few fields from df.xml_data column using xpath. 

Can you please suggest any possible solution?

If I create a dataframe directly for these xml files as

xml_df = spark.read.format("xml").options(rowTag='Consumers').load("s3a://bkt/xmldata")

I'm able to query using xpath

xml_df.select("Analytics.Amount1").show()

But, not sure how to do extract elements similarly on a Spark Streaming dataframe where data is in text format. 

Are there any xml functions to convert text data using schema? I saw an example for json data using from_json. 

Is it possible to use spark.read on a dataframe column? 

I need to find aggregated "Amount1" for every 5 minutes window. 

Thanks for your help. 

Nick
Reply | Threaded
Open this post in threaded view
|

Re: Spark Structured Streaming XML content

akstremepro
Hi, I am trying something similar now. However not able to do it on the
streaming dataframe myself. Were you able to resolve it?



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]