help on use case - spark parquet processing

manjay
Hi,

I have a use case where I need to merge three datasets and build one combined dataset, taking each piece of data from wherever it is available.

My dataset is a complex, nested object:

Customer
- name - String
- accounts - List<Account>

Account
- type - String
- addresses - List<Address>

Address
- name - String

And the nesting goes on.

These files are in Parquet format.

Each of the three input datasets holds some of the details, and these need to be merged to build one dataset that has all the information. (I know which files need to be merged.)
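
One possible way to do such a merge is sketched below: full outer joins on a key, then taking the first non-null value per column. This is only a sketch under assumptions that the thread does not confirm: that `name` identifies a customer in all three inputs, and that the `accounts` column has the same type in each input. The object name `MergeSketch` and the `accounts_a`/`accounts_b`/`accounts_c` aliases are made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{coalesce, col}

object MergeSketch {
  // Assumption: `name` is the join key and every input has an `accounts`
  // column of the same type. For each customer, the first non-null
  // `accounts` value (in the order a, b, c) is kept.
  def mergeThree(a: DataFrame, b: DataFrame, c: DataFrame): DataFrame = {
    val left  = a.select(col("name"), col("accounts").as("accounts_a"))
    val mid   = b.select(col("name"), col("accounts").as("accounts_b"))
    val right = c.select(col("name"), col("accounts").as("accounts_c"))

    left
      .join(mid, Seq("name"), "full_outer")   // keep customers found in either input
      .join(right, Seq("name"), "full_outer")
      .select(
        col("name"),
        coalesce(col("accounts_a"), col("accounts_b"), col("accounts_c")).as("accounts")
      )
  }
}
```

The inputs would typically come from `spark.read.parquet(...)`. If the rows only need to be stacked rather than combined per customer, reading all files with the Parquet `mergeSchema` option set to `true` may be a simpler route.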


I want to know how I should proceed with this.

- My approach is to build a case class for the actual output and parse the three datasets into it
  (but this is failing because the input data does not have all the fields).

So basically, what should the approach be for this kind of problem?

Second, how can I convert a Parquet DataFrame to a Dataset, given that the Parquet struct does not have all the fields but the case class does? (I am getting an error saying that no struct type was found.)
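
For the DataFrame-to-Dataset conversion, one approach that may work is to add each column the case class expects but the DataFrame lacks, as a null cast to the expected type, before converting. The sketch below assumes case classes modelled on the schema described above; `ToDatasetSketch` and `toCustomers` are hypothetical names, and it only pads missing top-level columns. A field missing inside a nested struct (for example inside `accounts`) would need extra handling that is not shown here.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders}
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical target types, modelled on the schema described in the post.
case class Address(name: String)
case class Account(`type`: String, addresses: Seq[Address])
case class Customer(name: String, accounts: Seq[Account])

object ToDatasetSketch {
  def toCustomers(df: DataFrame): Dataset[Customer] = {
    val encoder = Encoders.product[Customer]
    val target = encoder.schema // the schema the case class expects

    // Add every missing top-level column as a null of the expected type.
    val padded = target.fields.foldLeft(df) { (acc, field) =>
      if (acc.columns.contains(field.name)) acc
      else acc.withColumn(field.name, lit(null).cast(field.dataType))
    }

    // Keep only the expected columns, then convert to a typed Dataset.
    padded.select(target.fieldNames.map(col): _*).as(encoder)
  }
}
```

With plain (non-Option) fields, a padded column comes back as `null` on the Scala side, so downstream code has to guard against that.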

Thanks
Manjay Kumar
8320 120 839
Re: help on use case - spark parquet processing

Amit Sharma
Can you use Option fields in your case class?
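
A minimal sketch of what that could look like, assuming Option-based variants of the types from the original post (the `...Opt` class names and `OptionFieldsSketch` are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical Option-based variants of the types described in the thread.
case class AddressOpt(name: Option[String])
case class AccountOpt(`type`: Option[String], addresses: Option[Seq[AddressOpt]])
case class CustomerOpt(name: Option[String], accounts: Option[Seq[AccountOpt]])

object OptionFieldsSketch {
  // Builds an untyped DataFrame from two customers and reads it back as a Dataset.
  def roundTrip(spark: SparkSession): Seq[CustomerOpt] = {
    import spark.implicits._
    val df = Seq(
      CustomerOpt(Some("alice"), Some(Seq(AccountOpt(Some("savings"), None)))),
      CustomerOpt(Some("bob"), None) // no accounts: stored as a null column value
    ).toDF()

    // A null column value is decoded as None instead of a bare null.
    df.as[CustomerOpt].collect().toSeq
  }
}
```

As far as I know, Option only makes a field nullable; the column itself still has to exist in the DataFrame for `.as[...]` to resolve it, so columns that are absent from a Parquet file would still need to be added (for example as typed nulls) before converting.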


Thanks
Amit

On Thu, Aug 13, 2020 at 12:47 PM manjay kumar <[hidden email]> wrote:
> [quoted message trimmed]