Spark-Mongodb connector issue


ayan guha
Hi Guys

I have a large MongoDB collection with a complex document structure. I am running into the following error:

Can not cast Array to Struct. Value:BsonArray([])

The target column is indeed a struct, so the error makes sense.
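
For context, the load is essentially the following. This is a minimal sketch assuming the MongoDB Spark Connector 2.x API; the URI and all names are placeholders.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("mongo-load")
  .config("spark.mongodb.input.uri", "mongodb://host:27017/mydb.mycollection")
  .getOrCreate()

// The connector infers the schema by sampling documents; the cast
// error then surfaces at read time for documents that do not match.
val df = spark.read
  .format("com.mongodb.spark.sql.DefaultSource")
  .load()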

I am able to successfully read from another collection with exactly the same structure but a subset of the data.

I suspect some documents are corrupted in MongoDB.

Questions:
1. Is there any way to filter out such documents in the MongoDB connector? (A sketch of what I mean is below.)
2. I tried to exclude the column with a custom select statement, but it did not work. Is that possible?
3. Is there any way to tolerate errors up to a certain threshold? I do not want to stall a load of 1M records because 1 record is bad.
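
To make questions 1 and 2 concrete, here is the kind of thing I am after. This is only a sketch: it assumes the 2.x connector can push an aggregation pipeline down to MongoDB and honours a user-supplied schema, and all field names are placeholders.

import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.types._
import org.bson.Document

// Q1: push a $match down to MongoDB so documents where the field is
// not a sub-document (e.g. BsonArray([])) never reach Spark.
// ($type string aliases need MongoDB 3.2+.)
val filtered = MongoSpark.load(spark.sparkContext)
  .withPipeline(Seq(Document.parse("""{ "$match": { "myStruct": { "$type": "object" } } }""")))
  .toDF()

// Q2: supply an explicit schema that simply omits the problem column,
// so the connector never attempts the Array-to-Struct cast.
val schema = StructType(Seq(
  StructField("someField", StringType),
  StructField("anotherField", IntegerType)
  // "myStruct" deliberately left out
))
val projected = spark.read
  .format("com.mongodb.spark.sql.DefaultSource")
  .schema(schema)
  .load()

If either of these is supported, a pointer to the right option would be great.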

I know this might be a question for the MongoDB forum, but I am starting here in case there is a generic solution I can use. I will post to SO and the MongoDB forum shortly.

Best
Ayan

--
Best Regards,
Ayan Guha