MongoDB plugin to Spark - too many open cursors

MongoDB plugin to Spark - too many open cursors

Daniel Stojanov
Hi,


I receive an error message from the MongoDB server when too many Spark
applications (about 3 or 4) try to access the database at the same time:
"Cannot open a new cursor since too many cursors are already opened."
I am not sure how to remedy this, or how the plugin behaves while it is
pulling data.

It appears that a single running application opens many connections to
the database: the cursor limit in the database's settings is far greater
than the number of read operations occurring in Spark, yet it is still
being exhausted.


Does the plugin keep a connection/cursor to the database open even after
it has pulled the data into a DataFrame?

Why are there so many open cursors for a single read operation?

Does catching the exception, sleeping for a while, then retrying make
sense? If cursors are kept open for the life of the application,
retrying would not help.
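For reference, the catch-sleep-retry idea could be sketched as below. This is only a generic retry wrapper, not part of the connector's API; `read_fn` and the commented usage are hypothetical, and it only helps if the failed read actually releases its cursors.

```python
import time

def read_with_retry(read_fn, max_attempts=5, base_delay=2.0):
    """Call read_fn, retrying with exponential backoff on failure.

    read_fn is any zero-argument callable, e.g. a lambda wrapping
    a Spark read such as spark.read.format("mongo").load().
    """
    for attempt in range(max_attempts):
        try:
            return read_fn()
        except Exception:
            # On the last attempt, give up and re-raise the error.
            if attempt == max_attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before retrying.
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage:
# df = read_with_retry(lambda: spark.read.format("mongo").load())
```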


Plugin version: org.mongodb.spark:mongo-spark-connector_2.12:2.4.1


---------------------------------------------------------------------
To unsubscribe e-mail: [hidden email]


Re: MongoDB plugin to Spark - too many open cursors

lec ssmi
Is the connection pool configured for MongoDB full?
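If the pool is the suspect, the standard MongoDB URI option `maxPoolSize` caps how many connections each client opens. A minimal sketch, assuming the connector honours standard connection-string options (the host and database names below are made up):

```python
def build_mongo_uri(host, database, collection, max_pool_size=10):
    """Build a MongoDB connection URI that caps the driver's pool size."""
    return (f"mongodb://{host}/{database}.{collection}"
            f"?maxPoolSize={max_pool_size}")

# Hypothetical host and names; the URI would be passed to the connector,
# e.g. via the "spark.mongodb.input.uri" configuration key.
uri = build_mongo_uri("docdb.example.com:27017", "mydb", "events",
                      max_pool_size=5)
```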

Daniel Stojanov <[hidden email]> wrote on Monday, 26 October 2020 at 10:28 AM:


Re: MongoDB plugin to Spark - too many open cursors

Daniel Stojanov

Hi,

Thanks.

I believe this error message comes from the MongoDB server itself. Multiple instances of my application run at the same time; with one or a few applications there are never issues, but the error appears once enough applications are running concurrently.

I do not know how the MongoDB client manages connections. For example, does it leave connections open (rather than closing them) after it has pulled data from MongoDB? Nor do I know whether individual running applications can be told to limit their number of active connections to the database. The database instance is running on Amazon DocumentDB, so the only way to allow additional cursors is to upgrade to a larger instance type. That seems unnecessary, since my concern is only the number of open cursors, not the performance of the hardware itself.
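One way to reduce cursors per application may be to lower read parallelism: in connector 2.4.x each read partition opens its own cursor, so fewer, larger partitions should mean fewer simultaneous cursors. A sketch under that assumption; the exact option keys should be verified against the connector's configuration documentation:

```python
def reader_options(partition_size_mb=64):
    """Connector read options trading parallelism for fewer cursors.

    Each read partition opens its own cursor, so a larger partition
    size means fewer partitions and fewer simultaneous cursors.
    """
    return {
        "partitioner": "MongoSamplePartitioner",
        "partitionerOptions.partitionSizeMB": str(partition_size_mb),
    }

# Hypothetical usage with mongo-spark-connector 2.4.x:
# df = spark.read.format("mongo").options(**reader_options(256)).load()
```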


Regards,




On 26/10/20 1:52 pm, lec ssmi wrote:
Is the connection pool configured by mongodb full?