MongoDB connections from AWS Lambda
I'm looking to create a RESTful API using AWS Lambda/API Gateway connected to a MongoDB database. I've read that connections to MongoDB are relatively expensive so it's best practice to retain a connection for reuse once its been established rather than making new connections for every new query.
This is pretty straight forward for normal applications as you can establish a connection during start up and reuse it during the applications lifetime. But, since Lambda is designed to be stateless retaining this connection seems to be less straight forward.
Therefore, I'm wondering what would be the best way to approach this database connection issue? Am I forced to make new connections every time a Lambda function is invoked or is there a way to pool/cache these connections for more efficient queries?
Thanks.
AWS Lambda functions should be defined as stateless functions, so they can't hold state like a connection pool.
This issue was also raised in this AWS forum post. On Oct 5, 2015 AWS engineer Sean posted that you should not open and close connection on each request, by creating a pool on code initialization, outside of handler block. But two days later the same engineer posted that you should not do this.
The problem is that you don't have control over Lambda's runtime environment. We do know that these environments (or containers) are reused, as describes the blog post by Tim Wagner. But the lack of control can drive you to drain all your resources, like reaching a connection limit in your database. But it's up to you.
Instead of connecting to MongoDB from your lambda function you can use RESTHeart to access the database through HTTP. The connection pool to MongoDB is maintained by RESTHeart instead. Remember that in regards to performance you'll be opening a new HTTP connection to RESTHeart on each request, and not using a HTTP connection pool, like you could do in a tradicional application.
Restheart is a REST-based server that runs alongside MongoDB. It maps most CRUD operations in Mongo to GET, POST, etc., requests with extensible support when you need to write a custom handler (e.g., specialized geoNear, geoSearch query)
You should assume lambdas to be stateless but the reality is that most of the time the vm is simply frozen and does maintain some state. It would be stupid for Amazon to spin up a new process for every request so they often re-use the same process and you can take advantage of this to avoid thrashing connections.
To avoid connecting for every request (in cases where the lambda process is re-used):
Write the handler assuming the process is re-used such that you connect to the database and have the lamba re-use the connection pool (the
db
promise returned fromMongoClient.connect
).In order for the lambda not to hang waiting for you to close the db connection,
db.close()
, after servicing a request tell it not wait for an empty event loop.
Example:
var db = MongoClient.connect(MongoURI);
module.exports.targetingSpec = (event, context, callback) => {
context.callbackWaitsForEmptyEventLoop = false;
db.then((db) => {
// use db
});
};
From the documentation about context.callbackWaitsForEmptyEventLoop
:
callbackWaitsForEmptyEventLoop The default value is true. This property is useful only to modify the default behavior of the callback. By default, the callback will wait until the Node.js runtime event loop is empty before freezing the process and returning the results to the caller. You can set this property to false to request AWS Lambda to freeze the process soon after the callback is called, even if there are events in the event loop. AWS Lambda will freeze the process, any state data and the events in the Node.js event loop (any remaining events in the event loop processed when the Lambda function is called next and if AWS Lambda chooses to use the frozen process). For more information about callback, see Using the Callback Parameter.
I ran some tests executing Java Lambda functions connecting to MongoDB Atlas.
As already stated by other posters Amazon does reuse the Instances, however these may get recycled and the exact behaviour cannot be determined. So one could end up with stale connections. I'm collecting data every 5 minutes and pushing it to the Lambda function every 5 minutes.
The Lambda basically does:
- Build up or reuse connection
- Query one record
- Write or update one record
- close the connection or leave it open
The actual amount of data is quite low. Depending on time of the day it varies from 1 - 5 kB. I only used 128 MB.
The Lambdas ran in N.Virgina as this is the location where the free tier is tied to.
When opening and closing the connection each time most calls take between 4500 - 9000 ms. When reusing the connection most calls are between 300 - 900 ms. Checking the Atlas console the connection count stays stable. For this case reusing the connection is worth it. Building up a connection and even disconnecting from a replica-set is rather expensive using the Java driver.
For a large scale deployment one should run more comprehensive tests.