AWS Serverless Common Mistakes - Communication with Other Systems (5/7)

"No service is an island" and serverless systems can have problems communicating with other systems. This is part of a multipart blog post about serverless common mistakes. Friday, February 15, 2019

This is a continuation of a blog series about common mistakes when building serverless systems on AWS. It is split into seven parts, of which this is the fifth.

In the serverless world, functions are mostly the glue that connects different systems. If these are legacy systems or systems that were designed for another environment, this can present a challenge.

Connections

Systems that demand a live connection, where connections are expensive to set up and limited in number, can present a problem. SQL databases are an example of such a system. Connecting to Redis and MongoDB can also be problematic.

What is the problem? Let's look at SQL databases. Traditionally, because establishing a connection is expensive, you keep a connection pool: a set of open connections. You borrow a connection for as long as you need it and then return it to the pool. In the serverless world, functions ("servers"/containers) are stateless. They come and go depending on current needs, so there is nobody to hold the connection pool.

You can open a connection whenever you need it and close it afterward, but this is expensive. You can leave the connection open and reuse it the next time the same container executes the function, but with many containers each holding a connection, you could easily run out of them. The solution is to build a smart system that handles opening and closing connections and reuses them when possible.

When leaving a connection open, be aware that Lambda waits for Node's event loop to empty before returning anything via the callback. So, if you leave the connection open, the call might never finish. Lambda passes your handler a "context" object on which you can set callbackWaitsForEmptyEventLoop to false to change this behavior.
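The reuse idea can be sketched in a few lines. This is only an illustration, not a production-ready pool: openConnection is a hypothetical stand-in for your real driver call, and the counter exists only to show that a warm container opens the connection once.

```javascript
// Hypothetical stand-in for a real driver call (for example, creating a
// MySQL connection). It counts how often a connection is opened.
let connectionsOpened = 0;
async function openConnection() {
  connectionsOpened += 1;
  return {
    // Pretend query that simply echoes the SQL it received.
    query: async (sql) => [{ sql }],
  };
}

// Declared outside the handler, so the value survives between invocations
// for as long as the same container stays warm.
let cachedConnection = null;

async function getConnection() {
  if (!cachedConnection) {
    cachedConnection = await openConnection();
  }
  return cachedConnection;
}

async function handler(event, context) {
  // Do not wait for the event loop to empty; the open connection keeps it busy.
  context.callbackWaitsForEmptyEventLoop = false;
  const connection = await getConnection();
  return connection.query('SELECT 1');
}

exports.handler = handler;
```

Note that this sketch never closes the connection, which is exactly how you run out of connections when many containers are alive at once. Deciding when to close is the hard part.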


MySQL

If you are connecting to MySQL or a MySQL-compatible database (Aurora, MariaDB), there is already a Node.js library that solves these problems: Serverless MySQL by the awesome Jeremy Daly. Here is a detailed explanation of all the settings you should set to optimally manage connections.

How to use:

// Create the client outside the handler so it can be reused by warm containers
const mysql = require('serverless-mysql')({
  config: {
    host     : process.env.ENDPOINT,
    database : process.env.DATABASE,
    user     : process.env.USERNAME,
    password : process.env.PASSWORD
  }
})

// Main handler function
exports.handler = async (event, context) => {
  // Let Lambda finish even though the open connection occupies the event loop
  context.callbackWaitsForEmptyEventLoop = false
  // Run your query
  let results = await mysql.query('SELECT * FROM table')
  // Run clean up function
  await mysql.end()
  // Return the results
  return results
}

When the function finishes executing, you call end() instead of closing the connection. The library looks at how many connections are in use and decides whether it needs to close the connection or whether it can be kept for the next invocation. Do not forget to await this call. Also, do not forget to set callbackWaitsForEmptyEventLoop = false, which allows Lambda to complete despite an open database connection that occupies the event loop.


VPC and SQL Databases

When connecting to SQL databases with Lambda, there is another issue. You can make a connection in two ways:

  • Lambda inside VPC

    Allow access to RDS from all IPs in the VPC. You get a longer cold start and you can run out of ENIs or IPs, so you have to limit Lambda concurrency. You can read more about VPC here. AWS has announced changes that should soon significantly reduce this problem.

  • Lambda outside VPC

    Set the RDS to "Publicly Accessible" and in the security group allow access from everywhere (Lambda IPs are not known).

Opening the database to public access is, of course, not an ideal solution and is probably not acceptable for security reasons. But if you want to avoid all the disadvantages of a VPC, it is the only way. If you choose this option, use an SSL connection. You can read how to use an SSL connection with an SQL database in this blog.

Amazon is building a new Data API for Amazon Aurora Serverless that will not have such problems, but it is currently not ready for prime time.

Limiting Scalability

Systems you connect to might not be as scalable as your serverless systems. You could easily overload them and there are several ways to prevent this.

If you can make the process asynchronous, you can use the Decoupled Invocation pattern: put a queue (SQS), a Kinesis Data Stream, or DynamoDB Streams between the two systems to balance the load. Here is an excellent blog on how to implement this.

If the service you are calling has a fixed set of quotas, you can implement this rather complex throttling mechanism.

If calls need to be synchronous, then limit the concurrency of Lambda or set API Gateway throttling, so your serverless system will be throttled but your external system will survive.

An SQL database is, again, a system where you must apply these tactics. In asynchronous scenarios, use the Decoupled Invocation pattern. In synchronous scenarios, set the maximum concurrency of the functions lower than the number of available connections.
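For the synchronous case, here is a minimal sketch of deriving and applying such a limit. It assumes each container holds at most one database connection. The helper names and the headroom idea are my own illustration; putFunctionConcurrency with FunctionName and ReservedConcurrentExecutions is the AWS SDK for JavaScript (v2) call for setting reserved concurrency.

```javascript
// Leave some connections free for other clients (migrations, admin tools).
// Both numbers are values you have to supply for your own database.
function reservedConcurrencyFor(maxDbConnections, headroom) {
  const limit = maxDbConnections - headroom;
  if (limit < 1) {
    throw new Error('No connections left for the function');
  }
  return limit;
}

// lambdaClient is expected to be an AWS SDK v2 Lambda client, for example:
//   const AWS = require('aws-sdk');
//   const lambdaClient = new AWS.Lambda();
async function applyConcurrencyLimit(lambdaClient, functionName, limit) {
  return lambdaClient
    .putFunctionConcurrency({
      FunctionName: functionName,
      ReservedConcurrentExecutions: limit,
    })
    .promise();
}
```

In practice you would more likely set the same value in your deployment configuration than call the API by hand, but the arithmetic is the same.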


Multiple Concurrent Calls

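If you are making multiple concurrent calls (like in a fan-out pattern) in Node.js, the HTTP agent's limit on open sockets can become a bottleneck. Older Node.js versions (before 0.12) defaulted to five sockets per host, and version 2 of the AWS SDK for JavaScript caps its default agent at 50. You can raise the limit: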

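const https = require('https');
const AWS = require('aws-sdk');

// Agent without a cap on concurrent sockets per host
const agent = new https.Agent({ maxSockets: Infinity });

// Make AWS SDK clients created after this point use the agent
AWS.config.httpOptions = { agent: agent };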

Other systems have similar limitations.
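When the limit belongs to a system you cannot reconfigure, one option is to cap the number of in-flight calls yourself. The helper below is not from any library; mapWithLimit is a hypothetical name for a small sketch that runs a fan-out with at most limit calls active at a time.

```javascript
// Runs worker(item, index) for every item, with at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner keeps picking the next unprocessed index until none are left.
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  const runners = Array.from({ length: runnerCount }, () => run());
  await Promise.all(runners);
  return results;
}
```

For example, mapWithLimit(keys, 10, (key) => fetchItem(key)) would keep ten requests open at most, where keys and fetchItem stand for your own data and call. If any call rejects, the returned promise rejects as well, so add retries or error handling inside the worker if you need them.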
