How the AWS Lambda team made my two-year-old talk completely irrelevant

Serhat Can
Published in Cloud Türkiye · 7 min read · May 18, 2020


A picture of me from 2 years ago looking at AWS Lambda’s future.

Two years ago, in June 2018, I gave a talk at Devopsdays Amsterdam (video recording and slide deck). I remember that talk like it was yesterday because I was on stage in front of some of the best names in DevOps. I was very excited.

My talk was about the problems we were having with AWS Lambda and our workarounds for them.

Recently, in our meetup, we talked about Lambda’s challenges again. As we talked, I realized that almost everything I mentioned in Amsterdam has been addressed by the AWS Lambda team. Impressive job!

The reason I wanted to write this post is that a lot of people who tried AWS Lambda two or three years ago probably have some bad memories, and may still insist that Lambda is not production ready.

This post, hopefully, should help them realize that most of those bad memories are now obsolete. Here, I briefly revisit each of those issues (plus a few others that I couldn’t fit into that 30-minute talk) and share their current state.

Problem 1: Function Startup Latency — aka. Cold start

The most famous problem with FaaS/AWS Lambda is the cold start. Even if you have never deployed a single Lambda function, you have probably heard about it.

A cold start is startup latency: your function spends some time initializing before running your actual code. In some cases, that latency matters. Note that this only happens when there is no idle container available and waiting to run the code. Cold starts are not significant for languages like Node.js or Go, while they do have a noticeable effect in languages like Java and .NET, especially if you fail to apply recommended practices such as keeping your deployment package small.
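To make the container-reuse point concrete, here is a minimal sketch (in Python, with made-up names) showing that module-level initialization runs once per container, so only the first, cold invocation pays the setup cost:

```python
# Module-level code runs once per container (the cold start),
# while the handler runs on every invocation.
INIT_RUNS = 0

def _expensive_setup():
    # Stand-in for loading config, SDK clients, ML models, etc.
    global INIT_RUNS
    INIT_RUNS += 1
    return {"config": "loaded"}

CONFIG = _expensive_setup()  # paid once per container

def handler(event, context):
    # Warm invocations reuse CONFIG instead of rebuilding it.
    return {"init_runs": INIT_RUNS, "config": CONFIG["config"]}
```

Invoking `handler` twice in the same container leaves `INIT_RUNS` at 1; a fresh container (or a bloated deployment package) means paying `_expensive_setup` again.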

The most popular workaround for cold starts was to invoke the function periodically to keep some containers alive. But a lot of people don’t like this idea, because the whole purpose of Lambda is to let you focus on writing business logic instead of dealing with this kind of undifferentiated heavy lifting.

The AWS Lambda team was aware of the frustration that cold starts were causing. Last year, they announced Provisioned Concurrency as a way of automating this manual warming, in other words, a built-in way of keeping functions warm. I think having this as a configuration parameter instead of deploying a bunch of extra Lambdas is a big win. In an ideal world we wouldn’t need this option at all, but I think it is a pretty nice solution to such a hard problem.
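Configuration-wise, here is a sketch of what turning this on might look like with boto3, assuming a function named `my-function` with a published alias `live` (both names are hypothetical):

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep 10 pre-initialized execution environments warm for the "live" alias.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",  # hypothetical function name
    Qualifier="live",            # provisioned concurrency requires a version or alias
    ProvisionedConcurrentExecutions=10,
)
```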

Problem 2: Lack of SQS integration

If I had to guess which Lambda trigger is the most requested of all time, I’d say SQS without hesitation :) A lot of people were asking for it consistently (#awswishlist). The reasons were obvious.

First, SQS is a great Serverless service that is widely used in microservices architectures, and hence a natural fit for Lambda. More importantly, dead-letter queues, where Lambda sends failed function payloads, only supported SQS as a target. That meant that if you wanted to add some resiliency, you had to leave the event-driven workflow and run Lambda functions all the time to poll your queues. That was far from ideal, as you can imagine.

The SQS trigger for AWS Lambda was announced just before my talk. I clearly remember the buzz about it. Many of us were really happy :)
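With the trigger in place, Lambda polls the queue for you and hands your handler batches of records. A minimal Python sketch, with a made-up `orderId` payload field:

```python
import json

def handler(event, context):
    # Lambda delivers SQS messages in batches under event["Records"].
    processed = []
    for record in event["Records"]:
        body = json.loads(record["body"])
        processed.append(body["orderId"])  # hypothetical payload field
    # If the handler raises, the whole batch becomes visible on the queue again.
    return {"processed": processed}
```

On a successful return, Lambda deletes the batch from the queue for you, so there is no polling loop left to write.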

Problem 3: Overscaled Lambda functions

Sometimes we write bad code or miss some best practices. This can cause a Lambda function to over-scale and consume the entire concurrent execution limit. This faulty behavior leads to errors and often a big, bulky bill.

An overscaled Lambda function: a Lambda function polls a queue (as we did in the old days) and keeps running again and again, but Graylog is down and we don’t know about it…

Back in the day, we didn’t have a good solution to this problem other than following best practices like setting up CloudWatch alarms or avoiding infinite retries.

We suffered from this. Although the solution was released about half a year before my talk, I still want to bring it up here, because the current solution has some downsides.

The solution to this problem is to set a limit on a function’s concurrent execution count, known as reserved concurrency. For a job that doesn’t need to scale up immediately, this is useful. For example, if your function consumes messages from a queue and executes a non-urgent piece of code, you can set that Lambda’s concurrency limit to 20, or thereabouts. This means your function will handle at most 20 messages at the same time, which is more than fine for this case.
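Setting that limit is a one-line configuration, sketched here with boto3 (the function name is made up):

```python
import boto3

lambda_client = boto3.client("lambda")

# Cap this queue consumer at 20 concurrent executions; extra SQS
# messages simply wait in the queue instead of over-scaling the function.
lambda_client.put_function_concurrency(
    FunctionName="queue-consumer",  # hypothetical function name
    ReservedConcurrentExecutions=20,
)
```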

But this solution has some downsides you should be aware of. When you set a function-level limit, it is carved out of the account’s global concurrency limit, leaving your other Lambdas less room to scale. Also, no one likes having to predict the scale of their function.

Problem 4: At least once event delivery — not exactly once

This one is really a distributed-systems problem: your functions should be idempotent. I can hear some of you asking what that means, so let me explain briefly.

AWS Lambda doesn’t guarantee to invoke your function exactly once for the same event; it guarantees to do it at least once. This means your function can unintentionally be invoked multiple times by the same event, with the same request ID. You should be aware of this behavior while designing your functions.
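One common way to make a handler idempotent is to record each request ID before doing any side-effecting work and skip the work on repeats. A minimal sketch; a real implementation would use a DynamoDB conditional write rather than an in-memory set, since containers are ephemeral:

```python
# Request IDs we have already handled (in-memory for illustration only).
_seen_request_ids = set()

def handler(event, context):
    request_id = context.aws_request_id
    if request_id in _seen_request_ids:
        # A redelivery of an event we already processed: do nothing.
        return {"status": "duplicate", "requestId": request_id}
    _seen_request_ids.add(request_id)
    # ... side-effecting work (writes, payments, emails) goes here ...
    return {"status": "processed", "requestId": request_id}
```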

There is a nice support page by AWS with some good suggestions. One that I’d like to mention here is using Step Functions.

Step Functions, similar to AWS Lambda, have improved a lot. For this case, Step Functions can help you detect duplicates without any custom solution; take a look here.

Problem 5: Using connection pools while connecting to a database

We use connection pools to manage database connections effectively. This is required to keep your apps and, more importantly, your databases healthy.

Functions are short-lived and can scale very quickly. We can cache some objects and reuse connections across invocations, but this still doesn’t remove the danger of overloading your database with many unnecessary connections.
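The standard mitigation is to create the connection outside the handler so warm containers reuse it. A sketch with a placeholder `make_connection` standing in for a real driver call such as `psycopg2.connect`:

```python
CONNECT_CALLS = 0
_connection = None

def make_connection():
    # Placeholder for a real driver call, e.g. psycopg2.connect(...).
    global CONNECT_CALLS
    CONNECT_CALLS += 1
    return object()

def get_connection():
    # Lazily create the connection once per container, then reuse it.
    global _connection
    if _connection is None:
        _connection = make_connection()
    return _connection

def handler(event, context):
    conn = get_connection()  # warm invocations reuse the same connection
    # ... run queries with conn ...
    return {"connect_calls": CONNECT_CALLS}
```

Each container still holds its own connection, though, so N concurrent containers means N database connections; that is exactly the gap a managed proxy closes.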

The solution to this problem is Amazon RDS Proxy, if you use RDS (and I’ll assume you do, because you are using Lambda :)). This proxy service maintains a pool of established connections to your RDS database instances and solves our problem by reducing the stress on the database. It was announced just before re:Invent 2019 and is still in preview.

In addition to RDS Proxy, you may want to take a look at Aurora Serverless, a Serverless relational database that is perfect for AWS Lambda — at least on paper. I don’t have much experience with it.

Problem 6: VPC effect on cold start and scalability

A VPC (Virtual Private Cloud) is commonly used to add an extra layer of security. When you used a VPC with Lambda, there were two problems. The first was increased cold start times. The other was overusing ENIs (Elastic Network Interfaces) and hitting your scalability limit.

This problem is now history. The good news is you don’t need to do anything to see the improvements — one of the greatest benefits of Serverless computing :)

Chris Munns wrote a detailed blog post about how they fixed this.

Problem 7: Local development and debugging

When you start with Lambda, you can deploy a production-grade hello-world app in a couple of minutes. This amazes you. But when you decide to deploy a couple more functions and write something more meaningful, things get weird.

You make a small mistake and start writing print statements to figure out what is wrong. This is not ideal. I want to be able to use my IDE, set a breakpoint, and simply debug.

This issue has been addressed by a third party, Thundra. But first, I want to mention AWS Cloud9. I don’t have much experience with Cloud9, but you can use it directly from the AWS Lambda console and see for yourself that it supports Lambda. The problem is that I don’t like the idea of leaving my favorite IDE, IntelliJ IDEA.

Thundra.io has made some amazing progress and released its AWS Lambda Debugger. You can develop your apps and debug them locally or remotely with their debugger. The good part is that you can do it with your favorite IDE, like VS Code or IntelliJ IDEA. Check this blog post for more details.

Problem 8: Knowing what is wrong when things get weird

Compared to two years ago, the AWS Lambda ecosystem has much better support for monitoring. Many startups have matured, and enterprise solutions have added support for AWS Lambda. Now we don’t need to worry about whether we have tools to support us. If anything, there are too many of them!

One that I want to mention here is AWS X-Ray. It gives you visibility through traces: you can use it for distributed or local traces with minimal effort. If you need to dig deeper (APM style), Lambda Layers offer a seamless way of integrating monitoring into AWS Lambda, and many monitoring tools leverage them to make the integration painless.

To Sum Up

I’m glad my talk is now out of date and that many problems with AWS Lambda have been addressed. The AWS Lambda team and many other teams at AWS have made Serverless much better over the last two years. Thank you all!

Follow me on Twitter and connect with me on LinkedIn.

Please clap if you like my blog post. That will give me a lot of motivation to write more :)

Thank you for your time!
