Putting the fun in Cloud Functions

Serverless functions are an increasingly popular way to deploy microservices. With no infrastructure of your own to maintain, effortless scaling, and the cost benefit of only paying for the time your code is actually running, it’s easy to see why.

For these reasons I’ve begun evaluating Google Cloud Functions for several of the smaller services in the nodesecurity platform. Google’s platform is still in beta, and as such is still changing, but I did learn a few interesting things.

One of the most common limitations on any serverless platform is execution time. Functions are allowed to run for a set amount of time, after which they are killed and a timeout response is sent to the client that made the request. Google’s maximum timeout is 9 minutes. For most requests this is more than enough, and in fact many providers have a much lower maximum (AWS Lambda, for example, has a maximum of 5 minutes). Since billing is typically calculated based on execution time, this is a very important measurement, and it was the subject of one of my first, entirely accidental, discoveries.

A simple HTTP triggered cloud function looks something like this:

exports.handler = function (req, res) {
  res.send('hello, world!');
};

The req parameter represents the incoming request, and the res parameter the outgoing response. Your handler would inspect the request (if necessary), perform whatever tasks you like, then send a response.

One of the first things I did was create a handler that performed some asynchronous work (in this case, a database lookup) and was intended to return the results to the client. The function looked something like this:

exports.handler = function (req, res) {
  db.find({ id: req.query.id }).then((body) => {
    console.log(body);
    res.send(body);
  });

  res.send('hello, world!');
};

As I’m sure some of you may have noticed, there’s a pretty glaring problem here. When I added the database lookup, I mistakenly neglected to remove the original 'hello, world!' response. I made a request to the function, received a response of 'hello, world!' and immediately saw the problem in my code. I corrected my code, redeployed, and moved on.

Later, while debugging another issue, I happened to notice in the logs that the object looked up from the database in my poorly written function did, in fact, get logged. There was also an error, since I had already called res.send() once and was trying to call it again. This is where things get interesting, however. The log stream showed the start of my function, then the end of my function, and only after that the log entry showing the database object.

Surely this was some race condition, I thought. Why would I be able to log something after the function had already finished? To test that assumption, I put together a quick proof of concept function like so:

const delay = function () {
  setTimeout(() => {
    console.log('am i still running?');
  }, 5000);
};

exports.handler = function (req, res) {
  delay();
  res.send('hello, world!');
};

The idea here was that if I delayed the log entry a full 5 seconds there’s no way it should actually happen, right? Well, I made a request to the function and checked the logs, and sure enough a full 5 seconds after the entry showing the function was finished was my 'am i still running?' log.
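This is at least consistent with how Node.js itself behaves: sending a response does not cancel pending timers, and the event loop will run them for as long as the process stays alive. The sketch below reproduces the ordering locally in plain Node.js, using a stand-in res object and a delay shortened to 50 milliseconds. It is a local simulation only and says nothing about how Google's runtime schedules or bills the work.

```javascript
// Local simulation only: plain Node.js, no Cloud Functions runtime involved.
// The delay is shortened from the post's 5000 ms so the sketch finishes quickly.
const events = [];

const delay = function (ms) {
  return new Promise((resolve) => {
    setTimeout(() => {
      events.push('am i still running?');
      resolve();
    }, ms);
  });
};

const handler = function (req, res) {
  const background = delay(50); // scheduled, but deliberately not waited on
  res.send('hello, world!');
  events.push('handler returned');
  return background; // returned only so the code below can observe completion
};

// Stand-in for the real response object; it just records what was sent.
const fakeRes = { send: (body) => events.push('response sent: ' + body) };

const done = handler({}, fakeRes).then(() => {
  console.log(events);
  // [ 'response sent: hello, world!', 'handler returned', 'am i still running?' ]
  return events;
});
```

The timer callback is the last entry, after both the response and the handler's return, which matches the order I saw in the logs.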

Being curious, I checked the execution time graph for my function and saw that according to Google my function only ran for 45 milliseconds. The log that showed up 5 seconds later wasn’t taken into account at all. I still haven’t confirmed this for certain, but it appears that I wasn’t billed for that extra 5 seconds.

Feeling like I was on to something, I changed the timeout on my function down to 3 seconds. This means that if my function hasn’t completed after 3 seconds, it should be terminated and the logs should reflect that. I made another request to my function, and guess what? There was absolutely no change. The function ran successfully, I got a response, and 5 full seconds after the logs noted my function had finished there was my 'am i still running?' log entry.

At this point I had proven that asynchronous code can continue running after a response has been sent, and that it completely disregards your configured timeout. As I mentioned earlier in this post, Google imposes a maximum timeout of 9 minutes. Naturally I had to see if the hard timeout would catch my function. I changed the setTimeout delay in my delay function from 5 seconds to 10 minutes, deployed the updated function, and made a new request. I impatiently watched the logs, and you can imagine my delight when, 10 minutes later, I saw 'am i still running?' come across my screen.

That’s right, evading the execution timeout limits (and potentially billing, though again I haven’t been able to confirm that) is as simple as sending a response immediately and running your slow, expensive code in the background. With this method I was able to run a single function for over two and a half hours.

I reported this finding to Google on April 6th at 1:13PM PST, and on April 7th at 2:28AM PST I received a response that they were declining to accept it into their bug bounty program on the grounds that it only negatively affects Google, and not their customers (despite my mentioning that this could easily be used to consume massive amounts of CPU cycles, which would affect everyone on their infrastructure). I should also mention that while this evasion is possible, it does qualify as abuse of Google’s systems and absolutely should not be used intentionally.