AWS Object Lambda Access Point timing out when writing response when run from within VPC

I've got an AWS Object Lambda Access Point. (These are sort of like a proxy lambda function which can intercept S3 requests and transform them.) It runs fine when not run inside a VPC (so I think IAM is fine). A later iteration will want to access private resources so I want it running inside a VPC.

The flow of one of these lambdas (at least when transforming a GET request) is:

  1. Get invoked
  2. Download the requested object using an HTTP client (the payload includes a pre-signed URL, getObjectContext.inputS3Url, that grants access)
  3. Do your transformation
  4. Write the result using s3.Client.WriteGetObjectResponse

It's the last step that isn't working for me.
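For reference, steps 1 and 2 can be sketched with only the Go standard library. This is a minimal sketch rather than my actual handler: the type and function names (objectLambdaEvent, parseEvent, fetchOriginal) are made up for illustration, the sample payload values are placeholders, and only the getObjectContext fields needed here are modelled. Step 4 would pass outputRoute and outputToken to s3.Client.WriteGetObjectResponse, which is left out because it needs the AWS SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// getObjectContext mirrors the part of the Object Lambda event used here.
type getObjectContext struct {
	InputS3URL  string `json:"inputS3Url"`  // pre-signed URL for the original object
	OutputRoute string `json:"outputRoute"` // needed later by WriteGetObjectResponse
	OutputToken string `json:"outputToken"` // needed later by WriteGetObjectResponse
}

// objectLambdaEvent is a hypothetical, trimmed-down view of the invocation payload.
type objectLambdaEvent struct {
	GetObjectContext getObjectContext `json:"getObjectContext"`
}

// parseEvent decodes the payload and checks that the pre-signed URL is present.
func parseEvent(payload []byte) (objectLambdaEvent, error) {
	var ev objectLambdaEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return ev, fmt.Errorf("decode event: %w", err)
	}
	if ev.GetObjectContext.InputS3URL == "" {
		return ev, fmt.Errorf("event has no getObjectContext.inputS3Url")
	}
	return ev, nil
}

// fetchOriginal downloads the requested object through the pre-signed URL (step 2).
func fetchOriginal(client *http.Client, url string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, fmt.Errorf("download input object: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("download input object: unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Placeholder payload; a real event carries more fields and real values.
	payload := []byte(`{"getObjectContext":{"inputS3Url":"https://example.invalid/object","outputRoute":"io-cell002","outputToken":"placeholder-token"}}`)

	ev, err := parseEvent(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("would download from:", ev.GetObjectContext.InputS3URL)
	fmt.Println("would respond via route:", ev.GetObjectContext.OutputRoute)

	// In the real function, step 2 would be: fetchOriginal(http.DefaultClient, ev.GetObjectContext.InputS3URL)
	_ = fetchOriginal
}
```

The route and token are kept alongside the URL because the write in step 4 is addressed with them, not with the bucket and key.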

In my VPC I've added a gateway endpoint for S3 (S3 supports both gateway and interface endpoints; gateway endpoints are free). This works fine for fetching the object (step 2): I can download the object and work on it. I think that download happens through the gateway endpoint. So far so good.

But after doing the processing it times out when trying to write the response (step 4). In the logs it looks like this:

POST /WriteGetObjectResponse?x-id=WriteGetObjectResponse HTTP/1.1
Host: io-cell002.s3-object-lambda.eu-west-2.amazonaws.com
...
DEBUG retrying request s3-object-lambda/WriteGetObjectResponse, attempt 2
...
time="2022-01-02T22:25:39Z" level=error msg="Error writing to S3: operation error S3: WriteGetObjectResponse, https response error StatusCode: 0, RequestID: , HostID: , canceled, context deadline exceeded"

Which smells to me like I can't connect to the endpoint at a network level.
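One way to test that suspicion is a plain TCP dial to port 443 of the host from the log, run from inside the function's subnet. The sketch below uses only the standard library; checkReachable is a name invented for this example, and the 3 second timeout is an arbitrary choice. A successful dial only shows that a network path exists, it says nothing about IAM or TLS.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

// checkReachable tries to open a TCP connection to addr ("host:port")
// and gives up after timeout. It returns nil if the connection was made.
func checkReachable(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("dial %s: %w", addr, err)
	}
	return conn.Close()
}

func main() {
	// Default to the host seen in the log above; pass another host:port as the first argument.
	addr := "io-cell002.s3-object-lambda.eu-west-2.amazonaws.com:443"
	if len(os.Args) > 1 {
		addr = os.Args[1]
	}

	if err := checkReachable(addr, 3*time.Second); err != nil {
		fmt.Println("unreachable:", err)
		return
	}
	fmt.Println("reachable:", addr)
}
```

If the dial times out from inside the VPC but works outside it, the problem is routing rather than permissions.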

I tried adding an interface endpoint for Lambda (this is the only option returned - see screenshot down below), but that doesn't seem to make any difference. Perhaps this doesn't cover s3-object-lambda.<region>.amazonaws.com? Or maybe it wasn't being used - not sure how to tell that.

[Screenshot: only an interface endpoint is returned for Lambda]

I also tried adding an interface endpoint for S3, and removing the gateway one referenced above. This caused the Lambda to not be able to even retrieve the input object from S3 in step 2, with an i/o timeout.

(What does also work is adding a NAT gateway to the VPC, but I'd rather avoid the cost of this and AFAICT it shouldn't be necessary.)

Any help getting this working with a VPC / without NAT would be gratefully received!


Solution 1:

The short answer is: you can run your Object Lambda function in a VPC as long as you allow it to route to s3-object-lambda.<region>.amazonaws.com over the internet, e.g. through a NAT gateway. You were on the right track and had basically figured it out in your question already.

The S3 gateway endpoint is what lets the function download the input object.

When writing the result, the request goes to s3-object-lambda, which is technically a different service from S3 (at least at the network level). AWS currently doesn't provide an interface endpoint for s3-object-lambda, and the S3 gateway endpoint doesn't cover it either (you can verify this by comparing the IP address that the WriteGetObjectResponse request goes to with the routes created by the gateway endpoint).
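That comparison can be scripted. The sketch below is an illustration under stated assumptions: coveredBy is a made-up helper, and the CIDR blocks and addresses are documentation placeholders (RFC 5737), not real AWS ranges. In practice you would paste in the CIDRs from the gateway endpoint's prefix list and the address that the s3-object-lambda host resolves to.

```go
package main

import (
	"fmt"
	"net/netip"
)

// coveredBy reports whether ip falls inside any of the given CIDR blocks,
// e.g. the destinations that a gateway endpoint adds to the route table.
func coveredBy(ip string, cidrs []string) (bool, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false, fmt.Errorf("parse address %q: %w", ip, err)
	}
	for _, c := range cidrs {
		prefix, err := netip.ParsePrefix(c)
		if err != nil {
			return false, fmt.Errorf("parse CIDR %q: %w", c, err)
		}
		if prefix.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Placeholder values only: substitute the gateway endpoint's prefix list
	// and the resolved address of the s3-object-lambda host.
	gatewayPrefixes := []string{"192.0.2.0/24", "198.51.100.0/24"}
	candidates := []string{"192.0.2.17", "203.0.113.9"}

	for _, ip := range candidates {
		ok, err := coveredBy(ip, gatewayPrefixes)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s covered by gateway endpoint routes: %v\n", ip, ok)
	}
}
```

An address that is not covered by any prefix follows the subnet's default route instead, which is why it needs a path to the internet.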

So the only way is to route WriteGetObjectResponse requests over the internet. For future reference, one way to set this up is with a NAT gateway. Quoting the AWS docs:

  • The NAT gateway must be in a public subnet with a route table that routes internet traffic to an internet gateway.
  • Your instance must be in a private subnet with a route table that routes internet traffic to the NAT gateway.
  • Check that there are no other route table entries that route all or part of the internet traffic to another device instead of the NAT gateway.

In other words:

  • Provision a public NAT gateway in a public subnet and associate an Elastic IP address with it.
  • Make sure the public subnet's VPC has an internet gateway and that the subnet's default route (0.0.0.0/0) points to it.
  • Set up a default route (0.0.0.0/0) in the route table of the subnet hosting your Lambda function and point it to the NAT gateway.

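You're right that a NAT gateway is billed by the hour (plus a per-GB data processing charge), unfortunately. You don't strictly need one per subnet: a single NAT gateway can serve several private subnets through their route tables, although AWS recommends one per Availability Zone for resilience.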

In theory, you could at least limit egress with a security group to the IP addresses of the s3-object-lambda service, but I'm not aware that these IP ranges are published anywhere.