Comparison of AWS API Gateway Endpoint Types When Behind a CloudFront Distribution
The official AWS documentation does not provide guidance on which endpoint type to use. This becomes relevant when one wants to run an entire website behind CloudFront.
http://blog.ryangreen.ca/2017/11/03/api-gateway-regional-vs-edge-optimized-endpoints/ does suggest using a REGIONAL endpoint if you also have a CloudFront distribution, but it does not go into further detail.
For this test, AWS CDK was used to set up all of the infrastructure. It makes it trivial to iterate over a list of options + regions, to generate everything very quickly, and to update it all when there was a bug. C# was chosen as the language purely as a change of pace from the Java + TypeScript used at work, while still keeping static typing.
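The "iterate over a list of options + regions" idea can be sketched as a plain cross product. This is a minimal illustration in Python rather than the C# CDK app used for the test. The region list, the `Variant` class and the stack naming scheme are all hypothetical, since the post does not list them. In a real CDK app, each variant would presumably map to one stack.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical values: the post does not say which regions were tested.
REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-2"]
ENDPOINT_TYPES = ["REGIONAL", "EDGE"]
COMPRESSION = [False, True]


@dataclass(frozen=True)
class Variant:
    """One combination of settings to deploy and test (hypothetical helper)."""
    region: str
    endpoint_type: str
    compression: bool

    @property
    def stack_name(self) -> str:
        # Hypothetical naming scheme, only used to tell the variants apart.
        suffix = "gzip" if self.compression else "plain"
        return f"apigw-{self.region}-{self.endpoint_type.lower()}-{suffix}"


def build_matrix(regions=REGIONS, endpoint_types=ENDPOINT_TYPES, compression=COMPRESSION):
    """Return one Variant per combination of region, endpoint type and compression."""
    return [Variant(r, e, c) for r, e, c in product(regions, endpoint_types, compression)]


if __name__ == "__main__":
    for variant in build_matrix():
        print(variant.stack_name)
```

Adding a region or an option then only means extending one of the lists, and the whole matrix is regenerated on the next deploy.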
All tests were run asynchronously at the same time using the Lambda CLI on 2020-09-20 21:00:00 UTC. Each test Lambda ran for 15 minutes, and no errors were reported during that time.
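The post does not show the test harness itself, so the following is only a sketch of how such a timed, concurrent measurement could look. The function names `measure` and `run_all` are hypothetical, and the request is passed in as a callable so that the sketch stays independent of any particular HTTP client.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def measure(call, duration_s, clock=time.perf_counter):
    """Call `call()` repeatedly for `duration_s` seconds; return latencies in ms."""
    latencies_ms = []
    deadline = clock() + duration_s
    while clock() < deadline:
        start = clock()
        call()
        latencies_ms.append((clock() - start) * 1000.0)
    return latencies_ms


def run_all(targets, duration_s):
    """Measure every target concurrently so all variants share the same time window.

    `targets` maps a variant name to a zero-argument callable that performs one request.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(measure, call, duration_s)
                   for name, call in targets.items()}
        return {name: future.result() for name, future in futures.items()}


def fake_request():
    # Stand-in for a real HTTPS request to one of the distributions.
    time.sleep(0.01)


if __name__ == "__main__":
    results = run_all({"regional": fake_request, "edge": fake_request}, duration_s=0.2)
    for name, samples in results.items():
        print(name, len(samples), round(mean(samples), 1))
```

Running all variants in the same window keeps time-of-day effects from favouring one endpoint type over another.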
Client-side response time metrics
Additional CloudFront distribution metrics
Origin latency: The total time spent from when CloudFront receives a request to when it starts providing a response to the network (not the viewer), for requests that are served from the origin, not the CloudFront cache. This is also known as first byte latency, or time-to-first-byte. (Source: AWS CloudFront documentation.)
REGIONAL is the preferred API Gateway endpoint type when behind a custom CloudFront distribution
The response times for all regions were lower when the endpoint type was REGIONAL. This is expected because a REGIONAL endpoint has one fewer hop (no built-in CloudFront distribution) to go through to reach the Lambda integration. A ~10% response time improvement was observed during the test when using a REGIONAL endpoint.
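For reference, a relative improvement like the ~10% figure can be computed as below. The helper name and the input values are illustrative only; they are not the measurements from this test.

```python
def relative_improvement(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage by which `candidate_ms` is lower than `baseline_ms`."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (baseline_ms - candidate_ms) * 100.0 / baseline_ms


if __name__ == "__main__":
    # Hypothetical averages, NOT measured data: EDGE as baseline, REGIONAL as candidate.
    print(relative_improvement(baseline_ms=200.0, candidate_ms=180.0))
```

A negative result would mean the candidate was slower than the baseline.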
API Gateway compression is only suggested for far away users
Nearby users will see a small performance hit, but far away users will see a larger performance gain. Ideally, one would place another API Gateway closer to their far away users if there were enough of them to justify the cost + complexity.
Additionally, on EDGE endpoints, compression only increased the response time.
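One possible way to reason about the near/far trade-off is a toy model in which compression adds a fixed cost but reduces the number of bytes sent over the network. This is an assumption used for illustration, not something measured in this test. The function names, the throughput figures, the compression ratio and the overhead are all hypothetical.

```python
def response_time_ms(rtt_ms, payload_bytes, throughput_bytes_per_ms,
                     compression_ratio=1.0, compression_overhead_ms=0.0):
    """Toy model: one round trip, plus transfer time, plus a fixed compression cost."""
    transferred = payload_bytes * compression_ratio
    return rtt_ms + transferred / throughput_bytes_per_ms + compression_overhead_ms


def compression_gain_ms(rtt_ms, payload_bytes, throughput_bytes_per_ms,
                        compression_ratio, compression_overhead_ms):
    """Positive when compression is faster, negative when it is slower."""
    plain = response_time_ms(rtt_ms, payload_bytes, throughput_bytes_per_ms)
    packed = response_time_ms(rtt_ms, payload_bytes, throughput_bytes_per_ms,
                              compression_ratio, compression_overhead_ms)
    return plain - packed


if __name__ == "__main__":
    # Hypothetical numbers: a 50 kB payload compressed to 25% with a 5 ms overhead.
    common = dict(payload_bytes=50_000, compression_ratio=0.25, compression_overhead_ms=5.0)
    near = compression_gain_ms(rtt_ms=20.0, throughput_bytes_per_ms=10_000.0, **common)
    far = compression_gain_ms(rtt_ms=250.0, throughput_bytes_per_ms=1_000.0, **common)
    print("near:", near, "far:", far)
```

Under these assumptions the nearby user loses a little, because the fixed overhead outweighs the saved transfer time, while the far away user gains more, because fewer bytes cross a slower path.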