Should AWS CloudFront *increase* load time for infrequently accessed files?
I am new to CDNs and experimenting with CloudFront. I have set everything up and all appears to be working fine. I can create a static image on a page and access it through my CloudFront distribution. I am using a custom origin (i.e. not an S3 bucket).
I'm worried that I might be worse off from a performance point of view though. I have a test page that is loading up the same 20 or so images with and without the CDN. Looking at the net panel in Firebug, the first time I load this page the images that are loaded directly from the origin server come in much faster. On subsequent page loads the benefits of the CDN become obvious -- after 3-5 refreshes the CDN is doing better than the origin server.
So I can see that on a popular page on our site that is being hit all the time, this will be a benefit. And I should expect a benefit because I'm in Seattle (around the corner from Amazon) and my server is in CA.
The thing is that if I leave the page for a few minutes and then reload, things are back to square one, with CloudFront being worse than the origin server. Is this expected? Do things drop out of the CDN "cache" so quickly?
Is it possible that something in my setup is hurting performance? Or is the reality that the CDN will only be a net positive for content that is currently being accessed every few seconds on average?
(cross posted from the AWS forum because I've been spoiled forever by SO's turnaround times)
UPDATE:
There are two good answers below that are worth looking at if you have questions about CloudFront performance. However, I recently found one explanation for my specific problem that wasn't mentioned. I had left the TTL at 5 minutes as an oversight. Since I'm also using a custom origin, there is an additional round trip to the authoritative nameserver to resolve that to the actual Amazon CloudFront domain. Now that the TTL setting is back to 12 hours, the long loads seem to happen less often.
CloudFront sets a header like "X-Cache: Hit from cloudfront" in its replies. Presumably, it will say "Miss" if your file wasn't in the cache of the node to which you were directed.
It is possible that your files just aren't popular enough, so they get ejected from CloudFront's cache by more popular content even though 24 hours haven't elapsed. It is also possible that IO overload or some other circumstance inside a particular CloudFront node makes access slow. CloudFront is very inexpensive compared with Akamai or LimeLight. Worst-case performance and guaranteed service levels are two of the reasons to use the more expensive players.
I would run a test: put just one popular file into CloudFront in production, then probe it periodically to see whether CloudFront is indicating hits (and record the total transaction time as well).
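A periodic test like that could be sketched roughly as follows. This is only a minimal sketch using Python's standard library: the function names (`classify_x_cache`, `probe`, `run`) are made up for illustration, the URL you pass in would be a file served by your own distribution, and treating "RefreshHit" as a hit is my assumption about how you would want to count it.

```python
import time
import urllib.request


def classify_x_cache(value):
    """Map an X-Cache header value to 'hit', 'miss', or 'unknown'."""
    if not value:
        return "unknown"
    text = value.strip().lower()
    # Assumption: a "RefreshHit from cloudfront" value is counted as a hit here.
    if text.startswith("hit") or text.startswith("refreshhit"):
        return "hit"
    if text.startswith("miss"):
        return "miss"
    return "unknown"


def probe(url, timeout=10.0):
    """Fetch url once and return (cache_status, elapsed_seconds)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # include the body transfer in the timing
        x_cache = response.headers.get("X-Cache")
    return classify_x_cache(x_cache), time.perf_counter() - start


def run(url, rounds=5, interval_seconds=60):
    """Probe url several times, pausing between requests, and print results."""
    for _ in range(rounds):
        status, elapsed = probe(url)
        print(f"{status:8s} {elapsed * 1000:.0f} ms")
        time.sleep(interval_seconds)
```

Calling `run()` with the CloudFront URL of the test file, and varying `interval_seconds`, should show how long a file can sit idle before the header flips back to a miss.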
It is possible. However, one purpose of a CDN is scalability. You can expect the CDN to perform the same if you throw 100 visits at once or 1 million visits at once.
As far as your setup goes, there's nothing that I can tell from the information you provided, but I think that the point above is what makes a CDN so valuable. If you're creating a site that doesn't get a lot of traffic, you might be better off without the CDN. However, the CDN will put a lighter load on your web server if you get a lot of traffic, because you're passing off the serving of your media to another server. One last point: a good CDN (and Amazon's is) will use its extensive network to serve your content from the location closest to the requestor. In many cases, it can serve the content from the requestor's ISP, meaning VERY fast load times.
Hope that helps.
Have I misunderstood? Doesn't the cache-control manage how long things live at the edge locations before the edge locations reload them from S3? So surely they are relevant to your situation whether you use S3 or your own origin? No?
The Amazon FAQ says: "Q. How long will Amazon CloudFront keep my files at the edge locations? By default, if no cache control header is set, each edge location checks for an updated version of your file whenever it receives a request more than 24 hours after the previous time it checked the origin for changes to that file. This is called the “expiration period.” You can set this expiration period as short as 1 hour, or as long as you’d like, by setting the cache control headers on your files in your origin. Amazon CloudFront uses these cache control headers to determine how frequently it needs to check the origin for an updated version of that file. If your files don’t change very often, it is best practice to set a long expiration period and implement a versioning system to manage updates to your files."
[I assume the last sentence means "tough luck if you set it to 50 years and then want to change the file".]
Isn't the main point of using a CDN that it hosts static content? If so, would it help to use a considerably longer TTL than one day? For virtually everything (all images and CSS), I use Cache-Control = "max-age=604800, public, must-revalidate" (i.e. 1 week). In my experience, files definitely do then take up to a week to change if I upload new versions onto S3.
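To make the arithmetic concrete, here is a rough sketch of how the expiration period described in the FAQ text quoted above could be modelled. It is not CloudFront's actual logic, just the rule as quoted: use `max-age` if the header sets one, otherwise fall back to 24 hours. The helper names `parse_max_age` and `seconds_until_recheck` are hypothetical.

```python
DEFAULT_EXPIRATION = 24 * 60 * 60  # 24 hours, the default named in the quoted FAQ


def parse_max_age(cache_control):
    """Return max-age in seconds from a Cache-Control value, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return None


def seconds_until_recheck(cache_control, seconds_since_last_check):
    """Estimate how long until an edge would check the origin again.

    Simplified model: ignores eviction of unpopular files and any
    minimum or maximum the CDN itself may enforce.
    """
    max_age = parse_max_age(cache_control) if cache_control else None
    period = DEFAULT_EXPIRATION if max_age is None else max_age
    return max(0, period - seconds_since_last_check)


one_week = parse_max_age("max-age=604800, public, must-revalidate")
print(one_week / 86400)  # 604800 seconds expressed in days
```

Note that this only models when an edge would re-check the origin. It says nothing about a file being dropped earlier because it is rarely requested, which is the other explanation suggested in this thread.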
Hope this helps. [BTW: On your more general point, I too wonder if a CDN helps performance as much as you think it's going to. I am about to move my entire site (CDN included) onto a super-fast dedicated server and do some tests to find out.]
The reason to use a CDN is if you are expecting content that is:
- Static - with infrequent or controlled updates
- Viewed all over the world
- Accessed frequently
Our website is accessed infrequently, as in your case, but we have a monitoring service set up that requests our website from all over the world, so it keeps the CDN caches warm. I would also like to share our case, which is a simple one and demonstrates CDN capability.
Furthermore, we are expecting a monthly charge of $2.2 as opposed to $7 for a GoDaddy server (which can't handle surges).