Proxy for a local mirror of S3 directories

We have an office that has increasing demand for accessing large files from our own Amazon S3 directories. Being able to access them quickly is important for our business, so we believe it is time to start keeping copies of the files onsite. This is not my area of expertise, so I'm hoping for some advice.

A "normal" cache will not be sufficient for us on its own, since we want to speed up even the first request for any given file. The AWS CLI has the ability to keep a local directory in sync with S3, so one idea is to run that on a schedule during low traffic times, then configure a proxy to treat that directory as its cache, if that is possible.
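
As a rough sketch of the scheduled-sync idea, this is how I imagine wrapping the AWS CLI's `aws s3 sync` command in a script that cron could run overnight. The bucket URI and local path in the usage note are hypothetical placeholders, and I'm assuming the `aws` binary is on the PATH with credentials already configured.

```python
import subprocess
from typing import List


def build_sync_command(bucket_uri: str, local_dir: str,
                       delete: bool = False, dry_run: bool = False) -> List[str]:
    """Build the argument list for `aws s3 sync <bucket_uri> <local_dir>`."""
    cmd = ["aws", "s3", "sync", bucket_uri, local_dir]
    if delete:
        # --delete removes local files that no longer exist in the bucket.
        cmd.append("--delete")
    if dry_run:
        # --dryrun only prints what would be transferred.
        cmd.append("--dryrun")
    return cmd


def run_sync(bucket_uri: str, local_dir: str, **options: bool) -> int:
    """Run the sync and return the CLI's exit code (assumes `aws` is installed)."""
    return subprocess.run(build_sync_command(bucket_uri, local_dir, **options),
                          check=False).returncode
```

A nightly job could then call something like `run_sync("s3://example-bucket/assets", "/srv/s3-mirror", dry_run=True)` first to preview the transfer before running it for real.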

Another idea is to have a script issue GET requests to a caching proxy on a similar schedule, to keep the cache warm.
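
A minimal sketch of such a warm-up script, assuming the URLs passed in are already signed and already point at the caching proxy (both are assumptions on my part; `warm_cache` and `http_fetch` are just illustrative names). The fetch function is injectable so the loop can be exercised without a network.

```python
import urllib.request
from typing import Callable, Dict, Iterable, Union


def http_fetch(url: str) -> int:
    """GET a URL, discard the body, and return the number of bytes read."""
    total = 0
    with urllib.request.urlopen(url, timeout=60) as resp:
        # Read in 1 MiB chunks so large files are never held in memory.
        while chunk := resp.read(1 << 20):
            total += len(chunk)
    return total


def warm_cache(urls: Iterable[str],
               fetch: Callable[[str], int] = http_fetch) -> Dict[str, Union[int, Exception]]:
    """Request every URL once; record bytes read, or the error if it failed."""
    results: Dict[str, Union[int, Exception]] = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:  # urllib's URLError and timeouts are OSError subclasses
            results[url] = exc
    return results
```

One failing file does not stop the run; the returned mapping can be logged to see which objects still need attention.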

One caveat is that the S3 assets are private, so we sign their URLs before making each request. This means the proxy must be able to serve the local copy based on the URL with any query parameters excluded. For example, both of these URLs should resolve to the same cached/mirrored file:

  • https://example.com/asset1.txt?signature=1
  • https://example.com/asset1.txt?signature=2
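
To make the intended behavior concrete, here is a small sketch of the cache-key rule I have in mind: drop the query string (and fragment) and key the cache on scheme, host, and path only. The function name `cache_key` is purely illustrative and not taken from any particular proxy.

```python
from urllib.parse import urlsplit, urlunsplit


def cache_key(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


a = cache_key("https://example.com/asset1.txt?signature=1")
b = cache_key("https://example.com/asset1.txt?signature=2")
print(a == b)  # both map to https://example.com/asset1.txt
```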

The cache will be in the single-digit terabytes and will need to handle the traffic for about 300 active users.

So finally, my questions:

  • Does either of these approaches sound sane?
  • Can anyone recommend proxying software that can be configured the way we need?
  • Are there resources I can consult to determine hardware requirements for this load?
  • Any other thoughts/suggestions?

If you just need to synchronize your local repository with cloud-based object storage, I would take a look at Rclone or CloudBerry. Rclone has a command-line interface for synchronizing directories and files between local storage and the cloud, or between clouds. It works with most popular cloud storage, such as Azure and AWS (both S3 and Glacier). https://rclone.org/

Also, if you want to back up all the data to the cloud, you can take Virtual Tape Library backups with an additional offload to the cloud. That way you can back up your existing infrastructure with ransomware-proof backups that are offloaded to the cloud automatically. It has dedupe and compression, and as far as I know, StarWind currently offers it for free. https://www.starwindsoftware.com/starwind-virtual-tape-library

Both solutions are mature and reliable; you only need to choose the one that fits your needs. Hope this helps.