persistent storage on Amazon EC2 and other questions
i come from the land of VPSes, and all this cloud hype confuses me.
i have been reading through the amazon EC2 guide for the past 2 days, but there are still a few sketchy points i don't understand.
if i use an EBS backed ubuntu AMI, where exactly is that EBS volume? it's the root device, right?
how persistent is it? i have termination-protection enabled, so should i worry about any data loss? can i stop/start (or issue a reboot/shutdown command on ssh) the instance without any fear of data loss?
i am still running my mail and web servers on traditional VPSes for now, but would like to gradually move to the cloud. if i use an EBS backed instance, will it function just like a VPS? if not, what's the difference (obviously, other than elasticity, scalability, etc.. just in functionality is what i want to know)
one more question, i noticed the security groups, if i configure a security group for an instance, does it mean i won't have to worry about iptables anymore?
thanks in advance for your patience.
Solution 1:
Instances can have two types of root storage - EBS backed and S3 backed. Instances whose root device is S3 backed are called 'instance store' instances.
S3 backed volumes are ephemeral - they can have data on them, you can write to them and read from them - but none of the data persists - as soon as the instance is terminated the data is lost (and the original data will then be reloaded from S3 the next time you launch such an instance).
EBS is a block device - you can attach multiple EBS volumes, of varying size. Typically one is the root volume (unless you go with an S3 backed instance), and the others will be set up for whatever purpose you desire. You must provision EBS volumes in advance - so if you have a 10GB volume, and only use 1GB of it, you still pay for all 10GB of 'provisioned storage'.
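To make the 'provisioned storage' point concrete, here is a small arithmetic sketch. The per-GB price is a made-up placeholder, not a quoted AWS rate, and the function name is hypothetical.

```python
def monthly_ebs_cost(provisioned_gb, price_per_gb_month):
    """Billing follows the provisioned size, not the amount of data stored."""
    return provisioned_gb * price_per_gb_month


# Hypothetical price for illustration only - check current AWS pricing.
ASSUMED_PRICE_PER_GB_MONTH = 0.10

used_gb = 1
provisioned_gb = 10
cost = monthly_ebs_cost(provisioned_gb, ASSUMED_PRICE_PER_GB_MONTH)
print(f"Using {used_gb} GB of a {provisioned_gb} GB volume still costs ${cost:.2f}/month")
```

Note that `used_gb` never enters the calculation, which is the whole point.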
EBS volumes will show up as devices under /dev - typically /dev/xvda1, /dev/xvdf, etc. (and are also symlinked to /dev/sd*).
You can transfer an EBS volume between instances (even the root EBS volume) but you cannot attach a single EBS volume to more than one instance at a time. Keep in mind that the performance of EBS volumes is limited in part by the network bandwidth your instance has - and therefore smaller instance types will see a greater variability in EBS performance.
You can also set up multiple EBS volumes in a RAID configuration to improve performance or for redundancy, etc. Typically though, using EBS snapshots is a good way to keep point-in-time backups of your EBS volumes.
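As a rough sketch of taking a snapshot from a script: the function below assumes a client object that behaves like `boto3.client("ec2")`, whose `create_snapshot` call takes `VolumeId` and `Description` and returns a dict containing `SnapshotId`. The `FakeEC2` class and all IDs are hypothetical stand-ins so the example runs without AWS credentials.

```python
def snapshot_volume(ec2_client, volume_id, description):
    """Request a point-in-time snapshot of one volume and return its ID.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    response = ec2_client.create_snapshot(
        VolumeId=volume_id, Description=description
    )
    return response["SnapshotId"]


class FakeEC2:
    """Hypothetical stand-in that mimics the shape of the real response."""

    def create_snapshot(self, VolumeId, Description):
        return {
            "SnapshotId": "snap-0example",
            "VolumeId": VolumeId,
            "State": "pending",
        }


print(snapshot_volume(FakeEC2(), "vol-0example", "nightly backup"))
```

With real credentials you would pass `boto3.client("ec2")` instead of `FakeEC2()`; the snapshot starts in a pending state and completes in the background.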
EBS root volumes are, by default, set to be deleted on termination (you can change this) - when an instance is terminated, the volume will be destroyed. EBS volumes you create and attach manually are not deleted on termination by default (again, you can change this).
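If you want the root volume to survive termination, the delete-on-termination flag can be changed per device. The sketch below only builds the parameter structure that boto3's `modify_instance_attribute` accepts; the helper name, the device name `/dev/sda1`, and the instance ID in the comment are assumptions for illustration.

```python
def keep_volume_on_termination(device_name):
    """Build a BlockDeviceMappings entry that turns off DeleteOnTermination."""
    return [{"DeviceName": device_name, "Ebs": {"DeleteOnTermination": False}}]


# Assumed root device name; check what your instance actually reports.
mappings = keep_volume_on_termination("/dev/sda1")
print(mappings)

# With boto3 and credentials, this would be passed along as:
# boto3.client("ec2").modify_instance_attribute(
#     InstanceId="i-0example", BlockDeviceMappings=mappings)
```

A volume kept this way continues to be billed after the instance is gone, so remember to delete it once you no longer need it.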
Instances with an EBS root volume can be stopped in addition to being terminated. The stopped state is quite useful - for example, it allows you to remove the root EBS volume, and attach it to another instance (e.g. to fix a problem that prevents booting).
Keep in mind that instances may fail - termination protection only prevents you from accidentally terminating your instance (essentially, it adds an extra step if you want to terminate your instance). If you use an S3 backed instance, your data will be lost when an instance fails.
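S3 backed instances cannot be stopped (only terminated). Stopped (EBS root) instances do not lose their data (although IP addresses, etc. may change). A reboot (e.g. via SSH) does not affect the data on any instance (S3 or EBS). A shutdown issued from inside the OS is different: by default it stops an EBS backed instance (data is kept), but it terminates an S3 backed instance.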
With regard to persistence, EBS volumes are replicated within an Availability Zone - but durability decreases with size and the amount of data changed since the last snapshot. AWS quotes 'an annual failure rate of 0.1-0.5% for 20GB or less of data modified since your last EBS snapshot'.
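To get a feel for what the quoted 0.1-0.5% annual failure rate means in practice, here is a back-of-the-envelope calculation. It assumes volume failures are independent of each other, which is a simplification, and the function name is hypothetical.

```python
def prob_any_loss(annual_failure_rate, volumes, years=1):
    """Chance that at least one of `volumes` volumes fails within `years`.

    Assumes failures are independent, which is a simplification.
    """
    survive_one = (1.0 - annual_failure_rate) ** years
    return 1.0 - survive_one ** volumes


for afr in (0.001, 0.005):
    one = prob_any_loss(afr, 1)
    ten = prob_any_loss(afr, 10)
    print(f"AFR {afr:.1%}: 1 volume {one:.2%}, 10 volumes {ten:.2%}")
```

The risk for a single volume is small but not zero, and it grows with the number of volumes, which is why regular snapshots are still worth taking.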
An EBS backed instance will function just like a VPS but you have some additional flexibility (and some additional costs).
Security groups are stateful only in the narrow sense that replies to allowed traffic are let through automatically - the rules themselves just match on protocol, port range and source. If you want any more complex or dynamic rules (e.g. blacklisting IPs, using the recent or limit modules, etc.) - you will still need iptables. Especially for things like SSH/Email, which tend to get a lot of unwanted intrusion attempts, some sort of host firewall with rate limiting is probably advisable. The advantage of security groups is that they are external to your instance - the blocked packets don't reach your instance (unlike with iptables) - which is a large advantage. You can also specify permissions by IP range or by security group.
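For reference, a security group rule is little more than protocol, port range and source. The sketch below builds the `IpPermissions` structure that boto3's `authorize_security_group_ingress` accepts; the helper name, the CIDR (a documentation range) and the group ID in the comment are placeholders.

```python
def ssh_ingress_rule(cidr):
    """Allow inbound TCP port 22 from a single CIDR range."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": cidr}],
    }]


# 203.0.113.0/24 is a documentation range, used here as a placeholder.
rule = ssh_ingress_rule("203.0.113.0/24")
print(rule)

# With boto3 and credentials, this would be applied as:
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId="sg-0example", IpPermissions=rule)
```

There is nowhere in this structure to express rate limits or per-connection logic, which is the gap iptables fills.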
Solution 2:
if i use an EBS backed ubuntu AMI, where exactly is that EBS volume? it's the root device, right?
Right, your EBS volume is the root device, generally at /dev/sda1 on Debian/Ubuntu servers.
how persistent is it? i have termination-protection enabled, so should i worry about any data loss? can i stop/start (or issue a reboot/shutdown command on ssh) the instance without any fear of data loss?
It's redundant, and you can "snapshot" it to back up the volume. Termination protection is a setting that keeps you from terminating the instance by mistake. You can start/stop it at any time and your data will still be there; only instances working with ephemeral storage lose their data when you shut them down.
i am still running my mail and web servers on traditional VPSes for now, but would like to gradually move to the cloud. if i use an EBS backed instance, will it function just like a VPS? if not, what's the difference (obviously, other than elasticity, scalability, etc.. just in functionality is what i want to know)
You probably won't notice any difference, but you can do a lot more with the AWS services.
one more question, i noticed the security groups, if i configure a security group for an instance, does it mean i won't have to worry about iptables anymore?
For basic usage and experimentation, I'd say it's safe to just use the security groups.