Running MySQL on Amazon EC2 with EBS (Elastic Block Store) and Elastic Beanstalk

I'm new to Elastic Beanstalk, but I created an environment with a container running 64bit Amazon Linux running PHP 5.3. I want to get MySQL set up on EBS, then install phpMyAdmin and import my databases.

However, I don't know how to do this because the documentation is not working for me: Running MySQL on Amazon EC2 with EBS (Elastic Block Store)

Since that guide was last updated on March 23, 2010 I guess it might be out of date.

Here's what I did:

  1. Set up the EC2 instance, attached EBS volume, connected to the instance via ssh, and got root access with the command "sudo su -" (so far so good).
  2. According to the guide, I should now run the "sudo apt-get update && sudo apt-get upgrade -y" command, but the apt-get command is not found. No problem, I ran yum update and yum upgrade.
  3. Similarly, "sudo apt-get install -y xfsprogs mysql-server" didn't work, so I ran "sudo yum install -y xfsprogs mysql-server" and this installed MySQL.
  4. The guide says to create an XFS file system on /dev/sdh, but my EBS volume is attached to /dev/sda1 (this is the default when you use Elastic Beanstalk and it cannot be changed) and there already is a file system on it (I guess Elastic Beanstalk automatically creates one), so it doesn't let me run the "sudo mkfs.xfs /dev/sdh" command (I changed it to "sudo mkfs.xfs /dev/sda1" because that's where my volume is attached).
  5. The next 3 commands in the guide seemed to execute (no error messages), so now I've got my EBS volume mounted as /ebsvol. However, I noticed that /ebsvol contains a copy of the root directory structure, while /ebsvol/ebsvol is an empty directory. At this point I'm a bit worried, but I carry on.
  6. I stop the MySQL server using the command "/sbin/service mysqld stop" because the command in the guide (sudo /etc/init.d/mysql stop) did not work.
  7. Now I have to move the existing database files to the EBS volume and point MySQL to the EBS volume. This was a disaster because the mysql files are not located where the guide says they should be. Now I can't restart the MySQL server.

Help!

  1. Is there an updated guide for configuring MySQL to use an EBS volume on Elastic Beanstalk? Searching Google, Stack Overflow, and Server Fault turns up only a couple of guides from 2008/2009, so they haven't been tested with Elastic Beanstalk. (EDIT: is there something like the Eric Hammond documentation, but for CentOS/RHEL?)
  2. How can I undo all the mount binds which are preventing MySQL from starting? Should I just delete the instance and volume and start over?
  3. Am I wasting my time trying to set up MySQL on EBS on Elastic Beanstalk? Based on the lack of information available, it's either a) trivially easy to do and no guide is needed (therefore making me an idiot); or, b) nobody sets up MySQL on Elastic Beanstalk because it's unnecessary/redundant (therefore I shouldn't bother).

EDIT: Thank you for your comments and answers. I'm trying to figure out the best way to set up a PHP/MySQL website on AWS, so Elastic Beanstalk seemed like a good idea because AWS markets it as "an even easier way for you to quickly deploy and manage applications in the AWS cloud."

However, that doesn't seem to be entirely true if you want to run MySQL with EBS. From what I gather, I guess everyone uses RDS with Elastic Beanstalk because it scales automatically and presumably has automated snapshot functionality.

So I guess I'm left with these options:

1) Don't use Elastic Beanstalk: set up an Ubuntu EC2 instance with MySQL running on an EBS volume, as per the Eric Hammond documentation (it sounds like there would be some scalability issues?).

2) Use Elastic Beanstalk: set up my databases on RDS (no scalability issues).

3) Use Elastic Beanstalk, but with an Ubuntu AMI configured with MySQL running on an EBS volume (is this possible? will Elastic Beanstalk work with a private AMI?)

4) Use Elastic Beanstalk: start over and use cyberx86's instructions on adapting the Ubuntu instructions to work on CentOS/RHEL.

At this point my database and site traffic are quite small. It would be nice to make it scalable, but for now I just want to get it running in a way that lets me deploy new versions using git after I've got my code working on localhost. Top priority is to get the site up and running and get back to marketing and building features instead of spending time on hosting. What should I do?


Solution 1:

I don't use Elastic Beanstalk - but the guide you are following is for EC2 (which I can definitely help with). The first difficulty you have is that the guide you are using is for Ubuntu 9.10; Amazon's Linux is based on CentOS/RHEL - so you would have an easier time if you could find a CentOS 6 guide.

The root of your issue seems to be the step 'attaching an EBS volume'. On EC2 you can attach multiple EBS volumes to a single instance. All instances have a root volume, which can be either S3-backed (instance store) or EBS-backed. By far the preferred approach is an EBS-backed root volume (it costs a bit more, but makes up for it in flexibility and durability). An instance with an EBS root volume will almost always have this volume attached as /dev/sda1 - on modern Linux systems, the device actually shows up as /dev/xvda1 (and it is the latter which should be passed to any commands). So when you ran mkfs.xfs against /dev/sda1, you were trying to format your mounted root file system with the instance running - i.e. you were trying to erase your operating system, which is definitely not a good idea, if it is even possible.

In this case, the suggestion is to add a second EBS volume - attach it to your instance (e.g. as /dev/sdh, but use /dev/xvdh for commands), and use that for storing your MySQL data. (Despite not using Elastic Beanstalk) I find it hard to believe that Elastic Beanstalk would not allow you to attach a second volume - as this functionality is fairly central to EC2.

You should be able to get a list of the EBS devices by running cat /proc/partitions (or using fdisk -l).
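
As a rough sketch of the naming point above: the helper below (to_xen_device is a made-up name, not an AWS tool) translates the device name shown when attaching a volume into the name the kernel is likely to expose, assuming the simple sd -> xvd renaming described here. It then lists the devices the kernel actually sees, which is the authoritative check.

```shell
# Hypothetical helper: map the attachment name (e.g. /dev/sdh) to the
# kernel name used on Xen-based instances (e.g. /dev/xvdh).
# Assumes the plain sd -> xvd renaming; other names pass through unchanged.
to_xen_device() {
    case "$1" in
        /dev/sd*) printf '/dev/xvd%s\n' "${1#/dev/sd}" ;;
        *)        printf '%s\n' "$1" ;;
    esac
}

to_xen_device /dev/sdh     # prints /dev/xvdh
to_xen_device /dev/sda1    # prints /dev/xvda1

# List the block devices the kernel actually sees (Linux only).
if [ -r /proc/partitions ]; then
    cat /proc/partitions
fi
```

If the name printed by the helper does not appear in /proc/partitions, trust /proc/partitions and use the name shown there.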

You will note that in step 5 of what you have done, you are actually mounting the root volume within itself (i.e. /dev/sda1 is already mounted as / and you are mounting /dev/sda1 as /ebsvol) - it is best to avoid doing that.
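
One way to avoid formatting or re-mounting something that is already in use is to check the mount table first. The following is only a sketch: is_mounted is a hypothetical helper, and /dev/xvdh is an assumed name for a second, empty EBS volume. Note that the root device can appear under a different name (such as /dev/root) in /proc/mounts, so treat a negative result as a hint rather than proof.

```shell
# Hypothetical guard: succeed if DEVICE appears as a mounted source in the
# mounts table (default /proc/mounts; a second argument overrides it).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

DEVICE=/dev/xvdh   # assumed name of the second, empty EBS volume

if is_mounted "$DEVICE"; then
    echo "$DEVICE is already mounted; do not format it" >&2
else
    echo "$DEVICE is not mounted; confirm it is the empty volume before running mkfs.xfs"
fi
```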

Also, while /etc/init.d/mysql stop did not work, /etc/init.d/mysqld stop probably would have. (You can get a list of the init.d scripts by running ls /etc/init.d and should be able to use those paths. Like you, though, I usually use the service command.)

The MySQL databases should be in /var/lib/mysql - however, your mountpoints in /etc/fstab are probably incorrect (given the ebsvol within /ebsvol problem). When you cd /var/lib/mysql you should be able to see your databases - if not your mounts haven't worked correctly. (Verify that /var/lib/mysql is mounted on a different device by mountpoint -d /var/lib/mysql and compare the device to cat /proc/partitions).
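
The comparison described above can be scripted. This is a sketch only: devno_to_name is a hypothetical helper that looks up a major:minor pair (the format printed by mountpoint -d) in /proc/partitions and prints the matching device name.

```shell
# Hypothetical helper: translate a "major:minor" device number into the
# device name listed in /proc/partitions (a second argument overrides
# the file, which is mainly useful for trying it out).
devno_to_name() {
    awk -v devno="$1" '($1 ":" $2) == devno { print $4 }' "${2:-/proc/partitions}"
}

# Typical use on the instance:
#   devno_to_name "$(mountpoint -d /var/lib/mysql)"
#   devno_to_name "$(mountpoint -d /)"
```

If both calls print the same name, /var/lib/mysql is still on the root volume and the mount has not taken effect.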

The basic ideas of the guide you are following are quite valid - it is common practise to put your data and databases on a different EBS volume than your root volume, as it offers numerous advantages (performance, ease of snapshotting, easier to move between instances, etc.), and the basic Linux commands haven't changed - the guide just happens to be written for Ubuntu.
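
To make that concrete, here is a dry-run sketch of how the guide's steps might translate to Amazon Linux. It only prints the commands; nothing is executed. Every name in it is an assumption: /dev/xvdh for a second EBS volume, /ebsvol for the mount point, mysqld for the service, and /var/lib/mysql for the data directory. Only the data directory is moved here, and each command should be reviewed before being run as root.

```shell
# Dry-run sketch: prints the commands instead of running them.
# Assumed names: /dev/xvdh (second EBS volume), /ebsvol (mount point),
# mysqld (service name on Amazon Linux), /var/lib/mysql (data directory).
DEVICE=/dev/xvdh
MOUNT_POINT=/ebsvol
DATA_DIR=/var/lib/mysql

print_plan() {
    cat <<EOF
mkfs.xfs $DEVICE
mkdir -p $MOUNT_POINT
echo "$DEVICE $MOUNT_POINT xfs noatime 0 0" >> /etc/fstab
mount $MOUNT_POINT
service mysqld stop
mkdir -p $MOUNT_POINT/lib
mv $DATA_DIR $MOUNT_POINT/lib/
mkdir $DATA_DIR
echo "$MOUNT_POINT/lib/mysql $DATA_DIR none bind" >> /etc/fstab
mount $DATA_DIR
service mysqld start
EOF
}

print_plan
```

The bind mount keeps MySQL pointed at its usual /var/lib/mysql path, so my.cnf does not need to change, while the files themselves live on the second volume.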

Undo your mounts with umount /path, just as you normally would. Of course, you will need to ensure that the device is not busy (which may not be a problem if you haven't managed to start MySQL). umount is only temporary, though, so you will also have to edit /etc/fstab and remove any references to the mount points there. If you don't have anything of value on the instance, you might be better off starting over (not because it is difficult to unmount a few volumes, but rather because it is always easier to figure out where you went wrong when you start from a known state).
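
The /etc/fstab cleanup can be done by hand in an editor, but as a sketch it might look like the following. fstab_without is a hypothetical helper, and the mount points are assumed from the question. It only writes to stdout, so nothing is changed until you redirect the output yourself.

```shell
# Hypothetical helper: print FSTAB_FILE without the entries whose mount
# point (second field) is one of the given paths. Comment lines are kept.
# It writes to stdout only; nothing is edited in place.
fstab_without() {
    fstab_file=$1
    shift
    awk -v drop="$*" '
        BEGIN { n = split(drop, d, " "); for (i = 1; i <= n; i++) skip[d[i]] = 1 }
        $1 ~ /^#/ || !($2 in skip)
    ' "$fstab_file"
}

# Typical use, as root (mount points assumed from the question):
#   umount /var/lib/mysql
#   umount /ebsvol
#   cp /etc/fstab /etc/fstab.bak
#   fstab_without /etc/fstab.bak /ebsvol /var/lib/mysql > /etc/fstab
```

Unmount the bind mount before the volume it lives on, and keep the backup copy until the instance has rebooted cleanly.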

Finally, with regard to MySQL on Elastic Beanstalk: the point of Elastic Beanstalk is supposed to be that it handles provisioning of resources and scaling automatically - it is still based on the core AWS components (e.g. EC2, S3, ELB, etc.) but it will do some things for you. Elastic Beanstalk usually uses RDS to handle MySQL databases. RDS is an Amazon-managed version of MySQL which simplifies the provisioning and scaling of MySQL instances. Keep in mind that MySQL doesn't lend itself well to autoscaling without a lot of setup. You can't just launch a second MySQL instance and have the load split between your two instances - you need to set up replication, which may not be a simple task.

Essentially, if you are able to set up MySQL in such a way that it runs from your web server instances and can autoscale seamlessly, you'd almost certainly be better off using EC2 directly and not bothering with Elastic Beanstalk. I'd suggest, therefore, that most people don't actually set up MySQL on Elastic Beanstalk (what you could do is set up a separate MySQL instance, but if you are using Beanstalk, RDS is probably a simpler approach).


Edit:

Unlike a lot of other services that operate mostly as a black box, Elastic Beanstalk does give you access to the underlying components. That said, if you are going to go through the effort of setting up your EC2 instances manually, you have negated the point of Elastic Beanstalk.

If you are using EC2, there are a few approaches to PHP/MySQL:

  1. You can host both your webserver and database on a single instance - when you are starting out, this can be a reasonable approach, however, it doesn't scale horizontally very well (but you can still scale vertically - using larger instances). Hopefully by the time you exceed the capacity of the x-large instances, you will be in a position to setup a more complex setup. That said, it is bad for redundancy - everything is on that single instance, and a failure of any component takes down your whole setup.
  2. You can host your webserver on one instance, and use RDS for your database. Most well designed applications will tax the web server more than the database (and the database load will ideally be read-biased). In such a scenario, you can scale your web server instances relatively easily (e.g. by putting them behind an ELB - with just a bit of effort to ensure that all are serving the same content). RDS is MySQL managed by AWS - it isn't quite fully automatic, but it does go a long way towards autoscaling. Essentially, RDS can provision read-only replicas alongside a single write-master, and a Multi-AZ deployment adds a hot standby that can take over if needed. The downside is that you are paying for all those instances that are running (and you don't have full control over some of the intricate settings of MySQL).
  3. The final approach would be to use your web server cluster and your own MySQL cluster. Essentially, you can scale your web instances (as above), and then you will setup MySQL instances that will scale separately. You will need to look into MySQL replication (or perhaps use MySQL cluster if you can adapt your application to its data structures).

A few other answers on the same topic:

  • RDS vs EC2
  • Deploying applications
  • Some approaches to EC2 scaling

My perspective is usually that one click solutions aren't the best approach - I like the control that is offered by doing something manually. I find that not only do I usually end up with a more tailored and efficient end result, but I also have a much better understanding of how the system works, which makes figuring out what is wrong much easier. You can always automate your own setups once you have a good understanding of the intricacies of them.

One point to keep in mind about RDS - it is already EBS backed. RDS is MySQL - it isn't something similar, or another relational database. It is a managed instance of MySQL running on EBS backed EC2 instances. AWS will keep the software up to date, and you can do normal EBS snapshots of your data, etc. You just don't have direct access to the underlying software running on the instance.

As for the choice of operating system, I am partial to Amazon's Linux. It is well supported by AWS and uses a minimum of resources - it is fully compatible with CentOS (as a matter of fact, it includes the EPEL repository by default in the latest version). The usual viewpoint is to use whatever Linux distribution you are comfortable with, as the differences are usually minor (CentOS will work just as well as Ubuntu for the instructions you are working from - most commands (except apt-get) are the same on CentOS. Given that my own setup has the databases on a separate EBS volume using Amazon's Linux, I can assure you that it is not difficult to do).

I'd suggest that there are some main considerations:

  • Comfortability with/willingness to learn Linux systems - if you don't mind setting up your own servers and want to get a better understanding of them, I'd definitely go the EC2 route. You'll end up with a better end result if you do it right and will have more versatility in the long run. I will mention though, that if you are taking this approach, you want to really understand what the commands you are running do - just following a guide will not be enough if you really want to commit to it.
  • Budget - remember that with AWS everything has a price. The more AWS does for you, the more they charge you. An RDS instance costs about 30% more than an equivalent EC2 instance (and there is no micro instance) and if you want the redundancy they offer, you need to be running multiple RDS instances (and paying for each of those). Elastic Beanstalk will provision instances, load balancers, RDS instances, etc. for you - the costs add up quickly.
  • Time - if you have no time, want to press a couple of buttons and have something functional, Elastic Beanstalk is probably the best approach for you.

I would advise against using Elastic Beanstalk with MySQL baked into your AMI - it will likely be quite unstable, if it works at all. (Just think about what happens when it adds or removes an instance in your cluster, or when data goes to one instance instead of the other...)

It is great to keep scalability in mind - but don't optimize things too soon, or you will never get anything done. Definitely keep it in mind, but if the cost (time, money, etc.) of making a particular component scalable is not practical at the moment, don't worry too much about it - when the time comes to scale it, you'll figure it out (most popular sites started out that way, after all).

I'd advise that if your application is designed so that it can take advantage of some caching, it will go a long way.

Typically, on EC2 it is better to scale vertically (to larger instances) than horizontally (to more instances). To begin with however, you want to scale to two instances so that you have some redundancy and minimize your single points of failure. A possible approach, therefore, may be:

  1. Start with a micro instance - have both your database and application on it (you can't get any smaller than this, which makes it a good starting point).
    • This is of course, quite easy to scale vertically, just keep upgrading your instance until you are using x-large instances. The problem comes down to redundancy - if there is any problem with your instance, your application is offline.
  2. Ultimately you want to separate your database onto another instance (since a) the database will see different load than your application and b) you can't autoscale MySQL in quite the same way as web servers). Micro instances just don't handle load well, though, so I'd suggest upgrading to a larger instance first - at least a small, and then perhaps a medium. (Basically, the idea is that the benefit of splitting is presumably greater once you are at the point of needing larger instance types.)
  3. Separate your database from your web server. This will allow you to cater to the different needs for databases (e.g. high memory) vs web servers (e.g. higher cpu) and the differences between how you scale each (Recommended reading). At this point you might decide to use RDS instead of running your own MySQL instance.
  4. Now that you have your application running on a dedicated instance, you can scale it and not worry about your database - setup autoscaling so that you have some redundancy. This should automatically add more application nodes as any of them fail or as load exceeds the thresholds you specify.
  5. Add a second database node and configure replication between your nodes (if you opt to use MySQL cluster, or NoSQL solutions, you should be able to setup autoscaling as well). Everything should at this point have redundancy, and even if a node fails, you should still be online.
  6. Upgrade one instance at a time to larger instance sizes as demand merits it.

Solution 2:

Now that I've become a bit more familiar with Elastic Beanstalk and EC2, I've decided to forego using Elastic Beanstalk because although it's got some cool features, it's too regimented for my liking. For example, I don't like the fact that I can't change the httpd.conf file (well, you can change it, but those changes disappear when your environment is restarted). Another reason is that the only way to run Elastic Beanstalk with MySQL (properly, i.e., with auto-scaling and automated backups) is to use RDS. Even though you can get 3 months of RDS free with a new sign-up, I'm not at the scale where I need its features, so it's not worth it for me to pay ~$76/month for RDS.

Bottom Line: If you have a reasonable amount of traffic and you need a solution that scales and takes care of itself, Elastic Beanstalk with RDS is a great option. I like the fact that you can deploy using git. It's like Heroku for PHP. The getting started guide should include instructions for setting up MySQL.

What I did: I opted to use "Synthetic Elastic Beanstalk": I can recreate its functionality using the various offerings from AWS and have the flexibility to configure it exactly the way I want. While I'm kicking the tires of AWS, I've set up MySQL on the same EC2 instance as my webapp (not ideal once you need to scale, but perfectly fine while you're learning the ropes of using AWS).

This guide is what I used to set up a LAMP stack on an EC2 instance running an Amazon Linux AMI. I find it easier to import my databases with phpMyAdmin. Because I'm using a micro instance, it uses EBS by default and you can take snapshots to back up your data. I would recommend setting up the CLI tools for EC2 and running

ec2-modify-instance-attribute --block-device-mapping "/dev/sda1=:false" i-xxxxxxxx

where i-xxxxxxxx is the EC2 instance the EBS volume is attached to. This prevents the EBS volume from being deleted if your instance terminates. Since that EBS volume is where everything runs from and where my database is stored, I don't want to lose it. While I was playing around with Elastic Beanstalk, my EC2 instance was terminated and Elastic Beanstalk instantly started another one with a new EBS volume attached. Fortunately, I had changed the DeleteOnTermination setting to "false" for the original EBS volume, so I was able to stop the new instance, detach the new EBS volume, and attach the old one, thus preserving my MySQL installation and database.
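
For anyone using the newer unified AWS CLI (the aws command) rather than the older EC2 API tools, the same setting can be changed with aws ec2 modify-instance-attribute. The sketch below only prints the command so it can be checked first. keep_volume_cmd is a hypothetical helper, the instance ID is a placeholder, and /dev/sda1 is assumed to be the root device name.

```shell
# Hypothetical helper: print the AWS CLI command that clears
# DeleteOnTermination for one device of one instance.
# The device defaults to /dev/sda1; the instance ID is a placeholder.
keep_volume_cmd() {
    instance_id=$1
    device=${2:-/dev/sda1}
    printf "aws ec2 modify-instance-attribute --instance-id %s --block-device-mappings '[{\"DeviceName\":\"%s\",\"Ebs\":{\"DeleteOnTermination\":false}}]'\n" \
        "$instance_id" "$device"
}

keep_volume_cmd i-xxxxxxxx
```

Copy the printed line into a shell with AWS credentials configured once the instance ID and device name have been filled in.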

Overall, the whole process of moving a web app to AWS is still quite a pain in the ass. Now that I've gone through the learning curve, I feel more comfortable with it, but I can't help thinking there should be better documentation for getting started.