LVM and disaster recovery
Solution 1:
Let's say we have two physical drives, sda and sdb. Both are 100 MB. I put them into VolumeGroup1 and create one 200 MB LogicalVolume1.
What would happen if I create a 150 MB file? Would 100 MB physically be on sda and 50 MB on sdb?
Correct (assuming the filesystem was empty before the file was created).
If so, what tells the OS that a piece of the file is on one drive, and another piece is on the other?
LVM tells the operating system that there is one single 200 MB disk. LVM comes in two parts, userspace management tools and kernel drivers; the kernel part then maps what the operating system sees to physical locations/blocks on the disks.
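To make the mapping concrete, here is a minimal sketch of a linear (concatenated) volume in Python. It is a toy model, not LVM's actual code: it assumes sizes in whole MB, ignores metadata overhead and extent rounding, and the names `PVS`, `locate` and `split_range` are made up for illustration.

```python
# Toy model of a linear logical volume built from two physical volumes.
# Assumptions: sizes in MB, no metadata overhead, no extent rounding.
PVS = [("sda", 100), ("sdb", 100)]  # (device, size in MB), in allocation order


def locate(lv_offset_mb, pvs=PVS):
    """Map an offset inside the logical volume to (device, offset on that device)."""
    if lv_offset_mb < 0:
        raise ValueError("offset must not be negative")
    start = 0
    for name, size in pvs:
        if lv_offset_mb < start + size:
            return name, lv_offset_mb - start
        start += size
    raise ValueError("offset is beyond the end of the logical volume")


def split_range(lv_start_mb, length_mb, pvs=PVS):
    """Return how many MB of a contiguous logical range land on each device."""
    usage = {}
    end = lv_start_mb + length_mb
    start = 0
    for name, size in pvs:
        lo, hi = max(lv_start_mb, start), min(end, start + size)
        if hi > lo:
            usage[name] = hi - lo
        start += size
    return usage


print(locate(120))          # ('sdb', 20)
print(split_range(0, 150))  # {'sda': 100, 'sdb': 50}
```

The second call is the 150 MB file from the question, under the same assumption as above that the filesystem was empty and writes the file from the start of the volume.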
What about drive failure? Assuming no RAID, if sdb fails, will all the data on sda be lost? Is there any way to control what files are on what physical drives?
Yes, consider the data lost.
If you create smaller Logical Volumes, then you can use the pvmove command to move them from disk to disk.
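A rough way to see why smaller Logical Volumes help: a volume is only damaged if some of its extents live on the failed disk. The sketch below uses a hypothetical layout (the volume names and the `LAYOUT` table are invented for illustration) and simply lists which volumes touch a given disk.

```python
# Hypothetical layout: each logical volume lists the physical volumes
# that hold at least one of its extents.
LAYOUT = {
    "lv_big": {"sda", "sdb"},  # spans both disks
    "lv_home": {"sda"},        # fits entirely on sda
    "lv_scratch": {"sdb"},     # fits entirely on sdb
}


def lost_volumes(failed_pv, layout=LAYOUT):
    """Return the logical volumes that have data on the failed physical volume."""
    return sorted(lv for lv, pvs in layout.items() if failed_pv in pvs)


print(lost_volumes("sdb"))  # ['lv_big', 'lv_scratch']
```

In this made-up layout, losing sdb takes out the spanning volume and the one that lives on sdb, while lv_home on sda is untouched.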
How do you generally manage LVM? Do you create one or two large Volume Groups then make partitions as it makes sense? Any other tips?
I tend to create large Volume Groups and then create Logical Volumes as needed. There is no need to fully allocate all the space in a Volume Group; allocate it when it is needed. It's easy to increase the size of a Logical Volume, and pretty much all modern filesystems can be easily grown, too.
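As a sketch of that workflow, the snippet below only builds and prints the command lines; it does not run anything. The volume group `vg00`, the volume name `data` and the sizes are placeholders, and the helper functions are hypothetical. The `-r` option asks lvextend to resize the filesystem together with the volume.

```python
import shlex


def lvcreate_cmd(vg, name, size):
    """Command line that creates a logical volume of the given size in vg."""
    return ["lvcreate", "-L", size, "-n", name, vg]


def lvextend_cmd(vg, name, extra):
    """Command line that grows a logical volume by `extra` and resizes its filesystem."""
    return ["lvextend", "-r", "-L", "+" + extra, f"/dev/{vg}/{name}"]


# Dry run: print the commands instead of executing them (placeholder names and sizes).
for cmd in (lvcreate_cmd("vg00", "data", "20G"), lvextend_cmd("vg00", "data", "10G")):
    print(shlex.join(cmd))
```

Starting small and growing later keeps unallocated extents in the Volume Group available for whichever volume turns out to need them.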
Solution 2:
The underlying thing that lets LVM work in Linux is the device mapper portion of the kernel; software RAID managed with mdadm uses the separate md driver, which does the same kind of job. Both abstract the block addresses of the physical devices into the virtual block devices that you're using.
When using LVM, as with anything involving data, you need to be aware of the repercussions for data availability. That's not to say that LVM is dangerous; in fact, when proper practices are used, its impact on availability is minimal.
In the scenario you suggest in your question, the availability of your data would be the same as with RAID0: if any drive fails, the result is data loss.
In practice I would not use LVM without running it on some sort of RAID. I have used LVM on a 30TB file server that had about 20 hardware RAID5 volumes in one VG. And if you have enough free extents, you can use pvmove to migrate the data off one or more PVs should they start to give you problems.
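The "enough free extents" condition can be sketched as a simple check: the extents allocated on the PV you want to empty must fit into the free extents of the remaining PVs in the same VG. This is a simplified model with invented numbers; it ignores allocation policies and any other constraints the real tools apply, and `can_evacuate` is a hypothetical helper, not an LVM command.

```python
def can_evacuate(source, pvs):
    """pvs maps PV name -> (total_extents, allocated_extents).

    Returns True if everything allocated on `source` fits into the free
    extents of the other PVs.
    """
    needed = pvs[source][1]
    free_elsewhere = sum(
        total - used for name, (total, used) in pvs.items() if name != source
    )
    return needed <= free_elsewhere


# Invented example figures, in extents.
two_disks = {"sda": (25, 20), "sdb": (25, 10)}
three_disks = {"sda": (25, 20), "sdb": (25, 10), "sdc": (25, 0)}

print(can_evacuate("sda", two_disks))    # False: 20 needed, only 15 free on sdb
print(can_evacuate("sda", three_disks))  # True: 20 needed, 40 free on sdb and sdc
```

The second case is the usual rescue path: add a fresh PV to the VG first, then move the data off the failing one.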
But always have a backup strategy in place that is tested from time to time.
Solution 3:
How do you generally manage LVM? Do you create one or two large Volume Groups then make partitions as it makes sense?
My general strategy is to put physical volumes that might be migrated (as a whole set) to another system into a separate volume group.
If you have external storage, it is a good idea to put it in a separate volume group. It is physically easy to disconnect it from this computer and connect it to another, so it should be similarly easy, logically, to export/import it in LVM, keeping the data intact.
If you already have a vg00 on internal disk(s), and then you buy another internal disk for your machine, ask yourself: will the data on the new disk be bound to vg00, so that it would never make sense to move it to another system? If so, the disk should be part of vg00. Otherwise, I would create vg01, as it can be easily exported/imported on its own.
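For reference, the export/import of a whole volume group usually comes down to a short command sequence. The sketch below only prints that sequence as a dry run; `vg01` is the example name from above, the helper functions are hypothetical, and it assumes every filesystem on the volume group has already been unmounted.

```python
import shlex


def export_plan(vg):
    """Commands to run on the old system, after unmounting the filesystems."""
    return [
        ["vgchange", "-an", vg],  # deactivate all logical volumes in the group
        ["vgexport", vg],         # mark the group as exported
    ]


def import_plan(vg):
    """Commands to run on the new system, after the disks are attached."""
    return [
        ["pvscan"],               # let LVM find the newly attached physical volumes
        ["vgimport", vg],         # make the group known to this system
        ["vgchange", "-ay", vg],  # activate its logical volumes
    ]


# Dry run: print the steps rather than executing them.
for step in export_plan("vg01") + import_plan("vg01"):
    print(shlex.join(step))
```

Because the whole group moves as a unit, this only works cleanly when the group contains nothing that has to stay behind, which is the reason for keeping movable disks out of vg00.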