Tuning sequential disk reads for performance
You say that you want to minimize read IOPS and maximize the size of each IO request, but I suspect that you wouldn't really benefit from this. Normally I'd care about maximizing throughput while minimizing latency, and finding a good balance between the two for the particular application.
Note that when you went from a 128kB readahead to a 256kB readahead, read throughput actually dropped from 103.88MB/s to 102.50MB/s (about 1.3%). I wouldn't expect this trend to reverse at a higher readahead size. A larger readahead also brings a risk of more wasted IO if the access pattern is not purely sequential, because data that is read ahead but never used competes with the useful IO.
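If you want to repeat the comparison yourself, here is a minimal Python sketch for timing a sequential read and for expressing the difference between two runs as a percentage. The function names and the default 128kB request size are my own choices for illustration, not something from your setup. Note that the block size here is the size of the application's read calls; the kernel readahead is set separately, for example through read_ahead_kb in sysfs.

```python
import time


def sequential_read(path, block_size=128 * 1024):
    """Read `path` from start to end in block_size chunks.

    Returns (bytes_read, elapsed_seconds). Unbuffered at the Python level,
    but the kernel page cache and readahead still apply.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# The two figures quoted above, in MB/s.
print(f"{percent_change(103.88, 102.50):.2f}%")
```

Keep in mind that a second run over the same file will mostly be served from the page cache, so for a meaningful number use a file larger than RAM or drop the caches between runs.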
If you're interested, the 512kB limit probably comes from another layer in the storage stack such as the SCSI driver, the controller firmware, or the bus.
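On Linux, one place to start looking is the block layer's queue settings under /sys/block/<device>/queue. The sketch below reads three of them: read_ahead_kb, max_sectors_kb (the largest request the block layer will currently issue) and max_hw_sectors_kb (the largest the driver reports it can accept). The helper name and the device name "sda" are placeholders, and whether these values explain the 512kB limit on your system is only a guess on my part.

```python
from pathlib import Path

# Queue attributes exposed by the Linux block layer, all in kB.
QUEUE_FILES = ("read_ahead_kb", "max_sectors_kb", "max_hw_sectors_kb")


def read_queue_limits(device, sysfs_root="/sys/block"):
    """Return a dict of queue limits for `device`; missing entries are None."""
    queue_dir = Path(sysfs_root) / device / "queue"
    limits = {}
    for name in QUEUE_FILES:
        try:
            limits[name] = int((queue_dir / name).read_text().strip())
        except (OSError, ValueError):
            # File absent, unreadable, or not a plain integer.
            limits[name] = None
    return limits


if __name__ == "__main__":
    # "sda" is a placeholder; substitute the device you are testing.
    print(read_queue_limits("sda"))
```

If max_sectors_kb is lower than max_hw_sectors_kb, the cap is a software setting; if both are already at the limit you observe, the restriction sits lower down in the driver or hardware.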
To throttle IO, you could look at the following question: "How to Throttle per process I/O to a max limit?"