What is the relation between block size and IO?

I have been reading about disks recently, which has led me to three different doubts that I am not able to link together. The three terms I am confused by are block size, IO, and performance.

I was reading about superblocks at slashroot when I came across this statement:

Less IOPS will be performed if you have larger block size for your file system.

From this I understand that if I want to read 1024 KB of data, a disk (say A) with a block size of 4 KB/4096 B would take more IOs than a disk (say B) with a block size of 64 KB.

Now my question is: how much more IO would disk A need?

As far as I understand, the number of IO requests required to read this data would also depend on the size of each IO request.

  • So who decides the size of the IO request? Is it equal to the block size? Some people say that the application decides the size of the IO request, which seems fair enough, but how then does the OS divide a single request into multiple IOs? There must be a limit after which the request splits into more than one IO. How do I find that limit?
  • Is it possible that on both disks (A and B) the data can be read in the same number of IOs?
  • Does reading each block mean a single IO? If not, what is the maximum number of blocks that can be read in a single IO?
  • Whether the data is sequential or spread randomly, does the CPU provide all the block addresses to read at once?

Also

number of IOPS possible = 1 / (average rotational delay + average seek time)

Throughput = IOPS * IO size

From the above, the IOPS for a disk would always be fixed, but the IO size can vary. So to calculate the maximum possible throughput we would need the maximum IO size. From this, what I understand is that if I want to increase throughput from a disk, I should issue requests with the maximum amount of data I can send per request. Is this assumption correct?
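For example, with made-up numbers: if the average seek time is 5 ms and the average rotational delay is 3 ms, then IOPS = 1 / 0.008 s ≈ 125. With a 4 KB IO size that gives roughly 125 * 4 KB ≈ 500 KB/s of throughput, while with a 64 KB IO size it would be 125 * 64 KB ≈ 8 MB/s.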

I apologize for asking so many questions, but I have been reading about this for a while and could not find any satisfactory answers; I found differing views on the same topics.


I think the Wikipedia article explains it well enough:

Absent simultaneous specifications of response-time and workload, IOPS are essentially meaningless.
...
Like benchmarks, IOPS numbers published by storage device manufacturers do not directly relate to real-world application performance. ...

Now to your questions:

So who decides the size of the IO request?

That is both an easy and a difficult question to answer for a non-programmer like myself.

As usual, the answer is an unsatisfactory "it depends"...

I/O operations on disk storage by an application are usually system calls to the operating system, and their size depends on which system call is made...

I'm more familiar with Linux than other operating systems, so I'll use that as reference.

The size of I/O operations such as open(), stat(), chmod() and similar calls is almost negligible.
On a spinning disk the performance of those calls mainly depends on how far the disk actuator needs to move the arm and read head to reach the correct position on the disk platter.

On the other hand, the size of read() and write() calls is initially set by the application and can vary between 0 and 0x7ffff000 (2,147,479,552) bytes in a single I/O request...
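As a minimal sketch (the file path and the 64 KiB buffer size are just made-up examples), the application itself picks the size it passes to read():

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* The application chooses the request size: here 64 KiB per read(). */
    char buf[64 * 1024];

    int fd = open("/tmp/testfile", O_RDONLY);   /* hypothetical file */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Each read() call is one I/O request of up to sizeof(buf) bytes;
       the kernel may return fewer bytes than requested. */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0)
        perror("read");
    else
        printf("read() returned %zd bytes in one system call\n", n);

    close(fd);
    return 0;
}
```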

Of course, once such a system call has been made by the application and is received by the OS, the call will get scheduled and queued (depending on whether or not the O_DIRECT flag was used to bypass the page cache and buffers and direct I/O was selected).
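A rough sketch of what that looks like, assuming a hypothetical file path and 4 KiB alignment (with O_DIRECT the buffer and transfer size generally have to be aligned to the device's logical sector size):

```c
#define _GNU_SOURCE          /* needed for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* O_DIRECT asks the kernel to bypass the page cache. */
    int fd = open("/tmp/testfile", O_RDONLY | O_DIRECT);  /* hypothetical file */
    if (fd < 0) {
        perror("open with O_DIRECT");
        return 1;
    }

    /* Allocate a buffer aligned to an assumed 4 KiB sector size. */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0) {
        close(fd);
        return 1;
    }

    ssize_t n = read(fd, buf, 4096);
    if (n < 0)
        perror("direct read");
    else
        printf("direct read() returned %zd bytes\n", n);

    free(buf);
    close(fd);
    return 0;
}
```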

The abstract system call will need to be mapped to/from operations on the underlying file system, which is organized in discrete blocks (the size of which is usually set when the file system is created); eventually the disk driver operates on either hard disk sectors of 512 or 4096 bytes or SSD memory pages of 2K, 4K, 8K, or 16K.
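As a small illustration (the path is hypothetical), on Linux stat() and statvfs() will report the preferred I/O size and the filesystem block size for a given file:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/statvfs.h>

int main(void)
{
    /* Hypothetical path; substitute any file on the filesystem of interest. */
    const char *path = "/tmp/testfile";

    struct stat st;
    if (stat(path, &st) == 0)
        printf("preferred I/O block size (st_blksize): %ld bytes\n",
               (long)st.st_blksize);

    struct statvfs vfs;
    if (statvfs(path, &vfs) == 0)
        printf("filesystem block size (f_bsize): %lu bytes\n",
               (unsigned long)vfs.f_bsize);

    return 0;
}
```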

(In benchmarks the read and write calls are typically set to either 512 B or 4 KB, sizes which align well with the underlying disk and result in optimal performance.)

There must be a limit after which the request splits into more than one IO. How do I find that limit?

Yes, there is a limit: on Linux, as documented in the manual, a single read() or write() system call will transfer at most 0x7ffff000 (2,147,479,552) bytes. To read larger files you will need additional system calls.
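A minimal sketch of that, assuming a hypothetical large file: read() is simply called in a loop until it returns 0 at end of file, so a large file is covered by many system calls:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/bigfile", O_RDONLY);   /* hypothetical large file */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char buf[1 << 20];          /* read in 1 MiB chunks */
    long long total = 0;
    long calls = 0;
    ssize_t n;

    /* read() returns at most the requested size (and never more than
       0x7ffff000 bytes per call), so large files need a loop of calls. */
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        total += n;
        calls++;
    }
    if (n < 0)
        perror("read");

    printf("%lld bytes read in %ld read() calls\n", total, calls);
    close(fd);
    return 0;
}
```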

Does reading each block mean a single IO?

As far as I understand, each occurrence of a system call is typically what counts as an IO event.

A single read() system call counts as one I/O event, not as X or Y I/Os, regardless of how that system call gets translated/implemented into accessing X blocks from a filesystem or reading Y sectors from a spinning hard disk.