Python slow read performance issue
I will focus on only one of your examples, because the rest should be analogous.
What I think may matter in this situation is read-ahead (or perhaps another technique related to it).
Consider this example:
I created 1000 XML files in directory "1" (named 1.xml to 1000.xml) with the dd command, as you did, and then copied the original directory 1 to directory 2:
$ mkdir 1
$ cd 1
$ for i in {1..1000}; do dd if=/dev/urandom of=$i.xml bs=1K count=10; done
$ cd ..
$ cp -r 1 2
$ sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time strace -f -c -o trace.copy2c cp -r 2 2copy
$ sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time strace -f -c -o trace.copy1c cp -r 1 1copy
In the next step I traced the cp command with strace to find out in what order the data is copied.
cp does it in the following order (only the first 4 files are shown, because I saw that the second read from the original directory takes more time than the second read from the copied directory):
100.xml 150.xml 58.xml 64.xml ... * in my example
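If you want to see this order from Python rather than from strace, here is a minimal sketch. It assumes cp visits the entries in the order the directory itself returns them, which is what my trace suggests but which may differ between cp versions. The helper name directory_order is made up for this illustration, and Python only documents the order of os.scandir as arbitrary.

```python
import os

def directory_order(path):
    """Return regular-file names in the order the OS reports them (unsorted)."""
    with os.scandir(path) as entries:
        return [entry.name for entry in entries if entry.is_file()]

if __name__ == "__main__":
    # Show the first four names, like the strace excerpt above.
    for name in directory_order(".")[:4]:
        print(name)
```

Run it inside directory 1 and compare the names with the order in which dd created the files.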
Now take a look at the filesystem blocks used by these files (debugfs output, ext3 filesystem):
Original directory:
BLOCKS:
(0-9):63038-63047 100.xml
(0-9):64091-64100 150.xml
(0-9):57926-57935 58.xml
(0-9):60959-60968 64.xml
....
Copied directory:
BLOCKS:
(0-9):65791-65800 100.xml
(0-9):65801-65810 150.xml
(0-9):65811-65820 58.xml
(0-9):65821-65830 64.xml
....
As you can see, in the copied directory the blocks are adjacent, so while the first file, 100.xml, is being read, read-ahead (in the disk controller or in the system settings) can improve performance.
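To make the difference concrete, here is a small sketch that computes how many blocks lie between the end of one file and the start of the next, in the order cp read them. The block ranges are copied from the debugfs output above. The function names and the definition of "gap" are my own choices for illustration.

```python
import re

# Block ranges from the debugfs output above, in the order cp read the files.
ORIGINAL = """\
(0-9):63038-63047 100.xml
(0-9):64091-64100 150.xml
(0-9):57926-57935 58.xml
(0-9):60959-60968 64.xml
"""
COPIED = """\
(0-9):65791-65800 100.xml
(0-9):65801-65810 150.xml
(0-9):65811-65820 58.xml
(0-9):65821-65830 64.xml
"""

LINE = re.compile(r"\(\d+-\d+\):(\d+)-(\d+)\s+(\S+)")

def parse_extents(text):
    """Return (name, first_block, last_block) tuples from debugfs-style lines."""
    return [(m.group(3), int(m.group(1)), int(m.group(2)))
            for m in LINE.finditer(text)]

def gaps(extents):
    """Blocks between the end of one file and the start of the next.

    0 means the next file starts right after the previous one;
    a negative value means the next file lies earlier on the disk.
    """
    return [nxt[1] - prev[2] - 1 for prev, nxt in zip(extents, extents[1:])]

print("original:", gaps(parse_extents(ORIGINAL)))  # [1043, -6175, 3023]
print("copied:  ", gaps(parse_extents(COPIED)))    # [0, 0, 0]
```

In the original directory every step jumps thousands of blocks, once even backwards, while in the copied directory each file starts right where the previous one ended.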
dd creates the files in order 1.xml to 1000.xml, but cp copies them in a different order (100.xml, 150.xml, 58.xml, 64.xml). So when you execute:
cp -r 1 1copy
to copy this directory to another one, the blocks of the files being copied are not adjacent, so reading them takes more time.
When you copy a directory that was itself created by cp (so the files were not created by dd), the files are adjacent, so creating a copy of the copy with:
cp -r 2 2copy
is faster.
Summary: to compare Python and Perl performance you should use the same directory (or two directories that were both created by cp). You can also use the O_DIRECT option to read data directly from disk, bypassing the kernel buffers.
Please remember that results can differ depending on the kernel, system, disk controller, system settings, filesystem, and so on.
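For the O_DIRECT part, a rough Python sketch could look like the following. Treat it as an illustration under assumptions, not a reference implementation: ALIGN = 4096 is my guess for the required alignment (the real value depends on the device and filesystem), the names read_direct and _read_all are made up, and the fallback to a normal read when the filesystem rejects O_DIRECT is my own addition so that the function still returns data. Check the returned flag before trusting any timing.

```python
import errno
import mmap
import os

ALIGN = 4096  # assumed alignment; the real requirement depends on device and filesystem

def read_direct(path, chunk=ALIGN * 256):
    """Read a file with O_DIRECT (Linux). Returns (data, used_direct).

    Falls back to a normal buffered read if os.O_DIRECT does not exist
    or the filesystem rejects it with EINVAL (tmpfs may do this).
    """
    flag = getattr(os, "O_DIRECT", None)
    if flag is not None:
        try:
            return _read_all(path, os.O_RDONLY | flag, chunk), True
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
    return _read_all(path, os.O_RDONLY, chunk), False

def _read_all(path, flags, chunk):
    fd = os.open(path, flags)
    try:
        size = os.fstat(fd).st_size
        data = bytearray()
        # Anonymous mmap memory is page-aligned, which O_DIRECT needs.
        with mmap.mmap(-1, chunk) as buf:
            while len(data) < size:
                n = os.readv(fd, [buf])
                if n == 0:  # file shrank while we were reading
                    break
                data += buf[:n]
        return bytes(data)
    finally:
        os.close(fd)
```

Calling read_direct("1/100.xml") returns the file contents plus a flag; if the flag is False the read went through the page cache and the timing says nothing about the disk.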
Additions:
[debugfs]
[root@dhcppc3 test]# debugfs /dev/sda1
debugfs 1.39 (29-May-2006)
debugfs: cd test
debugfs: stat test.xml
Inode: 24102 Type: regular Mode: 0644 Flags: 0x0 Generation: 3385884179
User: 0 Group: 0 Size: 4
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 2
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x543274bf -- Mon Oct 6 06:53:51 2014
atime: 0x543274be -- Mon Oct 6 06:53:50 2014
mtime: 0x543274bf -- Mon Oct 6 06:53:51 2014
BLOCKS:
(0):29935
TOTAL: 1
debugfs: