Slow synchronisation stage on gsutil rsync?

I've just started using GCS as a backup for my web servers. One server has 1.2 million JPEGs (3.5TB) and they all rsynced over flawlessly in about 10 hours.

The other has 2.5 million JPEGs (just thumbnails/previews though, 300GB total). The first time I ran it, the "Building synchronization state" stage went through all 2.5 million quite quickly, in a few minutes. My session got interrupted though (wifi dropped), and when I SSHed in to run it again, the "At source listing" counter quickly nipped through 10000, 20000, 30000, then ground to a near halt. Half an hour later it was only up to 300,000. I know it also has to work out which files the destination already has, but I wouldn't expect that to significantly slow down the "At source listing..." progress messages.

Does it suggest a problem with my filesystem, and if so what should I check?

Or is it expected behaviour, for any reason?

Is trying to use gsutil rsync with 2 million files against one bucket a bad idea? I could find no guidelines from Google on how many files can sit in a bucket, so I'm assuming it's effectively unlimited (billions)?

FWIW the files are all in nested subdirectories, with no more than 2000 files in any one directory.

Thanks

edit: the exact command I'm using is:

gsutil -m rsync -r /var/www/ gs://mybucketname/var/www

Solution 1:

I have discovered that changing

output_chunk.writelines(unicode(''.join(current_chunk)))

to

output_chunk.write(unicode(''.join(current_chunk)))

in /gsutil/gslib/commands/rsync.py makes a big difference. Thanks to Mike from the GCS team for his help; this simple change has already been rolled out on GitHub:

https://github.com/GoogleCloudPlatform/gsutil/commit/a6dcc7aa7706bf9deea3b1d243ecf048a06a64f2
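For anyone wondering why such a tiny change matters: writelines() expects an iterable of lines, and a string is itself iterable, one character at a time. So passing a joined string to writelines() issues one write call per character, while write() emits the whole string in a single call. Here is a minimal sketch of the difference (the CountingWriter class is just an illustration, not part of gsutil):

```python
import io

class CountingWriter(io.StringIO):
    """StringIO subclass that counts individual write() calls."""
    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, s):
        self.write_calls += 1
        return super().write(s)

data = "x" * 10000  # stand-in for the joined chunk string

# writelines() iterates its argument; a string iterates per
# character, so this triggers one write() call per character.
slow = CountingWriter()
slow.writelines(data)
print(slow.write_calls)  # 10000

# write() emits the whole string at once.
fast = CountingWriter()
fast.write(data)
print(fast.write_calls)  # 1
```

With millions of listing lines being chunked out during "Building synchronization state", that per-character overhead is enough to make the listing appear to grind to a halt.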