How long should I wait after applying an AWS IAM policy before it is valid?
The phrase "almost immediately" is used 5 times in the IAM FAQ, and is, of course, somewhat subjective.
Since AWS is a globally-distributed system, your changes have to propagate, and the system as a whole seems to be designed to favor availability and partition tolerance as opposed to immediate consistency.
I don't know whether you've considered it, but it's entirely within the bounds of possibility that you might actually, at step 4 in your flow, see a sequence of pass, fail, pass, pass, fail, fail, fail, fail... because neither a bucket nor an object in a bucket are actually a single thing in a single place, as evidenced by the mixed consistency model of different actions in S3, where new objects are immedately-consistent while overwrites and deletes are eventually consistent... so the concept of a policy having "had an effect" or not on the bucket or an object isn't an entirely meaningful concept since the application of the policy is, itself, almost certainly, a distributed event.
To confirm such an application of policies would require AWS to expose the capability of (at least indirectly) interrogating every entity that has a replicated copy of that policy to see whether it had the current version or not... which would be potentially impractical or unwieldy to say the least in a system as massive as S3, which has grown beyond a staggering 2 trillion objects, and serves peak loads in excess of 1.1 million requests per second.
Official AWS answers to this forum post provide more information:
While changes you make to IAM entities are reflected in the IAM APIs immediately, it can take noticeable time for the information to be reflected globally. In most cases, changes you make are reflected in less than a minute. Network conditions may sometimes increase the delay, and some services may cache certain non-credential information which takes time expire and be replaced.
The accompanying answer to what to do in the mean time was "try again."
We recommend a retry loop after a slight initial delay, since in most circumstances you'll see your changes reflected quite quickly. If you sleep, your code will be waiting far too long in most cases, and possibly not long enough for the rare exceptions.
We actively monitor the performance of the replication system. But like S3, we guarantee only eventual consistency, not any particular upper bound.
I have a far less scientific answer here... but I think it will help some other people feel less insane :). I kept thinking things were not working while they were just taking more time than I expected.
Last night I was adding an inline policy to allow a host to get parameters from the system manager. I thought it wasn't working because many minutes after the change (maybe 5 or so), my CLI commands were still failing. Then, they started working. So, that was a fairly large delay.
Just now, I removed that policy and it took 2-3 minutes (enough to google this and read a couple other pages) before my host lost access.
Generally things are quite snappy for me as well, but if you're pretty sure something should work and it's not, just do yourself a favor and wait 10 minutes. Unfortunately, this makes automation after IAM changes sound harder than I thought!