When do static files get retrieved during SMTP?

Solution 1:

Yes, it depends on how the image is referenced.

You can place the image in a MIME-formatted message and refer to it with a cid: URL, as described in RFC 2392. The image is then transmitted along with the rest of the message, and every system that handles the message can access the image directly without any further retrieval: it is part of the message, as long as nobody strips the attachment. For mails with a single HTML part, embedding the image right there in a data: URI works similarly, though it is invisible to MUAs that do not render HTML parts.
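As a rough illustration of the cid: approach, here is a minimal sketch using Python's standard email package. The addresses, the subject, and the placeholder image bytes are assumptions made up for the example, not anything from the question.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Placeholder bytes standing in for a real PNG file (assumption for illustration).
image_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

msg = EmailMessage()
msg["Subject"] = "Dog picture"
msg["From"] = "sender@example.com"      # hypothetical address
msg["To"] = "recipient@example.com"     # hypothetical address
msg.set_content("Plain-text fallback: the HTML version shows a dog picture.")

# make_msgid() returns "<id@domain>"; the cid: URL uses the id without the brackets.
cid = make_msgid(domain="example.com")
msg.add_alternative(
    f'<html><body><img src="cid:{cid[1:-1]}" alt="dog"></body></html>',
    subtype="html",
)

# Attach the image as a part related to the HTML alternative,
# labelled with the Content-ID that the <img> tag refers to.
html_part = msg.get_payload()[1]
html_part.add_related(image_bytes, maintype="image", subtype="png", cid=cid)

raw = msg.as_string()  # the complete message text that would be handed to SMTP
```

Printing raw shows the image travelling inside the message as a base64-encoded part, so no later retrieval is needed to display it.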

You can also link the image for later retrieval and refer to it with a common protocol spoken by the MUA, typically https:. The image might then be retrieved individually by every piece of software handling the mail, and depending on the recipient's configuration, yes, the email might be changed to refer to a cached version of the external resource instead of the URL provided by the sender. A common setup is for the recipient's MTA to download the image, check it for inappropriate content (for example, malware), and rewrite the mail so it no longer contains any external references.

If the message is not rewritten and the recipient's MUA is configured to fetch external resources, the download might even happen right when the recipient opens the message (and possibly again on later occasions, if the image is not cached). This simple approach used to be common, but because senders often used it to learn when, how often, and from where external resources were fetched, it is no longer typical (think of it as an involuntary read receipt).
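To make the distinction concrete, the following sketch sorts the image references in an HTML body into those that travel with the message (cid:, data:) and those that need a separate fetch (http:, https:). The class and function names are hypothetical, and a real MUA or filtering gateway would inspect far more than img tags.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImageRefCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML body."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def classify(src):
    """Return 'embedded', 'external', or 'other' based on the URL scheme."""
    scheme = urlsplit(src).scheme.lower()
    if scheme in ("cid", "data"):
        return "embedded"   # carried inside the message itself
    if scheme in ("http", "https"):
        return "external"   # requires a separate download
    return "other"


# Hypothetical HTML body with one reference of each kind.
html_body = (
    '<p><img src="cid:dog@example.com">'
    '<img src="https://example.com/dog.png">'
    '<img src="data:image/png;base64,iVBORw0KGgo="></p>'
)

collector = ImageRefCollector()
collector.feed(html_body)
kinds = [classify(src) for src in collector.sources]
```

Only the references classified as external can reveal anything to the sender, because only those trigger a request to a server.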

Solution 2:

There are two different cases at hand: one is referring to some media (the dog image) from an HTML message body, and the other is sending a chunk of data with the message in the form of an attachment.

In the first case, SMTP (and in fact any other protocol involved, such as IMAP, POP3, or LMTP) does not care about the image at all. It is completely up to the receiver's MUA to make sense of the message body: it tries to render the HTML and, while doing so, retrieves the image over HTTP as any web browser would.

The second case involves the sender attaching the dog image stored on their local machine directly to the message. This forms a multipart message in which the binary data of the image is encoded in some text form (usually base64 for binary data; the alternative, so-called quoted-printable, is used mostly for text) and placed in the message body. You can find more information about that in RFC 2045 and the four RFCs that follow it (RFC 2046 to RFC 2049). If your MUA has a "save message as..." function, it will likely store the message in eml format, which is essentially the raw message as these RFCs describe it. You can gain some insight by creating such messages, saving them, and opening them in your favorite text editor.
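Instead of a text editor, you can also inspect such a message programmatically. The sketch below parses a small hand-written message with Python's standard email package and lists its parts. The sample message, its boundary string, and the file name are invented for illustration, and the attachment holds only the eight-byte PNG signature rather than a real picture.

```python
from email import message_from_string, policy

# A tiny hand-written message in the format a saved .eml file would contain.
raw_eml = """\
From: sender@example.com
To: recipient@example.com
Subject: Dog picture
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"

Here is the dog.
--BOUNDARY
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="dog.png"

iVBORw0KGgo=
--BOUNDARY--
"""

msg = message_from_string(raw_eml, policy=policy.default)

# Collect (content type, transfer encoding, file name) for every leaf part.
parts = [
    (
        part.get_content_type(),
        str(part.get("Content-Transfer-Encoding", "7bit")),
        part.get_filename(),
    )
    for part in msg.walk()
    if not part.is_multipart()
]

# Decode the attachment back into the original binary data.
attachment = next(msg.iter_attachments())
image_bytes = attachment.get_content()

for content_type, encoding, filename in parts:
    print(content_type, encoding, filename)
```

For a message saved from your own MUA, reading the file in binary mode and passing the bytes to email.message_from_bytes with the same policy should give you the same kind of view.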