How come Automator can't find images on sites like this?

Solution 1:

Here are two different examples using Safari of how I'd go about downloading the individual thumbnail images of the brain scans at the target URL.

The example AppleScript code, shown below, was tested in Script Editor under macOS Catalina with Language & Region settings in System Preferences set to English (US) — Primary, and worked for me without issue.¹

  • ¹ Assumes necessary and appropriate settings in System Preferences > Security & Privacy > Privacy have been set/addressed as needed.


Semi-Automated Method

  1. Save the already opened Safari webpage containing the images of brain scans as a Web Archive.
  2. Run the example AppleScript code, shown directly below, in Script Editor.

Example AppleScript code:

property saveToFolder : POSIX path of ¬
    (path to downloads folder from user domain)

set thePropertyListFilePath to the POSIX path of ¬
    (choose file with prompt ¬
        "Select the saved Safari document " & ¬
        "of the brain scans..." default location ¬
        (path to desktop folder from user domain))

tell application "System Events"
    tell property list file thePropertyListFilePath
        
        set theWebSubresourcesCount to the ¬
            (count of property list items) of ¬
            property list item "WebSubresources"
        
        set n to 0
        repeat with i from 0 to theWebSubresourcesCount - 1
            
            set theWebResourceURL to the value of ¬
                property list item "WebResourceURL" of ¬
                property list item i of ¬
                property list item "WebSubresources"
            
            if theWebResourceURL contains "cgi-bin/imageservice" then
                
                set theWebResourceData to the value of ¬
                    property list item "WebResourceData" of ¬
                    property list item i of ¬
                    property list item "WebSubresources"
                
                if n is less than 10 then
                    set imageName to "Image_00" & n
                else if n is less than 100 then
                    set imageName to "Image_0" & n
                else
                    set imageName to "Image_" & n
                end if
                
                set theFile to saveToFolder & imageName & ".jpg"
                my writeToFile(theWebResourceData, theFile, true)
                set n to n + 1
                
            end if
            
        end repeat
        
    end tell
end tell


--  ## Handler ##

to writeToFile(theData, theFile, overwriteExistingContent)
    try
        set theFile to theFile as string
        if theFile contains "/" then
            set theOpenedFile to ¬
                open for access theFile with write permission
        else
            set theOpenedFile to ¬
                open for access file theFile with write permission
        end if
        if overwriteExistingContent is true then ¬
            set eof of theOpenedFile to 0
        write theData to theOpenedFile starting at eof
        close access theOpenedFile
        return true
    on error
        try
            close access file theFile
        end try
        return false
    end try
end writeToFile

Notes:

The example AppleScript code, shown directly above, opens a choose file dialog box for you to pick the just-saved Web Archive of the Safari webpage containing the images of brain scans.

It then extracts the target brain scan images from the saved Web Archive, writing them to the Downloads folder, sequentially named Image_000.jpg through the last image on the saved webpage.

This method is very quick compared to the Fully-Automated Method shown below and should take only about 10 seconds after selecting the target file, depending on how fast your CPU is.

Unfortunately, it appears that the AppleScript save command in Safari is broken, which is why this is posted as a semi-automated method. If the target webpage could be programmatically saved first, then this too would be a fully-automated method.

The reason there are two methods is that this one was an afterthought; however, because it's much quicker than the original method I posted, I've placed it first and kept the original, as it's always good to have more than one way to accomplish the task with the target application.
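For comparison, the same extraction can be sketched outside AppleScript: a Safari .webarchive is a binary property list whose WebSubresources array holds each resource's WebResourceURL and WebResourceData. Here is a minimal Python sketch of the same logic (the helper name and its parameters are mine, not part of the original script):

```python
import plistlib
from pathlib import Path

def extract_images(webarchive_path, dest_dir, url_fragment="cgi-bin/imageservice"):
    """Pull matching subresources out of a Safari .webarchive (a binary plist)."""
    with open(webarchive_path, "rb") as f:
        archive = plistlib.load(f)
    saved = []
    n = 0
    for res in archive.get("WebSubresources", []):
        if url_fragment in res.get("WebResourceURL", ""):
            # Zero-padded three-digit names, matching the AppleScript output.
            out = Path(dest_dir) / "Image_{:03d}.jpg".format(n)
            out.write_bytes(res["WebResourceData"])
            saved.append(out.name)
            n += 1
    return saved
```

Point it at the saved Web Archive and a destination folder, e.g. `extract_images("Brain Scans.webarchive", "~/Downloads")`.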



Fully-Automated Method

(Original Answer)

The example AppleScript code creates a new window in Safari to the target URL, waits for the page to finish loading, and then builds out a list of curl commands to be used in a do shell script command to download each image, naming them sequentially in the Downloads folder.

Note that once you start the example AppleScript code you must wait until it starts downloading the images before doing anything else in Safari, which should only take a few seconds, depending on how fast your Internet connection is.

Once the downloading starts, it waits a random length of time, between 3 and 8 seconds, before downloading the next image, so as not to inundate the server with rapid requests, which it might consider an attack.

As currently coded, it should take ~ 20 minutes to download all the images.



Example AppleScript code:

property downloadsFolder : POSIX path of (path to downloads folder from user domain)

property theURL : "https://connectivity.brain-map.org/static/referencedata/experiment/thumbnails/100142290?image_type=NEUN&popup=true"
property domainURL : "https://connectivity.brain-map.org"
property curlCMDs : {}

tell application "Safari"
    make new document with properties {URL:theURL}
    my waitForSafariPageToFinishLoading()
    delay 3 --  # In this case, an extra delay is needed even after page finishes loading.
    tell document 1
        set imageCount to ¬
            (do JavaScript ¬
                "document.getElementsByClassName('_ish_tnw_cell _ish_tnw_dark').length;") ¬
                as integer
        repeat with i from 0 to imageCount - 1
            set foo to ¬
                do JavaScript ¬
                    "document.getElementsByClassName('_ish_tnw_cell _ish_tnw_dark')['" & i & "'].innerHTML;"
            my buildOutCURLcommands(foo, i)
        end repeat
    end tell
end tell

--  # Download the images of brain scans.

repeat with thisCMD in curlCMDs
    do shell script thisCMD
    delay (random number from 3 to 8)
end repeat


--  ## Handlers ##

to buildOutCURLcommands(foo, i)
    tell current application
        set AppleScript's text item delimiters to "\""
        set imageURL to text item 6 of foo
        set AppleScript's text item delimiters to "&amp;"
        set imageURL to text items of imageURL
        set AppleScript's text item delimiters to "&"
        set imageURL to imageURL as string
        set AppleScript's text item delimiters to ""
        set curlURL to quoted form of (domainURL & imageURL)
        if i is less than 10 then
            set imageName to "Image_00" & i
        else if i is less than 100 then
            set imageName to "Image_0" & i
        else
            set imageName to "Image_" & i
        end if
        copy ¬
            "curl " & curlURL & " -o " & downloadsFolder & imageName & ".jpg" to ¬
            the end of curlCMDs
    end tell
end buildOutCURLcommands


to waitForSafariPageToFinishLoading()
    --  # Wait for page to finish loading in Safari.
    --  # This works in **macOS Catalina** and 
    --  # macOS Big Sur and may need adjusting for
    --  # other versions of macOS.
    tell application "System Events" to repeat until ¬
        exists (buttons of groups of toolbar 1 of window 1 of ¬
            process "Safari" whose name = "Reload this page")
        delay 0.5
    end repeat
end waitForSafariPageToFinishLoading

Notes:

Looking at the source code of the target URL, the images are laid out in a table, which makes it easy to ascertain the image URLs, shown as the value of src= in the HTML code of the table's cells.

This script uses AppleScript's text item delimiters to parse the HTML text returned by the second do JavaScript ... command. Generally speaking, parsing HTML should be done by applications/utilities specifically designed to do so; nonetheless, in this particular use case it works as intended and is used to accomplish the task at hand.

The second do JavaScript ... command returns, e.g.:

"<img class=\"_ish_tnw_img\" id=\"_img_0_0\" src=\"/cgi-bin/imageservice?zoom=4&amp;path=/external/connectivity/prod17/0500179232/0500179232.aff&amp;top=61168&amp;left=2176&amp;width=228&amp;height=181\" title=\"Position: 248\" _ish_tnw_index=\"0\" style=\"width: 153px;\">"

The buildOutCURLcommands(foo, i) handler uses AppleScript's text item delimiters to parse the value of foo and does the following:

  • Gets the portion of the URL that points to the target image file from the value of foo.
    • Parses it based on the " in the string of HTML text, getting the sixth item in the list, e.g.: /cgi-bin/imageservice?zoom=4&amp;path=/external/connectivity/prod17/0500179232/0500179232.aff&amp;top=61168&amp;left=2176&amp;width=228&amp;height=181
    • Replaces &amp; with & so the URL is properly formed.
  • Pads the value of i with leading zeros as needed to be three digits long for the name used for the downloaded image file.
  • Builds out the curl command while adding it to the curlCMDs list, to be executed by a do shell script command.
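For comparison, the same parse can be sketched in Python against the sample HTML shown above, where html.unescape performs the &amp;-to-& replacement that the text item delimiters handle in AppleScript:

```python
from html import unescape

# The sample innerHTML returned by the second do JavaScript command.
sample = ('<img class="_ish_tnw_img" id="_img_0_0" '
          'src="/cgi-bin/imageservice?zoom=4&amp;path=/external/connectivity/'
          'prod17/0500179232/0500179232.aff&amp;top=61168&amp;left=2176'
          '&amp;width=228&amp;height=181" title="Position: 248">')

# The sixth "-delimited field is the src value, as in the AppleScript handler.
src = sample.split('"')[5]
image_url = unescape(src)  # replaces &amp; with & so the URL is properly formed
print(image_url)
```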

After this part of the script has finished, the downloading begins and you can then do whatever you want with Safari; just let Script Editor continue running the script until it finishes.



Note: The example AppleScript code is just that and does not contain any additional error handling as may be appropriate. The onus is upon the user to add any error handling as may be appropriate, needed, or wanted. Have a look at the try statement and error statement in the AppleScript Language Guide. See also Working with Errors. Additionally, the use of the delay command may be necessary between events where appropriate, e.g. delay 0.5, with the value of the delay set appropriately.

Solution 2:

The webpage doesn't contain the images directly, but rather some JavaScript that the browser runs. This code makes an API request for a second page, which contains links to the images. The code on the first webpage then reads the image links from the second page and outputs markup to the browser containing those images.

Automator downloads the first page and looks for images in it, finding none. Unlike the browser, it doesn't execute JavaScript and make additional requests. For that reason, you can't use Automator's ‘Save Images from Web Content’ with this webpage.

If you want to programmatically download content from this website, use its API. Use the web inspector to inspect the requests that the webpage makes, then make the requests yourself and parse the output (e.g. with jq to parse JSON from the site's API), downloading the URLs found (with Automator or another scripting language).
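As a sketch of that approach, suppose the inspected request returned JSON like the sample below; the field names and response shape here are hypothetical, so check the site's actual responses in the web inspector:

```python
import json
from urllib.parse import urljoin

BASE = "https://connectivity.brain-map.org"

# Hypothetical response shape -- inspect the real request to see the actual schema.
response_text = '''
{"msg": [
  {"image_path": "/cgi-bin/imageservice?zoom=4&path=/external/a.aff"},
  {"image_path": "/cgi-bin/imageservice?zoom=4&path=/external/b.aff"}
]}
'''

# Resolve each relative path against the site's domain.
urls = [urljoin(BASE, item["image_path"]) for item in json.loads(response_text)["msg"]]
for u in urls:
    print(u)
# Each URL could then be fetched with urllib.request.urlretrieve, or handed to curl.
```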

Solution 3:

If you're not “married” to using Safari, the following alternate solution using Google Chrome may be of some interest to you.

The following AppleScript code, as is, will create a new folder named "Brainmap Images_files" on your desktop, containing all of the thumbnail images.

Depending on the speed of your internet connection and computer, this process should take less than 12 seconds to complete.

property savedImages : (path to desktop as text) & "Brainmap Images.html"
property savedImagesFolder : (path to desktop as text) & "Brainmap Images_files"
property theURL : "https://connectivity.brain-map.org/static/referencedata/experiment/thumbnails/100142290?image_type=NEUN&popup=true"

tell application "Google Chrome"
    activate
    try
        tell window 1 to set URL of (make new tab) to theURL
    on error errMsg number errNum
        set newWindow to make new window
        tell newWindow to set URL of active tab to theURL
    end try
    tell window 1
        repeat while active tab is loading
            delay 0.1
        end repeat
    end tell
    delay 1
    save window 1's active tab in file savedImages
end tell

tell application "System Events"
    repeat until folder savedImagesFolder exists
        delay 0.1
    end repeat
    delay 1
end tell

do shell script "find " & quoted form of POSIX path of savedImagesFolder & ¬
    " -type f | grep -v 'imageser' | xargs -I {} rm {}"

do shell script "find " & quoted form of POSIX path of savedImagesFolder & ¬
    " -type f | xargs -I {} mv {} '{}.jpg' ; rm " & ¬
    quoted form of POSIX path of savedImages
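For reference, the cleanup those two do shell script lines perform (delete everything that isn't an imageservice download, then give the survivors a .jpg extension) can be sketched in Python; the helper name is mine:

```python
from pathlib import Path

def clean_and_rename(folder):
    """Keep only files whose name contains 'imageser' and append .jpg,
    mirroring the two find/xargs pipelines above."""
    kept = []
    # Materialize the file list first so renames don't disturb iteration.
    for f in sorted(p for p in Path(folder).rglob("*") if p.is_file()):
        if "imageser" not in f.name:
            f.unlink()  # find ... | grep -v 'imageser' | xargs -I {} rm {}
        else:
            f.rename(f.with_name(f.name + ".jpg"))  # mv {} '{}.jpg'
            kept.append(f.name + ".jpg")
    return kept
```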