How come Automator can't find images on sites like this?
Solution 1:
Here are two different examples using Safari of how I'd go about downloading the individual thumbnail images of the brain scans at the target URL.
The example AppleScript code, shown below, was tested in Script Editor under macOS Catalina with Language & Region settings in System Preferences set to English (US) — Primary and worked for me without issue¹.
- ¹ Assumes necessary and appropriate settings in System Preferences > Security & Privacy > Privacy have been set/addressed as needed.
Semi-Automated Method
- Save the already opened Safari webpage containing the images of brain scans as a Web Archive.
- Run the example AppleScript code, shown directly below, in Script Editor.
Example AppleScript code:
property saveToFolder : POSIX path of ¬
    (path to downloads folder from user domain)

-- # Ask for the Web Archive that was saved from Safari.
set thePropertyListFilePath to the POSIX path of ¬
    (choose file with prompt ¬
        "Select the saved Safari document " & ¬
        "of the brain scans..." default location ¬
        (path to desktop folder from user domain))

tell application "System Events"
    tell property list file thePropertyListFilePath
        set theWebSubresourcesCount to the ¬
            (count of property list items) of ¬
            property list item "WebSubresources"
        set n to 0 -- # Running count of the images written so far.
        -- # AppleScript element indexes start at 1, not 0.
        repeat with i from 1 to theWebSubresourcesCount
            set theWebResourceURL to the value of ¬
                property list item "WebResourceURL" of ¬
                property list item i of ¬
                property list item "WebSubresources"
            if theWebResourceURL contains "cgi-bin/imageservice" then
                set theWebResourceData to the value of ¬
                    property list item "WebResourceData" of ¬
                    property list item i of ¬
                    property list item "WebSubresources"
                -- # Pad the counter to three digits, leaving n an integer.
                if n is less than 10 then
                    set imageName to "Image_00" & n
                else if n is less than 100 then
                    set imageName to "Image_0" & n
                else
                    set imageName to "Image_" & n
                end if
                set theFile to saveToFolder & imageName & ".jpg"
                my writeToFile(theWebResourceData, theFile, true)
                set n to n + 1
            end if
        end repeat
    end tell
end tell

-- ## Handler ##

to writeToFile(theData, theFile, overwriteExistingContent)
    try
        set theFile to theFile as string
        if theFile contains "/" then
            set theOpenedFile to ¬
                open for access theFile with write permission
        else
            set theOpenedFile to ¬
                open for access file theFile with write permission
        end if
        if overwriteExistingContent is true then ¬
            set eof of theOpenedFile to 0
        write theData to theOpenedFile starting at eof
        close access theOpenedFile
        return true
    on error
        try
            close access file theFile
        end try
        return false
    end try
end writeToFile
Notes:
The example AppleScript code, shown directly above, will open a choose file dialog box for you to pick the just saved Safari webpage containing the images of brain scans.
It will then extract the target images from the saved Web Archive and write them to the Downloads folder, sequentially named starting at Image_000.jpg and continuing up to the number of brain scan images in the archive.
This method is very quick compared to the Fully-Automated Method shown below and should only take ~ 10 seconds after selecting the target file, depending on how fast your CPU is.
Unfortunately, it appears that the AppleScript save command in Safari is broken, which is why this is posted as a semi-automated method. If the target webpage could be programmatically saved first, then this too could be a fully-automated method.
The reason there are two methods is that this one was an afterthought. However, because it's much quicker than the original method I posted, I've placed it first, and I've kept the original because it's always good to have more than one way to accomplish the task with the target application.
Fully-Automated Method
(Original Answer)
The example AppleScript code creates a new window in Safari to the target URL, waits for the page to finish loading, and then builds out a list of curl commands to be used in a do shell script command to download each image, naming them sequentially in the Downloads folder.
Note that once you start the example AppleScript code you must wait until it starts downloading the images before doing anything else in Safari, which should only take a few seconds, depending on how fast your Internet connection is.
Once the downloading starts, it waits a random length of time, between 3 and 8 seconds, before downloading the next image, because you do not want to inundate the server with rapid requests that it may consider an attack.
As currently coded, it should take ~ 20 minutes to download all the images.
Example AppleScript code:
property downloadsFolder : POSIX path of (path to downloads folder from user domain)
property theURL : "https://connectivity.brain-map.org/static/referencedata/experiment/thumbnails/100142290?image_type=NEUN&popup=true"
property domainURL : "https://connectivity.brain-map.org"
property curlCMDs : {}

tell application "Safari"
    make new document with properties {URL:theURL}
    my waitForSafariPageToFinishLoading()
    delay 3 -- # In this case, an extra delay is needed even after page finishes loading.
    tell document 1
        set imageCount to ¬
            (do JavaScript ¬
                "document.getElementsByClassName('_ish_tnw_cell _ish_tnw_dark').length;") ¬
                as integer
        -- # JavaScript collections are zero-based, hence 0 to imageCount - 1.
        repeat with i from 0 to imageCount - 1
            set foo to ¬
                do JavaScript ¬
                    "document.getElementsByClassName('_ish_tnw_cell _ish_tnw_dark')['" & i & "'].innerHTML;"
            my buildOutCURLcommands(foo, i)
        end repeat
    end tell
end tell

-- # Download the images of brain scans.
repeat with thisCMD in curlCMDs
    do shell script thisCMD
    delay (random number from 3 to 8)
end repeat

-- ## Handlers ##

to buildOutCURLcommands(foo, i)
    tell current application
        -- # The image URL is the sixth quote-delimited item of the HTML.
        set AppleScript's text item delimiters to "\""
        set imageURL to text item 6 of foo
        -- # Replace the HTML entity "&amp;" with a plain "&".
        set AppleScript's text item delimiters to "&amp;"
        set imageURL to text items of imageURL
        set AppleScript's text item delimiters to "&"
        set imageURL to imageURL as string
        set AppleScript's text item delimiters to ""
        set curlURL to quoted form of (domainURL & imageURL)
        -- # Pad the index to three digits (the handler's parameter is i).
        if i is less than 10 then
            set imageName to "Image_00" & i
        else if i is less than 100 then
            set imageName to "Image_0" & i
        else
            set imageName to "Image_" & i
        end if
        -- # Quote the output path in case it contains spaces.
        copy ¬
            "curl " & curlURL & " -o " & ¬
            quoted form of (downloadsFolder & imageName & ".jpg") to ¬
            the end of curlCMDs
    end tell
end buildOutCURLcommands

to waitForSafariPageToFinishLoading()
    -- # Wait for page to finish loading in Safari.
    -- # This works in macOS Catalina and
    -- # macOS Big Sur and may need adjusting for
    -- # other versions of macOS.
    tell application "System Events" to repeat until ¬
        exists (buttons of groups of toolbar 1 of window 1 of ¬
            process "Safari" whose name = "Reload this page")
        delay 0.5
    end repeat
end waitForSafariPageToFinishLoading
Notes:
Looking at the source code of the target URL, the images are laid out in a table, which makes it easy to ascertain each image's URL, shown as the value of src= in the HTML code of the cells of the table.
This script uses AppleScript's text item delimiters to parse the HTML text returned by the second do JavaScript ... command. Generally speaking, parsing HTML should be done by applications/utilities specifically designed to do so; nonetheless, in this particular use case it works as intended and thus is used to accomplish the task at hand.
The second do JavaScript ... command returns, e.g.:
"<img class=\"_ish_tnw_img\" id=\"_img_0_0\" src=\"/cgi-bin/imageservice?zoom=4&amp;path=/external/connectivity/prod17/0500179232/0500179232.aff&amp;top=61168&amp;left=2176&amp;width=228&amp;height=181\" title=\"Position: 248\" _ish_tnw_index=\"0\" style=\"width: 153px;\">"
The buildOutCURLcommands(foo, i) handler uses AppleScript's text item delimiters to parse the value of foo and does the following:
- Gets the portion of the URL that points to the target image file from the value of foo.
  - Parses it based on the " in the string of HTML text, getting the sixth item in the list, e.g.: /cgi-bin/imageservice?zoom=4&amp;path=/external/connectivity/prod17/0500179232/0500179232.aff&amp;top=61168&amp;left=2176&amp;width=228&amp;height=181
  - Replaces &amp; with & so the URL is properly formed.
- Pads the value of i with leading zeros as needed to be three digits long for the name used for the downloaded image file.
- Builds out the curl command while adding it to the curlCMDs list, to be executed by a do shell script command.
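For comparison, the same three steps can be sketched directly in shell, the language the generated curl commands run in. This is only an illustration under assumptions, not the script above: the function name build_curl_cmd and its three arguments are hypothetical, it assumes each cell contains a single src attribute, and it assumes neither the URL nor the output folder contains a single quote.

```shell
# Hypothetical helper: turn one table cell's innerHTML into a curl command.
# $1 = innerHTML of the cell, $2 = zero-based image index, $3 = output folder.
domain_url='https://connectivity.brain-map.org'

build_curl_cmd() {
    cell_html=$1
    index=$2
    out_dir=$3

    # Extract the value of the src="..." attribute (assumes only one per cell).
    src=$(printf '%s\n' "$cell_html" | sed -n 's/.*src="\([^"]*\)".*/\1/p')

    # innerHTML escapes "&" as "&amp;"; undo that so the URL is well formed.
    src=$(printf '%s\n' "$src" | sed 's/&amp;/\&/g')

    # Zero-pad the index to three digits, e.g. 7 becomes Image_007.jpg.
    name=$(printf 'Image_%03d.jpg' "$index")

    # Print the command; single quotes keep "&" and "?" away from the shell.
    printf "curl '%s%s' -o '%s/%s'\n" "$domain_url" "$src" "$out_dir" "$name"
}
```

Calling build_curl_cmd with the HTML of one cell, an index, and a folder prints a single curl command line, which could then be collected into a list much as curlCMDs is in the AppleScript.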
After this part of the script has finished, the downloading begins and you can then do whatever you want with Safari; just let Script Editor continue running the script until it finishes.
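The throttled download loop can likewise be sketched in shell. This is a rough equivalent under assumptions rather than the method used above: the helper names random_delay and run_throttled are hypothetical, and the commands are assumed to sit one per line in a text file.

```shell
# Print a random whole number of seconds between $1 and $2, inclusive.
random_delay() {
    awk -v min="$1" -v max="$2" \
        'BEGIN { srand(); print int(min + rand() * (max - min + 1)) }'
}

# Run each command listed in the file $1 (one per line), pausing 3 to 8
# seconds after each one so the server is not hit with rapid requests.
run_throttled() {
    while IFS= read -r cmd; do
        sh -c "$cmd"
        sleep "$(random_delay 3 8)"
    done < "$1"
}
```

Because awk's srand() seeds from the clock, two calls within the same second can return the same value; with pauses of several seconds between calls that should not matter here.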
Note: The example AppleScript code is just that and, apart from the error handling already included, does not contain any additional error handling as may be appropriate. The onus is upon the user to add any error handling as may be appropriate, needed or wanted. Have a look at the try statement and error statement in the AppleScript Language Guide. See also Working with Errors. Additionally, the use of the delay command may be necessary between events where appropriate, e.g. delay 0.5, with the value of the delay set appropriately.
Solution 2:
The webpage doesn't contain the images directly, but rather some JavaScript that the browser runs. This code makes an API request for a new page, containing links to images. The code on the first webpage then reads the links to images from the second page and outputs markup to the browser containing these images.
Automator downloads the first page and looks at it for images, finding none. It doesn't execute JavaScript and make additional requests like the browser. You can't use Automator's ‘Save Images from Web Content’ with this webpage for that reason.
If you want to programmatically download content from this website, use its API. Use the web inspector to inspect the requests that the webpage makes, then make the same request yourself and parse the output (e.g. with jq, to parse the JSON returned by your linked website's API), downloading the URLs found with Automator or another scripting language.
Solution 3:
If you're not “married” to using Safari, the following alternate solution using Google Chrome may be of some interest to you.
The following AppleScript code, as is, will create a new folder named "Brainmap Images_files" on your desktop, containing all of the thumbnail images.
Depending on the speed of your internet connection and computer, this process should take less than 12 seconds to complete.
property savedImages : (path to desktop as text) & "Brainmap Images.html"
property savedImagesFolder : (path to desktop as text) & "Brainmap Images_files"
property theURL : "https://connectivity.brain-map.org/static/referencedata/experiment/thumbnails/100142290?image_type=NEUN&popup=true"

tell application "Google Chrome"
    activate
    try
        tell window 1 to set URL of (make new tab) to theURL
    on error errMsg number errNum
        set newWindow to make new window
        tell newWindow to set URL of active tab to theURL
    end try
    tell window 1
        repeat while active tab is loading
            delay 0.1
        end repeat
    end tell
    delay 1
    save window 1's active tab in file savedImages
end tell

tell application "System Events"
    repeat until folder savedImagesFolder exists
        delay 0.1
    end repeat
    delay 1
end tell

do shell script "find " & quoted form of POSIX path of savedImagesFolder & ¬
    " -type f | grep -v 'imageser' | xargs -I {} rm {}"

do shell script "find " & quoted form of POSIX path of savedImagesFolder & ¬
    " -type f | xargs -I {} mv {} '{}.jpg' ; rm " & ¬
    quoted form of POSIX path of savedImages
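The two clean-up commands at the end of the script can also be written without xargs. The sketch below is a possible variant, not the code above: the function name keep_and_rename_images is hypothetical, and it matches on the file name rather than the whole path, which should be equivalent as long as the folder path itself does not contain "imageser".

```shell
# Keep only files whose name contains "imageser", then add a .jpg extension.
# $1 = the "..._files" folder the browser created when saving the page.
keep_and_rename_images() {
    dir=$1

    # Delete every file that is not one of the image-service downloads.
    find "$dir" -type f ! -name '*imageser*' -exec rm -- {} +

    # Append .jpg to each remaining file; files already ending in .jpg are
    # skipped so that nothing is renamed twice.
    find "$dir" -type f ! -name '*.jpg' -exec sh -c '
        for f in "$@"; do mv -- "$f" "$f.jpg"; done
    ' sh {} +
}
```

Passing the paths through find's -exec keeps each file name as a single argument, so names containing spaces or quotes should be handled without extra quoting.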