AppleScript: curl with authentication fails
QUESTION: I need a working method, usable from AppleScript, to get the correct source of the page WITHOUT loading the page.
sample link: https://www.idealista.it/immobile/16679597/
Result: wrong HTML; the returned page talks about authentication.
INITIAL CODE (always present for all the tries below):
set MyUser to "[email protected]"
set MyPass to "password"
set UrlOfPage to "https://www.idealista.it/immobile/16679597/"
ATTEMPTS (all of the attempts below are taken from this page: https://ec.haxx.se/http-auth.html):
-
Works, but needs the page to be loaded in Safari:
tell front document of application "Safari" to set StrHtml to (get source) as string
-
returns wrong html
set StrHtml to (do shell script "curl --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
set StrHtml to (do shell script "curl --anyauth --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
set StrHtml to (do shell script "curl --digest --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
set StrHtml to (do shell script "curl --negotiate --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
set StrHtml to (do shell script "curl --ntlm --user " & MyUser & ":" & MyPass & " " & UrlOfPage)
-
doesn't work: unknown token
set StrHtml to (do shell script "curl --proxy-anyauth --proxy-user " & MyUser & ":" & MyPass & " https://www.idealista.it/immobile/16679597/ \ --proxy https://proxy.idealista.it/immobile/16679597:80/")
Could somebody help me, please?
Solution 1:
Dedicated Tools
Given the problems encountered with curl and AppleScript, consider using an alternative dedicated tool such as Beautiful Soup. See "How To Scrape Web Pages with Beautiful Soup and Python 3" for a comprehensive introduction.
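As a rough illustration of what parsing a page in Python looks like, here is a minimal sketch. It uses only the standard library's html.parser as a stand-in for Beautiful Soup, so it runs without installing anything. The class name TitleAndLinks, the function parse_listing, and the SAMPLE_HTML markup are all hypothetical, not taken from the site in the question. It also assumes the HTML has already been downloaded by a request that passed the site's login.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data


def parse_listing(html):
    """Return (title, links) for an HTML string."""
    parser = TitleAndLinks()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


# Hypothetical markup standing in for a downloaded listing page.
SAMPLE_HTML = """
<html><head><title>Sample listing</title></head>
<body><a href="/immobile/1/">First</a><a href="/immobile/2/">Second</a></body></html>
"""

title, links = parse_listing(SAMPLE_HTML)
print(title)
print(links)
```

If a script like this were saved to a file, AppleScript could call it through do shell script and receive the printed text as the result, the same way the curl attempts in the question do.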
Alternatively, there are numerous other tools that can help; see Web scraping software on Wikipedia. Many of these tools are free, open-source, and can be called from the command line.
I have previously used Web::Scraper for extracting property listings.