Headless, scriptable Firefox/WebKit on Linux? [closed]

I'm looking to automate some web interactions, namely periodic download of files from a secure website. This basically involves entering my username/password and navigating to the appropriate URL.

I tried simple scripting in Python, followed by more sophisticated scripting, only to discover that this particular website uses some obnoxious JavaScript- and Flash-based mechanism for login, rendering my methods useless.

I then tried HtmlUnit, but that doesn't seem to want to work either. I suspect the use of Flash is the issue.

I don't really want to think about it any more, so I'm leaning towards scripting an actual browser to log in and grab the file I need.

Requirements are:

  • Run on a Linux server (i.e. no X running). If I really need to have X I can make that happen, but I won't be happy.
  • Be reliable. I want to start this thing and never think about it again.
  • Be scriptable. Nothing too sophisticated, but I should be able to tell the browser the various steps to take and pages to visit.

Are there any good toolkits for a headless, X-less scriptable browser? Have you tried something like this and if so do you have any words of wisdom?


What about PhantomJS?


I did a related task with the embedded IE browser (although it was a GUI application with a hidden browser component panel). In fact, you can take any layout engine and cut out the output logic. Navigation should be done by firing script-like events.

You can use Crowbar. It is a headless version of Firefox (Gecko engine). It turns the browser into a RESTful server that accepts requests ("fetch url"): it parses the HTML, represents it as a DOM, and waits a defined delay for all scripts to run.

It works on Linux. I suppose you can easily extend it for your goal using JS and XULRunner's rich capabilities.
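To illustrate the "fetch url" idea, here is a minimal Python sketch of a client for such a server. It assumes Crowbar is already running locally and takes the target page and the script delay as query parameters. The port (10000) and the parameter names ("url", "delay") are from my memory of its documentation, so treat them as assumptions and check them against your install. The function names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Crowbar listens on this local address; adjust to your setup.
CROWBAR_ENDPOINT = "http://127.0.0.1:10000/"


def build_crowbar_request(target_url, delay_ms=3000, endpoint=CROWBAR_ENDPOINT):
    """Build the GET URL that asks the server to load target_url.

    delay_ms is how long the server should wait for page scripts to run
    before serializing the DOM (assumed parameter name: "delay").
    """
    query = urlencode({"url": target_url, "delay": delay_ms})
    return endpoint + "?" + query


def fetch_rendered(target_url, delay_ms=3000, timeout=60):
    """Return the serialized DOM of target_url as text.

    Needs a running Crowbar instance; raises URLError if none is reachable.
    """
    request_url = build_crowbar_request(target_url, delay_ms)
    with urlopen(request_url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

With the server up, something like `fetch_rendered("http://example.com/", delay_ms=5000)` should return the page after its scripts have had five seconds to run. This only covers fetching a page; a login flow like the one in the question would still need the extension work mentioned above.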