Python code to automate desktop activities in Windows

Have a look at Sikuli.

Sikuli is a visual technology to automate and test graphical user interfaces (GUI) using images (screenshots).

Sikuli cleverly combines taking screenshots with embedding them directly in your Python script (strictly speaking, the scripts run on Jython).


You take screenshots of the UI elements you want to act on, and then use those images in your code.


You can try Automa.

It's a Windows GUI automation tool written in Python which is very simple to use. For example, you can do the following:

# to double click on an icon on the desktop
doubleclick("Recycle Bin")

# to maximize
click("Maximize")

# to input some text and press ENTER
write("Some text", into="Label of the text field")
press(ENTER)

The full list of available commands can be found here.

Disclaimer: I'm one of Automa's developers.


There are several ways to automate user interfaces on Windows that can be accessed from Python (using ctypes or one of the Python Windows bindings):

  1. Raw Windows APIs -- GetCursorPos/SetCursorPos for the mouse, HWND APIs like GetFocus and GetForegroundWindow

  2. AutoIt -- an automation scripting language (see "Calling AutoIt Functions in Python")

  3. Microsoft Active Accessibility (MSAA) / WinEvent -- an API for interrogating a UI through the accessibility interfaces that date back to Win95.

  4. UI Automation (UIA) -- a replacement for MSAA introduced in Vista (also available for XP SP3, IIRC).
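
As a rough sketch of the first option, the raw user32 calls can be reached with nothing but ctypes. GetCursorPos, SetCursorPos and GetForegroundWindow are the real Win32 functions; the Python wrapper names (get_cursor_pos, set_cursor_pos, foreground_window) are made up for illustration, and the code only does anything useful on Windows.

```python
import ctypes
import sys


class POINT(ctypes.Structure):
    """Mirror of the Win32 POINT struct (two LONG fields)."""
    _fields_ = [("x", ctypes.c_long), ("y", ctypes.c_long)]


def _user32():
    # ctypes.windll only exists on Windows, so look it up lazily.
    if sys.platform != "win32":
        raise OSError("user32 is only available on Windows")
    return ctypes.windll.user32


def get_cursor_pos():
    """Return the mouse position as an (x, y) tuple in screen coordinates."""
    pt = POINT()
    if not _user32().GetCursorPos(ctypes.byref(pt)):
        raise OSError("GetCursorPos failed")
    return pt.x, pt.y


def set_cursor_pos(x, y):
    """Move the mouse cursor to screen coordinates (x, y)."""
    if not _user32().SetCursorPos(int(x), int(y)):
        raise OSError("SetCursorPos failed")


def foreground_window():
    """Return the handle (HWND) of the foreground window, or None."""
    fn = _user32().GetForegroundWindow
    fn.restype = ctypes.c_void_p  # HWND is pointer-sized
    return fn()


if __name__ == "__main__" and sys.platform == "win32":
    print(get_cursor_pos(), foreground_window())
```

Wrapping each call in a small function keeps the ctypes details (structs, return types) in one place, which helps once the script grows beyond a few calls.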

Automating a user interface to test it is a non-trivial task. There are a lot of gotchas that can trip you up.

I would suggest testing your automation framework in an automated way so you can verify that it works on the platforms you are testing (to identify failures in the automation API vs failures in the application).

Another consideration is how to deal with localization. Note also that the names for Minimize/Maximize/... are localized as well, and can be in a different language to the application (system vs. user locale)!
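
One way to hedge against this is to match a control against every translation you know of instead of only the English name. The sketch below is illustrative only: LOCALIZED_NAMES, Control and find_by_name are hypothetical names, the German strings are examples that should be checked against a real localized system, and the controls are assumed to be objects with a Name attribute.

```python
from collections import namedtuple

# Illustrative table only: verify every string on a real localized system.
LOCALIZED_NAMES = {
    "minimize": {"Minimize", "Minimieren"},
    "maximize": {"Maximize", "Maximieren"},
}

# Stand-in for an accessible object; only the Name attribute is used here.
Control = namedtuple("Control", ["Name"])


def find_by_name(children, key):
    """Return the children whose Name matches any known translation of key."""
    names = LOCALIZED_NAMES[key]
    return [child for child in children if child.Name in names]


buttons = [Control("Minimieren"), Control("Maximieren"), Control("Close")]
print(find_by_name(buttons, "minimize"))
```

Where the API exposes it, matching on the role or a language-independent identifier is likely to be more robust than any name table.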

In pseudo-code, an MSAA program to minimize an application would look something like:

window = AccessibleObjectFromWindow(FindWindow("My Window"))
titlebar = [x for x in window.AccessibleChildren if x.accRole == TitleBar]
minimize = [x for x in titlebar[0].AccessibleChildren if x.Name == "Minimize"]
if len(minimize) != 0: # may already be minimized
    minimize[0].accDoDefaultAction()

MSAA accessible items are stored as (object: IAccessible, childId: int) pairs. Care is needed here to get the calls correct (e.g. get_accChildCount only uses the IAccessible, so when childId is not 0 you must return 0 instead of calling get_accChildCount)!
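
The childId rule can be sketched as a small wrapper. This is a minimal illustration, not a binding: AccessibleItem and child_count are hypothetical names, and obj is assumed to be any object exposing an accChildCount property, as a COM binding for IAccessible typically would.

```python
from collections import namedtuple

CHILDID_SELF = 0  # Win32 constant: the pair refers to the IAccessible itself

# An MSAA item as described above: (object: IAccessible, childId: int).
AccessibleItem = namedtuple("AccessibleItem", ["obj", "child_id"])


def child_count(item):
    """Return the number of children of an (IAccessible, childId) pair."""
    if item.child_id != CHILDID_SELF:
        # The pair names a simple element inside obj. Asking obj for its
        # child count here would report the parent's children, not the
        # element's, so report 0 instead.
        return 0
    return item.obj.accChildCount


class FakeAccessible:
    """Test double standing in for a real IAccessible with three children."""
    accChildCount = 3


print(child_count(AccessibleItem(FakeAccessible(), CHILDID_SELF)))
print(child_count(AccessibleItem(FakeAccessible(), 2)))
```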

IAccessible calls can return different error codes to indicate "this object does not support this property" -- e.g. DISP_E_MEMBERNOTFOUND or E_NOTIMPL.
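
A sketch of treating both codes as "not supported" follows. The two HRESULT values are the standard Win32 ones; ComError and get_property are hypothetical stand-ins, on the assumption that your COM binding raises an exception carrying an hresult attribute (substitute its real exception type).

```python
DISP_E_MEMBERNOTFOUND = 0x80020003
E_NOTIMPL = 0x80004001
NOT_SUPPORTED = {DISP_E_MEMBERNOTFOUND, E_NOTIMPL}


class ComError(Exception):
    """Hypothetical stand-in for a binding's COM exception type."""

    def __init__(self, hresult):
        super().__init__(hex(hresult & 0xFFFFFFFF))
        self.hresult = hresult


def get_property(getter, default=None):
    """Call getter(); return default if the object lacks the property."""
    try:
        return getter()
    except ComError as exc:
        # Some bindings report HRESULTs as signed 32-bit integers, so
        # normalise to unsigned before comparing.
        if (exc.hresult & 0xFFFFFFFF) in NOT_SUPPORTED:
            return default
        raise  # any other failure is a real error
```

A call such as get_property(lambda: item.accName, default="") then keeps "property not supported" apart from genuine failures, which still propagate.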

Be aware of the state of the window. If the window is maximized then minimized, restore will restore the window to its maximized state, so you need to restore it again to get it back to the normal/windowed state.
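
One way to handle this is to keep restoring until the window reports that it is neither minimized nor maximized. The sketch below uses the real user32 functions IsIconic, IsZoomed and ShowWindow; restore_to_normal and its user32/max_steps parameters are hypothetical, with user32 injectable so the logic can be exercised without a real window.

```python
SW_RESTORE = 9  # ShowWindow command: restore a minimized or maximized window


def restore_to_normal(hwnd, user32=None, max_steps=3):
    """Restore hwnd until it is in the normal (windowed) state.

    Returns the number of SW_RESTORE commands that were issued.
    """
    if user32 is None:
        import ctypes
        user32 = ctypes.windll.user32  # Windows only
    steps = 0
    # A minimized window that was maximized before needs two restores:
    # minimized -> maximized -> normal. max_steps guards against looping
    # forever if the window refuses to change state.
    while steps < max_steps and (user32.IsIconic(hwnd) or user32.IsZoomed(hwnd)):
        user32.ShowWindow(hwnd, SW_RESTORE)
        steps += 1
    return steps
```

Checking the state after each call, instead of blindly restoring twice, also covers windows that are merely minimized or merely maximized.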

The MSAA and UIA APIs don't support right mouse button clicks, so you need to use a Win32 API to trigger them.
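
A minimal sketch of a Win32 right click, using SetCursorPos and mouse_event from user32 via ctypes. The right_click helper and its injectable user32 parameter are hypothetical. Note that Microsoft documents mouse_event as superseded by SendInput; it is used here only because it is shorter to show.

```python
MOUSEEVENTF_RIGHTDOWN = 0x0008
MOUSEEVENTF_RIGHTUP = 0x0010


def right_click(x, y, user32=None):
    """Move the cursor to screen coordinates (x, y) and right-click there."""
    if user32 is None:
        import ctypes
        user32 = ctypes.windll.user32  # Windows only
    user32.SetCursorPos(int(x), int(y))
    # A click is a button-down event followed by a button-up event at the
    # current cursor position.
    user32.mouse_event(MOUSEEVENTF_RIGHTDOWN, 0, 0, 0, 0)
    user32.mouse_event(MOUSEEVENTF_RIGHTUP, 0, 0, 0, 0)
```

The coordinates would typically come from the element's bounding rectangle as reported by MSAA or UIA.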

The MSAA model does not support treeview hierarchy information -- it exposes the tree as a flat list. On the other hand, UIA will only enumerate elements that are visible, so you will not be able to access elements in the UIA tree that are collapsed.