Interact with other programs using Python
I'm having the idea of writing a program using Python which shall find a lyric of a song whose name I provided. I think the whole process should boil down to couple of things below. These are what I want the program to do when I run it:
- prompt me to enter a name of a song
- copy that name
- open a web browser (google chrome for example)
- paste that name in the address bar and find information about the song
- open a page that contains the lyrics
- copy that lyrics
- run a text editor (like Microsoft Word for instance)
- paste the lyrics
- save the new text file with the name of the song
I am not asking for code, of course. I just want to know the concepts or ideas about how to use python to interact with other programs
To be more specific, I think I want to know, fox example, just how we point out where is the address bar in Google Chrome and tell python to paste the name there. Or how we tell python how to copy the lyrics as well as paste it into the Microsof Word's sheet then save it.
I've been reading (I'm still reading) several books on Python: Byte of python, Learn python the hard way, Python for dummies, Beginning Game Development with Python and Pygame. However, I found out that it seems like I only (or almost only) learn to creat programs that work on itself (I can't tell my program to do things I want with other programs that are already installed on my computer)
I know that my question somehow sounds rather silly, but I really want to know how it works, the way we tell Python to regconize that this part of the Google chrome browser is the address bar and that it should paste the name of the song in it. The whole idea of making python interact with another program is really really vague to me and I just extremely want to grasp that.
Thank you everyone, whoever spend their time reading my so-long question.
ttriet204
Solution 1:
If what you're really looking into is a good excuse to teach yourself how to interact with other apps, this may not be the best one. Web browsers are messy, the timing is going to be unpredictable, etc. So, you've taken on a very hard task—and one that would be very easy if you did it the usual way (talk to the server directly, create the text file directly, etc., all without touching any other programs).
But if you do want to interact with other apps, there are a variety of different approaches, and which is appropriate depends on the kinds of apps you need to deal with.
Some apps are designed to be automated from the outside. On Windows, this nearly always means they a COM interface, usually with an IDispatch interface, for which you can use
pywin32
's COM wrappers; on Mac, it means an AppleEvent interface, for which you useScriptingBridge
orappscript
; on other platforms there is no universal standard. IE (but probably not Chrome) and Word both have such interfaces.Some apps have a non-GUI interface—whether that's a command line you can drive with
popen
, or a DLL/SO/DYLIB you can load up throughctypes
. Or, ideally, someone else has already written Python bindings for you.Some apps have nothing but the GUI, and there's no way around doing GUI automation. You can do this at a low level, by crafting WM_ messages to send via
pywin32
on Windows, using the accessibility APIs on Mac, etc., or at a somewhat higher level with libraries likepywinauto
, or possibly at the very high level ofselenium
or similar tools built to automate specific apps.
So, you could do this with anything from selenium for Chrome and COM automation for Word, to crafting all the WM_ messages yourself. If this is meant to be a learning exercise, the question is which of those things you want to learn today.
Let's start with COM automation. Using pywin32
, you directly access the application's own scripting interfaces, without having to take control of the GUI from the user, figure out how to navigate menus and dialog boxes, etc. This is the modern version of writing "Word macros"—the macros can be external scripts instead of inside Word, and they don't have to be written in VB, but they look pretty similar. The last part of your script would look something like this:
word = win32com.client.dispatch('Word.Application')
word.Visible = True
doc = word.Documents.Add()
doc.Selection.TypeText(my_string)
doc.SaveAs(r'C:\TestFiles\TestDoc.doc')
If you look at Microsoft Word Scripts, you can see a bunch of examples. However, you may notice they're written in VBScript. And if you look around for tutorials, they're all written for VBScript (or older VB). And the documentation for most apps is written for VBScript (or VB, .NET, or even low-level COM). And all of the tutorials I know of for using COM automation from Python, like Quick Start to Client Side COM and Python, are written for people who already know about COM automation, and just want to know how to do it from Python. The fact that Microsoft keeps changing the name of everything makes it even harder to search for—how would you guess that googling for OLE automation, ActiveX scripting, Windows Scripting House, etc. would have anything to do with learning about COM automation? So, I'm not sure what to recommend for getting started. I can promise that it's all as simple as it looks from that example above, once you do learn all the nonsense, but I don't know how to get past that initial hurdle.
Anyway, not every application is automatable. And sometimes, even if it is, describing the GUI actions (what a user would click on the screen) is simpler than thinking in terms of the app's object model. "Select the third paragraph" is hard to describe in GUI terms, but "select the whole document" is easy—just hit control-A, or go to the Edit menu and Select All. GUI automation is much harder than COM automation, because you either have to send the app the same messages that Windows itself sends to represent your user actions (e.g., see "Menu Notifications") or, worse, craft mouse messages like "go (32, 4) pixels from the top-left corner, click, mouse down 16 pixels, click again" to say "open the File menu, then click New".
Fortunately, there are tools like pywinauto
that wrap up both kinds of GUI automation stuff up to make it a lot simpler. And there are tools like swapy
that can help you figure out what commands you want to send. If you're not wedded to Python, there are also tools like AutoIt
and Actions
that are even easier than using swapy
and pywinauto
, at least when you're getting started. Going this way, the last part of your script might look like:
word.Activate()
word.MenuSelect('File->New')
word.KeyStrokes(my_string)
word.MenuSelect('File->Save As')
word.Dialogs[-1].FindTextField('Filename').Select()
word.KeyStrokes(r'C:\TestFiles\TestDoc.doc')
word.Dialogs[-1].FindButton('OK').Click()
Finally, even with all of these tools, web browsers are very hard to automate, because each web page has its own menus, buttons, etc. that aren't Windows controls, but HTML. Unless you want to go all the way down to the level of "move the mouse 12 pixels", it's very hard to deal with these. That's where selenium
comes in—it scripts web GUIs the same way that pywinauto
scripts Windows GUIs.
Solution 2:
The following script uses Automa to do exactly what you want (tested on Word 2010):
def find_lyrics():
print 'Please minimize all other open windows, then enter the song:'
song = raw_input()
start("Google Chrome")
# Disable Google's autocompletion and set the language to English:
google_address = 'google.com/webhp?complete=0&hl=en'
write(google_address, into="Address")
press(ENTER)
write(song + ' lyrics filetype:txt')
click("I'm Feeling Lucky")
press(CTRL + 'a', CTRL + 'c')
press(ALT + F4)
start("Microsoft Word")
press(CTRL + 'v')
press(CTRL + 's')
click("Desktop")
write(song + ' lyrics', into="File name")
click("Save")
press(ALT + F4)
print("\nThe lyrics have been saved in file '%s lyrics' "
"on your desktop." % song)
To try it out for yourself, download Automa.zip from its Download page and unzip into, say, c:\Program Files
. You'll get a folder called Automa 1.1.2
. Run Automa.exe
in that folder. Copy the code above and paste it into Automa by right-clicking into the console window. Press Enter twice to get rid of the last ...
in the window and arrive back at the prompt >>>
. Close all other open windows and type
>>> find_lyrics()
This performs the required steps.
Automa is a Python library: To use it as such, you have to add the line
from automa.api import *
to the top of your scripts and the file library.zip
from Automa's installation directory to your environment variable PYTHONPATH
.
If you have any other questions, just let me know :-)
Solution 3:
Here's an implementation in Python of @Matteo Italia's comment:
You are approaching the problem from a "user perspective" when you should approach it from a "programmer perspective"; you don't need to open a browser, copy the text, open Word or whatever, you need to perform the appropriate HTTP requests, parse the relevant HTML, extract the text and write it to a file from inside your Python script. All the tools to do this are available in Python (in particular you'll need urllib2 and BeautifulSoup).
#!/usr/bin/env python
import codecs
import json
import sys
import urllib
import urllib2
import bs4 # pip install beautifulsoup4
def extract_lyrics(page):
"""Extract lyrics text from given lyrics.wikia.com html page."""
soup = bs4.BeautifulSoup(page)
result = []
for tag in soup.find('div', 'lyricbox'):
if isinstance(tag, bs4.NavigableString):
if not isinstance(tag, bs4.element.Comment):
result.append(tag)
elif tag.name == 'br':
result.append('\n')
return "".join(result)
# get artist, song to search
artist = raw_input("Enter artist:")
song = raw_input("Enter song:")
# make request
query = urllib.urlencode(dict(artist=artist, song=song, fmt="realjson"))
response = urllib2.urlopen("http://lyrics.wikia.com/api.php?" + query)
data = json.load(response)
if data['lyrics'] != 'Not found':
# print short lyrics
print(data['lyrics'])
# get full lyrics
lyrics = extract_lyrics(urllib2.urlopen(data['url']))
# save to file
filename = "[%s] [%s] lyrics.txt" % (data['artist'], data['song'])
with codecs.open(filename, 'w', encoding='utf-8') as output_file:
output_file.write(lyrics)
print("written '%s'" % filename)
else:
sys.exit('not found')
Example
$ printf "Queen\nWe are the Champions" | python get-lyrics.py
Output
I've paid my dues Time after time I've done my sentence But committed no crime And bad mistakes I've made a few I've had my share of sand kicked [...] written '[Queen] [We are the Champions] lyrics.txt'
Solution 4:
If you really want to open a browser, etc, look at selenium. But that's overkill for your purposes. Selenium is used to simulate button clicks, etc for testing the appearance of websites on various browsers, etc. Mechanize is less of an overkill for this
What you really want to do is understand how a browser (or any other program) works under the hood i.e. when you click on the mouse or type on the keyboard or hit Save
, what does the program do behind the scenes? It is this behind-the-scenes work that you want your python code to do.
So, use urllib
, urllib2
or requests
(or heck, even scrapy
) to request a web page (learn how to put together the url to a google search or the php GET
request of a lyrics website). Google also has a search API that you can take advantage of, to perform a google search.
Once you have your results from your page request, parse it with xml
, beautifulsoup
, lxlml
, etc and find the section of the request result that has the information you're after.
Now that you have your lyrics, the simplest thing to do is open a text file and dump the lyrics in there and write to disk. But if you really want to do it with MS Word, then open a doc
file in notepad or notepad++ and look at its structure. Now, use python to build a document with similar structure, wherein the content will be the downloaded lyrics.
If this method fails, you could look into pywinauto or such to automate the pasting of text into an MS Word doc and clicking on Save
Citation: Matteo Italia, g.d.d.c from the comments on the OP
Solution 5:
You should look into a package called selenium
for interacting with web browsers