Terminate multiple threads when any thread completes a task

I am new to both Python and threads. I have written Python code which acts as a web crawler and searches sites for a specific keyword. My question is: how can I use threads to run three different instances of my class at the same time? When one of the instances finds the keyword, all three must close and stop crawling the web. Here is some code.

class Crawler:
    def __init__(self):
        # the actual code for finding the keyword
        pass

def main():
    crawler = Crawler()

if __name__ == "__main__":
    main()

How can I use threads to have Crawler do three different crawls at the same time?


There doesn't seem to be a (simple) way to terminate a thread in Python.

Here is a simple example of running multiple HTTP requests in parallel:

import threading
import urllib.request

def crawl():
    data = urllib.request.urlopen("http://www.google.com/").read()
    print("Read google.com")

threads = []

for n in range(3):  # start three crawler threads, as in the question
    thread = threading.Thread(target=crawl)
    thread.start()
    threads.append(thread)

# wait until all of the threads are finished
print("Waiting...")

for thread in threads:
    thread.join()

print("Complete.")

With some additional overhead, you can use a multiprocess approach that's more powerful and allows you to terminate the thread-like worker processes.

I've extended the example to use multiprocessing. I hope this will be helpful to you:

import multiprocessing
import urllib.request

def crawl(result_queue):
    data = urllib.request.urlopen("http://news.ycombinator.com/").read().decode("utf-8", errors="replace")
    print("Requested...")

    if "ycombinator" in data:  # stand-in for your real keyword check
        result_queue.put("result!")

    print("Read site.")

if __name__ == "__main__":  # required so child processes can safely import this module
    processes = []
    result_queue = multiprocessing.Queue()

    for n in range(4):  # start 4 processes crawling for the result
        process = multiprocessing.Process(target=crawl, args=(result_queue,))
        process.start()
        processes.append(process)

    print("Waiting for result...")

    result = result_queue.get()  # blocks until any of the processes has `.put()` a result

    for process in processes:  # then kill them all off
        process.terminate()

    print("Got result:", result)

Starting a thread is easy:

thread = threading.Thread(target=function_to_call_inside_thread)
thread.start()

Create an Event object so a crawler can notify the main thread when it's done:

event = threading.Event()

event.wait()  # call this in the main thread to wait for the event
event.set()   # call this in a crawler thread when it finds the keyword

You'll need to add a stop() method to your crawlers; once the event has fired, call it on each of them:

for crawler in crawlers:
    crawler.stop()

Then call join() on the threads so the main thread waits for them to finish:

for thread in threads:
    thread.join()  # waits for the thread to finish
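
Putting those pieces together, here's a minimal sketch of how the whole pattern might look. The Crawler internals are placeholders (a random "found it" check standing in for your real search code):

import random
import threading
import time

class Crawler:
    def __init__(self, found_event):
        self.found_event = found_event
        self._stopped = False

    def stop(self):
        self._stopped = True  # a plain flag is enough to ask the loop to exit

    def run(self):
        while not self._stopped:
            if self._found_keyword():
                self.found_event.set()  # wake up the main thread
                return

    def _found_keyword(self):
        time.sleep(0.1)               # simulate fetching a page
        return random.random() < 0.1  # stand-in: ~10% chance per "page"

found_event = threading.Event()
crawlers = [Crawler(found_event) for n in range(3)]
threads = [threading.Thread(target=crawler.run) for crawler in crawlers]

for thread in threads:
    thread.start()

found_event.wait()       # block until some crawler finds the keyword

for crawler in crawlers:
    crawler.stop()       # ask every crawler to exit its loop

for thread in threads:
    thread.join()

print("Done.")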

If you do any amount of this kind of programming, you'll want to look at the eventlet module. It allows you to write "threaded" code without many of the disadvantages of threading.
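
For instance, eventlet's green threads can be killed outright, which OS threads cannot. A rough sketch, assuming eventlet is installed:

import eventlet
eventlet.monkey_patch()  # patch blocking I/O so green threads yield to each other

import urllib.request

def crawl(url):
    return urllib.request.urlopen(url).read()

gt = eventlet.spawn(crawl, "http://www.python.org/")  # start a green thread
gt.kill()  # unlike an OS thread, a green thread can simply be killed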