Memory usage steadily growing for multiprocessing.Pool.imap_unordered

It seems like Pool.imap_unordered() launches its own thread to iterate over the input sequence generated by step 1, so we need to throttle that thread from the main thread, which is running step 3. The Semaphore class is designed for exactly this kind of throttling of one thread by another: we call acquire() before producing each line and release() after consuming each line. If we start the semaphore at some arbitrary value like 100, the producer builds up a buffer of 100 lines before blocking and waiting for the consumer to catch up.

import logging
import os
import multiprocessing
from threading import Semaphore
from time import sleep

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s:%(process)d:%(thread)d:%(message)s')
logger = logging.getLogger()

def process_step1(semaphore):
    """ Step 1: produce lines, blocking when the consumer falls too far behind. """
    data = 'a' * 100000
    for i in range(10000):
        semaphore.acquire()
        sleep(.001)  # Faster than step 3.
        yield data
        if i % 1000 == 0:
            logger.info('Producing %d.', i)
    logger.info('Finished producing.')


def process_step2(data):
    """ Step 2: process one line in a worker process. """
    return data.upper()


def process_step3(up_data, semaphore):
    """ Step 3: consume one result, then let the producer generate another line. """
    assert up_data == 'A' * 100000
    sleep(.005)  # Slower than step 1.
    semaphore.release()


def main():
    pool = multiprocessing.Pool(processes=10)
    semaphore = Semaphore(100)  # Lets the producer run at most 100 lines ahead.
    logger.info('Starting.')
    loader = process_step1(semaphore)
    processed = pool.imap_unordered(process_step2, loader)
    for i, up_data in enumerate(processed):
        process_step3(up_data, semaphore)
        if i % 500 == 0:
            logger.info('Consuming %d, using %0.1f MB.', i, get_memory())
    pool.close()
    pool.join()
    logger.info('Done.')


def get_memory():
    """ Look up the memory usage, return in MB. """
    proc_file = '/proc/{}/status'.format(os.getpid())
    scales = {'KB': 1024.0, 'MB': 1024.0 * 1024.0}
    with open(proc_file) as f:
        for line in f:
            if 'VmSize:' in line:
                fields = line.split()
                size = int(fields[1])
                scale = fields[2].upper()
                return size*scales[scale]/scales['MB']
    return 0.0  # Unknown

if __name__ == '__main__':
    main()
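
As an aside, get_memory() only works where /proc is available. If you just want to watch whether usage keeps growing, the standard library's resource module offers an alternative that works on other Unix systems as well, although it reports the peak resident set size rather than the current virtual size, and the units of ru_maxrss differ between platforms. A sketch (get_peak_memory is a name I'm introducing here, not part of the example above):

import resource
import sys


def get_peak_memory():
    """ Look up the peak memory usage (max RSS), return in MB. """
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == 'darwin':
        return usage / (1024.0 * 1024.0)  # macOS reports bytes.
    return usage / 1024.0  # Linux reports kilobytes.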

Now the memory usage is steady, because the producer doesn't get very far ahead of the consumer.

2016-12-01 15:52:13,833:6695:140124578850560:Starting.
2016-12-01 15:52:13,835:6695:140124535109376:Producing 0.
2016-12-01 15:52:13,841:6695:140124578850560:Consuming 0, using 255.0 MB.
2016-12-01 15:52:16,424:6695:140124578850560:Consuming 500, using 255.0 MB.
2016-12-01 15:52:18,498:6695:140124535109376:Producing 1000.
2016-12-01 15:52:19,015:6695:140124578850560:Consuming 1000, using 255.0 MB.
2016-12-01 15:52:21,602:6695:140124578850560:Consuming 1500, using 255.0 MB.
2016-12-01 15:52:23,675:6695:140124535109376:Producing 2000.
2016-12-01 15:52:24,192:6695:140124578850560:Consuming 2000, using 255.0 MB.
2016-12-01 15:52:26,776:6695:140124578850560:Consuming 2500, using 255.0 MB.
2016-12-01 15:52:28,846:6695:140124535109376:Producing 3000.
2016-12-01 15:52:29,362:6695:140124578850560:Consuming 3000, using 255.0 MB.
2016-12-01 15:52:31,951:6695:140124578850560:Consuming 3500, using 255.0 MB.
2016-12-01 15:52:34,022:6695:140124535109376:Producing 4000.
2016-12-01 15:52:34,538:6695:140124578850560:Consuming 4000, using 255.0 MB.
2016-12-01 15:52:37,128:6695:140124578850560:Consuming 4500, using 255.0 MB.
2016-12-01 15:52:39,193:6695:140124535109376:Producing 5000.
2016-12-01 15:52:39,704:6695:140124578850560:Consuming 5000, using 255.0 MB.
2016-12-01 15:52:42,291:6695:140124578850560:Consuming 5500, using 255.0 MB.
2016-12-01 15:52:44,361:6695:140124535109376:Producing 6000.
2016-12-01 15:52:44,878:6695:140124578850560:Consuming 6000, using 255.0 MB.
2016-12-01 15:52:47,465:6695:140124578850560:Consuming 6500, using 255.0 MB.

Update

If you're using multiprocessing.Pool, consider switching to concurrent.futures.ProcessPoolExecutor, because it handles killed workers better. That doesn't affect the problem described in this question, though: you still need some way to keep the producer from running far ahead of the consumer.
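
For what it's worth, ProcessPoolExecutor has no imap_unordered(), and Executor.map() submits all of its input up front, so the semaphore trick above doesn't carry over directly. If you do switch, one way to keep the same backpressure is to submit tasks yourself and cap the number of in-flight futures. The following is only a sketch of that approach; produce_lines(), consume(), and max_in_flight are stand-ins I'm introducing for steps 1 and 3 and the buffer size, not part of the original example.

import concurrent.futures


def process_step2(data):
    # Same step 2 as above.
    return data.upper()


def produce_lines():
    # Stand-in for step 1; no semaphore needed, because the submit loop
    # below only pulls another line when there is room for a new task.
    data = 'a' * 100000
    for i in range(10000):
        yield data


def consume(up_data):
    # Stand-in for step 3.
    assert up_data == 'A' * 100000


def main():
    max_in_flight = 100  # Plays the same role as Semaphore(100) above.
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        in_flight = set()
        for data in produce_lines():
            if len(in_flight) >= max_in_flight:
                # Block until at least one task finishes, then consume it.
                done, in_flight = concurrent.futures.wait(
                    in_flight,
                    return_when=concurrent.futures.FIRST_COMPLETED)
                for future in done:
                    consume(future.result())
            in_flight.add(executor.submit(process_step2, data))
        # Drain whatever is still outstanding.
        for future in concurrent.futures.as_completed(in_flight):
            consume(future.result())


if __name__ == '__main__':
    main()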