Tips for keeping Perl memory usage low
What are some good tips for keeping memory usage low in a Perl script? I am interested in learning how to keep my memory footprint as low as possible for systems depending on Perl programs. I know Perl isn't great when it comes to memory usage, but I'd like to know if there are any tips for improving it.
So, what can you do to keep a Perl script using less memory? I'm interested in any suggestions, whether they are actual tips for writing code, or tips for how to compile Perl differently.
Edit for Bounty: I have a Perl program that serves as a server for a network application. Each client that connects to it currently gets its own child process. I've used threads instead of forks as well, but I haven't been able to determine if using threads instead of forks is actually more memory efficient.
I'd like to try using threads instead of forks again. I believe in theory it should save on memory usage. I have a few questions in that regard:
- Do threads created in Perl prevent copying Perl module libraries into memory for each thread?
- Is threads (use threads) the most efficient way (or the only way) to create threads in Perl?
- In threads, I can specify a stack_size parameter. What specifically should I consider when specifying this value, and how does it impact memory usage?
With threads in Perl/Linux, what is the most reliable method to determine the actual memory usage on a per-thread basis?
What sort of problem are you running into, and what does "large" mean to you? I have friends who need to load 200 Gb files into memory, so their idea of good tips is a lot different than the budget shopper for minimal VM slices suffering with 250 Mb of RAM (really? My phone has more than that).
In general, Perl holds on to any memory you use, even if it's not using it. Realize that optimizing in one direction, e.g. memory, might negatively impact another, such as speed.
This is not a comprehensive list (and there's more in Programming Perl):
☹ Use Perl memory profiling tools to help you find problem areas. See Profiling heap memory usage on perl programs and How to find the amount of physical memory occupied by a hash in Perl?
☹ Use lexical variables with the smallest scope possible to allow Perl to re-use that memory when you don't need it.
☹ Avoid creating big temporary structures. For instance, reading a file with a foreach reads all the input at once. If you only need it line-by-line, use while.
foreach ( <FILE> ) { ... } # list context, all at once
while( <FILE> ) { ... } # scalar context, line by line
☹ You might not even need to have the file in memory. Memory-map files instead of slurping them.
☹ If you need to create big data structures, consider something like DBM::Deep or other storage engines to keep most of it out of RAM and on disk until you need it. Outside of Perl, there are various key-value stores, such as Redis, that may help.
☹ Don't let people use your program. Whenever I've done that, I've reduced the memory footprint by about 100%. It also cuts down on support requests.
☹ (Update: Perl can now handle this for you in most cases because it uses a Copy On Write (COW) mechanism) Pass large chunks of text and large aggregates by reference so you don't make a copy, thus storing the same information twice. If you have to copy it because you want to change something, you might be stuck. This goes both ways as subroutine arguments and subroutine return values:
call_some_sub( \$big_text, \@long_array );
sub call_some_sub {
my( $text_ref, $array_ref ) = @_;
...
return \%hash;
}
☹ Track down memory leaks in modules. I had big problems with an application until I realized that a module wasn't releasing memory. I found a patch in the module's RT queue, applied it, and solved the problem.
☹ If you need to handle a big chunk of data once but don't want the persistent memory footprint, offload the work to a child process. The child process only has the memory footprint while it's working. When you get the answer, the child process shuts down and releases its memory. Similarly, work distribution systems, such as Minion, can spread work out among machines.
☹ Turn recursive solutions into iterative ones. Perl doesn't have tail recursion optimization, so every new call adds to the call stack. You can optimize the tail problem yourself with tricks with goto or a module, but that's a lot of work to hang onto a technique that you probably don't need.
☹ Use external programs, forks, job queues, or other separate actors so you don't have to carry around short-term memory burdens. If you have a processing task that will use a big chunk of memory, let a different program (perhaps a fork of the current program) handle that and give you back the answer. When that other program is done, all of its memory returns to the operating system. This program doesn't even need to be on the same box.
☹ Did he use 6 Gb or only five? Well, to tell you the truth, in all this excitement I kind of lost track myself. But being as this is Perl, the most powerful language in the world, and would blow your memory clean off, you've got to ask yourself one question: Do I feel lucky? Well, do ya, punk?
There are many more, but it's too early in the morning to figure out what those are. I cover some in Mastering Perl and Effective Perl Programming.
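The child-process tip above could look roughly like the following. This is a minimal sketch, not a definitive implementation: run_in_child and the summing job are hypothetical names made up for illustration, the result is assumed to fit on a single line, and error handling is kept to a bare minimum.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $work in a forked child and return the single
# line it sends back. Whatever the child allocates goes back to the
# operating system when the child exits.
sub run_in_child {
    my ($work) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                      # child: do the work, report, leave
        close $reader;
        print {$writer} $work->(), "\n";
        close $writer;                    # flushes the answer into the pipe
        POSIX::_exit(0);                  # skip END blocks inherited from the parent
    }

    close $writer;                        # parent: wait for the answer
    my $answer = <$reader>;
    close $reader;
    waitpid($pid, 0);
    chomp $answer if defined $answer;
    return $answer;
}

# Illustrative memory-hungry job: build a large list, return only its sum.
my $sum = run_in_child(sub {
    my @big   = (1 .. 200_000);
    my $total = 0;
    $total += $_ for @big;
    return $total;
});
print "sum from child: $sum\n";
```

Only the short answer crosses the pipe, so the parent never holds the large list. For results bigger than one line you would need some serialization on top of this.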
My two dimes.
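- Do threads created in Perl prevent copying Perl module libraries into memory for each thread?
- The threads all run in one process, so the modules are not loaded again for each thread. What is not shared is the program stack: each thread must have its own. Keep in mind that Perl's ithreads also clone the interpreter's variables into every new thread unless they are explicitly shared.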
- Is threads (use threads) the most efficient way (or the only way) to create threads in Perl?
- IMO, any method eventually calls the pthread library APIs, which actually do the work.
- In threads, I can specify a stack_size parameter. What specifically should I consider when specifying this value, and how does it impact memory usage?
- Since threads run in the same process space, the stack cannot be shared. The stack size tells pthreads how far apart the stacks should be from each other. Every time a function is called, its local variables are allocated on the stack, so the stack size limits how deep you can recurse. You can allocate as little as possible, to the extent that your application still works.
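As a rough illustration of the stack_size option discussed above, here is a small sketch. It assumes a perl built with ithreads (it checks $Config{useithreads} and gives up otherwise). The helper name run_with_small_stack and the 256 KB figure are arbitrary choices for the example, not recommendations.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: run $code in a thread created with the given stack
# size and return its scalar result. Returns undef when this perl was
# built without ithreads support.
sub run_with_small_stack {
    my ($stack_bytes, $code) = @_;
    return unless $Config{useithreads};
    require threads;
    my $thr = threads->create(
        { stack_size => $stack_bytes, context => 'scalar' },
        $code,
    );
    return $thr->join;
}

# 64 pages of 4096 bytes; the platform may round this up to its minimum.
my $answer = run_with_small_stack(64 * 4096, sub { return 6 * 7 });
print defined $answer ? "worker returned $answer\n" : "no thread support\n";
```

If the worker recurses deeply, too small a value will crash the thread, so it is worth testing with the deepest call chain the application actually uses.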
With threads in Perl/Linux, what is the most reliable method to determine the actual memory usage on a per-thread basis?
* Stack storage is fixed after your thread is spawned. Heap and static storage are shared and
can be used by any thread, so this notion of per-thread memory usage doesn't really
apply. It is per process.
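Since the figure that matters is per process, one way to watch it from inside the script on Linux is to read the VmRSS line of /proc/<pid>/status. The sketch below assumes a Linux-style /proc filesystem; parse_vm_rss and current_rss_kb are hypothetical helper names used only for this example.

```perl
use strict;
use warnings;

# Extract the resident set size (in kB) from the text of /proc/<pid>/status.
# Returns undef if no VmRSS line is present.
sub parse_vm_rss {
    my ($status_text) = @_;
    return $status_text =~ /^VmRSS:\s+(\d+)\s+kB/m ? $1 : undef;
}

# Read the current process's RSS; returns undef where /proc is unavailable.
sub current_rss_kb {
    open my $fh, '<', "/proc/$$/status" or return;
    local $/;                         # slurp: the file is only a few kB
    my $text = <$fh>;
    close $fh;
    return parse_vm_rss($text);
}

my $rss = current_rss_kb();
print defined $rss ? "RSS: $rss kB\n" : "no /proc on this system\n";
```

Calling current_rss_kb() before and after spawning workers gives a rough picture of what each thread adds to the whole process.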
Comparing fork and thread:
* fork duplicates the process and inherits the file handles
advantages: simpler application logic, more fault tolerant.
the spawned process can become faulty and leak resources
but it will not bring down the parent. good solution if
you do not fork a lot and the forked process eventually
exits and is cleaned up by the system.
disadvantages: more overhead per fork, system limitation on the number
of processes you can fork. Your program cannot share variables.
* threads run in the same process with additional program stacks.
advantages: lower memory footprint, thread spawn is faster and lighter
than fork. You can share variables.
disadvantages: more complex application logic, serialization of resources etc.
need to have very reliable code and need to pay attention to
resource leaks which can bring down the entire application.
IMO, it depends on what you do. fork can use far less memory over the lifetime of the
application if whatever you spawn just does its work independently and exits, instead of
risking memory leaks in threads.
In addition to brian d foy's suggestions, I found the following also helped a LOT.
- Where possible, don't "use" external modules; you don't know how much memory they utilise. I found that replacing the LWP and HTTP::Request::Common modules with either Curl or Lynx slashed memory usage by half.
- Slashed it again by modifying our own modules and pulling in only the required subroutines using "require" rather than a full library of unnecessary subs.
- Brian mentions using lexical variables with the smallest possible scope. If you're forking, using "undef" also helps by immediately freeing up memory for Perl to re-use. So you declare a scalar, array, hash or even sub, and when you're finished with any of them, use:
my (@divs) = localtime(time); $VAR{minute} = $divs[1];
undef @divs; undef @array; undef $scalar; undef %hash; undef &sub;
And to keep your code smaller, don't use any unnecessary variables. It's better to hard-code whatever is possible to reduce namespace usage.
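The "require" suggestion in the list above can be sketched like this. It is only an illustration: checksum_if_needed is a hypothetical function, and Digest::MD5 stands in for whatever heavier module your own code needs only occasionally.

```perl
use strict;
use warnings;

# Hypothetical helper: only the rare code path pays for loading Digest::MD5.
# A "use" at the top of the file would load it at compile time in every process.
sub checksum_if_needed {
    my ($data, $want_checksum) = @_;
    return length $data unless $want_checksum;
    require Digest::MD5;              # loaded at run time, on first use only
    return Digest::MD5::md5_hex($data);
}

print checksum_if_needed('hello', 0), "\n";
print exists $INC{'Digest/MD5.pm'} ? "loaded\n" : "not loaded yet\n";
print checksum_if_needed('hello', 1), "\n";
```

The trade-off is that a missing module is only discovered when that code path runs, so this suits optional features better than core functionality.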
Then there's a lot of other tricks you can try depending on your application's functionality. Ours was run by cron, every minute. We found we could fork half the processes with a sleep(30) so half would run and complete within the first 30 seconds, freeing up cpu and memory, and the other half would run after a 30 second delay. Halved the resource usage again. All up, we managed to reduce RAM usage from over 2 GB down to 200MB, a 90% saving.
We managed to get a pretty good idea of memory usage with
top -M
as our script was executed on a relatively stable server with only one site. So watching "free ram" gave us a pretty good indication of memory usage.
Also, running "ps", grepping for your script and, if forking, sorting by either memory or CPU usage was a good help.
ps -e -o pid,pcpu,pmem,stime,etime,command --sort=+cpu | grep scriptname | grep -v grep
If you're really desperate you could try to mount some memory as a filesystem (tmpfs/ramdisk) and read/write/delete files on it. I guess the tmpfs implementation is smart enough to release the memory when you delete a file.
You could also mmap (see File::Map, Sys::Mmap) a huge file on the tmpfs, an idea I got from Cache::FastMmap.
Never tried, but it should work :)