Using zgrep on a remote drive eats space on my local Mac
Once the command is running, I can see my free space on my local MacBook...disappear swiftly... Could someone please tell me why this is happening, or how to prevent it?
Operating Locally while connected remotely
The command is acting upon those files as if they were local, even when they are not. *nix systems have this wonderful feature where you can mount almost anything as if it were a local file/directory. This "feature" is where you're running into this problem.
To operate on the file, the command has to "download" it from the remote, hold it in memory and, if necessary, resort to swap space. If it needs to decompress it (bzip2, gzip, compress, etc.), it will likely use a temporary file (e.g. one created with mktemp). Since your command is operating locally, all the resources it uses will be local as well - it doesn't know to use your remote for temp space.
You can do a simple experiment to show how this works - move a file from one remote directory to another. With a simple mount on a NAS (/Volumes/NAS/), if you try to move /Volumes/NAS/folder1/* to /Volumes/NAS/folder2/, this simple operation will need to "download" the files to the local machine, then upload them to the destination folder. However, if you log in directly to the server and issue the move command there - for example, mv /home/username/folder1/* /home/username/folder2 - it will happen almost instantly.
In your case, it has to extract a number of files from each of those .tar.gz files, download them, store them in a temp location, then load them into memory as the text is searched.
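The search step itself is easy to reproduce locally: zgrep is essentially a decompress-then-search wrapper around grep. A minimal sketch (the sample file name here is made up for illustration):

```shell
# Create a small gzipped file, then search it roughly the way zgrep does:
# decompress to stdout and feed the stream to grep.
printf 'hello\nneedle here\nworld\n' > sample.txt   # hypothetical sample file
gzip -f sample.txt                                  # produces sample.txt.gz
gzip -dc sample.txt.gz | grep needle                # prints: needle here
```

All of that decompression happens on the machine where the command runs - which is why running it against a mounted remote volume hits local disk and memory.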
Cleaning up Temp files...
An important consideration to keep in mind is that temp files are not cleaned up until the process has completed or terminated abnormally - and that's only if the developer was diligent about it. If not, those temp files won't be cleared out until reboot.
If you have 200GB of files to search and a MacBook Air with 128GB or 256GB of storage, you will run out of space very quickly filling it up with nothing but temp files.
How to prevent it
Do your operation on the server (the remote machine) itself. Instead of mounting the remote file/directory locally and operating on it locally, connect via SSH (for example) and issue your command on the remote itself.
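For example, instead of pointing zgrep at the mounted volume, run the same search on the server over SSH. The host name and paths below are placeholders for illustration:

```shell
# Runs locally: every archive is pulled over the mount and decompressed
# on the MacBook, consuming local disk and temp space.
#   zgrep 'some pattern' /Volumes/NAS/logs/*.tar.gz

# Runs remotely: decompression and searching happen on the server,
# and only the matching lines travel back over the connection.
ssh username@nas-server "zgrep 'some pattern' /home/username/logs/*.tar.gz"
```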
Can you specify a different location for the temp directory like the remote?
Technically, yes. For example, gzip uses /tmp, and you can set the TMPDIR environment variable to point at a remote temporary directory (e.g. /Volumes/NAS/tmp). The problem is that you're still operating on the file locally, so it would "download" the file and then, while extracting it, "upload" all the extracted bits back to the remote.
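To illustrate the redirection itself (not the network cost), TMPDIR can be pointed anywhere mktemp-style tools look. Here a local directory stands in for a hypothetical NAS mount:

```shell
# Tools that create temp files via mktemp honor TMPDIR,
# creating their scratch files there instead of in /tmp.
mkdir -p /tmp/fake-nas-tmp                 # stand-in for /Volumes/NAS/tmp
tmpfile=$(TMPDIR=/tmp/fake-nas-tmp mktemp)
echo "$tmpfile"                            # e.g. /tmp/fake-nas-tmp/tmp.XXXXXXXX
rm -f "$tmpfile"
```

Even with temp space redirected like this, every byte still flows through the local machine on its way to and from the remote.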