Remove trailing spaces from a file using Windows batch?

Solution 1:

The DosTips RTRIM function that Ben Hocking cites can be used to create a script that can right trim each line in a text file. However, the function is relatively slow.

DosTips user (and moderator) aGerman developed a very efficient right trim algorithm. He implemented it as a batch "macro" - an interesting technique that stores complex mini scripts in environment variables so they can be executed from memory. Macros with arguments are a major discussion topic in their own right and are not relevant to this question.

I have extracted aGerman's algorithm and put it in the following batch script. The script expects the name of a text file as the only parameter and proceeds to right trim the spaces off each line in the file.

@echo off
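setlocal enableDelayedExpansion
rem Build a string of 4096 spaces by doubling a single space 12 times
set "spcs= "
for /l %%n in (1 1 12) do set "spcs=!spcs!!spcs!"
rem Prefix every line with its line number so that blank lines survive FOR /F
findstr /n "^" "%~1" >"%~1.tmp"
setlocal disableDelayedExpansion
(
  for /f "usebackq delims=" %%L in ("%~1.tmp") do (
    set "ln=%%L"
    setlocal enableDelayedExpansion
    rem Remove the line number prefix, up to and including the first colon
    set "ln=!ln:*:=!"
    rem Test for 4096, 2048, ... 1 trailing spaces and strip each run found
    set /a "n=4096"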
    for /l %%i in (1 1 13) do (
      if defined ln for %%n in (!n!) do (
        if "!ln:~-%%n!"=="!spcs:~-%%n!" set "ln=!ln:~0,-%%n!"
        set /a "n/=2"
      )
    )
    echo(!ln!
    endlocal
  )
) >"%~1"
del "%~1.tmp" 2>nul

Assuming the script is called rtrimFile.bat, then it can be called from the command line as follows:

rtrimFile "fileName.txt"

A note about performance
The original DosTips rtrim function performs a linear search and defaults to trimming a maximum of 32 spaces. It has to iterate once per space.

aGerman's algorithm uses a binary search and it is able to trim the maximum string size allowed by batch (up to ~8k spaces) in 13 iterations.
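
To make the difference concrete, here is a minimal language-neutral sketch in Python. It is not batch and not aGerman's code. The function names `rtrim_linear` and `rtrim_halving` are hypothetical, chosen only for this illustration. The first models the one-space-per-pass loop with its default limit of 32. The second models the halving loop from the script above, which tests for 4096, 2048, ... 1 trailing spaces in turn.

```python
def rtrim_linear(line, max_chars=32):
    """Model of the linear approach: one comparison per pass, max_chars passes."""
    for _ in range(max_chars):
        if line.endswith(" "):
            line = line[:-1]
    return line


def rtrim_halving(line, start=4096):
    """Model of the halving approach: test chunk sizes start, start/2, ... 1."""
    n = start
    passes = 0
    while n >= 1:
        passes += 1
        # Strip n characters only if the last n characters are all spaces.
        if len(line) >= n and line[-n:] == " " * n:
            line = line[:-n]
        n //= 2
    return line, passes


print(len(rtrim_linear("x" + " " * 40)))   # 9: eight spaces are left over
print(rtrim_halving("x" + " " * 40))       # ('x', 13)
```

After each pass the number of remaining trailing spaces is smaller than the chunk size just tested, so any run of up to 8191 spaces (4096 + 2048 + ... + 1) is gone after the 13th pass.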

Unfortunately, batch is very SLOW when it comes to processing text. Even with the efficient rtrim function, it takes ~70 seconds to trim a 1MB file on my machine. The problem is that just reading and writing the file without any modification takes significant time. This answer uses a FOR loop to read the file, coupled with FINDSTR to prefix each line with the line number so that blank lines are preserved. It toggles delayed expansion to prevent ! from being corrupted, and uses a search and replace operation to remove the line number prefix from each line. All that happens before it even begins to do the rtrim.

Performance could be nearly doubled by using an alternate file read mechanism that uses set /p. However, the set /p method is limited to ~1k bytes per line, and it strips trailing control characters from each line.

If you need to regularly trim large files, then even a doubling of performance is probably not adequate. Time to download (if possible) any one of many utilities that could process the file in the blink of an eye.

If you can't use non-native software, then you can try VBScript or JScript executed via the CSCRIPT batch command. Either one would be MUCH faster.

UPDATE - Fast solution with JREPL.BAT

JREPL.BAT is a regular expression find/replace utility that can very efficiently solve the problem. It is pure script (hybrid batch/JScript) that runs natively on any Windows machine from XP onward. No 3rd party exe files are needed.

With JREPL.BAT somewhere within your PATH, you can strip trailing spaces from file "test.txt" with this simple command:

jrepl " +$" "" /f test.txt /o -

If you put the command within a batch script, then you must precede the command with CALL:

call jrepl " +$" "" /f test.txt /o -
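
For readers unfamiliar with the pattern, the sketch below shows in Python what the regular expression " +$" matches: one or more spaces at the end of a line. It is only a model of the pattern's effect, not JREPL's implementation. JREPL uses JScript regular expressions and its own line handling, and the helper name `strip_trailing_spaces` is hypothetical.

```python
import re

TRAILING_SPACES = re.compile(r" +$")


def strip_trailing_spaces(text):
    """Remove trailing spaces from each line, keeping the line endings intact."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # line content without its terminator
        ending = line[len(body):]       # "\r\n", "\n" or "" for the last line
        out.append(TRAILING_SPACES.sub("", body) + ending)
    return "".join(out)


print(repr(strip_trailing_spaces("a  \r\n\r\n b \n")))   # 'a\r\n\r\n b\n'
```

Note that the pattern targets spaces only, so a trailing tab is left alone, and leading or interior spaces are untouched.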

Solution 2:

Go get yourself a copy of Cygwin or the sed package from GnuWin32.

Then use that with the command:

sed "s/ *$//" inputFile >outputFile

Solution 3:

DosTips has an implementation of RTrim that works for batch files:

:rTrim string char max -- strips white spaces (or other characters) from the end of a string
::                     -- string [in,out] - string variable to be trimmed
::                     -- char   [in,opt] - character to be trimmed, default is space
::                     -- max    [in,opt] - maximum number of characters to be trimmed from the end, default is 32
:$created 20060101 :$changed 20080219 :$categories StringManipulation
:$source http://www.dostips.com
SETLOCAL ENABLEDELAYEDEXPANSION
call set string=%%%~1%%
set char=%~2
set max=%~3
if "%char%"=="" set char= &rem one space
if "%max%"=="" set max=32
for /l %%a in (1,1,%max%) do if "!string:~-1!"=="%char%" set string=!string:~0,-1!
( ENDLOCAL & REM RETURN VALUES
    IF "%~1" NEQ "" SET %~1=%string%
)
EXIT /b

If you're not used to using functions in batch files, read this.

Solution 4:

There is a nice trick to remove trailing spaces based on this answer of user Aacini; I modified it so that all other spaces occurring in the string are preserved. So here is the code:

@echo off
setlocal EnableDelayedExpansion

rem // This is the input string:
set "x=  This is   a text  string     containing  many   spaces.   "

rem // Ensure there is at least one trailing space; then initialise auxiliary variables:
set "y=%x% " & set "wd=" & set "sp="

rem // Now here is the algorithm:
set "y=%y: =" & (if defined wd (set "y=!y!!sp!!wd!" & set "sp= ") else (set "sp=!sp! ")) & set "wd=%"

rem // Return messages:
echo  input: "%x%"
echo output: "%y%"

endlocal

However, this approach fails when the string contains any of the characters ^, ! or ".
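
The one-line algorithm is dense, so here is a hedged model of its bookkeeping in Python. The substitution %y: =...% replaces every space in the string with a small piece of code, so the line expands into a chain of `set "wd=<token>"` commands with the `if defined wd` block running between them. The sketch below follows that reading of the code. The variable names y, sp and wd mirror the batch variables, and the function name `rtrim_by_tokens` is hypothetical.

```python
def rtrim_by_tokens(x):
    """Model of the word/space bookkeeping performed by the expanded batch line."""
    tokens = (x + " ").split(" ")   # the appended space guarantees an empty last token
    y = tokens[0]                   # set "y=<first token>"
    sp = ""                         # spaces seen since the last word was appended
    wd = ""                         # the most recent token, empty means "not defined"
    for tok in tokens[1:]:
        if wd:                      # if defined wd: append pending spaces and the word
            y += sp + wd
            sp = " "
        else:                       # otherwise just remember one more space
            sp += " "
        wd = tok                    # set "wd=<next token>"
    return y                        # spaces pending in sp at the end are never appended


x = "  This is   a text  string     containing  many   spaces.   "
print(repr(rtrim_by_tokens(x)))
```

Because pending spaces are only written out when another word follows them, interior runs of spaces are preserved and the trailing run is dropped.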