Windows batch script to parse CSV file and output a text file

I've seen a response on another page (Help in writing a batch script to parse CSV file and output a text file) - brilliant code BTW:

@ECHO OFF
IF "%~1"=="" GOTO :EOF
SET "filename=%~1"
SET fcount=0
SET linenum=0
FOR /F "usebackq tokens=1-10 delims=," %%a IN ("%filename%") DO ^
CALL :process "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i" "%%j"
GOTO :EOF

:trim
SET "tmp=%~1"
:trimlead
IF NOT "%tmp:~0,1%"==" " GOTO :EOF
SET "tmp=%tmp:~1%"
GOTO trimlead

:process
SET /A linenum+=1
IF "%linenum%"=="1" GOTO picknames

SET ind=0
:display
IF "%fcount%"=="%ind%" (ECHO.&GOTO :EOF)
SET /A ind+=1
CALL :trim %1
SETLOCAL ENABLEDELAYEDEXPANSION
ECHO !f%ind%!!tmp!
ENDLOCAL
SHIFT
GOTO display

:picknames
IF %1=="" GOTO :EOF
CALL :trim %1
SET /a fcount+=1
SET "f%fcount%=%tmp%"
SHIFT
GOTO picknames

It works brilliantly for an example csv file I made in the format:

Header,Name,Place
one,two,three
four,five,six

However the actual file I want to change comprises of 64 fields - so I altered the tokens=1-10 to tokens=1-64 and increased the %%a etc right up to 64 variables (the last being called %%BL for example). Now, however, when I run the batch on my 'big' csv file (with the 64 tokens) nothing happens. No errors (good) but no output! (bad). If anyone can help that would be fantastic... am soooo close to getting the whole app working if I can just nail this last bit! Or if anyone has some example code that will do similar for an indefinite number of tokens... Ultimately I want to make a string which will be something like:

field7,field12,field15,field18

Solution 1:

Important update - I don't think Windows batch is a good option for your needs because a single FOR /F cannot parse more than 31 tokens. See the bottom of the Addendum below for an explanation.

However, it is possible to do what you want with batch. This ugly code will give you access to all 64 tokens.

for /f "usebackq tokens=1-29* delims=," %%A in ("%filename%") do (
  for /f "tokens=1-26* delims=," %%a in ("%%^") do (
    for /f "tokens=1-9 delims=," %%1 in ("%%{") do (
      rem Tokens 1-26 are in variables %%A - %%Z
      rem Token  27 is in %%[
      rem Token  28 is in %%\
      rem Token  29 is in %%]
      rem Tokens 30-55 are in %%a - %%z
      rem Tokens 56-64 are in %%1 - %%9
    )
  )
)

The addendum provides important info on how the above works.

If you only need a few of the tokens spread out amongst the 64 on the line, then the solution is marginally easier in that you might be able to avoid using crazy characters as FOR variables. But there is still careful bookkeeping to be done.

For example, the following will give you access to tokens 5, 27, 46 and 64

for /f "usebackq tokens=5,27,30* delims=," %%A in ("%filename%") do (
  for /f "tokens=16,30* delims=," %%E in ("%%D") do (
    for /f "tokens=4 delims=," %%H in ("%%G") do (
      rem Token  5 is in %%A
      rem Token 27 is in %%B
      rem Token 46 is in %%E
      rem Token 64 is in %%H
    )
  )
)

April 2016 Update - Based on investigative work by DosTips users Aacini, penpen, and aGerman, I have developed a relatively easy method to simultaneously access thousands of tokens using FOR /F. The work is part of this DosTips thread. The actual code can be found in these 3 posts:

  • Work with a fixed number of columns
  • Work with varying numbers of columns
  • Dynamically choose which tokens to expand within the DO clause

Original Answer FOR variables are limited to a single character, so your %%BL strategy can't work. The variables are case sensitive. According to Microsoft you are limited to capturing 26 tokens within one FOR statement, but it is possible to get more if you use more than just alpha. Its a pain because you need an ASCII table to figure out which characters go where. FOR does not allow just any character however, and the maximum number of tokens that a single FOR /F can assign is 31 +1. Any attempt to parse and assign more than 31 will quietly fail, as you have discovered.

Thankfully, I don't think you need that many tokens. You simply specify which tokens you want with the TOKENS option.

for /f "usebackq tokens=7,12,15,18 delims=," %%A in ("%filename%") do echo %%A,%%B,%%C,%%D

will give you your 7th, 12th, 15th and 18th tokens.

Addendum

April 2016 Update A couple weeks ago I learned that the following rules (written 6 years ago) are code page dependent. The data below has been verified for code pages 437 and 850. More importantly, the FOR variable sequence of extended ASCII characters 128-254 does not match the byte code value, and varies tremendously by code page. It turns out the FOR /F variable mapping is based on the underlying UTF-(16?) code point. So the extended ASCII characters are of limited use when used with FOR /F. See the thread at http://www.dostips.com/forum/viewtopic.php?f=3&t=7703 for more information.

I performed some tests, and can report the following (updated in response to jeb's comment):

Most characters can be used as a FOR variable, including extended ASCII 128-254. But some characters cannot be used to define a variable in the first part of a FOR statement, but can be used in the DO clause. A few can't be used for either. Some have no restrictions, but require special syntax.

The following is a summary of characters that have restrictions or require special syntax. Note that text within angle brackets like <space> represents a single character.

Dec  Hex   Character   Define     Access
  0  0x00  <nul>       No         No
 09  0x09  <tab>       No         %%^<tab>  or  "%%<tab>"
 10  0x0A  <LF>        No         %%^<CR><LF><CR><LF>  or  %%^<LF><LF>
 11  0x0B  <VT>        No         %%<VT>
 12  0x0C  <FF>        No         %%<FF>
 13  0x0D  <CR>        No         No
 26  0x1A  <SUB>       %%%VAR%    %%%VAR% (%VAR% must be defined as <SUB>)
 32  0x20  <space>     No         %%^<space>  or  "%%<space>"
 34  0x22  "           %%^"       %%"  or  %%^"
 36  0x24  $           %%$        %%$ works, but %%~$ does not
 37  0x25  %           %%%%       %%~%%
 38  0x26  &           %%^&       %%^&  or  "%%&"
 41  0x29  )           %%^)       %%^)  or  "%%)"
 44  0x2C  ,           No         %%^,  or  "%%,"
 59  0x3B  ;           No         %%^;  or  "%%;"
 60  0x3C  <           %%^<       %%^<  or  "%%<"
 61  0x3D  =           No         %%^=  or  "%%="
 62  0x3E  >           %%^>       %%^>  or  "%%>"
 94  0x5E  ^           %%^^       %%^^  or  "%%^"
124  0x7C  |           %%^|       %%^|  or  "%%|"
126  0x7E  ~           %%~        %%~~ (%%~ may crash CMD.EXE if at end of line)
255  0xFF  <NB space>  No         No

Special characters like ^ < > | & must be either escaped or quoted. For example, the following works:

for /f %%^< in ("OK") do echo "%%<" %%^<

Some characters cannot be used to define a FOR variable. For example, the following gives a syntax error:

for /f %%^= in ("No can do") do echo anything

But %%= can be implicitly defined by using the TOKENS option, and the value accessed in the DO clause like so:

for /f "tokens=1-3" %%^< in ("A B C") do echo %%^< %%^= %%^>

The % is odd - You can define a FOR variable using %%%%. But The value cannot be accessed unless you use the ~ modifier. This means enclosing quotes cannot be preserved.

for /f "usebackq tokens=1,2" %%%% in ('"A"') do echo %%%% %%~%%

The above yields %% A

The ~ is a potentially dangerous FOR variable. If you attempt to access the variable using %%~ at the end of a line, you can get unpredictable results, and may even crash CMD.EXE! The only reliable way to access it without restrictions is to use %%~~, which of course strips any enclosing quotes.

for /f %%~ in ("A") do echo This can crash because its the end of line: %%~

for /f %%~ in ("A") do echo But this (%%~) should be safe

for /f %%~ in ("A") do echo This works even at end of line: %%~~

The <SUB> (0x1A) character is special because <SUB> literals embedded within batch scripts are read as linefeeds (<LF>). In order to use <SUB> as a FOR variable, the value must be somehow stored within an environment variable, and then %%%VAR% will work for both definition and access.

As already stated, a single FOR /F can parse and assign a maximum of 31 tokens. For example:

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1-31" %%A in ("!str!") do echo A=%%A _=%%_

The above yields A=1 _=31 Note - tokens 2-30 work just fine, I just wanted a small example

Any attempt to parse and assign more than 31 tokens will silently fail without setting ERRORLEVEL.

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1-32" %%A in ("!str!") do echo this example fails entirely

You can parse and assign up to 31 tokens and assign the remainder to another token as follows:

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%0 in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1-31*" %%@ in ("!str!") do echo @=%%A  ^^=%%^^  _=%%_

The above yields @=1 ^=31 _=32 33 34 35

And now for the really bad news. A single FOR /F can never parse more than 31 tokens, as I learned when I looked at Number of tokens limit in a FOR command in a Windows batch script

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1,31,32" %%A in ("!str!") do echo A=%%A  B=%%B  C=%%C

The very unfortunate output is A=1 B=31 C=%C

Solution 2:

My answer is comprised of two parts. The first one is a new answer I posted in help-in-writing-a-batch-script-to-parse-csv-file-and-output-a-text-file question that have not any limit in the number of fields.

The second part is a modification to that answer that allows to select which fields will be extracted from the csv file by additional parameters placed after the file name. The modified code is in UPPERCASE LETTERS.

@echo off
setlocal EnableDelayedExpansion

rem Create heading array:
set /P headingRow=< %1
set i=0
for %%h in (%headingRow%) do (
    set /A i+=1
    set heading[!i!]=%%~h
)


REM SAVE FILE NAME AND CREATE TARGET ELEMENTS ARRAY:
SET FILENAME=%1
IF "%2" == "" (FOR /L %%J IN (1,1,%i%) DO SET TARGET[%%J]=%%J) & GOTO CONTINUE
SET J=0
:NEXTTARGET
    SHIFT
    IF "%1" == "" GOTO CONTINUE
    SET /A J+=1
    SET TARGET[%J%]=%1
GOTO NEXTTARGET
:CONTINUE


rem Process the file:
call :ProcessFile < %FILENAME%
exit /B

:ProcessFile
set /P line=
:nextLine
    set line=:EOF
    set /P line=
    if "!line!" == ":EOF" goto :EOF
    set i=0
    SET J=1
    for %%e in (%line%) do (
        set /A i+=1
        FOR %%J IN (!J!) DO SET TARGET=!TARGET[%%J]!
        IF !i! == !TARGET! (
            for %%i in (!i!) do echo !heading[%%i]!%%~e
            SET /A J+=1
        )
    )
goto nextLine
exit /B

For example:

EXTRACTCSVFIELDS THEFILE.CSV 7 12 15 18

EDIT A simpler method

Below is a new version that is both simpler and easier to understand because it use a list of target elements instead of an array:

@echo off
setlocal EnableDelayedExpansion

rem Create heading array:
set /P headingRow=< %1
set i=0
for %%h in (%headingRow%) do (
    set /A i+=1
    set heading[!i!]=%%~h
)

REM CREATE TARGET ELEMENTS LIST:
IF "%2" == "" (
    SET TARGETLIST=
    FOR /L %%J IN (1,1,%i%) DO SET TARGETLIST=!TARGETLIST! %%J
) ELSE (
    SET TARGETLIST=%*
    SET TARGETLIST=!TARGETLIST:* =!
)

rem Process the file:
call :ProcessFile < %1
exit /B

:ProcessFile
set /P line=
:nextLine
    set line=:EOF
    set /P line=
    if "!line!" == ":EOF" goto :EOF
    set i=0
    for %%e in (%line%) do (
        set /A i+=1
        for %%i IN (!i!) DO (
            IF "!TARGETLIST:%%i=!" NEQ "!TARGETLIST!" (
                echo !heading[%%i]!%%~e
            )
        )
    )
goto nextLine
exit /B

Also, this version does not require the desired fields be given in order.

EDIT

Oops! The for parameters stuff distracted my attention, so I was not aware of your last request:

"Ultimately I want to make a string which will be something like:

field7,field12,field15,field18"

Just modify the last part of the program to do that:

:ProcessFile
set /P line=
:nextLine
    set line=:EOF
    set /P line=
    if "!line!" == ":EOF" goto :EOF
    set i=0
    set resultString=
    for %%e in (%line%) do (
        set /A i+=1
        for %%i IN (!i!) DO (
            IF "!TARGETLIST:%%i=!" NEQ "!TARGETLIST!" (
                set resultString=!resultString!%%~e,
            )
        )
    )
    set resultString=%resultString:~0,-1%
    echo Process here the "%resultString%"
goto nextLine
exit /B

You may also remove the creation of the heading array, because you want NOT the headings! ;)

Solution 3:

Using %%@ and %%` (not documented here) as start variables the max you can get is 71:

@echo off
for /f "tokens=1-31* delims=," %%@ in ("%filename%") do (
    echo:
    echo  1=%%@
    echo  2=%%A
    echo  3=%%B
    echo  4=%%C
    echo  5=%%D
    echo  6=%%E
    echo  7=%%F
    echo  8=%%G
    echo  9=%%H
    echo 10=%%I
    echo 11=%%J
    echo 12=%%K
    echo 13=%%L
    echo 14=%%M
    echo 15=%%N
    echo 16=%%O
    echo 17=%%P
    echo 18=%%Q
    echo 19=%%R
    echo 20=%%S
    echo 21=%%T
    echo 22=%%U
    echo 23=%%V
    echo 24=%%W
    echo 25=%%X
    echo 26=%%Y
    echo 27=%%Z
    echo 28=%%[
    echo 29=%%\
    echo 30=%%]
    echo 31=%%^^
    for /F "tokens=1-30* delims=," %%` in ("%%_") do (
        echo 32=%%`
        echo 33=%%a
        echo 34=%%b
        echo 35=%%c
        echo 36=%%d
        echo 37=%%e
        echo 38=%%f
        echo 39=%%g
        echo 40=%%h
        echo 41=%%i
        echo 42=%%j
        echo 43=%%k
        echo 44=%%l
        echo 45=%%m
        echo 46=%%n
        echo 47=%%o
        echo 48=%%p
        echo 49=%%q
        echo 50=%%r
        echo 51=%%s
        echo 52=%%t
        echo 53=%%u
        echo 54=%%v
        echo 55=%%w
        echo 56=%%x
        echo 57=%%y
        echo 58=%%z
        echo 59=%%{
        echo 60=%%^|
        echo 61=%%}
        for /F "tokens=1-9* delims=," %%0 in ("%%~") do (
            echo 62=%%0
            echo 63=%%1
            echo 64=%%2
            echo 65=%%3
            echo 66=%%4
            echo 67=%%5
            echo 68=%%6
            echo 69=%%7
            echo 70=%%8
            echo 71=%%9
        )
    )
)