How do I use awk under cygwin to print fields from an excel spreadsheet?

Solution 1:

The combination of the following VBS and shell scripts create a CSV file for each sheet in the Excel spreadsheet:

$ cat xls2csv.vbs
csv_format = 6

Dim strFilename
Dim objFSO
Set objFSO = CreateObject("scripting.filesystemobject")
strFilename = objFSO.GetAbsolutePathName(WScript.Arguments(0))
If objFSO.fileexists(strFilename) Then
  Call Writefile(strFilename)
Else
  wscript.echo "no such file!"
End If
Set objFSO = Nothing

Sub Writefile(ByVal strFilename)
Dim objExcel
Dim objWB
Dim objws

Set objExcel = CreateObject("Excel.Application")
Set objWB = objExcel.Workbooks.Open(strFilename)

For Each objws In objWB.Sheets
  objws.Copy
  objExcel.ActiveWorkbook.SaveAs objWB.Path & "\" & objws.Name & ".csv", csv_format
  objExcel.ActiveWorkbook.Close False
Next

objWB.Close False
objExcel.Quit
Set objExcel = Nothing
End Sub

.

$ cat xls2csv
PATH="$HOME:$PATH"

# the original XLS input file path components
inXlsPath="$1"
inXlsDir=$(dirname "$inXlsPath")
xlsFile=$(basename "$inXlsPath")
xlsBase="${xlsFile%.*}"

# The tmp dir we'll copy the XLS to and run the tool on
# to get the CSVs generated
tmpXlsDir="/usr/tmp/${xlsBase}.$$"
tmpXlsPath="${tmpXlsDir}/${xlsFile}"
absXlsPath="C:/cygwin64/${tmpXlsPath}" # need an absolute path for VBS to work

mkdir -p "$tmpXlsDir"

trap 'rm -f "${tmpXlsDir}/${xlsFile}"; rmdir "$tmpXlsDir"; exit' 0

cp "$inXlsPath" "$tmpXlsDir"

cygstart "$HOME/xls2csv.vbs" "$absXlsPath"

printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be created:\n" >&2
while [ ! -f "${tmpXlsDir}/~\$${xlsFile}" ]
do
    # VBS is done when this tmp file is created and later removed
    printf "." >&2
    sleep 1
done
printf " Done.\n" >&2

printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be removed:\n" >&2
while [ -f "${tmpXlsDir}/~\$${xlsFile}" ]
do
    # VBS is done when this tmp file is removed
    printf "." >&2
    sleep 1
done
printf " Done.\n" >&2

numFiles=0
for file in "$tmpXlsDir"/*.csv
do
    numFiles=$(( numFiles + 1 ))
done

if (( numFiles >= 1 ))
then
    outCsvDir="${inXlsDir}/${xlsBase}.csvs"
    mkdir -p "$outCsvDir"
    mv "$tmpXlsDir"/*.csv "$outCsvDir"
fi

Now we execute the shell script which internally calls cygstart to run the VBS script to generate the CSV files (one per sheet) in a subdirectory under the same directory where the Excel file exists named based on the Excel file name (e.g. Excel file staff.xlsx produces CSVs directory staff.csvs):

$ ./xls2csv "$D/staff.xlsx"
Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be created:
.. Done.
Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be removed:
. Done.

There is only one sheet with the default name Sheet1 in the target Excel file "$D/staff.xlsx" so the output of the above is a file "$D/staff.csvs/Sheet1.csv":

$ cat "$D/staff.csvs/Sheet1.csv"
Name,Position
Sue,Manager
Bill,Secretary
Pat,Engineer

$ awk -F, -v name="Bill" '$1==name{print $2}' "$D/staff.csvs/Sheet1.csv"
Secretary

Also see What's the most robust way to efficiently parse CSV using awk? for how to then operate on those CSVs.

See also https://stackoverflow.com/a/58879683/1745001 for how to do the opposite, i.e. call a cygwin bash command from a Windows batch file.