Combine Batch/WMIC + ANSI/UNICODE Output formatting

In creating an auditing tool for my network, I'm finding that WMIC is outputting with spaces in between each character when accompanied by echoing regular text. For example,

This:

@echo off
echo Foo >> "C:\test.txt"
wmic CPU Get AddressWidth >> "C:\test.txt"
wmic CPU Get Description >> "C:\test.txt"

Returns this:

Foo 
A d d r e s s W i d t h     

 6 4                         

 D e s c r i p t i o n                                                       

 I n t e l 6 4   F a m i l y   6   M o d e l   6 9   S t e p p i n g   1     

If I remove (rem) the echo Foo line, the output is formatted nicely since there is only one output type:

AddressWidth  
64            
Description                           
Intel64 Family 6 Model 69 Stepping 1  

I'm reading that this is because WMIC outputs to UNICODE, while standard batch commands output to ANSI. Can both be joined to share a common format? Can someone please explain in more depth the different format types, why WMIC would output to a different type, and/or any other contributing factors to this output? I've found some bread crumbs, but nothing concrete.


Solution 1:

Pipe the output from Wmic through more:
wmic CPU Get AddressWidth |more >> "C:\test.txt"

Edit, for some more background: the issue you see is caused by wmic writing its output as Unicode, specifically UTF-16. This means that each character (or, more correctly, most of them) is encoded in two bytes. wmic also puts a so-called BOM (byte order mark) at the beginning of the output. See the byte content below:

FF FE 44 00 65 00 73 00-63 00 72 00 69 00 70 00 ..D.e.s.c.r.i.p.

Those first two bytes (FF FE) specify the endianness for UTF-16 and allow data processing tools to recognize the encoding (here, UTF-16 little endian).
Apparently the type command does this check: if it finds the BOM, it recognizes the encoding and displays the file properly.
On the other hand, if you first echo text and then append wmic output, there is no BOM at the beginning of the file and the encoding is inconsistent:

74 65 78 74 20 0D 0A 44-00 65 00 73 00 63 00 72 text ..D.e.s.c.r

If you put this file through type, it cannot infer how to interpret the content. It /most likely/ assumes a single-byte ('ANSI') encoding, so the non-printable characters are rendered as spaces. Those characters are the zero bytes, which are in fact the high-order bytes of the two-byte character encoding.
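To make the byte-level picture concrete, here is a minimal Python sketch (Python is used purely for illustration, it is not part of the batch solution). It rebuilds the two byte sequences shown above and decodes them the way a BOM-sniffing tool might. The helper names are hypothetical, and cp1252 merely stands in for 'ANSI'; how type really decides is the same guess as above.

```python
import codecs

# Pure wmic output: BOM (FF FE) followed by UTF-16 little-endian text.
wmic_bytes = codecs.BOM_UTF16_LE + "Description\r\n".encode("utf-16-le")

# echo wrote single-byte text first, then wmic output was appended (no BOM at the start).
mixed_bytes = b"text \r\n" + "Description\r\n".encode("utf-16-le")


def has_utf16le_bom(data: bytes) -> bool:
    """True when the data starts with the UTF-16 LE byte order mark."""
    return data.startswith(codecs.BOM_UTF16_LE)


def decode_like_a_bom_sniffer(data: bytes) -> str:
    """Decode as UTF-16 LE if a BOM is present, otherwise as a single-byte code page.

    Assumption: this only mimics the behaviour guessed at in the text.
    """
    if has_utf16le_bom(data):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    return data.decode("cp1252")


clean = decode_like_a_bom_sniffer(wmic_bytes)
garbled = decode_like_a_bom_sniffer(mixed_bytes)

print(repr(clean))
print(repr(garbled))
# A console typically shows each NUL as a blank, hence the "spaced out" text.
print(garbled.replace("\x00", " "))
```

The second repr shows a NUL character after every letter of the wmic part, which is exactly where the extra "spaces" in the question's output come from.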

more handles more (pun intended) cases and produces correct output for basic ASCII characters, which is why it's commonly used as a hack for this purpose.

One additional note: some editors (Notepad being the simplest example) will properly display a UTF-16 encoded file if it is consistent, even without a BOM. There is also a way to force echo to produce Unicode output (but beware, it does not produce a BOM): running under cmd /u causes the output of internal commands to a file or pipe to be Unicode.
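As a rough illustration of what "consistent" means here, the following Python sketch compares a file where both parts are BOM-less UTF-16 LE (what echo under cmd /u followed by wmic is assumed to produce) with the mixed file from the question. The function name and the every-second-byte-is-NUL heuristic are hypothetical; this is not a description of how Notepad detects encodings.

```python
def looks_like_bomless_utf16le(data: bytes) -> bool:
    """Rough guess: even length and every second byte is NUL.

    Only holds for ASCII-range text; real editors use more elaborate heuristics.
    """
    return len(data) > 0 and len(data) % 2 == 0 and all(b == 0 for b in data[1::2])


# Assumed output of `echo Foo` under cmd /u: UTF-16 LE, no BOM.
echo_unicode = "Foo \r\n".encode("utf-16-le")
# wmic output appended afterwards (UTF-16 LE, no BOM in the middle of the file).
wmic_part = "AddressWidth\r\n".encode("utf-16-le")

consistent = echo_unicode + wmic_part
inconsistent = b"Foo \r\n" + wmic_part  # plain echo: single-byte text first

print(looks_like_bomless_utf16le(consistent))
print(looks_like_bomless_utf16le(inconsistent))
print(repr(consistent.decode("utf-16-le")))
```

When the whole file uses one encoding, a reader that assumes (or guesses) UTF-16 LE decodes it cleanly even without the BOM; the mixed file gives it nothing coherent to guess from.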

I can't really say why cmd's Unicode support is so limited (or, as most would say, broken); probably historical/compatibility reasons.

Last thing: if you need better Unicode support (among many other benefits), I would recommend migrating to PowerShell.

Solution 2:

The more command does not seem to do the conversion well. Note the double CR (\r) in the x2.txt output file.

C:>wmic diskdrive where "model = 'HGST HTS725050A7E630 ATA Device'" get index >x1.txt
C:>wmic diskdrive where "model = 'HGST HTS725050A7E630 ATA Device'" get index | more >x2.txt
C:>odd x1.txt
000000    ff    fe    49    00    6e    00    64    00    65    00    78    00    20    00    20    00
       377 376   I  \0   n  \0   d  \0   e  \0   x  \0      \0      \0
000010    0d    00    0a    00    30    00    20    00    20    00    20    00    20    00    20    00
        \r  \0  \n  \0   0  \0      \0      \0      \0      \0      \0
000020    20    00    0d    00    0a    00
            \0  \r  \0  \n  \0
000026

C:>odd x2.txt
000000    49    6e    64    65    78    20    20    0d    0d    0a    30    20    20    20    20    20
         I   n   d   e   x          \r  \r  \n   0
000010    20    0d    0d    0a    0d    0d    0a    0d    0a
            \r  \r  \n  \r  \r  \n  \r  \n
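For comparison, here is a small Python sketch of a conversion that does not introduce the double CR. It uses the exact bytes from the two dumps above; the function names are hypothetical, and cp1252 is only an assumed stand-in for the system's ANSI code page.

```python
import codecs

# Bytes of x1.txt and x2.txt, copied from the dumps above.
X1 = bytes.fromhex(
    "fffe 4900 6e00 6400 6500 7800 2000 2000 0d00 0a00"
    "3000 2000 2000 2000 2000 2000 2000 0d00 0a00"
)
X2 = bytes.fromhex(
    "49 6e 64 65 78 20 20 0d 0d 0a 30 20 20 20 20 20"
    "20 0d 0d 0a 0d 0d 0a 0d 0a"
)


def utf16_to_ansi(data: bytes, ansi_codec: str = "cp1252") -> bytes:
    """Decode UTF-16 LE (dropping a leading BOM if present) and re-encode single-byte.

    Line endings pass through untouched, so CR LF stays CR LF.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le").encode(ansi_codec)


def collapse_double_cr(data: bytes) -> bytes:
    """Repair a file that already went through more: CR CR LF becomes CR LF."""
    return data.replace(b"\r\r\n", b"\r\n")


converted = utf16_to_ansi(X1)
repaired = collapse_double_cr(X2)

print(converted)
print(repaired)
```

The repaired x2.txt still carries the extra blank lines that more appended, so converting the original UTF-16 output directly gives the cleaner result.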

Update: It appears that PowerShell may handle this better.

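# In Windows PowerShell, Out-File defaults to UTF-16 LE ("Unicode") with a BOM
Get-WmiObject Win32_diskdrive |
    Where-Object { $_.Model -like '*WD*' } |
    Select-Object -Property Model |
    Out-File -PSPath t1.txt

# -Encoding default writes the system's ANSI code page instead
Get-WmiObject Win32_diskdrive |
    Where-Object { $_.Model -like '*WD*' } |
    Select-Object -Property Model |
    Out-File -PSPath t2.txt -Encoding default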

CIM is clearly the direction PowerShell is taking: Get-WmiObject is not available in PowerShell 6 and later, while Get-CimInstance is. Better to start using it now.

Get-CimInstance CIM_DiskDrive |
    Where-Object { $_.Model -like '*WD*' } |
    Select-Object -Property Model |
    Out-File -PSPath t1.txt

Get-CimInstance CIM_DiskDrive |
    Where-Object { $_.Model -like '*WD*' } |
    Select-Object -Property Model |
    Out-File -PSPath t2.txt -Encoding default