Get list of processes on Windows in a charset-safe way
This post gives a solution to retrieve the list of running processes under Windows. In essence it does:
String cmd = System.getenv("windir") + "\\system32\\" + "tasklist.exe";
Process p = Runtime.getRuntime().exec(cmd);
InputStreamReader isr = new InputStreamReader(p.getInputStream());
BufferedReader input = new BufferedReader(isr);
then reads the input.
It looks and works great, but I was wondering whether the charset used by tasklist might differ from the default charset, in which case this call could fail.
For example this other question about a different executable shows that it could cause some issues.
If that is the case, is there a way to determine what the appropriate charset would be?
This can be broken into 2 parts:
- The Windows part
From Java you're executing a Windows command, externally to the JVM in "Windows land". When the Java Runtime class executes a Windows command, it uses the DLL for consoles and so appears to Windows as if the command is running in a console.
Q: When I run C:\windows\system32\tasklist.exe in a console, what is the character encoding ("code page" in Windows terminology) of the result?
- The Windows "chcp" command with no argument gives the active code page number for the console (e.g. 850 for Multilingual-Latin-1, 1252 for Latin-1). See Windows Microsoft Code Pages, Windows OEM Code Pages, Windows ISO Code Pages.
- The default system code page is originally set up according to your system locale (type systeminfo to see this, or Control Panel -> Region and Language). The Windows API function GetACP() also gives this info.
- The Java part
Q: How do I decode a Java byte stream from the Windows code page "x" (e.g. 850 or 1252)?
- The full mapping between Windows code page numbers and equivalent Java charset names can be derived from Code Page Identifiers (Windows).
- However, in practice one of the following prefixes can be added to achieve the mapping: "" (none) for ISO, "IBM" or "x-IBM" for OEM, "windows-" or "x-windows-" for Microsoft/Windows. E.g. ISO-8859-1, IBM850 or windows-1252.
Full Solution:
// Needs: java.io.BufferedReader, java.io.InputStreamReader,
// java.nio.charset.Charset, java.util.Scanner
String cmd = System.getenv("windir") + "\\system32\\" + "chcp.com";
Process p = Runtime.getRuntime().exec(cmd);
// The default charset is fine here: we only want the digits, which are plain ASCII;
// ignore the text preceding ":"
String windowsCodePage = new Scanner(
        new InputStreamReader(p.getInputStream())).skip(".*:").next();
Charset charset = null;
String[] charsetPrefixes =
        new String[] {"", "windows-", "x-windows-", "IBM", "x-IBM"};
for (String charsetPrefix : charsetPrefixes) {
    try {
        charset = Charset.forName(charsetPrefix + windowsCodePage);
        break;
    } catch (IllegalArgumentException e) {
        // Illegal or unsupported charset name: try the next prefix
    }
}
// If no match found, use default charset
if (charset == null) charset = Charset.defaultCharset();
cmd = System.getenv("windir") + "\\system32\\" + "tasklist.exe";
p = Runtime.getRuntime().exec(cmd);
InputStreamReader isr = new InputStreamReader(p.getInputStream(), charset);
BufferedReader input = new BufferedReader(isr);
// Debugging output
System.out.println("matched codepage " + windowsCodePage + " to charset name:" +
        charset.name() + " displayName:" + charset.displayName());
String line;
while ((line = input.readLine()) != null) {
    System.out.println(line);
}
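One caveat with the Scanner-based parse above: the text printed by chcp is localized, so the number may be surrounded by other characters (for instance trailing punctuation). Below is a minimal sketch of a more defensive parse, assuming the code page number is the last run of digits on the line. The names CodePageParser, parseCodePage and toCharset are made up for illustration; the prefix list is the one from the solution above.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CodePageParser {
    private static final Pattern DIGITS = Pattern.compile("(\\d+)");
    private static final String[] PREFIXES = {"", "windows-", "x-windows-", "IBM", "x-IBM"};

    /** Returns the last run of digits in a line of chcp output, or null if there is none. */
    static String parseCodePage(String chcpLine) {
        Matcher m = DIGITS.matcher(chcpLine);
        String last = null;
        while (m.find()) {
            last = m.group(1);
        }
        return last;
    }

    /** Tries each prefix in turn; falls back to the default charset if nothing matches. */
    static Charset toCharset(String codePage) {
        if (codePage != null) {
            for (String prefix : PREFIXES) {
                try {
                    return Charset.forName(prefix + codePage);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported name: try the next prefix
                }
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        String codePage = parseCodePage("Active code page: 1252");
        System.out.println(codePage + " -> " + toCharset(codePage).name());
    }
}
```

Code pages that have no matching Java charset name under any of these prefixes still end up on the default charset, exactly as in the original loop.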
Thanks for the Q! - was fun.
Actually, the charset used by tasklist usually differs from the system default.
On the other hand, it's quite safe to use the default as long as the output is limited to ASCII. Usually executable modules have only ASCII characters in their names.
So to get the correct Strings, you have to convert the (ANSI) Windows code page to the OEM code page, and pass the latter as the charset to InputStreamReader.
It seems there's no comprehensive mapping between these encodings. The following mapping can be used:
Map<String, String> ansi2oem = new HashMap<String, String>();
ansi2oem.put("windows-1250", "IBM852");
ansi2oem.put("windows-1251", "IBM866");
ansi2oem.put("windows-1252", "IBM850");
ansi2oem.put("windows-1253", "IBM869");
Charset charset = Charset.defaultCharset();
String streamCharset = ansi2oem.get(charset.name());
// No known OEM counterpart: fall back to the default charset
if (streamCharset == null) {
    streamCharset = charset.name();
}
InputStreamReader isr = new InputStreamReader(p.getInputStream(),
        streamCharset);
This approach worked for me with the windows-1251 and IBM866 pair.
To get the current OEM encoding used by Windows, you can use the GetOEMCP function. The return value depends on the "Language for non-Unicode programs" setting on the Administrative tab of the Region and Language control panel. A reboot is required to apply the change.
There are two kinds of encodings on Windows: ANSI and OEM.
The former is used by non-Unicode applications running in GUI mode.
The latter is used by Console applications. Console applications cannot display characters that cannot be represented in the current OEM encoding.
Since tasklist is a console-mode application, its output is always in the current OEM encoding.
For Western European systems, the pair is usually Windows-1252 and CP850 (US English systems use CP437 as the OEM code page).
As I am in Russia, my system has the following encodings: Windows-1251 and CP866.
If I capture the output of tasklist into a file, the file can't display Cyrillic characters correctly:
I get ЏаЁўҐв instead of Привет (Hi!) when viewed in Notepad.
And µTorrent is displayed as зTorrent.
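The garbling can be reproduced in plain Java without running tasklist at all. The following is a small sketch, assuming the IBM866 and windows-1251 charsets are available in the JVM; the class and method names (OemMojibakeDemo, reinterpret) are made up for illustration. The Cyrillic text is written with Unicode escapes so that the source file encoding does not matter.

```java
import java.nio.charset.Charset;

final class OemMojibakeDemo {
    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset oem = Charset.forName("IBM866");        // OEM code page on a Russian system
        Charset ansi = Charset.forName("windows-1251"); // ANSI code page on a Russian system
        String hello = "\u041F\u0440\u0438\u0432\u0435\u0442"; // the word "Privet" in Cyrillic

        // Bytes written in the OEM code page but read as ANSI come out garbled.
        System.out.println(reinterpret(hello, oem, ansi));
        // Reading the same bytes with the OEM charset restores the original text.
        System.out.println(reinterpret(hello, oem, oem));
        // Pure ASCII maps to the same bytes in both code pages, so it is unaffected.
        System.out.println(reinterpret("tasklist.exe", oem, ansi));
    }
}
```

This is also why using the default charset is harmless as long as the process names are pure ASCII.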
You cannot change the encoding used by tasklist.
However, it's possible to change the output encoding of cmd. If you pass the /u switch to it, it will write the output of its internal commands (such as echo) to a pipe or file in UTF-16.
cmd /c echo Hi>echo.txt
The size of echo.txt is 4 bytes: two bytes for Hi and two bytes for the new line (\r and \n).
cmd /u /c echo Hi>echo.txt
Now the size of echo.txt is 8 bytes: each character is represented with two bytes.
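The byte counts above can be checked from Java. This is only a sketch: it assumes the UTF-16 written by cmd /u is little-endian with no byte order mark, which is consistent with the 8-byte file size. The class and method names (EchoSizeDemo, encodedSize, decodeCmdUnicode) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EchoSizeDemo {
    /** Number of bytes the text occupies in the given charset. */
    static int encodedSize(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** Decodes bytes captured from "cmd /u" output, assuming UTF-16LE without a BOM. */
    static String decodeCmdUnicode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String echoed = "Hi\r\n";
        // One byte per character, as with plain "cmd /c echo"
        System.out.println(encodedSize(echoed, StandardCharsets.US_ASCII));
        // Two bytes per character, as with "cmd /u /c echo"
        System.out.println(encodedSize(echoed, StandardCharsets.UTF_16LE));
        // Decoding the UTF-16LE bytes gives back the original text
        System.out.print(decodeCmdUnicode(echoed.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```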
Why not use the Windows API via JNA, instead of spawning processes? Like this:
import com.sun.jna.platform.win32.Kernel32;
import com.sun.jna.platform.win32.Tlhelp32;
import com.sun.jna.platform.win32.WinDef;
import com.sun.jna.platform.win32.WinNT;
import com.sun.jna.win32.W32APIOptions;
import com.sun.jna.Native;
public class ListProcesses {
    public static void main(String[] args) {
        Kernel32 kernel32 = (Kernel32) Native.loadLibrary(Kernel32.class, W32APIOptions.UNICODE_OPTIONS);
        Tlhelp32.PROCESSENTRY32.ByReference processEntry = new Tlhelp32.PROCESSENTRY32.ByReference();
        WinNT.HANDLE snapshot = kernel32.CreateToolhelp32Snapshot(Tlhelp32.TH32CS_SNAPPROCESS, new WinDef.DWORD(0));
        try {
            while (kernel32.Process32Next(snapshot, processEntry)) {
                System.out.println(processEntry.th32ProcessID + "\t" + Native.toString(processEntry.szExeFile));
            }
        }
        finally {
            kernel32.CloseHandle(snapshot);
        }
    }
}
I posted a similar answer elsewhere.