Pathname too long to open?
Solution 1:
Regular DOS paths are limited to MAX_PATH (260) characters, including the string's terminating NUL character. You can exceed this limit by using an extended-length path that starts with the \\?\ prefix. This path must be a Unicode string, fully qualified, and only use backslash as the path separator. Per Microsoft's file system functionality comparison, the maximum extended path length is 32760 characters. An individual file or directory name can be up to 255 characters (127 for the UDF filesystem). Extended UNC paths are also supported as \\?\UNC\server\share.
For example:
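import os

def winapi_path(dos_path, encoding=None):
    # Python 2: decode a byte string first if an encoding is given
    if (not isinstance(dos_path, unicode) and
            encoding is not None):
        dos_path = dos_path.decode(encoding)
    path = os.path.abspath(dos_path)
    # UNC path: \\server\share\... -> \\?\UNC\server\share\...
    if path.startswith(u"\\\\"):
        return u"\\\\?\\UNC\\" + path[2:]
    # Local path: C:\... -> \\?\C:\...
    return u"\\\\?\\" + path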
path = winapi_path(os.path.join(u"JSONFiles",
                                item["category"],
                                item["action"],
                                item["source"],
                                fileName + ".json"))
>>> path = winapi_path("C:\\Temp\\test.txt")
>>> print path
\\?\C:\Temp\test.txt
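As a rough pre-check, the limits above can be tested on a plain string before deciding whether the prefix is needed. This is only a sketch: path_limit_problems is a hypothetical helper, it just counts characters, and it uses the ntpath module so it runs on any OS. The per-name limit varies by filesystem (255, or 127 for UDF), so it's a parameter.

```python
import ntpath

MAX_PATH = 260         # includes the terminating NUL
MAX_EXTENDED = 32760   # extended-length limit quoted above
MAX_COMPONENT = 255    # typical name limit; 127 for UDF


def path_limit_problems(path, max_component=MAX_COMPONENT):
    """List the limits a fully qualified DOS path (no prefix) runs into."""
    problems = []
    # len + 1 accounts for the terminating NUL counted in MAX_PATH.
    if len(path) + 1 > MAX_PATH:
        problems.append("exceeds MAX_PATH")
    if len(path) > MAX_EXTENDED:
        problems.append("exceeds extended-length limit")
    # Check each file or directory name after the drive or UNC share.
    _, rest = ntpath.splitdrive(path)
    for name in rest.replace("/", "\\").split("\\"):
        if len(name) > max_component:
            problems.append("component too long: %d chars" % len(name))
    return problems


print(path_limit_problems("C:\\Temp\\test.txt"))   # []
```

An empty list means the path fits as a regular DOS path; "exceeds MAX_PATH" is the case where the \\?\ prefix is needed.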
See the following pages on MSDN:
- Naming Files, Paths, and Namespaces
- Defining an MS-DOS Device Name
- Kernel object namespaces
Background
Windows calls the NT runtime library function RtlDosPathNameToRelativeNtPathName_U_WithStatus to convert a DOS path to a native NT path. If we open (i.e. CreateFile) the above path with a breakpoint set on that function, we can see how it handles a path that starts with the \\?\ prefix.
Breakpoint 0 hit
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus:
00007ff9`d1fb5880 4883ec58 sub rsp,58h
0:000> du @rcx
000000b4`52fc0f60 "\\?\C:\Temp\test.txt"
0:000> r rdx
rdx=000000b450f9ec18
0:000> pt
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x66:
00007ff9`d1fb58e6 c3 ret
The result replaces \\?\ with the NT DOS devices prefix \??\ and copies the string into a native UNICODE_STRING:
0:000> dS b450f9ec18
000000b4`536b7de0 "\??\C:\Temp\test.txt"
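The translation seen in the debugger can be modeled as a plain prefix substitution. This is a simplified sketch, not ntdll's implementation: verbatim_to_nt is a hypothetical name, and it only covers the \\?\ case, copying everything after the prefix untouched.

```python
VERBATIM_PREFIX = "\\\\?\\"   # the DOS-side \\?\ prefix
NT_DOS_DEVICES = "\\??\\"     # the NT-side \??\ prefix


def verbatim_to_nt(dos_path):
    """Model of the observed translation for extended-length paths only."""
    if not dos_path.startswith(VERBATIM_PREFIX):
        raise ValueError("not an extended-length path: %r" % dos_path)
    # Everything after the prefix is copied as-is: no normalization,
    # no slash conversion, no length check against MAX_PATH.
    return NT_DOS_DEVICES + dos_path[len(VERBATIM_PREFIX):]


print(verbatim_to_nt("\\\\?\\C:\\Temp\\test.txt"))   # \??\C:\Temp\test.txt
```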
If you use //?/ instead of \\?\, then the path is still limited to MAX_PATH characters. If it's too long, then RtlDosPathNameToRelativeNtPathName returns the status code STATUS_NAME_TOO_LONG (0xC0000106).
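The difference comes down to the exact prefix characters. A toy model of the length rule might look like this, assuming a hypothetical check_length helper and a NameTooLong exception standing in for the NTSTATUS failure; real path normalization does much more than count characters.

```python
MAX_PATH = 260
STATUS_NAME_TOO_LONG = 0xC0000106
VERBATIM_PREFIX = "\\\\?\\"   # backslash, backslash, '?', backslash


class NameTooLong(Exception):
    """Stand-in for the STATUS_NAME_TOO_LONG failure."""

    def __init__(self, path):
        super().__init__("STATUS_NAME_TOO_LONG (0x%08X)" % STATUS_NAME_TOO_LONG)
        self.status = STATUS_NAME_TOO_LONG
        self.path = path


def check_length(path):
    # Only the exact backslash prefix bypasses the MAX_PATH limit.
    if path.startswith(VERBATIM_PREFIX):
        return path
    # "//?/" and any other spelling is treated as an ordinary DOS path.
    if len(path) + 1 > MAX_PATH:
        raise NameTooLong(path)
    return path
```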
If you use \\?\ for the prefix but use slashes in the rest of the path, Windows will not translate the slashes to backslashes for you:
Breakpoint 0 hit
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus:
00007ff9`d1fb5880 4883ec58 sub rsp,58h
0:000> du @rcx
0000005b`c2ffbf30 "\\?\C:/Temp/test.txt"
0:000> r rdx
rdx=0000005bc0b3f068
0:000> pt
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x66:
00007ff9`d1fb58e6 c3 ret
0:000> dS 5bc0b3f068
0000005b`c3066d30 "\??\C:/Temp/test.txt"
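Since nothing after \\?\ gets normalized, code that builds extended paths should normalize the separators itself. A minimal sketch, assuming the input is already fully qualified: to_extended_path is a hypothetical name, and it uses ntpath.normpath instead of os.path.abspath so it works as a pure string operation on any OS.

```python
import ntpath

VERBATIM_PREFIX = "\\\\?\\"


def to_extended_path(path):
    """Return an extended-length path for a fully qualified DOS or UNC path."""
    if path.startswith(VERBATIM_PREFIX):
        return path                      # already extended: leave it untouched
    path = ntpath.normpath(path)         # "/" -> "\", collapses "." and ".."
    drive, rest = ntpath.splitdrive(path)
    # Reject relative ("Temp\x") and drive-relative ("C:x") paths.
    if not drive or (not drive.startswith("\\\\") and not rest.startswith("\\")):
        raise ValueError("path must be fully qualified: %r" % path)
    if drive.startswith("\\\\"):         # UNC: \\server\share\... -> \\?\UNC\server\share\...
        return VERBATIM_PREFIX + "UNC\\" + path[2:]
    return VERBATIM_PREFIX + path        # local: C:\... -> \\?\C:\...


print(to_extended_path("C:/Temp/test.txt"))   # \\?\C:\Temp\test.txt
```

Returning an already-prefixed path unchanged avoids turning \\?\C:\... into a bogus \\?\UNC\?\C:\... path.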
Forward slash is a valid object name character in the NT namespace. It's reserved by Microsoft filesystems, but you can use a forward slash in other named kernel objects, which get stored in \BaseNamedObjects or \Sessions\[session number]\BaseNamedObjects. Also, I don't think the I/O manager enforces the policy on reserved characters in device and filenames. It's up to the device. Maybe someone out there has a Windows device that implements a namespace that allows forward slash in names. At the very least you can create DOS device names that contain a forward slash. For example:
>>> import ctypes, os
>>> kernel32 = ctypes.WinDLL('kernel32')
>>> kernel32.DefineDosDeviceW(0, u'My/Device', u'C:\\Temp')
>>> os.path.exists(u'\\\\?\\My/Device\\test.txt')
True
You may be wondering what \?? signifies. This used to be an actual directory for DOS device links in the object namespace, but starting with NT 5 (or NT 4 with Terminal Services) this became a virtual prefix. The object manager handles this prefix by first checking the logon session's DOS device links in the directory \Sessions\0\DosDevices\[LOGON_SESSION_ID] and then checking the system-wide DOS device links in the \Global?? directory.
Note that the former is a logon session, not a Windows session. The logon session directories are all under the DosDevices directory of Windows session 0 (i.e. the services session in Vista+). Thus if you have a mapped drive for a non-elevated logon, you'll discover that it's not available in an elevated command prompt, because your elevated token is actually for a different logon session.
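The lookup order can be sketched with two dictionaries standing in for the logon session's DosDevices directory and \Global??. This is a toy model with hypothetical names and data; the real object manager does case-insensitive lookups in object directories, not dict lookups.

```python
def resolve_dos_device(name, logon_session_links, global_links):
    # 1) the caller's logon-session DosDevices directory
    if name in logon_session_links:
        return logon_session_links[name]
    # 2) the system-wide Global?? directory
    if name in global_links:
        return global_links[name]
    raise KeyError(name)


# Hypothetical data: the mapped drive Z: exists only in the non-elevated
# logon session; its target string is a placeholder.
global_links = {"C:": "\\Device\\HarddiskVolume2"}
standard_logon = {"Z:": "<redirector target>"}
elevated_logon = {}

print(resolve_dos_device("Z:", standard_logon, global_links))  # <redirector target>
print(resolve_dos_device("C:", elevated_logon, global_links))  # \Device\HarddiskVolume2
```

Looking up "Z:" with elevated_logon raises KeyError, which mirrors the mapped drive that is missing in an elevated command prompt.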
An example of a DOS device link is \Global??\C: => \Device\HarddiskVolume2. In this case the DOS C: drive is actually a symbolic link to the HarddiskVolume2 device.
Here's a brief overview of how the system handles parsing a path to open a file. Given we're calling WinAPI CreateFile, it stores the translated NT UNICODE_STRING in an OBJECT_ATTRIBUTES structure and calls the system function NtCreateFile.
0:000> g
Breakpoint 1 hit
ntdll!NtCreateFile:
00007ff9`d2023d70 4c8bd1 mov r10,rcx
0:000> !obja @r8
Obja +000000b450f9ec58 at 000000b450f9ec58:
Name is \??\C:\Temp\test.txt
OBJ_CASE_INSENSITIVE
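For reference, here's a ctypes sketch of the two structures involved, following the UNICODE_STRING and OBJECT_ATTRIBUTES field layouts declared in the Windows headers. It only builds the structures (make_object_attributes is a hypothetical helper) and doesn't call NtCreateFile. The Length fields are byte counts of UTF-16 data, and c_wchar_p only matches PWSTR on Windows, where wchar_t is 16 bits.

```python
import ctypes

OBJ_CASE_INSENSITIVE = 0x00000040


class UNICODE_STRING(ctypes.Structure):
    # Length/MaximumLength are byte counts; Length excludes the terminating NUL.
    _fields_ = [("Length", ctypes.c_ushort),
                ("MaximumLength", ctypes.c_ushort),
                ("Buffer", ctypes.c_wchar_p)]


class OBJECT_ATTRIBUTES(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint32),                  # ULONG
                ("RootDirectory", ctypes.c_void_p),           # HANDLE
                ("ObjectName", ctypes.POINTER(UNICODE_STRING)),
                ("Attributes", ctypes.c_uint32),              # ULONG
                ("SecurityDescriptor", ctypes.c_void_p),
                ("SecurityQualityOfService", ctypes.c_void_p)]


def make_object_attributes(nt_path):
    name = UNICODE_STRING()
    nbytes = len(nt_path.encode("utf-16-le"))
    name.Length = nbytes
    name.MaximumLength = nbytes + 2        # room for the NUL terminator
    name.Buffer = nt_path
    oa = OBJECT_ATTRIBUTES()
    oa.Length = ctypes.sizeof(OBJECT_ATTRIBUTES)
    oa.ObjectName = ctypes.pointer(name)   # no RootDirectory: path is absolute
    oa.Attributes = OBJ_CASE_INSENSITIVE
    return oa, name                        # keep name alive alongside oa


oa, name = make_object_attributes("\\??\\C:\\Temp\\test.txt")
print(oa.ObjectName.contents.Length)       # 40
```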
NtCreateFile calls the I/O manager function IoCreateFile, which in turn calls the undocumented object manager API ObOpenObjectByName. This does the work of parsing the path. The object manager starts with \??\C:\Temp\test.txt. Then it replaces that with \Global??\C:\Temp\test.txt. Next it parses up to the C: symbolic link and has to start over (reparse) with the final path \Device\HarddiskVolume2\Temp\test.txt.
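The symbolic link reparse can be modeled as a loop that rewrites the path whenever a prefix matches a link, then starts over. This is a simplified sketch: parse_object_path and the reparse cap are hypothetical, and it skips the logon-session lookup by mapping \?? straight to \Global??.

```python
MAX_REPARSE = 32   # hypothetical cap to stop link loops


def parse_object_path(path, symlinks):
    # Simplification: map \??\ directly to \Global??\ (no logon-session lookup).
    if path.startswith("\\??\\"):
        path = "\\Global??\\" + path[4:]
    for _ in range(MAX_REPARSE):
        for link, target in symlinks.items():
            if path == link or path.startswith(link + "\\"):
                # Substitute the link target and start parsing over (reparse).
                path = target + path[len(link):]
                break
        else:
            return path   # no link matched: parsing reached a real object
    raise RuntimeError("too many reparses")


links = {"\\Global??\\C:": "\\Device\\HarddiskVolume2"}
print(parse_object_path("\\??\\C:\\Temp\\test.txt", links))
# \Device\HarddiskVolume2\Temp\test.txt
```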
Once the object manager gets to the HarddiskVolume2 device object, parsing is handed off to the I/O manager, which implements the Device object type. The ParseProcedure of an I/O Device creates the File object and an I/O Request Packet (IRP) with the major function code IRP_MJ_CREATE (an open/create operation) to be processed by the device stack. This is sent to the device driver via IoCallDriver. If the device implements reparse points (e.g. junctions/mount points, symbolic links, etc.) and the path contains a reparse point, then the resolved path has to be resubmitted to the object manager to be parsed from the start.
The device driver will use the SeChangeNotifyPrivilege (almost always present and enabled) of the process token (or the thread token if impersonating) to bypass access checks while traversing directories. However, ultimately access to the device and target file has to be allowed by a security descriptor, which is verified via SeAccessCheck. The exception is simple filesystems such as FAT32, which don't support file security.
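As a toy decision model of these checks: may_open is a hypothetical function that takes booleans in place of real tokens and security descriptors, whereas actual access checks compare requested access masks against DACLs.

```python
def may_open(device_allows, target_allows, dirs_allow_traverse,
             has_change_notify=True, fs_has_security=True):
    """Toy model of the checks described above (not the real algorithm)."""
    if not device_allows:              # the device's security descriptor applies
        return False
    if not fs_has_security:            # e.g. FAT32: no per-file security to check
        return True
    # SeChangeNotifyPrivilege ("bypass traverse checking") skips per-directory checks.
    if not has_change_notify and not all(dirs_allow_traverse):
        return False
    return target_allows
```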
Solution 2:
Below is a Python 3 version of @Eryk Sun's solution.
import os

def winapi_path(dos_path, encoding=None):
    # bytes must be decoded first; fall back to the filesystem encoding
    if isinstance(dos_path, bytes):
        dos_path = (dos_path.decode(encoding) if encoding is not None
                    else os.fsdecode(dos_path))
    path = os.path.abspath(dos_path)
    if path.startswith("\\\\"):
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path

# Python 3 renamed the unicode type to str, and the old str type was replaced
# by bytes, so the Python 2 version fails with
# NameError: name 'unicode' is not defined