Pathname too long to open?

Solution 1:

Regular DOS paths are limited to MAX_PATH (260) characters, including the string's terminating NUL character. You can exceed this limit by using an extended-length path that starts with the \\?\ prefix. This path must be a Unicode string, fully qualified, and only use backslash as the path separator. Per Microsoft's file system functionality comparison, the maximum extended path length is 32760 characters. A individual file or directory name can be up to 255 characters (127 for the UDF filesystem). Extended UNC paths are also supported as \\?\UNC\server\share.

For example:

import os

def winapi_path(dos_path, encoding=None):
    if (not isinstance(dos_path, unicode) and 
        encoding is not None):
        dos_path = dos_path.decode(encoding)
    path = os.path.abspath(dos_path)
    if path.startswith(u"\\\\"):
        return u"\\\\?\\UNC\\" + path[2:]
    return u"\\\\?\\" + path

path = winapi_path(os.path.join(u"JSONFiles", 
                                item["category"],
                                item["action"], 
                                item["source"], 
                                fileName + ".json"))
>>> path = winapi_path("C:\\Temp\\test.txt")
>>> print path
\\?\C:\Temp\test.txt

See the following pages on MSDN:

  • Naming Files, Paths, and Namespaces
  • Defining an MS-DOS Device Name
  • Kernel object namespaces

Background

Windows calls the NT runtime library function RtlDosPathNameToRelativeNtPathName_U_WithStatus to convert a DOS path to a native NT path. If we open (i.e. CreateFile) the above path with a breakpoint set on the latter function, we can see how it handles a path that starts with the \\?\ prefix.

Breakpoint 0 hit
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus:
00007ff9`d1fb5880 4883ec58        sub     rsp,58h
0:000> du @rcx
000000b4`52fc0f60  "\\?\C:\Temp\test.txt"
0:000> r rdx
rdx=000000b450f9ec18
0:000> pt
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x66:
00007ff9`d1fb58e6 c3              ret

The result replaces \\?\ with the NT DOS devices prefix \??\, and copies the string into a native UNICODE_STRING:

0:000> dS b450f9ec18
000000b4`536b7de0  "\??\C:\Temp\test.txt"

If you use //?/ instead of \\?\, then the path is still limited to MAX_PATH characters. If it's too long, then RtlDosPathNameToRelativeNtPathName returns the status code STATUS_NAME_TOO_LONG (0xC0000106).

If you use \\?\ for the prefix but use slash in the rest of the path, Windows will not translate the slash to backslash for you:

Breakpoint 0 hit
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus:
00007ff9`d1fb5880 4883ec58        sub     rsp,58h
0:000> du @rcx
0000005b`c2ffbf30  "\\?\C:/Temp/test.txt"
0:000> r rdx
rdx=0000005bc0b3f068
0:000> pt
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x66:
00007ff9`d1fb58e6 c3              ret
0:000> dS 5bc0b3f068
0000005b`c3066d30  "\??\C:/Temp/test.txt"

Forward slash is a valid object name character in the NT namespace. It's reserved by Microsoft filesystems, but you can use a forward slash in other named kernel objects, which get stored in \BaseNamedObjects or \Sessions\[session number]\BaseNamedObjects. Also, I don't think the I/O manager enforces the policy on reserved characters in device and filenames. It's up to the device. Maybe someone out there has a Windows device that implements a namespace that allows forward slash in names. At the very least you can create DOS device names that contain a forward slash. For example:

>>> kernel32 = ctypes.WinDLL('kernel32')
>>> kernel32.DefineDosDeviceW(0, u'My/Device', u'C:\\Temp')
>>> os.path.exists(u'\\\\?\\My/Device\\test.txt')
True

You may be wondering what \?? signifies. This used to be an actual directory for DOS device links in the object namespace, but starting with NT 5 (or NT 4 w/ Terminal Services) this became a virtual prefix. The object manager handles this prefix by first checking the logon session's DOS device links in the directory \Sessions\0\DosDevices\[LOGON_SESSION_ID] and then checking the system-wide DOS device links in the \Global?? directory.

Note that the former is a logon session, not a Windows session. The logon session directories are all under the DosDevices directory of Windows session 0 (i.e. the services session in Vista+). Thus if you have a mapped drive for a non-elevated logon, you'll discover that it's not available in an elevated command prompt, because your elevated token is actually for a different logon session.

An example of a DOS device link is \Global??\C: => \Device\HarddiskVolume2. In this case the DOS C: drive is actually a symbolic link to the HarddiskVolume2 device.

Here's a brief overview of how the system handles parsing a path to open a file. Given we're calling WinAPI CreateFile, it stores the translated NT UNICODE_STRING in an OBJECT_ATTRIBUTES structure and calls the system function NtCreateFile.

0:000> g
Breakpoint 1 hit
ntdll!NtCreateFile:
00007ff9`d2023d70 4c8bd1          mov     r10,rcx
0:000> !obja @r8
Obja +000000b450f9ec58 at 000000b450f9ec58:
        Name is \??\C:\Temp\test.txt
        OBJ_CASE_INSENSITIVE

NtCreateFile calls the I/O manager function IoCreateFile, which in turn calls the undocumented object manager API ObOpenObjectByName. This does the work of parsing the path. The object manager starts with \??\C:\Temp\test.txt. Then it replaces that with \Global??\C:Temp\test.txt. Next it parses up to the C: symbolic link and has to start over (reparse) the final path \Device\HarddiskVolume2\Temp\test.txt.

Once the object manager gets to the HarddiskVolume2 device object, parsing is handed off to the I/O manager, which implements the Device object type. The ParseProcedure of an I/O Device creates the File object and an I/O Request Packet (IRP) with the major function code IRP_MJ_CREATE (an open/create operation) to be processed by the device stack. This is sent to the device driver via IoCallDriver. If the device implements reparse points (e.g. junction mountpoints, symbolic links, etc) and the path contains a reparse point, then the resolved path has to be resubmitted to the object manager to be parsed from the start.

The device driver will use the SeChangeNotifyPrivilege (almost always present and enabled) of the process token (or thread if impersonating) to bypass access checks while traversing directories. However, ultimately access to the device and target file has to be allowed by a security descriptor, which is verified via SeAccessCheck. Except simple filesystems such as FAT32 don't support file security.

Solution 2:

below is Python 3 version regarding @Eryk Sun's solution.

def winapi_path(dos_path, encoding=None):
    if (not isinstance(dos_path, str) and encoding is not None): 
        dos_path = dos_path.decode(encoding)
    path = os.path.abspath(dos_path)
    if path.startswith(u"\\\\"):
        return u"\\\\?\\UNC\\" + path[2:]
    return u"\\\\?\\" + path

#Python 3 renamed the unicode type to str, the old str type has been replaced by bytes. NameError: global name 'unicode' is not defined - in Python 3