FTP directory partial listing with wildcards
I previously asked a related question, "ftp directory listing timeout. Huge number of subdirs", and got an answer.
Still, because the directory can hold hundreds of thousands of FTP objects, scanning it could take a really long time. I thought it might be possible to retrieve all the objects that begin with 'A', then 'B', and so on. As the directories are retrieved, another thread could start processing them without waiting for the entire list.
Is it possible to do an FTP directory listing with wildcards using the standard FtpWebRequest?
Solution 1:
The most recent update to the FTP specification (RFC 3659) explicitly forbids it. From section 2.2.2 of that specification, titled "Wildcarding" (emphasis mine):
For the commands defined in this specification, all pathnames are to be treated literally. That is, for a pathname given as a parameter to a command, the file whose name is identical to the pathname given is implied. No characters from the pathname may be treated as special or "magic", thus no pattern matching (other than for exact equality) between the pathname given and the files present in the NVFS of the server-FTP is permitted.
Clients that desire some form of pattern matching functionality must obtain a listing of the relevant directory, or directories, and implement their own file name selection procedures.
That said, if your server supports it, you could still use the FtpWebRequest class, but you'd have to process the response yourself to handle the list of items, as the .NET classes won't understand your server-specific extensions.
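As a rough illustration of that idea, the sketch below puts a wildcard in the last path segment of the request URI and reads the raw response lines itself. It is only a sketch under assumptions: the host ftp.example.com, the credentials, and the names WildcardListing, ReadNames and ListRaw are made up for the example, and whether the pattern is honored at all depends on the server. A server that rejects the pattern may answer with an error, which surfaces as a WebException.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

static class WildcardListing
{
    // Reads one entry per line from the listing response, skipping blank lines.
    public static List<string> ReadNames(TextReader reader)
    {
        var names = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length > 0) names.Add(line);
        }
        return names;
    }

    // Sends NLST for the given URI. A wildcard in the last path segment
    // (for example ".../incoming/A*") is a non-standard extension that the
    // server may or may not understand.
    public static List<string> ListRaw(string uri, NetworkCredential credentials)
    {
        var request = (FtpWebRequest)WebRequest.Create(uri);
        request.Method = WebRequestMethods.Ftp.ListDirectory; // NLST
        request.Credentials = credentials;

        using (var response = (FtpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return ReadNames(reader);
        }
    }
}
```

A call might look like WildcardListing.ListRaw("ftp://ftp.example.com/incoming/A*", new NetworkCredential("user", "password")). The returned strings are whatever the server sent, so any further parsing is up to the caller.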
Solution 2:
The FTP specification says that the argument to file listing commands (LIST, NLST, MLSD, etc.) is a pathname. So there should be no wildcards whatsoever.
RFC 959 (LIST + NLST):
2.2. TERMINOLOGY
...
pathname
Pathname is defined to be the character string which must be input to a file system by a user in order to identify a file. Pathname normally contains device and/or directory names, and file name specification. FTP does not yet specify a standard pathname convention. Each user must follow the file naming conventions of the file systems involved in the transfer.
...
5.3.1. FTP COMMANDS
...
LIST [<SP> <pathname>] <CRLF>
NLST [<SP> <pathname>] <CRLF>
RFC 3659 (MLSD):
2.2.2. Wildcarding
For the commands defined in this specification, all pathnames are to be treated literally. That is, for a pathname given as a parameter to a command, the file whose name is identical to the pathname given is implied. No characters from the pathname may be treated as special or "magic", thus no pattern matching (other than for exact equality) between the pathname given and the files present in the NVFS of the server-FTP is permitted.
...
7.1. Format of MLSx Requests
...
The syntax for the MLSx command is:
mlst = "MLst" [ SP pathname ] CRLF
mlsd = "MLsD" [ SP pathname ] CRLF
In practice, though, many FTP servers do support wildcards in the argument. But because the specification does not allow them, there is no standard set of supported wildcards.
vsftpd supports *, ? and {} with LIST. vsftpd does not support the modern MLSD.
proftpd supports *, ? and [], but with LIST only. It explicitly does not allow wildcards with the modern MLSD, with a comment:
RFC3659 explicitly does NOT support glob characters. So warn about this, but let the command continue as is.
pureftpd supports *, ? and [] for both LIST and MLSD.
FileZilla server supports * only, for both LIST and MLSD.
But in general, you should not rely on the FTP server to support any wildcards at all.
The only reliable approach is to retrieve a complete directory listing and filter the files locally. For example, you can use a regular expression (the Regex class).
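A minimal sketch of that local filtering follows. It assumes the server returns one bare name per line for NLST (some servers include a path prefix), and it supports only the * and ? wildcards. The names LocalFilter, WildcardToRegex, Filter and ListNames, as well as the host and credentials in the usage note, are hypothetical. The listing is read lazily, so matching names can be handed to another thread while the rest of the response is still arriving.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

static class LocalFilter
{
    // Translates a simple wildcard pattern (* and ? only) into an anchored,
    // case-sensitive regular expression.
    public static Regex WildcardToRegex(string pattern)
    {
        string body = Regex.Escape(pattern)
            .Replace(@"\*", ".*")  // * matches any run of characters
            .Replace(@"\?", ".");  // ? matches exactly one character
        return new Regex("^" + body + "$");
    }

    // Lazily yields only the names that match the wildcard pattern.
    public static IEnumerable<string> Filter(IEnumerable<string> names, string pattern)
    {
        Regex regex = WildcardToRegex(pattern);
        return names.Where(name => regex.IsMatch(name));
    }

    // Streams the complete NLST listing of a directory, one name at a time,
    // with no wildcard sent to the server.
    public static IEnumerable<string> ListNames(string directoryUri, NetworkCredential credentials)
    {
        var request = (FtpWebRequest)WebRequest.Create(directoryUri);
        request.Method = WebRequestMethods.Ftp.ListDirectory; // NLST
        request.Credentials = credentials;

        using (var response = (FtpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length > 0) yield return line;
            }
        }
    }
}
```

For example, LocalFilter.Filter(LocalFilter.ListNames("ftp://ftp.example.com/incoming/", new NetworkCredential("user", "password")), "A*") enumerates the entries starting with 'A'. The whole listing still has to be transferred, so this does not shorten the scan itself; it only lets processing overlap with the download.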