How to extract a complete list of extension types within a directory?

Within a directory, and recursively within its sub-directories (that is, every directory nested inside it is processed), how do I compile a complete list of the unique file extensions it contains?

The OS is Windows XP with all the current updates. I am okay with running a script as long as I can tell what it's doing, though I would prefer not to have to install .NET, since I really do not like it.


This batch script will do it.

@echo off

set target=%~1
if "%target%"=="" set target=%cd%

setlocal EnableDelayedExpansion

set LF=^


rem Previous two lines deliberately left blank for LF to work.

for /f "tokens=*" %%i in ('dir /b /s /a:-d "%target%"') do (
    set ext=%%~xi
    if "!ext!"=="" set ext=FileWithNoExtension
    echo !extlist! | find "!ext!:" > nul
    if not !ERRORLEVEL! == 0 set extlist=!extlist!!ext!:
)

echo %extlist::=!LF!%

endlocal

Save it as any .bat file, and run it with the command batchfile (substitute whatever you named it) to list the current directory, or specify a path with batchfile "path". It will search all subdirectories.

If you want to export to a file, use batchfile >filename.txt (or batchfile "path" >filename.txt).

Explanation

Everything before the for /f... line just sets things up: it gets the target directory to search, enables delayed expansion (which lets me update variables inside the loop), and defines a newline (LF) that I can use for neater output. Oh, and the %~1 means "get the first argument, removing quotes", which prevents doubled-up quotes - see call /? (for /? describes the same modifiers for loop variables).

The loop uses that dir /b /s /a:-d "%target%" command, grabbing a list of all files in all subdirectories under the target.

%%~xi extracts the extension out of the full paths the dir command returns.

An empty extension is replaced with "FileWithNoExtension", so you know there is such a file - if I added an empty line instead, it would not be quite as obvious.

The whole current list is sent through a find command to ensure uniqueness. The text output of the find command is sent to nul, essentially a black hole - we don't want it. Since we always append a : at the end of each entry in the list, the search query must also end with a : so that it doesn't match partial results - see comments.
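To see why the trailing : matters, here is a minimal sketch of the same check in Python. Python is an assumption on my part (it is not used anywhere in the batch script), and contains_entry and the sample list are hypothetical names made up for illustration.

```python
def contains_entry(ext_list, ext):
    """Return True if ext is already recorded in the ':'-terminated list."""
    # Mirrors the batch check: find "!ext!:" within !extlist!
    return (ext + ":") in ext_list

# Hypothetical list built so far, in the same shape the batch script uses.
ext_list = ".html:.txt:"

# Plain substring search: ".htm" is found inside ".html", a false positive.
naive_match = ".htm" in ext_list

# Searching with the trailing delimiter only matches whole entries.
delimited_match = contains_entry(ext_list, ".htm")

print(naive_match, delimited_match)
```

Without the delimiter, .htm would wrongly be treated as already listed and would never be added.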

%ERRORLEVEL% is set by the find command; a value of 0 indicates there was a match. So if it's not 0, the current extension is not on the list so far and should be added.

The final echo line prints the list, replacing my placeholders (:) with newlines so that each extension appears on its own line.
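If the batch syntax is hard to follow, the logic described above can be sketched in Python. This is only an illustration under the assumption that Python is installed; it is not part of the batch answer, and unique_extensions is a hypothetical name. It is not an exact equivalent either: os.path.splitext treats a name such as .gitignore as having no extension, which may differ from what %%~xi returns.

```python
import os

def unique_extensions(target):
    """Collect each distinct file extension under target, in first-seen order."""
    seen = []
    # os.walk visits target and every sub-directory, like dir /s.
    for _dirpath, _dirnames, filenames in os.walk(target):
        for name in filenames:
            ext = os.path.splitext(name)[1]
            if ext == "":
                # Same placeholder the batch script uses for extensionless files.
                ext = "FileWithNoExtension"
            # Only add extensions that are not on the list yet.
            if ext not in seen:
                seen.append(ext)
    return seen
```

Calling print("\n".join(unique_extensions("."))) would list the current directory, one extension per line. As with find in the batch script, the comparison is case-sensitive, so .TXT and .txt are reported separately.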


Although not strictly meeting the requirement for a batch script, I have used a single-line PowerShell script:

Get-Childitem C:\MyDirectory -Recurse | WHERE { -NOT $_.PSIsContainer } | Group Extension -NoElement | Sort Count -Desc > FileExtensions.txt

You could potentially run it from the command line/batch file:

Powershell -Command "& Get-Childitem C:\MyDirectory -Recurse | WHERE { -NOT $_.PSIsContainer } | Group Extension -NoElement | Sort Count -Desc > FileExtensions.txt"

If you remove C:\MyDirectory it will execute in the current directory.

Edit 2021-04-20: As per the comment from @ManSamVampire, if you want to find hidden files as well, you should add -Force before -Recurse in the above command.

At the end it will produce a FileExtensions.txt containing something like the following:

Count Name
----- ----
 8216 .xml
 4854 .png
 4378 .dll
 3565 .htm
  ... ...

Depending on your folder structure, you may occasionally get errors notifying you that you have a long path.

Get-ChildItem : The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.

Any subdirectories under the offending path will not be scanned either, but the results for everything else will still show.
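For comparison, the group-and-count step can be sketched in Python. Again, Python is my assumption rather than something the PowerShell one-liner needs, and extension_counts is a hypothetical name. Extensions are lower-cased to approximate Group-Object, which is case-insensitive by default.

```python
import os
from collections import Counter

def extension_counts(target):
    """Return (extension, count) pairs for files under target, most common first."""
    counts = Counter()
    # By default os.walk ignores directories it cannot list instead of stopping.
    for _dirpath, _dirnames, filenames in os.walk(target):
        for name in filenames:
            # Lower-case so that .XML and .xml fall into the same group.
            counts[os.path.splitext(name)[1].lower()] += 1
    # most_common() sorts by count, descending, like Sort Count -Desc.
    return counts.most_common()
```

Files without an extension are grouped under an empty string here.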

Notes

You will of course need PowerShell which you can grab from here. It can also run on multiple operating systems.


Here's a detailed answer using PowerShell (with Windows XP you'll have to install PowerShell):

Hey, Scripting Guy! How Can I Use Windows PowerShell to Pick Out the Unique File Extensions Used in a Collection of Files?


To list all unique extensions from cmd under the path you're in, use:

Powershell -Command "Get-ChildItem . -Include *.* -Recurse | Select-Object Extension | Sort-Object -Property Extension -Unique"