Why is PowerShell so slow?
Solution 1:
PowerShell is a program written in .NET, but when it actually runs it hands work off to many different interpreters and runtimes. It's a shell, so, just as with Bash (which is written in C), the language the shell is written in says nothing about the binaries and scripts executed within it. Executables might be .NET code, VDM/CMD commands, *nix shell commands, VBScript/CScript/WScript scripts, WMI invocations, unmanaged API calls, JAR files, or anything else. These choices are what affect the performance of code running within the shell, not the language the shell is written in.
Now, it sounds like you are having difficulties with the implementation of a specific command, so the better question is: why is ls slow to sort when invoked from within PowerShell? Digging deeper, we find that ls is an alias for Get-ChildItem, which emits System.IO.FileInfo and System.IO.DirectoryInfo objects; when you capture its output in a variable, PowerShell collects them into an object array (Object[]).
PS C:\Windows\system32> $x=Get-ChildItem ./
PS C:\Windows\system32> $x.GetType()

IsPublic IsSerial Name            BaseType
-------- -------- ----            --------
True     True     Object[]        System.Array

PS C:\Windows\system32> $x[1].GetType()

IsPublic IsSerial Name            BaseType
-------- -------- ----            --------
True     True     DirectoryInfo   System.IO.FileSystemInfo

PS C:\Windows\system32>
You can take the ls result and pipe it into a Sort-Object call, and it will behave largely the way an IEnumerable does, with objects flowing through the pipeline one at a time.
Note that IEnumerable does nothing for performance by itself. You may be confusing it with IQueryable, which describes a query but does not run it until the very last moment, typically after it has been decorated with filtering and sorting operations, so that a LINQ provider such as Entity Framework can translate the whole thing into an efficient query against the data source. (LINQ to Objects, by contrast, works over plain IEnumerable.) In this case, since Get-ChildItem offers no optimized query engine or indexed data source, you cannot really compare modern database operations with directory listings.
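Deferred execution only postpones work; it does not remove it. The sketch below uses Python generators as a stand-in for a lazily enumerated .NET IEnumerable (an analogy, not PowerShell or LINQ itself; `make_source` and `produced` are hypothetical names). Building the filtered pipeline does nothing, but consuming it still forces the source to produce every item.

```python
from typing import Iterator, List


def make_source(n: int, log: List[int]) -> Iterator[int]:
    """Yield 0..n-1, recording each item in log as it is actually produced."""
    for i in range(n):
        log.append(i)  # record that this item was generated
        yield i


produced: List[int] = []

# Building the pipeline runs nothing yet: generators are lazy.
pipeline = (x for x in make_source(1000, produced) if x % 2 == 0)
print("items produced before consuming:", len(produced))  # 0

# Consuming the pipeline makes the source produce every single item.
evens = list(pipeline)
print("items produced after consuming:", len(produced))   # 1000
print("items kept by the filter:", len(evens))            # 500
```

The filter kept half the items, but the source still had to produce all of them. Laziness only pays off when something downstream can skip work entirely, which is what a query provider behind IQueryable is able to do.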
So, ultimately, try something like:
ls ./ -recurse | Sort-Object Name -descending
For me, targeting System32, this takes about 20 seconds to process and sort 54430 files.
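For comparison outside PowerShell, here is a rough Python analogue of `ls ./ -recurse | Sort-Object Name -descending`. It walks a directory tree, keeps only each entry's name (both files and directories, as Get-ChildItem -Recurse returns both), and sorts the names in descending order. This is a sketch under assumptions: it returns bare names rather than rich file objects, and `list_names_desc` is a hypothetical helper.

```python
import os
from typing import List


def list_names_desc(root: str) -> List[str]:
    """Collect the names of all files and directories under root, sorted descending."""
    names: List[str] = []
    for _dirpath, dirnames, filenames in os.walk(root):
        names.extend(dirnames)   # subdirectories, like DirectoryInfo entries
        names.extend(filenames)  # files, like FileInfo entries
    # The sort needs the complete list, so it can only run after the walk ends.
    return sorted(names, reverse=True)
```

Usage would be something like `list_names_desc(r"C:\Windows\System32")`. Python compares strings by code point, whereas Sort-Object compares case-insensitively and culture-aware by default, so the exact order can differ from PowerShell's.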
Finally, note that you take a big performance hit when you try to enumerate a directory you don't have access to, so make sure you are not recursing into places you are not allowed to go, or you will suffer a wait of 2+ seconds for each one.
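The same advice applies in any language: decide up front which directories to skip and prune them before descending, rather than paying for each failed access. A minimal Python sketch, assuming the caller supplies a set of directory names to exclude (`walk_pruned` and `skip` are hypothetical names): it removes those names from `os.walk`'s `dirnames` list so they are never entered, and uses `onerror` to report any other directory that could not be listed.

```python
import os
from typing import Iterable, List, Set


def walk_pruned(root: str, skip: Iterable[str]) -> List[str]:
    """Return file paths under root, never descending into directories named in skip."""
    skip_set: Set[str] = set(skip)
    unreadable: List[str] = []
    files: List[str] = []

    def on_error(err: OSError) -> None:
        # os.walk calls this when a directory cannot be listed (e.g. access denied).
        unreadable.append(str(err.filename))

    for dirpath, dirnames, filenames in os.walk(root, topdown=True, onerror=on_error):
        # Editing dirnames in place stops os.walk from entering the pruned directories.
        dirnames[:] = [d for d in dirnames if d not in skip_set]
        files.extend(os.path.join(dirpath, f) for f in filenames)

    for path in unreadable:
        print("skipped (unreadable):", path)
    return files
```

Pruning happens before a directory is opened, so names in `skip` never cost an access attempt; `onerror` only covers directories you did not anticipate.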
Hope that helps.
Solution 2:
PowerShell is built to be convenient rather than fast. It's a tradeoff: it does work behind the scenes so that the user has to do less, and doing more work makes it slower.
Notice that your PowerShell code is one line, yet it does more than your C# code does in 15 lines.
It does more even though you aren't using the extra results.
ls on Linux outputs plain strings, and strings are simple and fast. Your .NET code doesn't even keep the filename; it keeps only the size, and numbers are smaller still, so that's even faster.
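To make the comparison concrete, here is a Python approximation of the leaner approach described for the C# code. The C# itself isn't shown, so its shape here is an assumption: read each file's size as a plain integer, discard everything else, and sort only the integers. `sizes_desc` is a hypothetical name.

```python
import os
from typing import List


def sizes_desc(root: str) -> List[int]:
    """Return the sizes of all files under root, largest first, keeping nothing else."""
    sizes: List[int] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Keep only the integer size; the name and timestamps are thrown away.
            sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    return sorted(sizes, reverse=True)
```

Because only integers survive, the result is cheap to store and cheap to sort, but it can no longer answer any question about names or dates.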
ls in PowerShell returns [FileInfo] and [DirectoryInfo] objects. Each one has to be created and filled in with details about the file, such as CreationTime, LastWriteTime, Extension and Length; the time fields have to become [DateTime] values; and PowerShell also attaches extra properties such as PSPath and PSIsContainer to every object.
That is a lot more work for every file. You pay that cost to keep other options open, even when you aren't using them. With a simple change, and without reaching for other tools, your PowerShell code could instead report the size of the first 10 files created in January and still be one line, whereas the C# code would have to be extensively rewritten to query the creation time, carry both the creation time and the size through to the sort, and so on.
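The flexibility argument can be sketched the same way. Assuming each file has already been turned into a record carrying several fields (a stand-in for [FileInfo]; the `FileRecord` class, its field names, and `first_sizes_created_in` are hypothetical), the January question becomes a small change to one expression, because the creation time was kept alongside the size.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class FileRecord:
    """Hypothetical stand-in for a rich file object such as [FileInfo]."""
    name: str
    length: int
    creation_time: datetime


def first_sizes_created_in(records: List[FileRecord], month: int, count: int) -> List[int]:
    """Sizes of the first `count` files, by creation time, created in `month`."""
    in_month = [r for r in records if r.creation_time.month == month]
    in_month.sort(key=lambda r: r.creation_time)  # oldest first
    return [r.length for r in in_month[:count]]
```

Calling `first_sizes_created_in(records, 1, 10)` answers the January question. With a sizes-only list like the one the C# code keeps, the same question cannot be answered without going back to the file system for the creation times.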
The reason you don't see results immediately is that you pipe into sort (| sort). That makes streaming impossible: what if you started outputting results immediately, but the last file found needs to sort to the front? Then the output would be wrong. IEnumerable can do nothing about this; | sort has to gather up every input before it can output anything.
Your sort is faster because it's sorting small things. Your .NET code can do the sorting itself more quickly because it's sorting an enumerable of [long]; it doesn't have to do any property lookups.
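The point about | sort holding everything back can be shown with a counter. In this Python sketch (an analogy for the PowerShell pipeline; `numbered`, `pass_through` and `sort_stage` are hypothetical names), a pass-through stage can hand back its first item after reading one input, while a sorting stage must read every input before it can emit anything.

```python
from typing import Iterator, List


def numbered(values: List[int], seen: List[int]) -> Iterator[int]:
    """Yield values one at a time, recording each one as it is read."""
    for v in values:
        seen.append(v)
        yield v


def pass_through(items: Iterator[int]) -> Iterator[int]:
    # Streams: each item goes out as soon as it comes in.
    for item in items:
        yield item


def sort_stage(items: Iterator[int]) -> Iterator[int]:
    # Cannot emit anything until every input is known: the last item might sort first.
    yield from sorted(items)


data = [5, 3, 9, 1]

seen_stream: List[int] = []
first_stream = next(pass_through(numbered(data, seen_stream)))
print(first_stream, len(seen_stream))  # 5 1

seen_sort: List[int] = []
first_sorted = next(sort_stage(numbered(data, seen_sort)))
print(first_sorted, len(seen_sort))    # 1 4
```

Asking the sorting stage for just its first result still drains the whole input, which is why nothing appears on screen until the directory walk has finished.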
Overall, your code does a lot less, and doing less takes less time. But it took you longer to write and is less flexible and more narrowly focused. A tradeoff.
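"Doing less takes less time" can be felt in miniature by timing a sort of bare integers against a sort of records keyed on one attribute. This Python sketch is only indicative: timings vary by machine and interpreter, and PowerShell's Sort-Object adds its own overhead on top of property access. `Rec` is a hypothetical record type.

```python
import random
import timeit
from operator import attrgetter
from typing import List


class Rec:
    """Hypothetical file-like record with a few fields, sorted by one of them."""
    __slots__ = ("name", "length", "mtime")

    def __init__(self, name: str, length: int, mtime: float) -> None:
        self.name = name
        self.length = length
        self.mtime = mtime


random.seed(0)
sizes: List[int] = [random.randrange(1_000_000) for _ in range(50_000)]
records: List[Rec] = [Rec(f"f{i}", s, 0.0) for i, s in enumerate(sizes)]

# Sorting plain integers compares the values directly.
t_ints = timeit.timeit(lambda: sorted(sizes, reverse=True), number=5)
# Sorting records must first look up each record's length attribute.
t_recs = timeit.timeit(
    lambda: sorted(records, key=attrgetter("length"), reverse=True), number=5
)
print(f"ints: {t_ints:.3f} s, records: {t_recs:.3f} s")
```

Both sorts produce the same order of sizes; the record version simply carries more data and does an extra lookup per item to get there.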