Select-Object -First affects prior cmdlet in the pipeline

The PowerShell Strongly Encouraged Development Guidelines state that cmdlets should Implement for the Middle of a Pipeline, but I suspect that isn't doable for a parameter such as -Last of Select-Object, simply because you can't determine the last entry up front. In other words: you need to wait for the input stream to finish before you can determine the last entry.
To prove this, I wrote a little script:

$Data = 1..5 | ForEach-Object {[pscustomobject]@{Index = "$_"}}

$Data | ForEach-Object { Write-Host 'Before' $_.Index; $_ } |
Select-Object -Last 5 | ForEach-Object { Write-Host 'After' $_.Index }

and compared this to Select-Object *:

$Data | ForEach-Object { Write-Host 'Before' $_.Index; $_ } |
Select-Object * | ForEach-Object { Write-Host 'After' $_.Index }

With results (left: Select-Object -Last 5, right: Select-Object *):

-Last 5  *
-------  -
Before 1 Before 1
Before 2 After 1
Before 3 Before 2
Before 4 After 2
Before 5 Before 3
After 1  After 3
After 2  Before 4
After 3  After 4
After 4  Before 5
After 5  After 5

Although this isn't documented, I think I can conclude from this that the -Last parameter indeed chokes the pipeline.
This is not a big deal, but I also tested it against the -First parameter and got some disturbing results. To show this more clearly, I am not selecting all the objects but just the **-First 2**:

$Data | ForEach-Object { Write-Host 'Before' $_.Index; $_ } |
Select-Object -First 2 | ForEach-Object { Write-Host 'After' $_.Index }

Before 1
After 1
Before 2
After 2

Note that with the -First 2 parameter, not only does the following cmdlet show two objects, but the preceding cmdlet (ForEach-Object { Write-Host 'Before' $_.Index; $_ }) also shows only 2 objects (instead of 5).

Apparently, the -First parameter directly affects the prior cmdlet, which is different from, e.g., using the -Last 2 parameter:

$Data | ForEach-Object { Write-Host 'Before' $_.Index; $_ } |
Select-Object -Last 2 | ForEach-Object { Write-Host 'After' $_.Index }

Before 1
Before 2
Before 3
Before 4
Before 5
After 4
After 5

This also happens when using Out-Host instead of Write-Host, or when sending the results to a variable, like:

$Before = ""; $After = ""
$Data | ForEach-Object { $Before += $_.Index; $_ } | Select-Object -First 2 | ForEach-Object { $After += $_.Index }
$Before
$After

This shows on both Windows PowerShell (5.1.18362.628) and PowerShell Core (7.0.0).
Is this a bug?


Solution 1:

Select-Object affects the upstream commands by cheating

That might sound like a joke, but it's not.

To optimize pipeline streaming performance, Select-Object uses a trick not available to a regular user developing a cmdlet: it throws a StopUpstreamCommandsException.

Once caught, the runtime (indirectly) calls StopProcessing() on all the preceding commands, but does not treat it as a terminating error event, allowing the downstream cmdlets to continue executing.
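The effect can be observed without Write-Host by recording what each side of Select-Object actually handles. This is only a minimal sketch; the variable names `$processed` and `$selected` are made up for the illustration:

```powershell
# Record every item the upstream ForEach-Object actually processes.
$processed = [System.Collections.Generic.List[int]]::new()

$selected = 1..5 |
    ForEach-Object { $processed.Add($_); $_ } |   # upstream: stopped early
    Select-Object -First 2 |
    ForEach-Object { $_ * 10 }                    # downstream: keeps running

"Upstream processed: $($processed -join ', ')"    # Upstream processed: 1, 2
"Downstream emitted: $($selected -join ', ')"     # Downstream emitted: 10, 20
```

The upstream block never sees items 3 through 5, while the downstream block still receives and transforms both selected items.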

This early stop is extremely useful when you have a slow or computationally heavy command early in a pipeline:

# this will only take ~3 seconds to return with the StopUpstreamCommand behavior
# but would have incurred 2 extra seconds of "waiting to discard" otherwise
Measure-Command {
  1..5 |ForEach-Object { Start-Sleep -Seconds 1; $_ } |Select-Object -First 3
}
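
If the upstream commands need to run to completion anyway (for example because they have side effects you rely on), Select-Object has a -Wait switch that turns this optimization off. A small sketch, again with made-up variable names, assuming the side effect of interest is simply recording each item:

```powershell
# With -Wait, Select-Object lets the upstream commands generate all objects.
$processed = [System.Collections.Generic.List[int]]::new()

$selected = 1..5 |
    ForEach-Object { $processed.Add($_); $_ } |
    Select-Object -First 2 -Wait

"Upstream processed: $($processed.Count) items"   # Upstream processed: 5 items
"Selected: $($selected -join ', ')"               # Selected: 1, 2
```

The result is still only the first two objects, but the preceding ForEach-Object now processes all five, at the cost of the time saved by stopping early.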