Can the following Nested foreach loop be simplified in PowerShell?

Solution 1:

To give your task a name: You're looking for the relative complement aka set difference between two arrays:

In set-theory notation, it would be $ItemArray \ $ExclusionArray, i.e., those elements in $ItemArray that aren't also in $ExclusionArray.

This related question is looking for the symmetric difference between two sets, i.e., the set of elements that are unique to either side - at least that's what the Compare-Object-based solutions there implement, but only under the assumption that each array has no duplicates.
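
To make the distinction concrete, here is a minimal sketch using Compare-Object. The sample values are assumptions chosen for illustration ('e' exists only in the exclusion array), and, as noted above, the approach assumes that neither array contains duplicates.

```powershell
# Hypothetical sample data; 'e' exists only in the exclusion array.
$ItemArray      = 'a', 'b', 'c', 'd'
$ExclusionArray = 'b', 'c', 'e'

# Symmetric difference: elements that are unique to either side.
$symmetricDiff = (Compare-Object $ItemArray $ExclusionArray).InputObject

# Relative complement: keep only the elements unique to the reference
# (left-hand) array, which Compare-Object marks with SideIndicator '<='.
$relativeComplement = (Compare-Object $ItemArray $ExclusionArray |
  Where-Object SideIndicator -eq '<=').InputObject
```

With this data, the symmetric difference contains 'a', 'd' and 'e', whereas the relative complement contains only 'a' and 'd'.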


EylM's helpful answer is conceptually simple and concise.

A potential problem is performance: a lookup in the exclusion array must be performed for each element in the input array.

With small arrays, this likely won't matter in practice.

With larger arrays, LINQ offers a substantially faster solution:

Note: In order to benefit from the LINQ solution, your arrays should be in memory already, and the benefit is greater the larger the exclusion array is. If your input is streaming via the pipeline, the overhead from executing the pipeline may make attempts to optimize array processing pointless or even counterproductive, in which case sticking with the native PowerShell solution makes sense - see iRon's answer.

# Declare the arrays as [string[]]
# so that calling the LINQ method below works as-is.
# (You could also cast to [string[]] ad hoc.)
[string[]] $ItemArray = 'a','b','c','d'
[string[]] $exclusionArray = 'b','c'

# Return only those elements in $ItemArray that aren't also in $exclusionArray
# and convert the result (a lazy enumerable of type [IEnumerable[string]])
# back to an array to force its evaluation
# (If you directly enumerate the result in a pipeline, that step isn't needed.)
[string[]] [Linq.Enumerable]::Except($ItemArray, $exclusionArray) # -> 'a', 'd'

Note the need to use the LINQ types explicitly, via their static methods, because PowerShell, as of v7, has no support for extension methods. However, there is a proposal on GitHub to add such support; this related proposal asks for improved support for calling generic methods.

See this answer for an overview of how to currently call LINQ methods from PowerShell.
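
One thing to be aware of when swapping one approach for the other: the two are not exact equivalents. [Linq.Enumerable]::Except() has set semantics, so it returns distinct elements only, and with strings it compares case-sensitively by default, whereas -notcontains is case-insensitive and the filtering approach keeps duplicates. The following sketch, with made-up sample data, illustrates the difference:

```powershell
# Hypothetical sample data with a duplicate ('a') and a case variation ('B').
[string[]] $first  = 'a', 'a', 'b', 'B', 'd'
[string[]] $second = @('b')

# LINQ: distinct elements only, case-SENSITIVE comparison.
$viaLinq = [string[]] [Linq.Enumerable]::Except($first, $second)   # -> 'a', 'B', 'd'

# PowerShell: duplicates are kept, case-INSENSITIVE comparison.
$viaWhere = $first.Where({ $second -notcontains $_ })              # -> 'a', 'a', 'd'
```

If your arrays contain neither duplicates nor case variations, as in the question, both approaches return the same elements.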


Performance comparison:

Tip of the hat to iRon for his input.

The following benchmark code uses the Time-Command function to compare the two approaches, using arrays with roughly 4000 and 2000 elements, respectively, which - as in the question - differ by only 2 elements.

Note that in order to level the playing field, the .Where() array method (PSv4+) is used instead of the pipeline-based Where-Object cmdlet, as .Where() is faster with arrays already in memory.

Here are the results averaged over 10 runs, from a single-core Windows 10 VM running Windows PowerShell v5.1; note the relative performance, as shown in the Factor column:

Factor Secs (10-run avg.) Command                              TimeSpan
------ ------------------ -------                              --------
1.00   0.046              # LINQ...                            00:00:00.0455381
8.40   0.382              # Where ... -notContains...          00:00:00.3824038

The LINQ solution is substantially faster - by a factor of 8+ (though even the much slower solution only took about 0.4 seconds to run).

It seems that the performance gap is even wider in PowerShell Core, where I've seen a factor of around 19 with v7.0.0-preview.4; interestingly, both tests ran faster individually than in Windows PowerShell.

Benchmark code:

# Script block to initialize the arrays.
# The filler arrays are randomized to eliminate caching effects in LINQ.
$init = {
  $fillerArray = 1..1000 | Get-Random -Count 1000
  [string[]] $ItemArray = $fillerArray + 'a' + $fillerArray + 'b' + $fillerArray + 'c' + $fillerArray + 'd'
  [string[]] $exclusionArray = $fillerArray + 'b' + $fillerArray + 'c'
}

# Compare the average of 10 runs.
Time-Command -Count 10 { # LINQ
  . $init
  $result = [string[]] [Linq.Enumerable]::Except($ItemArray, $exclusionArray)
}, { # Where ... -notContains
  . $init
  $result = $ItemArray.Where({ $exclusionArray -notcontains $_ })
}
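
A possible middle ground, not part of the benchmark above: keep the .Where() filter, but replace the linear -notcontains scan of the exclusion array with a hash-based lookup. This is a sketch only, assuming PSv5+ (for the ::new() syntax); the $exclusionSet variable name and the choice of comparer are my own. OrdinalIgnoreCase is used to approximate the case-insensitivity of -notcontains.

```powershell
[string[]] $ItemArray      = 'a', 'b', 'c', 'd'
[string[]] $exclusionArray = 'b', 'c'

# Build the lookup set once, up front; each later membership test is then
# a hash lookup rather than a scan of the entire exclusion array.
# (Assumption: OrdinalIgnoreCase is close enough to -notcontains for your data.)
$exclusionSet = [System.Collections.Generic.HashSet[string]]::new($exclusionArray, [System.StringComparer]::OrdinalIgnoreCase)

# Unlike Except(), this keeps duplicates in $ItemArray.
$result = $ItemArray.Where({ -not $exclusionSet.Contains($_) })   # -> 'a', 'd'
```

I haven't benchmarked this variant, so treat any performance expectation as a hypothesis to verify with your own data.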

Solution 2:

You can use Where-Object with -notcontains:

$ItemArray | Where-Object { $exclusionArray -notcontains $_ }

Output:

a, d
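
For completeness, here is a self-contained version with the sample arrays from Solution 1, plus the equivalent -notin form (PSv3+), which swaps the operands and may read more naturally. The $viaNotContains and $viaNotIn variable names are just for illustration.

```powershell
$ItemArray      = 'a', 'b', 'c', 'd'
$exclusionArray = 'b', 'c'

# Collection on the left, element on the right.
$viaNotContains = $ItemArray | Where-Object { $exclusionArray -notcontains $_ }

# Element on the left, collection on the right (PSv3+).
$viaNotIn = $ItemArray | Where-Object { $_ -notin $exclusionArray }
```

Both variables end up containing 'a' and 'd'.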

Solution 3:

Advocating native PowerShell:
As per @mklement0's answer, there is no doubt that Language Integrated Query (LINQ) is fast.
But in some circumstances, native PowerShell commands using the pipeline, as suggested by @EylM, can still beat LINQ. This is not just theoretical; it can happen in use cases where the process in question is idle, waiting for slow input, e.g. where the input comes from:

  • A remote server (e.g. Active Directory)
  • A slow device
  • A separate thread that has to make a complex calculation
  • The internet ...

Although I haven't seen a simple proof of this yet, it is suggested on several sites and can be deduced from articles such as High Performance PowerShell with LINQ and Ins and Outs of the PowerShell Pipeline.

Proof

To prove the above thesis, I have created a small Slack cmdlet that delays each item dropped into the pipeline by 1 millisecond (by default):

Function Slack-Object ($Delay = 1) {
    process {
        Start-Sleep -Milliseconds $Delay
        Write-Output $_
    }
}; Set-Alias Slack Slack-Object

Now let's see if native PowerShell can actually beat LINQ:
(To get a good performance comparison, caches should be cleared by e.g. starting a fresh PowerShell session.)

[string[]] $InputArray = 1..200
[string[]] $ExclusionArray = 100..300

(Measure-Command {
    $Result = [Linq.Enumerable]::Except([string[]] ($InputArray | Slack), $ExclusionArray)
}).TotalMilliseconds

(Measure-Command {
    $Result = $InputArray | Slack | Where-Object {$ExclusionArray -notcontains $_}
}).TotalMilliseconds

Results:

      LINQ: 411,3721
PowerShell: 366,961

To rule out LINQ caching effects, each test should be run only once; however, as @mklement0 commented, the results of single runs can vary from run to run.
The results also depend heavily on the size of the input arrays, the size of the result, the slack, the test system, etc.
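
If you'd rather smooth out the run-to-run variation than rule out caching, one option is to repeat each measurement and average the timings. The sketch below does that; the $runs value and the $linqAvg / $pipelineAvg variable names are assumptions of mine. Note that the LINQ result is cast to [string[]] to force evaluation of the lazy enumerable, as recommended in Solution 1.

```powershell
# Same delay function as above, repeated here so the snippet is self-contained.
function Slack-Object ($Delay = 1) {
  process { Start-Sleep -Milliseconds $Delay; $_ }
}

[string[]] $InputArray     = 1..200
[string[]] $ExclusionArray = 100..300
$runs = 3   # hypothetical number of repetitions; increase for steadier averages

# Average time for the LINQ approach (input is collected in full first).
$linqAvg = (1..$runs | ForEach-Object {
  (Measure-Command {
    $null = [string[]] [Linq.Enumerable]::Except([string[]] ($InputArray | Slack-Object), $ExclusionArray)
  }).TotalMilliseconds
} | Measure-Object -Average).Average

# Average time for the streaming pipeline approach.
$pipelineAvg = (1..$runs | ForEach-Object {
  (Measure-Command {
    $null = $InputArray | Slack-Object | Where-Object { $ExclusionArray -notcontains $_ }
  }).TotalMilliseconds
} | Measure-Object -Average).Average

'LINQ:       {0:N1} ms' -f $linqAvg
'PowerShell: {0:N1} ms' -f $pipelineAvg
```

Which approach comes out ahead will still depend on the factors listed above, so no expected output is shown here.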

Conclusion:

PowerShell might still be faster than LINQ in some scenarios!

Quoting mklement0's comment:
"Overall, it's fair to say that the difference in performance is so small in this scenario that it's not worth picking the approach based on performance - and it makes sense to go with the more PowerShell-like approach (Where-Object), given that the LINQ approach is far from obvious. The bottom line is: choose LINQ only if you have large arrays that are already in memory. If the pipeline is involved, the pipeline overhead alone may make optimizations pointless."