Comparing two arrays & get the values which are not common
i wanted a small logic to compare contents of two arrays & get the value which is not common amongst them using powershell
example if
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
$c which is the output should give me the value "6
" which is the output of what's the uncommon value between both the arrays.
Can some one help me out with the same! thanks!
PS > $c = Compare-Object -ReferenceObject (1..5) -DifferenceObject (1..6) -PassThru
PS > $c
6
$a = 1..5
$b = 4..8
$Yellow = $a | Where {$b -NotContains $_}
$Yellow
contains all the items in $a
except the ones that are in $b
:
PS C:\> $Yellow
1
2
3
$Blue = $b | Where {$a -NotContains $_}
$Blue
contains all the items in $b
except the ones that are in $a
:
PS C:\> $Blue
6
7
8
$Green = $a | Where {$b -Contains $_}
Not in question, but anyways; Green
contains the items that are in both $a
and $b
.
PS C:\> $Green
4
5
Note: Where
is an alias of Where-Object
. Alias can introduce possible problems and make scripts hard to maintain.
Addendum 12 October 2019
As commented by @xtreampb and @mklement0: although not shown from the example in the question, the task that the question implies (values "not in common") is the symmetric difference between the two input sets (the union of yellow and blue).
Union
The symmetric difference between the $a
and $b
can be literally defined as the union of $Yellow
and $Blue
:
$NotGreen = $Yellow + $Blue
Which is written out:
$NotGreen = ($a | Where {$b -NotContains $_}) + ($b | Where {$a -NotContains $_})
Performance
As you might notice, there are quite some (redundant) loops in this syntax: all items in list $a
iterate (using Where
) through items in list $b
(using -NotContains
) and visa versa. Unfortunately the redundancy is difficult to avoid as it is difficult to predict the result of each side. A Hash Table is usually a good solution to improve the performance of redundant loops. For this, I like to redefine the question: Get the values that appear once in the sum of the collections ($a + $b
):
$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}
By using the ForEach
statement instead of the ForEach-Object
cmdlet and the Where
method instead of the Where-Object
you might increase the performance by a factor 2.5:
$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1})
LINQ
But Language Integrated Query (LINQ) will easily beat any native PowerShell and native .Net methods (see also High Performance PowerShell with LINQ and mklement0's answer for Can the following Nested foreach loop be simplified in PowerShell?:
To use LINQ you need to explicitly define the array types:
[Int[]]$a = 1..5
[Int[]]$b = 4..8
And use the [Linq.Enumerable]::
operator:
$Yellow = [Int[]][Linq.Enumerable]::Except($a, $b)
$Blue = [Int[]][Linq.Enumerable]::Except($b, $a)
$Green = [Int[]][Linq.Enumerable]::Intersect($a, $b)
$NotGreen = [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
Benchmark
Benchmark results highly depend on the sizes of the collections and how many items there are actually shared, as a "average", I am presuming that half of each collection is shared with the other.
Using Time
Compare-Object 111,9712
NotContains 197,3792
ForEach-Object 82,8324
ForEach Statement 36,5721
LINQ 22,7091
To get a good performance comparison, caches should be cleared by e.g. starting a fresh PowerShell session.
$a = 1..1000
$b = 500..1500
(Measure-Command {
Compare-Object -ReferenceObject $a -DifferenceObject $b -PassThru
}).TotalMilliseconds
(Measure-Command {
($a | Where {$b -NotContains $_}), ($b | Where {$a -NotContains $_})
}).TotalMilliseconds
(Measure-Command {
$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}
}).TotalMilliseconds
(Measure-Command {
$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1})
}).TotalMilliseconds
[Int[]]$a = $a
[Int[]]$b = $b
(Measure-Command {
[Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
}).TotalMilliseconds
Look at Compare-Object
Compare-Object $a1 $b1 | ForEach-Object { $_.InputObject }
Or if you would like to know where the object belongs to, then look at SideIndicator:
$a1=@(1,2,3,4,5,8)
$b1=@(1,2,3,4,5,6)
Compare-Object $a1 $b1
Try:
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
(Compare-Object $a1 $b1).InputObject
Or, you can use:
(Compare-Object $b1 $a1).InputObject
The order doesn't matter.