In PowerShell, what's the best way to join two tables into one?
I'm fairly new to PowerShell and am wondering whether there is a better way to accomplish the following example problem.
I have an array of mappings from IP address to host-name. This represents a list of active DHCP leases:
PS H:\> $leases
IP           Name
--           ----
192.168.1.1  Apple
192.168.1.2  Pear
192.168.1.3  Banana
192.168.1.99 FishyPC
I have another array of mappings from MAC address to IP address. This represents a list of IP reservations:
PS H:\> $reservations
IP MAC
-- ---
192.168.1.1 001D606839C2
192.168.1.2 00E018782BE1
192.168.1.3 0022192AF09C
192.168.1.4 0013D4352A0D
For convenience, I was able to produce a third array of mappings from MAC address to IP address and host name using the following code. The idea is that $reservations should get a third field, "Name", which is populated whenever there's a matching "IP" field:
$reservations = $reservations | foreach {
    $res = $_
    $match = $leases | where {$_.IP -eq $res.IP} | select -unique
    if ($match -ne $NULL) {
        "" | select @{n="IP";e={$res.IP}}, @{n="MAC";e={$res.MAC}}, @{n="Name";e={$match.Name}}
    }
}
The desired output is something like this:
PS H:\> $ideal
IP          MAC          Name
--          ---          ----
192.168.1.1 001D606839C2 Apple
192.168.1.2 00E018782BE1 Pear
192.168.1.3 0022192AF09C Banana
192.168.1.4 0013D4352A0D
Is there any better way of doing this?
After 1.5 years, the cmdlet I had pasted in the original answer has undergone so many updates that it has become completely outdated. Therefore I have replaced the code and the ReadMe with a link to the latest version.
Join-Object
Combines two object lists based on a related property between them.
Description
Combines properties from one or more objects. It creates a set that can be saved as a new object or used as it is. An object join is a means for combining properties from one (self-join) or more object lists by using values common to each.
Main features
- Intuitive (SQL like) syntax
- Smart property merging
- Predefined join commands for updating, merging and specific join types
- Well defined pipeline for the (left) input objects and output objects (preserves memory when correctly used)
- Performs about 40% faster than Compare-Object on large object lists
- Supports (custom) objects, data tables and dictionaries (e.g. hash tables) for input
- Smart properties and calculated property expressions
- Custom relation expressions
- Easy installation (dot-sourcing)
- Supports PowerShell for Windows (5.1) and PowerShell Core
The Join-Object cmdlet reveals the following proxy commands with their own (-JoinType and -Property) defaults:
- InnerJoin-Object (Alias InnerJoin or Join), combines the related objects
- LeftJoin-Object (Alias LeftJoin), combines the related objects and adds the rest of the left objects
- RightJoin-Object (Alias RightJoin), combines the related objects and adds the rest of the right objects
- FullJoin-Object (Alias FullJoin), combines the related objects and adds the rest of the left and right objects
- CrossJoin-Object (Alias CrossJoin), combines each left object with each right object
- Update-Object (Alias Update), updates the left object with the related right object
- Merge-Object (Alias Merge), updates the left object with the related right object and adds the rest of the new (unrelated) right objects
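To make the difference between the join types concrete, here is a minimal sketch in plain PowerShell that only works out which IP keys each join type would return for the lists in the question. It does not call the Join-Object module, the variable names are made up for illustration, and it assumes the keys are unique within each list.

```powershell
# Keys of the left list ($reservations) and the right list ($leases) from the question.
$leftKeys  = '192.168.1.1', '192.168.1.2', '192.168.1.3', '192.168.1.4'
$rightKeys = '192.168.1.1', '192.168.1.2', '192.168.1.3', '192.168.1.99'

# Inner join: only the keys that occur on both sides.
$innerKeys = @($leftKeys | Where-Object { $_ -in $rightKeys })

# Left join: every left key, whether or not it has a related right object.
$leftJoinKeys = @($leftKeys)

# Right join: every right key, whether or not it has a related left object.
$rightJoinKeys = @($rightKeys)

# Full join: every left key plus the right keys that have no related left object.
$fullKeys = @($leftKeys) + @($rightKeys | Where-Object { $_ -notin $leftKeys })

# Cross join: each left object paired with each right object, so the sizes multiply.
$crossCount = $leftKeys.Count * $rightKeys.Count
```

With these lists, the inner join leaves out both 192.168.1.4 and 192.168.1.99, while the full join keeps both.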
ReadMe
The full ReadMe (and source code) is available from GitHub: https://github.com/iRon7/Join-Object
Installation
There are two versions of this Join-Object cmdlet (both versions supply the same functionality):
- Join Module
Install-Module -Name JoinModule
- Join Script
Install-Script -Name Join
(or rename the Join.psm1 module to a Join.ps1 script file)
and invoke the script by dot sourcing:
. .\Join.ps1
Answer
To answer the actual example in the question:
$reservations |LeftJoin $leases -On IP
IP          MAC          Name
--          ---          ----
192.168.1.1 001D606839C2 Apple
192.168.1.2 00E018782BE1 Pear
192.168.1.3 0022192AF09C Banana
192.168.1.4 0013D4352A0D
Performance
A little word on performance measuring:
The PowerShell pipeline is designed to stream objects (which saves memory), meaning that both¹ lists of input objects usually aren't (and shouldn't be) resident in memory. Normally they are retrieved from somewhere else (e.g. a remote server or a disk). Also, the output usually matters: linq solutions are fast, but they can easily mislead you into drawing the wrong conclusions, because linq literally defers the execution (lazy evaluation); see also: fastest way to get a uniquely index item from the property of an array.
In other words, when it comes to (measuring) performance in PowerShell, it is important to look at the complete end-to-end solution, which is more likely to look something like:
import-csv .\reservations.csv |LeftJoin (import-csv .\leases.csv) -On IP |Export-Csv .\results.csv
(1) Note: unfortunately, there is no easy way to build two parallel input streams (see: #15206 Deferred input pipelines)
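The deferred execution mentioned above can be made visible with a small sketch. It is not taken from the linked question; the counter and selector names are made up for illustration. It builds a LINQ query over five integers and counts how often the selector actually runs, which stays at zero until the result is materialized.

```powershell
# Counter kept in a hashtable so the delegate can update it by reference.
$counter = @{ Calls = 0 }

# A selector that records every invocation before doubling its input.
$selector = [Func[int, int]] {
    param($x)
    $counter.Calls++
    $x * 2
}

# Building the query does not run the selector yet (deferred execution).
$query = [System.Linq.Enumerable]::Select([int[]](1..5), $selector)
$callsAfterBuild = $counter.Calls

# Materializing the result forces the enumeration and runs the selector.
$result = [System.Linq.Enumerable]::ToArray($query)
$callsAfterEnumeration = $counter.Calls
```

A timing that stops right after the query is built therefore measures almost nothing, which is why the measurement should cover the step that consumes the output.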
(more) Examples
More examples can be found in the related Stack Overflow questions at:
- Combining Multiple CSV Files
- Combine two CSVs - Add CSV as another Column
- CMD or Powershell command to combine (merge) corresponding lines from two files
- Can I use SQL commands (such as join) on objects in powershell, without any SQL server/database involved?
- CMD or Powershell command to combine (merge) corresponding lines from two files
- Compare Two CSVs, match the columns on 2 or more Columns, export specific columns from both csvs with powershell
- Merge two CSV files while adding new and overwriting existing entries
- Merging two CSVs and then re-ordering columns on output
- Merge two CSV files while adding new and overwriting existing entries
- Efficiently merge large object datasets having multiple matching keys
- Is there a PowerShell equivalent of paste (i.e., horizontal file concatenation)? - combine (merge) corresponding lines from two files
And in the Join-Object test script.
Please give a 👍 if you support the proposal to Add a Join-Object cmdlet to the standard PowerShell equipment (#14994)
This can also be done using my module Join-Object:
Install-Module 'Join-Object'
Join-Object -Left $leases -Right $reservations -LeftJoinProperty 'IP' -RightJoinProperty 'IP'
Regarding performance, I tested against sample data of 100k lines:
- Hashtable example posted by @js2010 ran in 8 seconds.
- Join-Object by me ran in 14 seconds.
- LeftJoin by @iRon ran in 1 minute and 50 seconds.
Here's a simple example using a hashtable. With big arrays, this turns out to be faster.
$leases =
'IP,Name
192.168.1.1,Apple
192.168.1.2,Pear
192.168.1.3,Banana
192.168.1.99,FishyPC' | convertfrom-csv
$reservations =
'IP,MAC
192.168.1.1,001D606839C2
192.168.1.2,00E018782BE1
192.168.1.3,0022192AF09C
192.168.1.4,0013D4352A0D' | convertfrom-csv
$hashRes=@{}
foreach ($resRecord in $reservations) {
    $hashRes[$resRecord.IP] = $resRecord
}
$leases | foreach {
    $other = $hashRes[$_.IP]
    [pscustomobject]@{
        IP   = $_.IP
        MAC  = $other.MAC
        Name = $_.name
    }
}
IP           MAC          Name
--           ---          ----
192.168.1.1  001D606839C2 Apple
192.168.1.2  00E018782BE1 Pear
192.168.1.3  0022192AF09C Banana
192.168.1.99              FishyPC
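Note that the loop above walks $leases, so it keeps 192.168.1.99 and drops 192.168.1.4. If the goal is the table from the question (every reservation, with the name filled in where a lease exists), the roles can be swapped. The following is a sketch of that variant, not code from the original answer; $leaseByIP and $joined are names chosen here for illustration.

```powershell
$leases = @'
IP,Name
192.168.1.1,Apple
192.168.1.2,Pear
192.168.1.3,Banana
192.168.1.99,FishyPC
'@ | ConvertFrom-Csv

$reservations = @'
IP,MAC
192.168.1.1,001D606839C2
192.168.1.2,00E018782BE1
192.168.1.3,0022192AF09C
192.168.1.4,0013D4352A0D
'@ | ConvertFrom-Csv

# Index the leases by IP so that each lookup is a single hashtable access.
$leaseByIP = @{}
foreach ($lease in $leases) {
    $leaseByIP[$lease.IP] = $lease
}

# Walk the reservations (the "left" side) and attach the lease name when there is one.
$joined = foreach ($res in $reservations) {
    $lease = $leaseByIP[$res.IP]
    [pscustomobject]@{
        IP   = $res.IP
        MAC  = $res.MAC
        Name = if ($lease) { $lease.Name } else { $null }
    }
}

$joined
```

This behaves like a left join on IP: all four reservations are returned, and the Name of 192.168.1.4 stays empty because it has no lease.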