matching range of dates in two different excel tables
If I understand what you want to do correctly, this can be done in Power Query, available in Windows Excel 2010+ (and Office 365).
But some things left out of your description for which I made assumptions
- Landing date cannot be before Logbook date (how can you land fish that you haven't caught)
- You sometimes have multiple landings on the same day for the same ID, Species and Gear.
- I chose to match only the landing that was closest to the logged amount
- Sometimes this resulted in large differences between the landing and logbook
- Not sure what this is due to (consolidation of landing from other days, maybe)?
- If that is the case, probably you need to inspect manually.
- PQ takes a long time to run on your data (about 20 sec on my machine). This is probably due to the multiple table filtering operations. Might be able to speed it up if otherwise OK.
For PQ in Excel 2010, you will need to download and install the Microsoft Power Query Add-in.
To use Power Query
- Make both of your data tables into Tables
- This will either be on the
Insert
tab; or on theHome
tab (format as table).
- This will either be on the
- Select some cell in your Table1 (or whatever you've named the first table)
-
Data => Get&Transform => from Table/Range
(*might be different with the Add-in) - When the PQ Editor opens:
Home => Advanced Editor
- Make note of the Table Name in Line 2
- Paste the M Code below in place of what you see
- Change the Table name in line 2 back to what was generated originally.
- Read the comments and explore the
Applied Steps
to understand the algorithm
M Code
let
//Read in table 1 and set data types
//Be sure to change `Name` of table to the actual table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//Buffering this table seems to reduce speed of execution from 60 sec to about 20 sec
Table1 = Table.Buffer(Table.TransformColumnTypes(Source,{
{"ID", type text},
{"DATE", type date},
{"SPECIES", type text},
{"GEAR", type text},
{"LOGBOOK", Int64.Type}})),
//Read in table 2 and set data types
//Be sure to change `Name` of table to the actual table name in your workbook
Source2 = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
Table2 = Table.Buffer(Table.TransformColumnTypes(Source2,{
{"ID", type text},
{"DATE", type date},
{"SPECIES", type text},
{"GEAR", type text},
{"LANDING", Int64.Type}})),
//Join the two tables based on ID, SPECIES and GEAR
Joined = Table.NestedJoin(Table1, {"ID", "SPECIES", "GEAR"}, Table2, {"ID","SPECIES","GEAR"},"Joined"),
//filter Joined subtable by minimum landing date that is >= logged date
// **I assumed that landing date MUST be AFTER log date.**
//If that is not the case, then code change will be needed
filteredTbl = Table.AddColumn(Joined, "Filtered Table", each
let
dt = [DATE],
minDT = List.Min(List.Select([Joined][DATE],each _ >= dt))
in
Table.SelectRows([Joined], each [DATE] = minDT)),
#"Removed Columns" = Table.RemoveColumns(filteredTbl,{"Joined"}),
//landing date will be any row in filtered table unless table is empty
landingDT = Table.AddColumn(#"Removed Columns", "LANDING DATE",
each if Table.IsEmpty([Filtered Table]) then null
else [Filtered Table][DATE]{0}, Date.Type),
//landings will be the value with the least difference from that logged on the landingDt
landings = Table.AddColumn(landingDT, "LANDING", each
if [LANDING DATE]=null then null
else let
lb = [LOGBOOK],
ld = List.Transform([Filtered Table][LANDING], each Number.Abs(_ - lb)),
posMinDiff = List.PositionOf(ld,List.Min(ld))
in
[Filtered Table][LANDING]{posMinDiff},Int64.Type),
//Match if landing date LTE log date + 7 days
match = Table.AddColumn(landings, "MATCH", each Duration.TotalDays([LANDING DATE]-[DATE])<=7,Logical.Type),
//calculte difference between landings and logged, if there is a Match
//sometimes the differences are quite large.
//maybe an error, or consolidation of multiple logged valaues?
//not sure what do with large differences
landingsDiff = Table.AddColumn(match, "Landing Diff", each if [MATCH] = true then [LANDING] - [LOGBOOK] else null, Int64.Type),
#"Removed Columns1" = Table.RemoveColumns(landingsDiff,{"Filtered Table"})
in
#"Removed Columns1"
Initial Data Tables (Partial)
As an example of what I am writing about with regard to large mismatches in landing/logbook amounts, it appears that your LOGBOOK entries for 17-Jun and 23-Jun ID **31/SBS/PS
may have been combined into a single landing on 24-Jun
Results (Partial)
Post back with any comments/questions.