Make an array formula in excel calculate only on active rows
When excel calculates formulas over a column (e.g. a lookup), it's smart enough to only calculate until the last active row, rather than also running on all the blank rows.
With array formulas, it seems this behaviour changes, and it calculates over the entire column (a million rows in excel 2010). Is there a way to force an array formula to calculate over only the active range? (other than replacing column references with a generous reference that will cover the active range, e.g. A:1:A10000 replacing A:A in a spreadsheet you know will never have 10000 rows.
For example, is there a function you can execute on a row that will return the reference of the last active row? Something like =Last(A:A) that would return either a reference or a row number
If you have a worksheet with a contiguous range from the top to the bottom, then I suppose you could use =INDIRECT("B1:B"&COUNTA(B:B)))
, but it isn't ideal (being volatile, not updating when columns are added to the spreadsheet and only working on spreadsheets with a column containing no blank rows.
Solution 1:
Sadly, the use of the volatile INDIRECT
and OFFSET
are almost ubiquitously recommended in such cases on the various Excel sites around, even though there exists a perfectly good, minimally volatile (in fact, volatile "at workbook open" only, as I understand) set-up using INDEX
.
To determine the last non-empty row in a column, column A say, it will be useful to know the datatype of the entries within that column, and also whether there are any null strings ("") present.
Although there are set-ups available which will work whatever the datatype of the entries in question, they are necessarily of a type such that they must process each element within the range passed to them. As such - and particularly if the range over which it is necessary to calculate is quite large - they can be extremely resource-heavy (the second suggestion by kyle being a good example, having to process more than a million cells, irrespective of whether the last non-empty cell is in row 1 or row 1048576).
It is also worthwhile storing the value of the last non-empty row as a Defined Name, LRow say.
The possible definitions for LRow are then as follows:
1) If there are no null strings and all entries are non-numerical, you can define LRow as:
=MATCH("Ω",A:A)
2) If there are no null strings and all entries are numerical, you can define LRow as:
=MATCH(9.9E+307,A:A)
3) If there are no null strings and entries are of a mixed datatype, i.e. some numerical, some non-numerical, you can define LRow as:
=MAX(MATCH("Ω",A:A),MATCH(9.9E+307,A:A))
If there are null strings within the range then the solutions become necessarily more complex, and it is no longer possible to avoid a construction in which each cell within the array is processed individually. As such, the best we can do is to minimize the range which we pass to be processed.
Hence, we have:
4) If there are null strings, you can define LRow as:
=MATCH(1,0/(LEN(A1:INDEX(A:A,MAX(MATCH("Ω",A:A),MATCH(9.9E+307,A:A))))>0))
(You could of course make this even more efficient if the datatype is consistently numerical or non-numerical, as above.)
which requires array-entry (CSE) and which, in general, will vastly reduce the range being processed, especially compared to something such as:
=MATCH(1,0/(LEN(A:A)>0))
which is disastrous in terms of efficiency.
Note that the additional functions within the former are of very little detriment: it is far more important to reduce the number of cells being processed than to worry about an additional function call or two.
You can now use this within an INDEX
construction to dynamically define your range. For example:
=A1:INDEX(A:A,LRow)
which, as mentioned, is barely volatile at all.
Regards