Unpivot an Excel matrix/pivot-table?
Is there a quick way to "unpivot" an Excel matrix/pivot-table (in Excel or elsewhere), without writing macros or other code ?
Again, I can write code (C# or VBA or whatever) that does that myselfs.
I want to know if it is possible to do it without code, quickly ?
E.g. I need to convert this permission matrix (given as Excel-table/matrix)
into this half-normalized table (so I can insert it into a SQL database):
e.g. in SQL I could do it like this:
CREATE TABLE dbo.T_DocumentMatrix
(
[Function] [varchar](255) NULL,
[GROUP-Admin] [varchar](255) NULL,
[GROUP-SuperUser] [varchar](255) NULL,
[GROUP-Manager] [varchar](255) NULL,
[GROUP-OLAP] [varchar](255) NULL,
[GROUP-1] [varchar](255) NULL,
[GROUP-2] [varchar](255) NULL,
[GROUP-3] [varchar](255) NULL,
[GROUP-4] [varchar](255) NULL,
[GROUP-5] [varchar](255) NULL,
[GROUP-6] [varchar](255) NULL,
[GROUP-7] [varchar](255) NULL,
[GROUP-8] [varchar](255) NULL,
[Externals] [varchar](255) NULL
);
copy-paste the data from excel, and then
SELECT *
FROM
(
SELECT
[Function]
,[GROUP-Admin]
,[GROUP-SuperUser]
,[GROUP-Manager]
,[GROUP-OLAP]
,[GROUP-1]
,[GROUP-2]
,[GROUP-3]
,[GROUP-4]
,[GROUP-5]
,[GROUP-6]
,[GROUP-7]
,[GROUP-8]
,[Externals]
FROM T_DocumentMatrix
) AS p
UNPIVOT
(
Rights FOR GroupName IN
(
[GROUP-Admin]
,[GROUP-SuperUser]
,[GROUP-Manager]
,[GROUP-OLAP]
,[GROUP-1]
,[GROUP-2]
,[GROUP-3]
,[GROUP-4]
,[GROUP-5]
,[GROUP-6]
,[GROUP-7]
,[GROUP-8]
,[Externals]
)
) AS unpvt
;
However, that requires I change the table-create script and the unpivot-script for every change in groups...
Solution 1:
Oh, well, it's a little complicated. One of the problems is, the wizard-callup shortcuts don't work in non-english versions of excels (damn, at home I would have the English version, but here at work...)
Here's a good video: https://www.youtube.com/watch?v=pUXJLzqlEPk
But youtube videos can be deleted, so to make it a solid SO answer:
First, you need to go to "Options", and add the menuband-item "Pivot table and PivotChart Wizard".
Create a multiple consolidation pivot table
and use the custom variant
and select the range, and in new work sheet
then delete rows and columns fields
Double click on the NUMBER (54 in the picture)
and excel will give you the halfway normalized data.
Solution 2:
While this is a really old question and Stefan found an enlightened answer back in the day, it could be worth a revisit. I have run into a need for such a codeless, dynamic Unpivot method myself and a Google search brought me here. Yes, Power Query does the job, but that is not exactly codeless in that there is a scripted background solution running in Power BI, it requires the user to Refresh the data (so, it is not automatic within the workbook) and it will not run on Excel for Mac (tmk).
The following is an approach based on Dynamic Arrays and using the LET function, so it will require Excel 2016 or Microsoft 365.
Let's say that Stefan's data are in cells A1 to N8. In Power Query parlance, we would say that Stefan wants to "Unpivot B1:N8 By A1:A8".
NB: The following approach would also accept multiple columns for By, e.g. you could have a need to "Unpivot D1:N8 By A1:C8".
=LET( unPivMatrix, B1:N8,
byMatrix, A1:A8,
upC, COLUMNS( unPivMatrix ),
byC, COLUMNS( byMatrix ),
dmxR, MIN( ROWS( unPivMatrix ), ROWS( byMatrix ) ) - 1,
upCells, dmxR * upC,
upSeq, SEQUENCE( upCells,, 0 ),
upHdr, INDEX( INDEX( unPivMatrix, 1, ), 1, SEQUENCE( upC ) ),
upBody, INDEX( unPivMatrix, SEQUENCE( dmxR ) + 1, SEQUENCE( 1, upC ) ),
byBody, INDEX( byMatrix, SEQUENCE( dmxR ) + 1, SEQUENCE( 1, byC ) ),
attr, INDEX( upHdr, MOD( upSeq, upC ) + 1 ),
mux, INDEX( upBody, upSeq/upC + 1, MOD( upSeq, upC ) + 1 ),
demux, IFERROR( INDEX(
IFERROR( INDEX( byBody,
IFERROR( INT( SEQUENCE( upCells, byC,0 )/byC/upC ) + 1, MOD( upSeq, upC ) + 1 ),
SEQUENCE( 1, byC + 1 ) ),
attr ),
upSeq + 1, SEQUENCE( 1, byC + 2 ) ),
mux ),
demux
)
How this works - Reading the inputs
The inputs are the range that you want to unpivot that I called unPivMatrix B1:N8 (which can be of any dimensions you need) and the columns that you want to unpivot them by that I called byMatrix A1:A8.
As a rule, the rows of the byMatrix need to be the same as the unPivMatrix, so you must start with A1 and not A2. I decided to take this convention because their might be header in A1 and with some small mods, this formula could produce exactly the same output as Power Query, but that is not what Stefan asked for.
The formula first calculates the number of columns in each matrix: upC unpivot columns and byC by columns. From these it calculates dmxR (demultiplexed rows): how many rows of values will be delivered in the unpivot by taking the minimum of the rows of unPivMatrix and byMatrix and subtracting 1 because unPivMatrix has a header. Taking the MIN is an error prevention step in the event that the inputs have a different number of rows, which is an error by definition.
These values are used to create shaping variables that will be used later to form the output. upCells is the number of values that will be unpivoted and is used to generate an index pattern called upSeq that counts off the number of values from 0 which will used inside of the INDEX functions later. We start with 0 because upSeq will be modulated to form correct indexation of the inputs and outputs.
Now we will break down the parts of the matrices so that we can multiplex them. The parts look like this:
There is an unpivot header (upHdr) that contains the identifier of the value data (in Stefan's case, GROUP-Admin, GROUP-SuperUser, etc.). These will be multiplexed into a column that will later be placed next to each value that is unpivoted. upHdr is created by putting the whole unPivMatrix into an INDEX function and reading Row 1 and all of the columns. I nested that INDEX function into another INDEX that reshapes the horizontal array to a vertical array using a vertical SEQUENCE of size upC.
upBody contains the values that we want to unpivot via multiplexing. It is created by putting the upMatrix into an INDEX and shaping it by the number of rows that will be output SEQUENCE( dmxR ) + 1 § against each of the columns to be read SEQUENCE( upC ). byBody contains the data that will be multiplexed against each value in the upBody. It is created the same way as upBody.
§ - add 1 to skip the header row
How this works - Shaping and writing the outputs
The output will be shaped like this:
We now multiplex the upHdr into attr or attributes (using Power Query terminology) by putting upHdr into an INDEX and applying a modulated sequence based on upSeq that repeats every upC times (e.g. {1;2;3;4;5;6;7;8;1;2;...}
). Note: this is where starting with 0 in upSeq mattered. The output of attr would look like (in Stefan's case) {GROUP-Admin; GROUP-SuperUser, etc.}
.
mux is the values (using Power Query terminology) that will be multiplexed against each attribute and byBody row. It is created by putting valBody into INDEX and then reshaping it into a multiplex pattern created by rows of
upSeq/upC + 1
which yields a row of {1,1,1,1,1,1,1,1,2,2,...}
and columns of
MOD( upSeq, upC )
+ 1 which yields a column of {1;2;3;4;5;6;7;8;1;...}
.
The output for mux will be the content of the unPivMatrix. In Stefan's case, that will be a little special because he is using * and blank as data. This formula will convert blanks into 0's. So, if that is a problem, you could either wrap mux into an IF( ISBLANK( mux ), "", mux )
, but I am not adding that because I want a general unpivot as I am sure Stefan has long since moved on.
How this works - Demultiplexing the parts into the output
Now that the easy part is done, it is time to go after the hard stuff - bringing this all together into a single dynamic array. Putting multiple arrays together requires a trick and this trick has to be applied twice because, as you can see, we are bringing three tables together. The trick is like having an APPEND function like:
APPEND( APPEND( table1, table2 ), table3 )
.
To combine two arrays, you put the first array into an INDEX and then reference cells that are outside of the array to force a #REF! error. For example, if I have a 3 x 2 array of letters A through F and I reference cell 3, 3, it will throw a reference error.
Now you can exploit the errors by replacing them with the table that you want to append by wrapping the INDEX in an IFERROR. It is kind of like:
IFERROR( INDEX( table1,
SEQUENCE( table1.rows ),
SEQUENCE( 1, table1.columns + table2.columns ) ),
table2 )
In this sense, the formula above is the equivalent of APPEND( table1, table2 ) where APPEND is a row-wise append of two tables which is what we want. (NB: switch the sequence patterns and you can make a column-wise append.)
So, hopefully this explanation makes it clear what is happening in the final stages of a variable called demux which delivers the result. I named the result and then referenced it so that you can easily explore, modify or optimize the formula. So, demux is really like:
APPEND( byBody, APPEND( attr, mux ) )
I won't go into the mechanics of how this final stage works because this is already a really long answer, but a short summary is that this append is using the dimensions created by upCells, upC and byC to form the output.
I've tested this, but I've not performance streamlined it or brought it to #SwissEngineering standards.