Expanding column cells for each column cell

I have 3 different sets of data (in different columns)

  1. Animals (5 different kinds) in column A
  2. Fruits (1000 different kinds) in column B
  3. Countries (10 different kinds) in column C

With these 3 data collections I would like to receive 5×1000×10 for a total of 50k corresponding elements in col. E F G (each animal who corresponds with each fruit and each country).

It might be done by manually copying and pasting values, but it will take ages. Is there any way to automate it by VBA code or

Is there any universal formula for unlimited data sets like the one presented above? Please let me know if something is not clear.

Here is a smaller example of data and how the results should turn out:

        Expanding data sets for each in other


I gather by universal, you want this to accommodate any number of columns and any number of entries in each. A few variant arrays should provide the dimensions necessary to calculate the cycles of repetition for each value.

Option Explicit

Sub main()
    Call for_each_in_others(rDATA:=Worksheets("Sheet3").Range("A3"), bHDR:=True)
End Sub

Sub for_each_in_others(rDATA As Range, Optional bHDR As Boolean = False)
    Dim v As Long, w As Long
    Dim iINCROWS As Long, iMAXROWS As Long, sErrorRng As String
    Dim vVALs As Variant, vTMPs As Variant, vCOLs As Variant

    On Error GoTo bm_Safe_Exit
    appTGGL bTGGL:=False

    With rDATA.Parent
        With rDATA(1).CurrentRegion
            'Debug.Print rDATA(1).Row - .Cells(1).Row
            With .Resize(.Rows.Count - (rDATA(1).Row - .Cells(1).Row), .Columns.Count).Offset(2, 0)
                sErrorRng = .Address(0, 0)
                vTMPs = .Value2
                ReDim vCOLs(LBound(vTMPs, 2) To UBound(vTMPs, 2))
                iMAXROWS = 1
                'On Error GoTo bm_Output_Exceeded
                For w = LBound(vTMPs, 2) To UBound(vTMPs, 2)
                    vCOLs(w) = Application.CountA(.Columns(w))
                    iMAXROWS = iMAXROWS * vCOLs(w)
                Next w

                'control excessive or no rows of output
                If iMAXROWS > Rows.Count Then
                    GoTo bm_Output_Exceeded
                ElseIf .Columns.Count = 1 Or iMAXROWS = 0 Then
                    GoTo bm_Nothing_To_Do
                End If

                On Error GoTo bm_Safe_Exit
                ReDim vVALs(LBound(vTMPs, 1) To iMAXROWS, LBound(vTMPs, 2) To UBound(vTMPs, 2))
                iINCROWS = 1
                For w = LBound(vVALs, 2) To UBound(vVALs, 2)
                    iINCROWS = iINCROWS * vCOLs(w)
                    For v = LBound(vVALs, 1) To UBound(vVALs, 1)
                        vVALs(v, w) = vTMPs((Int(iINCROWS * ((v - 1) / UBound(vVALs, 1))) Mod vCOLs(w)) + 1, w)
                    Next v
                Next w
            End With
        End With
        .Cells(2, UBound(vVALs, 2) + 2).Resize(1, UBound(vVALs, 2) + 2).EntireColumn.Delete
        If bHDR Then
            rDATA.Cells(1, 1).Offset(-1, 0).Resize(1, UBound(vVALs, 2)).Copy _
                Destination:=rDATA.Cells(1, UBound(vVALs, 2) + 2).Offset(-1, 0)
        End If
        rDATA.Cells(1, UBound(vVALs, 2) + 2).Resize(UBound(vVALs, 1), UBound(vVALs, 2)) = vVALs
    End With

    GoTo bm_Safe_Exit
bm_Nothing_To_Do:
    MsgBox "There is not enough data in  " & sErrorRng & " to perform expansion." & Chr(10) & _
           "This could be due to a single column of values or one or more blank column(s) of values." & _
            Chr(10) & Chr(10) & "There is nothing to expand.", vbInformation, _
           "Single or No Column of Raw Data"
    GoTo bm_Safe_Exit
bm_Output_Exceeded:
    MsgBox "The number of expanded values created from " & sErrorRng & _
           " (" & Format(iMAXROWS, "\> #, ##0") & " rows × " & UBound(vTMPs, 2) & _
           " columns) exceeds the rows available (" & Format(Rows.Count, "#, ##0") & ") on this worksheet.", vbCritical, _
           "Too Many Entries"
bm_Safe_Exit:
    appTGGL
End Sub

Sub appTGGL(Optional bTGGL As Boolean = True)
    Application.EnableEvents = bTGGL
    Application.ScreenUpdating = bTGGL
End Sub

Put the column header labels in row 2 starting in column A and the data directly below that.

I have added some error control to warn of exceeding the number of rows on a worksheet. This is not normally something that is likely to be a consideration but multiplying the number of values in an undetermined number of columns against each other can quickly produce a large number of results. It is not unforeseeable that you would exceed 1,048,576 rows.

        Variant Array expansion


Classic example of a non-join select SQL statement which returns the Cartesian Product of all combination outcomes of listed tables.

SQL Database Solution

Simply import Animals, Fruit, Country as separate tables into any SQL database like MS Access, SQLite, MySQL, etc. and list tables without joins including implicit (WHERE) and explicit (JOIN) joins:

SELECT Animals.Animal, Fruits.Fruit, Countries.Country
FROM Animals, Countries, Fruits;

Cartesian SQL

Excel Solution

Same concept with running the non-join SQL statement in VBA using an ODBC connection to workbook containing ranges of Animals, Countries, and Fruits. In example, each data grouping is in its own worksheet of same name.

Sub CrossJoinQuery()

    Dim conn As Object
    Dim rst As Object
    Dim sConn As String, strSQL As String

    Set conn = CreateObject("ADODB.Connection")
    Set rst = CreateObject("ADODB.Recordset")

    sConn = "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
               & "DBQ=C:\Path To\Excel\Workbook.xlsx;"
    conn.Open sConn

    strSQL = "SELECT * FROM [Animals$A1:A3], [Fruits$A1:A3], [Countries$A1:A3] "
    rst.Open strSQL, conn

    Range("A1").CopyFromRecordset rst

    rst.Close
    conn.Close

    Set rst = Nothing
    Set conn = Nothing

End Sub

Cartesian SQL in VBA


My first approach to this problem was similar to the one posted by @Jeeped:

  1. load input columns to array and count rows in each column
  2. fill array with all combinations
  3. assign array to output range

Using MicroTimer I have calculated average times taken by each part of the above algorithm. Part 3. took 90%-93% of total execution time for bigger input data.

Below is my attempt to improve the speed of writing data to worksheet. I have defined a constant iMinRSize=17. Once it is possible to fill more than iMinRSize consecutive rows with the same value, the code stops filiing array and writes directly to worksheet range.

Sub CrossJoin(rSrc As Range, rTrg As Range)

  Dim vSrc() As Variant, vTrgPart() As Variant
  Dim iLengths() As Long
  Dim iCCnt As Integer, iRTrgCnt As Long, iRSrcCnt As Long
  Dim i As Integer, j As Long, k As Long, l As Long
  Dim iStep As Long

  Const iMinRSize As Long = 17
  Dim iArrLastC As Integer

  On Error GoTo CleanUp
  Application.ScreenUpdating = False
  Application.EnableEvents = False

  vSrc = rSrc.Value2
  iCCnt = UBound(vSrc, 2)
  iRSrcCnt = UBound(vSrc, 1)
  iRTrgCnt = 1
  iArrLastC = 1
  ReDim iLengths(1 To iCCnt)
  For i = 1 To iCCnt
    j = iRSrcCnt
    While (j > 0) And IsEmpty(vSrc(j, i))
      j = j - 1
    Wend
    iLengths(i) = j
    iRTrgCnt = iRTrgCnt * iLengths(i)
    If (iRTrgCnt < iMinRSize) And (iArrLastC < iCCnt) Then iArrLastC = iArrLastC + 1
  Next i

  If (iRTrgCnt > 0) And (rTrg.row + iRTrgCnt - 1 <= rTrg.Parent.Rows.Count) Then
    ReDim vTrgPart(1 To iRTrgCnt, 1 To iArrLastC)

    iStep = 1
    For i = 1 To iArrLastC
      k = 0
      For j = 1 To iRTrgCnt Step iStep
        k = k + 1
        If k > iLengths(i) Then k = 1
        For l = j To j + iStep - 1
          vTrgPart(l, i) = vSrc(k, i)
        Next l
      Next j
      iStep = iStep * iLengths(i)
    Next i

    rTrg.Resize(iRTrgCnt, iArrLastC) = vTrgPart

    For i = iArrLastC + 1 To iCCnt
      k = 0
      For j = 1 To iRTrgCnt Step iStep
        k = k + 1
        If k > iLengths(i) Then k = 1
        rTrg.Resize(iStep).Offset(j - 1, i - 1).Value2 = vSrc(k, i)
      Next j
      iStep = iStep * iLengths(i)
    Next i
  End If

CleanUp:
  Application.ScreenUpdating = True
  Application.EnableEvents = False
End Sub

Sub test()
  CrossJoin Range("a2:f10"), Range("k2")
End Sub

If we set iMinRSize to Rows.Count, all data is written to array. Below are my sample test results:

enter image description here

The code works best if input columns with highest number of rows come first, but it wouldn't be a big problem to modify code to rank columns and process in right order.


You can do this with worksheet formulas. If you have NAME'd ranges -- Animals, Fruits and Countries, the "trick" is to generate indexes into that array to provide all the various combinations.

For example:

=CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)

will generate a 1-based series of numbers that repeats for the number entries in Fruits * Countries -- which gives you how many rows you need for each animal.

=MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1

will generate a 1-based series that repeats each Fruit for the number of countries.

=MOD(ROWS($1:1)-1,ROWS(Countries))+1))

Generates a repeating sequence of 1..n where n is the number of countries.

Putting these into formulas (with some error checking)

D3:  =IFERROR(INDEX(Animals,CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)),"")
E3:  =IF(E3="","",INDEX(Fruits,MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1))
F3:  =IF(E3="","",INDEX(Countries,MOD(ROWS($1:1)-1,ROWS(Countries))+1))

enter image description here