Excel formula to prefix a column based on another column?
Is there a way to do so in Excel?
My Excel:
My desired output:
=TEXTJOIN(";",TRUE,"AVS"&LET(list,FILTERXML("<a><z>"&SUBSTITUTE(B1,CHAR(10),"</z><z>")&"</z></a>","//z"),TRIM(LEFT(list,IFERROR(FIND(" ",list),LEN(list))))))
This will take the text in B1
, separate it by line breaks, pull all the text before the first space on each line, add AVS
to the beginning, and concatenate them all with semicolons. Add this formula to A1
and copy / paste down.
-
FILTERXML("<a><z>"&SUBSTITUTE(B1,CHAR(10),"</z><z>")&"</z></a>","//z")
is where most of the magic happens. The function is not explicitly meant for this purpose but you can use it to turn a delimited string into an array instead. This lets us split up the string into individual cells. All the XML tags like<a>
and<z>
are to make the string look like XML. This is why we used</z><z>
earlier. (Reference) -
LET(list,FILTERXML(~)
defineslist
to be equal to the array of text we get out of the function above. It lets us reference it later by this name instead of pasting the entire function each time. -
IFERROR(FIND(" ",list),LEN(list))
finds the first space in each item in the list. This will error if space isn't found at all, so we return the entire string in that case. If you just want the first 5 characters no matter want, you could replace this section with just the number 5 instead. -
TRIM(LEFT(list,IFERROR(~)))
pulls the first however many characters and then trims off any white space. You could account for this trimming inside theIFERROR(~)
by subtracting one but, in the edge case where there is no space found, that would make the function return all but the last character in the string. -
LET(list,FILTERXML(~),TRIM(~))
, then, returns a neat list of just the first however many characters from each line of the original text. -
TEXTJOIN(";",TRUE,"AVS"&LET(~))
adds "AVS" to the beginning of each of the items in that neat list and then combines the whole list into one string with a semicolon between each item.
Here's a way to do it with PowerQuery. It may seem like a long post, but I like to add a lot of images. It's point and click mostly and doesn't involve formulas, so it will handle large datasets more easily. As the source data grows, all you will need to do is right-click and refresh the query from the workbook. You won't need to repeat these steps nor expand any ranges in formulas.
Add a header to your data, then put your cursor in your data and use Data>Get & Transform Data>From Table/Range.
When the PowerQuery Editor opens, you'll see this:
Use Add Column>Index Column>From 1, so that you see this:
Now expand the Queries pane on the left of the Power Query Editor, right-click the query and choose 'Duplicate', so that you have two versions of the same query.
In the first query, do the following:
Now right-click the column header and use Split Column>By Delimiter, configured like this:
You'll see this:
Now use Add Column>Custom Column, configured like this:
You'll see this:
Now use Home>Group By, configured like this:
You'll see this:
This is because we tried to sum some text.
Go to Home>Advanced Editor. Change this:
#"Grouped Rows" = Table.Group(#"Added Custom", {"Index"}, {{"unique_ids", each List.Sum([unique_id]), type text}})
To this:
#"Grouped Rows" = Table.Group(#"Added Custom", {"Index"}, {{"unique_ids", each Text.Combine( [unique_id] , ";"), type text}})
Note that List.Sum
was changed to Text.Combine
.
You'll see this:
Now use Home>Merge Queries, configured like this:
Click the double-arrow at the top of the 'Table' column and configure like this:
You'll see this:
Now you can either remove the index or not, then use Home>Close & Load to get your results back into the workbook.
Here's the full query from the advanced editor for the first query:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type),
#"Changed Type" = Table.TransformColumnTypes(#"Added Index",{{"orig", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"orig", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "orig"),
#"Added Custom" = Table.AddColumn(#"Split Column by Delimiter", "unique_id", each "AVS" & Text.Start([orig],5)),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Index"}, {{"unique_ids", each Text.Combine( [unique_id] , ";"), type text}}),
#"Merged Queries" = Table.NestedJoin(#"Grouped Rows", {"Index"}, #"Table1 (2)", {"Index"}, "Table1 (2)", JoinKind.Inner),
#"Expanded Table1 (2)" = Table.ExpandTableColumn(#"Merged Queries", "Table1 (2)", {"orig"}, {"orig"})
in
#"Expanded Table1 (2)"
And for the second query:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type),
#"Changed Type" = Table.TransformColumnTypes(#"Added Index",{{"orig", type text}})
in
#"Changed Type"