Split a string with spaces on a new line in PowerShell
I'm working on a PowerShell script where I take an input of a long string (from a CSV file) in the format:
Group One Name
Group Two Name
Group Three Name
...
I'm trying to parse it with
($entry.'Group Name').split("`n ") | %{
    if ($_) {
        # Do something with the group name
        $_
    }
}
I want to get output like:
Group One Name
Group Two Name
Group Three Name
...
But it comes out as:
Group
One
Name
Group
Two
...
Solution 1:
By accepting Bacon Bits' helpful answer you've indicated that it solves your problem, but that still leaves the question of what you meant to happen when you passed "`n " - i.e., a 2-character PowerShell string - to the [string] class's .Split() method.
This answer makes the case for routinely using PowerShell's own -split operator instead of the .Split() method, because it:
- uses regular PowerShell operator syntax
- offers many more features
- has fewer surprises
- provides long-term behavioral stability
There are key differences between -split and the .Split() method:
- By default, -split uses regular expressions to specify the split criterion; use the 'SimpleMatch' option as the 3rd RHS argument to use literal strings instead; by contrast, the .Split() method only accepts literal strings.
- There's also a unary form of -split that splits by any runs of whitespace and ignores leading and trailing whitespace, similar to awk's default behavior; this is equivalent to calling '...'.Split([string[]] $null, 'RemoveEmptyEntries')
- -split is case-insensitive by default (as is typical in PowerShell); use the -csplit form for case-sensitive matching; by contrast, .Split() is invariably case-sensitive.
- -split accepts an array-valued LHS, returning a concatenation of the token arrays resulting from splitting the LHS's elements.
- -split implicitly converts the LHS to string(s); by contrast, .Split() can only be invoked on something that already is a [string].
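A minimal sketch of the differences listed above; the sample strings and variable names are made up for illustration and do not come from the question:

```powershell
# Regex by default: an unescaped '.' matches ANY character,
# so all three characters of 'a.b' act as separators.
$regexTokens = 'a.b' -split '.'                       # 4 empty strings

# Escape the dot, or use SimpleMatch, to split on a literal '.'.
$escapedTokens = 'a.b' -split '\.'                    # 'a', 'b'
$simpleTokens  = 'a.b' -split '.', 0, 'SimpleMatch'   # 'a', 'b'

# Case-insensitive by default; -csplit matches case-sensitively.
$caseInsensitive = 'oneXtwoxthree' -split 'x'         # 'one', 'two', 'three'
$caseSensitive   = 'oneXtwoxthree' -csplit 'x'        # 'oneXtwo', 'three'

# Array-valued LHS: each element is split, and the tokens are concatenated.
$fromArray = @('a b', 'c d') -split ' '               # 'a', 'b', 'c', 'd'

# Non-string LHS: the number is converted to the string '1024' first.
$fromNumber = 1024 -split '2'                         # '10', '4'
```

The first line is the classic surprise when moving from .Split() to -split: a separator that contains regex metacharacters needs escaping or the 'SimpleMatch' option.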
Note: Both -split and .Split() allow you to limit the number of tokens returned with an optional 2nd argument, which only splits part of the input string, reporting the rest of the input string in the last element of the return array.
For the full story, see Get-Help about_Split.
The .Split() method has one advantage, though: it is faster than the -split operator; so, if .Split()'s features are sufficient in a given scenario, you can speed things up with it.
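If you want to check the speed difference on your own machine, a rough sketch along these lines may help. The sample string and the iteration count are arbitrary choices, and the absolute timings will vary by machine and PowerShell edition:

```powershell
$text = 'one two three four five'
$iterations = 10000   # arbitrary; raise it for more stable numbers

# Time the -split operator.
$operatorTime = Measure-Command {
    foreach ($i in 1..$iterations) { $null = $text -split ' ' }
}

# Time the .Split() method on the same input.
$methodTime = Measure-Command {
    foreach ($i in 1..$iterations) { $null = $text.Split(' ') }
}

'{0,-10} {1,10:N1} ms' -f '-split', $operatorTime.TotalMilliseconds
'{0,-10} {1,10:N1} ms' -f '.Split()', $methodTime.TotalMilliseconds
```

Both calls produce the same five tokens here, so the comparison is like for like; for a handful of short strings the difference is unlikely to matter in practice.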
Examples:
Note: In the examples below that use regular expressions, single-quoted strings are used, with LF characters represented as the regular-expression escape sequence \n rather than the `n escape sequence that PowerShell supports in double-quoted strings, because it is preferable to specify regular expressions as single-quoted strings, to avoid confusion between what PowerShell expands up front and what -split ends up seeing.
- Split by any in a set of characters, as a regular expression: "`n" (LF) and also " " (single space):
  - "one two`n three four" -split '[\n ]' yields the equivalent of @( 'one', 'two', '', 'three', 'four' )
- Split by a string, specified as a regular expression: "`n ":
  - "one two`n three four" -split '\n ' yields the equivalent of @( 'one two', 'three four' )
- Split by a string literal: "`n ", using the SimpleMatch option:
  - "one two`n three four" -split "`n ", 0, 'SimpleMatch' yields the same as above; note that 0 is the number-of-tokens-to-return argument, which must be specified for syntax reasons here; 0 indicates that all tokens should be returned.
- Use capture groups ((...)) in the separator regex to include (parts of) separators in the result array:
  - 'a/b' -split '(/)' yields the equivalent of @( 'a', '/', 'b' )
  - Alternatively, use a positive lookahead assertion ((?=...)) to make the separators part of the elements: 'a/b/c' -split '(?=/)' yields the equivalent of @( 'a', '/b', '/c' )
- Limit the number of tokens:
  - 'one two three four' -split ' ', 3 yields the equivalent of @( 'one', 'two', 'three four' ), i.e., the 3rd token received the remainder of the input string.
  - Caveat: elements that are (parts of) separators captured via a capture group in the separator regex do not count toward the specified limit; e.g., 'a/b/c' -split '(/)', 2 yields @( 'a', '/', 'b/c' ), i.e., 3 elements in total.
- Split by any run of whitespace (unary form):
  - -split "`n one `n`n two `t `t three`n`n" yields the equivalent of @( 'one', 'two', 'three' )
String.Split() method pitfalls:
Having access to .NET Framework methods when needed is a wonderful option that lets you do in PowerShell most of what compiled .NET languages can do.
However, there are things that PowerShell has to do behind the scenes that are typically helpful, but can also be pitfalls:
For instance, 'foo'.Split("`n ") causes PowerShell to implicitly convert the string "`n " to a character array ([char[]]) before calling .Split() (the closest match among the method overloads), which may be unexpected.
Your intent may have been to split by the string "`n ", but the method overload invoked ended up interpreting your string as a set of individual characters, any one of which splits the input.
Incidentally, the cross-platform PowerShell Core edition has an additional .Split() overload that does now directly take a [string] argument, so the same call behaves differently there.
This changing behavior outside the control of PowerShell is in itself a good reason to prefer PowerShell-only solutions - for an explanation of why such changes are outside PowerShell's control, see this GitHub issue.
You can avoid such pitfalls by explicit typing, but that is both cumbersome and easy to forget.
Case in point:
In Windows PowerShell, if you truly wanted to split by string "`n ", this is what you'd need to do:
PS> "one`n two".Split([string[]] "`n ", 'None')
one
two
Note the necessary cast to [string[]] - even though only one string is passed - and the required use of the option parameter (None).
Conversely, if you wanted to split by a set of characters in PowerShell Core:
PS> "one`ntwo three".Split([char[]] "`n ")
one
two
three
Without the [char[]] cast, "`n " would be considered a single string to split by.
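By contrast, the -split operator does not depend on which .Split() overloads the underlying .NET runtime offers. The following is a minimal sketch with made-up sample input, showing the two intents (split by a literal string, split by a set of characters) expressed with -split only:

```powershell
$inputText = "one`n two`n three"
$separator = "`n "   # literal 2-character separator: LF followed by a space

# Intent 1: split by the literal string.
# Either escape it for use as a regex ...
$viaEscapedRegex = $inputText -split [regex]::Escape($separator)
# ... or ask -split to treat it literally.
$viaSimpleMatch = $inputText -split $separator, 0, 'SimpleMatch'

# Intent 2: split by any one of a set of characters (LF or space),
# using a regex character class.
$viaCharacterSet = "one`ntwo three" -split '[\n ]'
```

All three calls yield 'one', 'two', 'three'. No casts are involved, so there is nothing to forget when the script moves between Windows PowerShell and PowerShell Core.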
Solution 2:
The string argument in String.Split() is a list of characters to split on, not a sequence of characters to match and then split on. Your existing code will split on newline, and will split on space.
If you only want to split on newline, use:
.split("`n")
If you want to split on the character sequence of a newline followed immediately by a space, you can use Regex.Split():
[Regex]::Split($entry.'Group Name',"`n ") | ...
Alternatively, you can use the -split operator, which also splits by a string and not a list of characters:
$entry.'Group Name' -split "`n "
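Put together for the scenario in the question, it might look like the sketch below. The $entry object is a hypothetical stand-in for one record returned by Import-Csv, and the mix of CRLF and LF line endings plus the stray spaces are assumptions about what such a field could contain:

```powershell
# Hypothetical stand-in for one Import-Csv record with a multi-line field.
$entry = [pscustomobject]@{
    'Group Name' = "Group One Name`r`n Group Two Name`nGroup Three Name`n"
}

# Split on line breaks only (optional CR, then LF), trim stray whitespace
# from each line, and drop empty entries such as the one after the final LF.
$groups = @(
    $entry.'Group Name' -split '\r?\n' |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ }
)

$groups | ForEach-Object {
    # Do something with the group name
    $_
}
```

Because the split happens on line breaks alone, the spaces inside each group name are preserved, and the trimming step covers the case where a line break is followed by a space.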
Solution 3:
If I'm reading correctly, your call to .Split is passing in both `n and the space character. So, you are actually asking PowerShell to turn a string like "Group One Name" into a list like @("Group", "One", "Name").
If $entry is a single record, and you are running this line once for each of "Group One Name", "Group Two Name" and "Group Three Name", then you probably don't need the call to .Split at all -- simply use $entry.'Group Name' directly.