RegEx string formatting in Notepad++
I'm pretty good at regex, but there's one thing I can't figure out.
How would one search/replace in Notepad++ so that the output has a fixed length while the input length can vary?
For example, doing a regex on this: 23-6-2016
to become: 23-06-2016
(an extra 0 for 06, but not if it's 12, for example)
Another option is to create this: TestString and Test would become
TestString______________________ (extra spaces.)
Test____________________________ (extra spaces.)
Of course, the idea here is to do a mass search/replace where the output all has the same length.
Please explain the thought behind it.
EDIT: to give an idea of the data I'm working with, here are some example rows that I need to process:
12345678 TXT 19700101 0 100 20160624 100 Comment text
12345678 TXT 19700101 100 100,25 20160624 0,25 Comment text
12345678 TXT 19700101 100,25 100,5 20160624 0,25 Comment text
Note that the columns are separated by tabs. The first 0 in the first line should be formatted as 0,00 and the 100 as 100,00, but the 12345678 and the dates should not get a ,00 suffix. The last 100,5 should be formatted as 100,50.
I got around the date stuff, so that is less important right now.
In response to:
12345678 TXT 19700101 0 100 20160624 100 Comment text
12345678 TXT 19700101 100 100,25 20160624 0,25 Comment text
12345678 TXT 19700101 100,25 100,5 20160624 0,25 Comment text
For the 4th column:
^((?:\S+\s+){3}\d+)(\s) to \1,0\2
^((?:\S+\s+){3}\d+,\d)(\s) to \10\2
For the 5th/7th column: similar to the above, just replace {3} with {4}/{6} in the rules, respectively.
Explanation
The 1st rule appends ,0 to numbers without a comma. After it runs, every number in the column contains ,\d (a comma followed by at least one digit).
The 2nd rule appends a 0 to numbers with a single digit after the comma.
As for (?:), it is a non-capturing group: the preceding columns are already captured as part of \1, so capturing them separately is unnecessary.
This only pads numbers to 2 decimal places. To pad to an arbitrary number of places, use the "pad excessively, then trim" approach.
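To try the two rules outside Notepad++, here is a minimal Python sketch of the same idea. The function name `pad_column` and the `SAMPLE` rows are made up for illustration, and the input is assumed to be tab-separated as in the question. Python reads `\10` as group 10, so the replacements use `\g<1>` instead of `\1`.

```python
import re

def pad_column(text: str, column: int) -> str:
    """Pad the numbers in the given 1-based column to two decimal places."""
    skip = column - 1  # number of columns to step over, e.g. {3} for the 4th column
    # Rule 1: a number without a comma gets ",0" appended.
    rule_1 = re.compile(rf"^((?:\S+\s+){{{skip}}}\d+)(\s)", re.MULTILINE)
    # Rule 2: a number with a single digit after the comma gets "0" appended.
    rule_2 = re.compile(rf"^((?:\S+\s+){{{skip}}}\d+,\d)(\s)", re.MULTILINE)
    text = rule_1.sub(r"\g<1>,0\2", text)
    text = rule_2.sub(r"\g<1>0\2", text)
    return text

# Sample rows from the question, joined with tabs (an assumption about the real file).
SAMPLE = "\n".join([
    "12345678\tTXT\t19700101\t0\t100\t20160624\t100\tComment text",
    "12345678\tTXT\t19700101\t100\t100,25\t20160624\t0,25\tComment text",
    "12345678\tTXT\t19700101\t100,25\t100,5\t20160624\t0,25\tComment text",
])

result = SAMPLE
for col in (4, 5, 7):
    result = pad_column(result, col)
print(result)
```

The order matters: rule 1 has to run before rule 2, because rule 2 relies on every number already having at least one digit after the comma.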
Final word?
In my opinion, plain regex as in Notepad++ is inadequate for this task. Basic scripting in something like bash or Perl would handle this far more readably.
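As a rough illustration of the scripting route, here is a short sketch in Python (swapped in for the bash or Perl mentioned above). The names `format_amount`, `format_row` and `AMOUNT_COLUMNS` are hypothetical, and it assumes tab-separated rows where the 4th, 5th and 7th columns hold comma-decimal numbers.

```python
from decimal import Decimal

# Zero-based indexes of the 4th, 5th and 7th columns (assumed to be the amounts).
AMOUNT_COLUMNS = (3, 4, 6)

def format_amount(field: str) -> str:
    """Turn '100,5' into '100,50' and '0' into '0,00'."""
    value = Decimal(field.replace(",", "."))
    # Note: values with more than two decimals would be rounded here.
    return f"{value:.2f}".replace(".", ",")

def format_row(row: str) -> str:
    """Format the amount columns of one tab-separated row."""
    fields = row.split("\t")
    for index in AMOUNT_COLUMNS:
        fields[index] = format_amount(fields[index])
    return "\t".join(fields)

print(format_row("12345678\tTXT\t19700101\t100,25\t100,5\t20160624\t0,25\tComment text"))
```

Splitting on tabs keeps the ID, date and comment columns untouched without needing any lookahead or column-counting in a pattern.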
Section A: Pad to specific length
To right-pad lines to N characters using regular expressions, add N spaces to the end of each line, then capture the first N characters and discard the rest.
Pass 1: Add padding characters
Find: $
Replace: ______________________________
This adds 30 spaces at the end of the line. (I used underscores here since spaces wouldn't show up in the post.)
Pass 2: Keep the leftmost 30 characters
Find: ^([[:print:]]{0,30}).*$
Replace with: \1
Starting at the beginning of the line, this captures a group of up to thirty printable characters, matches any remaining characters, and replaces the whole match with the group.
To pick a different line length, use N spaces in Pass 1, then replace 30 with N in Pass 2.
Section B: Line starting with date
To pad a dash-delimited date at the beginning of a line, match each section accordingly.
Pass 1 (day of month):
Find what: ^([0-9])-
Replace with: 0\1-
Replace the pattern (line starting with a single digit followed by a dash) with the padded zero, the digit, and the dash.
Pass 2 (month):
Find what: -([0-9])-
Replace with: -0\1-
Replace the pattern (a single digit between two dashes) with a dash, the padding zero, the digit, and the closing dash.
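Both sections can be tried outside Notepad++ with a small Python sketch. The function names `pad_to_width` and `pad_date` are made up for illustration. Python's `re` module does not support `[[:print:]]`, so the sketch uses `.` instead, which is an assumption that the lines contain no characters you would want to exclude.

```python
import re

def pad_to_width(line: str, width: int = 30, fill: str = " ") -> str:
    """Section A: pad excessively, then trim to a fixed width."""
    # Pass 1: append `width` fill characters to the end of the line.
    padded = line + fill * width
    # Pass 2: keep the first `width` characters and drop the rest.
    return re.sub(rf"^(.{{0,{width}}}).*$", r"\1", padded)

def pad_date(line: str) -> str:
    """Section B: zero-pad day and month of a d-m-yyyy date at line start."""
    # Pass 1 (day of month): a single digit at line start followed by a dash.
    line = re.sub(r"^([0-9])-", r"0\1-", line)
    # Pass 2 (month): a single digit between two dashes.
    line = re.sub(r"-([0-9])-", r"-0\1-", line)
    return line

print(pad_to_width("TestString", fill="_"))
print(pad_to_width("Test", fill="_"))
print(pad_date("23-6-2016"))
```

One side effect of the pad-then-trim approach is that lines already longer than the target width are cut off, so the width should be at least as long as the longest input.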