Notepad++ - Removing the first column in a comma separated file
I have a large CSV file that I need to remove the first column of data. I cannot open it in Excel because Excel converts some of the values in the columns to scientific numbers.
I am using Notepad++, and I am trying to string the first column from the file EXE,
1,Value1,value2,value3,value4,value5
3445,Value1,value2,value3,value4,value5
12345,Value1,value2,value3,value4,value5
1234,Value1,value2,value3,value4,value5
11,Value1,value2,value3,value4,value5
to look like
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Value1,value2,value3,value4,value5
Notepad++'s search and replace supports regular expressions (regex) which can be easily used for this.
Use the following regex to search for:
^[^,]+,(.+)
This matches the start of the line followed by as much characters as possible not being a comma followed by a comma followed by the rest of the line. The rest of the line is grouped as first submatch.
Globally replace with this:
\1
This denotes the first submatch (rest of the line). By that each line is replaced by everything after the first column and comma.
After I found the above way to do it in a single global replace (and updated my reply accordingly), I noticed that this reply is basically identical but also gives a comprehensive explanation of the regex used.
Note: The shorter regex ^[^,]+,
can't be used for global replace with an empty string since Notepad++ will then replace all columns except the last: After replacing the first column, the second column (which now is the first and matches exactly the regex) will be replaced, then the third, and so on. However, the shorter regex works perfectly with other editors (e.g. with PSPad or vim).
Press Ctrl + H and perform the following replace:
Find what: .*?,(.*)
Replace with: \1
Wrap around: checked
Regular expression: selected
. matches newline: unchecked
Now press Alt + A to replace all occurrences.
How it works
-
The regular expression
.*?,(.*)
matches an entire line:-
.*?,
matches everything before the first comma, including the comma itself..*
means any number of occurrences of any character, and the question mark makes the quantifier lazy, i.e., it matches as few characters as possible. -
(.*)
matches everything after the first comma.Enclosing
.*
in parentheses converts it into a subpattern, so the mast can be accessed in the replace field.
-
-
\1
represents the first submatch (match for(.*)
).As a result, Notepad++ replaces the line by everything that follows the first comma.
In Windows, you can do it as follows.
for /F "tokens=2,3,4,5,6 delims=," %i in (Input.csv) do @echo %i,%j,%k,%l,%m >> output.csv
I assumed that you have only 6 columns. If you have many more columns, try experimenting with * in tokens field. Idea is taken from Windows for command
Assuming you have a linux system or some unix style environment (I like gow, or you can snarf the utilities off unixutils) I believe running the file through cut -d , -f2-6
should do the trick - it should, if i recall correctly will do the trick - -d
sets the deliminator, and f2-6
prints out the second to 6th character.
cat input.csv | cut -d , -f2-6 > output.csv
would do the trick taking input file and kicking out an output file. Its not using notepad, but its fast and really simple.