Split one file into multiple files based on delimiter
I have a file in which each section is followed by a line containing the delimiter -|. I need to create a separate file for each section using Unix tools.
Example input file:
wertretr
ewretrtret
1212132323
000232
-|
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|
Expected result in File 1
wertretr
ewretrtret
1212132323
000232
-|
Expected result in File 2
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
Expected result in File 3
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|
Solution 1:
A one-liner, no programming required (apart from the regular expression):
csplit --digits=2 --quiet --prefix=outfile infile "/-|/+1" "{*}"
Tested with csplit (GNU coreutils) 8.30.
Notes about usage on Apple Mac:
"For OS X users, note that the version of csplit that comes with the OS doesn't work. You'll want the version in coreutils (installable via Homebrew), which is called gcsplit." - @Danial
"Just to add, you can get the version for OS X to work (at least with High Sierra). You just need to tweak the args a bit:
csplit -k -f=outfile infile "/-\|/+1" "{3}"
Features that don't seem to work are the "{*}" (I had to be specific on the number of separators), and I needed to add -k to avoid it deleting all outfiles if it can't find a final separator. Also, if you want --digits, you need to use -n instead." - @Pebbl
Solution 2:
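awk '{f = "file" NR; printf "%s-|\n", $0 > f; close(f)}' RS='-\\|\n' input-file
Explanation (edited):
RS is the record separator. This solution uses a GNU awk extension which allows RS to be more than one character, in which case it is treated as a regular expression. Here it matches the delimiter -| together with the newline that follows it, so each record is exactly one section. NR is the record number.
The printf statement writes the record, followed by the -| line, into a file that contains the record number in its name (file1, file2, ...). close(f) releases each file as soon as it has been written. Note that any line that merely ends in -| is also treated as a delimiter.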
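Where an awk without multi-character RS support is all that is available, a line-oriented variant may work instead. The following is a minimal sketch, assuming the delimiter always sits on a line of its own as in the example input. The names workdir, input-file and section_N.txt are illustrative only and are not part of the answer above.

```shell
#!/bin/sh
# Sketch only: workdir, input-file and section_N.txt are illustrative names.
workdir=$(mktemp -d)

# Sample input: three sections, each terminated by a line holding only "-|".
printf '%s\n' wertretr 000232 '-|' ereteertetet '-|' jdhg3875jdfsgfd '-|' \
    > "$workdir/input-file"

awk -v dir="$workdir" '
    BEGIN { n = 1; out = dir "/section_" n ".txt" }
    { print > out }              # copy every line, delimiter included
    $0 == "-|" {                 # a delimiter line closes the current section
        close(out)
        n++
        out = dir "/section_" n ".txt"
    }
' "$workdir/input-file"

ls "$workdir"
```

Because a new output name is only chosen after a delimiter and nothing is written until the next input line arrives, no empty trailing file is created when the input ends with -|.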
Solution 3:
Debian has csplit, but I don't know if that's common to all/most/other distributions. If not, though, it shouldn't be too hard to track down the source and compile it.
Solution 4:
I solved a slightly different problem, where the file contains marker lines, each giving the name of the file that the text that follows should be written to. This Perl code does the trick for me:
#!/path/to/perl -w
use strict;
use Getopt::Long;

my $usage = "usage: ncsplit.pl [--mff=MARKER] filename.txt [...]\n\nThe mff is found on a line followed by a filename. All of the contents of filename.txt that follow are written to that file until another mff is found.\n";

# Marker ("mff") that introduces a new output file; "-#-" is the default.
my $mff = "-#-";
GetOptions('mff=s' => \$mff) or die $usage;

# Keep only the arguments that name existing files.
my @filelist = grep { -f } @ARGV;
if (!@filelist) {
    print STDERR $usage;
    exit 1;
}
print "using file switch=$mff\n\n";

# Could be more than one file name on the command line,
# but this version throws away the subsequent ones.
my $readfile = $filelist[0];
open(my $source, '<', $readfile) or die "Cannot open $readfile: $!\n";

my $out;    # current output handle; undefined until the first marker line
while (my $line = <$source>) {
    if ($line =~ /^\Q$mff\E (.*)$/) {
        # Marker line: the text after the marker is the next output file name.
        my $outname = $1;
        close($out) if $out;
        open($out, '>', $outname) or die "Cannot open $outname: $!\n";
        print "opened $outname\n";
    }
    elsif ($out) {
        # Ordinary line: copy it to the file opened by the last marker.
        print {$out} $line;
    }
}
close($out) if $out;
close($source);
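For comparison, the same marker-driven split could probably be done in awk as well. This is a rough sketch, assuming the marker is -#- followed by a single space and the output file name, as with the script's default. The names workdir, input.txt, a.txt and b.txt are hypothetical and chosen only for the demonstration.

```shell
#!/bin/sh
# Sketch only: workdir, input.txt, a.txt and b.txt are illustrative names.
workdir=$(mktemp -d)

# Sample input: each "-#- NAME" line names the file for the lines after it.
printf '%s\n' '-#- a.txt' 'first line' 'second line' '-#- b.txt' 'third line' \
    > "$workdir/input.txt"

awk -v mff='-#-' -v dir="$workdir" '
    index($0, mff " ") == 1 {            # marker line: literal prefix match
        if (out != "") close(out)
        out = dir "/" substr($0, length(mff) + 2)
        next
    }
    out != "" { print > out }            # lines before the first marker are skipped
' "$workdir/input.txt"
```

Using index() rather than a regular expression keeps the marker literal, so characters such as | or * in a custom marker need no escaping.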