Split one file into multiple files based on delimiter

I have one file with -| as the delimiter after each section. I need to create a separate file for each section using Unix tools.

Example input file:

wertretr
ewretrtret
1212132323
000232
-|
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

Expected result in File 1

wertretr
ewretrtret
1212132323
000232
-|

Expected result in File 2

ereteertetet
232434234
erewesdfsfsfs
0234342343
-|

Expected result in File 3

jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

Solution 1:

A one-liner, no programming needed (apart from the regexp):

csplit --digits=2  --quiet --prefix=outfile infile "/-|/+1" "{*}"

tested on: csplit (GNU coreutils) 8.30
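
To try the one-liner end to end, here is a minimal sketch that recreates the sample input from the question and runs the same command on it. It assumes GNU csplit is on the PATH and that it is run in an empty scratch directory. The --elide-empty-files flag is an addition of this sketch, not part of the answer above; it is there only so that a zero-length trailing piece, if csplit produces one, is not left behind.

```shell
#!/bin/sh
# Sketch only: assumes GNU csplit (coreutils); run in an empty scratch directory.
set -eu

# Recreate the sample input from the question.
cat > infile <<'EOF'
wertretr
ewretrtret
1212132323
000232
-|
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|
EOF

# Split after every line matching -| ; the +1 offset keeps the
# delimiter line at the end of each piece instead of the start of the next.
csplit --digits=2 --quiet --elide-empty-files --prefix=outfile infile "/-|/+1" "{*}"

# List the pieces that were written.
ls outfile*
```

The pieces are numbered from 00, so the three sections of the sample should end up in outfile00, outfile01 and outfile02.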

Notes about usage on Apple Mac

"For OS X users, note that the version of csplit that comes with the OS doesn't work. You'll want the version in coreutils (installable via Homebrew), which is called gcsplit." — @Danial

"Just to add, you can get the version for OS X to work (at least with High Sierra). You just need to tweak the args a bit csplit -k -f=outfile infile "/-\|/+1" "{3}". Features that don't seem to work are the "{*}", I had to be specific on the number of separators, and needed to add -k to avoid it deleting all outfiles if it can't find a final separator. Also if you want --digits, you need to use -n instead." — @Pebbl

Solution 2:

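awk '{f="file" NR; print $0 "-|" > f; close(f)}' RS='-\\|\n' input-file

Explanation (edited):

RS is the record separator, and this solution uses a GNU awk extension that allows it to be a regular expression longer than one character. Here it matches the delimiter -| together with the newline that follows it, so each record is one section without its delimiter line. NR is the record number.

The print statement writes the record followed by -| into a file whose name contains the record number (file1, file2, ...). Each record already ends with a newline, so the delimiter lands on a line of its own, as in the expected result.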
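
Because a multi-character RS is an extension, a line-by-line variant may be handy where GNU awk is not available. The following is a minimal sketch, not the answer's method: it compares each line to the delimiter instead of changing RS. The counter n and the tiny sample input are assumptions made for illustration; the file1, file2, ... naming mirrors the one-liner above.

```shell
#!/bin/sh
# Sketch only: run in an empty scratch directory.
set -eu

# Tiny hypothetical sample: two sections, each closed by a -| line.
printf '%s\n' a b '-|' c '-|' > input-file

awk '
    BEGIN { n = 1 }                  # n: number of the section being written
    {
        f = "file" n                 # output name: file1, file2, ...
        print > f                    # copy the line, delimiter line included
    }
    $0 == "-|" { close(f); n++ }     # a delimiter line ends the current section
' input-file
```

Closing each file as soon as its section ends keeps the number of simultaneously open files at one, which matters when the input has many sections.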

Solution 3:

Debian has csplit. It is part of GNU coreutils and is specified by POSIX, so it should be available on most other distributions as well. If it is missing, it shouldn't be too hard to track down the source and compile it.

Solution 4:

I solved a slightly different problem, where the file contains marker lines that each name the file the following text should go to. This Perl code does the trick for me:

#!/path/to/perl -w
use strict;
use Getopt::Long;

# Marker that starts a new output file; override the default with --mff=MARKER.
my $mff = "-#-";
GetOptions('mff=s' => \$mff) or exit 1;

# Getopt::Long has removed the options from @ARGV; what is left are file names.
if (@ARGV == 0) {
    print STDERR "usage: ncsplit.pl [--mff=MARKER] filename.txt [...]\n\nThe mff marker is found on a line followed by a space and a filename.  All of the following contents of filename.txt are written to that file until another mff is found.\n";
    exit 1;
}
print "using file switch=$mff\n\n";

# Could be more than one file name on the command line,
# but this version throws away the subsequent ones.
my @filelist = grep { -f $_ } @ARGV;
die "File not found...\n\n" unless @filelist;
my $readfile = $filelist[0];

open(my $source, '<', $readfile) or die "Cannot open $readfile: $!\n";

my $out;    # current output handle; undefined until the first marker line
while (my $line = <$source>) {
    if ($line =~ /^\Q$mff\E (.*)$/) {
        # Marker line: the rest of the line names the next output file.
        my $outname = $1;
        close $out if $out;
        open($out, '>', $outname) or die "Cannot open $outname: $!\n";
        print "opened $outname\n";
    }
    elsif ($out) {
        # Ordinary line: copy it to the file opened by the last marker.
        print {$out} $line;
    }
}
close $out if $out;
close $source;