Split log file by date

Solution 1:

A Perl solution, taking advantage of GNU date to convert the dates:

perl -ne 'if(/^###<(.*)>/){
            chomp($d=`date -d \"$1\" +%Y_%m_%d`);
            $name="$d.log"
          } 
          open(my $fh,">>","$name"); 
          print $fh $_;' file.log 

Explanation

  • -ne : read the input file line by line (saving each line as the special variable $_) and apply the script given by -e to each line.
  • if(/^###<(.*)>/) : if the line starts with ###<, capture everything between the <> as $1 (that's what the parentheses do).
  • chomp($d=`date -d \"$1\" +%Y_%m_%d`); : run the date command (the backticks capture its output) to reformat the date as YYYY_MM_DD. For example:

    $ date -d "Sep 1, 2016 1:00:01 AM" +%Y_%m_%d
    2016_09_01
    

    The chomp removes the final newline from the result of date so we can use it later.

  • $name="$d.log" : we save the result of the date command plus .log as the variable $name.
  • open(my $fh,">>","$name"); : open the file named in $name in append mode (">>") as the file handle $fh. Don't worry if you don't know what a file handle is; this just means that print $fh "foo" will print foo into that file.
  • print $fh $_; : print the current line into the file that $fh points to, that is, the file whose name is currently stored in $name. Since $name only changes on header lines, every line ends up in the file of the most recent date header.

Solution 2:

One approach for solving this could be to use awk. For example, this command:

awk -F'[ <,]+' '/^###/{close(f);f=$4"_"$2"_"$3".log"}{print >> f}END{close(f)}' file

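splits the file at every line starting with ###, naming each output file after its header's year, month name and day ($4, $2 and $3 once the line is split on spaces, < and commas). close(f) closes the previous output file before the next one is started. Unlike Solution 1, the month stays a name and the day is not zero-padded.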
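To check which fields the separator [ <,]+ produces, a single header can be fed through awk on its own. The header text below is an assumed example in the same format the other solutions use; any POSIX awk should give the same result.

```shell
#!/usr/bin/env bash
# Show how -F'[ <,]+' splits a header line into fields.
# The header text is an assumed sample, not taken from a real log.
header='###<Sep 1, 2016 1:00:01 AM>'

# Print every field with its number.
echo "$header" | awk -F'[ <,]+' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
# 1: ###
# 2: Sep
# 3: 1
# 4: 2016
# 5: 1:00:01
# 6: AM>

# The file name built by f=$4"_"$2"_"$3".log":
echo "$header" | awk -F'[ <,]+' '{ print $4"_"$2"_"$3".log" }'
# 2016_Sep_1.log
```

Names like 2016_Sep_1.log keep the month name and an unpadded day, so they will not sort chronologically by file name the way 2016_09_01.log-style names do.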

Solution 3:

With GNU awk (gensub() is a gawk extension, so this will not run in POSIX awk or mawk):

awk '/^#+<[^>]+>$/ {if (lines) print lines >file; \
     dt=gensub("^#+<([^>]+)>$", "\\1", 1)
     dt_cmd="date -d \""dt"\" +%Y_%m_%d.log"
     dt_cmd | getline file; close(dt_cmd); lines=$0; next}; \
     {lines=lines ORS $0} END {print lines >file}' file.log

Readable form:

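awk '
    /^#+<[^>]+>$/ {
        # write out the previous chunk, if there is one
        if (lines)
            print lines > file
        # extract the text between < and > and convert it with date
        dt = gensub("^#+<([^>]+)>$", "\\1", 1)
        dt_cmd = "date -d \"" dt "\" +%Y_%m_%d.log"
        dt_cmd | getline file
        close(dt_cmd)
        # start a new chunk with the header line itself
        lines = $0
        next
    }
    {
        # any other line: append it to the current chunk
        lines = lines ORS $0
    }
    END {
        # write out the last chunk
        print lines > file
    }' file.log
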
  • /^#+<[^>]+>$/ matches the header lines that contain the dates; the block in {} after it runs only for those lines. It first writes the lines collected so far (the previous chunk) to the file named in file, then extracts the date with gensub(), converts it to the desired format with the external date command, stores the resulting name (e.g. 2016_09_01.log) in file, and starts a new chunk in lines with the header line itself. next skips the remaining rules for that line.

  • For all other lines, the line is appended to the variable lines, separated from the previous content by ORS (a newline by default).

  • The END block writes the last chunk, since no further header line follows it to trigger the write.
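The date conversion in this solution relies on awk's `command | getline var` form. The sketch below uses echo as a hypothetical stand-in for the date command to show how that form behaves: getline returns 1 after reading a line and 0 at the end of the command's output, and the pipe stays open until close() is called. Asking for the same command string again without closing it therefore reads nothing and leaves the variable unchanged, which is why closing the command after each read is a safe habit when two headers can carry identical dates. It should run in any POSIX awk.

```shell
#!/usr/bin/env bash
# Sketch of "cmd | getline var" and why close(cmd) matters.
# "echo 2016_09_01.log" is a hypothetical stand-in for the date command.
awk 'BEGIN {
    cmd = "echo 2016_09_01.log"

    # Without close(): the second getline hits EOF on the still-open
    # pipe, returns 0 and leaves f untouched.
    f = ""; r1 = (cmd | getline f); r2 = (cmd | getline f)
    print "no close:", r1, r2
    close(cmd)

    # With close(): every getline starts a fresh process.
    r1 = (cmd | getline f); close(cmd)
    r2 = (cmd | getline f); close(cmd)
    print "with close:", r1, r2, f
}'
# no close: 1 0
# with close: 1 1 2016_09_01.log
```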