How to retain the first and last line and remove the lines in between in a log file

Solution 1:

You can try this regex:

^(\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}(.*$)\s*)(\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}\2\s*)+

Explanation:

  • ^ - matches the start of a line
  • (\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}(.*$)\s*) - matches the first log line of a run and stores it in group 1
    • \d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2} - matches a timestamp of the form XX/XX/XX XX:XX:XX, where X is a digit
    • (.*$) - matches everything up to the end of the line and stores it in group 2. This is the log message itself, without the timestamp.
    • \s* - matches 0 or more whitespace characters, including the line break
  • (\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}\2\s*)+ - matches all the following consecutive log lines that consist of a timestamp XX/XX/XX XX:XX:XX followed by the contents of group 2. Because the group is repeated, only the last such log line is kept in group 3.
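
The remark about group 3 relies on a general regex behaviour: a capture group under a quantifier keeps only its last iteration. A tiny Perl sketch of that behaviour, using made-up input that has nothing to do with the log itself:

```perl
use strict;
use warnings;

# Made-up input: three "letter digit space" tokens in a row.
my $text = "a1 b2 c3 ";

my $last_iteration;
if ($text =~ /^(\w\d )+/) {
    # The group matched three times, but $1 only holds the final repetition.
    $last_iteration = $1;
}

print "group 1 holds: '$last_iteration'\n";
```

This is why the first line needs its own group: the repeated group alone cannot remember where the run started.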

Now replace each match with the contents of group 1 followed by group 3, i.e. $1$3. The regex needs multiline mode so that ^ and $ match at line boundaries.
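
As a rough illustration, the substitution could be applied in Perl like this. The helper name collapse_log and the shortened sample message are assumptions made for the example, not part of the question; the pattern is the one above with the timestamp part factored into a variable.

```perl
use strict;
use warnings;

# Hypothetical helper: collapses each run of log lines that differ only in
# their timestamp down to the first and last line of the run.
sub collapse_log {
    my ($log) = @_;
    # Timestamp of the form XX/XX/XX XX:XX:XX
    my $ts = qr{\d{2}(?:/\d{2}){2} \d{2}(?::\d{2}){2}};
    # /m makes ^ and $ match at line boundaries, /g handles every run
    $log =~ s/^($ts(.*$)\s*)($ts\2\s*)+/$1$3/mg;
    return $log;
}

# Made-up sample with a shortened message
my $sample = join "\n",
    "sometext1",
    "22/01/03 14:42:25 INFO rpc.b: retrying",
    "22/01/03 14:42:27 INFO rpc.b: retrying",
    "22/01/03 14:42:29 INFO rpc.b: retrying",
    "sometext2", "";

print collapse_log($sample);
```

A run needs at least two lines to match, so a log line that occurs only once is left untouched.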

Solution 2:

While using a regex may be possible, this can easily be solved with normal Perl code. I think the code is clearer and easier to maintain. I added 3 lines to your sample input to test for the edge case that we end on a line which matches our search.

use strict;
use warnings;

# This string can be replaced as needed
my $str = "INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again";

my ($first, $last);

while (<DATA>) {
    if (/\Q$str/) {                      # if pattern matches current line
        if ($first) {                    # if this is an "in between" line
            $last = $_;                  # save line and go next
        } else {                         # if this is the first line
            print if not eof;            # print it..
            $first = $_;                 # ...save line and go next
        }
        print if eof;                    # print last line to avoid edge cases
    } elsif ($first && $last) {          # $str didn't match: finished a range of lines
        print $last, $_;                 # print and reset
        $first = undef;
        $last = undef;
    } else {
        $first = undef;                  # forget a range that had only one line
        print;                           # print everything else
    }
}

__DATA__
sometext1
sometext2
22/01/03 14:42:25 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:27 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:29 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:31 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:33 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:35 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:37 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:39 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
sometext3
sometext4
22/01/03 14:42:49 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:53 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
sometext5
sometext6
22/01/03 14:42:49 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again

Output:

sometext1
sometext2
22/01/03 14:42:25 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:39 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
sometext3
sometext4
22/01/03 14:42:49 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:53 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
sometext5
sometext6
22/01/03 14:42:49 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
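
If the same logic is needed outside a __DATA__ demo, one option is to move it into a function that takes the lines as a list. The sketch below is a restructured variant rather than the code above: the name collapse_runs and the short sample lines are made up for illustration, and it avoids the eof checks by flushing the pending last line after the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the input lines with every run of lines
# containing $str reduced to its first and last line.
sub collapse_runs {
    my ($str, @lines) = @_;
    my (@out, $first, $last);
    for my $line (@lines) {
        if (index($line, $str) >= 0) {      # literal substring match, like /\Q$str/
            if (defined $first) {
                $last = $line;              # remember the most recent line of the run
            } else {
                $first = $line;
                push @out, $line;           # the first line of a run is always kept
            }
        } else {
            push @out, $last if defined $last;  # close a run that had 2+ lines
            ($first, $last) = ();               # reset for the next run
            push @out, $line;
        }
    }
    push @out, $last if defined $last;      # input ended inside a run
    return @out;
}

# Made-up sample lines
my @demo = map { "$_\n" } ("x", "10:00 msg", "10:01 msg", "10:02 msg", "y", "10:03 msg");
print collapse_runs("msg", @demo);
```

To process a real log, the list could come from the diamond operator instead, for example print collapse_runs($str, <>); with the file name given on the command line.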