How to retain the first and last line and remove the lines in between in a logfile
Solution 1:
You can try this regex (enable the multiline flag so that ^ and $ match at line boundaries):
^(\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}(.*$)\s*)(\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}\2\s*)+
Explanation:
- ^ - matches the start of a line
- (\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}(.*$)\s*) - the first log line is stored as group 1
  - \d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2} - matches a timestamp of the format XX/XX/XX XX:XX:XX, where X is a digit
  - (.*$) - matches everything until the end of the line. Whatever is matched is stored in group 2; this is the actual log message (without the timestamp).
  - \s* - matches 0 or more whitespace characters
- (\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}\2\s*)+ - matches all the remaining consecutive log lines that start with the format XX/XX/XX XX:XX:XX followed by the contents of group 2; only the last such log line is stored in group 3
Now replace each match with the contents of group 1 followed by group 3: $1$3
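As a rough sketch, this is how the substitution could be applied in Perl (the language used in Solution 2). The helper name collapse_runs and the sample log lines are made up for illustration; only the regex itself comes from the answer above. The m flag makes ^ and $ match at line boundaries, and the g flag replaces every run rather than just the first one.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first and last line of each run of
# consecutive log lines that differ only in their timestamp.
sub collapse_runs {
    my ($text) = @_;
    # /m: ^ and $ match at line boundaries; /g: replace every run
    $text =~ s/^(\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}(.*$)\s*)(\d{2}(?:\/\d{2}){2} \d{2}(?::\d{2}){2}\2\s*)+/$1$3/mg;
    return $text;
}

# Made-up sample input: one run of three identical messages
my $log = join '',
    "sometext1\n",
    "22/01/03 14:42:25 INFO retrying\n",
    "22/01/03 14:42:27 INFO retrying\n",
    "22/01/03 14:42:29 INFO retrying\n",
    "sometext2\n";

print collapse_runs($log);
```

For this input the line stamped 14:42:27 is dropped and the other four lines are printed unchanged. A log line that is not followed by a repeat of the same message does not match the regex at all, so it is left as it is.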
Solution 2:
While a regex solution may be possible, this can easily be solved with ordinary Perl code, which I think is clearer and easier to maintain. I added 3 lines to your sample input to test the edge case where the input ends on a line that matches the search string.
use strict;
use warnings;

# This string can be replaced as needed
my $str = "INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again";
my ($first, $last);
while (<DATA>) {
    if (/\Q$str/) {                    # pattern matches the current line
        if ($first) {                  # this is an "in between" line
            $last = $_;                # remember it as the latest candidate for the last line
        } else {                       # this is the first line of a run
            print if not eof;          # print it (at eof the print below handles it)
            $first = $_;               # mark that we are inside a run
        }
        print if eof;                  # input ends inside a run: print the final line
    } else {                           # $str didn't match: any run has ended
        print $last if defined $last;  # print the last line of a run of two or more lines
        print;                         # print the non-matching line itself
        ($first, $last) = ();          # reset, so a one-line run does not leak into the next run
    }
}
__DATA__
sometext1
sometext2
22/01/03 14:42:25 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:27 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:29 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:31 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:33 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:35 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:37 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:39 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
sometext3
sometext4
22/01/03 14:42:49 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:53 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
sometext5
sometext6
22/01/03 14:42:49 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
Output:
sometext1
sometext2
22/01/03 14:42:25 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:39 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
sometext3
sometext4
22/01/03 14:42:49 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:53 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
sometext5
sometext6
22/01/03 14:42:49 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
22/01/03 14:42:51 INFO rpc.b: Could not get any http protocol, using HTTP and will try to get protocol again
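The same idea can also be sketched without the eof checks, by collecting each run of matching lines and emitting its first and last line once the run ends. This is only an illustrative variant, not part of the original answer: the subroutine name keep_first_and_last and the short sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the search string and a list of lines and
# returns the lines with the middle of every matching run removed.
sub keep_first_and_last {
    my ($str, @lines) = @_;
    my (@out, @run);
    for my $line (@lines, undef) {           # the trailing undef marks the end of input
        if (defined $line && $line =~ /\Q$str/) {
            push @run, $line;                # still inside a run of matching lines
            next;
        }
        push @out, $run[0]  if @run;         # the run has ended: keep its first line
        push @out, $run[-1] if @run > 1;     # and its last line, if it had more than one
        @run = ();
        push @out, $line if defined $line;   # pass non-matching lines through
    }
    return @out;
}

# Made-up sample lines
my @lines = (
    "sometext1\n",
    "14:42:25 INFO retrying\n",
    "14:42:27 INFO retrying\n",
    "14:42:29 INFO retrying\n",
    "sometext2\n",
);
print keep_first_and_last("INFO retrying", @lines);
```

Because the end of input is treated like any other non-matching line, a run that reaches the end of the file needs no special case. To use it on a real logfile, the lines read from an open filehandle could be passed in place of @lines.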