How to get text from a range of dates using grep/sed in a large text file?
With grep
If you know the number of lines you want, you can use the context option -A to print lines after the pattern:
grep -A 3 2016-07-13 file
That will give you each line containing 2016-07-13 and the 3 lines after it.
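A quick way to check what -A does is to run it against a small sample log. The sketch below builds a hypothetical six-line file in a temporary location (the contents and the variable name log are made up for illustration) and prints the 2016-07-13 line plus the three lines after it.

```shell
#!/bin/sh
# Hypothetical sample log: one entry per day, already in date order.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '2016-07-12 a' \
  '2016-07-13 b' \
  '2016-07-14 c' \
  '2016-07-15 d' \
  '2016-07-16 e' \
  '2016-07-17 f' > "$log"

# Prints the matching line and the 3 lines that follow it,
# i.e. "2016-07-13 b" through "2016-07-16 e".
grep -A 3 2016-07-13 "$log"
```

Because -A counts lines rather than dates, this only fits when you know in advance how many entries the range covers.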
With sed
You can use the dates as delimiters, like this:
sed -n '/2016-07-13/,/2016-07-19/p' file
This prints all lines from the first line containing 2016-07-13 up to and including the first line containing 2016-07-19. That assumes there is only one line with 2016-07-19: any later 2016-07-19 lines will not be printed. If there are several, use the next date as the end address instead and delete that line from the output with d:
sed -n '/2016-07-13/,/2016-07-20/{/2016-07-20/d;p;}' file
In both commands, if no line matches the end date, sed prints everything through to the end of the file.
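To see the difference between the two sed commands, the sketch below runs both against a hypothetical log that has two 2016-07-19 entries (the sample data and the variable name log are assumptions for illustration). The second command is written with a ; before the closing } so that BSD sed accepts it as well as GNU sed.

```shell
#!/bin/sh
# Hypothetical sample log with two entries dated 2016-07-19.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '2016-07-12 a' \
  '2016-07-13 b' \
  '2016-07-15 c' \
  '2016-07-19 d' \
  '2016-07-19 e' \
  '2016-07-20 f' \
  '2016-07-21 g' > "$log"

# Range ends at the first 2016-07-19 line, so "2016-07-19 e" is missed.
echo 'First command:'
sed -n '/2016-07-13/,/2016-07-19/p' "$log"

# Range ends at the first 2016-07-20 line, which d removes before p runs.
echo 'Second command:'
sed -n '/2016-07-13/,/2016-07-20/{/2016-07-20/d;p;}' "$log"
```

The first command prints b, c and d; the second prints b, c, d and e.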
This simple grep one-liner will be enough:
grep -E '^2016-07-1[3-9]' filename
Works nicely here, and there is no need for sed :) Quote the pattern, though, so the shell does not try to expand [3-9] as a filename glob.
References:
- Matching Numeric Ranges with a Regular Expression
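The same approach extends to ranges that cross a digit boundary, at the cost of a longer expression. As a sketch, assuming a hypothetical range of 2016-07-08 through 2016-07-22 and made-up sample lines, the day part is split into the alternatives 08-09, 10-19 and 20-22:

```shell
#!/bin/sh
# Hypothetical sample log; only the 8th to the 22nd of July should match.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '2016-07-07 a' \
  '2016-07-08 b' \
  '2016-07-15 c' \
  '2016-07-22 d' \
  '2016-07-23 e' > "$log"

# Days 08-09, 10-19 and 20-22, anchored to the start of the line.
# The pattern is quoted so the shell does not glob-expand the brackets.
grep -E '^2016-07-(0[89]|1[0-9]|2[0-2])' "$log"
```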
awk solution:
$ awk '/^2016-07-13/,/^2016-07-19/' input.txt
2016-07-13 < ?xml version>
2016-07-18 < ?xml version>
2016-07-18 < ?xml version>
2016-07-19 < ?xml version>
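Basically, this prints every line from the first one that starts with 2016-07-13 through the first one that starts with 2016-07-19. As with the sed range, later lines that also start with 2016-07-19 are not printed.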
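If the end date can appear on several lines, a common awk alternative to the range pattern is a flag that is switched on at the start date and off at the first entry of the following day. A sketch, assuming hypothetical sample data and a flag variable named f; like the second sed command, it prints through to the end of the file if no 2016-07-20 line exists.

```shell
#!/bin/sh
# Hypothetical sample log with two entries dated 2016-07-19.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '2016-07-12 a' \
  '2016-07-13 b' \
  '2016-07-19 c' \
  '2016-07-19 d' \
  '2016-07-20 e' > "$log"

# f is set on any line starting with 2016-07-13 and cleared on any line
# starting with 2016-07-20; the bare pattern f prints lines while it is set.
awk '/^2016-07-13/ { f = 1 } /^2016-07-20/ { f = 0 } f' "$log"
```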
All the other current answers rely on the log file entries being sorted chronologically, or on the date range being easy to match with a regular expression. If you want a more generic solution, we need to do a bit more programming.
I present this GNU AWK script:
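#!/usr/bin/gawk -f
# Convert the "YYYY MM DD hh mm ss" strings passed in with -v to epoch seconds.
BEGIN {
    starttime = mktime(starttime)
    endtime = mktime(endtime)
}
# True if n lies in the half-open interval [start, end).
function in_range(n, start, end) {
    return start <= n && n < end
}
# gawk's three-argument match() stores the captured year, month and day in m.
# A pattern with no action prints every line for which it is true.
match($0, /^([0-9]{4})-([0-9]{2})-([0-9]{2})\s/, m) &&
    in_range(mktime(m[1] " " m[2] " " m[3] " 00 00 00"), starttime, endtime)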
You supply the start and end times through the variables starttime and endtime, in a format that mktime understands (YYYY MM DD hh mm ss). Thus, assuming the above awk script is in an executable file filter-log-dates.awk in the current working directory and the log file is mylog.txt, you run it like so:
./filter-log-dates.awk -v starttime='2016 07 13 00 00 00' -v endtime='2016 07 20 00 00 00' mylog.txt
Note that the end time is exclusive, i.e. matching log records must have a timestamp before the end time.
If your timestamp format is different, adjust the regular expression passed to the match function (and the string built from its capture groups for mktime) to suit it.
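If gawk is not available, a different technique may be enough for ISO dates: because YYYY-MM-DD strings sort in the same order as the dates they represent, a plain string comparison works in any POSIX awk and, like the script above, does not depend on the file being sorted. This is a sketch rather than a drop-in replacement: it assumes each record begins with a YYYY-MM-DD date (lines that do not are skipped), compares whole days only, and uses made-up sample data and the hypothetical variable names start and stop.

```shell
#!/bin/sh
# Hypothetical, deliberately unsorted sample log; the last line has no date.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' \
  '2016-07-19 d' \
  '2016-07-12 a' \
  '2016-07-13 b' \
  '2016-07-20 f' \
  '2016-07-15 c' \
  'continuation line' > "$log"

# Keep records dated from start (inclusive) to stop (exclusive), in file order.
# substr() returns a string, so >= and < compare lexicographically.
awk -v start=2016-07-13 -v stop=2016-07-20 '
  {
    d = substr($0, 1, 10)
  }
  d ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ && d >= start && d < stop
' "$log"
```

With this input it prints the 2016-07-19, 2016-07-13 and 2016-07-15 records, in the order they appear in the file.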