Logstash grok pattern for nginx error log doesn't work for all lines
I have a grok pattern for nginx error log:
(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{GREEDYDATA:text}, client: %{IP:client}, server: %{GREEDYDATA:server}, request: \"(?:%{WORD:requesttype} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion}))\"(, upstream: "%{GREEDYDATA:upstream}\")?, host: "%{DATA:host}\"(, referrer: \"%{GREEDYDATA:referrer}\")?
And I have these kinds of error lines:
2022/01/17 08:23:39 [error] 8#8: *0000016 this is my error message, client: 1.2.3.4, server: my.server.name, request: "GET /my/url/ HTTP/1.1", upstream: "http://upstream.server.name", host: "my.server.name", referrer: "https://referrer.server.name/"
2022/01/17 12:30:41 [error] 8#8: *0000016 access forbidden by rule, client: 1.2.3.4, server: my.server.name, request: "GET / HTTP/2.0", host: "my.server.name"
2022/01/17 08:23:39 [error] 8#8: *0000016 could not be resolved
For testing I'm using grokconstructor on appspot. My grok pattern works great for the first two lines, but I can't make it work for the last line.
What pattern will work for all three lines?
This would work:
(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{DATA:text}(, client: %{IP:client}, server: %{GREEDYDATA:server}, request: \"(?:%{WORD:requesttype} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion}))\"(, upstream: "%{GREEDYDATA:upstream}\")?, host: "%{DATA:host}\"(, referrer: \"%{GREEDYDATA:referrer}\")?)?$
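The last part (everything from ", client:" onward) is made optional by wrapping it in ()? and the message itself is now captured with DATA instead of GREEDYDATA.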
As a side note, for the key-value pairs at the end of the first two lines, you should use the kv filter instead of parsing them in grok. Capture that part with the grok filter in a temporary field, like this:
(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{DATA:text}(, %{GREEDYDATA:kv_value})?$
And then use the kv filter on kv_value:
kv {
  field_split => " ,"
  value_split => ": "
  trim_value => "\""
  source => "kv_value"
}
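Putting the two pieces together, a minimal pipeline for trying this locally might look like the sketch below. It assumes the log lines arrive in the message field; the stdin input, the rubydebug output, the conditional around kv, and the remove_field cleanup are my additions for illustration and are not part of the original pattern.

```
input {
  # Assumption: paste the sample log lines on standard input for testing
  stdin { }
}

filter {
  grok {
    # Same pattern as above; single quotes keep the backslashes literal
    match => {
      "message" => '(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{DATA:text}(, %{GREEDYDATA:kv_value})?$'
    }
  }

  # Only lines with a ", client: ..." tail produce kv_value
  if [kv_value] {
    kv {
      source => "kv_value"
      field_split => " ,"
      value_split => ": "
      trim_value => "\""
      # Drop the temporary field once it has been parsed successfully
      remove_field => ["kv_value"]
    }
  }
}

output {
  # Print each parsed event so the extracted fields can be inspected
  stdout { codec => rubydebug }
}
```

With this layout, a line such as "could not be resolved" should end up with only the grok fields, while the longer lines additionally get whatever keys kv finds in the tail (client, server, request and so on).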
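In both cases, don't forget the $ at the end of the grok pattern. DATA is a lazy match, so without the end-of-line anchor it matches as little as possible, the optional group is skipped, and text ends up empty.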