How to parse audit.log using Logstash
A quick search finds this on GitHub:
AUDIT type=%{WORD:audit_type} msg=audit\(%{NUMBER:audit_epoch}:%{NUMBER:audit_counter}\): user pid=%{NUMBER:audit_pid} uid=%{NUMBER:audit_uid} auid=%{NUMBER:audit_audid} subj=%{WORD:audit_subject} msg=%{GREEDYDATA:audit_message}
AUDITLOGIN type=%{WORD:audit_type} msg=audit\(%{NUMBER:audit_epoch}:%{NUMBER:audit_counter}\): login pid=%{NUMBER:audit_pid} uid=%{NUMBER:audit_uid} old auid=%{NUMBER:old_auid} new auid=%{NUMBER:new_auid} old ses=%{NUMBER:old_ses} new ses=%{NUMBER:new_ses}
A cursory review suggests it's probably what you're looking for.
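If you want to use those two definitions as named patterns, grok can load them from a patterns file; a minimal sketch, assuming they are saved in a file under a local ./patterns directory:

grok {
    patterns_dir => ["./patterns"]
    match => { "message" => [ "%{AUDIT}", "%{AUDITLOGIN}" ] }
}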
The audit logs are written as a series of key=value pairs, which are easily extracted using the kv filter. However, I have noticed that the key msg is sometimes used twice, and its second value is itself a series of key=value pairs.
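For example, a representative line with the nested msg looks like this (an illustrative paraphrase, not output from a particular system):

type=USER_LOGIN msg=audit(1434371271.277:312): pid=3280 uid=0 auid=500 ses=1 msg='op=login acct="user" exe="/usr/sbin/sshd" hostname=192.168.1.2 addr=192.168.1.2 terminal=/dev/pts/0 res=success'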
First, grok is used to get the fields audit_type, audit_epoch, audit_counter and sub_msg (the second msg field):
grok {
    pattern => [ "type=%{DATA:audit_type}\smsg=audit\(%{NUMBER:audit_epoch}:%{NUMBER:audit_counter}\):.*?( msg=\'(?<sub_msg>.*?)\')?$" ]
    named_captures_only => true
}
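Run against the sample line above, that pattern would yield roughly these fields (illustrative, not captured from a real run):

audit_type    => "USER_LOGIN"
audit_epoch   => "1434371271.277"
audit_counter => "312"
sub_msg       => "op=login acct=\"user\" exe=\"/usr/sbin/sshd\" ... res=success"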
kv is used to extract all of the key=value pairs except for msg and type since we have already obtained that data with grok:
kv {
    exclude_keys => [ "msg", "type" ]
}
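On the sample line, this first pass would add roughly:

pid  => "3280"
uid  => "0"
auid => "500"
ses  => "1"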
kv is used again to parse the key=value pairs in sub_msg (if it exists):
kv {
    source => "sub_msg"
}
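For the sample line, this unpacks the nested pairs into their own fields, roughly:

op       => "login"
acct     => "user"
exe      => "/usr/sbin/sshd"
terminal => "/dev/pts/0"
res      => "success"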
date is used to set the event's date to the value in audit_epoch; the UNIX date format will parse float or integer timestamps:
date {
    match => [ "audit_epoch", "UNIX" ]
}
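For the sample line, this sets @timestamp from audit_epoch (1434371271.277 works out to roughly 2015-06-15T12:27:51.277Z). The audit_epoch field itself is left in place, which is why it is removed in the next step.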
Lastly mutate is used to remove redundant fields:
mutate {
    remove_field => ['sub_msg', 'audit_epoch']
}
You could also rename fields, as sysadmin1138 suggested:
mutate {
    rename => [
        "auid", "uid_audit",
        "fsuid", "uid_fs",
        "suid", "uid_set",
        "ses", "session_id"
    ]
    remove_field => ['sub_msg', 'audit_epoch']
}
All combined, the filter looks like this:
filter {
    grok {
        pattern => [ "type=%{DATA:audit_type}\smsg=audit\(%{NUMBER:audit_epoch}:%{NUMBER:audit_counter}\):.*?( msg=\'(?<sub_msg>.*?)\')?$" ]
        named_captures_only => true
    }
    kv {
        exclude_keys => [ "msg", "type" ]
    }
    kv {
        source => "sub_msg"
    }
    date {
        match => [ "audit_epoch", "UNIX" ]
    }
    mutate {
        rename => [
            "auid", "uid_audit",
            "fsuid", "uid_fs",
            "suid", "uid_set",
            "ses", "session_id"
        ]
        remove_field => ['sub_msg', 'audit_epoch']
    }
}
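For completeness, here is a minimal sketch of where that filter sits in a full pipeline; the log path and the stdout output are assumptions for local testing, not part of the answer above. Note also that on newer Logstash versions grok's pattern option has been replaced by match, so you may need match => { "message" => "..." } instead.

input {
    file {
        path => "/var/log/audit.log"
        # assumption: read the whole file when testing the filter
        start_position => "beginning"
    }
}

filter {
    # ... the grok, kv, date and mutate blocks shown above ...
}

output {
    # print parsed events so the filter can be checked interactively
    stdout { codec => rubydebug }
}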
A better solution than grok may be to use the kv filter. This parses fields in "key=value" format, which most audit-log entries are. Unlike grok, this will handle strings with sometimes-there, sometimes-not fields. However, the field names are in their less useful short forms, so you may need to do some field renaming.
filter {
    kv { }
}
That would get you most of it, and the fields would match what shows up in the logs. All of the data types would be string.
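If you want numeric types for the obvious candidates, here is a minimal mutate sketch (the field names are assumptions based on typical audit entries):

mutate {
    convert => {
        "pid"  => "integer"
        "uid"  => "integer"
        "auid" => "integer"
        "ses"  => "integer"
    }
}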
If you want to go to the trouble of humanizing the field names:
filter {
    kv { }
    mutate {
        rename => {
            "type"  => "audit_type"
            "auid"  => "uid_audit"
            "fsuid" => "uid_fs"
            "suid"  => "uid_set"
            "ses"   => "session_id"
        }
    }
}
The msg field, which contains the timestamp and event ID, will still need to be grokked, though. The other answers show how to do that.
filter {
    kv { }
    grok {
        match => { "msg" => "audit\(%{NUMBER:audit_epoch}:%{NUMBER:audit_counter}\):" }
    }
    mutate {
        rename => {
            "type"  => "audit_type"
            "auid"  => "uid_audit"
            "fsuid" => "uid_fs"
            "suid"  => "uid_set"
            "ses"   => "session_id"
        }
    }
}