How to configure a log aggregator to authenticate data?
This is a great question.
I use Logstash to do something like what you're proposing. Ship logs to your central collection system with Logstash (or logstash-forwarder), and add a Logstash configuration on each server that attaches a key field to every message, with its value being a long, random string that is unique to that server.
Then on the receiving side, add a corresponding rule to discard (or alert on) any message whose key doesn't match the one you expect for its hostname.
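The receiving-side rule can be sketched in Python to show the logic, independent of any particular Logstash filter. This is only an illustration: the field names `host` and `shipper_key`, the hostnames, and the key values are all assumptions made up for the example, not something Logstash defines.

```python
import hmac

# Hypothetical per-host keys. In practice these would be long random
# strings loaded from a secrets store, not hard-coded.
EXPECTED_KEYS = {
    "web01.example.com": "example-key-for-web01",
    "db01.example.com": "example-key-for-db01",
}


def is_authentic(event: dict) -> bool:
    """Return True only if the event carries the key expected for its host."""
    host = event.get("host")            # assumed field name
    key = event.get("shipper_key")      # assumed field name
    expected = EXPECTED_KEYS.get(host)
    if expected is None or not isinstance(key, str):
        # Unknown host or missing key: treat as spoofed.
        return False
    # Constant-time comparison avoids leaking key contents via timing.
    return hmac.compare_digest(key.encode(), expected.encode())
```

Events for which `is_authentic` returns False would be dropped or routed to an alert, matching the discard-or-alert rule described above.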
This is not bullet-proof, but it's a solid step in the right direction.
The right thing to use for this is TLS with machine client certificates.
rsyslog has supported this since about 2008 and has great instructions: http://www.rsyslog.com/doc/v8-stable/tutorials/tls_cert_summary.html
The process is extremely simple, as these things go:
- Set up a CA
- Issue certificates to all your computers that you want logs from
- Configure rsyslog to use that authentication
Then, your computers can't impersonate each other and nobody can log to your log server without one of your certificates.
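To make the idea behind those steps concrete, here is a minimal Python sketch of a TLS listener context that refuses any client without a certificate signed by your CA. It is not rsyslog configuration, only an illustration of the same mutual-TLS principle using the standard `ssl` module. The function name and its parameters are hypothetical, and the file paths are optional only so the sketch can be exercised without real certificate files.

```python
import ssl
from typing import Optional


def make_log_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that requires client certificates."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Reject the handshake unless the client presents a certificate
    # that chains to a CA we trust.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Trust only the private CA set up for the logging hosts.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own certificate and key, issued by the same CA.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A real deployment would pass all three paths; with `CERT_REQUIRED` set and only your CA loaded, a host without one of your certificates cannot complete the handshake.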
I see you found that already, but you're still worried about their caveat. I wouldn't worry too much about that. Log injection is real, but the term covers several things, including injection through the application and injection into the logging process. Authenticated rsyslog won't protect you if someone has a log injection attack against your application software, but nothing will or can; only fixing the application can help with that. This will just protect you against spoofed logs.
The other caveats can be easily mitigated by not using relays, for which there is really little reason anyway. If you don't have relays, and you use the x509/name authentication mode of the gtls network stream driver on the rsyslog server, you should have no trouble.
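The idea behind name-based authentication is that the server checks the names in the client's certificate against a list of permitted peers. A rough Python sketch of that check follows, operating on the dictionary format returned by `ssl.SSLSocket.getpeercert()`. The function names and the permitted-peer patterns are assumptions for illustration, and the wildcard matching here (via `fnmatch`) is looser than what rsyslog itself implements.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Set


def peer_names(cert: dict) -> Set[str]:
    """Collect the commonName and DNS subjectAltName entries of a peer cert."""
    names = set()
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.add(value.lower())
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS":
            names.add(value.lower())
    return names


def is_permitted(cert: dict, permitted: Iterable[str]) -> bool:
    """Return True if any certificate name matches a permitted-peer pattern."""
    patterns = [p.lower() for p in permitted]
    return any(
        fnmatchcase(name, pattern)
        for name in peer_names(cert)
        for pattern in patterns
    )
```

Because the name comes from a certificate your CA signed, a host cannot claim another host's name unless it also holds that host's private key.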
See also the gtls config doc: http://www.rsyslog.com/doc/v8-stable/concepts/ns_gtls.html