syslog-ng: How to reduce high latency when forwarding logs to a syslog TCP consumer?
UPDATE 2: I've answered this in my new question, linked below. The root cause is telegraf's default behaviour of closing the TCP connection 5 seconds after the last received message. This may be by design, but I have an issue with their documentation, which made it difficult for me to spot this as a potential fix.
Perhaps this question can now be deleted?
UPDATE 1: Rather than edit this question extensively, which would make the current answers nonsensical, I have posted a new question based on information I received as a result of posting this one:
syslog-ng / telegraf : EOF occurred when idle - incompatible?
I'm using syslog-ng Open-Source Edition (OSE) v3.31.2 in a docker-compose stack.
I have syslog messages arriving over the network from various hosts via UDP (I'm constrained to UDP because my clients use Boost::Log, which does not support syslog over TCP), and I have syslog-ng set to forward these to another service downstream. This happens to be telegraf using an inputs.syslog module, but I'm not sure that matters yet.
My config looks like this:
@version: 3.29
@include "scl.conf"
options {
    flush-lines(1);
};
source s_network {
    udp(ip(0.0.0.0) port(514));
};
destination d_file {
    file("/var/log/messages");
};
destination d_telegraf {
    syslog("telegraf" port(6514) transport(tcp));
};
log {
    source(s_network);
    destination(d_telegraf);
    destination(d_file);
};
I have explicitly set the global flush-lines value to 1. I think this is the default, but I want to be sure. I want log messages to be forwarded as soon as they are received.
Most of the time this works: individual "lines" of logs arrive at syslog-ng via UDP 514 and are immediately written to the file /var/log/messages, and in almost all cases they are also immediately forwarded to telegraf on TCP port 6514.
The problem I'm seeing is that quite often syslog-ng holds back many lines of incoming logs for around 30-60 seconds, then delivers them to telegraf in one big chunk. There doesn't seem to be much of a pattern to this, but it happens a lot. The odd thing is that the /var/log/messages file has the missing log entries written immediately; it's just the network delivery that is delayed. I had thought that flush-lines(1) would avoid this buffering, but it doesn't seem to.
I've used Wireshark to determine where the delay is, and it's in the output of packets from syslog-ng, between syslog-ng and telegraf TCP port 6514.
I did wonder if this might be caused by Nagle's algorithm on the TCP connection. If so, is there a way to turn on the TCP_NODELAY socket option for syslog-ng's syslog destination driver?
Ultimately what I'm looking for is a fast, low-latency syslog service that can aggregate and relay logs as quickly as possible for real-time review downstream.
EDIT: I tried switching over to UDP transport between syslog-ng and telegraf. This seems to be much more responsive, and the long, occasional delays have disappeared. However, this will make it difficult to secure the connection in future.
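For reference, the UDP variant described in the EDIT can be sketched roughly as follows. This is only an illustration, not the exact configuration that was tested: the destination name d_telegraf_udp is hypothetical, and it assumes the port stays at 6514 and that the telegraf listener is switched to UDP as well.

```
# Hypothetical UDP variant of d_telegraf (name and port are assumptions).
destination d_telegraf_udp {
    syslog("telegraf" port(6514) transport("udp"));
};

log {
    source(s_network);
    destination(d_telegraf_udp);
    destination(d_file);
};
```

With UDP there is no connection to drop or re-establish, which would be consistent with the delays disappearing, but delivery is not guaranteed and the TLS transport of the syslog() driver is no longer an option.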
Solution 1:
What you are experiencing is not normal. The above configuration should forward logs to d_telegraf and d_file at the same time, as soon as possible.
I believe you are having connection issues. That would explain the 60-second delay, which matches the default value of the reconnection timer.
You can lower this value using the time-reopen() global option, for example:
options {
    time-reopen(1);
};
You can also start syslog-ng in the foreground (in debug mode) to investigate the connection issues:
$ syslog-ng -Fdev
Solution 2:
Try flush-lines(0) by just deleting that line altogether.
How does syslog-ng handles flush_lines(0)?
https://github.com/syslog-ng/syslog-ng/issues/1411