syslog-ng: How to reduce high latency when forwarding logs to a syslog TCP consumer?

UPDATE 2: I've answered this via my new question at the link below. The root cause is telegraf's default behaviour: it closes the TCP connection 5 seconds after the last received message. This may be by design; however, I have an issue with their documentation, which made it difficult for me to spot this as a potential fix.

Perhaps this question can now be deleted?


UPDATE 1: Rather than edit this question extensively, which would make the current answers nonsensical, I have posted a new question based on new information I received as a result of posting this one:

syslog-ng / telegraf : EOF occurred when idle - incompatible?


I'm using syslog-ng Open-Source Edition (OSE) v3.31.2 in a docker-compose stack.

I have syslog messages arriving over the network from various hosts via UDP. I'm constrained to UDP because my clients use Boost::Log, which supports syslog only over UDP, not TCP. syslog-ng is set to forward these messages to a downstream service. That service happens to be telegraf, using its inputs.syslog plugin, but I'm not sure that matters yet.

My config looks like this:

@version: 3.29
@include "scl.conf"

options {
    flush-lines(1);
};
    
source s_network {
    udp(ip(0.0.0.0) port(514));
};

destination d_file {
    file("/var/log/messages");
};
    
destination d_telegraf {
    syslog("telegraf" port(6514) transport(tcp));
};
    
log {
    source(s_network);
    destination(d_telegraf);
    destination(d_file);
};

I have explicitly set the global flush-lines value to 1. I think this is the default, but I want to be sure. I want log messages to be forwarded as soon as they are received.
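For reference, flush-lines() is also accepted as an option on an individual destination, where it takes precedence over the global value. A minimal sketch that pins it on the telegraf destination only, assuming the rest of the config above stays unchanged:

destination d_telegraf {
    # per-destination setting: overrides the global flush-lines() for this destination only
    syslog("telegraf" port(6514) transport(tcp) flush-lines(1));
};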

Most of the time this works - individual "lines" of logs arrive into syslog-ng via UDP 514, and are immediately written to the file /var/log/messages, and in almost all cases they are also immediately forwarded to telegraf on TCP port 6514.

The problem I'm seeing is that quite often syslog-ng is holding back many lines of incoming logs for up to around 30-60 seconds, then delivering them to telegraf in a big chunk. There doesn't seem to be much pattern to this, but it happens a lot. The odd thing is that the /var/log/messages file has the missing log entries written immediately, it's just the network delivery that is delayed. I had thought that flush-lines(1) would avoid this buffering, but it doesn't seem to.

I've used Wireshark to determine where the delay is, and it's in the output of packets from syslog-ng, between syslog-ng and telegraf TCP port 6514.

I did wonder whether this might be caused by Nagle's algorithm. If so, is there a way to enable the TCP_NODELAY socket option for syslog-ng's syslog() destination driver?

Ultimately what I'm looking for is a fast, low-latency syslog service that can aggregate and relay logs as quickly as possible for real-time review downstream.

EDIT: I tried switching to UDP transport between syslog-ng and telegraf, and this is much more responsive: the long, occasional delays have disappeared. However, this will make it difficult to secure the connection in future.
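The switch is a one-word change in the destination driver. A sketch, assuming telegraf's inputs.syslog listener has also been reconfigured to accept UDP on the same port (that side is not shown here):

destination d_telegraf {
    # UDP has no connection that can be closed, so there is no reconnect wait,
    # at the cost of delivery guarantees
    syslog("telegraf" port(6514) transport(udp));
};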


Solution 1:

What you experience is not normal. The above configuration should forward logs to d_telegraf and d_file at the same time, as soon as possible.

I believe you are having connection issues; that would explain the 60-second delay, which is the default value of the reconnection timer.

You can lower this value using the time-reopen() global option, for example:

options {
  time-reopen(1);
};
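time-reopen() can also be set on the destination itself, which applies the shorter retry interval to the telegraf connection only rather than to every destination. A sketch based on the destination from the question:

destination d_telegraf {
    # wait 1 second (instead of the 60-second default) before reconnecting after the connection is lost
    syslog("telegraf" port(6514) transport(tcp) time-reopen(1));
};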

You can also start syslog-ng in the foreground (in debug mode) to investigate the connection issues:

$ syslog-ng -Fdev

Solution 2:

Try flush-lines(0) by simply deleting that line altogether.

How does syslog-ng handles flush_lines(0)?

https://github.com/syslog-ng/syslog-ng/issues/1411