How to maximize utilization of fluentd server?

Solution 1:

First, I would check where the bottlenecks are:

If your app is not overloading the fluentd service, then why even use 32 cores?

If fluentd output is the bottleneck, you can flush with multiple threads using the num_threads option of buffered outputs. For example, you could run 6 fluentd instances with 5 threads each, for a total of 30 threads to spread across your cores, instead of 32 single-threaded instances of which only 6 are actually used.
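A minimal sketch of what the threaded output could look like on one of the 6 instances, assuming v0.12-style configuration where buffered outputs accept `num_threads` (in fluentd v1 the equivalent setting is, as far as I know, `flush_thread_count` inside a `<buffer>` section). The tag pattern, buffer path, host, and port are placeholders I made up for illustration, not values from the question:

```
# Hypothetical output section for ONE fluentd instance.
# app.**, the buffer path, and 192.0.2.10:24224 are placeholders.
<match app.**>
  @type forward

  # File buffer so queued chunks survive a restart.
  buffer_type file
  buffer_path /var/log/fluent/buffer/app

  flush_interval 5s

  # Flush buffered chunks with 5 parallel threads instead of the default 1.
  num_threads 5

  <server>
    host 192.0.2.10
    port 24224
  </server>
</match>
```

Extra flush threads mostly help when the output spends its time waiting on the network or disk, so it is worth measuring before and after rather than assuming 5 is the right number.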

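As for input, if your servers keep their connections open, each one stays attached to the same fluentd instance, so this is indeed your bottleneck. In that case you might want to deploy more of those services, which increases the number of logging outputs connecting to your fluentd inputs and spreads the load over more instances.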