Removing duplicate messages with syslog-ng in a redundant logging environment

19 Sep 2023

Creating highly available servers is difficult. Sending logs to two (or more) servers and hoping that at least one of them can collect logs at any time is a lot easier. Since network traffic and storage are cheap, redundancy is usually not a problem. However, once you also want to analyze your log messages using a SIEM or other software, you do not want duplicate log messages.

In this blog, I show you how you can make sure that each log message produced on your network reaches your SIEM system only once. Sending logs to a SIEM is expensive: both resource usage and licensing costs are much higher than at the log management level.

Before you begin

Unless you use a really ancient Linux distribution, like RHEL 7 or SLES 12, you do not have to care about the syslog-ng version. Any feature I mention should work out of the box. If you use one of these dinosaurs, check https://www.syslog-ng.com/community/b/blog/posts/installing-latest-syslog-ng-on-rhel-and-other-rpm-distributions for up-to-date syslog-ng RPM packages.

The use case

Your company has at least two departments (or offices). Each department collects its own log messages. You want to make sure that no log messages are lost. Solving high availability is difficult and expensive. It is much easier to make sure that logs are sent not only to the local department server, but also to another department’s server. If the local server has problems, you can still check the logs on the other server. It is easy to implement, and even if it doubles network and storage usage, both are relatively cheap nowadays.
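On the client side, such a setup could look something like the sketch below. The host names, port numbers and object names are made up for illustration; adapt them to your own environment.

```
# Hypothetical client-side fragment: every message is sent to two servers
source s_sys {
    system();
    internal();
};

# The department's own log server (assumed host name and port)
destination d_own_dept {
    network("logs.dept1.example.com" port(514) transport("tcp"));
};

# The other department's log server (assumed host name and port)
destination d_other_dept {
    network("logs.dept2.example.com" port(1514) transport("tcp"));
};

log {
    source(s_sys);
    destination(d_own_dept);
    destination(d_other_dept);
};
```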

However, once you want to analyze your logs using a SIEM or other software tools, duplicate logs are suddenly a problem. SIEM systems are a lot more resource intensive than log management, so the fewer logs you send, the fewer hardware resources you need. Also, most SIEM systems are licensed based on the volume of log messages processed. So, even if your log management layer has all logs two or three times, you want to make sure that your SIEM gets them only once.

The solutions

Three solutions came to my mind. The last one was the easiest, while the first one required the most manual labor and was the most error-prone.

The first possibility is using the syslog-ng in-list() filter. You can create a text file listing a host name or IP address from your network on each line. Loading this file into the in-list() filter, you can check if the content of the HOST macro matches the list. If all your departments forward only local logs to the SIEM, then there are no more duplicate logs (even if your logs are saved to multiple syslog-ng servers). The in-list() filter is nice, but error-prone. It is easy to forget to update the list, and when the hosts on your network change, suddenly not all relevant logs are sent to the SIEM.
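A minimal sketch of the in-list() approach could look like this. The file name and the names of the filter, source and destinations are hypothetical; s_net and d_siem are assumed to be defined elsewhere in your configuration.

```
# /etc/syslog-ng/dept1-hosts.list is a hypothetical file containing
# one host name or IP address per line
filter f_dept1_hosts {
    in-list("/etc/syslog-ng/dept1-hosts.list", value("HOST"));
};

# Only messages whose HOST value is on the list reach the SIEM
log {
    source(s_net);
    filter(f_dept1_hosts);
    destination(d_siem);
};
```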

The second solution still checks the source of the logs, but it is more generic. You can use it if your department has its own subnet or has easily identifiable host names. In this case, you can use the netmask() or the host() filter of syslog-ng. You can even combine multiple filters into one. This method is more flexible and a lot less work than listing each individual host. Still, it can lead to lost messages if you change naming conventions.
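As an illustration, the filters below assume that the local department uses the 192.168.10.0/24 subnet and that its host names start with "dept1-". Both values are invented for this example, and you might need only one of the two checks.

```
# Matches senders in the (assumed) local subnet
filter f_dept1_subnet {
    netmask("192.168.10.0/24");
};

# Matches host names following the (assumed) naming convention
filter f_dept1_names {
    host("^dept1-");
};

# Combines the two filters into one
filter f_dept1 {
    filter(f_dept1_subnet) or filter(f_dept1_names);
};
```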

The third solution is to use separate sources for logs from the local and remote networks. You can collect logs from a remote department using a different IP address or port. Of course, you might need some extra work: re-configuring clients, changing the firewall, and so on. However, in the long term, it is the cleanest possible solution.

Using a separate source for logs from remote networks, you can handle those logs independently from local logs. Alternatively, you can process them together with the rest of the logs, but not forward them to the SIEM. How does it work? For each incoming log message, syslog-ng sets a macro called SOURCE, which contains the name of the source in the syslog-ng configuration where the message arrived. You can create a filter that checks the content of the SOURCE macro, and forward a message only when it matches the name of the source where you collect logs from the local network.
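The idea could be sketched as follows. The port numbers, the file path, the SIEM host name and all object names are assumptions made for this example, not values from a real deployment.

```
# Local clients send their logs here (assumed port)
source s_local {
    network(ip("0.0.0.0") port(514) transport("tcp"));
};

# Clients of the other department send their logs here (assumed port)
source s_remote {
    network(ip("0.0.0.0") port(1514) transport("tcp"));
};

destination d_file {
    file("/var/log/all-hosts.log");
};

# Hypothetical SIEM address
destination d_siem {
    network("siem.example.com" port(514) transport("tcp"));
};

# True only for messages that arrived through the s_local source
filter f_from_local {
    "${SOURCE}" eq "s_local";
};

log {
    source(s_local);
    source(s_remote);
    # Everything is stored locally
    destination(d_file);
    # Only logs from the local network are forwarded to the SIEM
    log {
        filter(f_from_local);
        destination(d_siem);
    };
};
```

With this layout, logs from both departments are stored on the server, while the embedded log path sends only the local ones to the SIEM.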

You can see the SOURCE macro in action in one of my recent blogs, showing you how to build a configuration from the ground up: https://www.syslog-ng.com/community/b/blog/posts/developing-a-syslog-ng-configuration

What is next?

Obviously, removing duplicate logs is just one of the ways you can optimize your SIEM systems. There are many filters in syslog-ng to make sure that debug level logs of your application (which you cannot automatically analyze anyway) do not reach your SIEM, and so on. Here we only focused on making sure that a log message arrives only once. Message parsing and filtering provide you with precision tools to make sure that all relevant logs reach your SIEM systems, but only those.
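For example, dropping debug level messages before they reach the SIEM could look like the fragment below. The source and destination names are hypothetical and assumed to be defined elsewhere in your configuration.

```
# Matches everything except debug level messages
filter f_no_debug {
    not level(debug);
};

log {
    source(s_local);
    filter(f_no_debug);
    destination(d_siem);
};
```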

Finally: an apology…

This question came up quite a few times, and I always responded with the in-list(), host() and netmask() filters. However, the easiest solution is the third one described above: using a separate source.

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/syslog-ng/syslog-ng. On Twitter, I am available as @PCzanik, on Mastodon as @Pczanik@fosstodon.org.