What syslog-ng relays are good for

17 Apr 2019

While some users run syslog-ng as a stand-alone application, the main strength of syslog-ng is central log collection. In this case the central syslog-ng instance is called the server, while the instances sending log messages to the central server are called the clients. There is a (somewhat lesser known) third type of instance, too: the relay. The relay collects log messages via the network and forwards them to one or more remote destinations after processing (but without writing them onto the disk for storage).

A relay can be used for many different use cases. We will discuss a few typical examples below.

Note that the syslog-ng application has an open source edition (OSE) and a premium edition (PE). Most of the information below applies to both editions. Some features are only available in syslog-ng PE and some scenarios need additional licenses when implemented using syslog-ng PE.

UDP-only source devices

Most network devices send log messages over UDP only. Even though some of them support TCP-based logging, vendors recommend not using it (as in many cases the TCP logging implementation is extremely buggy). UDP does not guarantee that every packet will be delivered, so it is a weak point of the system. To ensure at least a best-effort level of reliability, it is recommended to deploy a relay on the network, close to these source devices. With the fewest possible (and, more importantly, the most reliable) hops between the source and the relay, the risk of losing UDP packets can be minimized. Once a packet arrives at the relay, the messages can be delivered to the central server in a reliable manner, over TCP/TLS or ALTP (Advanced Log Transfer Protocol, syslog-ng PE only).
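As an illustration, a relay of this kind could be configured roughly as follows. This is only a sketch in syslog-ng OSE 3.x syntax, meant as a fragment of a larger configuration: the object names, the host name central.example.com, the port numbers and the CA directory are all assumptions made for the example, not values from a real deployment.

```
# Receive logs from nearby network devices over UDP (port 514 is an assumption).
source s_udp_devices {
    network(
        transport("udp")
        port(514)
    );
};

# Forward everything to the central server over TLS.
# Host name, port and CA directory are placeholders.
destination d_central {
    network(
        "central.example.com"
        transport("tls")
        port(6514)
        tls(
            ca-dir("/etc/syslog-ng/ca.d")
        )
    );
};

# The relay only connects the two: nothing is written to local files.
log {
    source(s_udp_devices);
    destination(d_central);
};
```

The unreliable UDP leg stays short and local, while the long leg to the server runs over an encrypted TCP connection.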

Too many source devices

Depending on the hardware and configuration, an average syslog-ng instance can usually handle the following number of concurrent connections:

1. If the maximum message rate is lower than 200,000 messages per second:

◦ maximum ca. 5,000 TCP connections

◦ maximum ca. 1,000 TLS connections

◦ maximum ca. 1,000 ALTP connections

2. If the message rate is higher than 200,000 messages per second, always contact One Identity.

As a result, if you have more source devices than that, deploy at least one relay machine per 5,000 sources and have each relay batch up all its logs into a single TCP connection to the server. If TLS or ALTP is used, deploy one relay per 1,000 source devices. For example, 12,000 sources sending over plain TCP call for at least three relays.

Collecting logs from remote sites (especially over public WAN)

It is quite common that companies need to collect log messages from geographically remote sites (sometimes from across the globe), and sometimes over public WAN. In this case it is recommended to install at least one relay node per remote site. The relay can be the last outgoing hop for all the messages of the remote site, which has several benefits:

Maintenance: You only need to change the configuration of the relay if you want to re-route the logs of some or all sources of the remote site. You don't need to change each source’s configuration one by one.

Security: If you trust your internal network, it is not necessary to maintain encrypted connections within the LAN of the remote site, as the messages can get to the relay without encryption. Naturally, messages should be encrypted when sent over the public WAN, and it is enough to hold only a single TCP/TLS connection between the sites (that is, between the remote relay and the central server). This avoids wasting resources, as holding several TLS connections directly from the clients is more costly than holding a single connection from the relay.

Reliability: It is possible to set up a 'main' disk-buffer on the relay. This main disk-buffer is responsible for buffering all the logs of the remote site while the central syslog-ng server is temporarily unavailable. It is easier to maintain this single large disk-buffer than to set up disk-buffers on individual client machines.
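A 'main' disk-buffer like the one described under Reliability could look roughly like this on the remote-site relay. It is a sketch in syslog-ng 3.x syntax: the host name, port, paths and buffer sizes are assumptions picked for illustration, and the disk-buffer option names have changed in later releases, so check the documentation of the version you run.

```
# Destination on the remote-site relay pointing at the central server.
# All names, paths and sizes below are illustrative assumptions.
destination d_central {
    network(
        "central.example.com"
        transport("tls")
        port(6514)
        tls(
            ca-dir("/etc/syslog-ng/ca.d")
        )
        disk-buffer(
            reliable(yes)                           # keep messages on disk until they are sent
            disk-buf-size(10737418240)              # 10 GiB on disk, size it for the expected outage
            mem-buf-size(163840000)                 # in-memory part of the reliable buffer, in bytes
            dir("/var/lib/syslog-ng/relay-buffer")  # where the buffer files are stored
        )
    );
};
```

Sizing is a trade-off: the buffer has to hold the site's log volume for the longest short-term outage you want to survive.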

Separation / distribution / balancing of message processing tasks

Most Linux applications have their own human-readable, but difficult to handle, log messages. Without parsing and normalization, it is difficult to alert and report on these log messages. Many syslog-ng users utilize the message parsing tools of syslog-ng to normalize their different log messages. Just like normalization, filtering can also be resource-heavy, depending on what the filtering is based on. In this case, it might be inefficient to perform all the message processing tasks on the server, as this can decrease overall performance.

It is a typical setup to deploy relays in front of the central server, operating as a receiver front-end. Most resource-heavy tasks (for example, parsing and filtering) are performed on this receiver layer. As all resource-heavy tasks are performed on the relays, the central server behind them only needs to receive the messages and write them into the final text-based or tamper-proof format (logstore, PE only). If you have the means to run more relays, you can balance the resource-heavy tasks between them, and a single server behind them can still be fast enough to write all the messages onto the disk.
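A receiver front-end of this kind might be sketched as follows. The example assumes, purely for illustration, that the incoming messages carry key=value pairs and that debug-level messages are not needed centrally; the object names, host name and ports are also assumptions. Parsed fields are not forwarded automatically, so the sketch serializes them as JSON, which the server then has to be configured to accept.

```
# Relay acting as a processing front-end (all names and ports are assumptions).
source s_clients {
    network(
        transport("tcp")
        port(601)
    );
};

# Drop debug-level messages before any expensive work is done.
filter f_no_debug {
    not level(debug);
};

# Parse key=value pairs from the message into name-value pairs prefixed with "kv.".
parser p_kv {
    kv-parser(prefix("kv."));
};

# Send the standard syslog fields plus the parsed pairs as one JSON line per message.
destination d_central {
    network(
        "central.example.com"
        transport("tcp")
        port(601)
        template("$(format-json --scope rfc5424 --key kv.*)\n")
    );
};

log {
    source(s_clients);
    filter(f_no_debug);
    parser(p_kv);
    destination(d_central);
};
```

Putting the filter before the parser means that discarded messages never reach the more expensive parsing step.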

Whether an instance acts as a relay depends on its function, not on the machine it runs on. A relay doesn't have to be a dedicated relay machine at all. In fact, it can be one of the clients with a relay configuration in terms of log collection. On the other hand, in a robust log collection infrastructure the relays have their own purpose, therefore it is recommended to run dedicated relay machines in such cases.

When it comes to the commercial PE version of syslog-ng, the relays are included in the price (at least up to the licensed LSH number). Hence, you can run several relays in parallel to ensure horizontal redundancy: each relay has the very same configuration, and if one goes down, another relay can take over processing. Distribution of the logs can be done by the built-in client-side failover functionality or by a general load balancer. The latter can also serve N+1 redundant relay deployments (in this case, switching from one relay to another happens not only because of an outage, but for real load-balancing purposes, too).
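On the client side, the built-in failover could be configured along these lines. This is a sketch using the failover() syntax of recent syslog-ng OSE 3.x releases; the relay host names, the port and the probe settings are assumptions, and syslog-ng PE and older OSE versions use a different option for the same purpose, so check the documentation of your version.

```
# Client-side destination with failover between identical relays.
# Host names, port and probe values are illustrative assumptions.
destination d_relays {
    network(
        "relay1.example.com"
        transport("tcp")
        port(601)
        failover(
            servers("relay2.example.com", "relay3.example.com")
            failback(
                successful-probes-required(3)   # probes that must succeed before switching back
                tcp-probe-interval(60)          # seconds between probes of the primary relay
            )
        )
    );
};
```

With failback() set, the client returns to the primary relay once it is reachable again. Without it, the client stays on the relay it failed over to.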

What syslog-ng relays are NOT good for

The purpose of the relay is to buffer the logs for short term (for example, a few minutes or a few hours long, depending on the log volume) outages. It is not designed to buffer logs generated by the sources during a very long (for example, up to a few days long) server or connection outage.

If you expect extended outages, we recommend that you deploy servers instead of relays. There are many deployments where long term storage and archiving are performed on the central syslog-ng server, but relays also do short-term log storage. From the syslog-ng PE point of view, these are servers, and thus need separate server licenses.

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/balabit/syslog-ng. On Twitter, I am available as @PCzanik.

frankhansen over 5 years ago

Is there enough benefit in having a load balancer, like an F5, manage the failover of one relay's message stream to another, or would it be better to use the built-in client-side failover functionality?
daby cheng over 6 years ago in reply to Balázs Scheidler

Thanks, Balázs Scheidler. I will look at a minimum of 8 cores and 16 GB RAM for a single PE relay that will complete the following tasks:

- Accept around 5000 TCP syslog sources as well as windows client TCP log sources and send all raw to SSB destination via TCP syslog.

- log filter, rewrite to Splunk destination via http HEC destination.

- Output file as text to integrate solution.
Balázs Scheidler over 6 years ago in reply to daby cheng

It can depend on a lot of things, so it is best to experiment a bit. It depends a lot on what you want to do with those messages on the syslog-ng side. Start with something like 4-8 cores and 8 GB of memory, then generate load while measuring the load on the box.
daby cheng over 6 years ago

What is the recommended VM specification for running 200,000 MPS with 5,000 TCP sources?