Client-side failover and failback using syslog-ng

21 Aug 2018

When you have multiple syslog servers collecting logs, syslog-ng on the client side can fail over to secondary servers if the primary one becomes unavailable. If configured to do so, it can also fail back to the primary server soon after that server is back on-line. Client-side failover lowers the risk of message loss and makes server maintenance easier.

Basic failover was introduced in syslog-ng version 3.15, while failback arrived with version 3.17.

In this blog post, I focus on failback mode. We’re going to look at the main difference between failover and failback, and then I’m going to show you how to set up your environment to enjoy the benefits of failback mode.

Modes of operation

Depending on your needs, you can configure failover in two different ways. Whichever you choose, you configure a primary server and one or more failover servers. When syslog-ng starts up, logs start flowing to the primary server.

round-robin mode: If the primary server becomes inaccessible, logs start to flow to the first failover server. If that one fails too, syslog-ng moves on to the next one, and so on, until there are no failover servers left. Then syslog-ng tries the primary server again.
This method is good if it does not matter which server receives the logs, for example, when each of the central servers simply dumps logs to a Hadoop data lake.

failback mode: If the primary server becomes inaccessible, logs start to flow to the first failover server, then to the next one, and so on, just as in round-robin mode. However, there is a difference: in the background, syslog-ng keeps checking the availability of the primary server. Once the primary server is back on-line, logging switches back to it.
This is the recommended method if it is important that your logs are collected by the same server whenever it is available, for example, when logs from a group of clients are correlated by the central syslog-ng server. Failover is good for avoiding message loss, but in certain scenarios, you want to fail back to your primary server as soon as possible.

Configuration

The first step is to configure failover servers. When failback is not configured, the syslog-ng client changes servers in a round-robin fashion. In this example, we use the legacy syslog protocol over a TCP connection. As UDP does not maintain a connection, you cannot use it for failover: the client has no way of detecting whether the server is available.

destination d_failover {
      network("172.16.167.132"
            failover(
                  servers("172.16.167.133", "172.16.167.134")
            )
            transport("tcp")
            port(514)
      );
};

The next step is to configure failback. Using the tcp-probe-interval() and successful-probes-required() parameters, you can influence how quickly syslog-ng switches back to the primary server once it becomes available again.

destination d_failback {
      network("172.16.167.132"
            failover(
                  servers("172.16.167.133", "172.16.167.134")
                  failback(
                        tcp-probe-interval(60)
                        successful-probes-required(3)
                  )
            )
            transport("tcp")
            port(514)
      );
};
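
With the values above, syslog-ng probes the primary server every 60 seconds and needs 3 successful probes, so failback takes roughly 3 minutes after the primary server returns. The following variant is only a sketch for a test environment where you want to see failback happen faster. The destination name d_failback_fast is hypothetical and the numbers are assumed test values, not a recommendation.

destination d_failback_fast {
      network("172.16.167.132"
            failover(
                  servers("172.16.167.133", "172.16.167.134")
                  failback(
                        # probe the primary server every 10 seconds (assumed test value)
                        tcp-probe-interval(10)
                        # fail back after 2 successful probes (assumed test value)
                        successful-probes-required(2)
                  )
            )
            transport("tcp")
            port(514)
      );
};

With these values, failback should happen roughly 20 seconds after the primary server is reachable again. The trade-off is more frequent probe connections and a higher chance of switching back to a server that is not yet stable.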

This already works pretty well, but depending on the number of log messages your syslog-ng client (or relay) sends, you can still run into message loss. There are two methods for reducing the chance of message loss: using a disk-buffer and enabling flow-control:

destination d_failback {
      network("172.16.167.132"
            failover(
                  servers("172.16.167.133", "172.16.167.134")
                  failback(
                        tcp-probe-interval(60)
                        successful-probes-required(3)
                  )
            )
            transport("tcp")
            port(514)
            disk-buffer(
                  mem-buf-length(10000)
                  disk-buf-size(2000000)
                  reliable(no)
                  dir("/tmp/disk-buffer")
            )
      );
};
log { source(src); destination(d_failback); flags(flow-control); };

The disk-buffer makes sure that messages are queued to disk while syslog-ng is failing over to the next server. Flow-control slows down the reading of messages on the source side when the destination is unavailable, as long as the source is not a UDP source, where syslog-ng cannot influence the speed of the sender.
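
If surviving a crash or restart of the client matters more to you than throughput, the disk-buffer can also run in reliable mode. The following is a sketch, not a tuned setup: the destination name d_failback_reliable, the buffer sizes and the directory /var/lib/syslog-ng/disk-buffer are all assumptions for illustration. Note that in reliable mode the memory part of the buffer is sized in bytes with mem-buf-size() instead of mem-buf-length().

destination d_failback_reliable {
      network("172.16.167.132"
            failover(
                  servers("172.16.167.133", "172.16.167.134")
                  failback(
                        tcp-probe-interval(60)
                        successful-probes-required(3)
                  )
            )
            transport("tcp")
            port(514)
            disk-buffer(
                  # reliable mode: slower, but queued messages survive a restart or crash
                  reliable(yes)
                  # memory part of the buffer, in bytes (assumed value)
                  mem-buf-size(10000000)
                  # maximum size of the disk-buffer file, in bytes (assumed value)
                  disk-buf-size(200000000)
                  # hypothetical directory; it has to exist before syslog-ng starts
                  dir("/var/lib/syslog-ng/disk-buffer")
            )
      );
};

Reliable mode is slower than reliable(no), so it is worth testing it with your own message rate before using it in production.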

Testing

If you want to check how failback works using the above configuration, you need to change a few things:

Most importantly, you need to replace the IP addresses used in the example with the ones from your environment.

You might also need to change the port number if you use something different on your servers.

Also, you need to point the dir() option of the disk-buffer to an existing directory, so either create the one included in the example or change the parameter.

Finally, the “src” source most likely does not exist in your configuration: change it to the source collecting your local log messages.
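
To illustrate that last point, a local source could look like the sketch below. It assumes that your configuration does not define a local source yet and that you want to keep the name src from the log path above. system() and internal() are standard syslog-ng source drivers, but the default configuration of most distributions already contains a source that uses them (the name differs between distributions). In that case, reference the existing source instead of adding this one.

source src {
      # local system logs; the platform-specific sources are selected automatically
      system();
      # syslog-ng's own messages, where connection errors are reported
      internal();
};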

You also need syslog(-ng) servers, one primary server and one or more failover servers. The following simple configuration collects syslog messages on port 514 using the legacy syslog protocol and saves the results in /var/log/file. On most Linux distributions, you can save this configuration under the /etc/syslog-ng/conf.d directory using the .conf extension.

source s_net {
        network(
            ip("0.0.0.0")
            port("514")
            transport(tcp)
            ip-protocol(4)
        );
};
destination d_file {
  file("/var/log/file");
};
log { source(s_net); destination(d_file); };

You can use logger or loggen on the client machine to send logs regularly. By default, you should see the logs arrive on the primary server in /var/log/file. If you stop syslog-ng on the primary server, logs should arrive on the failover server after a short while. Once you start syslog-ng on the primary server again, you should see logs arriving there in 3 to 4 minutes, because the client needs three successful probes, sent 60 seconds apart, before it fails back.

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or you can even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/balabit/syslog-ng. On Twitter, I am available as @PCzanik.