Welcome to the second part of my syslog-ng tutorial series. In this part, we cover some of the basic concepts behind syslog-ng. You can watch the video or read the text below.
What is syslog-ng?
Last time we defined syslog-ng as an enhanced logging daemon with a strong focus on portability and high-performance central log collection.
Let us pull this sentence apart, as every word is there for a reason. The original syslog implementation was pretty simple: it collected log messages from applications and sorted them into various files. Syslog-ng enhanced this with message parsing, advanced filtering, and many more log sources and destinations. Daemon means that it is an application that normally runs continuously in the background. Portability means that syslog-ng runs not just on Linux, but also on various BSD and UNIX systems. High performance means that syslog-ng is implemented in C, and thus it is fast and resource efficient. Depending on the configuration, even a Raspberry Pi can collect tens of thousands of log messages a second.
Why central logging?
Central log collection is one of the key concepts behind syslog-ng. Instead of (or in addition to) saving log messages locally, syslog-ng sends them to a central syslog-ng server. Many organizations and sysadmins are reluctant to implement central logging. Why should they implement it? There are three key reasons:
Ease of use is probably the main reason why organizations implement central logging. Instead of having to log in to each individual server or workstation to find details about an event, there is a single place to check log messages. A single grep command can search the logs received from hundreds of hosts.
Central logging also means availability. Even if the sender host is down, you can check its log messages. You do not have to start repairing the machine to learn why it crashed: it is enough to check the centrally collected log messages.
Finally, central logging also means security. When a host is compromised, one of the first things on the to-do list of the attacker is removing or altering log messages. If logs are only available on the compromised host, this can sidetrack an investigation or make it completely impossible. With central logging, you have a much better chance of figuring out how a host was compromised.
The four major roles of syslog-ng
Syslog-ng has four major roles. It collects log messages, processes them, filters them, and finally stores them, either locally or at a remote destination.
Role: data collector
The first role of syslog-ng is collecting data. Syslog-ng can collect system and application logs together. Each kind of log can provide useful context when you are trying to understand the other.
With a focus on portability, syslog-ng can collect log messages from a wide variety of platform-specific log sources. These include /dev/log & Co., Sun STREAMS, and the journal on systemd-based Linux systems.
As a central log collector, syslog-ng can receive log messages through the network using RFC 3164, which is often called the legacy or BSD syslog protocol, and also using RFC 5424, which is often referred to as the new syslog protocol. It can use UDP, TCP, and TLS-encrypted connections.
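To make the two protocols a bit more tangible, here is a minimal Python sketch of what a receiver sees first on the wire. Both formats start with a priority field in angle brackets, which encodes the facility and the severity as facility * 8 + severity. The function names and the sample messages are my own illustrations, not part of syslog-ng, and the RFC 5424 check is only a heuristic based on the version number that follows the priority field.

```python
import re

# Severity names in numeric order, 0 (most severe) to 7 (least severe)
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(line):
    """Return (facility_number, severity_name) from the leading <PRI> field."""
    match = re.match(r"<(\d{1,3})>", line)
    if match is None:
        raise ValueError("no <PRI> field at the start of the message")
    # PRI = facility * 8 + severity, so divmod splits it back into the two parts
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, SEVERITIES[severity]


def looks_like_rfc5424(line):
    """Heuristic: RFC 5424 puts a version number (currently 1) right after <PRI>."""
    return re.match(r"<\d{1,3}>1 ", line) is not None


# Hypothetical sample messages, one in each format
legacy = "<34>Mar 11 13:37:56 linux-6965 sshd: Accepted password for root"
new = "<34>1 2023-03-11T13:37:56.003Z linux-6965 sshd 1234 - - Accepted password for root"

print(decode_pri(legacy))          # (4, 'crit')
print(looks_like_rfc5424(legacy))  # False
print(looks_like_rfc5424(new))     # True
```

Facility 4 is the security/authorization facility, which is why login-related messages often carry it.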
Syslog-ng can also collect logs, or any kind of text data, from applications using a wide variety of sources like files, sockets, pipes, and even application output.
If none of the built-in possibilities suits your needs, you can use Python to create your own log sources. It is not as fast and efficient as C code, but you can easily implement, for example, an HTTP or Kafka source for syslog-ng.
Role: data processing
The next role of syslog-ng is processing data. One of the major new features of syslog-ng 3 was message parsing. Syslog-ng can classify, normalize, and structure log messages using built-in parsers. The PatternDB message parser can find important data in unstructured, free-form log messages. There are also parsers for various structured log messages, like the CSV parser for tabular data, the JSON parser, or the key=value parser, which is typically used on firewall logs.
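To illustrate what a key=value parser does, here is a rough Python sketch. It is not how syslog-ng implements its parser, just the idea: find every key=value token and collect the results as name-value pairs. The firewall message and the field names in it are made up for the example.

```python
import re

# A pair is key=value, where the value is either double-quoted
# or a run of non-space characters
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_kv(message):
    """Turn a key=value formatted message into a dict of name-value pairs."""
    pairs = {}
    for key, value in PAIR.findall(message):
        pairs[key] = value.strip('"')  # drop the surrounding quotes, if any
    return pairs


# Hypothetical firewall log line
sample = 'action=DROP src=192.0.2.10 dst=198.51.100.7 proto=TCP note="port scan"'

for name, value in parse_kv(sample).items():
    print(name, "->", value)
```

Once the fields are separate name-value pairs, filtering on the source address or the action no longer needs a regular expression over the whole message.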
You can also rewrite log messages. Here, I’m not referring to falsifying logs, but to anonymization, for example, which is often required by various compliance regulations.
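As a simple illustration of anonymization, the Python sketch below zeroes the last octet of every IPv4 address in a message. This is only one possible approach, and the function name is my own; real compliance requirements may call for hashing or removing the address entirely.

```python
import re

# Four groups of one to three digits separated by dots.
# This is a loose match: it does not check that each octet is <= 255.
IPV4 = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")


def anonymize_ips(message):
    """Keep the first three octets of each IPv4 address and zero the last one."""
    return IPV4.sub(r"\1.\2.\3.0", message)


original = "Accepted keyboard-interactive/pam for root from 127.0.0.1 port 46048 ssh2"
print(anonymize_ips(original))
# Accepted keyboard-interactive/pam for root from 127.0.0.0 port 46048 ssh2
```

The rest of the message stays intact, so the log remains useful for troubleshooting while the exact host is no longer identifiable.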
You can also enrich log messages. One way is to use the GeoIP parser, which can add geo-location to log messages based on IP addresses. You can also create additional name-value pairs based on message content.
Using templates, syslog-ng can reformat log messages as required by various destinations. Some log analytics software needs more precise time stamps, others need JSON formatting, and so on. Templates allow you to comply with a wide variety of formatting requirements.
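The idea behind templates can be sketched in a few lines of Python: the same set of name-value pairs is rendered in two different formats, depending on what the destination expects. The field names (host, program, message), the function names, and the year in the timestamp are assumptions I made for the example; syslog-ng itself uses its own template and macro syntax for this.

```python
import json
from datetime import datetime, timezone


def as_json(pairs, timestamp):
    """Render the pairs as one JSON object with a millisecond-precision ISO 8601 timestamp."""
    record = dict(pairs)
    record["timestamp"] = timestamp.isoformat(timespec="milliseconds")
    return json.dumps(record, sort_keys=True)


def as_legacy_line(pairs, timestamp):
    """Render the pairs in the traditional 'date host program: message' layout."""
    stamp = timestamp.strftime("%b %d %H:%M:%S")
    return "{} {} {}: {}".format(stamp, pairs["host"], pairs["program"], pairs["message"])


pairs = {
    "host": "linux-6965",
    "program": "sshd",
    "message": "Accepted keyboard-interactive/pam for root from 127.0.0.1 port 46048 ssh2",
}
# The year and the milliseconds are made up; the classic format does not carry them
when = datetime(2023, 3, 11, 13, 37, 56, 123000, tzinfo=timezone.utc)

print(as_legacy_line(pairs, when))
print(as_json(pairs, when))
```

Note how the traditional layout silently loses the year, the sub-second part, and the time zone, which is exactly why some log analytics software asks for a more precise format.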
Recent versions of syslog-ng also include a Python parser. With it, you can implement any of the previously mentioned features in Python. You can also use Python to enrich log messages with data from databases, or to filter messages.
This leads us to our next role: filtering.
Role: data filtering
Data filtering has two main uses. Most people only know that filtering can be used to discard surplus log messages. For example, debug-level messages are rarely used and can take up considerable disk space, so they are often filtered out.
An equally important use of filtering is message routing, making sure that the right messages reach the right destinations. For example, making sure that all authentication-related messages reach the SIEM system.
There are many different ways in which you can filter log messages using syslog-ng. Filtering can be based on message content or various message parameters. You can use comparisons, wildcards, regular expressions, and various filtering functions. Best of all, you can combine any of these using Boolean operators.
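Here is a small Python sketch of both uses of filtering: discarding and routing. Each filter is a function that takes the name-value pairs of a message and returns true or false, and the helpers combine them with Boolean logic. All names here (the helper functions, the "file" and "siem" destinations, the level field) are illustrative assumptions, not syslog-ng syntax.

```python
import re
from fnmatch import fnmatchcase

# Severity names from least to most severe, used for comparisons
SEVERITY_ORDER = ["debug", "info", "notice", "warning", "err", "crit", "alert", "emerg"]


def program(pattern):
    """Wildcard match on the program name."""
    return lambda msg: fnmatchcase(msg["program"], pattern)


def message_matches(regex):
    """Regular expression match on the message text."""
    compiled = re.compile(regex)
    return lambda msg: compiled.search(msg["message"]) is not None


def level_at_least(name):
    """Comparison on the severity level."""
    threshold = SEVERITY_ORDER.index(name)
    return lambda msg: SEVERITY_ORDER.index(msg["level"]) >= threshold


def any_of(*filters):
    """Boolean OR of several filters."""
    return lambda msg: any(f(msg) for f in filters)


def all_of(*filters):
    """Boolean AND of several filters."""
    return lambda msg: all(f(msg) for f in filters)


is_auth = any_of(program("sshd*"), message_matches(r"authentication failure"))
worth_keeping = level_at_least("info")  # everything except debug


def route(msg):
    """Return the list of destinations a message should reach."""
    if not worth_keeping(msg):
        return []                 # discard surplus debug messages
    destinations = ["file"]       # everything else is stored in a file
    if is_auth(msg):
        destinations.append("siem")  # authentication messages also go to the SIEM
    return destinations


login = {"program": "sshd", "level": "info", "message": "Accepted password for root"}
print(route(login))  # ['file', 'siem']
```

Because every filter has the same shape, new conditions can be added and combined without touching the routing logic.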
Role: data storage
Traditionally, log messages were saved to text files, either locally or on a remote syslog server. Support for SQL databases was added first as an alternative, followed by many other possibilities. Today you can store log messages in various SIEM and log analytics systems, Hadoop, various NoSQL databases like MongoDB or Elasticsearch, cloud services like Sumo Logic or Slack, and also some message queuing systems like Kafka. You can also write your own destination using Java or Python.
Free-form log messages
When you look around in the /var/log directory, you will see that most log messages have a similar format: a date, a hostname, and some text, like this SSH login message:
Mar 11 13:37:56 linux-6965 sshd: Accepted keyboard-interactive/pam for root from 127.0.0.1 port 46048 ssh2
In many cases, the text part is an almost complete English sentence with some variable parts. These messages are human-readable texts. This format was pretty useful when log messages were mostly read by humans, in the days of a few large machines administered by a large staff. However, a few decades later, both the number of hosts and the number of log messages have increased considerably, and humans can no longer keep up with the volume. Unfortunately, these kinds of messages are difficult for machines to interpret, and thus it is difficult to create alerts or reports based on them.
Solution: structured logging
Luckily, there is a solution to this problem. Instead of free-form text, events can also be represented as name-value pairs. For example, you can describe an SSH login with an application name, a username, and a source IP address:
app=sshd user=root source_ip=192.168.123.45
The good news is that syslog-ng was built with name-value pairs in mind, right from the start. By default, all incoming log messages are parsed by syslog-ng, and name-value pairs are created for date, facility, program name, and other parameters.
Various parsers in syslog-ng can turn unstructured log messages, and some structured formats like CSV or JSON, into name-value pairs.
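To show the step from free-form text to name-value pairs, here is a Python sketch that parses the SSH login message from above. I use regular expressions as a stand-in here, which is not how PatternDB works internally, and the pattern names are my own. The first pattern extracts the generic header fields, the second one picks the username and the source IP out of the free-form text.

```python
import re

# Generic header: date, host, program name with an optional [pid], then the text
HEADER = re.compile(
    r"(?P<date>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)"
)

# Message-specific pattern for successful SSH logins
SSH_ACCEPTED = re.compile(
    r"Accepted (?P<method>\S+) for (?P<user>\S+) "
    r"from (?P<source_ip>\S+) port (?P<port>\d+)"
)


def parse(line):
    """Turn one log line into a dict of name-value pairs."""
    header = HEADER.match(line)
    if header is None:
        return {"message": line}  # unknown format: keep the raw text only
    pairs = {k: v for k, v in header.groupdict().items() if v is not None}
    login = SSH_ACCEPTED.match(pairs["message"])
    if pairs["program"] == "sshd" and login is not None:
        pairs["user"] = login["user"]
        pairs["source_ip"] = login["source_ip"]
    return pairs


line = ("Mar 11 13:37:56 linux-6965 sshd: Accepted keyboard-interactive/pam "
        "for root from 127.0.0.1 port 46048 ssh2")
pairs = parse(line)
print(pairs["program"], pairs["user"], pairs["source_ip"])  # sshd root 127.0.0.1
```

With the username and the source IP available as separate fields, an alert such as "root login from an unexpected address" becomes a simple comparison instead of a text search.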
Name-value pairs allow for more useful alerting and reporting. They are also very useful when storing log messages into NoSQL databases or other services.
Which is the most widely used syslog-ng version?
I end the second part of my syslog-ng tutorial with a tricky question. Which do you think is the most widely used syslog-ng version?
Let me give you some hints:
The syslog-ng project started in 1998.
RHEL 8, the most popular platform for syslog-ng servers, has syslog-ng version 3.23 in EPEL 8.
The latest stable version right now is 3.38.
You can answer my question in a comment on the blog or on YouTube or on Twitter / Mastodon.
If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or even chat with us. For a list of possibilities, check our GitHub page under the “Community” section at https://github.com/syslog-ng/syslog-ng. On Twitter, I am available as @PCzanik, on Mastodon as @Pczanik@fosstodon.org.