Building blocks of syslog-ng

18 Jul 2019

Recently I gave a syslog-ng introductory workshop at the Pass the SALT conference in Lille, France. I got a lot of positive feedback, so I decided to turn the workshop into a blog post. Naturally, I shortened and simplified it, but still ended up with enough material for multiple blog posts.

This one gives you an overview of syslog-ng, its major features and an introduction to its configuration.

What is logging & syslog-ng?

Let’s start from the very beginning. Logging is the recording of events on a computer. And what is syslog-ng? It’s an enhanced logging daemon with a focus on portability and high-performance central log collection. It was originally developed in C.

Why is central logging so important? There are three major reasons:

Ease of use: you have only one location to check for your log messages instead of many.
Availability: logs are available even when the sender machine is unreachable.
Security: logs are often deleted or modified once a computer is breached. Logs collected on the central syslog-ng server, on the other hand, can be used to reconstruct how the machine was compromised.

There are four major roles of syslog-ng: collecting, processing, filtering, and storing (or forwarding) log messages.

The first role is collecting, where syslog-ng can collect system and application logs together. Each can provide useful context for the other. Many platform-specific log sources are supported (for example, collecting system logs from /dev/log, the systemd journal or Sun Streams). As a central log collector, syslog-ng supports both the legacy/BSD (RFC 3164) and the new (RFC 5424) syslog protocols over UDP, TCP and encrypted connections. It can also collect logs or any kind of text data through files, sockets, pipes and even application output. The Python source serves as a Jolly Joker: you can implement an HTTP server (similar to Splunk HEC), fetch logs from Amazon CloudWatch, or implement a Kafka source, to mention only a few possibilities.
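As a rough sketch of what network collection can look like on a central server, the source below listens for legacy syslog over UDP and TCP, and for the new syslog protocol over TCP. The source name s_net and the port numbers are assumptions chosen for illustration; the configuration syntax itself is explained later in this post.

```conf
source s_net {
    # legacy/BSD syslog (RFC 3164) over UDP and TCP
    network(transport("udp") port(514));
    network(transport("tcp") port(514));
    # new syslog protocol (RFC 5424) over TCP
    syslog(transport("tcp") port(601));
};
```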

The second role is processing, which covers many different possibilities. For example, syslog-ng can classify, normalize, and structure logs with built-in parsers. It can rewrite log messages (we aren’t talking about falsifying log messages here, but about anonymization as required by compliance regulations, for example). It can also enrich log messages using GeoIP, or create additional name-value pairs based on message content. You can use templates to reformat log messages, as required by a specific destination (for example, you can use the JSON template function with Elasticsearch). Using the Python parser, you can do any of the above, and even filtering.
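To illustrate rewriting for anonymization, the sketch below masks anything that looks like an IPv4 address in the message text. The rule name r_mask_ip and the regular expression are assumptions for illustration only, not a compliance-grade pattern.

```conf
rewrite r_mask_ip {
    # replace every IPv4-looking string in the message with a placeholder
    subst('[0-9]{1,3}(\.[0-9]{1,3}){3}', 'x.x.x.x', value("MESSAGE"), flags("global"));
};
```

A rewrite rule like this only takes effect once it is referenced from a log path with rewrite(r_mask_ip).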

The third role is filtering, which has two main uses. The first one is discarding surplus log messages, like debug level messages. The second one is message routing: making sure that a given set of logs reaches the right destination (for example, authentication-related messages reach the SIEM). There are many possibilities, as message routing can be based on message parameters or content, using many different filtering functions. Best of all: any of these can be combined using Boolean operators.
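Both uses could be sketched as follows. All names here (f_auth, f_no_debug, d_siem) and the host name siem.example.com are hypothetical; only the general shape matters at this point.

```conf
source s_sys { system(); internal(); };

filter f_auth { facility(auth, authpriv); };
filter f_no_debug { not level(debug); };

destination d_siem { network("siem.example.com" transport("tcp") port(514)); };
destination d_mesg { file("/var/log/messages"); };

# routing: authentication-related messages go to the SIEM
log { source(s_sys); filter(f_auth); destination(d_siem); };

# discarding: everything except debug level messages is stored locally
log { source(s_sys); filter(f_no_debug); destination(d_mesg); };
```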

The fourth role is storage. Traditionally, syslog-ng stored log messages in flat files, or forwarded them to a central syslog-ng server using one of the syslog protocols and stored them there in flat files. Over the years, an SQL destination, then different big data destinations (Hadoop, Kafka, Elasticsearch), message queuing (like AMQP or STOMP), different logging-as-a-service providers, and many other features were added. Nowadays you can also write your own destinations in Python or Java.

Log messages

If you take a look at your /var/log directory, where log messages are normally stored on a Linux/UNIX system, you will see that most log messages have the following format: date + hostname + text. For example, observe this ssh login message:

Mar 11 13:37:56 linux-6965 sshd[4547]: Accepted keyboard-interactive/pam for root from 127.0.0.1 port 46048 ssh2

As you can see, the text part is an almost complete English sentence with some variable parts in it. It is pretty easy to read for a human. However, as each application produces different messages, it is quite difficult to create reports and alerts based on these messages.

There is a solution for this problem: structured logging. Instead of free-form text messages, in this case events are described using name-value pairs. For example, an ssh login can be described with the following name-value pairs:

app=sshd user=root source_ip=192.168.123.45

The good news is that syslog-ng was built around name-value pairs right from the beginning, as both advanced filtering and templates required syslog header data to be parsed and available as name-value pairs. Parsers in syslog-ng can turn unstructured, and even some structured data (CSV, JSON, etc.) into name-value pairs as well.

Configuration

Configuring syslog-ng is simple and logical, even if it does not look so at first sight. My initial advice: Don’t panic! The syslog-ng configuration has a pipeline model. There are many different building blocks (like sources, destinations, filters and others), and all of these can be connected in pipelines using log statements.

By default, syslog-ng usually looks for its configuration in /etc/syslog-ng/syslog-ng.conf (configurable at compile time). Here you can find a very simple syslog-ng configuration showing you all the mandatory (and even some optional) building blocks:

@version:3.21
@include "scl.conf"

# this is a comment :)

options {flush_lines (0); keep_hostname (yes);};

source s_sys { system(); internal();};
destination d_mesg { file("/var/log/messages"); };
filter f_default { level(info..emerg) and not (facility(mail)); };

log { source(s_sys); filter(f_default); destination(d_mesg); };

The configuration always starts with a version number declaration. It helps syslog-ng to figure out what your original intention with the configuration was and also warns you if there was an important change in syslog-ng internals.

You can include other configuration files from the main syslog-ng configuration. The one included here is an important one: it includes the syslog-ng configuration library. It will be discussed later in depth. For now, it is enough to know that many syslog-ng features are actually defined there, including the Elasticsearch destination.

You can place comments in your syslog-ng configuration. They help structure the configuration and remind you of your decisions and workarounds when you need to modify the configuration later.

The use of global options helps you make your configuration shorter and easier to maintain. Most settings here can be overridden later in the configuration. For example, flush_lines() defines how many messages are sent to a destination at the same time. A larger value adds latency, but also improves performance and lowers resource usage. Zero is a safe choice for most logs on a low-traffic server, as it writes all logs to disk as soon as they arrive. On the other hand, if you have a busy mail server on that host, you might want to override this value for the mail logs only. Then later, when your server becomes busy, you can easily raise the value for all of your logs.
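Overriding a global option for a single destination could look like the sketch below. The names f_mail and d_mail, the file path and the value 100 are arbitrary assumptions; pick a value that fits your traffic.

```conf
# global default: write every message as soon as it arrives
options { flush_lines(0); };

# hypothetical mail log: send up to 100 messages to the file at once
filter f_mail { facility(mail); };
destination d_mail { file("/var/log/mail.log" flush_lines(100)); };
```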

The next three lines are the actual building blocks. Two of these are mandatory: the source and the destination (as you need to collect logs and store them somewhere). The filter is optional but useful and highly recommended.

A source is a named collection of source drivers. In this case, its name is s_sys, and it is using the system() and internal() sources. The first one collects from local, platform-specific log sources, while the second one collects messages generated by syslog-ng.
A destination is a named collection of destination drivers. In this case, its name is d_mesg, and it stores messages in a flat file called /var/log/messages.
A filter is a named collection of filter functions. You can have a single filter function or a collection of filter functions connected using Boolean operators. Here we have a function for discarding debug level messages and another one for finding facility mail.

There are a few more building blocks (parsers, rewrites and others) not shown here. They will be introduced later.

Finally, there is a log statement connecting all these building blocks. Here you refer to the different building blocks by their names. Naturally, in a real configuration you will have several of these building blocks to refer to, not only one of each. Unless you are machine generating a complex configuration, you do not have to count the number of items in your configuration carefully.

SCL: syslog-ng configuration library

The syslog-ng configuration library (SCL) contains a number of ready-to-use configuration snippets. From the user’s point of view, they are no different from any other syslog-ng drivers. For example, the new elasticsearch-http() destination driver also originates from here.
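Using a driver that comes from SCL looks just like using a built-in one. Here is a minimal sketch, assuming an Elasticsearch node listening on localhost:9200 and an index called syslog-ng (both are placeholders). The empty type() is what Elasticsearch 7 and later typically expect; older versions need a type name there.

```conf
destination d_elastic {
    elasticsearch-http(
        url("http://localhost:9200/_bulk")
        index("syslog-ng")
        type("")
    );
};
```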

Application Adapters are a set of parsers included in SCL that automatically try to parse any log messages arriving through the system() source. These parsers turn incoming log messages into a set of name-value pairs. The names for these name-value pairs, containing extra information, start with a dot to differentiate them from name-value pairs created by the user. For example, names for values parsed from sudo logs start with the .sudo. prefix.

This also means that unless you really know what you are doing, you should include the syslog-ng configuration library from your syslog-ng.conf. If you do not do that, many of the documented features of syslog-ng will stop working for you.

As you have already seen in the sample configuration, you can enable SCL with the following line:

@include "scl.conf"

Networking

One of the most important features of syslog-ng is central log collection. You can use either the legacy or the new syslog protocols to collect logs centrally over the network. The machines sending the logs are called clients, while those on the receiving end are called servers. There is a lesser known, but equally (if not more) important role as well: the relay. On larger networks (or even smaller networks with multiple locations) relays are placed between clients and servers. This makes your logging infrastructure hierarchical with one or more levels of relays.

Why use relays? There are three major reasons:

you can collect UDP logs as close to the source as possible
you can distribute processing of log messages
you can secure your infrastructure: have a relay for each department or physical location, so logs can be sent from clients in real-time even if the central server is inaccessible
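A relay configuration could be sketched like this: it accepts logs from nearby clients over UDP and TCP, keeps a local copy, and forwards everything to the central server over TCP. The names s_clients, d_local and d_central, the directory and the host name central.example.com are all assumptions for illustration.

```conf
# preserve the original client host names instead of the relay's view of them
options { keep_hostname(yes); };

source s_clients {
    network(transport("udp") port(514));
    network(transport("tcp") port(514));
};

# local copy on the relay, one file per client
destination d_local { file("/var/log/relay/${HOST}.log" create_dirs(yes)); };

# forward over TCP to the central server
destination d_central { network("central.example.com" transport("tcp") port(514)); };

log { source(s_clients); destination(d_local); destination(d_central); };
```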

Macros & templates

As a syslog message arrives, syslog-ng automatically parses it. Most macros or name-value pairs are variables defined by syslog-ng based on the results of parsing. There are some macros that do not come from the parsing directly, for example the date and time a message was received (as opposed to the value stored in the message), or from enrichment, like GeoIP.

By default, messages are parsed as legacy syslog, but by using flags you can change this to new syslog (flags(syslog-protocol)) or you can even disable parsing completely (flags(no-parse)). In the latter case the whole incoming message is stored into the MESSAGE macro.
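The three parsing modes could be set up side by side as in the sketch below. The source names and port numbers are arbitrary assumptions.

```conf
# default: incoming messages are parsed as legacy (RFC 3164) syslog
source s_legacy { network(transport("tcp") port(514)); };

# incoming messages are parsed as new (RFC 5424) syslog
source s_new { network(transport("tcp") port(601) flags(syslog-protocol)); };

# no parsing: the whole incoming line ends up in the MESSAGE macro
source s_raw { network(transport("tcp") port(5514) flags(no-parse)); };
```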

Name-value pairs or macros have many uses. One of these uses is in templates. By using templates you can change the format in which messages are stored (for example, use ISODATE instead of the traditional date format):

template t_syslog {
    template("$ISODATE $HOST $MSG\n");
};
destination d_syslog {
    file("/var/log/syslog" template(t_syslog));
};

Another use is making file names variable. This way you can store logs coming from different hosts into different files, or implement log rotation by storing logs into directories and files named after the current year, month and day. An external script can then delete files once they are older than the retention period required by compliance or other reasons.

destination d_messages {
    # braces keep the underscore from being read as part of the macro name
    file("/var/log/${R_YEAR}/${R_MONTH}/${HOST}_${R_DAY}.log" create_dirs(yes));
};

Filters & if/else statements

By using filters you can fine-tune which messages can reach a given destination. You can combine multiple filter functions using Boolean operators in a single filter, and you can use multiple filters in a log path. Filters are declared similarly to any other building block: you have to name them and then use one or more filter functions combined with Boolean operators inside the filter. Here is the relevant part of the example configuration from above:

filter f_default { level(info..emerg) and not (facility(mail)); };

The level() filter function lets all messages through except for debug level messages. The facility() filter function selects all messages with facility mail. The two filter functions are connected with “and not”, so in the end all debug level and all facility mail messages are discarded by this filter.

There are many more filters. The match() filter operates on the message content and there are many more that operate on different values parsed from the message headers. From the security point of view, the in-list() filter might be interesting. This filter can compare a field with a list of values (for example, it can compare IP addresses extracted from firewall logs with a list of malware command & control IP addresses).
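Both could be declared as in the following sketch. Note that the list-based filter is spelled in-list() in the configuration. The filter names, the search string and the list file path are assumptions, and kv.SRC presumes that the source IP address was already extracted by a key=value parser using the kv. prefix.

```conf
# matches messages whose text contains "Failed password"
filter f_failed_login { match("Failed password" value("MESSAGE")); };

# matches messages whose kv.SRC value appears in the list file (one entry per line)
filter f_malware_ip { in-list("/etc/syslog-ng/malware_ips.list", value("kv.SRC")); };
```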

Conditional expressions in the log path make using the results of filtering easier. What is possible now by using simple if / else statements used to require complex configuration. You can use conditional expressions with similar blocks within the log path:

if (filter()) { do this } else { do that };

It can be used, for example, to apply different parsers to different log messages or to save a subset of log messages to a separate destination.
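Spelled out with both branches, the pattern could look like the sketch below. It reuses the s_sys source and the d_mesg destination from the sample configuration earlier; the choice of sshd and the file name are assumptions for illustration.

```conf
log {
    source(s_sys);
    if (program("sshd")) {
        # ssh daemon messages get their own file
        destination { file("/var/log/sshd.log"); };
    } else {
        # everything else goes to the default destination
        destination(d_mesg);
    };
};
```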

Below you can find a simplified example, showing the log statement only:

log {
    source(s_sys);
    filter(f_sudo);
    if (match("czanik" value(".sudo.SUBJECT"))) {
        destination { file("/var/log/sudo_filtered"); };
    };
    destination(d_sudoall);
};

The log statement in the example above collects logs from a source called s_sys. The next item, a filter referenced from the log path, keeps sudo logs only. Recent versions of syslog-ng automatically parse sudo messages. The if statement here uses the results of parsing, and writes any log message where the user name (stored in the .sudo.SUBJECT name-value pair) matches my user name to a separate file. Finally, all sudo logs are stored in a log file.

Parsing

Parsers of syslog-ng can structure, classify and normalize log messages. There are multiple advantages of parsing:

instead of the whole message, only the relevant parts are stored
more precise filtering (alerting)
more precise searches in (no)SQL databases

By default, syslog-ng treats the message part of logs as strings even if the message part contains structured data. You have to parse the message parts in order to turn them into name-value pairs. The advantages listed above only become available once you have turned the message into name-value pairs by using the parsers of syslog-ng.

One of the earliest parsers of syslog-ng is the PatternDB parser. This parser can extract useful information from unstructured log messages into name-value pairs. It can also add status fields based on the message text and classify messages (like LogCheck). The downside of PatternDB is that you need to know your log messages in advance and describe them in an XML database. It takes time and effort, and while some example log messages do exist, for your most important log messages you most likely need to create the XML yourself.

For example, in case of an ssh login failure the name-value pairs created by PatternDB could be:

parsed directly from the message: app=sshd, user=root, source_ip=192.168.123.45
added, based on the message content: action=login, status=failure
classified as “violation” in the end.
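On the configuration side, using PatternDB could be sketched as follows. The XML file path is an assumption (you have to provide the pattern database yourself), and so are the parser and filter names. The classification result is available in the .classifier.class name-value pair.

```conf
parser p_patterndb {
    db-parser(file("/var/lib/syslog-ng/patterndb.xml"));
};

# keep only messages that PatternDB classified as "violation"
filter f_violation { match("violation" value(".classifier.class")); };
```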

JSON has become very popular recently, even for log messages. The JSON parser of syslog-ng can turn JSON logs into name-value pairs.
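A minimal JSON parser declaration could look like this. The parser name and the prefix are assumptions; the prefix simply keeps parsed names apart from other name-value pairs.

```conf
# for a message like {"user": "root"}, the value becomes available as ${.json.user}
parser p_json { json-parser(prefix(".json.")); };
```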

The CSV parser can turn any kind of columnar log messages into name-value pairs. A popular example is the Apache web server access log.
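For Apache access logs in the combined format, a CSV parser could be sketched as below. The column names are freely chosen labels, and the sketch assumes the standard nine-field combined log format; a custom LogFormat needs a matching column list.

```conf
parser p_apache {
    csv-parser(
        columns("APACHE.CLIENT_IP", "APACHE.IDENT_NAME", "APACHE.USER_NAME",
                "APACHE.TIMESTAMP", "APACHE.REQUEST_URL", "APACHE.REQUEST_STATUS",
                "APACHE.CONTENT_LENGTH", "APACHE.REFERER", "APACHE.USER_AGENT")
        # fields are separated by spaces; quotes and brackets group a field together
        delimiters(" ")
        quote-pairs('""[]')
        flags(escape-double-char, strip-whitespace)
    );
};
```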

If you are into IT security, you will most likely use the key=value parser a lot, as iptables and most firewalls store their log messages in this format.

There are many more lesser known parsers in syslog-ng as well. You can parse XML logs, logs from the Linux Audit subsystem, and even custom date formats, by using templates.

SCL contains many parsers that combine multiple parsers into a single one to parse more complex log messages. There are parsers for Apache access logs that also parse the date from the logs. In addition, there is a parser that can interpret most Cisco logs resembling syslog messages.

Enriching messages

You can create additional name-value pairs based on the message content. PatternDB, already discussed among the parsers, can not only parse messages, but can also create name-value pairs based on the message content.

The GeoIP parser can help you find the geo-location of IP addresses. The new geoip2() parser can show you more than just the country or longitude/latitude information: it can display the continent, the county, and even the city as well, in multiple languages. It can help you spot anomalies or display locations on a map.

By using add-contextual-data(), you can enrich log messages from a CSV file. You can add, for example, host role or contact person information, based on the host name. This way you have to spend less time on finding extra information, and it can also help you create more accurate dashboards and alerts.
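A sketch of such an enrichment follows. The CSV file path, the context. prefix and the parser name are assumptions. Each line of the CSV file has the form selector,name,value (for example: web01,role,webserver).

```conf
parser p_context {
    add-contextual-data(
        # look up the host name of the message in the first CSV column
        selector("$HOST"),
        database("/etc/syslog-ng/context-info-db.csv"),
        # entries under "unknown" are used when the host is not listed
        default-selector("unknown"),
        prefix("context.")
    );
};
```

With the example line above, a message from web01 would gain a context.role name-value pair with the value webserver.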

parser p_kv {kv-parser(prefix("kv.")); };

parser p_geoip2 { geoip2( "${kv.SRC}", prefix( "geoip2." ) database( "/usr/share/GeoIP/GeoLite2-City.mmdb" ) ); };

source s_tcp { tcp(port(514)); };

destination d_file {
  file("/var/log/fromnet" template("$(format-json --scope rfc5424
  --scope dot-nv-pairs --rekey .* --shift 1 --scope nv-pairs
  --exclude DATE --key ISODATE @timestamp=${ISODATE})\n\n") );
};

log {
  source(s_tcp);
  parser(p_kv);
  parser(p_geoip2);
  destination(d_file);
};

The configuration above collects log messages from a firewall using the legacy syslog protocol on a TCP port. The incoming logs are first parsed with a key=value parser (using a prefix to avoid colliding macro names). The geoip2() parser takes the source IP address as input (stored in kv.SRC) and stores location data under a different prefix. By default, logs written to disk do not include the extracted name-value pairs. This is why logs are written here to a file using the JSON template function, which writes all syslog-related macros and any extracted name-value pairs into the file. Leading dots are removed from names and the date is formatted as expected by Elasticsearch. The only difference from the template you would use with Elasticsearch is that there are two line feeds at the end, to make the file easier to read.

If you have questions or comments related to syslog-ng, do not hesitate to contact us. You can reach us by email or you can even chat with us. For a long list of possibilities, check our contact page at https://syslog-ng.org/contact-us/. On Twitter I am available as @Pczanik.
